* [PATCH 0/2] KVM: arm64: Filtering PMU events
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, kvm
Cc: James Morse, Julien Thierry, Suzuki K Poulose
It is at times necessary to prevent a guest from being able to sample
certain events if multiple CPUs share resources such as a cache level.
In this case, it would be interesting if the VMM could simply prevent
certain events from being counted instead of simply not exposing a PMU.
Given that most events are not architected, there is no easy way
to designate which events shouldn't be counted other than specifying
the raw event number.
Since I have no idea whether it is better to use an event whitelist
or blacklist, the proposed API takes a cue from the x86 version and
allows either allowing or denying counting of ranges of events.
The event space being pretty large (16bits on ARMv8.1), the default
policy is set by the first filter that gets installed (default deny if
we first allow, default allow if we first deny).
The filter state is global to the guest, despite the PMU being per
CPU. I'm not sure whether it would be worth it making it CPU-private.
Anyway, I'd be interesting in comments on how people would use this.
I'll try to push a patch against kvmtool that implement this shortly
(what I have currently is a harcoded set of hacks).
Marc Zyngier (2):
KVM: arm64: Add PMU event filtering infrastructure
KVM: arm64: Document PMU filtering API
Documentation/virt/kvm/devices/vcpu.txt | 28 +++++++++
arch/arm64/include/asm/kvm_host.h | 6 ++
arch/arm64/include/uapi/asm/kvm.h | 16 ++++++
virt/kvm/arm/arm.c | 2 +
virt/kvm/arm/pmu.c | 76 +++++++++++++++++++++----
5 files changed, 116 insertions(+), 12 deletions(-)
--
2.20.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* [PATCH 1/2] KVM: arm64: Add PMU event filtering infrastructure
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, kvm
Cc: James Morse, Julien Thierry, Suzuki K Poulose
In-Reply-To: <20200214183615.25498-1-maz@kernel.org>
It can be desirable to expose a PMU to a guest, and yet not want the
guest to be able to count some of the implemented events (because this
would give information on shared resources, for example.
For this, let's extend the PMUv3 device API, and offer a way to setup a
bitmap of the allowed events (the default being no bitmap, and thus no
filtering).
Userspace can thus allow/deny ranges of event. The default policy
depends on the "polarity" of the first filter setup (default deny if the
filter allows events, and default allow if the filter denies events).
This allows to setup exactly what is allowed for a given guest.
Note that although the ioctl is per-vcpu, the map of allowed events is
global to the VM (it can be setup from any vcpu until the vcpu PMU is
initialized).
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_host.h | 6 +++
arch/arm64/include/uapi/asm/kvm.h | 16 +++++++
virt/kvm/arm/arm.c | 2 +
virt/kvm/arm/pmu.c | 76 ++++++++++++++++++++++++++-----
4 files changed, 88 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d87aa609d2b6..b594c604215f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -91,6 +91,12 @@ struct kvm_arch {
* supported.
*/
bool return_nisv_io_abort_to_user;
+
+ /*
+ * VM-wide PMU filter, implemented as a bitmap and big enough
+ * for up to 65536 events
+ */
+ unsigned long *pmu_filter;
};
#define KVM_NR_MEM_OBJS 40
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ba85bb23f060..7b1511d6ce44 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -159,6 +159,21 @@ struct kvm_sync_regs {
struct kvm_arch_memory_slot {
};
+/*
+ * PMU filter structure. Describe a range of events with a particular
+ * action. To be used with KVM_ARM_VCPU_PMU_V3_FILTER.
+ */
+struct kvm_pmu_event_filter {
+ __u16 base_event;
+ __u16 nevents;
+
+#define KVM_PMU_EVENT_ALLOW 0
+#define KVM_PMU_EVENT_DENY 1
+
+ __u8 action;
+ __u8 pad[3];
+};
+
/* for KVM_GET/SET_VCPU_EVENTS */
struct kvm_vcpu_events {
struct {
@@ -329,6 +344,7 @@ struct kvm_vcpu_events {
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
#define KVM_ARM_VCPU_PMU_V3_IRQ 0
#define KVM_ARM_VCPU_PMU_V3_INIT 1
+#define KVM_ARM_VCPU_PMU_V3_FILTER 2
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index d65a0faa46d8..9132f7508975 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -164,6 +164,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
free_percpu(kvm->arch.last_vcpu_ran);
kvm->arch.last_vcpu_ran = NULL;
+ bitmap_free(kvm->arch.pmu_filter);
+
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
if (kvm->vcpus[i]) {
kvm_vcpu_destroy(kvm->vcpus[i]);
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index f0d0312c0a55..817f341ed9ad 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -585,6 +585,14 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
pmc->idx != ARMV8_PMU_CYCLE_IDX)
return;
+ /*
+ * If we have a filter in place and that the event isn't allowed, do
+ * not install a perf event either.
+ */
+ if (vcpu->kvm->arch.pmu_filter &&
+ !test_bit(eventsel, vcpu->kvm->arch.pmu_filter))
+ return;
+
memset(&attr, 0, sizeof(struct perf_event_attr));
attr.type = PERF_TYPE_RAW;
attr.size = sizeof(attr);
@@ -735,15 +743,6 @@ int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
{
- if (!kvm_arm_support_pmu_v3())
- return -ENODEV;
-
- if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
- return -ENXIO;
-
- if (vcpu->arch.pmu.created)
- return -EBUSY;
-
if (irqchip_in_kernel(vcpu->kvm)) {
int ret;
@@ -794,8 +793,19 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
return true;
}
+#define NR_EVENTS (ARMV8_PMU_EVTYPE_EVENT + 1)
+
int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
{
+ if (!kvm_arm_support_pmu_v3())
+ return -ENODEV;
+
+ if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
+ return -ENODEV;
+
+ if (vcpu->arch.pmu.created)
+ return -EBUSY;
+
switch (attr->attr) {
case KVM_ARM_VCPU_PMU_V3_IRQ: {
int __user *uaddr = (int __user *)(long)attr->addr;
@@ -804,9 +814,6 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
if (!irqchip_in_kernel(vcpu->kvm))
return -EINVAL;
- if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
- return -ENODEV;
-
if (get_user(irq, uaddr))
return -EFAULT;
@@ -824,6 +831,50 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
vcpu->arch.pmu.irq_num = irq;
return 0;
}
+ case KVM_ARM_VCPU_PMU_V3_FILTER: {
+ struct kvm_pmu_event_filter __user *uaddr;
+ struct kvm_pmu_event_filter filter;
+
+ uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr;
+
+ if (copy_from_user(&filter, uaddr, sizeof(filter)))
+ return -EFAULT;
+
+ if (((u32)filter.base_event + filter.nevents) > NR_EVENTS ||
+ (filter.action != KVM_PMU_EVENT_ALLOW &&
+ filter.action != KVM_PMU_EVENT_DENY))
+ return -EINVAL;
+
+ mutex_lock(&vcpu->kvm->lock);
+
+ if (!vcpu->kvm->arch.pmu_filter) {
+ vcpu->kvm->arch.pmu_filter = bitmap_alloc(NR_EVENTS, GFP_KERNEL);
+ if (!vcpu->kvm->arch.pmu_filter) {
+ mutex_unlock(&vcpu->kvm->lock);
+ return -ENOMEM;
+ }
+
+ /*
+ * The default depends on the first applied filter.
+ * If it allows events, the default is to deny.
+ * Conversely, if the first filter denies a set of
+ * events, the default is to allow.
+ */
+ if (filter.action == KVM_PMU_EVENT_ALLOW)
+ bitmap_zero(vcpu->kvm->arch.pmu_filter, NR_EVENTS);
+ else
+ bitmap_fill(vcpu->kvm->arch.pmu_filter, NR_EVENTS);
+ }
+
+ if (filter.action == KVM_PMU_EVENT_ALLOW)
+ bitmap_set(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+ else
+ bitmap_clear(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents);
+
+ mutex_unlock(&vcpu->kvm->lock);
+
+ return 0;
+ }
case KVM_ARM_VCPU_PMU_V3_INIT:
return kvm_arm_pmu_v3_init(vcpu);
}
@@ -860,6 +911,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
switch (attr->attr) {
case KVM_ARM_VCPU_PMU_V3_IRQ:
case KVM_ARM_VCPU_PMU_V3_INIT:
+ case KVM_ARM_VCPU_PMU_V3_FILTER:
if (kvm_arm_support_pmu_v3() &&
test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
return 0;
--
2.20.1
^ permalink raw reply related
* [PATCH 2/2] KVM: arm64: Document PMU filtering API
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, kvm
Cc: James Morse, Julien Thierry, Suzuki K Poulose
In-Reply-To: <20200214183615.25498-1-maz@kernel.org>
Add a small blurb describing how the event filtering API gets used.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
Documentation/virt/kvm/devices/vcpu.txt | 28 +++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
index 6f3bd64a05b0..76902e6bbc05 100644
--- a/Documentation/virt/kvm/devices/vcpu.txt
+++ b/Documentation/virt/kvm/devices/vcpu.txt
@@ -36,6 +36,34 @@ Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.
+1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
+Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
+ pointer to a struct kvm_pmu_event_filter
+Returns: -ENODEV: PMUv3 not supported or GIC not initialized
+ -ENXIO: PMUv3 not properly configured or in-kernel irqchip not
+ configured as required prior to calling this attribute
+ -EBUSY: PMUv3 already initialized
+
+Request the installation of a PMU event filter describe as follows:
+
+struct kvm_pmu_event_filter {
+ __u16 base_event;
+ __u16 nevents;
+
+#define KVM_PMU_EVENT_ALLOW 0
+#define KVM_PMU_EVENT_DENY 1
+
+ __u8 action;
+ __u8 pad[3];
+};
+
+A filter range is defined as the range [base_event, base_event + nevents[,
+together with an action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
+first registered range defines the global policy (global ALLOW if the first
+action is DENY, global DENY if the first action is ALLOW). Multiple ranges
+can be programmed, and but fit within the 16bit space defined by the ARMv8.1
+PMU architecture.
+
2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
Architectures: ARM,ARM64
--
2.20.1
^ permalink raw reply related
* [PATCH 0/2] KVM: arm64: Filtering PMU events
From: Marc Zyngier @ 2020-02-14 18:36 UTC (permalink / raw)
To: linux-arm-kernel, kvmarm, kvm
Cc: James Morse, Julien Thierry, Suzuki K Poulose
It is at times necessary to prevent a guest from being able to sample
certain events if multiple CPUs share resources such as a cache level.
In this case, it would be interesting if the VMM could simply prevent
certain events from being counted instead of simply not exposing a PMU.
Given that most events are not architected, there is no easy way
to designate which events shouldn't be counted other than specifying
the raw event number.
Since I have no idea whether it is better to use an event whitelist
or blacklist, the proposed API takes a cue from the x86 version and
allows either allowing or denying counting of ranges of events.
The event space being pretty large (16bits on ARMv8.1), the default
policy is set by the first filter that gets installed (default deny if
we first allow, default allow if we first deny).
The filter state is global to the guest, despite the PMU being per
CPU. I'm not sure whether it would be worth it making it CPU-private.
Anyway, I'd be interesting in comments on how people would use this.
I'll try to push a patch against kvmtool that implement this shortly
(what I have currently is a harcoded set of hacks).
Marc Zyngier (2):
KVM: arm64: Add PMU event filtering infrastructure
KVM: arm64: Document PMU filtering API
Documentation/virt/kvm/devices/vcpu.txt | 28 +++++++++
arch/arm64/include/asm/kvm_host.h | 6 ++
arch/arm64/include/uapi/asm/kvm.h | 16 ++++++
virt/kvm/arm/arm.c | 2 +
virt/kvm/arm/pmu.c | 76 +++++++++++++++++++++----
5 files changed, 116 insertions(+), 12 deletions(-)
--
2.20.1
^ permalink raw reply
* [Intel-gfx] ✗ Fi.CI.BUILD: failure for tests/i915/gem_exec_store: remove hard coded engine limit
From: Patchwork @ 2020-02-14 18:35 UTC (permalink / raw)
To: Matthew Auld; +Cc: intel-gfx
In-Reply-To: <20200214130826.31004-1-matthew.auld@intel.com>
== Series Details ==
Series: tests/i915/gem_exec_store: remove hard coded engine limit
URL : https://patchwork.freedesktop.org/series/73466/
State : failure
== Summary ==
Applying: tests/i915/gem_exec_store: remove hard coded engine limit
error: sha1 information is lacking or useless (tests/i915/gem_exec_store.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 tests/i915/gem_exec_store: remove hard coded engine limit
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply
* Re: arm64 iommu groups issue
From: Robin Murphy @ 2020-02-14 18:35 UTC (permalink / raw)
To: John Garry, Marc Zyngier, Will Deacon, Lorenzo Pieralisi,
Sudeep Holla, Guohanjun (Hanjun Guo)
Cc: Saravana Kannan, linux-kernel@vger.kernel.org, Alex Williamson,
Shameerali Kolothum Thodi, Linuxarm, iommu, Bjorn Helgaas,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <35fc8d13-b1c1-6a9e-4242-284da7f00764@huawei.com>
On 14/02/2020 2:09 pm, John Garry wrote:
>>>
>>> @@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct
>>> pci_bus *bus)
>>> /* Set up MSI IRQ domain */
>>> pci_set_msi_domain(dev);
>>>
>>> + parent = dev->dev.parent;
>>> + if (parent && parent->bus == &pci_bus_type)
>>> + device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
>>> +
>>> /* Notifier could use PCI capabilities */
>>> dev->match_driver = false;
>>> ret = device_add(&dev->dev);
>>> --
>>>
>>> This would work, but the problem is that if the port driver fails in
>>> probing - and not just for -EPROBE_DEFER - then the child device will
>>> never probe. This very thing happens on my dev board. However we could
>>> expand the device links API to cover this sort of scenario.
>>
>> Yes, that's an undesirable issue, but in fact I think it's mostly
>> indicative that involving drivers in something which is designed to
>> happen at a level below drivers is still fundamentally wrong and doomed
>> to be fragile at best.
>
> Right, and even worse is that it relies on the port driver even existing
> at all.
>
> All this iommu group assignment should be taken outside device driver
> probe paths.
>
> However we could still consider device links for sync'ing the SMMU and
> each device probing.
Yes, we should get that for DT now thanks to the of_devlink stuff, but
cooking up some equivalent for IORT might be worthwhile.
>> Another thought that crosses my mind is that when pci_device_group()
>> walks up to the point of ACS isolation and doesn't find an existing
>> group, it can still infer that everything it walked past *should* be put
>> in the same group it's then eventually going to return. Unfortunately I
>> can't see an obvious way for it to act on that knowledge, though, since
>> recursive iommu_probe_device() is unlikely to end well.
>
> I'd be inclined not to change that code.
>
>>
>>> As for alternatives, it looks pretty difficult to me to disassociate the
>>> group allocation from the dma_configure path.
>>
>> Indeed it's non-trivial, but it really does need cleaning up at some
>> point.
>>
>> Having just had yet another spark, does something like the untested
>> super-hack below work at all?
>
> I tried it and it doesn't (yet) work.
Bleh - further reinforcement of the "ideas after 6PM are bad ideas" rule...
> So when we try
> iommu_bus_replay()->add_iommu_group()->iommu_probe_device()->arm_smmu_add_device(),
>
> the iommu_fwspec is still NULL for that device - this is not set until
> later when the device driver is going to finally probe in
> iort_iommu_xlate()->iommu_fwspec_init(), and it's too late...
>
> And this looks to be the reason for which current
> iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also.
Of course, just adding a 'correct' add_device replay without the
of_xlate process doesn't help at all. No wonder this looked suspiciously
simpler than where the first idea left off...
(on reflection, the core of this idea seems to be recycling the existing
iommu_bus_init walk rather than building up a separate "waiting list",
while forgetting that that wasn't the difficult part of the original
idea anyway)
> On this current code mentioned, the principle of this seems wrong to me
> - we call bus_for_each_device(..., add_iommu_group) for the first SMMU
> in the system which probes, but we attempt to add_iommu_group() for all
> devices on the bus, even though the SMMU for that device may yet to have
> probed.
Yes, iommu_bus_init() is one of the places still holding a
deeply-ingrained assumption that the ops go live for all IOMMU instances
at once, which is what warranted the further replay in
of_iommu_configure() originally. Moving that out of
of_platform_device_create() to support probe deferral is where the
trouble really started.
Robin.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: [PATCH 13/15] drm/amdgpu/display: split dp connector registration (v3)
From: Daniel Vetter @ 2020-02-14 18:35 UTC (permalink / raw)
To: Alex Deucher
Cc: Alex Deucher, Maling list - DRI developers, amd-gfx list,
Daniel Vetter
In-Reply-To: <CADnq5_PW7QwUk6TdaWiY3i=udua1REkw0HDQZ3eBwk4Xg24OSg@mail.gmail.com>
On Fri, Feb 14, 2020 at 12:39:22PM -0500, Alex Deucher wrote:
> On Fri, Feb 14, 2020 at 2:39 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Fri, Feb 07, 2020 at 04:17:13PM -0500, Alex Deucher wrote:
> > > Split into init and register functions to avoid a segfault
> > > in some configs when the load/unload callbacks are removed.
> > >
> > > v2:
> > > - add back accidently dropped has_aux setting
> > > - set dev in late_register
> > >
> > > v3:
> > > - fix dp cec ordering
> >
> > Why did you move this back out of the late_register callback when going
> > from v2->v3? In i915 we register the cec stuff from ->late_register, like
>
> I got a bunch of complaints from the cec code when I had it switched
> the other way. They went away when I moved it back. I don't remember
> the exact messages off hand.
Would be interesting to learn want went wrong, just in case there's a core
bug here somewhere that prevents drivers from tdtr. But definitely no
reason to hold off this patch.
-Daniel
>
> Alex
>
> > anything else userspace visible. Maybe follow-up patch (the idea behind
> > removing the ->load callback is to close all the driver load races,
> > instead of only open("/dev/dri/0"), which is protected by
> > drm_global_mutex). On this:
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > Cheers, Daniel
> >
> > >
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 16 ++++++++++++++++
> > > drivers/gpu/drm/amd/amdgpu/atombios_dp.c | 10 ++--------
> > > .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 7 ++++++-
> > > 3 files changed, 24 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
> > > index ec1501e3a63a..f355d9a752d2 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
> > > @@ -1461,6 +1461,20 @@ static enum drm_mode_status amdgpu_connector_dp_mode_valid(struct drm_connector
> > > return MODE_OK;
> > > }
> > >
> > > +static int
> > > +amdgpu_connector_late_register(struct drm_connector *connector)
> > > +{
> > > + struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector);
> > > + int r = 0;
> > > +
> > > + if (amdgpu_connector->ddc_bus->has_aux) {
> > > + amdgpu_connector->ddc_bus->aux.dev = amdgpu_connector->base.kdev;
> > > + r = drm_dp_aux_register(&amdgpu_connector->ddc_bus->aux);
> > > + }
> > > +
> > > + return r;
> > > +}
> > > +
> > > static const struct drm_connector_helper_funcs amdgpu_connector_dp_helper_funcs = {
> > > .get_modes = amdgpu_connector_dp_get_modes,
> > > .mode_valid = amdgpu_connector_dp_mode_valid,
> > > @@ -1475,6 +1489,7 @@ static const struct drm_connector_funcs amdgpu_connector_dp_funcs = {
> > > .early_unregister = amdgpu_connector_unregister,
> > > .destroy = amdgpu_connector_destroy,
> > > .force = amdgpu_connector_dvi_force,
> > > + .late_register = amdgpu_connector_late_register,
> > > };
> > >
> > > static const struct drm_connector_funcs amdgpu_connector_edp_funcs = {
> > > @@ -1485,6 +1500,7 @@ static const struct drm_connector_funcs amdgpu_connector_edp_funcs = {
> > > .early_unregister = amdgpu_connector_unregister,
> > > .destroy = amdgpu_connector_destroy,
> > > .force = amdgpu_connector_dvi_force,
> > > + .late_register = amdgpu_connector_late_register,
> > > };
> > >
> > > void
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
> > > index ea702a64f807..9b74cfdba7b8 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
> > > @@ -186,16 +186,10 @@ amdgpu_atombios_dp_aux_transfer(struct drm_dp_aux *aux, struct drm_dp_aux_msg *m
> > >
> > > void amdgpu_atombios_dp_aux_init(struct amdgpu_connector *amdgpu_connector)
> > > {
> > > - int ret;
> > > -
> > > amdgpu_connector->ddc_bus->rec.hpd = amdgpu_connector->hpd.hpd;
> > > - amdgpu_connector->ddc_bus->aux.dev = amdgpu_connector->base.kdev;
> > > amdgpu_connector->ddc_bus->aux.transfer = amdgpu_atombios_dp_aux_transfer;
> > > - ret = drm_dp_aux_register(&amdgpu_connector->ddc_bus->aux);
> > > - if (!ret)
> > > - amdgpu_connector->ddc_bus->has_aux = true;
> > > -
> > > - WARN(ret, "drm_dp_aux_register_i2c_bus() failed with error %d\n", ret);
> > > + drm_dp_aux_init(&amdgpu_connector->ddc_bus->aux);
> > > + amdgpu_connector->ddc_bus->has_aux = true;
> > > }
> > >
> > > /***** general DP utility functions *****/
> > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > index 3959c942c88b..d5b9e72f2649 100644
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > @@ -155,6 +155,11 @@ amdgpu_dm_mst_connector_late_register(struct drm_connector *connector)
> > > struct amdgpu_dm_connector *amdgpu_dm_connector =
> > > to_amdgpu_dm_connector(connector);
> > > struct drm_dp_mst_port *port = amdgpu_dm_connector->port;
> > > + int r;
> > > +
> > > + r = drm_dp_aux_register(&amdgpu_dm_connector->dm_dp_aux.aux);
> > > + if (r)
> > > + return r;
> > >
> > > #if defined(CONFIG_DEBUG_FS)
> > > connector_debugfs_init(amdgpu_dm_connector);
> > > @@ -484,7 +489,7 @@ void amdgpu_dm_initialize_dp_connector(struct amdgpu_display_manager *dm,
> > > aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer;
> > > aconnector->dm_dp_aux.ddc_service = aconnector->dc_link->ddc;
> > >
> > > - drm_dp_aux_register(&aconnector->dm_dp_aux.aux);
> > > + drm_dp_aux_init(&aconnector->dm_dp_aux.aux);
> > > drm_dp_cec_register_connector(&aconnector->dm_dp_aux.aux,
> > > &aconnector->base);
> > >
> > > --
> > > 2.24.1
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply
* Re: [PATCH 13/15] drm/amdgpu/display: split dp connector registration (v3)
From: Daniel Vetter @ 2020-02-14 18:35 UTC (permalink / raw)
To: Alex Deucher; +Cc: Alex Deucher, Maling list - DRI developers, amd-gfx list
In-Reply-To: <CADnq5_PW7QwUk6TdaWiY3i=udua1REkw0HDQZ3eBwk4Xg24OSg@mail.gmail.com>
On Fri, Feb 14, 2020 at 12:39:22PM -0500, Alex Deucher wrote:
> On Fri, Feb 14, 2020 at 2:39 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Fri, Feb 07, 2020 at 04:17:13PM -0500, Alex Deucher wrote:
> > > Split into init and register functions to avoid a segfault
> > > in some configs when the load/unload callbacks are removed.
> > >
> > > v2:
> > > - add back accidently dropped has_aux setting
> > > - set dev in late_register
> > >
> > > v3:
> > > - fix dp cec ordering
> >
> > Why did you move this back out of the late_register callback when going
> > from v2->v3? In i915 we register the cec stuff from ->late_register, like
>
> I got a bunch of complaints from the cec code when I had it switched
> the other way. They went away when I moved it back. I don't remember
> the exact messages off hand.
Would be interesting to learn want went wrong, just in case there's a core
bug here somewhere that prevents drivers from tdtr. But definitely no
reason to hold off this patch.
-Daniel
>
> Alex
>
> > anything else userspace visible. Maybe follow-up patch (the idea behind
> > removing the ->load callback is to close all the driver load races,
> > instead of only open("/dev/dri/0"), which is protected by
> > drm_global_mutex). On this:
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> >
> > Cheers, Daniel
> >
> > >
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 16 ++++++++++++++++
> > > drivers/gpu/drm/amd/amdgpu/atombios_dp.c | 10 ++--------
> > > .../amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 7 ++++++-
> > > 3 files changed, 24 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
> > > index ec1501e3a63a..f355d9a752d2 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
> > > @@ -1461,6 +1461,20 @@ static enum drm_mode_status amdgpu_connector_dp_mode_valid(struct drm_connector
> > > return MODE_OK;
> > > }
> > >
> > > +static int
> > > +amdgpu_connector_late_register(struct drm_connector *connector)
> > > +{
> > > + struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector);
> > > + int r = 0;
> > > +
> > > + if (amdgpu_connector->ddc_bus->has_aux) {
> > > + amdgpu_connector->ddc_bus->aux.dev = amdgpu_connector->base.kdev;
> > > + r = drm_dp_aux_register(&amdgpu_connector->ddc_bus->aux);
> > > + }
> > > +
> > > + return r;
> > > +}
> > > +
> > > static const struct drm_connector_helper_funcs amdgpu_connector_dp_helper_funcs = {
> > > .get_modes = amdgpu_connector_dp_get_modes,
> > > .mode_valid = amdgpu_connector_dp_mode_valid,
> > > @@ -1475,6 +1489,7 @@ static const struct drm_connector_funcs amdgpu_connector_dp_funcs = {
> > > .early_unregister = amdgpu_connector_unregister,
> > > .destroy = amdgpu_connector_destroy,
> > > .force = amdgpu_connector_dvi_force,
> > > + .late_register = amdgpu_connector_late_register,
> > > };
> > >
> > > static const struct drm_connector_funcs amdgpu_connector_edp_funcs = {
> > > @@ -1485,6 +1500,7 @@ static const struct drm_connector_funcs amdgpu_connector_edp_funcs = {
> > > .early_unregister = amdgpu_connector_unregister,
> > > .destroy = amdgpu_connector_destroy,
> > > .force = amdgpu_connector_dvi_force,
> > > + .late_register = amdgpu_connector_late_register,
> > > };
> > >
> > > void
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
> > > index ea702a64f807..9b74cfdba7b8 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
> > > @@ -186,16 +186,10 @@ amdgpu_atombios_dp_aux_transfer(struct drm_dp_aux *aux, struct drm_dp_aux_msg *m
> > >
> > > void amdgpu_atombios_dp_aux_init(struct amdgpu_connector *amdgpu_connector)
> > > {
> > > - int ret;
> > > -
> > > amdgpu_connector->ddc_bus->rec.hpd = amdgpu_connector->hpd.hpd;
> > > - amdgpu_connector->ddc_bus->aux.dev = amdgpu_connector->base.kdev;
> > > amdgpu_connector->ddc_bus->aux.transfer = amdgpu_atombios_dp_aux_transfer;
> > > - ret = drm_dp_aux_register(&amdgpu_connector->ddc_bus->aux);
> > > - if (!ret)
> > > - amdgpu_connector->ddc_bus->has_aux = true;
> > > -
> > > - WARN(ret, "drm_dp_aux_register_i2c_bus() failed with error %d\n", ret);
> > > + drm_dp_aux_init(&amdgpu_connector->ddc_bus->aux);
> > > + amdgpu_connector->ddc_bus->has_aux = true;
> > > }
> > >
> > > /***** general DP utility functions *****/
> > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > index 3959c942c88b..d5b9e72f2649 100644
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> > > @@ -155,6 +155,11 @@ amdgpu_dm_mst_connector_late_register(struct drm_connector *connector)
> > > struct amdgpu_dm_connector *amdgpu_dm_connector =
> > > to_amdgpu_dm_connector(connector);
> > > struct drm_dp_mst_port *port = amdgpu_dm_connector->port;
> > > + int r;
> > > +
> > > + r = drm_dp_aux_register(&amdgpu_dm_connector->dm_dp_aux.aux);
> > > + if (r)
> > > + return r;
> > >
> > > #if defined(CONFIG_DEBUG_FS)
> > > connector_debugfs_init(amdgpu_dm_connector);
> > > @@ -484,7 +489,7 @@ void amdgpu_dm_initialize_dp_connector(struct amdgpu_display_manager *dm,
> > > aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer;
> > > aconnector->dm_dp_aux.ddc_service = aconnector->dc_link->ddc;
> > >
> > > - drm_dp_aux_register(&aconnector->dm_dp_aux.aux);
> > > + drm_dp_aux_init(&aconnector->dm_dp_aux.aux);
> > > drm_dp_cec_register_connector(&aconnector->dm_dp_aux.aux,
> > > &aconnector->base);
> > >
> > > --
> > > 2.24.1
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply
* Re: arm64 iommu groups issue
From: Robin Murphy @ 2020-02-14 18:35 UTC (permalink / raw)
To: John Garry, Marc Zyngier, Will Deacon, Lorenzo Pieralisi,
Sudeep Holla, Guohanjun (Hanjun Guo)
Cc: Saravana Kannan, linux-kernel@vger.kernel.org, Alex Williamson,
Linuxarm, iommu, Bjorn Helgaas,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <35fc8d13-b1c1-6a9e-4242-284da7f00764@huawei.com>
On 14/02/2020 2:09 pm, John Garry wrote:
>>>
>>> @@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct
>>> pci_bus *bus)
>>> /* Set up MSI IRQ domain */
>>> pci_set_msi_domain(dev);
>>>
>>> + parent = dev->dev.parent;
>>> + if (parent && parent->bus == &pci_bus_type)
>>> + device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
>>> +
>>> /* Notifier could use PCI capabilities */
>>> dev->match_driver = false;
>>> ret = device_add(&dev->dev);
>>> --
>>>
>>> This would work, but the problem is that if the port driver fails in
>>> probing - and not just for -EPROBE_DEFER - then the child device will
>>> never probe. This very thing happens on my dev board. However we could
>>> expand the device links API to cover this sort of scenario.
>>
>> Yes, that's an undesirable issue, but in fact I think it's mostly
>> indicative that involving drivers in something which is designed to
>> happen at a level below drivers is still fundamentally wrong and doomed
>> to be fragile at best.
>
> Right, and even worse is that it relies on the port driver even existing
> at all.
>
> All this iommu group assignment should be taken outside device driver
> probe paths.
>
> However we could still consider device links for sync'ing the SMMU and
> each device probing.
Yes, we should get that for DT now thanks to the of_devlink stuff, but
cooking up some equivalent for IORT might be worthwhile.
>> Another thought that crosses my mind is that when pci_device_group()
>> walks up to the point of ACS isolation and doesn't find an existing
>> group, it can still infer that everything it walked past *should* be put
>> in the same group it's then eventually going to return. Unfortunately I
>> can't see an obvious way for it to act on that knowledge, though, since
>> recursive iommu_probe_device() is unlikely to end well.
>
> I'd be inclined not to change that code.
>
>>
>>> As for alternatives, it looks pretty difficult to me to disassociate the
>>> group allocation from the dma_configure path.
>>
>> Indeed it's non-trivial, but it really does need cleaning up at some
>> point.
>>
>> Having just had yet another spark, does something like the untested
>> super-hack below work at all?
>
> I tried it and it doesn't (yet) work.
Bleh - further reinforcement of the "ideas after 6PM are bad ideas" rule...
> So when we try
> iommu_bus_replay()->add_iommu_group()->iommu_probe_device()->arm_smmu_add_device(),
>
> the iommu_fwspec is still NULL for that device - this is not set until
> later when the device driver is going to finally probe in
> iort_iommu_xlate()->iommu_fwspec_init(), and it's too late...
>
> And this looks to be the reason for which current
> iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also.
Of course, just adding a 'correct' add_device replay without the
of_xlate process doesn't help at all. No wonder this looked suspiciously
simpler than where the first idea left off...
(on reflection, the core of this idea seems to be recycling the existing
iommu_bus_init walk rather than building up a separate "waiting list",
while forgetting that that wasn't the difficult part of the original
idea anyway)
> On this current code mentioned, the principle of this seems wrong to me
> - we call bus_for_each_device(..., add_iommu_group) for the first SMMU
> in the system which probes, but we attempt to add_iommu_group() for all
> devices on the bus, even though the SMMU for that device may yet to have
> probed.
Yes, iommu_bus_init() is one of the places still holding a
deeply-ingrained assumption that the ops go live for all IOMMU instances
at once, which is what warranted the further replay in
of_iommu_configure() originally. Moving that out of
of_platform_device_create() to support probe deferral is where the
trouble really started.
Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
^ permalink raw reply
* Re: [PATCH AUTOSEL 5.5 219/542] ARM: OMAP2+: use separate IOMMU pdata to fix DRA7 IPU1 boot
From: Suman Anna @ 2020-02-14 18:34 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable
Cc: Tony Lindgren, linux-omap, linux-arm-kernel
In-Reply-To: <20200214154854.6746-219-sashal@kernel.org>
Hi Sasha,
On 2/14/20 9:43 AM, Sasha Levin wrote:
> From: Suman Anna <s-anna@ti.com>
>
> [ Upstream commit 4601832f40501efc3c2fd264a5a69bd1ac17d520 ]
>
> The IPU1 MMU has been using common IOMMU pdata quirks defined and
> used by all IPU IOMMU devices on OMAP4 and beyond. Separate out the
> pdata for IPU1 MMU with the additional .set_pwrdm_constraint ops
> plugged in, so that the IPU1 power domain can be restricted to ON
> state during the boot and active period of the IPU1 remote processor.
> This eliminates the pre-conditions for the IPU1 boot issue as
> described in commit afe518400bdb ("iommu/omap: fix boot issue on
> remoteprocs with AMMU/Unicache").
>
> NOTE:
> 1. RET is not a valid target power domain state on DRA7 platforms,
> and IPU power domain is normally programmed for OFF. The IPU1
> still fails to boot though, and an unclearable l3_noc error is
> thrown currently on 4.14 kernel without this fix. This behavior
> is slightly different from previous 4.9 LTS kernel.
> 2. The fix is currently applied only to IPU1 on DRA7xx SoC, as the
> other affected processors on OMAP4/OMAP5/DRA7 are in domains
> that are not entering RET. IPU2 on DRA7 is in CORE power domain
> which is only programmed for ON power state. The fix can be easily
> scaled if these domains do hit RET in the future.
> 3. The issue was not seen on current DRA7 platforms if any of the
> DSP remote processors were booted and using one of the GPTimers
> 5, 6, 7 or 8 on previous 4.9 LTS kernel. This was due to the
> errata fix for i874 implemented in commit 1cbabcb9807e ("ARM:
> DRA7: clockdomain: Implement timer workaround for errata i874")
> which keeps the IPU1 power domain from entering RET when the
> timers are active. But the timer workaround did not make any
> difference on 4.14 kernel, and an l3_noc error was seen still
> without this fix.
>
> Signed-off-by: Suman Anna <s-anna@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
And drop this one as well, since mainline doesn't yet boot
the processors, so this is not needed for stable queue.
regards
Suman
> ---
> arch/arm/mach-omap2/pdata-quirks.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c
> index 7a79bcc02a11b..c3be1db9685cd 100644
> --- a/arch/arm/mach-omap2/pdata-quirks.c
> +++ b/arch/arm/mach-omap2/pdata-quirks.c
> @@ -43,6 +43,17 @@ struct pdata_init {
> static struct of_dev_auxdata omap_auxdata_lookup[];
> static struct twl4030_gpio_platform_data twl_gpio_auxdata;
>
> +#if IS_ENABLED(CONFIG_OMAP_IOMMU)
> +int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> + u8 *pwrst);
> +#else
> +static inline int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev,
> + bool request, u8 *pwrst)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_MACH_NOKIA_N8X0
> static void __init omap2420_n8x0_legacy_init(void)
> {
> @@ -286,6 +297,10 @@ static void __init omap5_uevm_legacy_init(void)
> #endif
>
> #ifdef CONFIG_SOC_DRA7XX
> +static struct iommu_platform_data dra7_ipu1_dsp_iommu_pdata = {
> + .set_pwrdm_constraint = omap_iommu_set_pwrdm_constraint,
> +};
> +
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc1;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc2;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc3;
> @@ -517,6 +532,12 @@ static struct of_dev_auxdata omap_auxdata_lookup[] = {
> &dra7_hsmmc_data_mmc2),
> OF_DEV_AUXDATA("ti,dra7-hsmmc", 0x480ad000, "480ad000.mmc",
> &dra7_hsmmc_data_mmc3),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x40d01000, "40d01000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x41501000, "41501000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-iommu", 0x58882000, "58882000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> #endif
> /* Common auxdata */
> OF_DEV_AUXDATA("ti,sysc", 0, NULL, &ti_sysc_pdata),
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: arm64 iommu groups issue
From: Robin Murphy @ 2020-02-14 18:35 UTC (permalink / raw)
To: John Garry, Marc Zyngier, Will Deacon, Lorenzo Pieralisi,
Sudeep Holla, Guohanjun (Hanjun Guo)
Cc: iommu, linux-arm-kernel@lists.infradead.org, Linuxarm,
Shameerali Kolothum Thodi, Alex Williamson, Bjorn Helgaas,
linux-kernel@vger.kernel.org, Saravana Kannan
In-Reply-To: <35fc8d13-b1c1-6a9e-4242-284da7f00764@huawei.com>
On 14/02/2020 2:09 pm, John Garry wrote:
>>>
>>> @@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct
>>> pci_bus *bus)
>>> /* Set up MSI IRQ domain */
>>> pci_set_msi_domain(dev);
>>>
>>> + parent = dev->dev.parent;
>>> + if (parent && parent->bus == &pci_bus_type)
>>> + device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
>>> +
>>> /* Notifier could use PCI capabilities */
>>> dev->match_driver = false;
>>> ret = device_add(&dev->dev);
>>> --
>>>
>>> This would work, but the problem is that if the port driver fails in
>>> probing - and not just for -EPROBE_DEFER - then the child device will
>>> never probe. This very thing happens on my dev board. However we could
>>> expand the device links API to cover this sort of scenario.
>>
>> Yes, that's an undesirable issue, but in fact I think it's mostly
>> indicative that involving drivers in something which is designed to
>> happen at a level below drivers is still fundamentally wrong and doomed
>> to be fragile at best.
>
> Right, and even worse is that it relies on the port driver even existing
> at all.
>
> All this iommu group assignment should be taken outside device driver
> probe paths.
>
> However we could still consider device links for sync'ing the SMMU and
> each device probing.
Yes, we should get that for DT now thanks to the of_devlink stuff, but
cooking up some equivalent for IORT might be worthwhile.
>> Another thought that crosses my mind is that when pci_device_group()
>> walks up to the point of ACS isolation and doesn't find an existing
>> group, it can still infer that everything it walked past *should* be put
>> in the same group it's then eventually going to return. Unfortunately I
>> can't see an obvious way for it to act on that knowledge, though, since
>> recursive iommu_probe_device() is unlikely to end well.
>
> I'd be inclined not to change that code.
>
>>
>>> As for alternatives, it looks pretty difficult to me to disassociate the
>>> group allocation from the dma_configure path.
>>
>> Indeed it's non-trivial, but it really does need cleaning up at some
>> point.
>>
>> Having just had yet another spark, does something like the untested
>> super-hack below work at all?
>
> I tried it and it doesn't (yet) work.
Bleh - further reinforcement of the "ideas after 6PM are bad ideas" rule...
> So when we try
> iommu_bus_replay()->add_iommu_group()->iommu_probe_device()->arm_smmu_add_device(),
>
> the iommu_fwspec is still NULL for that device - this is not set until
> later when the device driver is going to finally probe in
> iort_iommu_xlate()->iommu_fwspec_init(), and it's too late...
>
> And this looks to be the reason for which current
> iommu_bus_init()->bus_for_each_device(..., add_iommu_group) fails also.
Of course, just adding a 'correct' add_device replay without the
of_xlate process doesn't help at all. No wonder this looked suspiciously
simpler than where the first idea left off...
(on reflection, the core of this idea seems to be recycling the existing
iommu_bus_init walk rather than building up a separate "waiting list",
while forgetting that that wasn't the difficult part of the original
idea anyway)
> On this current code mentioned, the principle of this seems wrong to me
> - we call bus_for_each_device(..., add_iommu_group) for the first SMMU
> in the system which probes, but we attempt to add_iommu_group() for all
> devices on the bus, even though the SMMU for that device may yet to have
> probed.
Yes, iommu_bus_init() is one of the places still holding a
deeply-ingrained assumption that the ops go live for all IOMMU instances
at once, which is what warranted the further replay in
of_iommu_configure() originally. Moving that out of
of_platform_device_create() to support probe deferral is where the
trouble really started.
Robin.
^ permalink raw reply
* Re: [PATCH AUTOSEL 5.5 219/542] ARM: OMAP2+: use separate IOMMU pdata to fix DRA7 IPU1 boot
From: Suman Anna @ 2020-02-14 18:34 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable
Cc: Tony Lindgren, linux-omap, linux-arm-kernel
In-Reply-To: <20200214154854.6746-219-sashal@kernel.org>
Hi Sasha,
On 2/14/20 9:43 AM, Sasha Levin wrote:
> From: Suman Anna <s-anna@ti.com>
>
> [ Upstream commit 4601832f40501efc3c2fd264a5a69bd1ac17d520 ]
>
> The IPU1 MMU has been using common IOMMU pdata quirks defined and
> used by all IPU IOMMU devices on OMAP4 and beyond. Separate out the
> pdata for IPU1 MMU with the additional .set_pwrdm_constraint ops
> plugged in, so that the IPU1 power domain can be restricted to ON
> state during the boot and active period of the IPU1 remote processor.
> This eliminates the pre-conditions for the IPU1 boot issue as
> described in commit afe518400bdb ("iommu/omap: fix boot issue on
> remoteprocs with AMMU/Unicache").
>
> NOTE:
> 1. RET is not a valid target power domain state on DRA7 platforms,
> and IPU power domain is normally programmed for OFF. The IPU1
> still fails to boot though, and an unclearable l3_noc error is
> thrown currently on 4.14 kernel without this fix. This behavior
> is slightly different from previous 4.9 LTS kernel.
> 2. The fix is currently applied only to IPU1 on DRA7xx SoC, as the
> other affected processors on OMAP4/OMAP5/DRA7 are in domains
> that are not entering RET. IPU2 on DRA7 is in CORE power domain
> which is only programmed for ON power state. The fix can be easily
> scaled if these domains do hit RET in the future.
> 3. The issue was not seen on current DRA7 platforms if any of the
> DSP remote processors were booted and using one of the GPTimers
> 5, 6, 7 or 8 on previous 4.9 LTS kernel. This was due to the
> errata fix for i874 implemented in commit 1cbabcb9807e ("ARM:
> DRA7: clockdomain: Implement timer workaround for errata i874")
> which keeps the IPU1 power domain from entering RET when the
> timers are active. But the timer workaround did not make any
> difference on 4.14 kernel, and an l3_noc error was seen still
> without this fix.
>
> Signed-off-by: Suman Anna <s-anna@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
And drop this one as well, since mainline doesn't yet boot
the processors, so this is not needed for stable queue.
regards
Suman
> ---
> arch/arm/mach-omap2/pdata-quirks.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c
> index 7a79bcc02a11b..c3be1db9685cd 100644
> --- a/arch/arm/mach-omap2/pdata-quirks.c
> +++ b/arch/arm/mach-omap2/pdata-quirks.c
> @@ -43,6 +43,17 @@ struct pdata_init {
> static struct of_dev_auxdata omap_auxdata_lookup[];
> static struct twl4030_gpio_platform_data twl_gpio_auxdata;
>
> +#if IS_ENABLED(CONFIG_OMAP_IOMMU)
> +int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> + u8 *pwrst);
> +#else
> +static inline int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev,
> + bool request, u8 *pwrst)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_MACH_NOKIA_N8X0
> static void __init omap2420_n8x0_legacy_init(void)
> {
> @@ -286,6 +297,10 @@ static void __init omap5_uevm_legacy_init(void)
> #endif
>
> #ifdef CONFIG_SOC_DRA7XX
> +static struct iommu_platform_data dra7_ipu1_dsp_iommu_pdata = {
> + .set_pwrdm_constraint = omap_iommu_set_pwrdm_constraint,
> +};
> +
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc1;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc2;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc3;
> @@ -517,6 +532,12 @@ static struct of_dev_auxdata omap_auxdata_lookup[] = {
> &dra7_hsmmc_data_mmc2),
> OF_DEV_AUXDATA("ti,dra7-hsmmc", 0x480ad000, "480ad000.mmc",
> &dra7_hsmmc_data_mmc3),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x40d01000, "40d01000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x41501000, "41501000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-iommu", 0x58882000, "58882000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> #endif
> /* Common auxdata */
> OF_DEV_AUXDATA("ti,sysc", 0, NULL, &ti_sysc_pdata),
>
^ permalink raw reply
* Re: [PATCH AUTOSEL 5.5 218/542] ARM: OMAP2+: Add workaround for DRA7 DSP MStandby errata i879
From: Suman Anna @ 2020-02-14 18:34 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable
Cc: Tony Lindgren, linux-omap, linux-arm-kernel
In-Reply-To: <20200214154854.6746-218-sashal@kernel.org>
Hi Sasha,
On 2/14/20 9:43 AM, Sasha Levin wrote:
> From: Suman Anna <s-anna@ti.com>
>
> [ Upstream commit 2f14101a1d760db72393910d481fbf7768c44530 ]
>
> Errata Title:
> i879: DSP MStandby requires CD_EMU in SW_WKUP
>
> Description:
> The DSP requires the internal emulation clock to be actively toggling
> in order to successfully enter a low power mode via execution of the
> IDLE instruction and PRCM MStandby/Idle handshake. This assumes that
> other prerequisites and software sequence are followed.
>
> Workaround:
> The emulation clock to the DSP is free-running anytime CCS is connected
> via JTAG debugger to the DSP subsystem or when the CD_EMU clock domain
> is set in SW_WKUP mode. The CD_EMU domain can be set in SW_WKUP mode
> via the CM_EMU_CLKSTCTRL [1:0]CLKTRCTRL field.
>
> Implementation:
> This patch implements this workaround by denying the HW_AUTO mode
> for the EMU clockdomain during the power-up of any DSP processor
> and re-enabling the HW_AUTO mode during the shutdown of the last
> DSP processor (actually done during the enabling and disabling of
> the respective DSP MDMA MMUs). Reference counting has to be used to
> manage the independent sequencing between the multiple DSP processors.
>
> This switching is done at runtime rather than a static clockdomain
> flags value to meet the target power domain state for the EMU power
> domain during suspend.
>
> Note that the DSP MStandby behavior is not consistent across all
> boards prior to this fix. Please see commit 45f871eec6c0 ("ARM:
> OMAP2+: Extend DRA7 IPU1 MMU pdata quirks to DSP MDMA MMUs") for
> details.
>
> Signed-off-by: Suman Anna <s-anna@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
You can drop this from the 5.5-stable queue. Mainline doesn't yet boot
the processors, so this is not needed for stable queue.
regards
Suman
> ---
> arch/arm/mach-omap2/omap-iommu.c | 43 +++++++++++++++++++++++++++++---
> 1 file changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/omap-iommu.c b/arch/arm/mach-omap2/omap-iommu.c
> index f1a6ece8108e4..78247e6f4a720 100644
> --- a/arch/arm/mach-omap2/omap-iommu.c
> +++ b/arch/arm/mach-omap2/omap-iommu.c
> @@ -11,14 +11,43 @@
>
> #include "omap_hwmod.h"
> #include "omap_device.h"
> +#include "clockdomain.h"
> #include "powerdomain.h"
>
> +static void omap_iommu_dra7_emu_swsup_config(struct platform_device *pdev,
> + bool enable)
> +{
> + static struct clockdomain *emu_clkdm;
> + static DEFINE_SPINLOCK(emu_lock);
> + static atomic_t count;
> + struct device_node *np = pdev->dev.of_node;
> +
> + if (!of_device_is_compatible(np, "ti,dra7-dsp-iommu"))
> + return;
> +
> + if (!emu_clkdm) {
> + emu_clkdm = clkdm_lookup("emu_clkdm");
> + if (WARN_ON_ONCE(!emu_clkdm))
> + return;
> + }
> +
> + spin_lock(&emu_lock);
> +
> + if (enable && (atomic_inc_return(&count) == 1))
> + clkdm_deny_idle(emu_clkdm);
> + else if (!enable && (atomic_dec_return(&count) == 0))
> + clkdm_allow_idle(emu_clkdm);
> +
> + spin_unlock(&emu_lock);
> +}
> +
> int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> u8 *pwrst)
> {
> struct powerdomain *pwrdm;
> struct omap_device *od;
> u8 next_pwrst;
> + int ret = 0;
>
> od = to_omap_device(pdev);
> if (!od)
> @@ -31,13 +60,21 @@ int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> if (!pwrdm)
> return -EINVAL;
>
> - if (request)
> + if (request) {
> *pwrst = pwrdm_read_next_pwrst(pwrdm);
> + omap_iommu_dra7_emu_swsup_config(pdev, true);
> + }
>
> if (*pwrst > PWRDM_POWER_RET)
> - return 0;
> + goto out;
>
> next_pwrst = request ? PWRDM_POWER_ON : *pwrst;
>
> - return pwrdm_set_next_pwrst(pwrdm, next_pwrst);
> + ret = pwrdm_set_next_pwrst(pwrdm, next_pwrst);
> +
> +out:
> + if (!request)
> + omap_iommu_dra7_emu_swsup_config(pdev, false);
> +
> + return ret;
> }
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: [PATCH AUTOSEL 5.5 218/542] ARM: OMAP2+: Add workaround for DRA7 DSP MStandby errata i879
From: Suman Anna @ 2020-02-14 18:34 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable
Cc: Tony Lindgren, linux-omap, linux-arm-kernel
In-Reply-To: <20200214154854.6746-218-sashal@kernel.org>
Hi Sasha,
On 2/14/20 9:43 AM, Sasha Levin wrote:
> From: Suman Anna <s-anna@ti.com>
>
> [ Upstream commit 2f14101a1d760db72393910d481fbf7768c44530 ]
>
> Errata Title:
> i879: DSP MStandby requires CD_EMU in SW_WKUP
>
> Description:
> The DSP requires the internal emulation clock to be actively toggling
> in order to successfully enter a low power mode via execution of the
> IDLE instruction and PRCM MStandby/Idle handshake. This assumes that
> other prerequisites and software sequence are followed.
>
> Workaround:
> The emulation clock to the DSP is free-running anytime CCS is connected
> via JTAG debugger to the DSP subsystem or when the CD_EMU clock domain
> is set in SW_WKUP mode. The CD_EMU domain can be set in SW_WKUP mode
> via the CM_EMU_CLKSTCTRL [1:0]CLKTRCTRL field.
>
> Implementation:
> This patch implements this workaround by denying the HW_AUTO mode
> for the EMU clockdomain during the power-up of any DSP processor
> and re-enabling the HW_AUTO mode during the shutdown of the last
> DSP processor (actually done during the enabling and disabling of
> the respective DSP MDMA MMUs). Reference counting has to be used to
> manage the independent sequencing between the multiple DSP processors.
>
> This switching is done at runtime rather than a static clockdomain
> flags value to meet the target power domain state for the EMU power
> domain during suspend.
>
> Note that the DSP MStandby behavior is not consistent across all
> boards prior to this fix. Please see commit 45f871eec6c0 ("ARM:
> OMAP2+: Extend DRA7 IPU1 MMU pdata quirks to DSP MDMA MMUs") for
> details.
>
> Signed-off-by: Suman Anna <s-anna@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
You can drop this from the 5.5-stable queue. Mainline doesn't yet boot
the processors, so this is not needed for stable queue.
regards
Suman
> ---
> arch/arm/mach-omap2/omap-iommu.c | 43 +++++++++++++++++++++++++++++---
> 1 file changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/omap-iommu.c b/arch/arm/mach-omap2/omap-iommu.c
> index f1a6ece8108e4..78247e6f4a720 100644
> --- a/arch/arm/mach-omap2/omap-iommu.c
> +++ b/arch/arm/mach-omap2/omap-iommu.c
> @@ -11,14 +11,43 @@
>
> #include "omap_hwmod.h"
> #include "omap_device.h"
> +#include "clockdomain.h"
> #include "powerdomain.h"
>
> +static void omap_iommu_dra7_emu_swsup_config(struct platform_device *pdev,
> + bool enable)
> +{
> + static struct clockdomain *emu_clkdm;
> + static DEFINE_SPINLOCK(emu_lock);
> + static atomic_t count;
> + struct device_node *np = pdev->dev.of_node;
> +
> + if (!of_device_is_compatible(np, "ti,dra7-dsp-iommu"))
> + return;
> +
> + if (!emu_clkdm) {
> + emu_clkdm = clkdm_lookup("emu_clkdm");
> + if (WARN_ON_ONCE(!emu_clkdm))
> + return;
> + }
> +
> + spin_lock(&emu_lock);
> +
> + if (enable && (atomic_inc_return(&count) == 1))
> + clkdm_deny_idle(emu_clkdm);
> + else if (!enable && (atomic_dec_return(&count) == 0))
> + clkdm_allow_idle(emu_clkdm);
> +
> + spin_unlock(&emu_lock);
> +}
> +
> int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> u8 *pwrst)
> {
> struct powerdomain *pwrdm;
> struct omap_device *od;
> u8 next_pwrst;
> + int ret = 0;
>
> od = to_omap_device(pdev);
> if (!od)
> @@ -31,13 +60,21 @@ int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> if (!pwrdm)
> return -EINVAL;
>
> - if (request)
> + if (request) {
> *pwrst = pwrdm_read_next_pwrst(pwrdm);
> + omap_iommu_dra7_emu_swsup_config(pdev, true);
> + }
>
> if (*pwrst > PWRDM_POWER_RET)
> - return 0;
> + goto out;
>
> next_pwrst = request ? PWRDM_POWER_ON : *pwrst;
>
> - return pwrdm_set_next_pwrst(pwrdm, next_pwrst);
> + ret = pwrdm_set_next_pwrst(pwrdm, next_pwrst);
> +
> +out:
> + if (!request)
> + omap_iommu_dra7_emu_swsup_config(pdev, false);
> +
> + return ret;
> }
>
^ permalink raw reply
* [RFC PATCH 0/1] Change newlib configuration to require libgloss
From: Mark Hatle @ 2020-02-14 18:34 UTC (permalink / raw)
To: openembedded-core, raj.khem
This is an RFC, because I only know of my particular use-case. So I'm not
sure how generic of a change this really is.
In my own use-case, we want to build a generic newlib and then customize
our equivalent to libgloss when linking baremetal applications.
I found that if we don't pass --disable-newlib-supplied-syscalls, even if
libgloss was linked in, the syscalls -always- came from newlib (on 32-bit
arm) and would not permit my implementations to override the built-in
newlib versions.
By setting the disable in BOTH newlib an libgloss, newlib syscalls are
disabled, while libgloss syscalls are enabled. This results in a newlib
that will now require to be linked against a 'BSP' implementation, such
as libgloss, in order to build baremetal applications.
Alternatively to this I'd suggest we add pkgconfigs (or even distro
flags) to be able to change this behavior as well as flip some other
newlib switches if desired.
Mark Hatle (1):
newlib: Move syscalls from newlib to libgloss
meta/recipes-core/newlib/newlib.inc | 1 +
1 file changed, 1 insertion(+)
--
2.17.1
^ permalink raw reply
* [RFC PATCH 1/1] newlib: Move syscalls from newlib to libgloss
From: Mark Hatle @ 2020-02-14 18:34 UTC (permalink / raw)
To: openembedded-core, raj.khem
In-Reply-To: <20200214183414.141058-1-mark.hatle@kernel.crashing.org>
By passing --disabled-newlib-supplied-syscalls, newlib will disable the
generation of builtin syscalls and move this to libgloss. (This also
affects the generation of crt0.o.)
libgloss SHOULD then provide the syscalls, crt0.o and other functions that
are no longer part of newlib itself. This now means that you must link
with both newlib and libgloss, whereas before newlib would run in many
configurations by itself.
Signed-off-by: Mark Hatle <mark.hatle@kernel.crashing.org>
---
meta/recipes-core/newlib/newlib.inc | 1 +
1 file changed, 1 insertion(+)
diff --git a/meta/recipes-core/newlib/newlib.inc b/meta/recipes-core/newlib/newlib.inc
index d7ac8bff17..5edea8aba1 100644
--- a/meta/recipes-core/newlib/newlib.inc
+++ b/meta/recipes-core/newlib/newlib.inc
@@ -42,6 +42,7 @@ EXTRA_OECONF = " \
--with-gnu-as \
--with-gnu-ld \
--disable-multilib \
+ --disable-newlib-supplied-syscalls \
"
do_configure[cleandirs] = "${B}"
--
2.17.1
^ permalink raw reply related
* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
From: Daniel Vetter @ 2020-02-14 18:34 UTC (permalink / raw)
To: Kenny Ho
Cc: Daniel Vetter, Greathouse, Joseph, Kenny Ho, Kuehling, Felix,
jsparks, amd-gfx mailing list, lkaplan, Alex Deucher, nirmoy.das,
Maling list - DRI developers, Jason Ekstrand, Tejun Heo, cgroups,
juan.zuniga-anaya, Christian König, damon.mcdougall
In-Reply-To: <CAOWid-f62Uv=GZXX2V2BsQGM5A1JJG_qmyrOwd=KwZBx_sr-bg@mail.gmail.com>
On Fri, Feb 14, 2020 at 12:08:35PM -0500, Kenny Ho wrote:
> Hi Jason,
>
> Thanks for the review.
>
> On Fri, Feb 14, 2020 at 11:44 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >
> > Pardon my ignorance but I'm a bit confused by this. What is a "logical GPU"? What are we subdividing? Are we carving up memory? Compute power? Both?
>
> The intention is compute but it is up to the individual drm driver to decide.
I think guidance from Tejun in previos discussions was pretty clear that
he expects cgroups to be both a) standardized and c) sufficient clear
meaning that end-users have a clear understanding of what happens when
they change the resource allocation.
I'm not sure lgpu here, at least as specified, passes either. But I also
don't have much clue, so pulled Jason in - he understands how this all
gets reflected to userspace apis a lot better than me.
-Daniel
>
> > If it's carving up compute power, what's actually being carved up? Time? Execution units/waves/threads? Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set. Does affinity matter that much? Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants.
> >
> > Don't get me wrong here. I'm all for the notion of being able to use cgroups to carve up GPU compute resources. However, this sounds to me like the most AMD-specific solution possible. We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.
>
> This has been discussed in the RFC before
> (https://www.spinics.net/lists/cgroups/msg23469.html.) As mentioned
> before, the idea of a compute unit is hardly an AMD specific thing as
> it is in the OpenCL standard and part of the architecture of many
> different vendors. In addition, the interface presented here supports
> Intel's use case. What you described is what I considered as the
> "anonymous resources" view of the lgpu. What you/Intel can do, is to
> register your device to drmcg to have 100 lgpu and users can specify
> simply by count. So if they want to allocate 5% for a cgroup, they
> would set count=5. Per the documentation in this patch: "Some DRM
> devices may only support lgpu as anonymous resources. In such case,
> the significance of the position of the set bits in list will be
> ignored." What Intel does with the user expressed configuration of "5
> out of 100" is entirely up to Intel (time slice if you like, change to
> specific EUs later if you like, or make it driver configurable to
> support both if you like.)
>
> Regards,
> Kenny
>
> >
> > On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:
> >>
> >> drm.lgpu
> >> A read-write nested-keyed file which exists on all cgroups.
> >> Each entry is keyed by the DRM device's major:minor.
> >>
> >> lgpu stands for logical GPU, it is an abstraction used to
> >> subdivide a physical DRM device for the purpose of resource
> >> management. This file stores user configuration while the
> >> drm.lgpu.effective reflects the actual allocation after
> >> considering the relationship between the cgroups and their
> >> configurations.
> >>
> >> The lgpu is a discrete quantity that is device specific (i.e.
> >> some DRM devices may have 64 lgpus while others may have 100
> >> lgpus.) The lgpu is a single quantity that can be allocated
> >> in three different ways denoted by the following nested keys.
> >>
> >> ===== ==============================================
> >> weight Allocate by proportion in relationship with
> >> active sibling cgroups
> >> count Allocate by amount statically, treat lgpu as
> >> anonymous resources
> >> list Allocate statically, treat lgpu as named
> >> resource
> >> ===== ==============================================
> >>
> >> For example:
> >> 226:0 weight=100 count=256 list=0-255
> >> 226:1 weight=100 count=4 list=0,2,4,6
> >> 226:2 weight=100 count=32 list=32-63
> >> 226:3 weight=100 count=0 list=
> >> 226:4 weight=500 count=0 list=
> >>
> >> lgpu is represented by a bitmap and uses the bitmap_parselist
> >> kernel function so the list key input format is a
> >> comma-separated list of decimal numbers and ranges.
> >>
> >> Consecutively set bits are shown as two hyphen-separated decimal
> >> numbers, the smallest and largest bit numbers set in the range.
> >> Optionally each range can be postfixed to denote that only parts
> >> of it should be set. The range will divided to groups of
> >> specific size.
> >> Syntax: range:used_size/group_size
> >> Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >>
> >> The count key is the hamming weight / hweight of the bitmap.
> >>
> >> Weight, count and list accept the max and default keywords.
> >>
> >> Some DRM devices may only support lgpu as anonymous resources.
> >> In such case, the significance of the position of the set bits
> >> in list will be ignored.
> >>
> >> The weight quantity is only in effect when static allocation
> >> is not used (by setting count=0) for this cgroup. The weight
> >> quantity distributes lgpus that are not statically allocated by
> >> the siblings. For example, given siblings cgroupA, cgroupB and
> >> cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> 0-63, no lgpu is available to be distributed by weight.
> >> Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> cgroupC will be starved if it tries to allocate by weight.
> >>
> >> On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> lgpus are available to be distributed evenly between cgroupA
> >> and cgroupC. In drm.lgpu.effective, cgroupA will have
> >> list=0-15 and cgroupC will have list=48-63.
> >>
> >> This lgpu resource supports the 'allocation' and 'weight'
> >> resource distribution model.
> >>
> >> drm.lgpu.effective
> >> A read-only nested-keyed file which exists on all cgroups.
> >> Each entry is keyed by the DRM device's major:minor.
> >>
> >> lgpu stands for logical GPU, it is an abstraction used to
> >> subdivide a physical DRM device for the purpose of resource
> >> management. This file reflects the actual allocation after
> >> considering the relationship between the cgroups and their
> >> configurations in drm.lgpu.
> >>
> >> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
> >> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> >> ---
> >> Documentation/admin-guide/cgroup-v2.rst | 80 ++++++
> >> include/drm/drm_cgroup.h | 3 +
> >> include/linux/cgroup_drm.h | 22 ++
> >> kernel/cgroup/drm.c | 324 +++++++++++++++++++++++-
> >> 4 files changed, 427 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >> index ce5dc027366a..d8a41956e5c7 100644
> >> --- a/Documentation/admin-guide/cgroup-v2.rst
> >> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >> @@ -2120,6 +2120,86 @@ DRM Interface Files
> >> Set largest allocation for /dev/dri/card1 to 4MB
> >> echo "226:1 4m" > drm.buffer.peak.max
> >>
> >> + drm.lgpu
> >> + A read-write nested-keyed file which exists on all cgroups.
> >> + Each entry is keyed by the DRM device's major:minor.
> >> +
> >> + lgpu stands for logical GPU, it is an abstraction used to
> >> + subdivide a physical DRM device for the purpose of resource
> >> + management. This file stores user configuration while the
> >> + drm.lgpu.effective reflects the actual allocation after
> >> + considering the relationship between the cgroups and their
> >> + configurations.
> >> +
> >> + The lgpu is a discrete quantity that is device specific (i.e.
> >> + some DRM devices may have 64 lgpus while others may have 100
> >> + lgpus.) The lgpu is a single quantity that can be allocated
> >> + in three different ways denoted by the following nested keys.
> >> +
> >> + ===== ==============================================
> >> + weight Allocate by proportion in relationship with
> >> + active sibling cgroups
> >> + count Allocate by amount statically, treat lgpu as
> >> + anonymous resources
> >> + list Allocate statically, treat lgpu as named
> >> + resource
> >> + ===== ==============================================
> >> +
> >> + For example:
> >> + 226:0 weight=100 count=256 list=0-255
> >> + 226:1 weight=100 count=4 list=0,2,4,6
> >> + 226:2 weight=100 count=32 list=32-63
> >> + 226:3 weight=100 count=0 list=
> >> + 226:4 weight=500 count=0 list=
> >> +
> >> + lgpu is represented by a bitmap and uses the bitmap_parselist
> >> + kernel function so the list key input format is a
> >> + comma-separated list of decimal numbers and ranges.
> >> +
> >> + Consecutively set bits are shown as two hyphen-separated decimal
> >> + numbers, the smallest and largest bit numbers set in the range.
> >> + Optionally each range can be postfixed to denote that only parts
> >> + of it should be set. The range will divided to groups of
> >> + specific size.
> >> + Syntax: range:used_size/group_size
> >> + Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >> +
> >> + The count key is the hamming weight / hweight of the bitmap.
> >> +
> >> + Weight, count and list accept the max and default keywords.
> >> +
> >> + Some DRM devices may only support lgpu as anonymous resources.
> >> + In such case, the significance of the position of the set bits
> >> + in list will be ignored.
> >> +
> >> + The weight quantity is only in effect when static allocation
> >> + is not used (by setting count=0) for this cgroup. The weight
> >> + quantity distributes lgpus that are not statically allocated by
> >> + the siblings. For example, given siblings cgroupA, cgroupB and
> >> + cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> + 0-63, no lgpu is available to be distributed by weight.
> >> + Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> + cgroupC will be starved if it tries to allocate by weight.
> >> +
> >> + On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> + has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> + lgpus are available to be distributed evenly between cgroupA
> >> + and cgroupC. In drm.lgpu.effective, cgroupA will have
> >> + list=0-15 and cgroupC will have list=48-63.
> >> +
> >> + This lgpu resource supports the 'allocation' and 'weight'
> >> + resource distribution model.
> >> +
> >> + drm.lgpu.effective
> >> + A read-only nested-keyed file which exists on all cgroups.
> >> + Each entry is keyed by the DRM device's major:minor.
> >> +
> >> + lgpu stands for logical GPU, it is an abstraction used to
> >> + subdivide a physical DRM device for the purpose of resource
> >> + management. This file reflects the actual allocation after
> >> + considering the relationship between the cgroups and their
> >> + configurations in drm.lgpu.
> >> +
> >> GEM Buffer Ownership
> >> ~~~~~~~~~~~~~~~~~~~~
> >>
> >> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> >> index 2b41d4d22e33..619a110cc748 100644
> >> --- a/include/drm/drm_cgroup.h
> >> +++ b/include/drm/drm_cgroup.h
> >> @@ -17,6 +17,9 @@ struct drmcg_props {
> >>
> >> s64 bo_limits_total_allocated_default;
> >> s64 bo_limits_peak_allocated_default;
> >> +
> >> + int lgpu_capacity;
> >> + DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> };
> >>
> >> void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
> >> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> >> index eae400f3d9b4..bb09704e7f71 100644
> >> --- a/include/linux/cgroup_drm.h
> >> +++ b/include/linux/cgroup_drm.h
> >> @@ -11,10 +11,14 @@
> >> /* limit defined per the way drm_minor_alloc operates */
> >> #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >>
> >> +#define MAX_DRMCG_LGPU_CAPACITY 256
> >> +
> >> enum drmcg_res_type {
> >> DRMCG_TYPE_BO_TOTAL,
> >> DRMCG_TYPE_BO_PEAK,
> >> DRMCG_TYPE_BO_COUNT,
> >> + DRMCG_TYPE_LGPU,
> >> + DRMCG_TYPE_LGPU_EFF,
> >> __DRMCG_TYPE_LAST,
> >> };
> >>
> >> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
> >> s64 bo_limits_peak_allocated;
> >>
> >> s64 bo_stats_count_allocated;
> >> +
> >> + /**
> >> + * Logical GPU
> >> + *
> >> + * *_cfg are properties configured by users
> >> + * *_eff are the effective properties being applied to the hardware
> >> + * *_stg is used to calculate _eff before applying to _eff
> >> + * after considering the entire hierarchy
> >> + */
> >> + DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
> >> + /* user configurations */
> >> + s64 lgpu_weight_cfg;
> >> + DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
> >> + /* effective lgpu for the cgroup after considering
> >> + * relationship with other cgroup
> >> + */
> >> + s64 lgpu_count_eff;
> >> + DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
> >> };
> >>
> >> /**
> >> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> >> index 5fcbbc13fa1c..a4e88a3704bb 100644
> >> --- a/kernel/cgroup/drm.c
> >> +++ b/kernel/cgroup/drm.c
> >> @@ -9,6 +9,7 @@
> >> #include <linux/seq_file.h>
> >> #include <linux/mutex.h>
> >> #include <linux/kernel.h>
> >> +#include <linux/bitmap.h>
> >> #include <linux/cgroup_drm.h>
> >> #include <drm/drm_file.h>
> >> #include <drm/drm_drv.h>
> >> @@ -41,6 +42,10 @@ enum drmcg_file_type {
> >> DRMCG_FTYPE_DEFAULT,
> >> };
> >>
> >> +#define LGPU_LIMITS_NAME_LIST "list"
> >> +#define LGPU_LIMITS_NAME_COUNT "count"
> >> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
> >> +
> >> /**
> >> * drmcg_bind - Bind DRM subsystem to cgroup subsystem
> >> * @acq_dm: function pointer to the drm_minor_acquire function
> >> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >> ddr->bo_limits_peak_allocated =
> >> dev->drmcg_props.bo_limits_peak_allocated_default;
> >>
> >> + bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> + bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
> >> +
> >> return 0;
> >> }
> >>
> >> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >> mutex_unlock(&cgroup_mutex);
> >> }
> >>
> >> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
> >> + const unsigned long *free_static,
> >> + const unsigned long *free_weighted,
> >> + struct drmcg *parent_drmcg)
> >> +{
> >> + int capacity = dev->drmcg_props.lgpu_capacity;
> >> + DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
> >> + DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
> >> + struct drmcg_device_resource *parent_ddr;
> >> + struct drmcg_device_resource *ddr;
> >> + int minor = dev->primary->index;
> >> + struct cgroup_subsys_state *pos;
> >> + struct drmcg *child;
> >> + s64 weight_sum = 0;
> >> + s64 unused;
> >> +
> >> + parent_ddr = parent_drmcg->dev_resources[minor];
> >> +
> >> + if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
> >> + /* no static cfg, use weight for calculating the effective */
> >> + bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
> >> + else
> >> + /* lgpu statically configured, use the overlap as effective */
> >> + bitmap_and(parent_ddr->lgpu_stg, free_static,
> >> + parent_ddr->lgpu_cfg, capacity);
> >> +
> >> + /* calculate lgpu available for distribution by weight for children */
> >> + bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
> >> + css_for_each_child(pos, &parent_drmcg->css) {
> >> + child = css_to_drmcg(pos);
> >> + ddr = child->dev_resources[minor];
> >> +
> >> + if (bitmap_empty(ddr->lgpu_cfg, capacity))
> >> + /* no static allocation, participate in weight dist */
> >> + weight_sum += ddr->lgpu_weight_cfg;
> >> + else
> >> + /* take out statically allocated lgpu by siblings */
> >> + bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
> >> + capacity);
> >> + }
> >> +
> >> + unused = bitmap_weight(lgpu_unused, capacity);
> >> +
> >> + css_for_each_child(pos, &parent_drmcg->css) {
> >> + child = css_to_drmcg(pos);
> >> + ddr = child->dev_resources[minor];
> >> +
> >> + bitmap_zero(lgpu_by_weight, capacity);
> >> + /* no static allocation, participate in weight distribution */
> >> + if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
> >> + int c;
> >> + int p = 0;
> >> +
> >> + for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
> >> + c > 0; c--) {
> >> + p = find_next_bit(lgpu_unused, capacity, p);
> >> + if (p < capacity) {
> >> + clear_bit(p, lgpu_unused);
> >> + set_bit(p, lgpu_by_weight);
> >> + }
> >> + }
> >> +
> >> + }
> >> +
> >> + drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
> >> + lgpu_by_weight, child);
> >> + }
> >> +}
> >> +
> >> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
> >> +{
> >> + int capacity = dev->drmcg_props.lgpu_capacity;
> >> + int minor = dev->primary->index;
> >> + struct drmcg_device_resource *ddr;
> >> + struct cgroup_subsys_state *pos;
> >> + struct drmcg *drmcg;
> >> +
> >> + if (root_drmcg == NULL) {
> >> + WARN_ON(root_drmcg == NULL);
> >> + return;
> >> + }
> >> +
> >> + rcu_read_lock();
> >> +
> >> + /* process the entire cgroup tree from root to simplify the algorithm */
> >> + drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
> >> + dev->drmcg_props.lgpu_slots, root_drmcg);
> >> +
> >> + /* apply changes to effective only if there is a change */
> >> + css_for_each_descendant_pre(pos, &root_drmcg->css) {
> >> + drmcg = css_to_drmcg(pos);
> >> + ddr = drmcg->dev_resources[minor];
> >> +
> >> + if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
> >> + bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
> >> + ddr->lgpu_count_eff =
> >> + bitmap_weight(ddr->lgpu_eff, capacity);
> >> + }
> >> + }
> >> + rcu_read_unlock();
> >> +}
> >> +
> >> +static void drmcg_apply_effective(enum drmcg_res_type type,
> >> + struct drm_device *dev, struct drmcg *changed_drmcg)
> >> +{
> >> + switch (type) {
> >> + case DRMCG_TYPE_LGPU:
> >> + drmcg_apply_effective_lgpu(dev);
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> +}
> >> +
> >> /**
> >> * drmcg_register_dev - register a DRM device for usage in drm cgroup
> >> * @dev: DRM device
> >> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
> >> {
> >> dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
> >>
> >> + WARN_ON(dev->drmcg_props.lgpu_capacity !=
> >> + bitmap_weight(dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY));
> >> +
> >> drmcg_update_cg_tree(dev);
> >> +
> >> + drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
> >> }
> >> mutex_unlock(&drmcg_mutex);
> >> }
> >> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
> >> }
> >>
> >> static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> - struct seq_file *sf, enum drmcg_res_type type)
> >> + struct seq_file *sf, enum drmcg_res_type type,
> >> + struct drm_device *dev)
> >> {
> >> if (ddr == NULL) {
> >> seq_puts(sf, "\n");
> >> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> case DRMCG_TYPE_BO_PEAK:
> >> seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_WEIGHT,
> >> + ddr->lgpu_weight_cfg,
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + bitmap_weight(ddr->lgpu_cfg,
> >> + dev->drmcg_props.lgpu_capacity),
> >> + LGPU_LIMITS_NAME_LIST,
> >> + dev->drmcg_props.lgpu_capacity,
> >> + ddr->lgpu_cfg);
> >> + break;
> >> + case DRMCG_TYPE_LGPU_EFF:
> >> + seq_printf(sf, "%s=%lld %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + ddr->lgpu_count_eff,
> >> + LGPU_LIMITS_NAME_LIST,
> >> + dev->drmcg_props.lgpu_capacity,
> >> + ddr->lgpu_eff);
> >> + break;
> >> default:
> >> seq_puts(sf, "\n");
> >> break;
> >> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
> >> seq_printf(sf, "%lld\n",
> >> props->bo_limits_peak_allocated_default);
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_WEIGHT,
> >> + CGROUP_WEIGHT_DFL,
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + bitmap_weight(props->lgpu_slots,
> >> + props->lgpu_capacity),
> >> + LGPU_LIMITS_NAME_LIST,
> >> + props->lgpu_capacity,
> >> + props->lgpu_slots);
> >> + break;
> >> default:
> >> seq_puts(sf, "\n");
> >> break;
> >> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
> >> drmcg_print_stats(ddr, sf, type);
> >> break;
> >> case DRMCG_FTYPE_LIMIT:
> >> - drmcg_print_limits(ddr, sf, type);
> >> + drmcg_print_limits(ddr, sf, type, minor->dev);
> >> break;
> >> case DRMCG_FTYPE_DEFAULT:
> >> drmcg_print_default(&minor->dev->drmcg_props, sf, type);
> >> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
> >> return rc;
> >> }
> >>
> >> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >> + struct drm_device *dev, char *attrs)
> >> +{
> >> + DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> + DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> + enum drmcg_res_type type =
> >> + DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >> + struct drmcg *drmcg = css_to_drmcg(of_css(of));
> >> + struct drmcg_props *props = &dev->drmcg_props;
> >> + char *cft_name = of_cft(of)->name;
> >> + int minor = dev->primary->index;
> >> + char *nested = strstrip(attrs);
> >> + struct drmcg_device_resource *ddr =
> >> + drmcg->dev_resources[minor];
> >> + char *attr;
> >> + char sname[256];
> >> + char sval[256];
> >> + s64 val;
> >> + int rc;
> >> +
> >> + while (nested != NULL) {
> >> + attr = strsep(&nested, " ");
> >> +
> >> + if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
> >> + continue;
> >> +
> >> + switch (type) {
> >> + case DRMCG_TYPE_LGPU:
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> >> + strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
> >> + strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
> >> + continue;
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
> >> + (!strcmp("max", sval) ||
> >> + !strcmp("default", sval))) {
> >> + bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
> >> + props->lgpu_capacity);
> >> +
> >> + continue;
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
> >> + rc = drmcg_process_limit_s64_val(sval,
> >> + false, CGROUP_WEIGHT_DFL,
> >> + CGROUP_WEIGHT_MAX, &val);
> >> +
> >> + if (rc || val < CGROUP_WEIGHT_MIN ||
> >> + val > CGROUP_WEIGHT_MAX) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + ddr->lgpu_weight_cfg = val;
> >> + continue;
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> >> + rc = drmcg_process_limit_s64_val(sval,
> >> + false, props->lgpu_capacity,
> >> + props->lgpu_capacity, &val);
> >> +
> >> + if (rc || val < 0) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + bitmap_zero(tmp_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> + bitmap_set(tmp_bitmap, 0, val);
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> >> + rc = bitmap_parselist(sval, tmp_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + if (rc) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + bitmap_andnot(chk_bitmap, tmp_bitmap,
> >> + props->lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + /* user setting does not intersect with
> >> + * available lgpu */
> >> + if (!bitmap_empty(chk_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY)) {
> >> + drmcg_pr_cft_err(drmcg, 0, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> + }
> >> +
> >> + bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
> >> + props->lgpu_capacity);
> >> +
> >> + break; /* DRMCG_TYPE_LGPU */
> >> + default:
> >> + break;
> >> + } /* switch (type) */
> >> + }
> >> +}
> >> +
> >> +
> >> /**
> >> * drmcg_limit_write - parse cgroup interface files to obtain user config
> >> *
> >> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >>
> >> ddr->bo_limits_peak_allocated = val;
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + drmcg_nested_limit_parse(of, dm->dev, sattr);
> >> + break;
> >> default:
> >> break;
> >> }
> >> +
> >> + drmcg_apply_effective(type, dm->dev, drmcg);
> >> +
> >> mutex_unlock(&dm->dev->drmcg_mutex);
> >>
> >> mutex_lock(&drmcg_mutex);
> >> @@ -560,12 +838,51 @@ struct cftype files[] = {
> >> .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
> >> DRMCG_FTYPE_STATS),
> >> },
> >> + {
> >> + .name = "lgpu",
> >> + .seq_show = drmcg_seq_show,
> >> + .write = drmcg_limit_write,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> + DRMCG_FTYPE_LIMIT),
> >> + },
> >> + {
> >> + .name = "lgpu.default",
> >> + .seq_show = drmcg_seq_show,
> >> + .flags = CFTYPE_ONLY_ON_ROOT,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> + DRMCG_FTYPE_DEFAULT),
> >> + },
> >> + {
> >> + .name = "lgpu.effective",
> >> + .seq_show = drmcg_seq_show,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
> >> + DRMCG_FTYPE_LIMIT),
> >> + },
> >> { } /* terminate */
> >> };
> >>
> >> +static int drmcg_online_fn(int id, void *ptr, void *data)
> >> +{
> >> + struct drm_minor *minor = ptr;
> >> + struct drmcg *drmcg = data;
> >> +
> >> + if (minor->type != DRM_MINOR_PRIMARY)
> >> + return 0;
> >> +
> >> + drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int drmcg_css_online(struct cgroup_subsys_state *css)
> >> +{
> >> + return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
> >> +}
> >> +
> >> struct cgroup_subsys drm_cgrp_subsys = {
> >> .css_alloc = drmcg_css_alloc,
> >> .css_free = drmcg_css_free,
> >> + .css_online = drmcg_css_online,
> >> .early_init = false,
> >> .legacy_cftypes = files,
> >> .dfl_cftypes = files,
> >> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
> >> dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
> >> dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
> >>
> >> + dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> >> + bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> drmcg_update_cg_tree(dev);
> >> }
> >> EXPORT_SYMBOL(drmcg_device_early_init);
> >> --
> >> 2.25.0
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply
* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
From: Daniel Vetter @ 2020-02-14 18:34 UTC (permalink / raw)
To: Kenny Ho
Cc: Greathouse, Joseph, Kenny Ho, Kuehling, Felix, jsparks,
amd-gfx mailing list, lkaplan, Alex Deucher, nirmoy.das,
Maling list - DRI developers, Jason Ekstrand, Tejun Heo, cgroups,
juan.zuniga-anaya, Christian König, damon.mcdougall
In-Reply-To: <CAOWid-f62Uv=GZXX2V2BsQGM5A1JJG_qmyrOwd=KwZBx_sr-bg@mail.gmail.com>
On Fri, Feb 14, 2020 at 12:08:35PM -0500, Kenny Ho wrote:
> Hi Jason,
>
> Thanks for the review.
>
> On Fri, Feb 14, 2020 at 11:44 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >
> > Pardon my ignorance but I'm a bit confused by this. What is a "logical GPU"? What are we subdividing? Are we carving up memory? Compute power? Both?
>
> The intention is compute but it is up to the individual drm driver to decide.
I think guidance from Tejun in previos discussions was pretty clear that
he expects cgroups to be both a) standardized and c) sufficient clear
meaning that end-users have a clear understanding of what happens when
they change the resource allocation.
I'm not sure lgpu here, at least as specified, passes either. But I also
don't have much clue, so pulled Jason in - he understands how this all
gets reflected to userspace apis a lot better than me.
-Daniel
>
> > If it's carving up compute power, what's actually being carved up? Time? Execution units/waves/threads? Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set. Does affinity matter that much? Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants.
> >
> > Don't get me wrong here. I'm all for the notion of being able to use cgroups to carve up GPU compute resources. However, this sounds to me like the most AMD-specific solution possible. We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.
>
> This has been discussed in the RFC before
> (https://www.spinics.net/lists/cgroups/msg23469.html.) As mentioned
> before, the idea of a compute unit is hardly an AMD specific thing as
> it is in the OpenCL standard and part of the architecture of many
> different vendors. In addition, the interface presented here supports
> Intel's use case. What you described is what I considered as the
> "anonymous resources" view of the lgpu. What you/Intel can do, is to
> register your device to drmcg to have 100 lgpu and users can specify
> simply by count. So if they want to allocate 5% for a cgroup, they
> would set count=5. Per the documentation in this patch: "Some DRM
> devices may only support lgpu as anonymous resources. In such case,
> the significance of the position of the set bits in list will be
> ignored." What Intel does with the user expressed configuration of "5
> out of 100" is entirely up to Intel (time slice if you like, change to
> specific EUs later if you like, or make it driver configurable to
> support both if you like.)
>
> Regards,
> Kenny
>
> >
> > On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho@amd.com> wrote:
> >>
> >> drm.lgpu
> >> A read-write nested-keyed file which exists on all cgroups.
> >> Each entry is keyed by the DRM device's major:minor.
> >>
> >> lgpu stands for logical GPU, it is an abstraction used to
> >> subdivide a physical DRM device for the purpose of resource
> >> management. This file stores user configuration while the
> >> drm.lgpu.effective reflects the actual allocation after
> >> considering the relationship between the cgroups and their
> >> configurations.
> >>
> >> The lgpu is a discrete quantity that is device specific (i.e.
> >> some DRM devices may have 64 lgpus while others may have 100
> >> lgpus.) The lgpu is a single quantity that can be allocated
> >> in three different ways denoted by the following nested keys.
> >>
> >> ===== ==============================================
> >> weight Allocate by proportion in relationship with
> >> active sibling cgroups
> >> count Allocate by amount statically, treat lgpu as
> >> anonymous resources
> >> list Allocate statically, treat lgpu as named
> >> resource
> >> ===== ==============================================
> >>
> >> For example:
> >> 226:0 weight=100 count=256 list=0-255
> >> 226:1 weight=100 count=4 list=0,2,4,6
> >> 226:2 weight=100 count=32 list=32-63
> >> 226:3 weight=100 count=0 list=
> >> 226:4 weight=500 count=0 list=
> >>
> >> lgpu is represented by a bitmap and uses the bitmap_parselist
> >> kernel function so the list key input format is a
> >> comma-separated list of decimal numbers and ranges.
> >>
> >> Consecutively set bits are shown as two hyphen-separated decimal
> >> numbers, the smallest and largest bit numbers set in the range.
> >> Optionally each range can be postfixed to denote that only parts
> >> of it should be set. The range will divided to groups of
> >> specific size.
> >> Syntax: range:used_size/group_size
> >> Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >>
> >> The count key is the hamming weight / hweight of the bitmap.
> >>
> >> Weight, count and list accept the max and default keywords.
> >>
> >> Some DRM devices may only support lgpu as anonymous resources.
> >> In such case, the significance of the position of the set bits
> >> in list will be ignored.
> >>
> >> The weight quantity is only in effect when static allocation
> >> is not used (by setting count=0) for this cgroup. The weight
> >> quantity distributes lgpus that are not statically allocated by
> >> the siblings. For example, given siblings cgroupA, cgroupB and
> >> cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> 0-63, no lgpu is available to be distributed by weight.
> >> Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> cgroupC will be starved if it tries to allocate by weight.
> >>
> >> On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> lgpus are available to be distributed evenly between cgroupA
> >> and cgroupC. In drm.lgpu.effective, cgroupA will have
> >> list=0-15 and cgroupC will have list=48-63.
> >>
> >> This lgpu resource supports the 'allocation' and 'weight'
> >> resource distribution model.
> >>
> >> drm.lgpu.effective
> >> A read-only nested-keyed file which exists on all cgroups.
> >> Each entry is keyed by the DRM device's major:minor.
> >>
> >> lgpu stands for logical GPU, it is an abstraction used to
> >> subdivide a physical DRM device for the purpose of resource
> >> management. This file reflects the actual allocation after
> >> considering the relationship between the cgroups and their
> >> configurations in drm.lgpu.
> >>
> >> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
> >> Signed-off-by: Kenny Ho <Kenny.Ho@amd.com>
> >> ---
> >> Documentation/admin-guide/cgroup-v2.rst | 80 ++++++
> >> include/drm/drm_cgroup.h | 3 +
> >> include/linux/cgroup_drm.h | 22 ++
> >> kernel/cgroup/drm.c | 324 +++++++++++++++++++++++-
> >> 4 files changed, 427 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >> index ce5dc027366a..d8a41956e5c7 100644
> >> --- a/Documentation/admin-guide/cgroup-v2.rst
> >> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >> @@ -2120,6 +2120,86 @@ DRM Interface Files
> >> Set largest allocation for /dev/dri/card1 to 4MB
> >> echo "226:1 4m" > drm.buffer.peak.max
> >>
> >> + drm.lgpu
> >> + A read-write nested-keyed file which exists on all cgroups.
> >> + Each entry is keyed by the DRM device's major:minor.
> >> +
> >> + lgpu stands for logical GPU, it is an abstraction used to
> >> + subdivide a physical DRM device for the purpose of resource
> >> + management. This file stores user configuration while the
> >> + drm.lgpu.effective reflects the actual allocation after
> >> + considering the relationship between the cgroups and their
> >> + configurations.
> >> +
> >> + The lgpu is a discrete quantity that is device specific (i.e.
> >> + some DRM devices may have 64 lgpus while others may have 100
> >> + lgpus.) The lgpu is a single quantity that can be allocated
> >> + in three different ways denoted by the following nested keys.
> >> +
> >> + ===== ==============================================
> >> + weight Allocate by proportion in relationship with
> >> + active sibling cgroups
> >> + count Allocate by amount statically, treat lgpu as
> >> + anonymous resources
> >> + list Allocate statically, treat lgpu as named
> >> + resource
> >> + ===== ==============================================
> >> +
> >> + For example:
> >> + 226:0 weight=100 count=256 list=0-255
> >> + 226:1 weight=100 count=4 list=0,2,4,6
> >> + 226:2 weight=100 count=32 list=32-63
> >> + 226:3 weight=100 count=0 list=
> >> + 226:4 weight=500 count=0 list=
> >> +
> >> + lgpu is represented by a bitmap and uses the bitmap_parselist
> >> + kernel function so the list key input format is a
> >> + comma-separated list of decimal numbers and ranges.
> >> +
> >> + Consecutively set bits are shown as two hyphen-separated decimal
> >> + numbers, the smallest and largest bit numbers set in the range.
> >> + Optionally each range can be postfixed to denote that only parts
> >> + of it should be set. The range will divided to groups of
> >> + specific size.
> >> + Syntax: range:used_size/group_size
> >> + Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >> +
> >> + The count key is the hamming weight / hweight of the bitmap.
> >> +
> >> + Weight, count and list accept the max and default keywords.
> >> +
> >> + Some DRM devices may only support lgpu as anonymous resources.
> >> + In such case, the significance of the position of the set bits
> >> + in list will be ignored.
> >> +
> >> + The weight quantity is only in effect when static allocation
> >> + is not used (by setting count=0) for this cgroup. The weight
> >> + quantity distributes lgpus that are not statically allocated by
> >> + the siblings. For example, given siblings cgroupA, cgroupB and
> >> + cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> + 0-63, no lgpu is available to be distributed by weight.
> >> + Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> + cgroupC will be starved if it tries to allocate by weight.
> >> +
> >> + On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> + has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> + lgpus are available to be distributed evenly between cgroupA
> >> + and cgroupC. In drm.lgpu.effective, cgroupA will have
> >> + list=0-15 and cgroupC will have list=48-63.
> >> +
> >> + This lgpu resource supports the 'allocation' and 'weight'
> >> + resource distribution model.
> >> +
> >> + drm.lgpu.effective
> >> + A read-only nested-keyed file which exists on all cgroups.
> >> + Each entry is keyed by the DRM device's major:minor.
> >> +
> >> + lgpu stands for logical GPU, it is an abstraction used to
> >> + subdivide a physical DRM device for the purpose of resource
> >> + management. This file reflects the actual allocation after
> >> + considering the relationship between the cgroups and their
> >> + configurations in drm.lgpu.
> >> +
> >> GEM Buffer Ownership
> >> ~~~~~~~~~~~~~~~~~~~~
> >>
> >> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> >> index 2b41d4d22e33..619a110cc748 100644
> >> --- a/include/drm/drm_cgroup.h
> >> +++ b/include/drm/drm_cgroup.h
> >> @@ -17,6 +17,9 @@ struct drmcg_props {
> >>
> >> s64 bo_limits_total_allocated_default;
> >> s64 bo_limits_peak_allocated_default;
> >> +
> >> + int lgpu_capacity;
> >> + DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> };
> >>
> >> void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
> >> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> >> index eae400f3d9b4..bb09704e7f71 100644
> >> --- a/include/linux/cgroup_drm.h
> >> +++ b/include/linux/cgroup_drm.h
> >> @@ -11,10 +11,14 @@
> >> /* limit defined per the way drm_minor_alloc operates */
> >> #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >>
> >> +#define MAX_DRMCG_LGPU_CAPACITY 256
> >> +
> >> enum drmcg_res_type {
> >> DRMCG_TYPE_BO_TOTAL,
> >> DRMCG_TYPE_BO_PEAK,
> >> DRMCG_TYPE_BO_COUNT,
> >> + DRMCG_TYPE_LGPU,
> >> + DRMCG_TYPE_LGPU_EFF,
> >> __DRMCG_TYPE_LAST,
> >> };
> >>
> >> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
> >> s64 bo_limits_peak_allocated;
> >>
> >> s64 bo_stats_count_allocated;
> >> +
> >> + /**
> >> + * Logical GPU
> >> + *
> >> + * *_cfg are properties configured by users
> >> + * *_eff are the effective properties being applied to the hardware
> >> + * *_stg is used to calculate _eff before applying to _eff
> >> + * after considering the entire hierarchy
> >> + */
> >> + DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
> >> + /* user configurations */
> >> + s64 lgpu_weight_cfg;
> >> + DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
> >> + /* effective lgpu for the cgroup after considering
> >> + * relationship with other cgroup
> >> + */
> >> + s64 lgpu_count_eff;
> >> + DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
> >> };
> >>
> >> /**
> >> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> >> index 5fcbbc13fa1c..a4e88a3704bb 100644
> >> --- a/kernel/cgroup/drm.c
> >> +++ b/kernel/cgroup/drm.c
> >> @@ -9,6 +9,7 @@
> >> #include <linux/seq_file.h>
> >> #include <linux/mutex.h>
> >> #include <linux/kernel.h>
> >> +#include <linux/bitmap.h>
> >> #include <linux/cgroup_drm.h>
> >> #include <drm/drm_file.h>
> >> #include <drm/drm_drv.h>
> >> @@ -41,6 +42,10 @@ enum drmcg_file_type {
> >> DRMCG_FTYPE_DEFAULT,
> >> };
> >>
> >> +#define LGPU_LIMITS_NAME_LIST "list"
> >> +#define LGPU_LIMITS_NAME_COUNT "count"
> >> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
> >> +
> >> /**
> >> * drmcg_bind - Bind DRM subsystem to cgroup subsystem
> >> * @acq_dm: function pointer to the drm_minor_acquire function
> >> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >> ddr->bo_limits_peak_allocated =
> >> dev->drmcg_props.bo_limits_peak_allocated_default;
> >>
> >> + bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> + bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
> >> +
> >> return 0;
> >> }
> >>
> >> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >> mutex_unlock(&cgroup_mutex);
> >> }
> >>
> >> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
> >> + const unsigned long *free_static,
> >> + const unsigned long *free_weighted,
> >> + struct drmcg *parent_drmcg)
> >> +{
> >> + int capacity = dev->drmcg_props.lgpu_capacity;
> >> + DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
> >> + DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
> >> + struct drmcg_device_resource *parent_ddr;
> >> + struct drmcg_device_resource *ddr;
> >> + int minor = dev->primary->index;
> >> + struct cgroup_subsys_state *pos;
> >> + struct drmcg *child;
> >> + s64 weight_sum = 0;
> >> + s64 unused;
> >> +
> >> + parent_ddr = parent_drmcg->dev_resources[minor];
> >> +
> >> + if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
> >> + /* no static cfg, use weight for calculating the effective */
> >> + bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
> >> + else
> >> + /* lgpu statically configured, use the overlap as effective */
> >> + bitmap_and(parent_ddr->lgpu_stg, free_static,
> >> + parent_ddr->lgpu_cfg, capacity);
> >> +
> >> + /* calculate lgpu available for distribution by weight for children */
> >> + bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
> >> + css_for_each_child(pos, &parent_drmcg->css) {
> >> + child = css_to_drmcg(pos);
> >> + ddr = child->dev_resources[minor];
> >> +
> >> + if (bitmap_empty(ddr->lgpu_cfg, capacity))
> >> + /* no static allocation, participate in weight dist */
> >> + weight_sum += ddr->lgpu_weight_cfg;
> >> + else
> >> + /* take out statically allocated lgpu by siblings */
> >> + bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
> >> + capacity);
> >> + }
> >> +
> >> + unused = bitmap_weight(lgpu_unused, capacity);
> >> +
> >> + css_for_each_child(pos, &parent_drmcg->css) {
> >> + child = css_to_drmcg(pos);
> >> + ddr = child->dev_resources[minor];
> >> +
> >> + bitmap_zero(lgpu_by_weight, capacity);
> >> + /* no static allocation, participate in weight distribution */
> >> + if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
> >> + int c;
> >> + int p = 0;
> >> +
> >> + for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
> >> + c > 0; c--) {
> >> + p = find_next_bit(lgpu_unused, capacity, p);
> >> + if (p < capacity) {
> >> + clear_bit(p, lgpu_unused);
> >> + set_bit(p, lgpu_by_weight);
> >> + }
> >> + }
> >> +
> >> + }
> >> +
> >> + drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
> >> + lgpu_by_weight, child);
> >> + }
> >> +}
> >> +
> >> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
> >> +{
> >> + int capacity = dev->drmcg_props.lgpu_capacity;
> >> + int minor = dev->primary->index;
> >> + struct drmcg_device_resource *ddr;
> >> + struct cgroup_subsys_state *pos;
> >> + struct drmcg *drmcg;
> >> +
> >> + if (root_drmcg == NULL) {
> >> + WARN_ON(root_drmcg == NULL);
> >> + return;
> >> + }
> >> +
> >> + rcu_read_lock();
> >> +
> >> + /* process the entire cgroup tree from root to simplify the algorithm */
> >> + drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
> >> + dev->drmcg_props.lgpu_slots, root_drmcg);
> >> +
> >> + /* apply changes to effective only if there is a change */
> >> + css_for_each_descendant_pre(pos, &root_drmcg->css) {
> >> + drmcg = css_to_drmcg(pos);
> >> + ddr = drmcg->dev_resources[minor];
> >> +
> >> + if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
> >> + bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
> >> + ddr->lgpu_count_eff =
> >> + bitmap_weight(ddr->lgpu_eff, capacity);
> >> + }
> >> + }
> >> + rcu_read_unlock();
> >> +}
> >> +
> >> +static void drmcg_apply_effective(enum drmcg_res_type type,
> >> + struct drm_device *dev, struct drmcg *changed_drmcg)
> >> +{
> >> + switch (type) {
> >> + case DRMCG_TYPE_LGPU:
> >> + drmcg_apply_effective_lgpu(dev);
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> +}
> >> +
> >> /**
> >> * drmcg_register_dev - register a DRM device for usage in drm cgroup
> >> * @dev: DRM device
> >> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
> >> {
> >> dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
> >>
> >> + WARN_ON(dev->drmcg_props.lgpu_capacity !=
> >> + bitmap_weight(dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY));
> >> +
> >> drmcg_update_cg_tree(dev);
> >> +
> >> + drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
> >> }
> >> mutex_unlock(&drmcg_mutex);
> >> }
> >> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
> >> }
> >>
> >> static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> - struct seq_file *sf, enum drmcg_res_type type)
> >> + struct seq_file *sf, enum drmcg_res_type type,
> >> + struct drm_device *dev)
> >> {
> >> if (ddr == NULL) {
> >> seq_puts(sf, "\n");
> >> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> case DRMCG_TYPE_BO_PEAK:
> >> seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_WEIGHT,
> >> + ddr->lgpu_weight_cfg,
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + bitmap_weight(ddr->lgpu_cfg,
> >> + dev->drmcg_props.lgpu_capacity),
> >> + LGPU_LIMITS_NAME_LIST,
> >> + dev->drmcg_props.lgpu_capacity,
> >> + ddr->lgpu_cfg);
> >> + break;
> >> + case DRMCG_TYPE_LGPU_EFF:
> >> + seq_printf(sf, "%s=%lld %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + ddr->lgpu_count_eff,
> >> + LGPU_LIMITS_NAME_LIST,
> >> + dev->drmcg_props.lgpu_capacity,
> >> + ddr->lgpu_eff);
> >> + break;
> >> default:
> >> seq_puts(sf, "\n");
> >> break;
> >> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
> >> seq_printf(sf, "%lld\n",
> >> props->bo_limits_peak_allocated_default);
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_WEIGHT,
> >> + CGROUP_WEIGHT_DFL,
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + bitmap_weight(props->lgpu_slots,
> >> + props->lgpu_capacity),
> >> + LGPU_LIMITS_NAME_LIST,
> >> + props->lgpu_capacity,
> >> + props->lgpu_slots);
> >> + break;
> >> default:
> >> seq_puts(sf, "\n");
> >> break;
> >> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
> >> drmcg_print_stats(ddr, sf, type);
> >> break;
> >> case DRMCG_FTYPE_LIMIT:
> >> - drmcg_print_limits(ddr, sf, type);
> >> + drmcg_print_limits(ddr, sf, type, minor->dev);
> >> break;
> >> case DRMCG_FTYPE_DEFAULT:
> >> drmcg_print_default(&minor->dev->drmcg_props, sf, type);
> >> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
> >> return rc;
> >> }
> >>
> >> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >> + struct drm_device *dev, char *attrs)
> >> +{
> >> + DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> + DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> + enum drmcg_res_type type =
> >> + DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >> + struct drmcg *drmcg = css_to_drmcg(of_css(of));
> >> + struct drmcg_props *props = &dev->drmcg_props;
> >> + char *cft_name = of_cft(of)->name;
> >> + int minor = dev->primary->index;
> >> + char *nested = strstrip(attrs);
> >> + struct drmcg_device_resource *ddr =
> >> + drmcg->dev_resources[minor];
> >> + char *attr;
> >> + char sname[256];
> >> + char sval[256];
> >> + s64 val;
> >> + int rc;
> >> +
> >> + while (nested != NULL) {
> >> + attr = strsep(&nested, " ");
> >> +
> >> + if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
> >> + continue;
> >> +
> >> + switch (type) {
> >> + case DRMCG_TYPE_LGPU:
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> >> + strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
> >> + strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
> >> + continue;
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
> >> + (!strcmp("max", sval) ||
> >> + !strcmp("default", sval))) {
> >> + bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
> >> + props->lgpu_capacity);
> >> +
> >> + continue;
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
> >> + rc = drmcg_process_limit_s64_val(sval,
> >> + false, CGROUP_WEIGHT_DFL,
> >> + CGROUP_WEIGHT_MAX, &val);
> >> +
> >> + if (rc || val < CGROUP_WEIGHT_MIN ||
> >> + val > CGROUP_WEIGHT_MAX) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + ddr->lgpu_weight_cfg = val;
> >> + continue;
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> >> + rc = drmcg_process_limit_s64_val(sval,
> >> + false, props->lgpu_capacity,
> >> + props->lgpu_capacity, &val);
> >> +
> >> + if (rc || val < 0) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + bitmap_zero(tmp_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> + bitmap_set(tmp_bitmap, 0, val);
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> >> + rc = bitmap_parselist(sval, tmp_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + if (rc) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + bitmap_andnot(chk_bitmap, tmp_bitmap,
> >> + props->lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + /* user setting does not intersect with
> >> + * available lgpu */
> >> + if (!bitmap_empty(chk_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY)) {
> >> + drmcg_pr_cft_err(drmcg, 0, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> + }
> >> +
> >> + bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
> >> + props->lgpu_capacity);
> >> +
> >> + break; /* DRMCG_TYPE_LGPU */
> >> + default:
> >> + break;
> >> + } /* switch (type) */
> >> + }
> >> +}
> >> +
> >> +
> >> /**
> >> * drmcg_limit_write - parse cgroup interface files to obtain user config
> >> *
> >> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >>
> >> ddr->bo_limits_peak_allocated = val;
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + drmcg_nested_limit_parse(of, dm->dev, sattr);
> >> + break;
> >> default:
> >> break;
> >> }
> >> +
> >> + drmcg_apply_effective(type, dm->dev, drmcg);
> >> +
> >> mutex_unlock(&dm->dev->drmcg_mutex);
> >>
> >> mutex_lock(&drmcg_mutex);
> >> @@ -560,12 +838,51 @@ struct cftype files[] = {
> >> .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
> >> DRMCG_FTYPE_STATS),
> >> },
> >> + {
> >> + .name = "lgpu",
> >> + .seq_show = drmcg_seq_show,
> >> + .write = drmcg_limit_write,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> + DRMCG_FTYPE_LIMIT),
> >> + },
> >> + {
> >> + .name = "lgpu.default",
> >> + .seq_show = drmcg_seq_show,
> >> + .flags = CFTYPE_ONLY_ON_ROOT,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> + DRMCG_FTYPE_DEFAULT),
> >> + },
> >> + {
> >> + .name = "lgpu.effective",
> >> + .seq_show = drmcg_seq_show,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
> >> + DRMCG_FTYPE_LIMIT),
> >> + },
> >> { } /* terminate */
> >> };
> >>
> >> +static int drmcg_online_fn(int id, void *ptr, void *data)
> >> +{
> >> + struct drm_minor *minor = ptr;
> >> + struct drmcg *drmcg = data;
> >> +
> >> + if (minor->type != DRM_MINOR_PRIMARY)
> >> + return 0;
> >> +
> >> + drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int drmcg_css_online(struct cgroup_subsys_state *css)
> >> +{
> >> + return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
> >> +}
> >> +
> >> struct cgroup_subsys drm_cgrp_subsys = {
> >> .css_alloc = drmcg_css_alloc,
> >> .css_free = drmcg_css_free,
> >> + .css_online = drmcg_css_online,
> >> .early_init = false,
> >> .legacy_cftypes = files,
> >> .dfl_cftypes = files,
> >> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
> >> dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
> >> dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
> >>
> >> + dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> >> + bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> drmcg_update_cg_tree(dev);
> >> }
> >> EXPORT_SYMBOL(drmcg_device_early_init);
> >> --
> >> 2.25.0
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply
* Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource
From: Daniel Vetter @ 2020-02-14 18:34 UTC (permalink / raw)
To: Kenny Ho
Cc: Jason Ekstrand, Kenny Ho, cgroups-u79uwXL29TY76Z2rM5mHXA,
Maling list - DRI developers, amd-gfx mailing list, Tejun Heo,
Alex Deucher, Christian König, Kuehling, Felix,
Greathouse, Joseph, jsparks-WVYJKLFxKCc, lkaplan-WVYJKLFxKCc,
Daniel Vetter, nirmoy.das-5C7GfCeVMHo,
damon.mcdougall-5C7GfCeVMHo, juan.zuniga-anaya-5C7GfCeVMHo
In-Reply-To: <CAOWid-f62Uv=GZXX2V2BsQGM5A1JJG_qmyrOwd=KwZBx_sr-bg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Fri, Feb 14, 2020 at 12:08:35PM -0500, Kenny Ho wrote:
> Hi Jason,
>
> Thanks for the review.
>
> On Fri, Feb 14, 2020 at 11:44 AM Jason Ekstrand <jason-fQELhIk9awXprZlt/sZkLg@public.gmane.org> wrote:
> >
> > Pardon my ignorance but I'm a bit confused by this. What is a "logical GPU"? What are we subdividing? Are we carving up memory? Compute power? Both?
>
> The intention is compute but it is up to the individual drm driver to decide.
I think guidance from Tejun in previos discussions was pretty clear that
he expects cgroups to be both a) standardized and c) sufficient clear
meaning that end-users have a clear understanding of what happens when
they change the resource allocation.
I'm not sure lgpu here, at least as specified, passes either. But I also
don't have much clue, so pulled Jason in - he understands how this all
gets reflected to userspace apis a lot better than me.
-Daniel
>
> > If it's carving up compute power, what's actually being carved up? Time? Execution units/waves/threads? Even if that's the case, what advantage does it give to have it in terms of a fixed set of lgpus where each cgroup gets to pick a fixed set. Does affinity matter that much? Why not just say how many waves the GPU supports and that they have to be allocated in chunks of 16 waves (pulling a number out of thin air) and let the cgroup specify how many waves it wants.
> >
> > Don't get me wrong here. I'm all for the notion of being able to use cgroups to carve up GPU compute resources. However, this sounds to me like the most AMD-specific solution possible. We (Intel) could probably do some sort of carving up as well but we'd likely want to do it with preemption and time-slicing rather than handing out specific EUs.
>
> This has been discussed in the RFC before
> (https://www.spinics.net/lists/cgroups/msg23469.html.) As mentioned
> before, the idea of a compute unit is hardly an AMD specific thing as
> it is in the OpenCL standard and part of the architecture of many
> different vendors. In addition, the interface presented here supports
> Intel's use case. What you described is what I considered as the
> "anonymous resources" view of the lgpu. What you/Intel can do, is to
> register your device to drmcg to have 100 lgpu and users can specify
> simply by count. So if they want to allocate 5% for a cgroup, they
> would set count=5. Per the documentation in this patch: "Some DRM
> devices may only support lgpu as anonymous resources. In such case,
> the significance of the position of the set bits in list will be
> ignored." What Intel does with the user expressed configuration of "5
> out of 100" is entirely up to Intel (time slice if you like, change to
> specific EUs later if you like, or make it driver configurable to
> support both if you like.)
>
> Regards,
> Kenny
>
> >
> > On Fri, Feb 14, 2020 at 9:57 AM Kenny Ho <Kenny.Ho-5C7GfCeVMHo@public.gmane.org> wrote:
> >>
> >> drm.lgpu
> >> A read-write nested-keyed file which exists on all cgroups.
> >> Each entry is keyed by the DRM device's major:minor.
> >>
> >> lgpu stands for logical GPU, it is an abstraction used to
> >> subdivide a physical DRM device for the purpose of resource
> >> management. This file stores user configuration while the
> >> drm.lgpu.effective reflects the actual allocation after
> >> considering the relationship between the cgroups and their
> >> configurations.
> >>
> >> The lgpu is a discrete quantity that is device specific (i.e.
> >> some DRM devices may have 64 lgpus while others may have 100
> >> lgpus.) The lgpu is a single quantity that can be allocated
> >> in three different ways denoted by the following nested keys.
> >>
> >> ===== ==============================================
> >> weight Allocate by proportion in relationship with
> >> active sibling cgroups
> >> count Allocate by amount statically, treat lgpu as
> >> anonymous resources
> >> list Allocate statically, treat lgpu as named
> >> resource
> >> ===== ==============================================
> >>
> >> For example:
> >> 226:0 weight=100 count=256 list=0-255
> >> 226:1 weight=100 count=4 list=0,2,4,6
> >> 226:2 weight=100 count=32 list=32-63
> >> 226:3 weight=100 count=0 list=
> >> 226:4 weight=500 count=0 list=
> >>
> >> lgpu is represented by a bitmap and uses the bitmap_parselist
> >> kernel function so the list key input format is a
> >> comma-separated list of decimal numbers and ranges.
> >>
> >> Consecutively set bits are shown as two hyphen-separated decimal
> >> numbers, the smallest and largest bit numbers set in the range.
> >> Optionally each range can be postfixed to denote that only parts
> >> of it should be set. The range will divided to groups of
> >> specific size.
> >> Syntax: range:used_size/group_size
> >> Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >>
> >> The count key is the hamming weight / hweight of the bitmap.
> >>
> >> Weight, count and list accept the max and default keywords.
> >>
> >> Some DRM devices may only support lgpu as anonymous resources.
> >> In such case, the significance of the position of the set bits
> >> in list will be ignored.
> >>
> >> The weight quantity is only in effect when static allocation
> >> is not used (by setting count=0) for this cgroup. The weight
> >> quantity distributes lgpus that are not statically allocated by
> >> the siblings. For example, given siblings cgroupA, cgroupB and
> >> cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> 0-63, no lgpu is available to be distributed by weight.
> >> Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> cgroupC will be starved if it tries to allocate by weight.
> >>
> >> On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> lgpus are available to be distributed evenly between cgroupA
> >> and cgroupC. In drm.lgpu.effective, cgroupA will have
> >> list=0-15 and cgroupC will have list=48-63.
> >>
> >> This lgpu resource supports the 'allocation' and 'weight'
> >> resource distribution model.
> >>
> >> drm.lgpu.effective
> >> A read-only nested-keyed file which exists on all cgroups.
> >> Each entry is keyed by the DRM device's major:minor.
> >>
> >> lgpu stands for logical GPU, it is an abstraction used to
> >> subdivide a physical DRM device for the purpose of resource
> >> management. This file reflects the actual allocation after
> >> considering the relationship between the cgroups and their
> >> configurations in drm.lgpu.
> >>
> >> Change-Id: Idde0ef9a331fd67bb9c7eb8ef9978439e6452488
> >> Signed-off-by: Kenny Ho <Kenny.Ho-5C7GfCeVMHo@public.gmane.org>
> >> ---
> >> Documentation/admin-guide/cgroup-v2.rst | 80 ++++++
> >> include/drm/drm_cgroup.h | 3 +
> >> include/linux/cgroup_drm.h | 22 ++
> >> kernel/cgroup/drm.c | 324 +++++++++++++++++++++++-
> >> 4 files changed, 427 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> >> index ce5dc027366a..d8a41956e5c7 100644
> >> --- a/Documentation/admin-guide/cgroup-v2.rst
> >> +++ b/Documentation/admin-guide/cgroup-v2.rst
> >> @@ -2120,6 +2120,86 @@ DRM Interface Files
> >> Set largest allocation for /dev/dri/card1 to 4MB
> >> echo "226:1 4m" > drm.buffer.peak.max
> >>
> >> + drm.lgpu
> >> + A read-write nested-keyed file which exists on all cgroups.
> >> + Each entry is keyed by the DRM device's major:minor.
> >> +
> >> + lgpu stands for logical GPU, it is an abstraction used to
> >> + subdivide a physical DRM device for the purpose of resource
> >> + management. This file stores user configuration while the
> >> + drm.lgpu.effective reflects the actual allocation after
> >> + considering the relationship between the cgroups and their
> >> + configurations.
> >> +
> >> + The lgpu is a discrete quantity that is device specific (i.e.
> >> + some DRM devices may have 64 lgpus while others may have 100
> >> + lgpus.) The lgpu is a single quantity that can be allocated
> >> + in three different ways denoted by the following nested keys.
> >> +
> >> + ===== ==============================================
> >> + weight Allocate by proportion in relationship with
> >> + active sibling cgroups
> >> + count Allocate by amount statically, treat lgpu as
> >> + anonymous resources
> >> + list Allocate statically, treat lgpu as named
> >> + resource
> >> + ===== ==============================================
> >> +
> >> + For example:
> >> + 226:0 weight=100 count=256 list=0-255
> >> + 226:1 weight=100 count=4 list=0,2,4,6
> >> + 226:2 weight=100 count=32 list=32-63
> >> + 226:3 weight=100 count=0 list=
> >> + 226:4 weight=500 count=0 list=
> >> +
> >> + lgpu is represented by a bitmap and uses the bitmap_parselist
> >> + kernel function so the list key input format is a
> >> + comma-separated list of decimal numbers and ranges.
> >> +
> >> + Consecutively set bits are shown as two hyphen-separated decimal
> >> + numbers, the smallest and largest bit numbers set in the range.
> >> + Optionally each range can be postfixed to denote that only parts
> >> + of it should be set. The range will divided to groups of
> >> + specific size.
> >> + Syntax: range:used_size/group_size
> >> + Example: 0-1023:2/256 ==> 0,1,256,257,512,513,768,769
> >> +
> >> + The count key is the hamming weight / hweight of the bitmap.
> >> +
> >> + Weight, count and list accept the max and default keywords.
> >> +
> >> + Some DRM devices may only support lgpu as anonymous resources.
> >> + In such case, the significance of the position of the set bits
> >> + in list will be ignored.
> >> +
> >> + The weight quantity is only in effect when static allocation
> >> + is not used (by setting count=0) for this cgroup. The weight
> >> + quantity distributes lgpus that are not statically allocated by
> >> + the siblings. For example, given siblings cgroupA, cgroupB and
> >> + cgroupC for a DRM device that has 64 lgpus, if cgroupA occupies
> >> + 0-63, no lgpu is available to be distributed by weight.
> >> + Similarly, if cgroupA has list=0-31 and cgroupB has list=16-63,
> >> + cgroupC will be starved if it tries to allocate by weight.
> >> +
> >> + On the other hand, if cgroupA has weight=100 count=0, cgroupB
> >> + has list=16-47, and cgroupC has weight=100 count=0, then 32
> >> + lgpus are available to be distributed evenly between cgroupA
> >> + and cgroupC. In drm.lgpu.effective, cgroupA will have
> >> + list=0-15 and cgroupC will have list=48-63.
> >> +
> >> + This lgpu resource supports the 'allocation' and 'weight'
> >> + resource distribution model.
> >> +
> >> + drm.lgpu.effective
> >> + A read-only nested-keyed file which exists on all cgroups.
> >> + Each entry is keyed by the DRM device's major:minor.
> >> +
> >> + lgpu stands for logical GPU, it is an abstraction used to
> >> + subdivide a physical DRM device for the purpose of resource
> >> + management. This file reflects the actual allocation after
> >> + considering the relationship between the cgroups and their
> >> + configurations in drm.lgpu.
> >> +
> >> GEM Buffer Ownership
> >> ~~~~~~~~~~~~~~~~~~~~
> >>
> >> diff --git a/include/drm/drm_cgroup.h b/include/drm/drm_cgroup.h
> >> index 2b41d4d22e33..619a110cc748 100644
> >> --- a/include/drm/drm_cgroup.h
> >> +++ b/include/drm/drm_cgroup.h
> >> @@ -17,6 +17,9 @@ struct drmcg_props {
> >>
> >> s64 bo_limits_total_allocated_default;
> >> s64 bo_limits_peak_allocated_default;
> >> +
> >> + int lgpu_capacity;
> >> + DECLARE_BITMAP(lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> };
> >>
> >> void drmcg_bind(struct drm_minor (*(*acq_dm)(unsigned int minor_id)),
> >> diff --git a/include/linux/cgroup_drm.h b/include/linux/cgroup_drm.h
> >> index eae400f3d9b4..bb09704e7f71 100644
> >> --- a/include/linux/cgroup_drm.h
> >> +++ b/include/linux/cgroup_drm.h
> >> @@ -11,10 +11,14 @@
> >> /* limit defined per the way drm_minor_alloc operates */
> >> #define MAX_DRM_DEV (64 * DRM_MINOR_RENDER)
> >>
> >> +#define MAX_DRMCG_LGPU_CAPACITY 256
> >> +
> >> enum drmcg_res_type {
> >> DRMCG_TYPE_BO_TOTAL,
> >> DRMCG_TYPE_BO_PEAK,
> >> DRMCG_TYPE_BO_COUNT,
> >> + DRMCG_TYPE_LGPU,
> >> + DRMCG_TYPE_LGPU_EFF,
> >> __DRMCG_TYPE_LAST,
> >> };
> >>
> >> @@ -32,6 +36,24 @@ struct drmcg_device_resource {
> >> s64 bo_limits_peak_allocated;
> >>
> >> s64 bo_stats_count_allocated;
> >> +
> >> + /**
> >> + * Logical GPU
> >> + *
> >> + * *_cfg are properties configured by users
> >> + * *_eff are the effective properties being applied to the hardware
> >> + * *_stg is used to calculate _eff before applying to _eff
> >> + * after considering the entire hierarchy
> >> + */
> >> + DECLARE_BITMAP(lgpu_stg, MAX_DRMCG_LGPU_CAPACITY);
> >> + /* user configurations */
> >> + s64 lgpu_weight_cfg;
> >> + DECLARE_BITMAP(lgpu_cfg, MAX_DRMCG_LGPU_CAPACITY);
> >> + /* effective lgpu for the cgroup after considering
> >> + * relationship with other cgroup
> >> + */
> >> + s64 lgpu_count_eff;
> >> + DECLARE_BITMAP(lgpu_eff, MAX_DRMCG_LGPU_CAPACITY);
> >> };
> >>
> >> /**
> >> diff --git a/kernel/cgroup/drm.c b/kernel/cgroup/drm.c
> >> index 5fcbbc13fa1c..a4e88a3704bb 100644
> >> --- a/kernel/cgroup/drm.c
> >> +++ b/kernel/cgroup/drm.c
> >> @@ -9,6 +9,7 @@
> >> #include <linux/seq_file.h>
> >> #include <linux/mutex.h>
> >> #include <linux/kernel.h>
> >> +#include <linux/bitmap.h>
> >> #include <linux/cgroup_drm.h>
> >> #include <drm/drm_file.h>
> >> #include <drm/drm_drv.h>
> >> @@ -41,6 +42,10 @@ enum drmcg_file_type {
> >> DRMCG_FTYPE_DEFAULT,
> >> };
> >>
> >> +#define LGPU_LIMITS_NAME_LIST "list"
> >> +#define LGPU_LIMITS_NAME_COUNT "count"
> >> +#define LGPU_LIMITS_NAME_WEIGHT "weight"
> >> +
> >> /**
> >> * drmcg_bind - Bind DRM subsystem to cgroup subsystem
> >> * @acq_dm: function pointer to the drm_minor_acquire function
> >> @@ -98,6 +103,13 @@ static inline int init_drmcg_single(struct drmcg *drmcg, struct drm_device *dev)
> >> ddr->bo_limits_peak_allocated =
> >> dev->drmcg_props.bo_limits_peak_allocated_default;
> >>
> >> + bitmap_copy(ddr->lgpu_cfg, dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> + bitmap_copy(ddr->lgpu_stg, dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + ddr->lgpu_weight_cfg = CGROUP_WEIGHT_DFL;
> >> +
> >> return 0;
> >> }
> >>
> >> @@ -121,6 +133,120 @@ static inline void drmcg_update_cg_tree(struct drm_device *dev)
> >> mutex_unlock(&cgroup_mutex);
> >> }
> >>
> >> +static void drmcg_calculate_effective_lgpu(struct drm_device *dev,
> >> + const unsigned long *free_static,
> >> + const unsigned long *free_weighted,
> >> + struct drmcg *parent_drmcg)
> >> +{
> >> + int capacity = dev->drmcg_props.lgpu_capacity;
> >> + DECLARE_BITMAP(lgpu_unused, MAX_DRMCG_LGPU_CAPACITY);
> >> + DECLARE_BITMAP(lgpu_by_weight, MAX_DRMCG_LGPU_CAPACITY);
> >> + struct drmcg_device_resource *parent_ddr;
> >> + struct drmcg_device_resource *ddr;
> >> + int minor = dev->primary->index;
> >> + struct cgroup_subsys_state *pos;
> >> + struct drmcg *child;
> >> + s64 weight_sum = 0;
> >> + s64 unused;
> >> +
> >> + parent_ddr = parent_drmcg->dev_resources[minor];
> >> +
> >> + if (bitmap_empty(parent_ddr->lgpu_cfg, capacity))
> >> + /* no static cfg, use weight for calculating the effective */
> >> + bitmap_copy(parent_ddr->lgpu_stg, free_weighted, capacity);
> >> + else
> >> + /* lgpu statically configured, use the overlap as effective */
> >> + bitmap_and(parent_ddr->lgpu_stg, free_static,
> >> + parent_ddr->lgpu_cfg, capacity);
> >> +
> >> + /* calculate lgpu available for distribution by weight for children */
> >> + bitmap_copy(lgpu_unused, parent_ddr->lgpu_stg, capacity);
> >> + css_for_each_child(pos, &parent_drmcg->css) {
> >> + child = css_to_drmcg(pos);
> >> + ddr = child->dev_resources[minor];
> >> +
> >> + if (bitmap_empty(ddr->lgpu_cfg, capacity))
> >> + /* no static allocation, participate in weight dist */
> >> + weight_sum += ddr->lgpu_weight_cfg;
> >> + else
> >> + /* take out statically allocated lgpu by siblings */
> >> + bitmap_andnot(lgpu_unused, lgpu_unused, ddr->lgpu_cfg,
> >> + capacity);
> >> + }
> >> +
> >> + unused = bitmap_weight(lgpu_unused, capacity);
> >> +
> >> + css_for_each_child(pos, &parent_drmcg->css) {
> >> + child = css_to_drmcg(pos);
> >> + ddr = child->dev_resources[minor];
> >> +
> >> + bitmap_zero(lgpu_by_weight, capacity);
> >> + /* no static allocation, participate in weight distribution */
> >> + if (bitmap_empty(ddr->lgpu_cfg, capacity)) {
> >> + int c;
> >> + int p = 0;
> >> +
> >> + for (c = ddr->lgpu_weight_cfg * unused / weight_sum;
> >> + c > 0; c--) {
> >> + p = find_next_bit(lgpu_unused, capacity, p);
> >> + if (p < capacity) {
> >> + clear_bit(p, lgpu_unused);
> >> + set_bit(p, lgpu_by_weight);
> >> + }
> >> + }
> >> +
> >> + }
> >> +
> >> + drmcg_calculate_effective_lgpu(dev, parent_ddr->lgpu_stg,
> >> + lgpu_by_weight, child);
> >> + }
> >> +}
> >> +
> >> +static void drmcg_apply_effective_lgpu(struct drm_device *dev)
> >> +{
> >> + int capacity = dev->drmcg_props.lgpu_capacity;
> >> + int minor = dev->primary->index;
> >> + struct drmcg_device_resource *ddr;
> >> + struct cgroup_subsys_state *pos;
> >> + struct drmcg *drmcg;
> >> +
> >> + if (root_drmcg == NULL) {
> >> + WARN_ON(root_drmcg == NULL);
> >> + return;
> >> + }
> >> +
> >> + rcu_read_lock();
> >> +
> >> + /* process the entire cgroup tree from root to simplify the algorithm */
> >> + drmcg_calculate_effective_lgpu(dev, dev->drmcg_props.lgpu_slots,
> >> + dev->drmcg_props.lgpu_slots, root_drmcg);
> >> +
> >> + /* apply changes to effective only if there is a change */
> >> + css_for_each_descendant_pre(pos, &root_drmcg->css) {
> >> + drmcg = css_to_drmcg(pos);
> >> + ddr = drmcg->dev_resources[minor];
> >> +
> >> + if (!bitmap_equal(ddr->lgpu_stg, ddr->lgpu_eff, capacity)) {
> >> + bitmap_copy(ddr->lgpu_eff, ddr->lgpu_stg, capacity);
> >> + ddr->lgpu_count_eff =
> >> + bitmap_weight(ddr->lgpu_eff, capacity);
> >> + }
> >> + }
> >> + rcu_read_unlock();
> >> +}
> >> +
> >> +static void drmcg_apply_effective(enum drmcg_res_type type,
> >> + struct drm_device *dev, struct drmcg *changed_drmcg)
> >> +{
> >> + switch (type) {
> >> + case DRMCG_TYPE_LGPU:
> >> + drmcg_apply_effective_lgpu(dev);
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> +}
> >> +
> >> /**
> >> * drmcg_register_dev - register a DRM device for usage in drm cgroup
> >> * @dev: DRM device
> >> @@ -143,7 +269,13 @@ void drmcg_register_dev(struct drm_device *dev)
> >> {
> >> dev->driver->drmcg_custom_init(dev, &dev->drmcg_props);
> >>
> >> + WARN_ON(dev->drmcg_props.lgpu_capacity !=
> >> + bitmap_weight(dev->drmcg_props.lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY));
> >> +
> >> drmcg_update_cg_tree(dev);
> >> +
> >> + drmcg_apply_effective(DRMCG_TYPE_LGPU, dev, root_drmcg);
> >> }
> >> mutex_unlock(&drmcg_mutex);
> >> }
> >> @@ -297,7 +429,8 @@ static void drmcg_print_stats(struct drmcg_device_resource *ddr,
> >> }
> >>
> >> static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> - struct seq_file *sf, enum drmcg_res_type type)
> >> + struct seq_file *sf, enum drmcg_res_type type,
> >> + struct drm_device *dev)
> >> {
> >> if (ddr == NULL) {
> >> seq_puts(sf, "\n");
> >> @@ -311,6 +444,25 @@ static void drmcg_print_limits(struct drmcg_device_resource *ddr,
> >> case DRMCG_TYPE_BO_PEAK:
> >> seq_printf(sf, "%lld\n", ddr->bo_limits_peak_allocated);
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + seq_printf(sf, "%s=%lld %s=%d %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_WEIGHT,
> >> + ddr->lgpu_weight_cfg,
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + bitmap_weight(ddr->lgpu_cfg,
> >> + dev->drmcg_props.lgpu_capacity),
> >> + LGPU_LIMITS_NAME_LIST,
> >> + dev->drmcg_props.lgpu_capacity,
> >> + ddr->lgpu_cfg);
> >> + break;
> >> + case DRMCG_TYPE_LGPU_EFF:
> >> + seq_printf(sf, "%s=%lld %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + ddr->lgpu_count_eff,
> >> + LGPU_LIMITS_NAME_LIST,
> >> + dev->drmcg_props.lgpu_capacity,
> >> + ddr->lgpu_eff);
> >> + break;
> >> default:
> >> seq_puts(sf, "\n");
> >> break;
> >> @@ -329,6 +481,17 @@ static void drmcg_print_default(struct drmcg_props *props,
> >> seq_printf(sf, "%lld\n",
> >> props->bo_limits_peak_allocated_default);
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + seq_printf(sf, "%s=%d %s=%d %s=%*pbl\n",
> >> + LGPU_LIMITS_NAME_WEIGHT,
> >> + CGROUP_WEIGHT_DFL,
> >> + LGPU_LIMITS_NAME_COUNT,
> >> + bitmap_weight(props->lgpu_slots,
> >> + props->lgpu_capacity),
> >> + LGPU_LIMITS_NAME_LIST,
> >> + props->lgpu_capacity,
> >> + props->lgpu_slots);
> >> + break;
> >> default:
> >> seq_puts(sf, "\n");
> >> break;
> >> @@ -358,7 +521,7 @@ static int drmcg_seq_show_fn(int id, void *ptr, void *data)
> >> drmcg_print_stats(ddr, sf, type);
> >> break;
> >> case DRMCG_FTYPE_LIMIT:
> >> - drmcg_print_limits(ddr, sf, type);
> >> + drmcg_print_limits(ddr, sf, type, minor->dev);
> >> break;
> >> case DRMCG_FTYPE_DEFAULT:
> >> drmcg_print_default(&minor->dev->drmcg_props, sf, type);
> >> @@ -415,6 +578,115 @@ static int drmcg_process_limit_s64_val(char *sval, bool is_mem,
> >> return rc;
> >> }
> >>
> >> +static void drmcg_nested_limit_parse(struct kernfs_open_file *of,
> >> + struct drm_device *dev, char *attrs)
> >> +{
> >> + DECLARE_BITMAP(tmp_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> + DECLARE_BITMAP(chk_bitmap, MAX_DRMCG_LGPU_CAPACITY);
> >> + enum drmcg_res_type type =
> >> + DRMCG_CTF_PRIV2RESTYPE(of_cft(of)->private);
> >> + struct drmcg *drmcg = css_to_drmcg(of_css(of));
> >> + struct drmcg_props *props = &dev->drmcg_props;
> >> + char *cft_name = of_cft(of)->name;
> >> + int minor = dev->primary->index;
> >> + char *nested = strstrip(attrs);
> >> + struct drmcg_device_resource *ddr =
> >> + drmcg->dev_resources[minor];
> >> + char *attr;
> >> + char sname[256];
> >> + char sval[256];
> >> + s64 val;
> >> + int rc;
> >> +
> >> + while (nested != NULL) {
> >> + attr = strsep(&nested, " ");
> >> +
> >> + if (sscanf(attr, "%255[^=]=%255[^=]", sname, sval) != 2)
> >> + continue;
> >> +
> >> + switch (type) {
> >> + case DRMCG_TYPE_LGPU:
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) &&
> >> + strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) &&
> >> + strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256))
> >> + continue;
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) &&
> >> + (!strcmp("max", sval) ||
> >> + !strcmp("default", sval))) {
> >> + bitmap_copy(ddr->lgpu_cfg, props->lgpu_slots,
> >> + props->lgpu_capacity);
> >> +
> >> + continue;
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_WEIGHT, 256) == 0) {
> >> + rc = drmcg_process_limit_s64_val(sval,
> >> + false, CGROUP_WEIGHT_DFL,
> >> + CGROUP_WEIGHT_MAX, &val);
> >> +
> >> + if (rc || val < CGROUP_WEIGHT_MIN ||
> >> + val > CGROUP_WEIGHT_MAX) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + ddr->lgpu_weight_cfg = val;
> >> + continue;
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_COUNT, 256) == 0) {
> >> + rc = drmcg_process_limit_s64_val(sval,
> >> + false, props->lgpu_capacity,
> >> + props->lgpu_capacity, &val);
> >> +
> >> + if (rc || val < 0) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + bitmap_zero(tmp_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> + bitmap_set(tmp_bitmap, 0, val);
> >> + }
> >> +
> >> + if (strncmp(sname, LGPU_LIMITS_NAME_LIST, 256) == 0) {
> >> + rc = bitmap_parselist(sval, tmp_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + if (rc) {
> >> + drmcg_pr_cft_err(drmcg, rc, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> +
> >> + bitmap_andnot(chk_bitmap, tmp_bitmap,
> >> + props->lgpu_slots,
> >> + MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> + /* user setting does not intersect with
> >> + * available lgpu */
> >> + if (!bitmap_empty(chk_bitmap,
> >> + MAX_DRMCG_LGPU_CAPACITY)) {
> >> + drmcg_pr_cft_err(drmcg, 0, cft_name,
> >> + minor);
> >> + continue;
> >> + }
> >> + }
> >> +
> >> + bitmap_copy(ddr->lgpu_cfg, tmp_bitmap,
> >> + props->lgpu_capacity);
> >> +
> >> + break; /* DRMCG_TYPE_LGPU */
> >> + default:
> >> + break;
> >> + } /* switch (type) */
> >> + }
> >> +}
> >> +
> >> +
> >> /**
> >> * drmcg_limit_write - parse cgroup interface files to obtain user config
> >> *
> >> @@ -499,9 +771,15 @@ static ssize_t drmcg_limit_write(struct kernfs_open_file *of, char *buf,
> >>
> >> ddr->bo_limits_peak_allocated = val;
> >> break;
> >> + case DRMCG_TYPE_LGPU:
> >> + drmcg_nested_limit_parse(of, dm->dev, sattr);
> >> + break;
> >> default:
> >> break;
> >> }
> >> +
> >> + drmcg_apply_effective(type, dm->dev, drmcg);
> >> +
> >> mutex_unlock(&dm->dev->drmcg_mutex);
> >>
> >> mutex_lock(&drmcg_mutex);
> >> @@ -560,12 +838,51 @@ struct cftype files[] = {
> >> .private = DRMCG_CTF_PRIV(DRMCG_TYPE_BO_COUNT,
> >> DRMCG_FTYPE_STATS),
> >> },
> >> + {
> >> + .name = "lgpu",
> >> + .seq_show = drmcg_seq_show,
> >> + .write = drmcg_limit_write,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> + DRMCG_FTYPE_LIMIT),
> >> + },
> >> + {
> >> + .name = "lgpu.default",
> >> + .seq_show = drmcg_seq_show,
> >> + .flags = CFTYPE_ONLY_ON_ROOT,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU,
> >> + DRMCG_FTYPE_DEFAULT),
> >> + },
> >> + {
> >> + .name = "lgpu.effective",
> >> + .seq_show = drmcg_seq_show,
> >> + .private = DRMCG_CTF_PRIV(DRMCG_TYPE_LGPU_EFF,
> >> + DRMCG_FTYPE_LIMIT),
> >> + },
> >> { } /* terminate */
> >> };
> >>
> >> +static int drmcg_online_fn(int id, void *ptr, void *data)
> >> +{
> >> + struct drm_minor *minor = ptr;
> >> + struct drmcg *drmcg = data;
> >> +
> >> + if (minor->type != DRM_MINOR_PRIMARY)
> >> + return 0;
> >> +
> >> + drmcg_apply_effective(DRMCG_TYPE_LGPU, minor->dev, drmcg);
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int drmcg_css_online(struct cgroup_subsys_state *css)
> >> +{
> >> + return drm_minor_for_each(&drmcg_online_fn, css_to_drmcg(css));
> >> +}
> >> +
> >> struct cgroup_subsys drm_cgrp_subsys = {
> >> .css_alloc = drmcg_css_alloc,
> >> .css_free = drmcg_css_free,
> >> + .css_online = drmcg_css_online,
> >> .early_init = false,
> >> .legacy_cftypes = files,
> >> .dfl_cftypes = files,
> >> @@ -585,6 +902,9 @@ void drmcg_device_early_init(struct drm_device *dev)
> >> dev->drmcg_props.bo_limits_total_allocated_default = S64_MAX;
> >> dev->drmcg_props.bo_limits_peak_allocated_default = S64_MAX;
> >>
> >> + dev->drmcg_props.lgpu_capacity = MAX_DRMCG_LGPU_CAPACITY;
> >> + bitmap_fill(dev->drmcg_props.lgpu_slots, MAX_DRMCG_LGPU_CAPACITY);
> >> +
> >> drmcg_update_cg_tree(dev);
> >> }
> >> EXPORT_SYMBOL(drmcg_device_early_init);
> >> --
> >> 2.25.0
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply
* [PATCH AUTOSEL 4.19 039/252] powerpc/powernv/ioda: Fix ref count for devices with their own PE
From: Sasha Levin @ 2020-02-14 16:08 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Frederic Barrat, linuxppc-dev, Andrew Donnellan, Sasha Levin
In-Reply-To: <20200214161147.15842-1-sashal@kernel.org>
From: Frederic Barrat <fbarrat@linux.ibm.com>
[ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]
The pci_dn structure used to store a pointer to the struct pci_dev, so
taking a reference on the device was required. However, the pci_dev
pointer was later removed from the pci_dn structure, but the reference
was kept for the npu device.
See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
pcidev from pci_dn").
We don't need to take a reference on the device when assigning the PE
as the struct pnv_ioda_pe is cleaned up at the same time as
the (physical) device is released. Doing so prevents the device from
being released, which is a problem for opencapi devices, since we want
to be able to remove them through PCI hotplug.
Now the ugly part: nvlink npu devices are not meant to be
released. Because of the above, we've always leaked a reference and
simply removing it now is dangerous and would likely require more
work. There's currently no release device callback for nvlink devices
for example. So to be safe, this patch leaks a reference on the npu
device, but only for nvlink and not opencapi.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191121134918.7155-2-fbarrat@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/platforms/powernv/pci-ioda.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 28adfe4dd04c5..e47ff05c5996f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1071,14 +1071,13 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
return NULL;
}
- /* NOTE: We get only one ref to the pci_dev for the pdn, not for the
- * pointer in the PE data structure, both should be destroyed at the
- * same time. However, this needs to be looked at more closely again
- * once we actually start removing things (Hotplug, SR-IOV, ...)
+ /* NOTE: We don't get a reference for the pointer in the PE
+ * data structure, both the device and PE structures should be
+ * destroyed at the same time. However, removing nvlink
+ * devices will need some work.
*
* At some point we want to remove the PDN completely anyways
*/
- pci_dev_get(dev);
pdn->pe_number = pe->pe_number;
pe->flags = PNV_IODA_PE_DEV;
pe->pdev = dev;
@@ -1093,7 +1092,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
pnv_ioda_free_pe(pe);
pdn->pe_number = IODA_INVALID_PE;
pe->pdev = NULL;
- pci_dev_put(dev);
return NULL;
}
@@ -1213,6 +1211,14 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
struct pci_controller *hose = pci_bus_to_host(npu_pdev->bus);
struct pnv_phb *phb = hose->private_data;
+ /*
+ * Intentionally leak a reference on the npu device (for
+ * nvlink only; this is not an opencapi path) to make sure it
+ * never goes away, as it's been the case all along and some
+ * work is needed otherwise.
+ */
+ pci_dev_get(npu_pdev);
+
/*
* Due to a hardware errata PE#0 on the NPU is reserved for
* error handling. This means we only have three PEs remaining
@@ -1236,7 +1242,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
*/
dev_info(&npu_pdev->dev,
"Associating to existing PE %x\n", pe_num);
- pci_dev_get(npu_pdev);
npu_pdn = pci_get_pdn(npu_pdev);
rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
npu_pdn->pe_number = pe_num;
--
2.20.1
^ permalink raw reply related
* Re: [PATCH AUTOSEL 5.4 192/459] ARM: OMAP2+: use separate IOMMU pdata to fix DRA7 IPU1 boot
From: Suman Anna @ 2020-02-14 18:33 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable
Cc: Tony Lindgren, linux-omap, linux-arm-kernel
In-Reply-To: <20200214160149.11681-192-sashal@kernel.org>
Hi Sasha,
On 2/14/20 9:57 AM, Sasha Levin wrote:
> From: Suman Anna <s-anna@ti.com>
>
> [ Upstream commit 4601832f40501efc3c2fd264a5a69bd1ac17d520 ]
>
> The IPU1 MMU has been using common IOMMU pdata quirks defined and
> used by all IPU IOMMU devices on OMAP4 and beyond. Separate out the
> pdata for IPU1 MMU with the additional .set_pwrdm_constraint ops
> plugged in, so that the IPU1 power domain can be restricted to ON
> state during the boot and active period of the IPU1 remote processor.
> This eliminates the pre-conditions for the IPU1 boot issue as
> described in commit afe518400bdb ("iommu/omap: fix boot issue on
> remoteprocs with AMMU/Unicache").
>
> NOTE:
> 1. RET is not a valid target power domain state on DRA7 platforms,
> and IPU power domain is normally programmed for OFF. The IPU1
> still fails to boot though, and an unclearable l3_noc error is
> thrown currently on 4.14 kernel without this fix. This behavior
> is slightly different from previous 4.9 LTS kernel.
> 2. The fix is currently applied only to IPU1 on DRA7xx SoC, as the
> other affected processors on OMAP4/OMAP5/DRA7 are in domains
> that are not entering RET. IPU2 on DRA7 is in CORE power domain
> which is only programmed for ON power state. The fix can be easily
> scaled if these domains do hit RET in the future.
> 3. The issue was not seen on current DRA7 platforms if any of the
> DSP remote processors were booted and using one of the GPTimers
> 5, 6, 7 or 8 on previous 4.9 LTS kernel. This was due to the
> errata fix for i874 implemented in commit 1cbabcb9807e ("ARM:
> DRA7: clockdomain: Implement timer workaround for errata i874")
> which keeps the IPU1 power domain from entering RET when the
> timers are active. But the timer workaround did not make any
> difference on 4.14 kernel, and an l3_noc error was seen still
> without this fix.
>
> Signed-off-by: Suman Anna <s-anna@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
You can drop this as well. Same comments as on PATCH AUTOSEL 5.4 191/459.
regards
Suman
> ---
> arch/arm/mach-omap2/pdata-quirks.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c
> index 2657752b90670..e13dcc0083a05 100644
> --- a/arch/arm/mach-omap2/pdata-quirks.c
> +++ b/arch/arm/mach-omap2/pdata-quirks.c
> @@ -45,6 +45,17 @@ struct pdata_init {
> static struct of_dev_auxdata omap_auxdata_lookup[];
> static struct twl4030_gpio_platform_data twl_gpio_auxdata;
>
> +#if IS_ENABLED(CONFIG_OMAP_IOMMU)
> +int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> + u8 *pwrst);
> +#else
> +static inline int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev,
> + bool request, u8 *pwrst)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_MACH_NOKIA_N8X0
> static void __init omap2420_n8x0_legacy_init(void)
> {
> @@ -337,6 +348,10 @@ static void __init omap5_uevm_legacy_init(void)
> #endif
>
> #ifdef CONFIG_SOC_DRA7XX
> +static struct iommu_platform_data dra7_ipu1_dsp_iommu_pdata = {
> + .set_pwrdm_constraint = omap_iommu_set_pwrdm_constraint,
> +};
> +
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc1;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc2;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc3;
> @@ -568,6 +583,12 @@ static struct of_dev_auxdata omap_auxdata_lookup[] = {
> &dra7_hsmmc_data_mmc2),
> OF_DEV_AUXDATA("ti,dra7-hsmmc", 0x480ad000, "480ad000.mmc",
> &dra7_hsmmc_data_mmc3),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x40d01000, "40d01000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x41501000, "41501000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-iommu", 0x58882000, "58882000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> #endif
> /* Common auxdata */
> OF_DEV_AUXDATA("ti,sysc", 0, NULL, &ti_sysc_pdata),
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: [PATCH AUTOSEL 5.4 192/459] ARM: OMAP2+: use separate IOMMU pdata to fix DRA7 IPU1 boot
From: Suman Anna @ 2020-02-14 18:33 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable
Cc: Tony Lindgren, linux-omap, linux-arm-kernel
In-Reply-To: <20200214160149.11681-192-sashal@kernel.org>
Hi Sasha,
On 2/14/20 9:57 AM, Sasha Levin wrote:
> From: Suman Anna <s-anna@ti.com>
>
> [ Upstream commit 4601832f40501efc3c2fd264a5a69bd1ac17d520 ]
>
> The IPU1 MMU has been using common IOMMU pdata quirks defined and
> used by all IPU IOMMU devices on OMAP4 and beyond. Separate out the
> pdata for IPU1 MMU with the additional .set_pwrdm_constraint ops
> plugged in, so that the IPU1 power domain can be restricted to ON
> state during the boot and active period of the IPU1 remote processor.
> This eliminates the pre-conditions for the IPU1 boot issue as
> described in commit afe518400bdb ("iommu/omap: fix boot issue on
> remoteprocs with AMMU/Unicache").
>
> NOTE:
> 1. RET is not a valid target power domain state on DRA7 platforms,
> and IPU power domain is normally programmed for OFF. The IPU1
> still fails to boot though, and an unclearable l3_noc error is
> thrown currently on 4.14 kernel without this fix. This behavior
> is slightly different from previous 4.9 LTS kernel.
> 2. The fix is currently applied only to IPU1 on DRA7xx SoC, as the
> other affected processors on OMAP4/OMAP5/DRA7 are in domains
> that are not entering RET. IPU2 on DRA7 is in CORE power domain
> which is only programmed for ON power state. The fix can be easily
> scaled if these domains do hit RET in the future.
> 3. The issue was not seen on current DRA7 platforms if any of the
> DSP remote processors were booted and using one of the GPTimers
> 5, 6, 7 or 8 on previous 4.9 LTS kernel. This was due to the
> errata fix for i874 implemented in commit 1cbabcb9807e ("ARM:
> DRA7: clockdomain: Implement timer workaround for errata i874")
> which keeps the IPU1 power domain from entering RET when the
> timers are active. But the timer workaround did not make any
> difference on 4.14 kernel, and an l3_noc error was seen still
> without this fix.
>
> Signed-off-by: Suman Anna <s-anna@ti.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
You can drop this as well. Same comments as on PATCH AUTOSEL 5.4 191/459.
regards
Suman
> ---
> arch/arm/mach-omap2/pdata-quirks.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/arch/arm/mach-omap2/pdata-quirks.c b/arch/arm/mach-omap2/pdata-quirks.c
> index 2657752b90670..e13dcc0083a05 100644
> --- a/arch/arm/mach-omap2/pdata-quirks.c
> +++ b/arch/arm/mach-omap2/pdata-quirks.c
> @@ -45,6 +45,17 @@ struct pdata_init {
> static struct of_dev_auxdata omap_auxdata_lookup[];
> static struct twl4030_gpio_platform_data twl_gpio_auxdata;
>
> +#if IS_ENABLED(CONFIG_OMAP_IOMMU)
> +int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev, bool request,
> + u8 *pwrst);
> +#else
> +static inline int omap_iommu_set_pwrdm_constraint(struct platform_device *pdev,
> + bool request, u8 *pwrst)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_MACH_NOKIA_N8X0
> static void __init omap2420_n8x0_legacy_init(void)
> {
> @@ -337,6 +348,10 @@ static void __init omap5_uevm_legacy_init(void)
> #endif
>
> #ifdef CONFIG_SOC_DRA7XX
> +static struct iommu_platform_data dra7_ipu1_dsp_iommu_pdata = {
> + .set_pwrdm_constraint = omap_iommu_set_pwrdm_constraint,
> +};
> +
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc1;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc2;
> static struct omap_hsmmc_platform_data dra7_hsmmc_data_mmc3;
> @@ -568,6 +583,12 @@ static struct of_dev_auxdata omap_auxdata_lookup[] = {
> &dra7_hsmmc_data_mmc2),
> OF_DEV_AUXDATA("ti,dra7-hsmmc", 0x480ad000, "480ad000.mmc",
> &dra7_hsmmc_data_mmc3),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x40d01000, "40d01000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-dsp-iommu", 0x41501000, "41501000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> + OF_DEV_AUXDATA("ti,dra7-iommu", 0x58882000, "58882000.mmu",
> + &dra7_ipu1_dsp_iommu_pdata),
> #endif
> /* Common auxdata */
> OF_DEV_AUXDATA("ti,sysc", 0, NULL, &ti_sysc_pdata),
>
^ permalink raw reply
* Re: [V2 2/3] firmware: arm_sdei: Removed multiple white lines.
From: James Morse @ 2020-02-14 18:32 UTC (permalink / raw)
To: luanshi; +Cc: linux-kernel, linux-arm-kernel
In-Reply-To: <1579145331-78633-2-git-send-email-zhangliguang@linux.alibaba.com>
Hi Luanshi,
On 16/01/2020 03:28, luanshi wrote:
> Remove one unnecessary white line.
>
> Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
> ---
> drivers/firmware/arm_sdei.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/firmware/arm_sdei.c b/drivers/firmware/arm_sdei.c
> index 37e9bf0..f81c09e 100644
> --- a/drivers/firmware/arm_sdei.c
> +++ b/drivers/firmware/arm_sdei.c
> @@ -599,7 +599,6 @@ static int _sdei_event_register(struct sdei_event *event)
> event->registered,
> SDEI_EVENT_REGISTER_RM_ANY, 0);
>
> -
> err = sdei_do_cross_call(_local_event_register, event);
> if (err) {
> spin_lock(&event->sdei_event_lock);
I'm afraid these whitespace-only patches aren't worth sending. If its not caught at
review, it gets to annoy the reader until someone can do a drive-by fix when they are
changing adjacent code.
I've merged this with the first patch in the eventual series.
Thanks,
James
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: [V2 1/3] firmware: arm_sdei: fix possible deadlock
From: James Morse @ 2020-02-14 18:32 UTC (permalink / raw)
To: luanshi; +Cc: linux-kernel, linux-arm-kernel
In-Reply-To: <1579145331-78633-1-git-send-email-zhangliguang@linux.alibaba.com>
Hi Luanshi,
On 16/01/2020 03:28, luanshi wrote:
> We call sdei_reregister_event() with sdei_list_lock held but
> _sdei_event_register() and sdei_event_destroy() also acquires
> sdei_list_lock thus creating A-A deadlock.
>
> Fixes: da351827240e ("firmware: arm_sdei: Add support for CPU and system
> power states")
>
(Nit: stray whitespace in the fixes tag, the backport tools may choke on this)
(Please include 'PATCH' in the [] section of the subject when posting, its part of the
'canonical patch format', and my scripts for pulling a series of the list depend on it!)
> ---
Thanks for picking up my suggestion, ... it was what I think should have been done in the
first place to avoid this bug.
Looking at your patch, we'd need to take the per-event lock around the reads of reregister
and reenable in sdei_cpuhp_up() too, and sdei_reregister_shared(), ... and this quickly
becomes much noisier than a patch for stable should be. (Sorry, I should have tried it
before suggesting it!)
I've picked up your first version, but instead of duplicating the contents of the
function, I've added '_llocked' wrappers to account for that lock already being held. This
isn't great as we have _locked too, but lockdep should keep us honest.
Because I started with your patch, git has kept you as author.
This ended up as patch 2, because it was also necessary to move those reregister updates
into their callers to fix hibernate.
I'll posted what I have next week, sorry for the hiatus.
Thanks,
James
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* [PATCH] usb: dwc3: Check that the request is valid in dwc3_gadget_giveback()
From: Marek Vasut @ 2020-02-14 18:33 UTC (permalink / raw)
To: u-boot
In-Reply-To: <20200214122328.24987-1-vigneshr@ti.com>
On 2/14/20 1:23 PM, Vignesh Raghavendra wrote:
> From: Jean-Jacques Hiblot <jjhiblot@ti.com>
>
> This fixes potential issues reported by klokworks:
> Pointer 'req' returned from call to function 'next_request' at line 531 and
> 538 may be NULL and will be dereferenced in dwc3_gadget_giveback()
Shouldn't you rather handle the issue in dwc3_remove_requests() ?
Also, please explain what conditions trigger this issue, i.e. when req
becomes NULL.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.