Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] arm64: Override SPSR.SS when single-stepping is enabled
From: Will Deacon @ 2020-06-03 15:53 UTC (permalink / raw)
  To: Keno Fischer
  Cc: Mark Rutland, Luis Machado, kernel-team, stable, linux-arm-kernel
In-Reply-To: <CABV8kRwrnixNc074-jQhZzeucGHx9_e5FnQmBS=VuL=tFGjY-Q@mail.gmail.com>

Hi Keno,

Thanks for having a look.

On Wed, Jun 03, 2020 at 11:42:46AM -0400, Keno Fischer wrote:
> On Wed, Jun 3, 2020 at 11:10 AM Will Deacon <will@kernel.org> wrote:
> >
> > Luis reports that, when reverse debugging with GDB, single-step does not
> > function as expected on arm64:
> >
> >   | I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
> >   | request by GDB won't execute the underlying instruction. As a consequence,
> >   | the PC doesn't move, but we return a SIGTRAP just like we would for a
> >   | regular successful PTRACE_SINGLESTEP request.
> >
> > The underlying problem is that when the CPU register state is restored
> > as part of a reverse step, the SPSR.SS bit is cleared and so the hardware
> > single-step state can transition to the "active-pending" state, causing
> > an unexpected step exception to be taken immediately if a step operation
> > is attempted.
> 
> We saw this issue also and worked around it in user-space [1]. That said,
> I think I'm ok with this change in the kernel, since I can't think of
> a particularly useful usecase for this feature.
> 
> However, at the same time as changing this, we should probably make sure
> to enable the syscall exit pseudo-singlestep trap (similar issue as the other
> patch I had sent for the signal pseudo-singlestep trap), since otherwise
> ptracers might get confused about the lack of singlestep trap during a
> singlestep -> seccomp -> singlestep path (which would give one trap
> less with this patch than before).

Hmm, I don't completely follow your example. Please could you spell it
out a bit more? I fast-forward the stepping state machine on sigreturn,
which I thought would be sufficient. Perhaps you're referring to a variant
of the situation mentioned by Mark, which I didn't think could happen
with ptrace [2].

Cheers,

Will

[2] https://lore.kernel.org/r/20200531095216.GB30204@willie-the-truck

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 1/2] arm64: Override SPSR.SS when single-stepping is enabled
From: Keno Fischer @ 2020-06-03 15:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Luis Machado, kernel-team, stable, linux-arm-kernel
In-Reply-To: <20200603151033.11512-2-will@kernel.org>

On Wed, Jun 3, 2020 at 11:10 AM Will Deacon <will@kernel.org> wrote:
>
> Luis reports that, when reverse debugging with GDB, single-step does not
> function as expected on arm64:
>
>   | I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
>   | request by GDB won't execute the underlying instruction. As a consequence,
>   | the PC doesn't move, but we return a SIGTRAP just like we would for a
>   | regular successful PTRACE_SINGLESTEP request.
>
> The underlying problem is that when the CPU register state is restored
> as part of a reverse step, the SPSR.SS bit is cleared and so the hardware
> single-step state can transition to the "active-pending" state, causing
> an unexpected step exception to be taken immediately if a step operation
> is attempted.

We saw this issue also and worked around it in user-space [1]. That said,
I think I'm ok with this change in the kernel, since I can't think of
a particularly useful usecase for this feature.

However, at the same time as changing this, we should probably make sure
to enable the syscall exit pseudo-singlestep trap (similar issue as the other
patch I had sent for the signal pseudo-singlestep trap), since otherwise
ptracers might get confused about the lack of singlestep trap during a
singlestep -> seccomp -> singlestep path (which would give one trap
less with this patch than before).

Keno

[1] https://github.com/mozilla/rr/blob/36aa5328a2240dc3d794c14926e0754f66ee28e0/src/Task.cc#L1352-L1362

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v8 4/8] PM / EM: add support for other devices than CPUs in Energy Model
From: Rafael J. Wysocki @ 2020-06-03 15:40 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Nishanth Menon, Juri Lelli, Rafael J. Wysocki, Peter Zijlstra,
	Viresh Kumar, Liviu Dudau, dri-devel, Bjorn Andersson,
	Benjamin Segall, alyssa.rosenzweig, Fabio Estevam,
	Matthias Kaehlcke, Rob Herring, Amit Kucheria, Lorenzo Pieralisi,
	Kevin Hilman, Andy Gross, Daniel Lezcano, steven.price,
	Chanwoo Choi, Ingo Molnar, dl-linux-imx, Zhang, Rui, Mel Gorman,
	orjan.eide, Daniel Vetter, Linux PM, linux-arm-msm, Sascha Hauer,
	Steven Rostedt, moderated list:ARM/Mediatek SoC...,
	Matthias Brugger, Linux OMAP Mailing List, Dietmar Eggemann,
	Linux ARM, David Airlie, Tomeu Vizoso, Quentin Perret,
	Stephen Boyd, Randy Dunlap, Rafael J. Wysocki,
	Linux Kernel Mailing List, Bartlomiej Zolnierkiewicz,
	Sascha Hauer, Sudeep Holla, patrick.bellasi, Shawn Guo
In-Reply-To: <d6a0d345-53ef-523c-836d-3bc4ea4c6e66@arm.com>

On Wed, Jun 3, 2020 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
> > On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
> >>
> >> Hi Daniel,
> >>
> >> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> >>> On 27/05/2020 11:58, Lukasz Luba wrote:
> >>>> Add support for other devices than CPUs. The registration function
> >>>> does not require a valid cpumask pointer and is ready to handle new
> >>>> devices. Some of the internal structures has been reorganized in order to
> >>>> keep consistent view (like removing per_cpu pd pointers).
> >>>>
> >>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >>>> ---
> >>>
> >>> [ ... ]
> >>>
> >>>>    }
> >>>>    EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >>>> +
> >>>> +/**
> >>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
> >>>> + * @dev             : Device for which the EM is registered
> >>>> + *
> >>>> + * Try to unregister the EM for the specified device (but not a CPU).
> >>>> + */
> >>>> +void em_dev_unregister_perf_domain(struct device *dev)
> >>>> +{
> >>>> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >>>> +            return;
> >>>> +
> >>>> +    if (_is_cpu_device(dev))
> >>>> +            return;
> >>>> +
> >>>> +    mutex_lock(&em_pd_mutex);
> >>>
> >>> Is the mutex really needed?
> >>
> >> I just wanted to align this unregister code with register. Since there
> >> is debugfs dir lookup and the device's EM existence checks I thought it
> >> wouldn't harm just to lock for a while and make sure the registration
> >> path is not used. These two paths shouldn't affect each other, but with
> >> modules loading/unloading I wanted to play safe.
> >>
> >> I can change it maybe to just dmb() and the end of the function if it's
> >> a big performance problem in this unloading path. What do you think?
> >
> > I would rather leave the mutex locking as is.
> >
> > However, the question to ask is what exactly may go wrong without that
> > locking in place?  Is there any specific race condition that you are
> > concerned about?
> >
>
> I tried to test this with module loading & unloading with panfrost
> driver and CPU hotplug (which should bail out quickly) and was OK.
> I don't see any particular race. I don't too much about the
> debugfs code, though. That's why I tried to protect from some
> scripts/services which try to re-load the driver.
>
> Apart from that, maybe just this dev->em = NULL to be populated to all
> CPUs, which mutex_unlock synchronizes for free here.

If it may run concurrently with the registration for the same device,
the locking is necessary, but in that case the !dev->em_pd check needs
to go under the mutex too IMO, or you may end up leaking the pd if the
registration can run between that check and the point at which the
mutex is taken.

Apart from this your kerneldoc comments might me improved IMO.

First of all, you can use @dev inside of a kerneldoc (if @dev
represents an argument of the documented function) and that will
produce the right output automatically.

Second, it is better to avoid saying things like "Try to unregister
..." in kerneldoc comments (the "Try to" part is redundant).  Simply
say "Unregister ..." instead.

Thanks!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump
From: John Donnelly @ 2020-06-03 15:30 UTC (permalink / raw)
  To: chenzhou
  Cc: Simon Horman, Devicetree List, Arnd Bergmann, Baoquan He,
	Linux Doc Mailing List, Catalin Marinas, Bhupesh Sharma,
	guohanjun, kexec mailing list, Linux Kernel Mailing List,
	Will Deacon, Rob Herring, James Morse, nsaenzjulienne,
	Prabhakar Kushwaha, Thomas Gleixner, Prabhakar Kushwaha,
	RuiRui Yang, Ingo Molnar, linux-arm-kernel
In-Reply-To: <8463464e-5461-f328-621c-bacc6a3b88dd@huawei.com>



> On Jun 3, 2020, at 8:20 AM, chenzhou <chenzhou10@huawei.com> wrote:
> 
> Hi,
> 
> 
> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>> Hi Chen,
>> 
>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly <john.p.donnelly@oracle.com> wrote:
>>> 
>>> 
>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <prabhakar.pkin@gmail.com> wrote:
>>>> 
>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <john.p.donnelly@oracle.com> wrote:
>>>>> Hi .  See below !
>>>>> 
>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>>>>>> 
>>>>>> Hi John,
>>>>>> 
>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <John.P.donnelly@oracle.com> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> 
>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>> Hi Chen,
>>>>>>>> 
>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <chenzhou10@huawei.com> wrote:
>>>>>>>>> This patch series enable reserving crashkernel above 4G in arm64.
>>>>>>>>> 
>>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>>>>>>>> when there is no enough low memory.
>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>>>>>>>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>>>>>>>>> will boot failure because there is no low memory available for allocation.
>>>>>>>>> 
>>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
>>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>> 
>>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
>>>>>>>> and https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiIAAlzu$
>>>>>>>> Also **without** this patch-set
>>>>>>>> "https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$"
>>>>>>>> 
>>>>>>>> This issue comes whenever crashkernel memory is reserved after 0xc000_0000.
>>>>>>>> More details discussed earlier in
>>>>>>>> https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$  without any
>>>>>>>> solution
>>>>>>>> 
>>>>>>>> This patch-set is expected to solve similar kind of issue.
>>>>>>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
>>>>>>>> observation should be considered/fixed. .
>>>>>>>> 
>>>>>>>> --pk
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>> [   30.367696] NET: Registered protocol family 16
>>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
>>>>>>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/yyyy
>>>>>>>> [   30.369984] Call trace:
>>>>>>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>>>>>>> [   30.369991]  show_stack+0x20/0x30
>>>>>>>> [   30.369997]  dump_stack+0xc0/0x10c
>>>>>>>> [   30.370001]  warn_alloc+0x10c/0x178
>>>>>>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>>>>>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>>>>>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>>>>>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>>>>>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>>>>>>> [   30.370020]  do_one_initcall+0x54/0x228
>>>>>>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>>>>>>> [   30.370031]  kernel_init+0x1c/0x110
>>>>>>>> [   30.370034]  ret_from_fork+0x10/0x18
>>>>>>>> [   30.370036] Mem-Info:
>>>>>>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>>>>>>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
>>>>>>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>>>>>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>>>>>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>>>>>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>>>>>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>>>>>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>>>>>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>>>>>>> unstable:0kB all_unreclaimable? no
>>>>>>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>>>>>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>>> present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB
>>>>>>>> bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>> [   30.370084] lowmem_reserve[]: 0 250 6063 6063
>>>>>>>> [   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB
>>>>>>>> high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>>> present:269700kB managed:256000kB mlocked:0kB kernel_stack:0kB
>>>>>>>> pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>> [   30.370094] lowmem_reserve[]: 0 0 5813 5813
>>>>>>>> [   30.370100] Node 0 Normal free:5894876kB min:9552kB low:15504kB
>>>>>>>> high:21456kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>>>>>>> active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB
>>>>>>>> present:8388608kB managed:5953112kB mlocked:0kB kernel_stack:21672kB
>>>>>>>> pagetables:56kB bounce:0kB free_pcp:876kB local_pcp:176kB free_cma:0kB
>>>>>>>> [   30.370104] lowmem_reserve[]: 0 0 0 0
>>>>>>>> [   30.370107] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>>>>> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
>>>>>>>> [   30.370113] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB
>>>>>>>> 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 62*4096kB (M) = 256000kB
>>>>>>>> [   30.370119] Node 0 Normal: 2*4kB (M) 3*8kB (ME) 2*16kB (UE) 3*32kB
>>>>>>>> (UM) 1*64kB (U) 2*128kB (M) 2*256kB (ME) 3*512kB (ME) 3*1024kB (ME)
>>>>>>>> 3*2048kB (UME) 1436*4096kB (M) = 5893600kB
>>>>>>>> [   30.370129] Node 0 hugepages_total=0 hugepages_free=0
>>>>>>>> hugepages_surp=0 hugepages_size=1048576kB
>>>>>>>> [   30.370130] 0 total pagecache pages
>>>>>>>> [   30.370132] 0 pages in swap cache
>>>>>>>> [   30.370134] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>> [   30.370135] Free swap  = 0kB
>>>>>>>> [   30.370136] Total swap = 0kB
>>>>>>>> [   30.370137] 2164609 pages RAM
>>>>>>>> [   30.370139] 0 pages HighMem/MovableOnly
>>>>>>>> [   30.370140] 612331 pages reserved
>>>>>>>> [   30.370141] 0 pages hwpoisoned
>>>>>>>> [   30.370143] DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>> coherent allocation
>>>>>>> 
>>>>>>> During my testing I saw the same error and Chen's  solution corrected it .
>>>>>> Which combination you are using on your side? I am using Prabhakar's
>>>>>> suggested environment and can reproduce the issue
>>>>>> with or without Chen's crashkernel support above 4G patchset.
>>>>>> 
>>>>>> I am also using a ThunderX2 platform with latest makedumpfile code and
>>>>>> kexec-tools (with the suggested patch
>>>>>> <https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!J6lUig58-Gw6TKZnEEYzEeSU36T-1SqlB1kImU00xtX_lss5Tx-JbUmLE9TJC3foXBLg$ >).
>>>>>> 
>>>>>> Thanks,
>>>>>> Bhupesh
>>>>> 
>>>>> I did this activity 5 months ago and I have moved on to other activities. My DMA failures were related to PCI devices that could not be enumerated because  low-DMA space was not  available when crashkernel was moved above 4G; I don’t recall the exact platform.
>>>>> 
>>>>> 
>>>>> 
>>>>> For this failure ,
>>>>> 
>>>>>>>> DMA: failed to allocate 256 KiB pool for atomic
>>>>>>>> coherent allocation
>>>>> 
>>>>> Is due to :
>>>>> 
>>>>> 
>>>>> 3618082c
>>>>> ("arm64 use both ZONE_DMA and ZONE_DMA32")
>>>>> 
>>>>> With the introduction of ZONE_DMA to support the Raspberry DMA
>>>>> region below 1G, the crashkernel is placed in the upper 4G
>>>>> ZONE_DMA_32 region. Since the crashkernel does not have access
>>>>> to the ZONE_DMA region, it prints out call trace during bootup.
>>>>> 
>>>>> It is due to having this CONFIG item  ON  :
>>>>> 
>>>>> 
>>>>> CONFIG_ZONE_DMA=y
>>>>> 
>>>>> Turning off ZONE_DMA fixes a issue and Raspberry PI 4 will
>>>>> use the device tree to specify memory below 1G.
>>>>> 
>>>>> 
>>>> Disabling ZONE_DMA is temporary solution.  We may need proper solution
>>> 
>>> Perhaps the Raspberry platform configuration dependencies need separated  from “server class” Arm  equipment ?  Or auto-configured on boot ?  Consult an expert ;-)
>>> 
>>> 
>>> 
>>>>> I would like to see Chen’s feature added , perhaps as EXPERIMENTAL,  so we can get some configuration testing done on it.   It corrects having a DMA zone in low memory while crash-kernel is above 4GB.  This has been going on for a year now.
>>>> I will also like this patch to be added in Linux as early as possible.
>>>> 
>>>> Issue mentioned by me happens with or without this patch.
>>>> 
>>>> This patch-set can consider fixing because it uses low memory for DMA
>>>> & swiotlb only.
>>>> We can consider restricting crashkernel within the required range like below
>>>> 
>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>> index 7f9e5a6dc48c..bd67b90d35bd 100644
>>>> --- a/kernel/crash_core.c
>>>> +++ b/kernel/crash_core.c
>>>> @@ -354,7 +354,7 @@ int __init reserve_crashkernel_low(void)
>>>>                       return 0;
>>>>       }
>>>> 
>>>> -       low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>> +       low_base = memblock_find_in_range(0,0xc0000000, low_size, CRASH_ALIGN);
>>>>       if (!low_base) {
>>>>               pr_err("Cannot reserve %ldMB crashkernel low memory,
>>>> please try smaller size.\n",
>>>>                      (unsigned long)(low_size >> 20));
>>>> 
>>>> 
>>>    I suspect  0xc0000000  would need to be a CONFIG item  and not hard-coded.
>>> 
>> if you consider this as valid change,  can you please incorporate as
>> part of your patch-set.
> 
> After commit 1a8e1cef7 ("arm64: use both ZONE_DMA and ZONE_DMA32"),the 0-4G memory is splited
> to DMA [mem 0x0000000000000000-0x000000003fffffff] and DMA32 [mem 0x0000000040000000-0x00000000ffffffff] on arm64.
> 
> From the above discussion, on your platform, the low crashkernel fall in DMA32 region, but your environment needs to access DMA
> region, so there is the call trace.
> 
> I have a question, why do you choose 0xc0000000 here?
> 
> Besides, this is common code, we also need to consider about x86.
> 

 + nsaenzjulienne@suse.de 

  Exactly .  This is why it needs to be a CONFIG option for  Raspberry ..,  or device tree option. 


  We could revert 1a8e1cef7 since it broke  Arm kdump too.


> 
> Thanks,
> Chen Zhou
> 


 
 
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [arm64:devel/mte-v4 25/28] arch/arm64/kernel/cpufeature.c:137:5: warning: no previous prototype for 'mte_ftr_filter'
From: Catalin Marinas @ 2020-06-03 15:29 UTC (permalink / raw)
  To: kernel test robot; +Cc: Vincenzo, Frascino, , kbuild-all, linux-arm-kernel
In-Reply-To: <202006032134.2oUO53va%lkp@intel.com>

On Wed, Jun 03, 2020 at 09:31:38PM +0800, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git devel/mte-v4
> head:   628f92bb11400cab1adb95c32c05cb1c3c384248
> commit: bcf807ed1e622f8311b05e34816df6c41aed54c0 [25/28] arm64: mte: Kconfig entry
> config: arm64-allyesconfig (attached as .config)
> compiler: aarch64-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         git checkout bcf807ed1e622f8311b05e34816df6c41aed54c0
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All warnings (new ones prefixed by >>, old ones prefixed by <<):
> 
> >> arch/arm64/kernel/cpufeature.c:137:5: warning: no previous prototype for 'mte_ftr_filter' [-Wmissing-prototypes]
> 137 | s64 mte_ftr_filter(const struct arm64_ftr_bits *ftrp, s64 val)
> |     ^~~~~~~~~~~~~~
> 
> vim +/mte_ftr_filter +137 arch/arm64/kernel/cpufeature.c

Thanks for the report, it should indeed be a static function. I'll fix
it locally but in v5 I'll drop the DT patch altogether.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v3 032/105] drm/vc4: crtc: Enable and disable the PV in atomic_enable / disable
From: Stefan Wahren @ 2020-06-03 15:26 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Tim Gover, Dave Stevenson, LKML, DRI Development, Eric Anholt,
	bcm-kernel-feedback-list, linux-arm-kernel, Phil Elwell,
	Nicolas Saenz Julienne, linux-rpi-kernel
In-Reply-To: <20200603131434.2gmofg7sf7luakje@gilmour>

Hi Maxime,

Am 03.06.20 um 15:14 schrieb Maxime Ripard:
> Hi Stefan,
>
> On Tue, Jun 02, 2020 at 10:03:13PM +0200, Stefan Wahren wrote:
>> Am 02.06.20 um 21:31 schrieb Eric Anholt:
>>> On Tue, Jun 2, 2020 at 8:02 AM Dave Stevenson
>>> <dave.stevenson@raspberrypi.com> wrote:
>>>> Hi Maxime and Eric
>>>>
>>>> On Tue, 2 Jun 2020 at 15:12, Maxime Ripard <maxime@cerno.tech> wrote:
>>>>> Hi Eric
>>>>>
>>>>> On Wed, May 27, 2020 at 09:54:44AM -0700, Eric Anholt wrote:
>>>>>> On Wed, May 27, 2020 at 8:50 AM Maxime Ripard <maxime@cerno.tech> wrote:
>>>>>>> The VIDEN bit in the pixelvalve currently being used to enable or disable
>>>>>>> the pixelvalve seems to not be enough in some situations, which whill end
>>>>>>> up with the pixelvalve stalling.
>>>>>>>
>>>>>>> In such a case, even re-enabling VIDEN doesn't bring it back and we need to
>>>>>>> clear the FIFO. This can only be done if the pixelvalve is disabled though.
>>>>>>>
>>>>>>> In order to overcome this, we can configure the pixelvalve during
>>>>>>> mode_set_no_fb, but only enable it in atomic_enable and flush the FIFO
>>>>>>> there, and in atomic_disable disable the pixelvalve again.
>>>>>> What displays has this been tested with?  Getting this sequencing
>>>>>> right is so painful, and things like DSI are tricky to get to light
>>>>>> up.
>>>>> That FIFO is between the HVS and the HDMI PVs, so this was obviously
>>>>> tested against that. Dave also tested the DSI output IIRC, so we should
>>>>> be covered here.
>>>> DSI wasn't working on the first patch set that Maxime sent - I haven't
>>>> tested this one as yet but will do so.
>>>> DPI was working early on to both an Adafruit 800x480 DPI panel, and
>>>> via a VGA666 as VGA.
>>>> HDMI is obviously working.
>>>> VEC is being ignored now. The clock structure is more restricted than
>>>> earlier chips, so to get the required clocks for the VEC without using
>>>> fractional divides it compromises the clock that other parts of the
>>>> system can run at (IIRC including the ARM). That's why the VEC has to
>>>> be explicitly enabled for the firmware to enable it as the only
>>>> output. It's annoying, but that's just a restriction of the chip.
>>> I'm more concerned with "make sure we don't regress pre-pi4 with this
>>> series" than "pi4 displays all work from the beginning"
>> unfortuntely i can confirm this. With this patch series (using Maxime's
>> git repo with multi_v7_defconfig) my Raspberry Pi 3 B hangs up while
>> starting X (screen stays black, heartbeat stops, no more output at the
>> debug UART). AFAIR v2 didn't had this issue.
> Did it happen with a DSI display or something else?
with my HDMI display (HP ZR2440w). At first everything looks good
(Raspbian splash screen is displayed), but before the real workspace is
displayed this issue happens.
> I've been trying to setup the DSI display on an RPi3 today, but noticed
> that it looks like there's a regression in next that prevents the HDMI
> driver to load entirely (without my patches).

I took your rpi4-kms branch and switched before your series and
everything works fine. I will try to bisect the offending commit.

Stefan

>
> Maxime


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v8 4/8] PM / EM: add support for other devices than CPUs in Energy Model
From: Lukasz Luba @ 2020-06-03 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nishanth Menon, Juri Lelli, Peter Zijlstra, Viresh Kumar,
	Liviu Dudau, dri-devel, Bjorn Andersson, Benjamin Segall,
	alyssa.rosenzweig, Fabio Estevam, Matthias Kaehlcke, Rob Herring,
	Amit Kucheria, Lorenzo Pieralisi, Kevin Hilman, Andy Gross,
	Daniel Lezcano, steven.price, Chanwoo Choi, Ingo Molnar,
	dl-linux-imx, Zhang, Rui, Mel Gorman, orjan.eide, Daniel Vetter,
	Linux PM, linux-arm-msm, Sascha Hauer, Steven Rostedt,
	moderated list:ARM/Mediatek SoC..., Matthias Brugger,
	Linux OMAP Mailing List, Dietmar Eggemann, Linux ARM,
	David Airlie, Tomeu Vizoso, Quentin Perret, Stephen Boyd,
	Randy Dunlap, Rafael J. Wysocki, Linux Kernel Mailing List,
	Bartlomiej Zolnierkiewicz, Sascha Hauer, Sudeep Holla,
	patrick.bellasi, Shawn Guo
In-Reply-To: <CAJZ5v0jwoNSYOz3nGqNshd=5btsLxOp-di-Dot+cHqAQZEQVRw@mail.gmail.com>



On 6/3/20 4:13 PM, Rafael J. Wysocki wrote:
> On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>> Hi Daniel,
>>
>> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
>>> On 27/05/2020 11:58, Lukasz Luba wrote:
>>>> Add support for other devices than CPUs. The registration function
>>>> does not require a valid cpumask pointer and is ready to handle new
>>>> devices. Some of the internal structures has been reorganized in order to
>>>> keep consistent view (like removing per_cpu pd pointers).
>>>>
>>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>>>> ---
>>>
>>> [ ... ]
>>>
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(em_register_perf_domain);
>>>> +
>>>> +/**
>>>> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
>>>> + * @dev             : Device for which the EM is registered
>>>> + *
>>>> + * Try to unregister the EM for the specified device (but not a CPU).
>>>> + */
>>>> +void em_dev_unregister_perf_domain(struct device *dev)
>>>> +{
>>>> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
>>>> +            return;
>>>> +
>>>> +    if (_is_cpu_device(dev))
>>>> +            return;
>>>> +
>>>> +    mutex_lock(&em_pd_mutex);
>>>
>>> Is the mutex really needed?
>>
>> I just wanted to align this unregister code with register. Since there
>> is debugfs dir lookup and the device's EM existence checks I thought it
>> wouldn't harm just to lock for a while and make sure the registration
>> path is not used. These two paths shouldn't affect each other, but with
>> modules loading/unloading I wanted to play safe.
>>
>> I can change it maybe to just dmb() and the end of the function if it's
>> a big performance problem in this unloading path. What do you think?
> 
> I would rather leave the mutex locking as is.
> 
> However, the question to ask is what exactly may go wrong without that
> locking in place?  Is there any specific race condition that you are
> concerned about?
> 

I tried to test this with module loading & unloading with panfrost
driver and CPU hotplug (which should bail out quickly) and was OK.
I don't see any particular race. I don't too much about the
debugfs code, though. That's why I tried to protect from some
scripts/services which try to re-load the driver.

Apart from that, maybe just this dev->em = NULL to be populated to all
CPUs, which mutex_unlock synchronizes for free here.

Regards,
Lukasz



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 2/3] media: stm32-dcmi: Set minimum cpufreq requirement
From: Valentin Schneider @ 2020-06-03 15:25 UTC (permalink / raw)
  To: Benjamin Gaignard
  Cc: alexandre.torgue, rjw, linux-kernel, mcoquelin.stm32,
	hugues.fruchet, mchehab, linux-stm32, linux-arm-kernel,
	linux-media
In-Reply-To: <20200603124559.22652-3-benjamin.gaignard@st.com>


On 03/06/20 13:45, Benjamin Gaignard wrote:
> +static void dcmi_set_min_frequency(struct stm32_dcmi *dcmi, u64 freq)
> +{
> +	struct cpufreq_policy *p;
> +	int cpu;
> +
> +	for_each_cpu(cpu, irq_get_affinity_mask(dcmi->irq)) {
> +		p = per_cpu(policy, cpu);
> +		if (!p)
> +			continue;
> +
> +		freq_qos_update_request(&per_cpu(qos_req, cpu), freq);
> +	}
> +}
> +

You may want to use a "visited" cpumask as I suggested in the previous
thread, since a policy can cover more than one CPU (IOW, a frequency domain
can span more than one CPU). It's not required per-se, AFAICT, but it makes
things a bit neater.

I also think you'll have to use the affinity notifier
(irq_set_affinity_notifier()), since AFAICT userspace can change the
affinity of that IRQ. I suppose you'll want something like:
- Check if we currently are in streaming mode
- Clear the QoS request for CPUs that were previously boosted but that
  aren't in the new mask
- Add the request for the new CPUs.

You'll probably need serialize the reading of the mask in the regular
dcmi_set_min_frequency() as well. I concur all of that is somewhat
annoying, but AFAICT that's required for a sturdy implementation.

>  static int dcmi_start_streaming(struct vb2_queue *vq, unsigned int count)
>  {
>       struct stm32_dcmi *dcmi = vb2_get_drv_priv(vq);

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq
From: Marc Zyngier @ 2020-06-03 15:16 UTC (permalink / raw)
  To: Saidi, Ali
  Cc: Herrenschmidt, Benjamin, jason, Machulsky, Zorik, linux-kernel,
	Zilberman, Zeev, linux-arm-kernel, tglx, Woodhouse, David
In-Reply-To: <AE04B507-C5E2-44D2-9190-41E9BE720F9D@amazon.com>

On 2020-06-02 19:47, Saidi, Ali wrote:

[...]

> Looks like the x86 apic set_affinity call explicitly checks for if
> it’s activated in the managed case which makes sense given the code
> Ben posted above:
>           /*
>            * Core code can call here for inactive interrupts. For 
> inactive
>            * interrupts which use managed or reservation mode there is 
> no
>            * point in going through the vector assignment right now as 
> the
>            * activation will assign a vector which fits the destination
>            * cpumask. Let the core code store the destination mask and 
> be
>            * done with it.
>            */
>           if (!irqd_is_activated(irqd) &&
>               (apicd->is_managed || apicd->can_reserve))
> 
> My original patch should certain check activated and not disabled.
> With that do you still have reservations Marc?

I'd still prefer it if we could do something in core code, rather
than spreading these checks in the individual drivers. If we can't,
fair enough. But it feels like the core set_affinity function could
just do the same thing in a single place (although the started vs
activated is yet another piece of the puzzle I didn't consider,
and the ITS doesn't need the "can_reserve" thing).

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v8 4/8] PM / EM: add support for other devices than CPUs in Energy Model
From: Rafael J. Wysocki @ 2020-06-03 15:13 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Nishanth Menon, Juri Lelli, Peter Zijlstra, Viresh Kumar,
	Liviu Dudau, dri-devel, Bjorn Andersson, Benjamin Segall,
	alyssa.rosenzweig, Fabio Estevam, Matthias Kaehlcke, Rob Herring,
	Amit Kucheria, Lorenzo Pieralisi, Kevin Hilman, Andy Gross,
	Daniel Lezcano, steven.price, Chanwoo Choi, Ingo Molnar,
	dl-linux-imx, Zhang, Rui, Mel Gorman, orjan.eide, Daniel Vetter,
	Linux PM, linux-arm-msm, Sascha Hauer, Steven Rostedt,
	moderated list:ARM/Mediatek SoC..., Matthias Brugger,
	Linux OMAP Mailing List, Dietmar Eggemann, Linux ARM,
	David Airlie, Tomeu Vizoso, Quentin Perret, Stephen Boyd,
	Randy Dunlap, Rafael J. Wysocki, Linux Kernel Mailing List,
	Bartlomiej Zolnierkiewicz, Sascha Hauer, Sudeep Holla,
	patrick.bellasi, Shawn Guo
In-Reply-To: <7201e161-6952-6e28-4036-bd0f0353ec30@arm.com>

On Tue, Jun 2, 2020 at 1:31 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi Daniel,
>
> On 6/1/20 10:44 PM, Daniel Lezcano wrote:
> > On 27/05/2020 11:58, Lukasz Luba wrote:
> >> Add support for other devices than CPUs. The registration function
> >> does not require a valid cpumask pointer and is ready to handle new
> >> devices. Some of the internal structures has been reorganized in order to
> >> keep consistent view (like removing per_cpu pd pointers).
> >>
> >> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> >> ---
> >
> > [ ... ]
> >
> >>   }
> >>   EXPORT_SYMBOL_GPL(em_register_perf_domain);
> >> +
> >> +/**
> >> + * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
> >> + * @dev             : Device for which the EM is registered
> >> + *
> >> + * Try to unregister the EM for the specified device (but not a CPU).
> >> + */
> >> +void em_dev_unregister_perf_domain(struct device *dev)
> >> +{
> >> +    if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
> >> +            return;
> >> +
> >> +    if (_is_cpu_device(dev))
> >> +            return;
> >> +
> >> +    mutex_lock(&em_pd_mutex);
> >
> > Is the mutex really needed?
>
> I just wanted to align this unregister code with register. Since there
> is debugfs dir lookup and the device's EM existence checks I thought it
> wouldn't harm just to lock for a while and make sure the registration
> path is not used. These two paths shouldn't affect each other, but with
> modules loading/unloading I wanted to play safe.
>
> I can change it maybe to just dmb() and the end of the function if it's
> a big performance problem in this unloading path. What do you think?

I would rather leave the mutex locking as is.

However, the question to ask is what exactly may go wrong without that
locking in place?  Is there any specific race condition that you are
concerned about?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [PATCH] scs: Report SCS usage in bytes rather than number of entries
From: Will Deacon @ 2020-06-03 15:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sami Tolvanen, Will Deacon, keescook, linux-arm-kernel

Fix the SCS debug usage check so that we report the number of bytes
usedm, rather than the number of entries.

Fixes: 5bbaf9d1fcb9 ("scs: Add support for stack usage debugging")
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 kernel/scs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/scs.c b/kernel/scs.c
index 222a7a9ad543..5d4d9bbdec36 100644
--- a/kernel/scs.c
+++ b/kernel/scs.c
@@ -74,7 +74,7 @@ static void scs_check_usage(struct task_struct *tsk)
 	for (p = task_scs(tsk); p < __scs_magic(tsk); ++p) {
 		if (!READ_ONCE_NOCHECK(*p))
 			break;
-		used++;
+		used += sizeof(*p);
 	}
 
 	while (used > curr) {
-- 
2.27.0.rc2.251.g90737beb825-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH 2/2] arm64: Use test_tsk_thread_flag() for checking TIF_SINGLESTEP
From: Will Deacon @ 2020-06-03 15:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Mark Rutland, kernel-team, Luis Machado, Will Deacon,
	Keno Fischer
In-Reply-To: <20200603151033.11512-1-will@kernel.org>

Rather than open-code test_tsk_thread_flag() at each callsite, simply
replace the couple of offenders with calls to test_tsk_thread_flag()
directly.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/debug-monitors.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 732e7ecaa692..8ab6c615ac50 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -394,14 +394,14 @@ void user_rewind_single_step(struct task_struct *task)
 	 * If single step is active for this thread, then set SPSR.SS
 	 * to 1 to avoid returning to the active-pending state.
 	 */
-	if (test_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP))
+	if (test_tsk_thread_flag(task, TIF_SINGLESTEP))
 		set_regs_spsr_ss(task_pt_regs(task));
 }
 NOKPROBE_SYMBOL(user_rewind_single_step);
 
 void user_fastforward_single_step(struct task_struct *task)
 {
-	if (test_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP))
+	if (test_tsk_thread_flag(task, TIF_SINGLESTEP))
 		clear_regs_spsr_ss(task_pt_regs(task));
 }
 
-- 
2.27.0.rc2.251.g90737beb825-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH 1/2] arm64: Override SPSR.SS when single-stepping is enabled
From: Will Deacon @ 2020-06-03 15:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Mark Rutland, Luis Machado, Will Deacon, stable, Keno Fischer,
	kernel-team
In-Reply-To: <20200603151033.11512-1-will@kernel.org>

Luis reports that, when reverse debugging with GDB, single-step does not
function as expected on arm64:

  | I've noticed, under very specific conditions, that a PTRACE_SINGLESTEP
  | request by GDB won't execute the underlying instruction. As a consequence,
  | the PC doesn't move, but we return a SIGTRAP just like we would for a
  | regular successful PTRACE_SINGLESTEP request.

The underlying problem is that when the CPU register state is restored
as part of a reverse step, the SPSR.SS bit is cleared and so the hardware
single-step state can transition to the "active-pending" state, causing
an unexpected step exception to be taken immediately if a step operation
is attempted.

In hindsight, we probably shouldn't have exposed SPSR.SS in the pstate
accessible by the GPR regset, but it's a bit late for that now. Instead,
simply prevent userspace from configuring the bit to a value which is
inconsistent with the TIF_SINGLESTEP state for the task being traced.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1eed6d69-d53d-9657-1fc9-c089be07f98c@linaro.org
Reported-by: Luis Machado <luis.machado@linaro.org>
Tested-by: Luis Machado <luis.machado@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/debug-monitors.h |  2 ++
 arch/arm64/kernel/debug-monitors.c      | 20 ++++++++++++++++----
 arch/arm64/kernel/ptrace.c              |  4 ++--
 arch/arm64/kernel/signal.c              |  6 +++++-
 4 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index e5ceea213e39..0b298f48f5bf 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -109,6 +109,8 @@ void disable_debug_monitors(enum dbg_active_el el);
 
 void user_rewind_single_step(struct task_struct *task);
 void user_fastforward_single_step(struct task_struct *task);
+void user_regs_reset_single_step(struct user_pt_regs *regs,
+				 struct task_struct *task);
 
 void kernel_enable_single_step(struct pt_regs *regs);
 void kernel_disable_single_step(void);
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 15e80c876d46..732e7ecaa692 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -141,17 +141,20 @@ postcore_initcall(debug_monitors_init);
 /*
  * Single step API and exception handling.
  */
-static void set_regs_spsr_ss(struct pt_regs *regs)
+static void set_user_regs_spsr_ss(struct user_pt_regs *regs)
 {
 	regs->pstate |= DBG_SPSR_SS;
 }
-NOKPROBE_SYMBOL(set_regs_spsr_ss);
+NOKPROBE_SYMBOL(set_user_regs_spsr_ss);
 
-static void clear_regs_spsr_ss(struct pt_regs *regs)
+static void clear_user_regs_spsr_ss(struct user_pt_regs *regs)
 {
 	regs->pstate &= ~DBG_SPSR_SS;
 }
-NOKPROBE_SYMBOL(clear_regs_spsr_ss);
+NOKPROBE_SYMBOL(clear_user_regs_spsr_ss);
+
+#define set_regs_spsr_ss(r)	set_user_regs_spsr_ss(&(r)->user_regs)
+#define clear_regs_spsr_ss(r)	clear_user_regs_spsr_ss(&(r)->user_regs)
 
 static DEFINE_SPINLOCK(debug_hook_lock);
 static LIST_HEAD(user_step_hook);
@@ -402,6 +405,15 @@ void user_fastforward_single_step(struct task_struct *task)
 		clear_regs_spsr_ss(task_pt_regs(task));
 }
 
+void user_regs_reset_single_step(struct user_pt_regs *regs,
+				 struct task_struct *task)
+{
+	if (test_tsk_thread_flag(task, TIF_SINGLESTEP))
+		set_user_regs_spsr_ss(regs);
+	else
+		clear_user_regs_spsr_ss(regs);
+}
+
 /* Kernel API */
 void kernel_enable_single_step(struct pt_regs *regs)
 {
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 585dd7f5c826..e871ab3ab29b 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1934,8 +1934,8 @@ static int valid_native_regs(struct user_pt_regs *regs)
  */
 int valid_user_regs(struct user_pt_regs *regs, struct task_struct *task)
 {
-	if (!test_tsk_thread_flag(task, TIF_SINGLESTEP))
-		regs->pstate &= ~DBG_SPSR_SS;
+	/* https://lore.kernel.org/lkml/20191118131525.GA4180@willie-the-truck */
+	user_regs_reset_single_step(regs, task);
 
 	if (is_compat_thread(task_thread_info(task)))
 		return valid_compat_regs(regs);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 801d56cdf701..c57a077f66cf 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -505,8 +505,12 @@ static int restore_sigframe(struct pt_regs *regs,
 	forget_syscall(regs);
 
 	err |= !valid_user_regs(&regs->user_regs, current);
-	if (err == 0)
+
+	if (err == 0) {
+		/* Make it look like we stepped the sigreturn system call */
+		user_fastforward_single_step(current);
 		err = parse_user_sigframe(&user, sf);
+	}
 
 	if (err == 0 && system_supports_fpsimd()) {
 		if (!user.fpsimd)
-- 
2.27.0.rc2.251.g90737beb825-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH 0/2] arm64: Fix single-stepping with reverse debugging
From: Will Deacon @ 2020-06-03 15:10 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Mark Rutland, kernel-team, Luis Machado, Will Deacon,
	Keno Fischer

Hi all,

This series fixes a bug in our hardware single-step and ptrace logic that
was reported by Luis after seeing some GDB testsuite failures. The second
patch is cosmetic, but I was in the area.

Cheers,

Will

Cc: <kernel-team@android.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Luis Machado <luis.machado@linaro.org>
Cc: Keno Fischer <keno@juliacomputing.com>

--->8

Will Deacon (2):
  arm64: Override SPSR.SS when single-stepping is enabled
  arm64: Use test_tsk_thread_flag() for checking TIF_SINGLESTEP

 arch/arm64/include/asm/debug-monitors.h |  2 ++
 arch/arm64/kernel/debug-monitors.c      | 24 ++++++++++++++++++------
 arch/arm64/kernel/ptrace.c              |  4 ++--
 arch/arm64/kernel/signal.c              |  6 +++++-
 4 files changed, 27 insertions(+), 9 deletions(-)

-- 
2.27.0.rc2.251.g90737beb825-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: arm64 build issue and mainline crash (was Re: linux-next: Tree for Jun 3)
From: John Garry @ 2020-06-03 14:08 UTC (permalink / raw)
  To: Ard Biesheuvel, Stephen Rothwell, Masahiro Yamada
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Linux ARM
In-Reply-To: <CAMj1kXFes_Q2r8o-7bNYoYzz0OLSy+h70fBd6-Vse=3syJDptg@mail.gmail.com>

>>>> In addition, the reason I was testing this was because Linus' master
>>>> (d6f9469a03d8 Merge tag 'erofs-for-5.8-rc1' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs) was crashing:
>>>>
>>>> [ 5.368948] loop: module loaded
>>>> [ 5.372113] Unable to handle kernel paging request at virtual address
>>>> fffff9ffcfec4000
>>>> [ 5.380067] Mem abort info:
>>>> [ 5.382865]ESR = 0x96000044
>>>> [ 5.385927]EC = 0x25: DABT (current EL), IL = 32 bits
>>>> [ 5.391260]SET = 0, FnV = 0
>>>> [ 5.394319]EA = 0, S1PTW = 0
>>>> [ 5.397467] Data abort info:
>>>> [ 5.400354]ISV = 0, ISS = 0x00000044
>>>> [ 5.404203]CM = 0, WnR = 1
>>>> [ 5.407178] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000002f3f1000
>>>> [ 5.413909] [fffff9ffcfec4000] pgd=0000000000000000
>>>> [ 5.418807] Internal error: Oops: 96000044 [#1] PREEMPT SMP
>>>> [ 5.424399] Modules linked in:
>>>> [ 5.427462] CPU: 11 PID: 1 Comm: swapper/0 Not tainted
>>>> 5.7.0-05047-gd6f9469a03d8 #388
>>>> [ 5.435325] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05
>>>> IT21 Nemo 2.0 RC0 04/18/2018
>>>> [ 5.444499] pstate: 40000005 (nZcv daif -PAN -UAO BTYPE=--)
>>>> [ 5.450098] pc : __memset+0x16c/0x1c0
>>>> [ 5.453770] lr : pcpu_alloc+0x1a0/0x668
>>>> [ 5.457615] sp : ffff800011d3bbd0
>>>> [ 5.460936] x29: ffff800011d3bbd0 x28: ffff001fb5495180
>>>> [ 5.466267] x27: 0000000000000001 x26: 0000000000000100
>>>> [ 5.471597] x25: 0000000000000001 x24: 0000000000000001
>>>> [ 5.476928] x23: ffff80001135e9a0 x22: ffff80001196a200
>>>> [ 5.482259] x21: ffff80001196a360 x20: 0000000000000000
>>>> [ 5.487590] x19: 0000000000000000 x18: fffffe107e6fdb08
>>>> [ 5.492920] x17: 000000000000003f x16: 0000000000000000
>>>> [ 5.498251] x15: ffff001ffbffee00 x14: 0000000000000002
>>>> [ 5.503581] x13: 0000000000000000 x12: 000000000000003f
>>>> [ 5.508912] x11: 0000000000000040 x10: 0000000000000040
>>>> [ 5.514243] x9 : 0000000000000000 x8 : fffff9ffcfec4000
>>>> [ 5.519573] x7 : 0000000000000000 x6 : 000000000000003f
>>>> [ 5.524904] x5 : 0000000000000040 x4 : 0000000000000000
>>>> [ 5.530234] x3 : 0000000000000004 x2 : 00000000000000c0
>>>> [ 5.535565] x1 : 0000000000000000 x0 : fffff9ffcfec4000
>>>> [ 5.540896] Call trace:
>>>> [ 5.543344]  __memset+0x16c/0x1c0
>>>> [ 5.546666]  __alloc_percpu+0x14/0x1c
>>>> [ 5.550338]  alloc_workqueue+0x164/0x42c
>>>> [ 5.554273]  init+0x24/0xa4
>>>> [ 5.557071]  do_one_initcall+0x50/0x194
>>>> [ 5.560917]  kernel_init_freeable+0x1e4/0x250
>>>> [ 5.565288]  kernel_init+0x10/0x104
>>>> [ 5.568785]  ret_from_fork+0x10/0x18
>>>> [ 5.572372] Code: 91010108 54ffff4a 8b040108 cb050042 (d50b7428)
>>>> [ 5.578568] ---[ end trace 63c299bbe9b8ea9e ]---
>>>> [ 5.583205] Kernel panic - not syncing: Attempted to kill init!
>>>> exitcode=0x0000000b
>>>> [ 5.590903] SMP: stopping secondary CPUs
>>>> [ 5.594846] Kernel Offset: 0xf0000 from 0xffff800010000000
>>>> [ 5.600350] PHYS_OFFSET: 0x0
>>>> [ 5.603235] CPU features: 0x240022,20806008
>>>> [ 5.607430] Memory Limit: none
>>>> [ 5.610500] ---[ end Kernel panic - not syncing: Attempted to kill init!
>>>> exitcode=0x0000000b ]---
>>>>
>>>> I'll check that when I get a chance. Maybe all just transient.
>>>
>>> Yeah, I forgot to add a patch to make arm64 build again (it will be in
>>> linux-next tomorrow).  If you want to apply it to your tree, here is
>>> what I was given:
>>>
>>> diff --git a/Makefile b/Makefile
>>> index f80c4ff93ec9..fbb4b95ae648 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -1074,7 +1074,7 @@ build-dirs        := $(vmlinux-dirs)
>>>   clean-dirs     := $(vmlinux-alldirs)
>>>
>>>   # Externally visible symbols (used by link-vmlinux.sh)
>>> -KBUILD_VMLINUX_OBJS := $(head-y) $(addsuffix built-in.a, $(core-y))
>>> +KBUILD_VMLINUX_OBJS := $(head-y) $(patsubst %/,%/built-in.a, $(core-y))
>>>   KBUILD_VMLINUX_OBJS += $(addsuffix built-in.a, $(filter %/, $(libs-y)))
>>>   ifdef CONFIG_MODULES
>>>   KBUILD_VMLINUX_OBJS += $(patsubst %/, %/lib.a, $(filter %/, $(libs-y)))
>>

thanks, that works. And, for anyone interested, no crash, like mainline, 
above.

John

> 
> 
> 9203be13ef5bfd1fcf39f7b7fe94597d2d2a0eb4 is the first bad commit
> commit 9203be13ef5bfd1fcf39f7b7fe94597d2d2a0eb4
> Author: Masahiro Yamada <masahiroy@kernel.org>
> Date:   Mon Jun 1 14:56:59 2020 +0900
> 
>      kbuild: refactor KBUILD_VMLINUX_{OBJS,LIBS} calculation
> 
>      Do not overwrite core-y or drivers-y. Remove libs-y1 and libs-y2.
> 
>      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> .
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 1/5] drm/omap: Fix suspend resume regression after platform data removal
From: Tony Lindgren @ 2020-06-03 14:06 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: Nishanth Menon, Tero Kristo, Grygorii Strashko, Dave Gerlach,
	Greg Kroah-Hartman, linux-kernel, dri-devel, Andrew F . Davis,
	Peter Ujfalusi, Faiz Abbas, Laurent Pinchart, Keerthy, Suman Anna,
	linux-omap, linux-arm-kernel, Roger Quadros
In-Reply-To: <16ba1808-5c7f-573d-8dd0-c80cac2f476e@ti.com>

* Tomi Valkeinen <tomi.valkeinen@ti.com> [200603 12:34]:
> Hi Tony,
> 
> On 31/05/2020 22:39, Tony Lindgren wrote:
> > When booting without legacy platform data, we no longer have omap_device
> > calling PM runtime suspend for us on suspend. This causes the driver
> > context not be saved as we have no suspend and resume functions defined.
> > 
> > Let's fix the issue by switching over to use UNIVERSAL_DEV_PM_OPS as it
> > will call the existing PM runtime suspend functions on suspend.
> 
> I don't think we can use UNIVERSAL_DEV_PM_OPS, as we can't disable DSS
> modules in any order, but things have to be shut down in orderly manner.

OK. I presume you talk about the order of dss child devices here.

> omapdrm hasn't relied on omap_device calling runtime suspend for us (I
> didn't know it does that). We have system suspend hooks in omap_drv.c:

We had omap_device sort of brute forcing things to idle on suspend
which only really works for interconnect target modules with one
device in them.

> SIMPLE_DEV_PM_OPS(omapdrm_pm_ops, omap_drm_suspend, omap_drm_resume)
> 
> omap_drm_suspend() is supposed to turn off the displays, which then cause
> dispc_runtime_put (and other runtime_puts) to be called, which result in
> dispc_runtime_suspend (and other runtime PM suspends).

OK thanks for explaining, I missed that part.

> So... For some reason that's no longer happening? I need to try to find a
> board with which suspend/resume works (without DSS)...

Yes it seems something has changed. When diffing the dmesg debug output
on suspend and resume, context save and restore functions are no longer
called as the PM runtime suspend and resume functions are no longer
called on suspend and resume.

I'll drop this patch, and will be applying the rest of the series to
fixes if no objections.

Thanks,

Tony

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/2] coresight: tmc: Add shutdown callback for TMC ETR/ETF
From: Sai Prakash Ranjan @ 2020-06-03 14:02 UTC (permalink / raw)
  To: Mike Leach
  Cc: Mathieu Poirier, Suzuki K Poulose, linux-arm-msm, Coresight ML,
	Linux Kernel Mailing List, Stephen Boyd, Robin Murphy,
	linux-arm-kernel
In-Reply-To: <CAJ9a7Vj3sL=4O3DU+aJWYLhue1UxQmX4Ba5JdEnmxKDEYo_z4Q@mail.gmail.com>

Hi Mike,

On 2020-06-03 19:21, Mike Leach wrote:
> Hi,
> 
> 
> On Wed, 3 Jun 2020 at 14:34, Robin Murphy <robin.murphy@arm.com> wrote:
>> 
>> On 2020-06-03 14:22, Mike Leach wrote:
>> > Hi Sai,
>> >
>> > On Wed, 3 Jun 2020 at 13:14, Sai Prakash Ranjan
>> > <saiprakash.ranjan@codeaurora.org> wrote:
>> >>
>> >> Hi Mike,
>> >>
>> >> On 2020-06-03 16:57, Mike Leach wrote:
>> >>> Hi,
>> >>>
>> >>> On Wed, 3 Jun 2020 at 11:24, Sai Prakash Ranjan
>> >>> <saiprakash.ranjan@codeaurora.org> wrote:
>> >>>>
>> >>>> Hi Mike,
>> >>>>
>> >>>> Thanks again for looking at this.
>> >>>>
>> >>>> On 2020-06-03 03:42, Mike Leach wrote:
>> >>>> [...]
>> >>>>
>> >>>>>>
>> >>>>>> SMMU/IOMMU won't be able to do much here as it is the client's
>> >>>>>> responsiblity to
>> >>>>>> properly shutdown and SMMU device link just makes sure that
>> >>>>>> SMMU(supplier) shutdown is
>> >>>>>> called only after its consumers shutdown callbacks are called.
>> >>>>>
>> >>>>> I think this use case can be handled slightly differently than the
>> >>>>> general requirements for modular CoreSight drivers.
>> >>>>>
>> >>>>> What is needed here is a way of stopping the underlying ETR hardware
>> >>>>> from issuing data to the SMMU, until the entire device has been shut
>> >>>>> down, in a way that does not remove the driver, breaking existing
>> >>>>> references and causing a system crash.
>> >>>>>
>> >>>>> We could introduce a new mode to the ETR driver - e.g.
>> >>>>> CS_MODE_SHUTDOWN.
>> >>>>>
>> >>>>> At the end of the block tmc_shutdown(struct amba_device *adev), set
>> >>>>> drvdata->mode to CS_MODE_SHUTDOWN & remove the coresight_unregister().
>> >>>>> This new mode can be used to  prevent the underlying hardware from
>> >>>>> being able to restart until the device is re-powered.
>> >>>>>
>> >>>>> This mode can be detected in the code that enables / disables the ETR
>> >>>>> and handled appropriately (updates to tmc_enable_etr_sink and
>> >>>>> tmc_disable_etr_sink).
>> >>>>> This mode will persist until the device is re-started - but because we
>> >>>>> are on the device shutdown path this is not an issue.
>> >>>>>
>> >>>>> This should leave the CoreSight infrastructure stable until the
>> >>>>> drivers are shut down normally as part of the device power down
>> >>>>> process.
>> >>>>>
>> >>>>
>> >>>> Sounds good to me, but if the coresight_unregister() is the trouble
>> >>>> point
>> >>>> causing these crashes, then can't we just remove that from
>> >>>> tmc_shutdown()
>> >>>> callback? This would be like maintaining the same behaviour as now
>> >>>> where
>> >>>> on reboot/shutdown we basically don't do anything except for disabling
>> >>>> ETR.
>> >>>
>> >>> No - the new mode prevents race conditions where the thread shutting
>> >>> down the SMMU does the ETR shutdown, but then another thread happens
>> >>> to be trying to start trace and restarts the ETR.
>> >>> It also prevents the condition Mathieu discussed where a thread might
>> >>> be attempting to shutdown trace - this could try to disable the
>> >>> hardware again re-releasing resources/ re-flushing and waiting for
>> >>> stop.
>> >>>
>> >>
>> >> I do not think there will a race between SMMU shutdown and ETR shutdown.
>> >> Driver core takes care of calling SMMU shutdown after its consumer
>> >> shutdown callbacks via device link, otherwise there would already be
>> >> bugs in all other client drivers.
>> >>
>> >
>> > I am not saying there could be a race between tmc_shutdowm and
>> > Smmu_shutdown - there may be a case if the coresight_disable_path
>> > sequence is running and gets to the point of disabling the ETR after
>> > the SMMU callback has disabled it.
>> 
>> I'm confused now - there is no "SMMU callback", we're talking about 
>> the
>> system-wide cleanup from kernel_shutdown_prepare() or
>> kernel_restart_prepare(). As far as I'm aware userspace should be long
>> gone by that point, so although trace may have been left running ||
>            ((offset >= TRCCIDCVRn(0)) && (offset <= TRCVMIDCVRn(7)), 
> the
>> chance of racing against other driver operations seems pretty 
>> unlikely.
>> 
> 
> Sorry - bad choice of terminology. I was referring to the SMMU
> ensuring that it had all its clients shut-down before if shut down. To
> quote Sai...
> 
>>>>>> SMMU device link just makes sure that
>> >>>>>> SMMU(supplier) shutdown is
>> >>>>>> called only after its consumers shutdown callbacks are called.
> 
> I agree it is unlikely, but if removing the device from the CoreSight
> infrastructure via coresight_unregister() is a potential source of a
> crash, it would seem that there is a potential path where some
> CoreSight driver side work might be possible. therefore a mode to
> prevent this crash, and ensure that the device hardware remains off
> and not sending trace to SMMU until such time as shutdown / reboot
> restart occurs, seemed prudent.
> 

Actually I did not see any crash with coresight_unregister() during
reboot/shutdown as I mentioned previously to Mathieu's query on
this being similar to remove callback. I think the crash with
coresight_unregister() is only seen when we have coresight as module
and the userspace is pretty much there to enable/disable trace when
we try to bind/unbind. But here we only consider the system 
reboot/shutdown
where pretty much everything is down by this point.

Thanks,
Sai
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/2] coresight: tmc: Add shutdown callback for TMC ETR/ETF
From: Mike Leach @ 2020-06-03 13:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Sai Prakash Ranjan, Mathieu Poirier, Suzuki K Poulose,
	linux-arm-msm, Coresight ML, Linux Kernel Mailing List,
	Stephen Boyd, linux-arm-kernel
In-Reply-To: <1a5a6a6d-b86d-df45-cf91-7081e70d88a3@arm.com>

Hi,


On Wed, 3 Jun 2020 at 14:34, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2020-06-03 14:22, Mike Leach wrote:
> > Hi Sai,
> >
> > On Wed, 3 Jun 2020 at 13:14, Sai Prakash Ranjan
> > <saiprakash.ranjan@codeaurora.org> wrote:
> >>
> >> Hi Mike,
> >>
> >> On 2020-06-03 16:57, Mike Leach wrote:
> >>> Hi,
> >>>
> >>> On Wed, 3 Jun 2020 at 11:24, Sai Prakash Ranjan
> >>> <saiprakash.ranjan@codeaurora.org> wrote:
> >>>>
> >>>> Hi Mike,
> >>>>
> >>>> Thanks again for looking at this.
> >>>>
> >>>> On 2020-06-03 03:42, Mike Leach wrote:
> >>>> [...]
> >>>>
> >>>>>>
> >>>>>> SMMU/IOMMU won't be able to do much here as it is the client's
> >>>>>> responsiblity to
> >>>>>> properly shutdown and SMMU device link just makes sure that
> >>>>>> SMMU(supplier) shutdown is
> >>>>>> called only after its consumers shutdown callbacks are called.
> >>>>>
> >>>>> I think this use case can be handled slightly differently than the
> >>>>> general requirements for modular CoreSight drivers.
> >>>>>
> >>>>> What is needed here is a way of stopping the underlying ETR hardware
> >>>>> from issuing data to the SMMU, until the entire device has been shut
> >>>>> down, in a way that does not remove the driver, breaking existing
> >>>>> references and causing a system crash.
> >>>>>
> >>>>> We could introduce a new mode to the ETR driver - e.g.
> >>>>> CS_MODE_SHUTDOWN.
> >>>>>
> >>>>> At the end of the block tmc_shutdown(struct amba_device *adev), set
> >>>>> drvdata->mode to CS_MODE_SHUTDOWN & remove the coresight_unregister().
> >>>>> This new mode can be used to  prevent the underlying hardware from
> >>>>> being able to restart until the device is re-powered.
> >>>>>
> >>>>> This mode can be detected in the code that enables / disables the ETR
> >>>>> and handled appropriately (updates to tmc_enable_etr_sink and
> >>>>> tmc_disable_etr_sink).
> >>>>> This mode will persist until the device is re-started - but because we
> >>>>> are on the device shutdown path this is not an issue.
> >>>>>
> >>>>> This should leave the CoreSight infrastructure stable until the
> >>>>> drivers are shut down normally as part of the device power down
> >>>>> process.
> >>>>>
> >>>>
> >>>> Sounds good to me, but if the coresight_unregister() is the trouble
> >>>> point
> >>>> causing these crashes, then can't we just remove that from
> >>>> tmc_shutdown()
> >>>> callback? This would be like maintaining the same behaviour as now
> >>>> where
> >>>> on reboot/shutdown we basically don't do anything except for disabling
> >>>> ETR.
> >>>
> >>> No - the new mode prevents race conditions where the thread shutting
> >>> down the SMMU does the ETR shutdown, but then another thread happens
> >>> to be trying to start trace and restarts the ETR.
> >>> It also prevents the condition Mathieu discussed where a thread might
> >>> be attempting to shutdown trace - this could try to disable the
> >>> hardware again re-releasing resources/ re-flushing and waiting for
> >>> stop.
> >>>
> >>
> >> I do not think there will a race between SMMU shutdown and ETR shutdown.
> >> Driver core takes care of calling SMMU shutdown after its consumer
> >> shutdown callbacks via device link, otherwise there would already be
> >> bugs in all other client drivers.
> >>
> >
> > I am not saying there could be a race between tmc_shutdowm and
> > Smmu_shutdown - there may be a case if the coresight_disable_path
> > sequence is running and gets to the point of disabling the ETR after
> > the SMMU callback has disabled it.
>
> I'm confused now - there is no "SMMU callback", we're talking about the
> system-wide cleanup from kernel_shutdown_prepare() or
> kernel_restart_prepare(). As far as I'm aware userspace should be long
> gone by that point, so although trace may have been left running ||
           ((offset >= TRCCIDCVRn(0)) && (offset <= TRCVMIDCVRn(7)), the
> chance of racing against other driver operations seems pretty unlikely.
>

Sorry - bad choice of terminology. I was referring to the SMMU
ensuring that it had all its clients shut-down before if shut down. To
quote Sai...

>>>>> SMMU device link just makes sure that
> >>>>>> SMMU(supplier) shutdown is
> >>>>>> called only after its consumers shutdown callbacks are called.

I agree it is unlikely, but if removing the device from the CoreSight
infrastructure via coresight_unregister() is a potential source of a
crash, it would seem that there is a potential path where some
CoreSight driver side work might be possible. therefore a mode to
prevent this crash, and ensure that the device hardware remains off
and not sending trace to SMMU until such time as shutdown / reboot
restart occurs, seemed prudent.

Mike

> Robin.



--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/2] coresight: tmc: Add shutdown callback for TMC ETR/ETF
From: Sai Prakash Ranjan @ 2020-06-03 13:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Mathieu Poirier, Suzuki K Poulose, linux-arm-msm, coresight,
	linux-kernel, Stephen Boyd, linux-arm-kernel, Mike Leach
In-Reply-To: <68444180-3ebe-8988-493a-fdd1dff994b6@arm.com>

Hi Robin,

On 2020-06-03 19:10, Robin Murphy wrote:
> On 2020-06-03 13:26, Sai Prakash Ranjan wrote:
>> Hi Robin,
>> 
>> On 2020-06-03 17:51, Robin Murphy wrote:
>>> On 2020-06-03 13:00, Sai Prakash Ranjan wrote:
>>>> Hi Robin, Mathieu
>>>> 
>>>> On 2020-06-03 17:07, Robin Murphy wrote:
>>>>> On 2020-06-01 22:28, Mathieu Poirier wrote:
>>>>>> That being said I'm sure that dependencies on an IOMMU isn't a 
>>>>>> problem confined
>>>>>> to coresight. I am adding Robin Murphy, who added this commit [1], 
>>>>>> to the thread
>>>>>> in the hope that he can provide guidance on the right way to do 
>>>>>> this.
>>>>> 
>>>>> Right, it's not specific to CoreSight, and it's not even specific 
>>>>> to
>>>>> IOMMUs really. In short, blame kexec ;)
>>>>> 
>>>> 
>>>> Yes it is not specific to coresight, we are targeting this for all
>>>> consumers/clients of SMMU(atleast on SC7180 SoC). We have display 
>>>> throwing
>>>> NoC/interconnect errors[1] during reboot after SMMU is disabled.
>>>> This is also not specific to kexec either as you explained here [2] 
>>>> about
>>>> a case with display which is exacly what is happening in our system 
>>>> [1].
>>> 
>>> Sure, but those instances are begging the question of why the SMMU is
>>> disabled at reboot in the first place ;)
>>> 
>> 
>> That is what happens in SMMU shutdown callback right? It is the 
>> reboot/shutdown flow.
> 
> Yes, that's where it happens, but my point is *why* it happens at all.
> 
> hint: `git log --grep=shutdown drivers/iommu/`
> 

Ah my change :)

> If we could assume the system is always about to be powered off or
> reset, we wouldn't need to do anything to the SMMU either ;)
> 

Are you hinting at removing SMMU shutdown callback altogether ;)

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] sound: usb: pcm: fix incorrect power state when playing sound after PM_AUTO suspend
From: Macpaul Lin @ 2020-06-03 13:50 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Szabolcs Szőke, Mediatek WSD Upstream,
	Greg Kroah-Hartman, linux-usb, Takashi Iwai, stable, linux-kernel,
	Hui Wang, Alexander Tsoy, linux-mediatek, Matthias Brugger,
	Johan Hovold, Jaroslav Kysela, Macpaul Lin, linux-arm-kernel
In-Reply-To: <s5hr1uwco4c.wl-tiwai@suse.de>

On Wed, 2020-06-03 at 14:47 +0200, Takashi Iwai wrote:
> On Wed, 03 Jun 2020 14:39:24 +0200,
> Macpaul Lin wrote:
> > 
> > On Wed, 2020-06-03 at 10:45 +0200, Takashi Iwai wrote:
> > > On Wed, 03 Jun 2020 08:54:51 +0200,
> > > Takashi Iwai wrote:
> > > > 
> > > > On Wed, 03 Jun 2020 08:28:09 +0200,
> > > > Takashi Iwai wrote:
> > > > > 
> > > > > And, the most suspicious case is the last one,
> > > > > chip->num_suspended-intf.  It means that the device has multiple
> > > > > USB interfaces and they went to suspend, while the resume isn't
> > > > > performed for the all suspended interfaces in return.
> > > > 
> > > > If this is the cause, a patch like below might help.
> > > > It gets/puts the all assigned interfaced instead of only the primary
> > > > one.
> > > 
> > > ... and considering of the problem again, rather the patch below might
> > > be the right answer.  Now the driver tries to remember at which state
> > > it entered into the system-suspend.  Upon resume, in return, when the
> > > state reaches back to that point, set the card state to D0.
> > > 
> > > The previous patch can be applied on the top, too, and it might be
> > > worth to apply both.
> > > 
> > > Let me know if any of those actually helps.
> > > 
> > > 
> > > Takashi
> > 
> > Thanks for your response so quickly.
> > I've just test this patch since it looks like enough for the issue.
> 
> Good to hear!
> 
> > This patch worked since the flag system_suspend will be set at the same
> > time when power state has been changed. I have 2 interface with the head
> > set. But actually the problem happened when primary one is suspended.
> 
> Currently the autosuspend is set only to the primary interface; IOW,
> the other interfaces will never get autosuspend, and the another
> suspend-all-intf patch should improve that situation.  But it won't
> fix your actual bug, obviously :)
> 
> > So I didn't test the earlier patch "suspend all interface instead of
> > only the primary one."
> 
> Could you try it one on top of the last patch?  At least I'd like to
> see whether it causes any regression.

I've tried both of these 2 patches together, and it looks okay.

> > Will you resend this patch officially later? I think this solution is
> > required to send to stable, too. It's better to have it for other stable
> > kernel versions include android's.
> 
> Yes, that's a general bug and worth to be merged quickly.
> I'm going to submit a proper patch soon later.
> 
> 
> thanks,
> 
> Takashi
> 

Thanks!
Macpaul Lin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v3 04/20] arm64: dts: arm: vexpress: Move fixed devices out of bus node
From: Rob Herring @ 2020-06-03 13:49 UTC (permalink / raw)
  To: André Przywara
  Cc: Mark Rutland, devicetree, Lorenzo Pieralisi, Liviu Dudau,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Sudeep Holla, Guenter Roeck
In-Reply-To: <1d111f40-1702-7ea0-825f-ab08d77353e9@arm.com>

On Wed, Jun 3, 2020 at 5:21 AM André Przywara <andre.przywara@arm.com> wrote:
>
> On 02/06/2020 00:12, Rob Herring wrote:
>
> Hi,
>
> > On Mon, Jun 1, 2020 at 4:15 AM André Przywara <andre.przywara@arm.com> wrote:
> >>
> >> On 28/05/2020 14:30, André Przywara wrote:
> >>
> >> Hi,
> >>
> >>> On 28/05/2020 03:48, Guenter Roeck wrote:
> >>>
> >>> Hi Guenter,
> >>>
> >>>> On Wed, May 13, 2020 at 11:30:00AM +0100, Andre Przywara wrote:
> >>>>> The devicetree compiler complains when DT nodes without a reg property
> >>>>> live inside a (simple) bus node:
> >>>>> Warning (simple_bus_reg): Node /bus@8000000/motherboard-bus/refclk32khz
> >>>>>                           missing or empty reg/ranges property
> >>>>>
> >>>>> Move the fixed clocks, the fixed regulator, the leds and the config bus
> >>>>> subtree to the root node, since they do not depend on any busses.
> >>>>>
> >>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >>>>
> >>>> This patch results in tracebacks when booting the vexpress-a15 machine
> >>>> with vexpress-v2p-ca15-tc1 devicetree file in qemu. Reverting it as well
> >>>> as the subsequent patches affecting the same file (to avoid revert
> >>>> conflicts) fixes the problem.
> >>>
> >>> Many thanks for the heads up! I was able to reproduce it here. On the
> >>> first glance it looks like the UART is probed before the clocks now,
> >>> because the traversal of the changed DT leads to a different probe
> >>> order. I will look into how to fix this.
> >>
> >> Turned out to be a bit more complicated:
> >> The arm,vexpress,config-bus driver walks up the device tree to find a
> >> arm,vexpress,site property [1]. With this patch the first parent node
> >> with that property it finds is now the root node, with the wrong site ID
> >> (0xf instead of 0x0). So it queries the wrong clocks (those IDs are
> >> actually reserved there), and QEMU reports back "0", consequently [2].
> >> Finding a clock frequency in the range of [0, 0] won't get very far.
> >>
> >> Possible solutions are:
> >> 1) Just keep the mcc and its children at where it is in mainline right
> >> now, so *partly* reverting this patch. This has the problem of still
> >> producing a dtc warning, so kind of defeats the purpose of this patch.
> >>
> >> 2) Add a "arm,vexpress,site = <0>;" line to the "mcc" node itself.
> >> Works, but looks somewhat dodgy, as the mcc node should really be a
> >> child of the motherboard node, and we should not hack around this.
> >>
> >> 3) Dig deeper and fix the DT in a way that makes dtc happy. Might
> >> involve (dummy?) ranges or reg properties. My gut feeling is that
> >> arm,vexpress-sysreg,func should really have been "reg" in the first
> >> place, but that's too late to change now, anyway.
> >>
> >> I will post 2) as a fix if 3) turns out to be not feasible.
> >
> > I would just do 1).
> >
> > To some extent, the warnings are for avoiding poor design on new
> > bindings. We need a way to distinguish between existing boards and new
> > ones. Maybe dts needs to learn some warning disable annotations or we
> > need per target warning settings (DTC_FLAGS_foo.dtb ?). Or maybe this
> > check is just too strict.
>
> So I was always wondering about this check, actually. A simple-bus
> describes a bus which is mapped into the CPU address space (in contrast
> to say an I2C bus, for instance). So children of this bus node typically
> have a reg property.
>
> Now also those simple-bus nodes seem to be used to logically group
> hardware in a DT (see this "motherboard" node here). *If* we go with
> this, we should also allow other subnodes, for instance fixed-clocks:
> after all there is probably an actual fixed crystal oscillator on the
> motherboard, so it would also belong in there.

Yes, that's probably right. We'd want things grouped if this was
something applied as an overlay.

> I see that (ab)using simple-bus for *just* grouping nodes is probably
> not a good design, but I don't see why *every* child must be mapped into
> the address space.
>
> Maybe dtc's simple-bus check should indeed be relaxed, to just require
> *at least one* child with a reg or ranges property, but also allow other
> nodes?

It's made you think about the proper location, so it's accomplished
its goal. Maybe this is one that's not without false positives. It
would be good to distinguish between what's for sure a warning and
what's maybe a warning as just blindly fixing the warning is not the
answer.

Rob

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/2] coresight: tmc: Add shutdown callback for TMC ETR/ETF
From: Sai Prakash Ranjan @ 2020-06-03 13:43 UTC (permalink / raw)
  To: Robin Murphy, Mike Leach
  Cc: Mathieu Poirier, Suzuki K Poulose, linux-arm-msm, Coresight ML,
	Linux Kernel Mailing List, Stephen Boyd, linux-arm-kernel
In-Reply-To: <1a5a6a6d-b86d-df45-cf91-7081e70d88a3@arm.com>

Hi Mike,

On 2020-06-03 19:04, Robin Murphy wrote:
> On 2020-06-03 14:22, Mike Leach wrote:
>> Hi Sai,
>> 
>> On Wed, 3 Jun 2020 at 13:14, Sai Prakash Ranjan
>> <saiprakash.ranjan@codeaurora.org> wrote:
>>> 
>>> Hi Mike,
>>> 
>>> On 2020-06-03 16:57, Mike Leach wrote:
>>>> Hi,
>>>> 
>>>> On Wed, 3 Jun 2020 at 11:24, Sai Prakash Ranjan
>>>> <saiprakash.ranjan@codeaurora.org> wrote:
>>>>> 
>>>>> Hi Mike,
>>>>> 
>>>>> Thanks again for looking at this.
>>>>> 
>>>>> On 2020-06-03 03:42, Mike Leach wrote:
>>>>> [...]
>>>>> 
>>>>>>> 
>>>>>>> SMMU/IOMMU won't be able to do much here as it is the client's
>>>>>>> responsiblity to
>>>>>>> properly shutdown and SMMU device link just makes sure that
>>>>>>> SMMU(supplier) shutdown is
>>>>>>> called only after its consumers shutdown callbacks are called.
>>>>>> 
>>>>>> I think this use case can be handled slightly differently than the
>>>>>> general requirements for modular CoreSight drivers.
>>>>>> 
>>>>>> What is needed here is a way of stopping the underlying ETR 
>>>>>> hardware
>>>>>> from issuing data to the SMMU, until the entire device has been 
>>>>>> shut
>>>>>> down, in a way that does not remove the driver, breaking existing
>>>>>> references and causing a system crash.
>>>>>> 
>>>>>> We could introduce a new mode to the ETR driver - e.g.
>>>>>> CS_MODE_SHUTDOWN.
>>>>>> 
>>>>>> At the end of the block tmc_shutdown(struct amba_device *adev), 
>>>>>> set
>>>>>> drvdata->mode to CS_MODE_SHUTDOWN & remove the 
>>>>>> coresight_unregister().
>>>>>> This new mode can be used to  prevent the underlying hardware from
>>>>>> being able to restart until the device is re-powered.
>>>>>> 
>>>>>> This mode can be detected in the code that enables / disables the 
>>>>>> ETR
>>>>>> and handled appropriately (updates to tmc_enable_etr_sink and
>>>>>> tmc_disable_etr_sink).
>>>>>> This mode will persist until the device is re-started - but 
>>>>>> because we
>>>>>> are on the device shutdown path this is not an issue.
>>>>>> 
>>>>>> This should leave the CoreSight infrastructure stable until the
>>>>>> drivers are shut down normally as part of the device power down
>>>>>> process.
>>>>>> 
>>>>> 
>>>>> Sounds good to me, but if the coresight_unregister() is the trouble
>>>>> point
>>>>> causing these crashes, then can't we just remove that from
>>>>> tmc_shutdown()
>>>>> callback? This would be like maintaining the same behaviour as now
>>>>> where
>>>>> on reboot/shutdown we basically don't do anything except for 
>>>>> disabling
>>>>> ETR.
>>>> 
>>>> No - the new mode prevents race conditions where the thread shutting
>>>> down the SMMU does the ETR shutdown, but then another thread happens
>>>> to be trying to start trace and restarts the ETR.
>>>> It also prevents the condition Mathieu discussed where a thread 
>>>> might
>>>> be attempting to shutdown trace - this could try to disable the
>>>> hardware again re-releasing resources/ re-flushing and waiting for
>>>> stop.
>>>> 
>>> 
>>> I do not think there will a race between SMMU shutdown and ETR 
>>> shutdown.
>>> Driver core takes care of calling SMMU shutdown after its consumer
>>> shutdown callbacks via device link, otherwise there would already be
>>> bugs in all other client drivers.
>>> 
>> 
>> I am not saying there could be a race between tmc_shutdowm and
>> Smmu_shutdown - there may be a case if the coresight_disable_path
>> sequence is running and gets to the point of disabling the ETR after
>> the SMMU callback has disabled it.
> 
> I'm confused now - there is no "SMMU callback", we're talking about
> the system-wide cleanup from kernel_shutdown_prepare() or
> kernel_restart_prepare(). As far as I'm aware userspace should be long
> gone by that point, so although trace may have been left running, the
> chance of racing against other driver operations seems pretty
> unlikely.
> 

As Robin said, it is not SMMU callback but the normal reboot/shutdown
flow and race is unlikely at that point.

    tmc_shutdown()
     platform_drv_shutdown()
       device_shutdown()
        kernel_restart_prepare()
         kernel_restart()

If I am not clear enough, first all the consumer shutdown callbacks of 
SMMU
are called like above tmc_shutdown() and then we call the 
arm_smmu_device_shutdown(),
this ordering is ensured by the device links.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member
of Code Aurora Forum, hosted by The Linux Foundation

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/2] coresight: tmc: Add shutdown callback for TMC ETR/ETF
From: Robin Murphy @ 2020-06-03 13:40 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Mathieu Poirier, Suzuki K Poulose, linux-arm-msm, coresight,
	linux-kernel, Stephen Boyd, linux-arm-kernel, Mike Leach
In-Reply-To: <7fe5762b5cb8f87e988232922d06c55d@codeaurora.org>

On 2020-06-03 13:26, Sai Prakash Ranjan wrote:
> Hi Robin,
> 
> On 2020-06-03 17:51, Robin Murphy wrote:
>> On 2020-06-03 13:00, Sai Prakash Ranjan wrote:
>>> Hi Robin, Mathieu
>>>
>>> On 2020-06-03 17:07, Robin Murphy wrote:
>>>> On 2020-06-01 22:28, Mathieu Poirier wrote:
>>>>> That being said I'm sure that dependencies on an IOMMU isn't a 
>>>>> problem confined
>>>>> to coresight. I am adding Robin Murphy, who added this commit [1], 
>>>>> to the thread
>>>>> in the hope that he can provide guidance on the right way to do this.
>>>>
>>>> Right, it's not specific to CoreSight, and it's not even specific to
>>>> IOMMUs really. In short, blame kexec ;)
>>>>
>>>
>>> Yes it is not specific to coresight, we are targeting this for all
>>> consumers/clients of SMMU(atleast on SC7180 SoC). We have display 
>>> throwing
>>> NoC/interconnect errors[1] during reboot after SMMU is disabled.
>>> This is also not specific to kexec either as you explained here [2] 
>>> about
>>> a case with display which is exacly what is happening in our system [1].
>>
>> Sure, but those instances are begging the question of why the SMMU is
>> disabled at reboot in the first place ;)
>>
> 
> That is what happens in SMMU shutdown callback right? It is the 
> reboot/shutdown flow.

Yes, that's where it happens, but my point is *why* it happens at all.

hint: `git log --grep=shutdown drivers/iommu/`

If we could assume the system is always about to be powered off or 
reset, we wouldn't need to do anything to the SMMU either ;)

Robin.

> 
>     arm_smmu_device_shutdown()
>      platform_drv_shutdown()
>       device_shutdown()
>        kernel_restart_prepare()
>         kernel_restart()
> 
> Thanks,
> Sai
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH 2/2] coresight: tmc: Add shutdown callback for TMC ETR/ETF
From: Robin Murphy @ 2020-06-03 13:34 UTC (permalink / raw)
  To: Mike Leach, Sai Prakash Ranjan
  Cc: Mathieu Poirier, Suzuki K Poulose, linux-arm-msm, Coresight ML,
	Linux Kernel Mailing List, Stephen Boyd, linux-arm-kernel
In-Reply-To: <CAJ9a7VgCFeHNbY_9Gwvu6uT9MFBeY=_GCaN4N1dwmm+iNpfJOw@mail.gmail.com>

On 2020-06-03 14:22, Mike Leach wrote:
> Hi Sai,
> 
> On Wed, 3 Jun 2020 at 13:14, Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>>
>> Hi Mike,
>>
>> On 2020-06-03 16:57, Mike Leach wrote:
>>> Hi,
>>>
>>> On Wed, 3 Jun 2020 at 11:24, Sai Prakash Ranjan
>>> <saiprakash.ranjan@codeaurora.org> wrote:
>>>>
>>>> Hi Mike,
>>>>
>>>> Thanks again for looking at this.
>>>>
>>>> On 2020-06-03 03:42, Mike Leach wrote:
>>>> [...]
>>>>
>>>>>>
>>>>>> SMMU/IOMMU won't be able to do much here as it is the client's
>>>>>> responsiblity to
>>>>>> properly shutdown and SMMU device link just makes sure that
>>>>>> SMMU(supplier) shutdown is
>>>>>> called only after its consumers shutdown callbacks are called.
>>>>>
>>>>> I think this use case can be handled slightly differently than the
>>>>> general requirements for modular CoreSight drivers.
>>>>>
>>>>> What is needed here is a way of stopping the underlying ETR hardware
>>>>> from issuing data to the SMMU, until the entire device has been shut
>>>>> down, in a way that does not remove the driver, breaking existing
>>>>> references and causing a system crash.
>>>>>
>>>>> We could introduce a new mode to the ETR driver - e.g.
>>>>> CS_MODE_SHUTDOWN.
>>>>>
>>>>> At the end of the block tmc_shutdown(struct amba_device *adev), set
>>>>> drvdata->mode to CS_MODE_SHUTDOWN & remove the coresight_unregister().
>>>>> This new mode can be used to  prevent the underlying hardware from
>>>>> being able to restart until the device is re-powered.
>>>>>
>>>>> This mode can be detected in the code that enables / disables the ETR
>>>>> and handled appropriately (updates to tmc_enable_etr_sink and
>>>>> tmc_disable_etr_sink).
>>>>> This mode will persist until the device is re-started - but because we
>>>>> are on the device shutdown path this is not an issue.
>>>>>
>>>>> This should leave the CoreSight infrastructure stable until the
>>>>> drivers are shut down normally as part of the device power down
>>>>> process.
>>>>>
>>>>
>>>> Sounds good to me, but if the coresight_unregister() is the trouble
>>>> point
>>>> causing these crashes, then can't we just remove that from
>>>> tmc_shutdown()
>>>> callback? This would be like maintaining the same behaviour as now
>>>> where
>>>> on reboot/shutdown we basically don't do anything except for disabling
>>>> ETR.
>>>
>>> No - the new mode prevents race conditions where the thread shutting
>>> down the SMMU does the ETR shutdown, but then another thread happens
>>> to be trying to start trace and restarts the ETR.
>>> It also prevents the condition Mathieu discussed where a thread might
>>> be attempting to shutdown trace - this could try to disable the
>>> hardware again re-releasing resources/ re-flushing and waiting for
>>> stop.
>>>
>>
>> I do not think there will a race between SMMU shutdown and ETR shutdown.
>> Driver core takes care of calling SMMU shutdown after its consumer
>> shutdown callbacks via device link, otherwise there would already be
>> bugs in all other client drivers.
>>
> 
> I am not saying there could be a race between tmc_shutdowm and
> Smmu_shutdown - there may be a case if the coresight_disable_path
> sequence is running and gets to the point of disabling the ETR after
> the SMMU callback has disabled it.

I'm confused now - there is no "SMMU callback", we're talking about the 
system-wide cleanup from kernel_shutdown_prepare() or 
kernel_restart_prepare(). As far as I'm aware userspace should be long 
gone by that point, so although trace may have been left running, the 
chance of racing against other driver operations seems pretty unlikely.

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [arm64:devel/mte-v4 25/28] arch/arm64/kernel/cpufeature.c:137:5: warning: no previous prototype for 'mte_ftr_filter'
From: kernel test robot @ 2020-06-03 13:31 UTC (permalink / raw)
  To: Vincenzo, Frascino,; +Cc: Catalin Marinas, kbuild-all, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 4302 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git devel/mte-v4
head:   628f92bb11400cab1adb95c32c05cb1c3c384248
commit: bcf807ed1e622f8311b05e34816df6c41aed54c0 [25/28] arm64: mte: Kconfig entry
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout bcf807ed1e622f8311b05e34816df6c41aed54c0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> arch/arm64/kernel/cpufeature.c:137:5: warning: no previous prototype for 'mte_ftr_filter' [-Wmissing-prototypes]
137 | s64 mte_ftr_filter(const struct arm64_ftr_bits *ftrp, s64 val)
|     ^~~~~~~~~~~~~~

vim +/mte_ftr_filter +137 arch/arm64/kernel/cpufeature.c

4225028da943ad3 Catalin Marinas 2020-05-15  136  
22fba7c9a015db8 Catalin Marinas 2020-04-08 @137  s64 mte_ftr_filter(const struct arm64_ftr_bits *ftrp, s64 val)
22fba7c9a015db8 Catalin Marinas 2020-04-08  138  {
22fba7c9a015db8 Catalin Marinas 2020-04-08  139  	struct device_node *np;
22fba7c9a015db8 Catalin Marinas 2020-04-08  140  	static bool memory_checked = false;
22fba7c9a015db8 Catalin Marinas 2020-04-08  141  	static bool mte_capable = true;
22fba7c9a015db8 Catalin Marinas 2020-04-08  142  
4225028da943ad3 Catalin Marinas 2020-05-15  143  	if (mte_disable)
4225028da943ad3 Catalin Marinas 2020-05-15  144  		return ID_AA64PFR1_MTE_NI;
4225028da943ad3 Catalin Marinas 2020-05-15  145  
22fba7c9a015db8 Catalin Marinas 2020-04-08  146  	/* EL0-only MTE is not supported by Linux, don't expose it */
22fba7c9a015db8 Catalin Marinas 2020-04-08  147  	if (val < ID_AA64PFR1_MTE)
22fba7c9a015db8 Catalin Marinas 2020-04-08  148  		return ID_AA64PFR1_MTE_NI;
22fba7c9a015db8 Catalin Marinas 2020-04-08  149  
22fba7c9a015db8 Catalin Marinas 2020-04-08  150  	if (memory_checked)
22fba7c9a015db8 Catalin Marinas 2020-04-08  151  		return mte_capable ? val : ID_AA64PFR1_MTE_NI;
22fba7c9a015db8 Catalin Marinas 2020-04-08  152  
22fba7c9a015db8 Catalin Marinas 2020-04-08  153  	if (!acpi_disabled) {
22fba7c9a015db8 Catalin Marinas 2020-04-08  154  		pr_warn("MTE not supported on ACPI systems\n");
22fba7c9a015db8 Catalin Marinas 2020-04-08  155  		return ID_AA64PFR1_MTE_NI;
22fba7c9a015db8 Catalin Marinas 2020-04-08  156  	}
22fba7c9a015db8 Catalin Marinas 2020-04-08  157  
22fba7c9a015db8 Catalin Marinas 2020-04-08  158  	/* check the DT "memory" nodes for MTE support */
22fba7c9a015db8 Catalin Marinas 2020-04-08  159  	for_each_node_by_type(np, "memory") {
22fba7c9a015db8 Catalin Marinas 2020-04-08  160  		memory_checked = true;
22fba7c9a015db8 Catalin Marinas 2020-04-08  161  		mte_capable &= of_property_read_bool(np, "arm,armv8.5-memtag");
22fba7c9a015db8 Catalin Marinas 2020-04-08  162  	}
22fba7c9a015db8 Catalin Marinas 2020-04-08  163  
22fba7c9a015db8 Catalin Marinas 2020-04-08  164  	if (!memory_checked || !mte_capable) {
22fba7c9a015db8 Catalin Marinas 2020-04-08  165  		pr_warn("System memory is not MTE-capable\n");
22fba7c9a015db8 Catalin Marinas 2020-04-08  166  		memory_checked = true;
22fba7c9a015db8 Catalin Marinas 2020-04-08  167  		mte_capable = false;
22fba7c9a015db8 Catalin Marinas 2020-04-08  168  		return ID_AA64PFR1_MTE_NI;
22fba7c9a015db8 Catalin Marinas 2020-04-08  169  	}
22fba7c9a015db8 Catalin Marinas 2020-04-08  170  
22fba7c9a015db8 Catalin Marinas 2020-04-08  171  	return val;
22fba7c9a015db8 Catalin Marinas 2020-04-08  172  }
22fba7c9a015db8 Catalin Marinas 2020-04-08  173  #endif
22fba7c9a015db8 Catalin Marinas 2020-04-08  174  

:::::: The code at line 137 was first introduced by commit
:::::: 22fba7c9a015db8fbbb8004e8f82b10767d98de5 arm64: mte: Check the DT memory nodes for MTE support

:::::: TO: Catalin Marinas <catalin.marinas@arm.com>
:::::: CC: Catalin Marinas <catalin.marinas@arm.com>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 71821 bytes --]

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox