From: James Clark <james.clark@arm.com>
To: hejunhao <hejunhao3@huawei.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
jonathan.cameron@huawei.com, mike.leach@linaro.org,
linuxarm@huawei.com, yangyicong@huawei.com,
prime.zeng@hisilicon.com
Subject: Re: [PATCH v2 1/2] coresight: trbe: Fix TRBE potential sleep in atomic context
Date: Thu, 17 Aug 2023 10:57:41 +0100
Message-ID: <5f7e6321-5d19-e11e-2fc4-cda2d6cc6ffe@arm.com>
In-Reply-To: <308c70c7-c8db-158a-d9ad-bcc1f0db77b9@huawei.com>
On 17/08/2023 09:41, hejunhao wrote:
> Hi Anshuman Khandual,
>
>
> On 2023/8/17 15:13, Anshuman Khandual wrote:
>> Hello Junhao,
>>
>> On 8/16/23 19:40, Suzuki K Poulose wrote:
>>> From: Junhao He <hejunhao3@huawei.com>
>>>
>>> smp_call_function_single() will allocate an IPI interrupt vector to
>>> the target processor and send a function call request to the interrupt
>>> vector. After the target processor receives the IPI interrupt, it will
>>> execute arm_trbe_remove_coresight_cpu() call request in the interrupt
>>> handler.
>>>
>>> According to the device_unregister() stack information, if another process
>>> is using the device, the down_write() may sleep, and trigger deadlocks
>>> or unexpected errors.
>>>
>>>   arm_trbe_remove_coresight_cpu
>>>     coresight_unregister
>>>       device_unregister
>>>         device_del
>>>           kobject_del
>>>             __kobject_del
>>>               sysfs_remove_dir
>>>                 kernfs_remove
>>>                   down_write ---------> it may sleep
>> But how did you really detect this problem? Does this show up as a
>> warning when you enable lockdep debug? Or did it really happen during a
>> real workload execution followed by a TRBE module unload? Although the
>> problem seems plausible (and needs fixing), I am just wondering how we
>> triggered this.
>
> Yes, it really happened during a real workload.
>
> The TRBE driver is loaded and unloaded in a loop, using the following
> test script:
>
> for ((i=0;i<99999;i++))
> do
>     insmod coresight-trbe.ko;
>     rmmod coresight-trbe.ko;
>     echo "loop $i";
> done
>
> The kernel will report a panic.
>
I wonder how easy it would be to add a kselftest to do this with all of
the Coresight modules, because we also had a problem with bad reference
counting preventing an unload of the CTI module. Although that did
require starting a perf session, which might complicate the test.
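
As a rough, untested sketch of what such a load/unload test could look
like: the MODULES list, the LOOPS count, the DRY_RUN switch and the
function names below are all assumptions made up for illustration, not
an existing kselftest. It only covers the plain load/unload cycle, not
the perf session case.

```shell
#!/bin/sh
# Sketch only: cycle a list of modules through load/unload.
# MODULES is an assumed list; extend it with whatever Coresight
# modules the kernel under test actually builds.
MODULES=${MODULES:-"coresight-trbe"}
LOOPS=${LOOPS:-3}
# DRY_RUN=1 (the default) only prints the commands.
# Set DRY_RUN=0 and run as root to really load and unload.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
	if [ "$DRY_RUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Load then unload every module in MODULES, LOOPS times.
# Returns non-zero as soon as one modprobe call fails.
cycle_modules() {
	i=0
	while [ "$i" -lt "$LOOPS" ]; do
		for mod in $MODULES; do
			run modprobe "$mod" || return 1
			run modprobe -r "$mod" || return 1
		done
		i=$((i + 1))
	done
	return 0
}
```

Sourcing this and calling cycle_modules with DRY_RUN=0 as root would do
the actual cycling; a panic or a failed unload would then show up as a
hang or a non-zero return.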
>>> Add a helper arm_trbe_disable_cpu() to disable the TRBE percpu irq and
>>> reset the per-CPU TRBE.
>>> Simply call arm_trbe_remove_coresight_cpu() directly without using
>>> smp_call_function_single(), which is the same as registering the TRBE
>>> coresight device.
>>>
>>> Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
>>> Signed-off-by: Junhao He <hejunhao3@huawei.com>
>>> Link: https://lore.kernel.org/r/20230814093813.19152-2-hejunhao3@huawei.com
>>> [ Remove duplicate cpumask checks during removal ]
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>> drivers/hwtracing/coresight/coresight-trbe.c | 33 +++++++++++---------
>>> 1 file changed, 18 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>>> index 7720619909d6..025f70adee47 100644
>>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>>> @@ -1225,6 +1225,17 @@ static void arm_trbe_enable_cpu(void *info)
>>>  	enable_percpu_irq(drvdata->irq, IRQ_TYPE_NONE);
>>>  }
>>>
>>> +static void arm_trbe_disable_cpu(void *info)
>>> +{
>>> +	struct trbe_drvdata *drvdata = info;
>>> +	struct trbe_cpudata *cpudata = this_cpu_ptr(drvdata->cpudata);
>>> +
>>> +	disable_percpu_irq(drvdata->irq);
>>> +	trbe_reset_local(cpudata);
>>> +	cpudata->drvdata = NULL;
>>> +}
>>> +
>>> +
>>>  static void arm_trbe_register_coresight_cpu(struct trbe_drvdata *drvdata, int cpu)
>>>  {
>>>  	struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>>> @@ -1326,18 +1337,12 @@ static void arm_trbe_probe_cpu(void *info)
>>>  	cpumask_clear_cpu(cpu, &drvdata->supported_cpus);
>>>  }
>>>
>>> -static void arm_trbe_remove_coresight_cpu(void *info)
>>> +static void arm_trbe_remove_coresight_cpu(struct trbe_drvdata *drvdata, int cpu)
>>>  {
>>> -	int cpu = smp_processor_id();
>>> -	struct trbe_drvdata *drvdata = info;
>>> -	struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>>>  	struct coresight_device *trbe_csdev = coresight_get_percpu_sink(cpu);
>>>
>>> -	disable_percpu_irq(drvdata->irq);
>>> -	trbe_reset_local(cpudata);
>>>  	if (trbe_csdev) {
>>>  		coresight_unregister(trbe_csdev);
>>> -		cpudata->drvdata = NULL;
>>>  		coresight_set_percpu_sink(cpu, NULL);
>>>  	}
>>>  }
>>> @@ -1366,8 +1371,10 @@ static int arm_trbe_remove_coresight(struct trbe_drvdata *drvdata)
>>>  {
>>>  	int cpu;
>>>
>>> -	for_each_cpu(cpu, &drvdata->supported_cpus)
>>> -		smp_call_function_single(cpu, arm_trbe_remove_coresight_cpu, drvdata, 1);
>>> +	for_each_cpu(cpu, &drvdata->supported_cpus) {
>>> +		smp_call_function_single(cpu, arm_trbe_disable_cpu, drvdata, 1);
>>> +		arm_trbe_remove_coresight_cpu(drvdata, cpu);
>>> +	}
>>>  	free_percpu(drvdata->cpudata);
>>>  	return 0;
>>>  }
>>> @@ -1406,12 +1413,8 @@ static int arm_trbe_cpu_teardown(unsigned int cpu, struct hlist_node *node)
>>>  {
>>>  	struct trbe_drvdata *drvdata = hlist_entry_safe(node, struct trbe_drvdata, hotplug_node);
>>>
>>> -	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus)) {
>>> -		struct trbe_cpudata *cpudata = per_cpu_ptr(drvdata->cpudata, cpu);
>>> -
>>> -		disable_percpu_irq(drvdata->irq);
>>> -		trbe_reset_local(cpudata);
>>> -	}
>>> +	if (cpumask_test_cpu(cpu, &drvdata->supported_cpus))
>>> +		arm_trbe_disable_cpu(drvdata);
>> This code hunk seems unrelated to the context here, other than finding
>> another use case for arm_trbe_disable_cpu(). The problem is that
>> arm_trbe_disable_cpu() resets cpudata->drvdata, which might not get
>> re-initialized in arm_trbe_cpu_startup(), as there will still be a
>> per-CPU sink associated, as confirmed with coresight_get_percpu_sink().
>> I guess it might be better to drop this change and keep everything
>> limited to reworking the SMP IPI callback in arm_trbe_remove_coresight().
>
> OK, will fix it. The change is just to simplify the code of cpu_teardown.
> Maybe we can consider whether we need to set "cpudata->drvdata = NULL"
> in arm_trbe_disable_cpu()? If it's not necessary, this hunk can be kept,
> and we just drop the cpudata->drvdata reset from arm_trbe_disable_cpu().
>
> Best regards,
> Junhao.
>
>>>  	return 0;
>>>  }
>>>