All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jon Hunter <jonathanh@nvidia.com>
To: Dmitry Osipenko <digetx@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>,
	Thierry Reding <thierry.reding@gmail.com>,
	Alexandre Courbot <gnurou@gmail.com>,
	"Peter De Schrijver" <pdeschrijver@nvidia.com>,
	Prashant Gaikwad <pgaikwad@nvidia.com>,
	<linux-clk@vger.kernel.org>, <linux-tegra@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] soc/tegra: pmc: Fix "scheduling while atomic"
Date: Thu, 26 May 2016 16:27:19 +0100	[thread overview]
Message-ID: <574715D7.4060004@nvidia.com> (raw)
In-Reply-To: <d7a01310-869e-197e-9fbe-5b4783629e10@gmail.com>


On 26/05/16 15:57, Dmitry Osipenko wrote:
> On 26.05.2016 17:32, Jon Hunter wrote:
>>
>> On 26/05/16 12:42, Dmitry Osipenko wrote:
>>> On 26.05.2016 11:42, Jon Hunter wrote:
>>>>
>>>> On 25/05/16 19:51, Dmitry Osipenko wrote:
>>>>> On 25.05.2016 18:09, Jon Hunter wrote:
>>>>
>>>> ...
>>>>
>>>>>> If you are able to reproduce this on v3.18, then it would be good if
>>>>>> you
>>>>>> could trace the CCF calls around this WARNING to see what is causing
>>>>>> the
>>>>>> contention.
>>>>>
>>>>> I managed to reproduce it with some CCF "tracing".
>>>>> Full kmsg log is here: https://bpaste.net/show/d8ab7b7534b7
>>>>>
>>>>> Looks like CPU freq governor thread yields during clk_set_rate() and
>>>>> then CPU idle kicks in, taking the same mutex.
>>>>
>>>> On the surface that sounds odd to me, but without understanding the
>>>> details, I guess I don't know if this is a valid thing to be doing or
>>>> even how that actually works!
>>>>
>>>
>>> The reason of that happening should be that I'm using clk PRE/POST rate
>>> change notifiers in my DVFS driver that takes other mutexes and they
>>> could be locked, causing schedule. I haven't mentioned it before, sorry.
>>
>> OK, but I am not sure how these "other mutexes" would be relevant here
>> without any more details.
>>
>>> From drivers/clk/clk.c:
>>>
>>> static struct task_struct *prepare_owner;
>>>
>>> ...
>>>
>>> /***           locking             ***/
>>> static void clk_prepare_lock(void)
>>> {
>>>     if (!mutex_trylock(&prepare_lock)) {
>>>         if (prepare_owner == current) {
>>>             prepare_refcnt++;
>>>             return;
>>>         }
>>>         mutex_lock(&prepare_lock);
>>>     }
>>>
>>> You can see that it would lock the mutex if prepare_owner != current, in
>>> my case it's idle thread != interactive gov. thread.
>>
>> Right, but that would imply that someone else is actively doing
>> something with a clock. However, if we are entering LP2, then that
>> implies that all CPUs are idle and so I still don't understand the
>> scenario where this would be locked in that case. May be there is
>> something I am overlooking here?
>>
>>>>> However, cpufreq_interactive governor is android specific governor and
>>>>> isn't in upstream kernel yet. Quick googling shows that recent
>>>>> "upstreaming" patch uses same cpufreq_interactive_speedchange_task:
>>>>> https://lkml.org/lkml/2016/5/20/41
>>>>
>>>> Do you know if this version they are upstreaming could also yield
>>>> during
>>>> the clk_set_rate()?
>>>>
>>>
>>> I think it should be assumed that any clk_set_rate() potentially could.
>>> Please correct me if I'm wrong.
>>>
>>>>> I'm not aware of other possibility to reproduce this issue, it needs
>>>>> some CCF interaction from a separate task. So the current upstream
>>>>> kernel shouldn't be affected, I guess.
>>>>
>>>> What still does not make sense to me is why any frequency changes have
>>>> not completed before we attempt to enter the LP2 state?
>>>>
>>> Why not? I don't see any CPUIDLE <-> CPUFREQ interlocking. Do you think
>>> it could be harmful somehow?
>>
>> Like I said before, I still don't understand that scenario that is
>> causing this and without being able to fully understand it, I have no
>> idea what the exact problem we are trying to fix here is.
>>
> 
> That's how I see it:
> 
> +----------------------------------------------+
> |                    CPU 0                     |
> +-------------------+--------------------------+
> |    Idle thread    | Interactive gov. thread  |
> +----------------------------------------------+
> |     inactive      |                          |
> |                   |                          |
> |                   |   CPU freq. change       |
> |                   |                          |
> |                   |   clk_set_rate()         |
> |                   |                          |
> |       ...         |   clk_prepare_lock()     |
> |                   |                          |
> |                   |   PRE rate notifier call |
> |                   |                          |
> |                   |   schedule               |

What is this notifier doing? Is there some sort of hardware activity
that it is waiting for to complete?

Cheers
Jon

-- 
nvpublic

WARNING: multiple messages have this Message-ID (diff)
From: Jon Hunter <jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
To: Dmitry Osipenko <digetx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Stephen Warren <swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>,
	Thierry Reding
	<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Alexandre Courbot
	<gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Peter De Schrijver
	<pdeschrijver-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
	Prashant Gaikwad
	<pgaikwad-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
	linux-clk-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] soc/tegra: pmc: Fix "scheduling while atomic"
Date: Thu, 26 May 2016 16:27:19 +0100	[thread overview]
Message-ID: <574715D7.4060004@nvidia.com> (raw)
In-Reply-To: <d7a01310-869e-197e-9fbe-5b4783629e10-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>


On 26/05/16 15:57, Dmitry Osipenko wrote:
> On 26.05.2016 17:32, Jon Hunter wrote:
>>
>> On 26/05/16 12:42, Dmitry Osipenko wrote:
>>> On 26.05.2016 11:42, Jon Hunter wrote:
>>>>
>>>> On 25/05/16 19:51, Dmitry Osipenko wrote:
>>>>> On 25.05.2016 18:09, Jon Hunter wrote:
>>>>
>>>> ...
>>>>
>>>>>> If you are able to reproduce this on v3.18, then it would be good if
>>>>>> you
>>>>>> could trace the CCF calls around this WARNING to see what is causing
>>>>>> the
>>>>>> contention.
>>>>>
>>>>> I managed to reproduce it with some CCF "tracing".
>>>>> Full kmsg log is here: https://bpaste.net/show/d8ab7b7534b7
>>>>>
>>>>> Looks like CPU freq governor thread yields during clk_set_rate() and
>>>>> then CPU idle kicks in, taking the same mutex.
>>>>
>>>> On the surface that sounds odd to me, but without understanding the
>>>> details, I guess I don't know if this is a valid thing to be doing or
>>>> even how that actually works!
>>>>
>>>
>>> The reason of that happening should be that I'm using clk PRE/POST rate
>>> change notifiers in my DVFS driver that takes other mutexes and they
>>> could be locked, causing schedule. I haven't mentioned it before, sorry.
>>
>> OK, but I am not sure how these "other mutexes" would be relevant here
>> without any more details.
>>
>>> From drivers/clk/clk.c:
>>>
>>> static struct task_struct *prepare_owner;
>>>
>>> ...
>>>
>>> /***           locking             ***/
>>> static void clk_prepare_lock(void)
>>> {
>>>     if (!mutex_trylock(&prepare_lock)) {
>>>         if (prepare_owner == current) {
>>>             prepare_refcnt++;
>>>             return;
>>>         }
>>>         mutex_lock(&prepare_lock);
>>>     }
>>>
>>> You can see that it would lock the mutex if prepare_owner != current, in
>>> my case it's idle thread != interactive gov. thread.
>>
>> Right, but that would imply that someone else is actively doing
>> something with a clock. However, if we are entering LP2, then that
>> implies that all CPUs are idle and so I still don't understand the
>> scenario where this would be locked in that case. May be there is
>> something I am overlooking here?
>>
>>>>> However, cpufreq_interactive governor is android specific governor and
>>>>> isn't in upstream kernel yet. Quick googling shows that recent
>>>>> "upstreaming" patch uses same cpufreq_interactive_speedchange_task:
>>>>> https://lkml.org/lkml/2016/5/20/41
>>>>
>>>> Do you know if this version they are upstreaming could also yield
>>>> during
>>>> the clk_set_rate()?
>>>>
>>>
>>> I think it should be assumed that any clk_set_rate() potentially could.
>>> Please correct me if I'm wrong.
>>>
>>>>> I'm not aware of other possibility to reproduce this issue, it needs
>>>>> some CCF interaction from a separate task. So the current upstream
>>>>> kernel shouldn't be affected, I guess.
>>>>
>>>> What still does not make sense to me is why any frequency changes have
>>>> not completed before we attempt to enter the LP2 state?
>>>>
>>> Why not? I don't see any CPUIDLE <-> CPUFREQ interlocking. Do you think
>>> it could be harmful somehow?
>>
>> Like I said before, I still don't understand that scenario that is
>> causing this and without being able to fully understand it, I have no
>> idea what the exact problem we are trying to fix here is.
>>
> 
> That's how I see it:
> 
> +----------------------------------------------+
> |                    CPU 0                     |
> +-------------------+--------------------------+
> |    Idle thread    | Interactive gov. thread  |
> +----------------------------------------------+
> |     inactive      |                          |
> |                   |                          |
> |                   |   CPU freq. change       |
> |                   |                          |
> |                   |   clk_set_rate()         |
> |                   |                          |
> |       ...         |   clk_prepare_lock()     |
> |                   |                          |
> |                   |   PRE rate notifier call |
> |                   |                          |
> |                   |   schedule               |

What is this notifier doing? Is there some sort of hardware activity
that it is waiting for to complete?

Cheers
Jon

-- 
nvpublic

  reply	other threads:[~2016-05-26 15:27 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-17 13:34 [PATCH] soc/tegra: pmc: Fix "scheduling while atomic" Dmitry Osipenko
2016-05-05 11:45 ` Dmitry Osipenko
2016-05-05 11:45   ` Dmitry Osipenko
2016-05-05 13:17 ` Jon Hunter
2016-05-05 13:17   ` Jon Hunter
2016-05-05 14:24   ` Dmitry Osipenko
2016-05-25 15:09     ` Jon Hunter
2016-05-25 15:09       ` Jon Hunter
2016-05-25 18:51       ` Dmitry Osipenko
2016-05-25 18:51         ` Dmitry Osipenko
2016-05-26  8:42         ` Jon Hunter
2016-05-26  8:42           ` Jon Hunter
2016-05-26 11:42           ` Dmitry Osipenko
2016-05-26 14:32             ` Jon Hunter
2016-05-26 14:32               ` Jon Hunter
2016-05-26 14:57               ` Dmitry Osipenko
2016-05-26 15:27                 ` Jon Hunter [this message]
2016-05-26 15:27                   ` Jon Hunter
2016-05-26 17:01                   ` Dmitry Osipenko
2016-05-27 12:46                     ` Jon Hunter
2016-05-27 12:46                       ` Jon Hunter
2016-05-27 14:43                       ` Dmitry Osipenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=574715D7.4060004@nvidia.com \
    --to=jonathanh@nvidia.com \
    --cc=digetx@gmail.com \
    --cc=gnurou@gmail.com \
    --cc=linux-clk@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=pdeschrijver@nvidia.com \
    --cc=pgaikwad@nvidia.com \
    --cc=swarren@wwwdotorg.org \
    --cc=thierry.reding@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.