From: Juri Lelli <juri.lelli@arm.com>
To: Kirill Tkhai <ktkhai@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Kirill Tkhai <tkhai@yandex.ru>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Juri Lelli <juri.lelli@gmail.com>
Subject: Re: [PATCH v2 1/3] sched/dl: Implement cancel_dl_timer() to use in switched_from_dl()
Date: Tue, 21 Oct 2014 12:41:11 +0100
Message-ID: <54464657.1060000@arm.com>
In-Reply-To: <1413888481.19914.45.camel@tkhai>

On 21/10/14 11:48, Kirill Tkhai wrote:
> On Tue, 21/10/2014 at 11:30 +0100, Juri Lelli wrote:
>> Hi Kirill,
>>
>> sorry for the late reply, but I was busy doing other stuff and then
>> travelling.
>>
>> On 02/10/14 11:05, Kirill Tkhai wrote:
>>> On Thu, 02/10/2014 at 11:34 +0200, Peter Zijlstra wrote:
>>>> On Wed, Oct 01, 2014 at 01:04:22AM +0400, Kirill Tkhai wrote:
>>>>> From: Kirill Tkhai <ktkhai@parallels.com>
>>>>>
>>>>> hrtimer_try_to_cancel() may bring a surprise: the call may fail.
>>>>
>>>> Well, not really a surprise that, it's a _try_ operation after all.
>>>>
>>>>> raw_spin_lock(&rq->lock)
>>>>> ...                            dl_task_timer                 raw_spin_lock(&rq->lock)
>>>>> ...                               raw_spin_lock(&rq->lock)   ...
>>>>>    switched_from_dl()             ...                        ...
>>>>>       hrtimer_try_to_cancel()     ...                        ...
>>>>>    switched_to_fair()             ...                        ...
>>>>> ...                               ...                        ...
>>>>> ...                               ...                        ...
>>>>> raw_spin_unlock(&rq->lock)        ...                        (acquired)
>>>>> ...                               ...                        ...
>>>>> ...                               ...                        ...
>>>>> do_exit()                         ...                        ...
>>>>>    schedule()                     ...                        ...
>>>>>       raw_spin_lock(&rq->lock)    ...                        raw_spin_unlock(&rq->lock)
>>>>>       ...                         ...                        ...
>>>>>       raw_spin_unlock(&rq->lock)  ...                        raw_spin_lock(&rq->lock)
>>>>>       ...                         ...                        (acquired)
>>>>>       put_task_struct()           ...                        ...
>>>>>           free_task_struct()      ...                        ...
>>>>>       ...                         ...                        raw_spin_unlock(&rq->lock)
>>>>> ...                               (acquired)                 ...
>>>>> ...                               ...                        ...
>>>>> ...                               Surprise!!!                ...
>>>>>
>>>>> So, let's implement a 100% guaranteed way to cancel the timer and
>>>>> be sure we are safe even in very unlikely situations.
>>>>>
>>>>> We do not create any problem by unlocking rq, because that may
>>>>> already happen below in pull_dl_task(). There is no problem with
>>>>> deadline task balancing either.
>>>>
>>>> That doesn't sound right. pull_dl_task() is an entirely different
>>>> callchain than switched_from(). Now it might still be fine, but you
>>>> cannot compare it with pull_dl_task.
>>>
>>> I mean that the caller of switched_from_dl() already knows about this
>>> situation, and we do not limit the area of its use.
>>>
>>
>> Not sure what you mean by "the caller already knows...". Also, can you
>> give more detail about the different callchains?
> 
> We have only one caller of switched_from_dl(): check_class_changed().
> This function does not assume that the lock stays held for the whole call.
> 
> What other details do you want?
> 

OK, it's clearer now, thanks. I was just wondering about what Peter
asked. If you can give more detail about why we are still fine with it,
instead of just "it already was possible in pull_dl_task() below",
that would be nice to have.

Also, check_class_changed() is called from several places
(rt_mutex_setprio() for example). Are we fine with all of these call
sites as well?

>>
>> Do you have any test for this situation? Have you experienced any crashes?
>> As you know, the replenishment timer is of key importance for us, and
>> I'd like to be 100% sure we don't introduce any problems with this
>> change :).
> 
> No, I haven't written any tests to reproduce this specific situation.
> I found it by code analysis. We fixed the problem with the rq change
> in dl_task_timer() the same way:
> 
>     http://www.spinics.net/lists/stable/msg49080.html
>

Yeah, but I did write a test for that race:

 "Juri Lelli reports he got this race when dl_bandwidth_enabled()
  was not set."

And after that I felt more confident about the change :).

> Do you agree that the race is there? This is my fix, and if it brings
> a problem, please clarify it.
> 

Yeah, it seems that the race may happen. I'm just saying that it would
be nice to see it happening before we fix the thing. I wish I had some
time to try to set up a test. Even if I can't spot any problems with your
patch, apart from the small comments below, not being completely confident
that this doesn't introduce a regression elsewhere led me to ask for
more details.
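
Short of a real reproducer, the interleaving in the changelog can at least
be replayed in a single-threaded userspace model. The sketch below is not
kernel code: every model_* name is made up for illustration, and the only
thing borrowed from the kernel is the return convention of
hrtimer_try_to_cancel() (0 = not active, 1 = dequeued, -1 = callback is
running). It replays the sequence once with the -1 result ignored, as in
the current switched_from_dl(), and once with the caller waiting for the
callback before the task can be freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are not kernel structures. */
enum model_timer_state {
	MODEL_INACTIVE,		/* not queued, callback not running */
	MODEL_ENQUEUED,		/* queued, callback has not started */
	MODEL_RUNNING,		/* callback started, e.g. spinning on rq->lock */
};

struct model_task {
	bool freed;			/* stands in for free_task_struct() */
	enum model_timer_state timer;	/* stands in for p->dl.dl_timer */
};

/* Same return convention as hrtimer_try_to_cancel(). */
static int model_try_to_cancel(struct model_task *p)
{
	switch (p->timer) {
	case MODEL_ENQUEUED:
		p->timer = MODEL_INACTIVE;
		return 1;
	case MODEL_RUNNING:
		return -1;	/* cannot stop a callback that already started */
	default:
		return 0;
	}
}

/*
 * The callback finally gets rq->lock and touches the task.
 * Returns true if the task had already been freed at that point.
 */
static bool model_callback_body(struct model_task *p)
{
	bool use_after_free = p->freed;

	p->timer = MODEL_INACTIVE;
	return use_after_free;
}

/*
 * Replay the changelog interleaving. With honour_failure set, the
 * caller waits for the callback (as hrtimer_cancel() would) before
 * the task can go away. Returns true if a use-after-free occurred.
 */
static bool model_replay(bool honour_failure)
{
	struct model_task p = { .freed = false, .timer = MODEL_RUNNING };
	bool uaf = false;

	/* switched_from_dl(): the callback is already spinning on rq->lock */
	if (model_try_to_cancel(&p) == -1 && honour_failure)
		uaf = model_callback_body(&p);

	/* do_exit() ... put_task_struct() ... free_task_struct() */
	p.freed = true;

	/* A callback that was never waited for runs only now. */
	if (p.timer == MODEL_RUNNING)
		uaf = model_callback_body(&p);

	return uaf;
}
```

In this model, model_replay(false) reports the use-after-free and
model_replay(true) does not. It says nothing about how likely the window is
on real hardware, only that the ordering is possible.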

> I'm waiting for your reply.
> 
> Thanks,
> Kirill
> 
>>> Does this sound better?
>>>
>>> [PATCH] sched/dl: Implement cancel_dl_timer() to use in switched_from_dl()
>>>     
>>> Currently used hrtimer_try_to_cancel() is racy:
>>>
>>> raw_spin_lock(&rq->lock)
>>> ...                            dl_task_timer                 raw_spin_lock(&rq->lock)
>>> ...                               raw_spin_lock(&rq->lock)   ...
>>>    switched_from_dl()             ...                        ...
>>>       hrtimer_try_to_cancel()     ...                        ...
>>>    switched_to_fair()             ...                        ...
>>> ...                               ...                        ...
>>> ...                               ...                        ...
>>> raw_spin_unlock(&rq->lock)        ...                        (acquired)
>>> ...                               ...                        ...
>>> ...                               ...                        ...
>>> do_exit()                         ...                        ...
>>>    schedule()                     ...                        ...
>>>       raw_spin_lock(&rq->lock)    ...                        raw_spin_unlock(&rq->lock)
>>>       ...                         ...                        ...
>>>       raw_spin_unlock(&rq->lock)  ...                        raw_spin_lock(&rq->lock)
>>>       ...                         ...                        (acquired)
>>>       put_task_struct()           ...                        ...
>>>           free_task_struct()      ...                        ...
>>>       ...                         ...                        raw_spin_unlock(&rq->lock)
>>> ...                               (acquired)                 ...
>>> ...                               ...                        ...
>>> ...                               (use after free)           ...
>>>
>>>     
>>> So, let's implement a 100% guaranteed way to cancel the timer and
>>> be sure we are safe even in very unlikely situations.
>>>
>>> Unlocking rq does not limit where switched_from_dl() can be used, because
>>> it was already possible in pull_dl_task() below.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
>>>
>>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>>> index abfaf3d..63f8b4a 100644
>>> --- a/kernel/sched/deadline.c
>>> +++ b/kernel/sched/deadline.c
>>> @@ -555,11 +555,6 @@ void init_dl_task_timer(struct sched_dl_entity *dl_se)
>>>  {
>>>  	struct hrtimer *timer = &dl_se->dl_timer;
>>>  
>>> -	if (hrtimer_active(timer)) {
>>> -		hrtimer_try_to_cancel(timer);
>>> -		return;
>>> -	}
>>> -
>>>  	hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>>>  	timer->function = dl_task_timer;
>>>  }
>>> @@ -1567,10 +1562,34 @@ void init_sched_dl_class(void)
>>>  
>>>  #endif /* CONFIG_SMP */
>>>  
>>> +/*
>>> + *  Surely cancel task's dl_timer. May drop rq->lock.
>>> + */

Maybe we can add comments explaining why we are fine releasing the lock
here.

>>> +static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
>>> +{
>>> +	struct hrtimer *dl_timer = &p->dl.dl_timer;
>>> +
>>> +	/* Nobody will change task's class if pi_lock is held */
>>> +	lockdep_assert_held(&p->pi_lock);
>>> +
>>> +	if (hrtimer_active(dl_timer)) {
>>> +		int ret = hrtimer_try_to_cancel(dl_timer);
>>> +
>>> +		if (unlikely(ret == -1)) {
>>> +			/*
>>> +			 * Note, p may migrate OR new deadline tasks
>>> +			 * may appear in rq when we are unlocking it.
>>> +			 */

Yeah, maybe some comments here too on why this is all good?
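
As a starting point for such a comment, here is a deterministic,
single-threaded model of the lock ordering. It covers only why the lock has
to be dropped around hrtimer_cancel(), not why dropping it is safe for the
callers. It assumes the callback needs rq->lock to finish, as in the
changelog diagram, and that hrtimer_cancel() waits for the callback. All
model_* and OWNER_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative only. */
enum model_owner { OWNER_NONE, OWNER_CALLER, OWNER_TIMER };

struct model_state {
	enum model_owner rq_lock;	/* who holds the modelled rq->lock */
	bool callback_done;		/* has the timer callback finished? */
};

/*
 * One step of the timer callback, which needs rq->lock to finish.
 * Returns true if it made progress, false if it has nothing to do
 * or is still spinning on the lock.
 */
static bool model_callback_step(struct model_state *s)
{
	if (s->callback_done || s->rq_lock != OWNER_NONE)
		return false;
	s->rq_lock = OWNER_TIMER;	/* raw_spin_lock(&rq->lock) */
	s->callback_done = true;	/* the callback's real work */
	s->rq_lock = OWNER_NONE;	/* raw_spin_unlock(&rq->lock) */
	return true;
}

/*
 * Models the waiting part of hrtimer_cancel(). Returns false when the
 * callback can never finish, which would be a deadlock in real life.
 */
static bool model_cancel_wait(struct model_state *s)
{
	while (!s->callback_done)
		if (!model_callback_step(s))
			return false;
	return true;
}

/*
 * The caller holds the lock on entry and on exit. With drop_lock set,
 * this follows the unlock / cancel / lock sequence from the patch.
 */
static bool model_cancel_dl_timer(struct model_state *s, bool drop_lock)
{
	bool ok;

	assert(s->rq_lock == OWNER_CALLER);
	if (drop_lock)
		s->rq_lock = OWNER_NONE;
	ok = model_cancel_wait(s);
	s->rq_lock = OWNER_CALLER;
	return ok;
}
```

With drop_lock false the wait can never complete, because the callback is
stuck behind the lock the caller holds. With drop_lock true it completes,
and the caller gets the lock back afterwards. Whether p or the rq contents
may have changed in the meantime is exactly what the comment should cover.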

Thanks a lot Kirill!

Best,

- Juri

>>> +			raw_spin_unlock(&rq->lock);
>>> +			hrtimer_cancel(dl_timer);
>>> +			raw_spin_lock(&rq->lock);
>>> +		}
>>> +	}
>>> +}
>>> +
>>>  static void switched_from_dl(struct rq *rq, struct task_struct *p)
>>>  {
>>> -	if (hrtimer_active(&p->dl.dl_timer) && !dl_policy(p->policy))
>>> -		hrtimer_try_to_cancel(&p->dl.dl_timer);
>>> +	cancel_dl_timer(rq, p);
>>>  
>>>  	__dl_clear_params(p);
>>>  
>>>
>>>
>>>
>>
> 
> 
> 


