Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Luca Abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@arm.com>,
	Claudio Scordino <claudio@evidence.eu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Tommaso Cucinotta <tommaso.cucinotta@sssup.it>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Joel Fernandes <joelaf@google.com>,
	Mathieu Poirier <mathieu.poirier@linaro.org>
Subject: Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
Date: Mon, 24 Jul 2017 10:06:09 +0200	[thread overview]
Message-ID: <20170724100609.4ec38473@luca> (raw)
In-Reply-To: <20170324224715.4098dbfb@nowhere>

Hi Peter,

On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni <luca.abeni@santannapisa.it> wrote:

> Hi Peter,
> 
> On Fri, 24 Mar 2017 14:20:41 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> >   
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index d67eee8..952cac8 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> > >  	 *
> > >  	 * @dl_yielded tells if task gave up the CPU before
> > > consuming
> > >  	 * all its available runtime during the last job.
> > > +	 *
> > > +	 * @dl_non_contending tells if task is inactive while still
> > > +	 * contributing to the active utilization. In other words,
> > > it
> > > +	 * indicates if the inactive timer has been armed and its
> > > handler
> > > +	 * has not been executed yet. This flag is useful to avoid
> > > race
> > > +	 * conditions between the inactive timer handler and the
> > > wakeup
> > > +	 * code.
> > >  	 */
> > >  	int				dl_throttled;
> > >  	int				dl_boosted;
> > >  	int				dl_yielded;
> > > +	int				dl_non_contending;    
> > 
> > We should maybe look into making those bits :1, not something for this
> > patch though;  
> 
> Yes, grouping all the flags in a single field was my intention too... I
> planned to submit a patch to do this after merging the reclaiming
> patches... But maybe it is better to do this first :)

I implemented this change, but before submitting the patch I have a
small question.
I implemented some helpers to access the various
{throttled,boosted,yielded,non_contending} flags. I have some
"dl_{throttled,boosted,...}()" inline functions for reading the values
of the flags, and some inline functions for setting / clearing the
flags. For these, I have two possibilities:
- using two separate "dl_set_{throttled,...}()" and
  "dl_clear_{throttled,..}()" functions for each flag
- using one single "dl_set_{throttled,...}(dl, value)" function per
  flag, in which the flag's value is specified.

I have no preferences (with the first proposal, I introduce more inline
functions, but I think the functions can be made more efficient /
optimized). Which one of the two proposals is preferred? (or, there is
a third, better, idea that I overlooked?)


			Thanks,
				Luca

> 
> 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index cef9adb..86aed82 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq
> > > *dl_rq) dl_rq->running_bw = 0;
> > >  }
> > >  
> > > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > > +{
> > > +	if (!task_on_rq_queued(p)) {
> > > +		struct rq *rq = task_rq(p);
> > > +
> > > +		if (p->dl.dl_non_contending) {
> > > +			sub_running_bw(p->dl.dl_bw, &rq->dl);
> > > +			p->dl.dl_non_contending = 0;
> > > +			/*
> > > +			 * If the timer handler is currently
> > > running and the
> > > +			 * timer cannot be cancelled,
> > > inactive_task_timer()
> > > +			 * will see that dl_not_contending is not
> > > set, and
> > > +			 * will not touch the rq's active
> > > utilization,
> > > +			 * so we are still safe.
> > > +			 */
> > > +			if
> > > (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> > > +				put_task_struct(p);
> > > +		}
> > > +	}
> > > +}    
> > 
> > If we rearrange that slightly we can avoid the double indent:  
> [...]
> Ok; I'll rework the code in this way on Monday.
> 
> [...]
> > > +		if (hrtimer_try_to_cancel(&dl_se->inactive_timer)
> > > == 1)
> > > +			put_task_struct(dl_task_of(dl_se));
> > > +		dl_se->dl_non_contending = 0;    
> > 
> > This had me worried for a little while as being a use-after-free, but
> > this put_task_struct() cannot be the last in this case. Might be
> > worth a comment.  
> 
> Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
> hrtimer_try_to_cancel()?
> 
> >   
> > > +	} else {
> > > +		/*
> > > +		 * Since "dl_non_contending" is not set, the
> > > +		 * task's utilization has already been removed from
> > > +		 * active utilization (either when the task
> > > blocked,
> > > +		 * when the "inactive timer" fired).
> > > +		 * So, add it back.
> > > +		 */
> > > +		add_running_bw(dl_se->dl_bw, dl_rq);
> > > +	}
> > > +}    
> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.  
> 
> Ok... Since I am not good at ascii art, would it be ok to add a textual
> description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active utilization
> when the task becomes active (is enqueued in the runqueue), and is
> removed when the task becomes inactive. A task does not become
> immediately inactive when it blocks, but becomes inactive at the so
> called "0 lag time"; so, we setup the "inactive timer" to fire at the
> "0 lag time". When the "inactive timer" fires, the task utilization is
> removed from the runqueue's active utilization. If the task wakes up
> again on the same runqueue before the "0 lag time", the active
> utilization must not be changed and the "inactive timer" must be
> cancelled. If the task wakes up again on a different runqueue before
> the "0 lag time", then the task's utilization must be removed from the
> previous runqueue's active utilization and must be added to the new
> runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while the
> "inactive timer" is running on a different CPU, the "dl_non_contending"
> flag is used to indicate that a task is not on a runqueue but is active
> (so, the flag is set when the task blocks and is cleared when the
> "inactive timer" fires or when the task  wakes up).
> "
> (if this is ok, where can I add this comment?)
> 
> 
> > > +static void migrate_task_rq_dl(struct task_struct *p)
> > > +{
> > > +	if ((p->state == TASK_WAKING) &&
> > > (p->dl.dl_non_contending)) {
> > > +		struct rq *rq = task_rq(p);
> > > +
> > > +		raw_spin_lock(&rq->lock);
> > > +		sub_running_bw(p->dl.dl_bw, &rq->dl);
> > > +		p->dl.dl_non_contending = 0;
> > > +		/*
> > > +		 * If the timer handler is currently running and
> > > the
> > > +		 * timer cannot be cancelled, inactive_task_timer()
> > > +		 * will see that dl_not_contending is not set, and
> > > +		 * will not touch the rq's active utilization,
> > > +		 * so we are still safe.
> > > +		 */
> > > +		if (hrtimer_try_to_cancel(&p->dl.inactive_timer)
> > > == 1)
> > > +			put_task_struct(p);
> > > +
> > > +		raw_spin_unlock(&rq->lock);
> > > +	}
> > > +}    
> > 
> > This can similarly be reduced in indent, albeit this is only a single
> > indent level, so not as onerous as the other one.  
> 
> Ok; I'll do this on Monday.
> 
> 
> > What had me puzzled for a while here is taking the lock; because some
> > callers of set_task_cpu() must in fact hold this lock already. Worth a
> > comment I feel.  
> 
> Ok; I'll add a comment mentioning that since state is TASK_WAKING we do
> not have the lock.
> 
> 
> > Once I figured out the exact locking; I realized you do this on
> > cross-cpu wakeups. We spend a fair amount of effort not to take both
> > rq locks. But I suppose you have to here.  
> 
> The problem is that when a task wakes up before the "0 lag time" on a
> different runqueue, we must "migrate" its utilization from the old
> runqueue to the new one (see comment above). And I need the lock to
> avoid races... The only alternative I can think about is to ad a new
> lock protecting the active utilization, and to take it instead of the
> runqueue lock (I do not know if this is enough, I need to check...).
> I'll have a look on Monday.
> 
> 
> 
> 			Thanks,
> 				Luca

next prev parent reply	other threads:[~2017-07-24  8:06 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-24  3:52 [RFC v5 0/9] CPU reclaiming for SCHED_DEADLINE luca abeni
2017-03-24  3:52 ` [RFC v5 1/9] sched/deadline: track the active utilization luca abeni
2017-03-26 17:04   ` Mathieu Poirier
2017-03-26 20:55     ` luca abeni
2017-03-24  3:52 ` [RFC v5 2/9] sched/deadline: improve the tracking of " luca abeni
2017-03-24 13:20   ` Peter Zijlstra
2017-03-24 21:47     ` luca abeni
2017-03-25  2:31       ` Steven Rostedt
2017-03-27  8:20         ` Luca Abeni
2017-03-27  8:54           ` Claudio Scordino
2017-03-27  7:17       ` Juri Lelli
2017-03-27  7:43         ` Luca Abeni
2017-03-27  8:45           ` Juri Lelli
2017-03-27  7:36       ` Luca Abeni
2017-07-24  8:06       ` Luca Abeni [this message]
2017-07-24  9:04         ` Peter Zijlstra
2017-07-25  6:41           ` Luca Abeni
2017-03-24 13:23   ` Peter Zijlstra
2017-07-24  7:54     ` Luca Abeni
2017-07-24  9:11       ` Peter Zijlstra
2017-07-25  6:46         ` Luca Abeni
2017-03-26 17:32   ` Mathieu Poirier
2017-03-26 21:01     ` luca abeni
2017-03-24  3:52 ` [RFC v5 3/9] sched/deadline: fix the update of the total -deadline utilization luca abeni
2017-03-24  3:52 ` [RFC v5 4/9] sched/deadline: implement GRUB accounting luca abeni
2017-03-24  3:52 ` [RFC v5 5/9] sched/deadline: do not reclaim the whole CPU bandwidth luca abeni
2017-03-24 14:00   ` Peter Zijlstra
2017-03-24 21:58     ` luca abeni
2017-03-25  2:38       ` Steven Rostedt
2017-03-27  8:35         ` Peter Zijlstra
2017-03-24  3:52 ` [RFC v5 6/9] sched/deadline: make GRUB a task's flag luca abeni
2017-03-24  3:53 ` [RFC v5 7/9] sched/deadline: track the "total rq utilization" too luca abeni
2017-03-24  3:53 ` [RFC v5 8/9] sched/deadline: base GRUB reclaiming on the inactive utilization luca abeni
2017-03-27 14:26   ` Peter Zijlstra
2017-03-27 14:56     ` Luca Abeni
2017-03-27 15:53       ` Peter Zijlstra
2017-03-27 17:02         ` luca abeni
2017-05-08  7:41     ` Luca Abeni
2017-05-08  8:06       ` Peter Zijlstra
2017-05-09  9:37         ` Luca Abeni
2017-03-24  3:53 ` [RFC v5 9/9] sched/deadline: also reclaim bandwidth not used by dl tasks luca abeni
2017-03-27 14:03   ` Peter Zijlstra
2017-03-27 14:48     ` Luca Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170724100609.4ec38473@luca \
    --to=luca.abeni@santannapisa.it \
    --cc=bristot@redhat.com \
    --cc=claudio@evidence.eu.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.poirier@linaro.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tommaso.cucinotta@sssup.it \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox