* Re: [PATCH] xen: sched: avoid races on time values read from NOW()
2016-05-19 8:11 ` [PATCH] " Dario Faggioli
@ 2016-05-19 9:26 ` George Dunlap
2016-05-19 11:01 ` Wei Liu
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: George Dunlap @ 2016-05-19 9:26 UTC (permalink / raw)
To: Dario Faggioli, xen-devel; +Cc: Meng Xu, Wei Liu
On 19/05/16 09:11, Dario Faggioli wrote:
> or (even in cases where there is no race, e.g., outside
> of Credit2) avoid using a time sample which may be rather
> old, and hence stale.
>
> In fact, we should only sample NOW() from _inside_
> the critical region within which the value we read is
> used. If we don't, in case we have to spin for a while
> before entering the region, when actually using it:
>
> 1) we will use something that, at the very least, is
> not really "now", because of the spinning,
>
> 2) if someone else sampled NOW() during a critical
> region protected by the lock we are spinning on,
> and if we compare the two samples when we get
> inside our region, our one will be 'earlier',
> even if we actually arrived later, which is a
> race.
>
> In Credit2, we see an instance of 2), in runq_tickle(),
> when it is called by csched2_context_saved() as it samples
> NOW() before acquiring the runq lock. This makes things
> look like the time went backwards, and it confuses the
> algorithm (there's even a d2printk() about it, which would
> trigger all the time, if enabled).
>
> In RTDS, something similar happens in repl_timer_handler(),
> and there's another instance in schedule() (in generic code),
> so fix these cases too.
>
> While there, improve csched2_vcpu_wake() and rt_vcpu_wake()
> a little as well (removing a pointless initialization, and
> moving the sampling a bit closer to its use). These two hunks
> entail no further functional changes.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
I agree this is a fairly low-risk bugfix that should be considered for 4.7.
-George
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Wei Liu <wei.liu@citrix.com>
> ---
> xen/common/sched_credit2.c | 4 ++--
> xen/common/sched_rt.c | 7 +++++--
> xen/common/schedule.c | 4 +++-
> 3 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index f95e509..1933ff1 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -1028,7 +1028,7 @@ static void
> csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> {
> struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
> - s_time_t now = 0;
> + s_time_t now;
>
> /* Schedule lock should be held at this point. */
>
> @@ -1085,8 +1085,8 @@ static void
> csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
> {
> struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
> - s_time_t now = NOW();
> spinlock_t *lock = vcpu_schedule_lock_irq(vc);
> + s_time_t now = NOW();
>
> BUG_ON( !is_idle_vcpu(vc) && svc->rqd != RQD(ops, vc->processor));
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index aa3ffd2..0946101 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1198,7 +1198,7 @@ static void
> rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> {
> struct rt_vcpu * const svc = rt_vcpu(vc);
> - s_time_t now = NOW();
> + s_time_t now;
> bool_t missed;
>
> BUG_ON( is_idle_vcpu(vc) );
> @@ -1225,6 +1225,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> * If a deadline passed while svc was asleep/blocked, we need new
> * scheduling parameters (a new deadline and full budget).
> */
> + now = NOW();
>
> missed = ( now >= svc->cur_deadline );
> if ( missed )
> @@ -1394,7 +1395,7 @@ rt_dom_cntl(
> * from the replq and does the actual replenishment.
> */
> static void repl_timer_handler(void *data){
> - s_time_t now = NOW();
> + s_time_t now;
> struct scheduler *ops = data;
> struct rt_private *prv = rt_priv(ops);
> struct list_head *replq = rt_replq(ops);
> @@ -1406,6 +1407,8 @@ static void repl_timer_handler(void *data){
>
> spin_lock_irq(&prv->lock);
>
> + now = NOW();
> +
> /*
> * Do the replenishment and move replenished vcpus
> * to the temporary list to tickle.
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 80fea39..5e35310 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1320,7 +1320,7 @@ static void vcpu_periodic_timer_work(struct vcpu *v)
> static void schedule(void)
> {
> struct vcpu *prev = current, *next = NULL;
> - s_time_t now = NOW();
> + s_time_t now;
> struct scheduler *sched;
> unsigned long *tasklet_work = &this_cpu(tasklet_work_to_do);
> bool_t tasklet_work_scheduled = 0;
> @@ -1355,6 +1355,8 @@ static void schedule(void)
>
> lock = pcpu_schedule_lock_irq(cpu);
>
> + now = NOW();
> +
> stop_timer(&sd->s_timer);
>
> /* get policy-specific decision on scheduling... */
>
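To make case 2) from the description concrete, here is a minimal sketch that replays one interleaving of two CPUs contending on the same lock. It is not Xen code: sim_now() is a hypothetical stand-in for NOW() backed by a hand-advanced counter, and last_update and replay() are names invented for this illustration only.

```c
#include <stdint.h>

typedef int64_t s_time_t;        /* signed time type, as used in the patch */

/* Hypothetical stand-in for NOW(): a simulated clock advanced by hand. */
static s_time_t sim_clock;
static s_time_t sim_now(void) { return sim_clock; }

/* Hypothetical state protected by the lock: time of the last update. */
static s_time_t last_update;

/*
 * Replay one interleaving of two CPUs, A and B, contending on one lock.
 * Returns the delta CPU A computes once it is inside its critical region.
 */
static s_time_t replay(int sample_inside_lock)
{
    s_time_t now_a = 0;

    sim_clock = 10;
    if (!sample_inside_lock)
        now_a = sim_now();       /* A samples first, then starts spinning */

    sim_clock = 30;
    last_update = sim_now();     /* B, holding the lock, samples and records */

    sim_clock = 50;              /* B releases the lock; A acquires it */
    if (sample_inside_lock)
        now_a = sim_now();       /* A samples only after acquiring the lock */

    return now_a - last_update;  /* negative means time "went backwards" */
}
```

With these definitions, replay(0) returns -20 (A's sample predates the one B recorded, even though A enters its region later), while replay(1) returns 20. The second ordering is the one the hunks above move to.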
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] xen: sched: avoid races on time values read from NOW()
2016-05-19 8:11 ` [PATCH] " Dario Faggioli
2016-05-19 9:26 ` George Dunlap
@ 2016-05-19 11:01 ` Wei Liu
2016-05-19 15:22 ` Meng Xu
2016-05-24 10:08 ` Jan Beulich
3 siblings, 0 replies; 8+ messages in thread
From: Wei Liu @ 2016-05-19 11:01 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Wei Liu, xen-devel, Wei Liu, George Dunlap, Meng Xu
On Thu, May 19, 2016 at 10:11:35AM +0200, Dario Faggioli wrote:
> or (even in cases where there is no race, e.g., outside
> of Credit2) avoid using a time sample which may be rather
> old, and hence stale.
>
> In fact, we should only sample NOW() from _inside_
> the critical region within which the value we read is
> used. If we don't, in case we have to spin for a while
> before entering the region, when actually using it:
>
> 1) we will use something that, at the very least, is
> not really "now", because of the spinning,
>
> 2) if someone else sampled NOW() during a critical
> region protected by the lock we are spinning on,
> and if we compare the two samples when we get
> inside our region, our one will be 'earlier',
> even if we actually arrived later, which is a
> race.
>
> In Credit2, we see an instance of 2), in runq_tickle(),
> when it is called by csched2_context_saved() as it samples
> NOW() before acquiring the runq lock. This makes things
> look like the time went backwards, and it confuses the
> algorithm (there's even a d2printk() about it, which would
> trigger all the time, if enabled).
>
> In RTDS, something similar happens in repl_timer_handler(),
> and there's another instance in schedule() (in generic code),
> so fix these cases too.
>
> While there, improve csched2_vcpu_wake() and rt_vcpu_wake()
> a little as well (removing a pointless initialization, and
> moving the sampling a bit closer to its use). These two hunks
> entail no further functional changes.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Wei Liu <wei.liu@citrix.com>
Subject to review from Meng and George:
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
* Re: [PATCH] xen: sched: avoid races on time values read from NOW()
2016-05-19 8:11 ` [PATCH] " Dario Faggioli
2016-05-19 9:26 ` George Dunlap
2016-05-19 11:01 ` Wei Liu
@ 2016-05-19 15:22 ` Meng Xu
2016-05-24 10:08 ` Jan Beulich
3 siblings, 0 replies; 8+ messages in thread
From: Meng Xu @ 2016-05-19 15:22 UTC (permalink / raw)
To: Dario Faggioli; +Cc: xen-devel@lists.xenproject.org, Wei Liu, George Dunlap
On Thu, May 19, 2016 at 4:11 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> or (even in cases where there is no race, e.g., outside
> of Credit2) avoid using a time sample which may be rather
> old, and hence stale.
>
> In fact, we should only sample NOW() from _inside_
> the critical region within which the value we read is
> used. If we don't, in case we have to spin for a while
> before entering the region, when actually using it:
>
> 1) we will use something that, at the very least, is
> not really "now", because of the spinning,
>
> 2) if someone else sampled NOW() during a critical
> region protected by the lock we are spinning on,
> and if we compare the two samples when we get
> inside our region, our one will be 'earlier',
> even if we actually arrived later, which is a
> race.
>
> In Credit2, we see an instance of 2), in runq_tickle(),
> when it is called by csched2_context_saved() as it samples
> NOW() before acquiring the runq lock. This makes things
> look like the time went backwards, and it confuses the
> algorithm (there's even a d2printk() about it, which would
> trigger all the time, if enabled).
>
> In RTDS, something similar happens in repl_timer_handler(),
> and there's another instance in schedule() (in generic code),
> so fix these cases too.
>
> While there, improve csched2_vcpu_wake() and rt_vcpu_wake()
> a little as well (removing a pointless initialization, and
> moving the sampling a bit closer to its use). These two hunks
> entail no further functional changes.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Meng Xu <mengxu@cis.upenn.edu>
> Cc: Wei Liu <wei.liu@citrix.com>
> ---
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
The bug will cause incorrect budget accounting for one VCPU when the
race occurs.
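As a rough illustration of how a stale sample could skew the accounting, consider the sketch below. It is an assumption-laden toy, not the sched_rt.c code: struct toy_vcpu and toy_burn_budget() are hypothetical names, and the real accounting path has more cases than this.

```c
#include <stdint.h>

typedef int64_t s_time_t;

/* Hypothetical, simplified per-vCPU state (not the sched_rt.c layout). */
struct toy_vcpu {
    s_time_t last_start;   /* when the vCPU last started running */
    s_time_t cur_budget;   /* budget still available */
};

/* Charge the time elapsed since last_start against the budget. */
static void toy_burn_budget(struct toy_vcpu *v, s_time_t now)
{
    s_time_t delta = now - v->last_start;   /* negative if 'now' is stale */

    v->cur_budget -= delta;
    v->last_start = now;
}
```

Starting from last_start = 100 and cur_budget = 1000, a stale now of 90 gives delta = -10 and leaves the budget at 1010, so the vCPU gains budget instead of being charged. A sample taken inside the critical region, say now = 140, charges 40 and leaves 960.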
Best Regards,
Meng
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
* Re: [PATCH] xen: sched: avoid races on time values read from NOW()
2016-05-19 8:11 ` [PATCH] " Dario Faggioli
` (2 preceding siblings ...)
2016-05-19 15:22 ` Meng Xu
@ 2016-05-24 10:08 ` Jan Beulich
2016-05-24 12:12 ` Dario Faggioli
3 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2016-05-24 10:08 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Wei Liu, xen-devel, Meng Xu, George Dunlap
>>> On 19.05.16 at 10:11, <dario.faggioli@citrix.com> wrote:
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1198,7 +1198,7 @@ static void
> rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> {
> struct rt_vcpu * const svc = rt_vcpu(vc);
> - s_time_t now = NOW();
> + s_time_t now;
> bool_t missed;
>
> BUG_ON( is_idle_vcpu(vc) );
> @@ -1225,6 +1225,7 @@ rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
> * If a deadline passed while svc was asleep/blocked, we need new
> * scheduling parameters (a new deadline and full budget).
> */
> + now = NOW();
>
> missed = ( now >= svc->cur_deadline );
> if ( missed )
> @@ -1394,7 +1395,7 @@ rt_dom_cntl(
> * from the replq and does the actual replenishment.
> */
> static void repl_timer_handler(void *data){
> - s_time_t now = NOW();
> + s_time_t now;
> struct scheduler *ops = data;
> struct rt_private *prv = rt_priv(ops);
> struct list_head *replq = rt_replq(ops);
> @@ -1406,6 +1407,8 @@ static void repl_timer_handler(void *data){
>
> spin_lock_irq(&prv->lock);
>
> + now = NOW();
> +
> /*
> * Do the replenishment and move replenished vcpus
> * to the temporary list to tickle.
While backporting this for 4.6 (which required substantial
adjustment to the sched_rt.c part) I noticed that there's another
of the cases mentioned in the description in rt_vcpu_insert(). The
commit message doesn't mention why this was left unchanged, so
was not fixing this perhaps just an oversight?
Jan
* Re: [PATCH] xen: sched: avoid races on time values read from NOW()
2016-05-24 10:08 ` Jan Beulich
@ 2016-05-24 12:12 ` Dario Faggioli
0 siblings, 0 replies; 8+ messages in thread
From: Dario Faggioli @ 2016-05-24 12:12 UTC (permalink / raw)
To: Jan Beulich; +Cc: Wei Liu, xen-devel, Meng Xu, George Dunlap
On Tue, 2016-05-24 at 04:08 -0600, Jan Beulich wrote:
> While backporting this for 4.6 (which required substantial
> adjustment to the sched_rt.c part)
>
Yep, I figure it did. :-(
It sounds like you're pretty much done with it, but if not, and if you
want me or Meng to provide the backport, just ask!
> I noticed that there's another
> of the cases mentioned in the description in rt_vcpu_insert(). The
> commit message doesn't mention why this was left unchanged, so
> was not fixing this perhaps just an oversight?
>
It is indeed. I'll send a patch.
Sorry,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)