xen-devel.lists.xenproject.org archive mirror
* [BUG] mistakenly wake in Xen's credit scheduler
@ 2015-10-26 22:30 Kun Suo
  2015-10-27  5:48 ` Jia Rao
  0 siblings, 1 reply; 16+ messages in thread
From: Kun Suo @ 2015-10-26 22:30 UTC (permalink / raw)
  To: xen-devel@lists.xen.org; +Cc: Yong Zhao, Jia Rao


Hi all,

The BOOST mechanism in Xen's credit scheduler is designed to prioritize VMs running I/O-intensive applications, so that they can handle I/O requests in time. However, this does not always work as expected.


(1) Problem description
--------------------------
Suppose two VMs (named VM-I/O and VM-CPU) each have one virtual CPU and both are pinned to the same physical CPU. An I/O-intensive application (e.g. Netperf) runs in VM-I/O and a CPU-intensive application (e.g. a busy loop) runs in VM-CPU. When a client sends I/O requests to VM-I/O, its vCPU cannot reach the BOOST state and obtains very few CPU cycles (less than 1% in Xen 4.6). Both the throughput and the latency are very poor.



(2) Problem analysis
--------------------------
This problem is due to the wake mechanism in Xen: the CPU-intensive workload will be woken up and boosted by mistake.

Suppose the vCPU of VM-CPU is running when an I/O request comes in; the currently running vCPU (the vCPU of VM-CPU) will then be marked as _VPF_migrating.

static inline void __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
{
...
           if ( new_idlers_empty && new->pri > cur->pri )
           {
               SCHED_STAT_CRANK(tickle_idlers_none);
               SCHED_VCPU_STAT_CRANK(cur, kicked_away);
               SCHED_VCPU_STAT_CRANK(cur, migrate_r);
               SCHED_STAT_CRANK(migrate_kicked_away);
               set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
               __cpumask_set_cpu(cpu, &mask);
           }
}


The next time scheduling happens and prev is the vCPU of VM-CPU, context_saved(prev) will be executed. Because that vCPU has been marked as _VPF_migrating, it will then be woken up.

void context_saved(struct vcpu *prev)
{
    ...

    if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
        vcpu_migrate(prev);
}
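
vcpu_migrate() itself finishes by waking the vCPU on its (possibly unchanged) pCPU, which is where the spurious wakeup comes from (condensed from xen/common/schedule.c):

static void vcpu_migrate(struct vcpu *v)
{
    ...
    /* Wake on new CPU. */
    vcpu_wake(v);
}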

Once the vCPU of VM-CPU is in the UNDER state, it will be promoted to the BOOST state, which was originally designed for I/O-intensive vCPUs. If this happens, even though the vCPU of VM-I/O becomes BOOST, it cannot get the physical CPU immediately; it has to wait until the vCPU of VM-CPU is scheduled out. That harms I/O performance significantly.
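
For reference, the wakeup path that grants BOOST looks roughly like this (a condensed sketch of csched_vcpu_wake() in xen/common/sched_credit.c; the exact code differs slightly between versions):

static void csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
{
    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
    const unsigned int cpu = vc->processor;
    ...
    /* A vCPU already sitting on the run queue is left alone... */
    if ( unlikely(__vcpu_on_runq(svc)) )
    {
        SCHED_STAT_CRANK(vcpu_wake_onrunq);
        return;
    }
    ...
    /*
     * ...otherwise, a vCPU waking up in UNDER (and not parked by a
     * cap) is assumed to be latency-sensitive and is promoted to
     * BOOST, even when, as described above, the "wakeup" is only the
     * side effect of the _VPF_migrating round trip.
     */
    if ( svc->pri == CSCHED_PRI_TS_UNDER &&
         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
    {
        svc->pri = CSCHED_PRI_TS_BOOST;
    }

    /* Put the vCPU on the run queue and tickle pCPUs. */
    __runq_insert(cpu, svc);
    __runq_tickle(cpu, svc);
}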



(3) Our Test results
--------------------------
Hypervisor: Xen 4.6
Dom 0 & Dom U: Linux 3.18
Client: Linux 3.18
Network: 1 Gigabit Ethernet

Throughput:
Only VM-I/O: 941 Mbps
co-Run VM-I/O and VM-CPU: 32 Mbps

Latency:
Only VM-I/O: 78 usec
co-Run VM-I/O and VM-CPU: 109093 usec



This bug has been present from Xen 4.2 through Xen 4.6.

Thanks.

Reported by Tony Suo and Yong Zhao from UCCS




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-26 22:30 Kun Suo
@ 2015-10-27  5:48 ` Jia Rao
  0 siblings, 0 replies; 16+ messages in thread
From: Jia Rao @ 2015-10-27  5:48 UTC (permalink / raw)
  To: Kun Suo; +Cc: Yong Zhao, Jia Rao, xen-devel@lists.xen.org

Most mailing lists only accept text-only emails. I saw you had bold font in your last email, which suggests it was sent in rich (HTML) format.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [BUG] mistakenly wake in Xen's credit scheduler
@ 2015-10-27  5:59 suokun
  2015-10-27  9:44 ` George Dunlap
  2015-10-27 10:44 ` Dario Faggioli
  0 siblings, 2 replies; 16+ messages in thread
From: suokun @ 2015-10-27  5:59 UTC (permalink / raw)
  To: xen-devel

Hi all,

The BOOST mechanism in Xen's credit scheduler is designed to prioritize
VMs running I/O-intensive applications, so that they can handle I/O
requests in time. However, this does not always work as expected.


(1) Problem description
--------------------------------
Suppose two VMs (named VM-I/O and VM-CPU) each have one virtual CPU and
both are pinned to the same physical CPU. An I/O-intensive application
(e.g. Netperf) runs in VM-I/O and a CPU-intensive application (e.g. a
busy loop) runs in VM-CPU. When a client sends I/O requests to VM-I/O,
its vCPU cannot reach the BOOST state and obtains very few CPU cycles
(less than 1% in Xen 4.6). Both the throughput and the latency are
very poor.



(2) Problem analysis
--------------------------------
This problem is due to the wake mechanism in Xen: the CPU-intensive
workload will be woken up and boosted by mistake.

Suppose the vCPU of VM-CPU is running when an I/O request comes in; the
currently running vCPU (the vCPU of VM-CPU) will then be marked as
_VPF_migrating.

static inline void __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
{
...
           if ( new_idlers_empty && new->pri > cur->pri )
           {
               SCHED_STAT_CRANK(tickle_idlers_none);
               SCHED_VCPU_STAT_CRANK(cur, kicked_away);
               SCHED_VCPU_STAT_CRANK(cur, migrate_r);
               SCHED_STAT_CRANK(migrate_kicked_away);
               set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
               __cpumask_set_cpu(cpu, &mask);
           }
}


The next time scheduling happens and prev is the vCPU of VM-CPU,
context_saved(prev) will be executed. Because that vCPU has been
marked as _VPF_migrating, it will then be woken up.

void context_saved(struct vcpu *prev)
{
    ...

    if ( unlikely(test_bit(_VPF_migrating, &prev->pause_flags)) )
        vcpu_migrate(prev);
}

Once the vCPU of VM-CPU is in the UNDER state, it will be promoted to
the BOOST state, which was originally designed for I/O-intensive vCPUs.
If this happens, even though the vCPU of VM-I/O becomes BOOST, it
cannot get the physical CPU immediately; it has to wait until the vCPU
of VM-CPU is scheduled out. That harms I/O performance significantly.



(3) Our Test results
--------------------------------
Hypervisor: Xen 4.6
Dom 0 & Dom U: Linux 3.18
Client: Linux 3.18
Network: 1 Gigabit Ethernet

Throughput:
Only VM-I/O: 941 Mbps
co-Run VM-I/O and VM-CPU: 32 Mbps

Latency:
Only VM-I/O: 78 usec
co-Run VM-I/O and VM-CPU: 109093 usec



This bug has been there since Xen 4.2 and still exists in the latest Xen 4.6.
Thanks.
Reported by Tony Suo and Yong Zhao from UCCS

-- 

**********************************
> Tony Suo
> Email: suokunstar@gmail.com
> University of Colorado at Colorado Springs
**********************************

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27  5:59 [BUG] mistakenly wake in Xen's credit scheduler suokun
@ 2015-10-27  9:44 ` George Dunlap
  2015-10-27  9:53   ` Dario Faggioli
  2015-10-27 20:11   ` suokun
  2015-10-27 10:44 ` Dario Faggioli
  1 sibling, 2 replies; 16+ messages in thread
From: George Dunlap @ 2015-10-27  9:44 UTC (permalink / raw)
  To: suokun; +Cc: xen-devel@lists.xen.org

On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@gmail.com> wrote:
> Hi all,
>
> The BOOST mechanism in Xen credit scheduler is designed to prioritize
> VM which has I/O-intensive application to handle the I/O request in
> time. However, this does not always work as expected.

Thanks for the exploration, and the analysis.

The BOOST mechanism is part of the reason I began to write the credit2
scheduler, which we are  hoping (any day now) to make the default
scheduler.  It was designed specifically with the workload you mention
in mind.  Would you care to try your test again and see how it fares?

Also, do you have a patch to fix it in credit1? :-)

 -George


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27  9:44 ` George Dunlap
@ 2015-10-27  9:53   ` Dario Faggioli
  2015-10-27 20:11   ` suokun
  1 sibling, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2015-10-27  9:53 UTC (permalink / raw)
  To: George Dunlap, suokun; +Cc: xen-devel@lists.xen.org


On Tue, 2015-10-27 at 09:44 +0000, George Dunlap wrote:
> On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@gmail.com> wrote:
> > Hi all,
> > 
> > The BOOST mechanism in Xen credit scheduler is designed to
> > prioritize
> > VM which has I/O-intensive application to handle the I/O request in
> > time. However, this does not always work as expected.
> 
> Thanks for the exploration, and the analysis.
> 
Yep, indeed. :-)

> The BOOST mechanism is part of the reason I began to write the
> credit2
> scheduler, which we are  hoping (any day now) to make the default
> scheduler.  It was designed specifically with the workload you
> mention
> in mind.  
>
The whole BOOST thing is a hack, and I don't have much trouble
believing it interacts poorly with the tickling mechanism, which, in
Credit1, is not very precise and reliable (e.g., in Credit2, there is a
'tickled' mask).

That being said, I'm looking at the analysis itself, and I'm not sure I
understand exactly what you are suggesting is going on... I will
reply shortly with a few questions.

> Would you care to try your test again and see how it fares?
> 
Well, that would be very useful, for sure! :-D

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27  5:59 [BUG] mistakenly wake in Xen's credit scheduler suokun
  2015-10-27  9:44 ` George Dunlap
@ 2015-10-27 10:44 ` Dario Faggioli
  2015-10-27 20:32   ` suokun
  1 sibling, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2015-10-27 10:44 UTC (permalink / raw)
  To: suokun, xen-devel


On Mon, 2015-10-26 at 23:59 -0600, suokun wrote:
> Hi all,
> 
Hi,

And first of all, thanks for resending in plain text, this is much
appreciated.

Thanks also for the report. I'm not sure I can completely figure out
what you are saying you are seeing happen; let's see if you can
help me... :-)

> (1) Problem description
> --------------------------------
> Suppose two VMs(named VM-I/O and VM-CPU) both have one virtual CPU
> and
> they are pinned to the same physical CPU. An I/O-intensive
> application(e.g. Netperf) runs in the VM-I/O and a CPU-intensive
> application(e.g. Loop) runs in the VM-CPU. When a client is sending
> I/O requests to VM-I/O, its vCPU cannot become BOOST state but
> obtains
> very little CPU cycles(less than 1% in Xen 4.6). Both the throughput
> and latency are very terrible.
> 
I see. And I take it that you have a test case that makes it easy to
trigger this behavior. Feel free to post the code that makes it happen
here; I'll be glad to have a look myself.

> (2) Problem analysis
> --------------------------------
> This problem is due to the wake mechanism in Xen and CPU-intensive
> workload will be waked and boosted by mistake.
> 
> Suppose the vCPU of VM-CPU is running and an I/O request comes, the
> current vCPU(vCPU of VM-CPU) will be marked as _VPF_migrating.
> 
> [...]
> 
> next time when the schedule happens and the prev is the vCPU of
> VM-CPU, the context_saved(vcpu) will be executed. 
>
What do you mean by "next time"? If the vCPU of VM-CPU was running,
then at the point it became 'prev', someone else must have been chosen
to run. Are you seeing something different from this?

> Because the vCPU has
> been marked as _VPF_migrating and it will then be waked up.
> 
Yes, but again, which "next time when the schedule happens" are we
talking about?

If VM-IO's vcpu is really being boosted by the I/O event (is this the
case?), then the schedule invocation that follows its wakeup, should
just run it (and, as you say, make VM-CPU's vcpu become prev).

Then, yes, context_saved() is called, which calls vcpu_migrate(), which
then calls vcpu_wake()-->csched_vcpu_wake(), on prev == VM-CPU's vcpu.
_BUT_ that would most likely do just nothing, as VM-CPU's vcpu is on
the runqueue at this point, and csched_vcpu_wake() has this:

    if ( unlikely(__vcpu_on_runq(svc)) )
    {
        SCHED_STAT_CRANK(vcpu_wake_onrunq);
        return;
    }

So, no boosting happens.

> Once the state of vCPU of VM-CPU is UNDER, it will be changed into
> BOOST state which is designed originally for I/O-intensive vCPU.
>
Again, I don't think I see how.

> this happen, even though the vCPU of VM-I/O becomes BOOST, it cannot
> get the physical CPU immediately but wait until the vCPU of VM-CPU is
> scheduled out. That will harm the I/O performance significantly.
> 
If the vcpu of VM-IO becomes BOOST, because of an I/O event, it seems
to me that it should manage to get scheduled immediately.

> (3) Our Test results
> --------------------------------
> Hypervisor: Xen 4.6
> Dom 0 & Dom U: Linux 3.18
> Client: Linux 3.18
> Network: 1 Gigabit Ethernet
> 
> Throughput:
> Only VM-I/O: 941 Mbps
> co-Run VM-I/O and VM-CPU: 32 Mbps
> 
> Latency:
> Only VM-I/O: 78 usec
> co-Run VM-I/O and VM-CPU: 109093 usec
> 
Yeah, that's pretty poor, and I'm not saying we don't have an issue. I
just don't understand/don't agree with the analysis.

> This bug has been there since Xen 4.2 and still exists in the latest
> Xen 4.6.
>
The code that sets the _VPF_migrating bit in __runq_tickle() was not
there in Xen 4.2; it was introduced in Xen 4.3. With "since Xen 4.2",
do you mean 4.2 included or not?

So, apart from the numbers above, what other data and hints led you
to this analysis?

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27  9:44 ` George Dunlap
  2015-10-27  9:53   ` Dario Faggioli
@ 2015-10-27 20:11   ` suokun
  2015-10-28  5:39     ` suokun
  2015-10-28  5:54     ` Dario Faggioli
  1 sibling, 2 replies; 16+ messages in thread
From: suokun @ 2015-10-27 20:11 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

On Tue, Oct 27, 2015 at 3:44 AM, George Dunlap <dunlapg@umich.edu> wrote:
> On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@gmail.com> wrote:
>> Hi all,
>>
>> The BOOST mechanism in Xen credit scheduler is designed to prioritize
>> VM which has I/O-intensive application to handle the I/O request in
>> time. However, this does not always work as expected.
>
> Thanks for the exploration, and the analysis.
>
> The BOOST mechanism is part of the reason I began to write the credit2
> scheduler, which we are  hoping (any day now) to make the default
> scheduler.  It was designed specifically with the workload you mention
> in mind.  Would you care to try your test again and see how it fares?
>

Hi, George,

Thank you for your reply. I tested credit2 this morning. The I/O
performance is correct; however, the CPU accounting seems incorrect.
Here is my experiment on credit2:

VM-IO:  1-vCPU pinned to a pCPU, running netperf
VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
The throughput of netperf is the same (941 Mbps) as when VM-IO runs alone.

However, when I use xl top to show the VM CPU utilization, VM-IO takes
73% of the CPU time and VM-CPU takes 99%, so their sum is more than
100%. I suspect this is due to the CPU utilization accounting in the
credit2 scheduler.


> Also, do you have a patch to fix it in credit1? :-)
>

As for a patch for my problem in credit1, I have two ideas:

1) if the vCPU cannot migrate (e.g. it is pinned, restricted by CPU
affinity, or there is only one physical CPU), do not set the
_VPF_migrating flag (a rough sketch is below);

2) let BOOST-state vCPUs preempt each other.

Actually, I have tested both separately and they both work, but
personally I prefer the first option because it solves the problem
at the source.
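
To illustrate idea 1, the guard could look roughly like this (an
illustrative sketch only, using the field and helper names from
xen/common/sched_credit.c):

    /* Only ask the scheduler to migrate cur away if cur can actually
     * run somewhere else: more than one online pCPU, and cur's hard
     * affinity not restricted to a single pCPU. */
    if ( num_online_cpus() > 1 &&
         cpumask_weight(cur->vcpu->cpu_hard_affinity) > 1 )
        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);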

Best
Tony


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27 10:44 ` Dario Faggioli
@ 2015-10-27 20:32   ` suokun
  2015-10-28  5:41     ` Dario Faggioli
  0 siblings, 1 reply; 16+ messages in thread
From: suokun @ 2015-10-27 20:32 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel

On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Mon, 2015-10-26 at 23:59 -0600, suokun wrote:
>> Hi all,
>>
> Hi,
>
> And first of all, thanks for resending in plain text, this is much
> appreciated.
>
> Thanks also for the report. I'm not sure I can figure out completely
> what you're saying that you are seeing happening, let's see if you can
> help me... :-)
>

Hi, Dario,
Thank you for your reply.

>> (1) Problem description
>> --------------------------------
>> Suppose two VMs(named VM-I/O and VM-CPU) both have one virtual CPU
>> and
>> they are pinned to the same physical CPU. An I/O-intensive
>> application(e.g. Netperf) runs in the VM-I/O and a CPU-intensive
>> application(e.g. Loop) runs in the VM-CPU. When a client is sending
>> I/O requests to VM-I/O, its vCPU cannot become BOOST state but
>> obtains
>> very little CPU cycles(less than 1% in Xen 4.6). Both the throughput
>> and latency are very terrible.
>>
> I see. And I take it that you have a test case that makes it easy to
> trigger this behavior. Feel free to post the code to make that happen
> here, I'll be glad to have a look myself.
>

Here are my two VMs running on the same physical CPU:
VM-IO:  1-vCPU pinned to a pCPU, running netperf
VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
Another machine runs the netperf client to send requests to VM-IO.

My setup is very simple:
in VM-IO, as the server side: $ netserver -p 12345
in VM-CPU, just run a while(1) loop: $ ./loop
on the client, send I/O requests to VM-IO:
$ netperf -H [server_ip] -l 15 -t TCP_STREAM -p 12345

>> (2) Problem analysis
>> --------------------------------
>> This problem is due to the wake mechanism in Xen and CPU-intensive
>> workload will be waked and boosted by mistake.
>>
>> Suppose the vCPU of VM-CPU is running and an I/O request comes, the
>> current vCPU(vCPU of VM-CPU) will be marked as _VPF_migrating.
>>
>> [...]
>>
>> next time when the schedule happens and the prev is the vCPU of
>> VM-CPU, the context_saved(vcpu) will be executed.
>>
> What do you mean "next time"? if the vcpu of VM-CPU was running, at the
> point that it became 'prev', someone else must have be running. Are you
> seeing something different than this?
>
>> Because the vCPU has
>> been marked as _VPF_migrating and it will then be waked up.
>>
> Yes, but again, of what "next time when the schedule happens" are we
> talking about?
>
> If VM-IO's vcpu is really being boosted by the I/O event (is this the
> case?), then the schedule invocation that follows its wakeup, should
> just run it (and, as you say, make VM-CPU's vcpu become prev).
>
> Then, yes, context_saved() is called, which calls vcpu_migrate(), which
> then calls vcpu_wake()-->csched_vcpu_wake(), on prev == VM-CPU's vcpu.
> _BUT_ that would most likely do just nothing, as VM-CPU's vcpu is on
> the runqueue at this point, and csched_vcpu_wake() has this:
>
>     if ( unlikely(__vcpu_on_runq(svc)) )
>     {
>         SCHED_STAT_CRANK(vcpu_wake_onrunq);
>
>         return;
>
>     }
>
> So, no boosting happens.
>

The setting that led to the poor I/O performance is as follows:
VM-IO:  1-vCPU pinned to a pCPU, running netperf
VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop

The root cause is that when an I/O request comes in, VM-IO's vCPU is
elevated to BOOST and goes through vcpu_wake() -> __runq_tickle(). In
__runq_tickle(), the currently running vCPU (i.e., the vCPU from
VM-CPU) is marked as _VPF_migrating. Then, Xen goes through schedule()
to deschedule the current vCPU (i.e., the vCPU from VM-CPU) and
schedule the next vCPU (i.e., the vCPU from VM-IO). Due to the
_VPF_migrating flag, the descheduled vCPU will be migrated in
context_saved() and later woken up in vcpu_wake(). Indeed,
csched_vcpu_wake() will bail out if the vCPU from VM-CPU is on the run
queue, but it actually is not: in csched_schedule(), that vCPU is not
inserted back into the run queue, because it is not runnable due to
the _VPF_migrating bit in its pause_flags. As such, the vCPU from
VM-CPU gets boosted and will not be preempted by a later I/O request,
because BOOST cannot preempt BOOST.
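
To make that last step concrete, the relevant checks are roughly the
following (paraphrased from xen/common/sched_credit.c and
xen/include/xen/sched.h; the exact code may differ slightly between
versions):

    /* csched_schedule(): prev (here, the vCPU of VM-CPU) is put back
     * on the run queue only if it is still runnable. */
    if ( vcpu_runnable(current) )
        __runq_insert(cpu, scurr);

    /* ...but vcpu_runnable() is false while _VPF_migrating is set: */
    static inline int vcpu_runnable(struct vcpu *v)
    {
        return !(v->pause_flags |
                 atomic_read(&v->pause_count));
    }

So, when context_saved() -> vcpu_migrate() -> vcpu_wake() runs later,
the vCPU of VM-CPU is not on the run queue, the early return in
csched_vcpu_wake() is skipped, and it is boosted as if it had really
blocked and woken up.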

A simple fix would be allowing BOOST to preempt BOOST. A better fix
would be checking the CPU affinity before setting the _VPF_migrating
flag.


>> Once the state of vCPU of VM-CPU is UNDER, it will be changed into
>> BOOST state which is designed originally for I/O-intensive vCPU.
>>
> Again, I don't think I see how.
>
>> this happen, even though the vCPU of VM-I/O becomes BOOST, it cannot
>> get the physical CPU immediately but wait until the vCPU of VM-CPU is
>> scheduled out. That will harm the I/O performance significantly.
>>
> If the vcpu of VM-IO becomes BOOST, because of an I/O event, it seems
> to me that it should manage to get scheduled immediately.
>
>> (3) Our Test results
>> --------------------------------
>> Hypervisor: Xen 4.6
>> Dom 0 & Dom U: Linux 3.18
>> Client: Linux 3.18
>> Network: 1 Gigabit Ethernet
>>
>> Throughput:
>> Only VM-I/O: 941 Mbps
>> co-Run VM-I/O and VM-CPU: 32 Mbps
>>
>> Latency:
>> Only VM-I/O: 78 usec
>> co-Run VM-I/O and VM-CPU: 109093 usec
>>
> Yeah, that's pretty poor, and I'm not saying we don't have an issue. I
> just don't understand/don't agree with the analysis.
>
>> This bug has been there since Xen 4.2 and still exists in the latest
>> Xen 4.6.
>>
> The code that set the _VPF_migrating bit in __runq_tickle() was not
> there in Xen 4.2. It has been introduced in Xen 4.3. With "since Xen
> 4.2" do you mean 4.2 included or not?
>

Sorry, my mistake there. We have tested Xen 4.3, 4.4, 4.5 and 4.6;
they all have the same issue.

Thanks.
Tony


> So, apart from the numbers above, what are there other data and hints
> that led you to the analysis?
>
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27 20:11   ` suokun
@ 2015-10-28  5:39     ` suokun
  2015-10-28  5:54     ` Dario Faggioli
  1 sibling, 0 replies; 16+ messages in thread
From: suokun @ 2015-10-28  5:39 UTC (permalink / raw)
  To: George Dunlap; +Cc: Dario Faggioli, xen-devel

On Tue, Oct 27, 2015 at 2:11 PM, suokun <suokunstar@gmail.com> wrote:
> On Tue, Oct 27, 2015 at 3:44 AM, George Dunlap <dunlapg@umich.edu> wrote:
>> On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@gmail.com> wrote:
>>> Hi all,
>>>
>>> The BOOST mechanism in Xen credit scheduler is designed to prioritize
>>> VM which has I/O-intensive application to handle the I/O request in
>>> time. However, this does not always work as expected.
>>
>> Thanks for the exploration, and the analysis.
>>
>> The BOOST mechanism is part of the reason I began to write the credit2
>> scheduler, which we are  hoping (any day now) to make the default
>> scheduler.  It was designed specifically with the workload you mention
>> in mind.  Would you care to try your test again and see how it fares?
>>
>
> Hi, George,
>
> Thank you for your reply. I have test credit2 this morning. The I/O
> performance is correct, however, the CPU accounting seems not correct.
> Here is my experiment on credit2:
>
> VM-IO:          1-vCPU pinned to a pCPU, running netperf
> VM-CPU:      1-vCPU pinned the the same pCPU, running a while(1) loop
> The throughput of netperf is the same(941Mbps) as VM-IO runs alone.
>
> However, when I use xl top to show the VM CPU utilization, VM-IO takes
> 73% of CPU time and VM-CPU takes 99% CPU time. Their sum is more than
> 100%. I doubt it is due to the CPU utilization accounting in credit2
> scheduler.
>
>
>> Also, do you have a patch to fix it in credit1? :-)
>>
>
> For the patch to my problem in credit1. I have two ideas:
>
> 1) if the vCPU cannot migrate(e.g. pinned, CPU affinity, even only has
> one physical CPU), do not set the _VPF_migrating flag.
>
> 2) let the BOOST state can preempt with each other.
>
> Actually I have tested both separately and they both work. But
> personally I prefer the first option because it solved the problem
> from the source.
>
> Best
> Tony

Here is my patch:

+++ xen/common/sched_credit.c

 if ( new_idlers_empty && new->pri > cur->pri )
 {
     SCHED_STAT_CRANK(tickle_idlers_none);
     SCHED_VCPU_STAT_CRANK(cur, kicked_away);
     SCHED_VCPU_STAT_CRANK(cur, migrate_r);
     SCHED_STAT_CRANK(migrate_kicked_away);

+    /* Migration only makes sense if there is more than one online
+     * pCPU and the vCPU is not pinned to a single physical CPU. */
+    if ( num_online_cpus() > 1 &&
+         cpumask_weight(cur->vcpu->cpu_hard_affinity) > 1 )
+    {
         set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+    }
     __cpumask_set_cpu(cpu, &mask);
 }

Best
Tony





-- 

**********************************
> Kun SUO
> Email: suokunstar@gmail.com   |   ksuo@uccs.edu
> University of Colorado at Colorado Springs
> 1420 Austin Bluffs Pkwy, Colorado Springs, CO 80918
**********************************

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27 20:32   ` suokun
@ 2015-10-28  5:41     ` Dario Faggioli
  2015-10-28 17:04       ` suokun
  0 siblings, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2015-10-28  5:41 UTC (permalink / raw)
  To: suokun; +Cc: George Dunlap, xen-devel


On Tue, 2015-10-27 at 14:32 -0600, suokun wrote:
> On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:

> Hi, Dario,
> Thank you for your reply.
>
Hi,

> Here are my two VMs running on the same physical CPU.
> VM-IO: 1-vCPU pinned to a pCPU, running netperf
> VM-CPU: 1-vCPU pinned the the same pCPU, running a while(1) loop
> Another machine run the netperf client to send the requests to VM-IO.
> 
> My code is very simple:
> in VM-IO, as server side: $ netserver -p 12345
> in VM-CPU, just running a while(1) loop: $./loop
> in the client, send I/O request to the VM-IO: $ netperf -H
> [server_ip]
> -l 15 -t TCP_STREAM -p 12345
> 
Ok, thanks.

> The setting that led to the poor IO performance is as follows:
> VM-IO:  1-vCPU pinned to a pCPU, running netperf
> VM-CPU: 1-vCPU pinned the the same pCPU, running a while(1) loop
> 
> The root cause is that when an IO request comes, VM-IO’s vCPU is
> elevated to BOOST and goes through vcpu_wake —> __runq_tickle. In
> __runq_tickle, the currently running vCPU (i.e., the vCPU from VM
> -CPU) is marked as _VPF_migrating.
>
Ok.

> Then, Xen goes through schedule() to
> reschedule the current vCPU (i.e., vCPU from VM-CPU) and schedule the
> next vCPU (i.e., the vCPU from VM-IO). Due to the _VPF_migrating 
> flag, the descheduled vCPU will be migrated in context_saved() and 
> later woken up in cpu_wake().
>
Sure.

> Indeed, csched_vcpu_wake() will quit if the
> vCPU from VM-CPU is on run queue. But it is actually not. In
> csched_schedule(), the vCPU will not be inserted back to run queue
> because it is not runnable due to the __VPF_migrating bit in
> pause_flags. As such, the vCPU from VM-CPU will boosted and not be
> preempted by a later IO request because BOOST can not preempt BOOST.
> 
Aha! Now I see what you mean. From the previous email, I couldn't
really tell which call to schedule() you were looking at during each
phase of the analysis... Thanks for clarifying!

And, yes, I agree with you that, since the vCPU of VM-CPU fails the
vcpu_runnable() test, it is being treated as if it were really waking
up from sleep in csched_vcpu_wake(), and hence boosted.

> A simple fix would be allowing BOOST to preempt BOOST. 
>
Nah, that would be a hack on top of a hack! :-P

> A better fix
> would be checking the CPU affinity before setting the __VPF_migrating
> flag.
> 
Yeah, I like this better. So, can you try the patch attached to this
email?

Here at my place, without any patch, I get the following results:

 idle:       throughput = 806.64
 with noise: throughput = 166.50

With the patch, I get this:

 idle:       throughput = 807.18
 with noise: throughput = 731.66

The patch (if you confirm that it works) fixes the bug in this
particular situation, where the vCPUs are all pinned to the same
pCPU, but it does not prevent vCPUs that are being migrated around
the pCPUs from becoming BOOSTed in Credit1.

That is something I think we should avoid, and I've got a (small)
patch series ready for it. I'll give it some more testing before
sending it to the list, though, as I want to make sure it's not
causing regressions.

Thanks and Regards,
Dario
--- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.1.2: xen-sched-credit1-fix-tickle-migrate-cur.patch --]
[-- Type: text/x-patch, Size: 5341 bytes --]

commit 16381936ad320d010c7566c946a3e528f803e78a
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Tue Oct 27 23:22:16 2015 +0100

    xen: credit1: on vCPU wakeup, kick away current only if makes sense
    
    In fact, when waking up a vCPU, __runq_tickle() is called
    to allow the new vCPU to run on a pCPU (which one, depends
    on the relationship between the priority of the new vCPU,
    and the ones of the vCPUs that are already running).
    
    If there is no idle processor on which the new vCPU can
    run (e.g., because of pinning/affinity), we try to migrate
    away the vCPU that is currently running on the new vCPU's
    processor (i.e., the processor on which the vCPU is waking
    up).
    
    Now, trying to migrate a vCPU has the effect of pushing it
    through a
    
     running --> offline --> runnable
    
    transition, which, in turn, has the following negative
    effects:
    
     1) Credit1 counts that as a wakeup, and it BOOSTs the
        vCPU, even if it is a CPU-bound one, which wouldn't
        normally have deserved boosting. This can prevent
        legit IO-bound vCPUs from getting hold of the
        processor until such spurious boosting expires,
        hurting performance!
    
     2) since the vCPU fails the vcpu_runnable() test
        (within the call to csched_schedule() that follows
        the wakeup, as a consequence of tickling), the
        scheduling rate-limiting mechanism is also fooled,
        i.e., the context switch happens even if less than
        the minimum execution time has passed.
    
    In particular, 1) has been reported to cause the following
    issue:
    
     * VM-IO: 1-vCPU pinned to a pCPU, running netperf
     * VM-CPU: 1-vCPU pinned to the same pCPU, running a busy
               CPU loop
     ==> Only VM-I/O: throughput is 806.64 Mbps
     ==> VM-I/O + VM-CPU: throughput is 166.50 Mbps
    
    This patch solves (for the above scenario) the problem
    by checking whether or not it makes sense to try to
    migrate away the vCPU currently running on the processor.
    In fact, we shouldn't even try to do it if there are no
    idle processors where such a vCPU can execute. In such a
    case, attempting the migration is just futile (harmful, actually!).
    
    With this patch, in the above configuration, results are:
    
     ==> Only VM-I/O: throughput is 807.18 Mbps
     ==> VM-I/O + VM-CPU: throughput is 731.66 Mbps
    
    Note that, still about 1), it is _wrong_ that Credit1
    treats wakeups resulting from migration of a vCPU to
    another pCPU as "regular wakeups", hence granting BOOST
    priority to the vCPUs experiencing that. However:
     - fixing that is non-trivial, and needs to be done
       in its own patch;
     - that is orthogonal to the fix being introduced here.
       That is to say, even when Credit1 is fixed not
       to boost migrating vCPUs, this patch will still be
       correct and necessary.
    
    Reported-by: suokun <suokunstar@gmail.com>
    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
    ---
    Cc: George Dunlap <george.dunlap@citrix.com>
    Cc: suokun <suokunstar@gmail.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index b8f28fe..1b30e67 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -426,9 +426,10 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 
             /*
              * If there are no suitable idlers for new, and it's higher
-             * priority than cur, ask the scheduler to migrate cur away.
-             * We have to act like this (instead of just waking some of
-             * the idlers suitable for cur) because cur is running.
+             * priority than cur, check whether we can migrate cur away.
+             * (We have to do it indirectly, via _VPF_migrating, instead
+             * of just tickling any idler suitable for cur, because cur
+             * is running.)
              *
              * If there are suitable idlers for new, no matter priorities,
              * leave cur alone (as it is running and is, likely, cache-hot)
@@ -437,11 +438,18 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
              */
             if ( new_idlers_empty && new->pri > cur->pri )
             {
+                csched_balance_cpumask(cur->vcpu, balance_step,
+                                       csched_balance_mask(cpu));
+                if ( cpumask_intersects(csched_balance_mask(cpu),
+                                        &idle_mask) )
+                {
+                    SCHED_VCPU_STAT_CRANK(cur, kicked_away);
+                    SCHED_VCPU_STAT_CRANK(cur, migrate_r);
+                    SCHED_STAT_CRANK(migrate_kicked_away);
+                    set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+                }
+                /* Tickle cpu anyway, to let new preempt cur. */
                 SCHED_STAT_CRANK(tickle_idlers_none);
-                SCHED_VCPU_STAT_CRANK(cur, kicked_away);
-                SCHED_VCPU_STAT_CRANK(cur, migrate_r);
-                SCHED_STAT_CRANK(migrate_kicked_away);
-                set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
                 __cpumask_set_cpu(cpu, &mask);
             }
             else if ( !new_idlers_empty )


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-27 20:11   ` suokun
  2015-10-28  5:39     ` suokun
@ 2015-10-28  5:54     ` Dario Faggioli
  2015-10-28  6:01       ` Juergen Gross
  1 sibling, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2015-10-28  5:54 UTC (permalink / raw)
  To: suokun, George Dunlap; +Cc: xen-devel


On Tue, 2015-10-27 at 14:11 -0600, suokun wrote:
> On Tue, Oct 27, 2015 at 3:44 AM, George Dunlap <dunlapg@umich.edu>
> wrote:
> > On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@gmail.com>
> > wrote:

> Thank you for your reply. I have test credit2 this morning. The I/O
> performance is correct, however, the CPU accounting seems not
> correct.
> Here is my experiment on credit2:
> 
> VM-IO:          1-vCPU pinned to a pCPU, running netperf
> VM-CPU:      1-vCPU pinned the the same pCPU, running a while(1) loop
> The throughput of netperf is the same(941Mbps) as VM-IO runs alone.
> 
> However, when I use xl top to show the VM CPU utilization, VM-IO
> takes
> 73% of CPU time and VM-CPU takes 99% CPU time. Their sum is more than
> 100%. I doubt it is due to the CPU utilization accounting in credit2
> scheduler.
> 
Yeah, well, sorry, but even if we both (me and George) encouraged you
to try Credit2, that wasn't a great idea. :-(  In fact, you're using
pinning for this test, and Credit2 does not have pinning (yet)! :-P

That explains why the utilizations sum up to more than 100%: the
vCPUs are just not being confined to one processor.

Pinning for Credit2 is just around the corner. Let's try this again
when it is there, ok? :-D

> > Also, do you have a patch to fix it in credit1? :-)
> > 
> 
> For the patch to my problem in credit1. I have two ideas:
> 
> 1) if the vCPU cannot migrate(e.g. pinned, CPU affinity, even only
> has
> one physical CPU), do not set the _VPF_migrating flag.
> 
Yep, that's step 1. I hadn't seen this mail, so I produced a patch
myself (see my other reply). Is it similar to yours? If you could
test it, that would be great.

Even after this is done, though, we still need to fix the fact that
Credit1 boosts vCPUs upon migrations, which looks utterly crazy to me!
I've got (drafted) patches for that too, but I want to stress test them
a bit more before submitting them officially. I'm attaching them to
this email, feel free to have a look and provide your views.

> 2) let the BOOST state can preempt with each other.
> 
Yeah, but...

> Actually I have tested both separately and they both work. But
> personally I prefer the first option because it solved the problem
> from the source.
> 
... I don't like 2) that much either. Credit1 is, by design,
round-robin within equal priority levels. There are already quite a
few hacks in that code, and breaking even that rather basic
assumption would scare me an awful lot!! :-O

Thanks a lot again for your report and your analysis.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.1.2: xen-credit1-avoid-boosting-when-migrating.patch --]
[-- Type: text/x-patch, Size: 1350 bytes --]

commit 6d60b1ecf7d79d00d946ae19a52f256d4bc2a823
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Wed Oct 28 01:15:35 2015 +0100

    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 683feeb..29a9175 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1005,15 +1005,17 @@ csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc, unsigned wf)
      * more CPU resource intensive VCPUs without impacting overall 
      * system fairness.
      *
-     * The one exception is for VCPUs of capped domains unpausing
-     * after earning credits they had overspent. We don't boost
-     * those.
+     * There are a couple of exceptions:
+     *  - VCPUs of capped domains unpausing after earning credits
+     *    they had overspent;
+     *  - VCPUs that are being migrated to another pCPU, rather
+     *    than actually waking up after being blocked.
+     * We don't boost those.
      */
     if ( svc->pri == CSCHED_PRI_TS_UNDER &&
-         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) )
-    {
+         !test_bit(CSCHED_FLAG_VCPU_PARKED, &svc->flags) &&
+         !(wf & WF_migrating) )
         svc->pri = CSCHED_PRI_TS_BOOST;
-    }
 
     /* Put the VCPU on the runq and tickle CPUs */
     __runq_insert(cpu, svc);

[-- Attachment #1.1.3: xen-sched-introduce-wakeup-flags.patch --]
[-- Type: text/x-patch, Size: 4984 bytes --]

commit b71e3f10898a6d1c849bd9135f15aa8367d897d0
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date:   Wed Oct 28 00:47:17 2015 +0100

    Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index dbe02ed..de65e96 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -537,7 +537,7 @@ a653sched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
  * @param vc        Pointer to the VCPU structure for the current domain
  */
 static void
-a653sched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+a653sched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc, unsigned wf)
 {
     if ( AVCPU(vc) != NULL )
         AVCPU(vc)->awake = 1;
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index b8f28fe..683feeb 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -966,7 +966,7 @@ csched_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc, unsigned wf)
 {
     struct csched_vcpu * const svc = CSCHED_VCPU(vc);
     const unsigned int cpu = vc->processor;
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 6695729..6b32778 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -957,7 +957,7 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
 }
 
 static void
-csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc, unsigned wf)
 {
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
     s_time_t now = 0;
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 6a341b1..677e3f4 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1031,7 +1031,7 @@ out:
  * TODO: what if these two vcpus belongs to the same domain?
  */
 static void
-rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
+rt_vcpu_wake(const struct scheduler *ops, struct vcpu *vc, unsigned wf)
 {
     struct rt_vcpu * const svc = rt_vcpu(vc);
     s_time_t now = NOW();
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index c5f640f..bacee73 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -407,7 +407,7 @@ void vcpu_sleep_sync(struct vcpu *v)
     sync_vcpu_execstate(v);
 }
 
-void vcpu_wake(struct vcpu *v)
+static void _vcpu_wake(struct vcpu *v, unsigned wake_flags)
 {
     unsigned long flags;
     spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
@@ -416,7 +416,7 @@ void vcpu_wake(struct vcpu *v)
     {
         if ( v->runstate.state >= RUNSTATE_blocked )
             vcpu_runstate_change(v, RUNSTATE_runnable, NOW());
-        SCHED_OP(VCPU2OP(v), wake, v);
+        SCHED_OP(VCPU2OP(v), wake, v, wake_flags);
     }
     else if ( !(v->pause_flags & VPF_blocked) )
     {
@@ -429,6 +429,11 @@ void vcpu_wake(struct vcpu *v)
     TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id);
 }
 
+void vcpu_wake(struct vcpu *v)
+{
+    return _vcpu_wake(v, WF_wakeup);
+}
+
 void vcpu_unblock(struct vcpu *v)
 {
     if ( !test_and_clear_bit(_VPF_blocked, &v->pause_flags) )
@@ -577,8 +582,8 @@ static void vcpu_migrate(struct vcpu *v)
     if ( old_cpu != new_cpu )
         sched_move_irqs(v);
 
-    /* Wake on new CPU. */
-    vcpu_wake(v);
+    /* Wake on new CPU (and let the scheduler know it's a migration). */
+    _vcpu_wake(v, WF_migrating);
 }
 
 /*
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 493d43f..af1ed60 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -144,7 +144,8 @@ struct scheduler {
     void         (*remove_vcpu)    (const struct scheduler *, struct vcpu *);
 
     void         (*sleep)          (const struct scheduler *, struct vcpu *);
-    void         (*wake)           (const struct scheduler *, struct vcpu *);
+    void         (*wake)           (const struct scheduler *, struct vcpu *,
+                                    unsigned int);
     void         (*yield)          (const struct scheduler *, struct vcpu *);
     void         (*context_saved)  (const struct scheduler *, struct vcpu *);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 3729b0f..6e7a108 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -753,6 +753,16 @@ static inline struct domain *next_domain_in_cpupool(
 #define _VPF_in_reset        7
 #define VPF_in_reset         (1UL<<_VPF_in_reset)
 
+/*
+ * VCPU wake up flags.
+ */
+/* VCPU being actually woken up. */
+#define _WF_wakeup           0
+#define WF_wakeup            (1U<<_WF_wakeup)
+/* VCPU being (woken just after having been) migrated. */
+#define _WF_migrating        1
+#define WF_migrating         (1U<<_WF_migrating)
+
 static inline int vcpu_runnable(struct vcpu *v)
 {
     return !(v->pause_flags |

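The hunks above only plumb the flag through; just to illustrate the
idea (this is a sketch, not the actual Credit1 change), a consumer in
the credit scheduler's wake path could then look more or less like
this, with CSCHED_VCPU() and the CSCHED_PRI_TS_* priorities being the
existing sched_credit.c symbols:

static void
csched_vcpu_wake(const struct scheduler *ops, struct vcpu *vc, unsigned wf)
{
    struct csched_vcpu * const svc = CSCHED_VCPU(vc);
    ...
    /*
     * Only a genuine, event driven wakeup (WF_wakeup) is a reason to
     * boost. The wake that merely completes vcpu_migrate() passes
     * WF_migrating instead, so a CPU bound vCPU kicked away by
     * __runq_tickle() no longer gets boosted by mistake.
     */
    if ( (wf & WF_wakeup) && svc->pri == CSCHED_PRI_TS_UNDER )
        svc->pri = CSCHED_PRI_TS_BOOST;
    ...
}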

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-28  5:54     ` Dario Faggioli
@ 2015-10-28  6:01       ` Juergen Gross
  2015-10-28  6:08         ` Dario Faggioli
  0 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2015-10-28  6:01 UTC (permalink / raw)
  To: Dario Faggioli, suokun, George Dunlap; +Cc: xen-devel

On 10/28/2015 06:54 AM, Dario Faggioli wrote:
> On Tue, 2015-10-27 at 14:11 -0600, suokun wrote:
>> On Tue, Oct 27, 2015 at 3:44 AM, George Dunlap <dunlapg@umich.edu>
>> wrote:
>>> On Tue, Oct 27, 2015 at 5:59 AM, suokun <suokunstar@gmail.com>
>>> wrote:
>
>> Thank you for your reply. I have tested credit2 this morning. The I/O
>> performance is correct; however, the CPU accounting does not seem
>> correct.
>> Here is my experiment on credit2:
>>
>> VM-IO:          1-vCPU pinned to a pCPU, running netperf
>> VM-CPU:      1-vCPU pinned to the same pCPU, running a while(1) loop
>> The throughput of netperf is the same (941 Mbps) as when VM-IO runs alone.
>>
>> However, when I use xl top to show the VM CPU utilization, VM-IO takes
>> 73% of CPU time and VM-CPU takes 99% of CPU time. Their sum is more
>> than 100%. I suspect it is due to the CPU utilization accounting in
>> the credit2 scheduler.
>>
> Yeah, well, sorry, but even if we both (me and George) encouraged you
> to try Credit2, that wasn't a great idea. :-(  In fact, you're using
> pinning for this test, and Credit2 does not have pinning (yet)! :-P
>
> That explains why utilizations are summing up to higher than 100%:
> vCPUs are just not being confined to one processor.
>
> Pinning for Credit2 is just around the corner. Let's try this again
> when it will be there, ok? :-D

Or try it in a cpupool with just one pcpu?
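
(Setting that up should just be a matter of freeing one pCPU from
Pool-0 with "xl cpupool-cpu-remove", creating a one-pCPU pool with
"xl cpupool-create" (a config with name, sched and cpus is enough)
and moving the guest there with "xl cpupool-migrate". I'm writing
the commands from memory, so please double check the exact syntax
in the xl manpage.)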


Juergen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-28  6:01       ` Juergen Gross
@ 2015-10-28  6:08         ` Dario Faggioli
  2015-10-28 11:03           ` George Dunlap
  0 siblings, 1 reply; 16+ messages in thread
From: Dario Faggioli @ 2015-10-28  6:08 UTC (permalink / raw)
  To: Juergen Gross, suokun, George Dunlap; +Cc: xen-devel



On Wed, 2015-10-28 at 07:01 +0100, Juergen Gross wrote:
> On 10/28/2015 06:54 AM, Dario Faggioli wrote:

> > Yeah, well, sorry, but even if we both (me and George) encouraged
> > you
> > to try Credit2, that wasn't a great idea. :-(  In fact, you're
> > using
> > pinning for this test, and Credit2 does not have pinning (yet)! :-P
> > 
> > That explains why utilizations are summing up to higher than 100%:
> > vCPUs are just not being confined to one processor.
> > 
> > Pinning for Credit2 is just around the corner. Let's try this again
> > when it will be there, ok? :-D
> 
> Or try it in a cpupool with just one pcpu?
> 
Oh, well, yes. That is certainly an alternative. :-)

I'm curious about how it'd go, so I'm probably going to give it a try
later...

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-28  6:08         ` Dario Faggioli
@ 2015-10-28 11:03           ` George Dunlap
  0 siblings, 0 replies; 16+ messages in thread
From: George Dunlap @ 2015-10-28 11:03 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Juergen Gross, suokun, xen-devel@lists.xen.org

On Wed, Oct 28, 2015 at 6:08 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Wed, 2015-10-28 at 07:01 +0100, Juergen Gross wrote:
>> On 10/28/2015 06:54 AM, Dario Faggioli wrote:
>
>> > Yeah, well, sorry, but even if we both (me and George) encouraged
>> > you
>> > to try Credit2, that wasn't a great idea. :-(  In fact, you're
>> > using
>> > pinning for this test, and Credit2 does not have pinning (yet)! :-P
>> >
>> > That explains why utilizations are summing up to higher than 100%:
>> > vCPUs are just not being confined to one processor.
>> >
>> > Pinning for Credit2 is just around the corner. Let's try this again
>> > when it will be there, ok? :-D
>>
>> Or try it in a cpupool with just one pcpu?
>>
> Oh, well, yes. That is certainly an alternative. :-)
>
> I'm curious about how it'd go, so I'm probably going to give it a try
> later...

I was going to say, using cpupools rather than pinning is probably a
better idea anyway.  I might go so far as to say that if you're trying
to test the actual scheduling algorithms, you should *only* use
cpupools and *never* pinning.  The whole reason we invented cpupools
in the first place was that pinning tends to break the assumptions of
the scheduling algorithms in ways it's not really cost-effective to
work around.

 -George

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-28  5:41     ` Dario Faggioli
@ 2015-10-28 17:04       ` suokun
  2015-10-29 10:25         ` Dario Faggioli
  0 siblings, 1 reply; 16+ messages in thread
From: suokun @ 2015-10-28 17:04 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: jgross, George Dunlap, xen-devel

On Tue, Oct 27, 2015 at 11:41 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Tue, 2015-10-27 at 14:32 -0600, suokun wrote:
>> On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
>> <dario.faggioli@citrix.com> wrote:
>
>> Hi, Dario,
>> Thank you for your reply.
>>
> Hi,
>
>> Here are my two VMs running on the same physical CPU.
>> VM-IO: 1-vCPU pinned to a pCPU, running netperf
>> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
>> Another machine runs the netperf client to send the requests to VM-IO.
>>
>> My code is very simple:
>> in VM-IO, as the server side: $ netserver -p 12345
>> in VM-CPU, just running a while(1) loop: $ ./loop
>> on the client, send I/O requests to VM-IO:
>>   $ netperf -H [server_ip] -l 15 -t TCP_STREAM -p 12345
>>
> Ok, thanks.
>
>> The setting that led to the poor IO performance is as follows:
>> VM-IO:  1-vCPU pinned to a pCPU, running netperf
>> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
>>
>> The root cause is that when an IO request comes, VM-IO’s vCPU is
>> elevated to BOOST and goes through vcpu_wake() -> __runq_tickle(). In
>> __runq_tickle(), the currently running vCPU (i.e., the vCPU from
>> VM-CPU) is marked as _VPF_migrating.
>>
> Ok.
>
>> Then, Xen goes through schedule() to
>> reschedule the current vCPU (i.e., vCPU from VM-CPU) and schedule the
>> next vCPU (i.e., the vCPU from VM-IO). Due to the _VPF_migrating
>> flag, the descheduled vCPU will be migrated in context_saved() and
>> later woken up in vcpu_wake().
>>
> Sure.
>
>> Indeed, csched_vcpu_wake() will quit if the
>> vCPU from VM-CPU is on the run queue. But it actually is not. In
>> csched_schedule(), the vCPU will not be inserted back into the run
>> queue because it is not runnable, due to the _VPF_migrating bit in
>> pause_flags. As such, the vCPU from VM-CPU will be boosted and not be
>> preempted by a later IO request, because BOOST cannot preempt BOOST.
>>
> Aha! Now I see what you mean. From the previous email, I couldn't
> really tell which call to schedule() you were looking at, during
> each phase of the analysis... Thanks for clarifying!
>
> And, yes, I agree with you that, since the vCPU of VM-CPU fails the
> vcpu_runnable() test, it's being treated as if it is really waking up
> from sleep, in csched_vcpu_wake(), and hence boosted.
>
>> A simple fix would be allowing BOOST to preempt BOOST.
>>
> Nah, that would be a hack on top of a hack! :-P
>
>> A better fix
>> would be checking the CPU affinity before setting the _VPF_migrating
>> flag.
>>
> Yeah, I like this better. So, can you try the patch attached to this
> email?
>
> Here at my place, without any patch, I get the following results:
>
>  idle:       throughput = 806.64
>  with noise: throughput = 166.50
>
> With the patch, I get this:
>
>  idle:       throughput = 807.18
>  with noise: throughput = 731.66
>
> The patch (if you confirm that it works) fixes the bug in this
> particular situation, where vCPUs are all pinned to the same pCPU,
> but does not prevent vCPUs that get migrated around the pCPUs from
> becoming BOOSTed in Credit1.
>
> That is something I think we should avoid, and I've got a (small) patch
> series ready for that. I'll give it some more testing before sending
> it to the list, though, as I want to make sure it's not causing
> regressions.
>
> Thanks and Regards,
> Dario

Hi, Dario,

thank you for your reply.

Here is my patch, actually just one line of code:

if ( new_idlers_empty && new->pri > cur->pri )
{
    SCHED_STAT_CRANK(tickle_idlers_none);
    SCHED_VCPU_STAT_CRANK(cur, kicked_away);
    SCHED_VCPU_STAT_CRANK(cur, migrate_r);
    SCHED_STAT_CRANK(migrate_kicked_away);

+   /* Migration can happen only if there is more than one online CPU
+    * and the vCPU is not pinned to a single physical CPU. */
+   if ( num_online_cpus() > 1 &&
+        cpumask_weight((cur->vcpu)->cpu_hard_affinity) > 1 )
+   {
        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+   }
    cpumask_set_cpu(cpu, &mask);
}

without patch:
idle: throughput = 941 Mbps
with noise: throughput = 32 Mbps

with patch:
idle: throughput = 941 Mbps
with noise: throughput = 691 Mbps


I tried your patch as well; here are the test results on my machine:
with your patch:
idle: throughput = 941 Mbps
with noise: throughput = 658 Mbps

Both our patches improve the I/O throughput with noise
significantly. But still, compared to the I/O-only scenario, there is
a 250~290 Mbps gap.

That is due to the ratelimit in Xen's credit scheduler. The default
value of the ratelimit is 1000us, which means that once the
CPU-intensive vCPU starts to run, the I/O-intensive vCPU needs to wait
1000us even if an I/O request comes in and its priority is BOOST.
However, the time interval between two I/O requests in Netperf is just
tens of microseconds, far less than the ratelimit. As a result, some
I/O requests cannot be handled in time, which causes the loss of
throughput.

I tried reducing the ratelimit manually and the throughput increased
after that.
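
(For anyone reproducing this: the ratelimit can normally be changed at
run time with "xl sched-credit -s -r <microseconds>", or at boot with
the sched_ratelimit_us= Xen command line option, and setting it to 0
should disable ratelimiting altogether. Please double check the exact
option names against your Xen version, I'm quoting them from memory.)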

Best
Tony


> ---
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] mistakenly wake in Xen's credit scheduler
  2015-10-28 17:04       ` suokun
@ 2015-10-29 10:25         ` Dario Faggioli
  0 siblings, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2015-10-29 10:25 UTC (permalink / raw)
  To: suokun; +Cc: jgross, George Dunlap, xen-devel



On Wed, 2015-10-28 at 11:04 -0600, suokun wrote:
> Hi, Dario,
> 
Hi,

> Here is my patch, actually just one line of code:
> 
Yep, I saw it on the list, only after writing the email when I asked
you about it. :-)

> if ( new_idlers_empty && new->pri > cur->pri )
> {
>     SCHED_STAT_CRANK(tickle_idlers_none);
>     SCHED_VCPU_STAT_CRANK(cur, kicked_away);
>     SCHED_VCPU_STAT_CRANK(cur, migrate_r);
>     SCHED_STAT_CRANK(migrate_kicked_away);
> 
> +   /* Migration can happen only if there is more than one online CPU
> +    * and the vCPU is not pinned to a single physical CPU. */
> +   if ( num_online_cpus() > 1 &&
> +        cpumask_weight((cur->vcpu)->cpu_hard_affinity) > 1 )
> +   {
>         set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
> +   }
>
This is ok, in the specific case under test here. However, while we are
here, it also makes sense to check whether migration will actually have
any chance of happening. That is influenced by whether there are
suitable idle pCPUs in the system (we're doing stuff like that
everywhere in this function).

In fact, even when cur has broader affinity, if none of the pCPUs where
it can run are idle, it does not make any sense to attempt the
migration (and, in fact, without the other fix I was mentioning in
place, that would trigger the spurious boosting behavior that you
discovered).

Also, given how load balancing works in Credit1, i.e., it takes both
hard and soft affinity into account, we need to use the proper mask,
depending on what 'balancing step' we are in.

That is what my patch is doing.
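
In pseudo-code, the shape of the check is more or less the following
(just a sketch to show the idea, with prv->idlers being Credit1's mask
of idle pCPUs; the real thing is the patch attached to my previous
email):

    /*
     * Flag cur for migration only if at least one of the pCPUs it is
     * allowed to run on is currently idle (the actual patch does this
     * per affinity balancing step, so soft affinity is honoured too).
     */
    if ( cpumask_intersects(cur->vcpu->cpu_hard_affinity, prv->idlers) )
    {
        set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
        __cpumask_set_cpu(cpu, &mask);
    }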

> Both our patches improve the I/O throughput with noise
> significantly. But still, compared to the I/O-only scenario, there is
> a 250~290 Mbps gap.
> 
> That is due to the ratelimit in Xen's credit scheduler. 
>
Yes, I investigated that myself, and I also traced it to that root
cause.

> The default
> value of the ratelimit is 1000us, which means that once the
> CPU-intensive vCPU starts to run, the I/O-intensive vCPU needs to
> wait 1000us even if an I/O request comes in and its priority is
> BOOST. However, the time interval between two I/O requests in Netperf
> is just tens of microseconds, far less than the ratelimit. As a
> result, some I/O requests cannot be handled in time, which causes the
> loss of throughput.
> 
Indeed.

> I tried reducing the ratelimit manually and the throughput increased
> after that.
> 
I saw that too.

Thanks again a lot for the report, and for testing the patch.

Regards,
Dario
---
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-10-29 10:25 UTC | newest]

Thread overview: 16+ messages
2015-10-27  5:59 [BUG] mistakenly wake in Xen's credit scheduler suokun
2015-10-27  9:44 ` George Dunlap
2015-10-27  9:53   ` Dario Faggioli
2015-10-27 20:11   ` suokun
2015-10-28  5:39     ` suokun
2015-10-28  5:54     ` Dario Faggioli
2015-10-28  6:01       ` Juergen Gross
2015-10-28  6:08         ` Dario Faggioli
2015-10-28 11:03           ` George Dunlap
2015-10-27 10:44 ` Dario Faggioli
2015-10-27 20:32   ` suokun
2015-10-28  5:41     ` Dario Faggioli
2015-10-28 17:04       ` suokun
2015-10-29 10:25         ` Dario Faggioli
  -- strict thread matches above, loose matches on Subject: below --
2015-10-26 22:30 Kun Suo
2015-10-27  5:48 ` Jia Rao
