* xen/arm: Domain not fully destroyed when using credit2
@ 2017-01-23 19:42 Julien Grall
  2017-01-24  0:16 ` Stefano Stabellini
  2017-01-24  8:20 ` Jan Beulich
  0 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2017-01-23 19:42 UTC
To: Stefano Stabellini, Dario Faggioli, George Dunlap, Andrew Cooper, Jan Beulich, Konrad Rzeszutek Wilk, Wei Liu, Ian Jackson, Tim Deegan
Cc: xen-devel

Hi all,

Before someone digs into the scheduler: I don't think this is an issue in credit2 itself, but using it highlights a bug in another component (I think RCU).

Whilst testing other patches today, I noticed that some of the resources allocated to a guest were not released during its destruction.

The configuration of the test is:
    - ARM platform with 6 cores
    - staging Xen with credit2 enabled by default
    - DOM0 using 2 pinned vCPUs

The test creates a guest and then destroys it. After the test, some resources are not released (or may only be released a long time afterwards).

Looking at the code, domain resources are released in 2 phases:
    - domain_destroy: called when there are no more references to the domain (see put_domain)
    - complete_domain_destroy: called when the RCU is quiescent

The function domain_destroy sets up the RCU callback (complete_domain_destroy) by calling call_rcu. call_rcu adds the callback to the RCU list and may then send an IPI (see force_quiescent_state) if the threshold is reached. This IPI is there to make sure all CPUs are quiescent before the callbacks (e.g. complete_domain_destroy) are called. In my case, the threshold has not been reached and therefore no IPI is sent.

On ARM, the idle loop runs when the pCPU has no work to do. This loop waits for an interrupt (see wfi) and checks whether there is some work to do once the CPU has woken up (i.e. an interrupt was received).

The problem I encountered is that the idle CPU never receives an interrupt (no timer, no IPI...) and therefore never checks whether the RCU has some work to do.

From my understanding, this is a bug in how RCU is handled (see the comment above rcu_start_batch): it expects each CPU to check by itself whether there is RCU work (there is no broadcast), but this relies on something else (a timer?) firing an interrupt.

Any incoming interrupt makes a pCPU check the RCU state. On ARM, the biggest source of IPIs was credit1, or the timer if a guest vCPU was scheduled on that pCPU. But it looks like the IPI traffic with credit2 has been reduced to none (which is a really good thing :)), and no guest timer was scheduled because no vCPU ever ran on this pCPU.

I think the bug has always been there (on both ARM and x86), but was never detected because any incoming interrupt makes the pCPU check the RCU state.

However, I am not sure how to resolve this issue. Any thoughts?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
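To make the failure mode described in this message concrete, here is a minimal, self-contained C sketch. It is a toy model and not Xen's RCU code: `toy_rcu`, `toy_call_rcu` and `toy_cpu_checks_rcu` are hypothetical names, and the single bitmask is an assumed simplification of per-CPU quiescent-state tracking (callbacks queued while a batch is in flight simply join that batch). It only illustrates that a queued callback cannot run while one CPU never wakes up to report.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 6u  /* matches the 6-core test platform */

/* Toy model (hypothetical, not Xen's data structures). */
struct toy_rcu {
    uint32_t pending_mask;  /* CPUs that still owe a quiescent-state report */
    unsigned int queued;    /* callbacks waiting for the grace period to end */
    unsigned int invoked;   /* callbacks that have been run */
};

/* Stand-in for call_rcu below the threshold: queue a callback, send no IPI. */
static void toy_call_rcu(struct toy_rcu *r)
{
    if (r->queued == 0)
        r->pending_mask = (1u << TOY_NR_CPUS) - 1;  /* every CPU must report */
    r->queued++;
}

/* Runs only when 'cpu' wakes up (interrupt received) and checks for RCU work. */
static void toy_cpu_checks_rcu(struct toy_rcu *r, unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    r->pending_mask &= ~(1u << cpu);
    if (r->pending_mask == 0 && r->queued > 0) {
        r->invoked += r->queued;  /* complete_domain_destroy would run here */
        r->queued = 0;
    }
}
```

In this model, if five of the six CPUs call `toy_cpu_checks_rcu` but the sixth stays in its idle loop, `invoked` remains 0 for as long as that CPU receives no interrupt.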
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-23 19:42 xen/arm: Domain not fully destroyed when using credit2 Julien Grall @ 2017-01-24 0:16 ` Stefano Stabellini 2017-01-24 12:52 ` Julien Grall 2017-01-24 8:20 ` Jan Beulich 1 sibling, 1 reply; 33+ messages in thread From: Stefano Stabellini @ 2017-01-24 0:16 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, Jan Beulich, xen-devel On Mon, 23 Jan 2017, Julien Grall wrote: > Hi all, > > Before someone dig into the scheduler, I don't think this is an issue in > credit2 but the use of it highlight a bug in another component (I think RCU). > > Whilst testing other patches today, I have noticed that some part of the > resources allocated to a guest were not released during the destruction. > > The configuration of the test is: > - ARM platform with 6 cores > - staging Xen with credit2 enabled by default > - DOM0 using 2 pinned vCPUs > > The test is creating a guest vCPUs and then destroyed. After the test, some > resourced are not released (or could be released a long time > after). > > Looking at the code, domain resources are released in 2 phases: > - domain_destroy: called when there is no more reference on the domain > (see put_domain) > - complete_domain_destroy: called when the RCU is quiescent > > The function domain_destroy will setup the RCU callback > (complete_domain_destroy) by calling call_rcu. call_rcu will add the callback > into the RCU list and then will may send an IPI (see force_quiescent_state) if > the threshold reached. This IPI is here to make sure all CPUs are quiescent > before calling the callbacks (e.g complete_domain_destroy). In my case, the > threshold has not reached and therefore an IPI is not sent. > > On ARM, the idle will run when the pCPU has no work to do. 
> This loop will wait to receive an interrupt (see wfi) and check if there is
> some work to do when the CPU has waken-up (i.e an interrupt was received).
>
> The problem I encountered is the idle CPU will never receive interrupts (no
> timer, nor IPI...) and therefore never check whether the RCU has some work to
> do.
>
> From my understanding, this is a bug in how RCU is handled (see comment above
> rcu_start_batch), it expects each CPU (no broadcast) to check whether there is
> RCU work. But this is relying on someone else (timer?) to fire an interrupt.
>
> Any incoming interrupts will make a pCPU checking the RCU state. On ARM, the
> biggest source of IPI was credit1 or timer if a guest vCPU was scheduled on
> that pCPU. But it looks like the IPI traffic with credit2 was reduced to none
> (which is a really good thing :)), and no guest timer was scheduled because no
> vCPU ever run on this pCPU.
>
> I think the bug has always been here (both ARM and x86), but never detected
> because any incoming interrupts will make the pCPU to check the RCU state.
>
> However, I am not sure how to resolve this issue. Any thoughts?

Well done for finding the bug!

Sending an IPI on call_rcu is easy, but it would be better not to wake up the sleeping cpus at all. If they are running the idle_loop, they cannot be holding any rcu references for the domain which is about to be destroyed, right?
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 0:16 ` Stefano Stabellini @ 2017-01-24 12:52 ` Julien Grall 0 siblings, 0 replies; 33+ messages in thread From: Julien Grall @ 2017-01-24 12:52 UTC (permalink / raw) To: Stefano Stabellini Cc: Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, Jan Beulich, xen-devel Hi Stefano, On 24/01/17 00:16, Stefano Stabellini wrote: > On Mon, 23 Jan 2017, Julien Grall wrote: >> Hi all, >> >> Before someone dig into the scheduler, I don't think this is an issue in >> credit2 but the use of it highlight a bug in another component (I think RCU). >> >> Whilst testing other patches today, I have noticed that some part of the >> resources allocated to a guest were not released during the destruction. >> >> The configuration of the test is: >> - ARM platform with 6 cores >> - staging Xen with credit2 enabled by default >> - DOM0 using 2 pinned vCPUs >> >> The test is creating a guest vCPUs and then destroyed. After the test, some >> resourced are not released (or could be released a long time >> after). >> >> Looking at the code, domain resources are released in 2 phases: >> - domain_destroy: called when there is no more reference on the domain >> (see put_domain) >> - complete_domain_destroy: called when the RCU is quiescent >> >> The function domain_destroy will setup the RCU callback >> (complete_domain_destroy) by calling call_rcu. call_rcu will add the callback >> into the RCU list and then will may send an IPI (see force_quiescent_state) if >> the threshold reached. This IPI is here to make sure all CPUs are quiescent >> before calling the callbacks (e.g complete_domain_destroy). In my case, the >> threshold has not reached and therefore an IPI is not sent. >> >> On ARM, the idle will run when the pCPU has no work to do. This loop will wait >> to receive an interrupt (see wfi) and check if there is some work to do when >> the CPU has waken-up (i.e an interrupt was received). 
>>
>> The problem I encountered is the idle CPU will never receive interrupts (no
>> timer, nor IPI...) and therefore never check whether the RCU has some work to
>> do.
>>
>> From my understanding, this is a bug in how RCU is handled (see comment above
>> rcu_start_batch), it expects each CPU (no broadcast) to check whether there is
>> RCU work. But this is relying on someone else (timer?) to fire an interrupt.
>>
>> Any incoming interrupts will make a pCPU checking the RCU state. On ARM, the
>> biggest source of IPI was credit1 or timer if a guest vCPU was scheduled on
>> that pCPU. But it looks like the IPI traffic with credit2 was reduced to none
>> (which is a really good thing :)), and no guest timer was scheduled because no
>> vCPU ever run on this pCPU.
>>
>> I think the bug has always been here (both ARM and x86), but never detected
>> because any incoming interrupts will make the pCPU to check the RCU state.
>>
>> However, I am not sure how to resolve this issue. Any thoughts?
>
> Well done for finding the bug!
>
> Sending an IPI on call_rcu is easy, but it would be better not to wake
> up the sleeping cpus at all. If they are running the idle_loop, they
> cannot be holding any rcu references for the domain which is about to be
> destroyed, right?

The problem is not only about domains but about anything using RCU. An idle pCPU may have to process softirqs from time to time, and I can't find any reason why a softirq would be forbidden from holding an RCU reference. So I think we have to ensure that this pCPU is really doing nothing.

Cheers,

-- 
Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-23 19:42 xen/arm: Domain not fully destroyed when using credit2 Julien Grall 2017-01-24 0:16 ` Stefano Stabellini @ 2017-01-24 8:20 ` Jan Beulich 2017-01-24 10:50 ` Julien Grall 1 sibling, 1 reply; 33+ messages in thread From: Jan Beulich @ 2017-01-24 8:20 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel >>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote: > Whilst testing other patches today, I have noticed that some part of the > resources allocated to a guest were not released during the destruction. > > The configuration of the test is: > - ARM platform with 6 cores > - staging Xen with credit2 enabled by default > - DOM0 using 2 pinned vCPUs > > The test is creating a guest vCPUs and then destroyed. After the test, > some resourced are not released (or could be released a long time > after). > > Looking at the code, domain resources are released in 2 phases: > - domain_destroy: called when there is no more reference on the domain > (see put_domain) > - complete_domain_destroy: called when the RCU is quiescent > > The function domain_destroy will setup the RCU callback > (complete_domain_destroy) by calling call_rcu. call_rcu will add the > callback into the RCU list and then will may send an IPI (see > force_quiescent_state) if the threshold reached. This IPI is here to > make sure all CPUs are quiescent before calling the callbacks (e.g > complete_domain_destroy). In my case, the threshold has not reached and > therefore an IPI is not sent. But wait - isn't it the nature of RCU that it may take arbitrary time until the actual call(s) happen(s)? If an upper limit is required by a user of RCU, I think it would need to be that entity to arrange for early expiry. I notice in this context that we don't even have synchronize_rcu() in our sources. 
Jan
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 8:20 ` Jan Beulich @ 2017-01-24 10:50 ` Julien Grall 2017-01-24 11:02 ` Jan Beulich 2017-01-24 12:53 ` Dario Faggioli 0 siblings, 2 replies; 33+ messages in thread From: Julien Grall @ 2017-01-24 10:50 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel Hi Jan, On 24/01/2017 08:20, Jan Beulich wrote: >>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote: >> Whilst testing other patches today, I have noticed that some part of the >> resources allocated to a guest were not released during the destruction. >> >> The configuration of the test is: >> - ARM platform with 6 cores >> - staging Xen with credit2 enabled by default >> - DOM0 using 2 pinned vCPUs >> >> The test is creating a guest vCPUs and then destroyed. After the test, >> some resourced are not released (or could be released a long time >> after). >> >> Looking at the code, domain resources are released in 2 phases: >> - domain_destroy: called when there is no more reference on the domain >> (see put_domain) >> - complete_domain_destroy: called when the RCU is quiescent >> >> The function domain_destroy will setup the RCU callback >> (complete_domain_destroy) by calling call_rcu. call_rcu will add the >> callback into the RCU list and then will may send an IPI (see >> force_quiescent_state) if the threshold reached. This IPI is here to >> make sure all CPUs are quiescent before calling the callbacks (e.g >> complete_domain_destroy). In my case, the threshold has not reached and >> therefore an IPI is not sent. > > But wait - isn't it the nature of RCU that it may take arbitrary time > until the actual call(s) happen(s)? Today this arbitrary time could be infinite if an idle pCPU does not receive an interrupt. So some part of domain resource will never be freed. 
If I power-cycle a domain in a loop, after some time the toolstack will fail to allocate memory because resources are exhausted: previous instances of the domain have not yet been fully destroyed (e.g. complete_domain_destroy was not called).

> If an upper limit is required by
> a user of RCU, I think it would need to be that entity to arrange
> for early expiry.

This happens with all users of RCU, not only domains. Looking at the code, there are already some upper limits:
    - call_rcu will call force_quiescent_state if the number of elements in the RCU queue is > 10000
    - the RCU has a grace period (not sure how long), but no timer to ensure the RCU callbacks will be called

Reducing the threshold in call_rcu (see qhimark) will not help, as you may still face memory exhaustion (see above). So I think the best solution is to actually implement the grace period properly.

> I notice in this context that we don't even have
> synchronize_rcu() in our sources.

I don't think this is a problem here if we handle the grace period properly.

Regards,

-- 
Julien Grall
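The threshold behaviour mentioned in this message can be sketched as follows. This is an illustrative model rather than the code in Xen: `toy_rcu_queue` and `toy_enqueue` are made-up names, and the only detail taken from the thread is that force_quiescent_state is called once the queue holds more than 10000 elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QHIMARK 10000L  /* queue length quoted in the thread */

/* Hypothetical per-CPU queue state, for illustration only. */
struct toy_rcu_queue {
    long qlen;            /* callbacks currently queued */
    unsigned int forced;  /* how often a quiescent state was forced */
};

/* Queue one callback; returns true if this call crossed the threshold. */
static bool toy_enqueue(struct toy_rcu_queue *q)
{
    assert(q != NULL);
    if (++q->qlen > TOY_QHIMARK) {
        q->forced++;  /* stands in for force_quiescent_state() */
        return true;
    }
    return false;
}
```

In this model, ten create/destroy cycles queue only ten callbacks, so the forcing path is never reached and nothing nudges an idle CPU.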
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 10:50 ` Julien Grall @ 2017-01-24 11:02 ` Jan Beulich 2017-01-24 12:30 ` Julien Grall 2017-01-24 12:53 ` Dario Faggioli 1 sibling, 1 reply; 33+ messages in thread From: Jan Beulich @ 2017-01-24 11:02 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel >>> On 24.01.17 at 11:50, <julien.grall@arm.com> wrote: > On 24/01/2017 08:20, Jan Beulich wrote: >>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote: >>> Whilst testing other patches today, I have noticed that some part of the >>> resources allocated to a guest were not released during the destruction. >>> >>> The configuration of the test is: >>> - ARM platform with 6 cores >>> - staging Xen with credit2 enabled by default >>> - DOM0 using 2 pinned vCPUs >>> >>> The test is creating a guest vCPUs and then destroyed. After the test, >>> some resourced are not released (or could be released a long time >>> after). >>> >>> Looking at the code, domain resources are released in 2 phases: >>> - domain_destroy: called when there is no more reference on the domain >>> (see put_domain) >>> - complete_domain_destroy: called when the RCU is quiescent >>> >>> The function domain_destroy will setup the RCU callback >>> (complete_domain_destroy) by calling call_rcu. call_rcu will add the >>> callback into the RCU list and then will may send an IPI (see >>> force_quiescent_state) if the threshold reached. This IPI is here to >>> make sure all CPUs are quiescent before calling the callbacks (e.g >>> complete_domain_destroy). In my case, the threshold has not reached and >>> therefore an IPI is not sent. >> >> But wait - isn't it the nature of RCU that it may take arbitrary time >> until the actual call(s) happen(s)? > > Today this arbitrary time could be infinite if an idle pCPU does not > receive an interrupt. 
> So some part of domain resource will never be freed.
>
> If I am power-cycling a domain in loop, after some time the toolstack
> will fail to allocate memory because of exhausted resources. Previous
> instance of the domain was not yet fully destroyed (e.g
> complete_domain_destroy was not called).
>
>> If an upper limit is required by
>> a user of RCU, I think it would need to be that entity to arrange
>> for early expiry.
>
> This is happening with all the user and not only a domain. Looking at
> the code, there are already some upper limit:
> - call_rcu will call force_quiescent_state if the number of element in
> the RCU queue is > 10000
> - the RCU has a grace period (not sure how long), but no timer to
> ensure the RCU will be called

This remark in parentheses is quite relevant here, I think: there simply is no upper bound, as I understand it. This is a conceptual aspect. But I'm in no way an RCU expert, so I may well be entirely off.

> Reducing the threshold in call_rcu (see qhimark) will not help as you
> may still face memory exhaustion (see above). So I think the only best
> solution is to actually implement properly the grace period.

Well, with the above in mind - what does "properly" mean here?

Jan
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 11:02 ` Jan Beulich @ 2017-01-24 12:30 ` Julien Grall 0 siblings, 0 replies; 33+ messages in thread From: Julien Grall @ 2017-01-24 12:30 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel Hi, On 24/01/17 11:02, Jan Beulich wrote: >>>> On 24.01.17 at 11:50, <julien.grall@arm.com> wrote: >> On 24/01/2017 08:20, Jan Beulich wrote: >>>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote: >>>> Whilst testing other patches today, I have noticed that some part of the >>>> resources allocated to a guest were not released during the destruction. >>>> >>>> The configuration of the test is: >>>> - ARM platform with 6 cores >>>> - staging Xen with credit2 enabled by default >>>> - DOM0 using 2 pinned vCPUs >>>> >>>> The test is creating a guest vCPUs and then destroyed. After the test, >>>> some resourced are not released (or could be released a long time >>>> after). >>>> >>>> Looking at the code, domain resources are released in 2 phases: >>>> - domain_destroy: called when there is no more reference on the domain >>>> (see put_domain) >>>> - complete_domain_destroy: called when the RCU is quiescent >>>> >>>> The function domain_destroy will setup the RCU callback >>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add the >>>> callback into the RCU list and then will may send an IPI (see >>>> force_quiescent_state) if the threshold reached. This IPI is here to >>>> make sure all CPUs are quiescent before calling the callbacks (e.g >>>> complete_domain_destroy). In my case, the threshold has not reached and >>>> therefore an IPI is not sent. >>> >>> But wait - isn't it the nature of RCU that it may take arbitrary time >>> until the actual call(s) happen(s)? >> >> Today this arbitrary time could be infinite if an idle pCPU does not >> receive an interrupt. 
>> So some part of domain resource will never be freed.
>>
>> If I am power-cycling a domain in loop, after some time the toolstack
>> will fail to allocate memory because of exhausted resources. Previous
>> instance of the domain was not yet fully destroyed (e.g
>> complete_domain_destroy was not called).
>>
>>> If an upper limit is required by
>>> a user of RCU, I think it would need to be that entity to arrange
>>> for early expiry.
>>
>> This is happening with all the user and not only a domain. Looking at
>> the code, there are already some upper limit:
>> - call_rcu will call force_quiescent_state if the number of element in
>> the RCU queue is > 10000
>> - the RCU has a grace period (not sure how long), but no timer to
>> ensure the RCU will be called
>
> This remark in parentheses is quite relevant here, I think: There
> simply is no upper bound, aiui. This is a conceptional aspect. But
> I'm in no way an RCU expert, so I may well be entirely off.

I would be surprised if it were normal behaviour for an idle pCPU (sitting in wfi, or the equivalent instruction on x86) to block the RCU forever, as is the case today.

>
>> Reducing the threshold in call_rcu (see qhimark) will not help as you
>> may still face memory exhaustion (see above). So I think the only best
>> solution is to actually implement properly the grace period.
>
> Well, with the above in mind - what does "properly" mean here?

By "properly", I meant that either idle pCPUs should not be taken into account in the grace period, or we need a timer (or something else) on the idle pCPU to check whether there is some work to do (see rcu_pending).

Cheers,

-- 
Julien Grall
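The two options named at the end of this message can be sketched in a few lines of C. This is a hypothetical model, not a patch: `toy_gp`, `toy_report_qs`, `toy_enter_idle` and `toy_idle_needs_wakeup` are invented names, and the bitmask bookkeeping is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grace-period state: one bit per CPU. */
struct toy_gp {
    uint32_t pending;  /* CPUs the current grace period is still waiting for */
};

static uint32_t toy_bit(unsigned int cpu)
{
    assert(cpu < 32);
    return UINT32_C(1) << cpu;
}

/* A running CPU passed a quiescent state and reported it. */
static void toy_report_qs(struct toy_gp *gp, unsigned int cpu)
{
    gp->pending &= ~toy_bit(cpu);
}

/* Option 1: a CPU entering the idle loop is dropped from the waited-on set. */
static void toy_enter_idle(struct toy_gp *gp, unsigned int cpu)
{
    gp->pending &= ~toy_bit(cpu);
}

/* Option 2: leave the set alone, but tell the idle loop whether it must arm
 * a timer (or similar) so that it wakes up and checks for RCU work. */
static bool toy_idle_needs_wakeup(const struct toy_gp *gp, unsigned int cpu)
{
    return (gp->pending & toy_bit(cpu)) != 0;
}

/* The grace period is over once no CPU is being waited for. */
static bool toy_gp_done(const struct toy_gp *gp)
{
    return gp->pending == 0;
}
```

Option 1 is only sound if the idle CPU really holds no RCU reference, which is the concern raised earlier in the thread about softirqs running on idle pCPUs; option 2 avoids that question at the cost of waking the CPU.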
* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-01-24 10:50 ` Julien Grall
  2017-01-24 11:02 ` Jan Beulich
@ 2017-01-24 12:53 ` Dario Faggioli
  2017-01-24 13:04 ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-01-24 12:53 UTC
To: Julien Grall, Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel

On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote:
> On 24/01/2017 08:20, Jan Beulich wrote:
> > > > > On 23.01.17 at 20:42, <julien.grall@arm.com> wrote:
> > > The function domain_destroy will setup the RCU callback
> > > (complete_domain_destroy) by calling call_rcu. call_rcu will add the
> > > callback into the RCU list and then will may send an IPI (see
> > > force_quiescent_state) if the threshold reached. This IPI is here to
> > > make sure all CPUs are quiescent before calling the callbacks (e.g
> > > complete_domain_destroy). In my case, the threshold has not reached and
> > > therefore an IPI is not sent.
> >
> > But wait - isn't it the nature of RCU that it may take arbitrary time
> > until the actual call(s) happen(s)?
>
> Today this arbitrary time could be infinite if an idle pCPU does not
> receive an interrupt. So some part of domain resource will never be
> freed.
>
> If I am power-cycling a domain in loop, after some time the toolstack
> will fail to allocate memory because of exhausted resources. Previous
> instance of the domain was not yet fully destroyed (e.g
> complete_domain_destroy was not called).
>
Do you have a script and/or some more info so that I can try to reproduce it (e.g., you say some of the vCPUs are pinned: which ones, etc.)?

I'm a bit curious about why you're saying this is being exposed by using Credit2. In fact:

 1) I've power-cycled quite a few domains in these last months, while under Credit2, and I don't think I have encountered it on x86;
 2) I see how it may be related to Credit2 being more deterministic and not trying to schedule stuff around pseudo-randomly like Credit1 does... but I'd like to try investigating a bit more.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 12:53 ` Dario Faggioli @ 2017-01-24 13:04 ` Julien Grall 2017-01-24 13:05 ` Julien Grall 2017-01-24 13:19 ` Dario Faggioli 0 siblings, 2 replies; 33+ messages in thread From: Julien Grall @ 2017-01-24 13:04 UTC (permalink / raw) To: Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel Hi Dario, On 24/01/17 12:53, Dario Faggioli wrote: > On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote: >> On 24/01/2017 08:20, Jan Beulich wrote: >>>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote: >>>> The function domain_destroy will setup the RCU callback >>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add >>>> the >>>> callback into the RCU list and then will may send an IPI (see >>>> force_quiescent_state) if the threshold reached. This IPI is here >>>> to >>>> make sure all CPUs are quiescent before calling the callbacks >>>> (e.g >>>> complete_domain_destroy). In my case, the threshold has not >>>> reached and >>>> therefore an IPI is not sent. >>> >>> But wait - isn't it the nature of RCU that it may take arbitrary >>> time >>> until the actual call(s) happen(s)? >> >> Today this arbitrary time could be infinite if an idle pCPU does not >> receive an interrupt. So some part of domain resource will never be >> freed. >> >> If I am power-cycling a domain in loop, after some time the >> toolstack >> will fail to allocate memory because of exhausted resources. >> Previous >> instance of the domain was not yet fully destroyed (e.g >> complete_domain_destroy was not called). >> > Do you have a script and/or some more info for letting me try to > reproduce it (e.g., you say some otf the vCPUs are pinned, which one? > etc)? That was mentioned in my first e-mail :). 
My configuration is:
    - ARM platform with 6 cores
    - staging Xen with credit2 enabled by default
    - DOM0 using 2 pinned vCPUs
    - Guest using 2 vCPUs (not pinned)

The script is really simple:

for i in `seq 1 10`; do
    sudo xl create ~/works/guest/guest.cfg;
    sudo xl destroy guest;
done

>
> I'm a bit curious about why you're saying this is being exposed by
> using Credit2.

It is exposed by Credit2 because, compared to Credit1, the scheduler generates no interrupt traffic. On ARM with credit2, the interrupt traffic is reduced to none for an idle pCPU.

> In fact:
> 1) I've power-cycled quite a few domains in these last months, while
> under Credit2, and I don't think I have encountered it on x86;

AFAIU, an IPI is often the only way to broadcast some instructions on x86. So compared to ARM, you likely have higher interrupt traffic.

Also, the problem is not obvious to spot unless you look at the free memory (via xl info) before and after. Another solution is to print a message in both domain_destroy and complete_domain_destroy. You will see the first message straight away; the latter may never be printed.

> 2) I see how it may be related to Credit2 being more deterministic
> and not trying to schedule stuff around pseudo-randomly like
> Credit1 does... but I'd like to try investigating a bit more.

I am able to reliably reproduce it on a Juno-r2.

Cheers,

-- 
Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 13:04 ` Julien Grall @ 2017-01-24 13:05 ` Julien Grall 2017-01-24 13:19 ` Dario Faggioli 1 sibling, 0 replies; 33+ messages in thread From: Julien Grall @ 2017-01-24 13:05 UTC (permalink / raw) To: Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel On 24/01/17 13:04, Julien Grall wrote: > Hi Dario, > > On 24/01/17 12:53, Dario Faggioli wrote: >> On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote: >>> On 24/01/2017 08:20, Jan Beulich wrote: >>>>>>> On 23.01.17 at 20:42, <julien.grall@arm.com> wrote: >>>>> The function domain_destroy will setup the RCU callback >>>>> (complete_domain_destroy) by calling call_rcu. call_rcu will add >>>>> the >>>>> callback into the RCU list and then will may send an IPI (see >>>>> force_quiescent_state) if the threshold reached. This IPI is here >>>>> to >>>>> make sure all CPUs are quiescent before calling the callbacks >>>>> (e.g >>>>> complete_domain_destroy). In my case, the threshold has not >>>>> reached and >>>>> therefore an IPI is not sent. >>>> >>>> But wait - isn't it the nature of RCU that it may take arbitrary >>>> time >>>> until the actual call(s) happen(s)? >>> >>> Today this arbitrary time could be infinite if an idle pCPU does not >>> receive an interrupt. So some part of domain resource will never be >>> freed. >>> >>> If I am power-cycling a domain in loop, after some time the >>> toolstack >>> will fail to allocate memory because of exhausted resources. >>> Previous >>> instance of the domain was not yet fully destroyed (e.g >>> complete_domain_destroy was not called). >>> >> Do you have a script and/or some more info for letting me try to >> reproduce it (e.g., you say some otf the vCPUs are pinned, which one? >> etc)? > > That was mentioned in my first e-mail :). 
> My configuration is:
> - ARM platform with 6 cores
> - staging Xen with credit2 enabled by default
> - DOM0 using 2 pinned vCPUs

To clarify here, DOM0 has only 2 vCPUs. Both are pinned.

> - Guest using 2 vCPUs (not pinned)
>
> The script is really simple:
>
> for i in `seq 1 10`; do
> sudo xl create ~/works/guest/guest.cfg;
> sudo xl destroy guest;
> done
>
>>
>> I'm a bit curious about why you're saying this is being exposed by
>> using Credit2.
>
> It is been exposed by Credit2 because compared to Credit1 there is no
> interrupt traffic made by the scheduler. On ARM with credit2 the
> interrupt traffic is reduced to none for idle pCPU.
>
> In fact:
>> 1) I've power-cycled quite a few domains in these last months, while
>> under Credit2, and I don't think I have encountered it on x86;
>
> AFAIU, IPI is often the only way to broadcast some instruction on x86.
> So compare to ARM, you have likely an higher interrupt traffic.
>
> Also, the problem is not obvious to spot unless you look at the free
> memory (via xl info) before and after. Another solution is printing a
> message in both domain_destroy and complete_domain_destroy.
>
> You will spot the first message directly. The latter may never be printed.
>
>> 2) I see how it may be related to Credit2 being more deterministic
>> and not trying to schedule stuff around pseudo-randomly like
>> Credit1 does... but I'd like to try investigating a bit more.
>
> I am able to reliable reproduce on a Juno-r2.
>
> Cheers,
>

-- 
Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 13:04 ` Julien Grall 2017-01-24 13:05 ` Julien Grall @ 2017-01-24 13:19 ` Dario Faggioli 2017-01-24 13:24 ` Julien Grall 1 sibling, 1 reply; 33+ messages in thread From: Dario Faggioli @ 2017-01-24 13:19 UTC (permalink / raw) To: Julien Grall, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel [-- Attachment #1.1: Type: text/plain, Size: 2745 bytes --] On Tue, 2017-01-24 at 13:04 +0000, Julien Grall wrote: > On 24/01/17 12:53, Dario Faggioli wrote: > > Do you have a script and/or some more info for letting me try to > > reproduce it (e.g., you say some otf the vCPUs are pinned, which > > one? > > etc)? > > That was mentioned in my first e-mail :). My configuration is: > - ARM platform with 6 cores > - staging Xen with credit2 enabled by default > - DOM0 using 2 pinned vCPUs > - Guest using 2 vCPUs (not pinned) > Yeah, but some of the details were either missing, or not clear to me... Sorry for bothering and thanks for re-stating this here. :-) How are Dom0 vCPUs pinned, exclusively (i.e., there are 2 pCPUs on which _only_ Dom0 and _no_ DomU can run)? > The script is really simple: > > for i in `seq 1 10`; do > sudo xl create ~/works/guest/guest.cfg; > sudo xl destroy guest; > done > Ok. > > I'm a bit curious about why you're saying this is being exposed by > > using Credit2. > > It is been exposed by Credit2 because compared to Credit1 there is > no > interrupt traffic made by the scheduler. > So, when you say "no interrupt traffic", do you perhaps mean that SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are you really talking about actual interrupts (either inter-processor or not)? > On ARM with credit2 the > interrupt traffic is reduced to none for idle pCPU. 
> Yes, but _iff_ we're talking about SCHEDULE_SOFTIRQ events, for a truly idle pCPU (e.g., if I use vcpu-pin to *forbid* every vCPU to execute there), that's _zero_ also for Credit1, at least on x86 (I've just tried)! Perhaps this is too extreme/unrealistic of an idle situation, but I'm trying to understand the problem. :-) > In fact: > > > > 1) I've power-cycled quite a few domains in these last months, > > while > > under Credit2, and I don't think I have encountered it on x86; > > AFAIU, IPI is often the only way to broadcast some instruction on > x86. > So compare to ARM, you have likely an higher interrupt traffic. > Right. > Also, the problem is not obvious to spot unless you look at the free > memory (via xl info) before and after. Another solution is printing > a > message in both domain_destroy and complete_domain_destroy. > > You will spot the first message directly. The latter may never be > printed. > Yep, I was already instrumenting the code like this... I'll let you know. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 13:19 ` Dario Faggioli @ 2017-01-24 13:24 ` Julien Grall 2017-01-24 13:40 ` Dario Faggioli 0 siblings, 1 reply; 33+ messages in thread From: Julien Grall @ 2017-01-24 13:24 UTC (permalink / raw) To: Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel On 24/01/17 13:19, Dario Faggioli wrote: > On Tue, 2017-01-24 at 13:04 +0000, Julien Grall wrote: >> On 24/01/17 12:53, Dario Faggioli wrote: >>> Do you have a script and/or some more info for letting me try to >>> reproduce it (e.g., you say some otf the vCPUs are pinned, which >>> one? >>> etc)? >> >> That was mentioned in my first e-mail :). My configuration is: >> - ARM platform with 6 cores >> - staging Xen with credit2 enabled by default >> - DOM0 using 2 pinned vCPUs >> - Guest using 2 vCPUs (not pinned) >> > Yeah, but some of the details were either missing, or not clear to > me... Sorry for bothering and thanks for re-stating this here. :-) > > How are Dom0 vCPUs pinned, exclusively (i.e., there are 2 pCPUs on > which _only_ Dom0 and _no_ DomU can run)? I have dom0_vcpu_pins on Xen command line option (so I guess only pinned?), no further configuration for DOM0. > >> The script is really simple: >> >> for i in `seq 1 10`; do >> sudo xl create ~/works/guest/guest.cfg; >> sudo xl destroy guest; >> done >> > Ok. > >>> I'm a bit curious about why you're saying this is being exposed by >>> using Credit2. >> >> It is been exposed by Credit2 because compared to Credit1 there is >> no >> interrupt traffic made by the scheduler. >> > So, when you say "no interrupt traffic", do you perhaps mean that > SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are you > really talking about actual interrupts (either inter-processor or not)? I am talking about actual physical interrupts. The traffic is reduced to none with credit2 on idle pCPU. 
Cheers, -- Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 13:24 ` Julien Grall @ 2017-01-24 13:40 ` Dario Faggioli 2017-01-24 13:49 ` Julien Grall 0 siblings, 1 reply; 33+ messages in thread From: Dario Faggioli @ 2017-01-24 13:40 UTC (permalink / raw) To: Julien Grall, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel [-- Attachment #1.1: Type: text/plain, Size: 1405 bytes --] On Tue, 2017-01-24 at 13:24 +0000, Julien Grall wrote: > On 24/01/17 13:19, Dario Faggioli wrote: > > How are Dom0 vCPUs pinned, exclusively (i.e., there are 2 pCPUs on > > which _only_ Dom0 and _no_ DomU can run)? > > I have dom0_vcpu_pins on Xen command line option (so I guess only > pinned?), no further configuration for DOM0. > Ok, thanks. Yeah, that means Dom0 vCPU 0 is pinned to pCPU 0, and vCPU 1 is pinned to pCPU 1. And it's not exclusive, i.e., other domains can run on pCPUs 0 and 1, if the scheduler decides so (because they're free, because the scheduler decides to preempt Dom0, etc). This is of course fine, I just wanted to make sure I was understanding the setup. > > So, when you say "no interrupt traffic", do you perhaps mean that > > SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are > > you > > really talking about actual interrupts (either inter-processor or > > not)? > > I am talking about actual physical interrupts. The traffic is reduced > to > none with credit2 on idle pCPU. > Ah, wow... And how --forgive my naivety-- do you measure / check that?
Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 13:40 ` Dario Faggioli @ 2017-01-24 13:49 ` Julien Grall 2017-01-24 14:16 ` Dario Faggioli 0 siblings, 1 reply; 33+ messages in thread From: Julien Grall @ 2017-01-24 13:49 UTC (permalink / raw) To: Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel Hi, On 24/01/17 13:40, Dario Faggioli wrote: > On Tue, 2017-01-24 at 13:24 +0000, Julien Grall wrote: >> On 24/01/17 13:19, Dario Faggioli wrote: >>> How are Dom0 vCPUs pinned, exclusively (i.e., there are 2 pCPUs on >>> which _only_ Dom0 and _no_ DomU can run)? >> >> I have dom0_vcpu_pins on Xen command line option (so I guess only >> pinned?), no further configuration for DOM0. >> > Ok, thanks. Yeah, that means Dom0 vCPU 0 is pinned to pCPU 0, and vCPU > 1 is pinned to pCPU 1. And it's not excluside, i.e., other domains can > run on pCPUs 0 and 1, if the scheduler decides so (because they're > free, because the scheduler decides to preempt Dom0, etc). > > This is of course fine, I just wanted to make sure I was understanding > the setup. > >>> So, when you say "no interrupt traffic", do you perhaps mean that >>> SCHEDULE_SOFTIRQ is rarely (never!) raised for idle pCPUs? Or are >>> you >>> really talking about actual interrupts (either inter-processor or >>> not)? >> >> I am talking about actual physical interrupts. The traffic is reduced >> to >> none with credit2 on idle pCPU. >> > Ah, wow... And how --forgive my naiveness-- do you measure / check > that? I added a print in the interrupt path (gic_interrupt for ARM) to dump the interrupt number. This needs to be restricted to CPU2 and above to avoid being flooded: if ( smp_processor_id() > 1 ) printk("%s: CPU%u IRQ%u\n", __FUNCTION__, smp_processor_id(), irq); I also added a print in the idle loop before and after the idling instruction (wfi for ARM, pm_idle for x86 I think).
You can see the CPU go into idle mode but never come back. Cheers, -- Julien Grall
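The "goes idle but never comes back" observation can be checked mechanically once the idle-loop prints are collected. The following is a minimal user-space sketch, not Xen code: the record layout, the `IDLE_ENTER`/`IDLE_EXIT` names and `cpus_stuck_idle()` are all hypothetical, and the sketch simply assumes each print in the idle loop has been turned into one record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 6   /* matches the 6-core platform in the report */

/* Hypothetical record for one print emitted around the idling instruction. */
enum idle_event { IDLE_ENTER, IDLE_EXIT };

struct trace_rec {
    unsigned int cpu;
    enum idle_event ev;
};

/*
 * Walk the records in order and return a bitmask of the CPUs whose most
 * recent event is IDLE_ENTER, i.e. CPUs that went idle and never came back.
 */
static uint32_t cpus_stuck_idle(const struct trace_rec *t, size_t n)
{
    uint32_t asleep = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        assert(t[i].cpu < NR_CPUS);
        if ( t[i].ev == IDLE_ENTER )
            asleep |= 1u << t[i].cpu;
        else
            asleep &= ~(1u << t[i].cpu);
    }

    return asleep;
}
```

Feeding it a log in which CPU0 and CPU1 enter and leave idle while CPU2 and CPU5 only ever enter yields a mask with bits 2 and 5 set, which is the pattern described in the message above.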
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 13:49 ` Julien Grall @ 2017-01-24 14:16 ` Dario Faggioli 2017-01-24 15:06 ` Julien Grall 0 siblings, 1 reply; 33+ messages in thread From: Dario Faggioli @ 2017-01-24 14:16 UTC (permalink / raw) To: Julien Grall, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel [-- Attachment #1.1: Type: text/plain, Size: 1305 bytes --] On Tue, 2017-01-24 at 13:49 +0000, Julien Grall wrote: > On 24/01/17 13:40, Dario Faggioli wrote: > > Ah, wow... And how --forgive my naiveness-- do you measure / check > > that? > > I added a print in the interrupt path (gic_interrupt for ARM) to > dump > the interrupt number. This needs to be restrict to CPU2 and above to > avoid been flooded: > > if ( smp_processor_id() > 1 ) > printk("%s: CPU%u IRQ%u\n", __FUNCTION__, smp_processor_id(), > irq); > Ok. > I also added a print in the idle loop before and after the idling > instruction (wfi for ARM, pm_idle for x86 I think). You can see the > CPU > to go in idle mode but never coming back. > I see. Yes, this is very different on x86. There, we have tracing (BTW, did that made it to ARM eventually?) and there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of your printk-s. And if I look at it, I do see even totally idle (from the scheduler point of view) pCPUs, I indeed see them going back and forth from and to C3. 
Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 14:16 ` Dario Faggioli @ 2017-01-24 15:06 ` Julien Grall 2017-01-25 11:10 ` Dario Faggioli 0 siblings, 1 reply; 33+ messages in thread From: Julien Grall @ 2017-01-24 15:06 UTC (permalink / raw) To: Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel On 24/01/17 14:16, Dario Faggioli wrote: > On Tue, 2017-01-24 at 13:49 +0000, Julien Grall wrote: >> On 24/01/17 13:40, Dario Faggioli wrote: >>> Ah, wow... And how --forgive my naiveness-- do you measure / check >>> that? >> >> I added a print in the interrupt path (gic_interrupt for ARM) to >> dump >> the interrupt number. This needs to be restrict to CPU2 and above to >> avoid been flooded: >> >> if ( smp_processor_id() > 1 ) >> printk("%s: CPU%u IRQ%u\n", __FUNCTION__, smp_processor_id(), >> irq); >> > Ok. > >> I also added a print in the idle loop before and after the idling >> instruction (wfi for ARM, pm_idle for x86 I think). You can see the >> CPU >> to go in idle mode but never coming back. >> > I see. Yes, this is very different on x86. > > There, we have tracing (BTW, did that made it to ARM eventually?) and > there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of your > printk-s. There is a patch on the ML for xentrace support (see [1]) but nothing has been upstreamed yet. Waiting for a new version from the contributor. > > And if I look at it, I do see even totally idle (from the scheduler > point of view) pCPUs, I indeed see them going back and forth from and > to C3. My knowledge of x86 is limited. When does a CPU decide to leave idle mode? In the case of ARM, the wfi instruction will put the CPU in idle mode until an interrupt is received.
Cheers, [1] https://lists.xenproject.org/archives/html/xen-devel/2016-04/msg00464.html -- Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-24 15:06 ` Julien Grall @ 2017-01-25 11:10 ` Dario Faggioli 2017-01-25 12:38 ` Julien Grall 0 siblings, 1 reply; 33+ messages in thread From: Dario Faggioli @ 2017-01-25 11:10 UTC (permalink / raw) To: Julien Grall, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel [-- Attachment #1.1: Type: text/plain, Size: 2492 bytes --] On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote: > On 24/01/17 14:16, Dario Faggioli wrote: > > There, we have tracing (BTW, did that made it to ARM eventually?) > > and > > there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of > > your > > printk-s. > > There is patch on the ML for xentrace support (see [1]) but nothing > has > been upstreamed yet. Waiting for a new version from the contributor. > Yep, that was I was remembering, and referring to. Thanks for the update. > > And if I look at it, I do see even totally idle (from the scheduler > > point of view) pCPUs, I indeed see them going back and forth from > > and > > to C3. > > My knowledge on x86 is limited. When does a CPU decides to leave the > idle mode? > I'm not an expert of that part either. Jan and Andrew for sure know best how monitor/mwait works (both in general, and our own implementation). What I know (and can quickly infer from glancing at the code), is that timers are certainly involved. In fact, we wake up when the most imminent timer would expire (see mwait_idle_with_hints()), and a timer set by the scheduler fully qualifies as being the one (if it's the most imminent). My point was that, still from scheduling perspective, neither Credit1 nor Credit2 sets a wakeup timer for idle pCPUs. Well, in Credit1, the master_ticker timer is never stopped (while, e.g., the per-pCPU tick is stopped before entering deep sleep, via sched_tick_suspend(), see commit 964fae8ac), but that's only 1 pCPU. 
> In the case of ARM, the wfi instruction will put the CPU in idle > mode > until an interrupt is received. > Just looking up references for MWAIT, I've found this: (http://x86.renejeschke.de/html/file_module_x86_id_215.html) "A store to the address range armed by the MONITOR instruction, an interrupt, an NMI or SMI, a debug exception, a machine check exception, the BINIT# signal, the INIT# signal, or the RESET# signal will exit the implementation-dependent-optimized state. Note that an interrupt will cause the processor to exit only if the state was entered with interrupts enabled." So, yeah, an interrupt, as expected, wakes x86 up. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-25 11:10 ` Dario Faggioli @ 2017-01-25 12:38 ` Julien Grall 2017-01-25 12:40 ` Andrew Cooper 2017-01-25 16:00 ` Dario Faggioli 0 siblings, 2 replies; 33+ messages in thread From: Julien Grall @ 2017-01-25 12:38 UTC (permalink / raw) To: Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel Hi Dario, On 25/01/17 11:10, Dario Faggioli wrote: > On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote: >> On 24/01/17 14:16, Dario Faggioli wrote: >>> There, we have tracing (BTW, did that made it to ARM eventually?) >>> and >>> there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of >>> your >>> printk-s. >> >> There is patch on the ML for xentrace support (see [1]) but nothing >> has >> been upstreamed yet. Waiting for a new version from the contributor. >> > Yep, that was I was remembering, and referring to. Thanks for the > update. > >>> And if I look at it, I do see even totally idle (from the scheduler >>> point of view) pCPUs, I indeed see them going back and forth from >>> and >>> to C3. >> >> My knowledge on x86 is limited. When does a CPU decides to leave the >> idle mode? >> > I'm not an expert of that part either. Jan and Andrew for sure know > best how monitor/mwait works (both in general, and our own > implementation). > > What I know (and can quickly infer from glancing at the code), is that > timers are certainly involved. > > In fact, we wake up when the most imminent timer would expire (see > mwait_idle_with_hints()), and a timer set by the scheduler fully > qualifies as being the one (if it's the most imminent). > > My point was that, still from scheduling perspective, neither Credit1 > nor Credit2 sets a wakeup timer for idle pCPUs. 
> > Well, in Credit1, the master_ticker timer is never stopped (while, > e.g., the per-pCPU tick is stopped before entering deep sleep, > via sched_tick_suspend(), see commit 964fae8ac), but that's only 1 > pCPU. The function sched_tick_suspend is never called on ARM. Power saving in Xen on ARM is still very limited, and this will need to be updated in the future. So I guess that's why I still see interrupts arriving on the idle pCPU when credit1 is used. Looking at credit2, the callback tick_suspend is not called. Does it mean there is no per-pCPU timer? Now, from my understanding, if we decide to call sched_tick_suspend on ARM before idling, we will likely have the same problem with credit1, because there will no longer be any interrupt to wake up the pCPU. But I don't think this is an issue in the scheduler. IMHO, the problem is in the RCU. Indeed, a CPU in low-power mode (i.e. wfi on ARM or pm_idle on x86 is being executed) will never get out to tell the RCU: "I am quiet, go ahead". So the RCU will never be able to reclaim the memory, which will result in memory exhaustion if the pCPU never receives an interrupt (this could happen if the pCPU has never run a guest). The question now is how to fix it. Cheers, -- Julien Grall
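The stall described in the message above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not Xen's RCU implementation: `struct rcu_model`, `model_call_rcu()` and `model_cpu_wakes()` are hypothetical names, the "callback" is reduced to a flag standing in for complete_domain_destroy, and a CPU is assumed to report a quiescent state only when an interrupt takes it out of its idle instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 6   /* matches the 6-core platform in the report */

/* Hypothetical model of one grace period; not Xen's actual data structure. */
struct rcu_model {
    uint32_t pending;       /* CPUs that still owe a quiescent-state report */
    bool     callback_ran;  /* stands in for complete_domain_destroy */
};

/* Queue a callback: every CPU must report before it is allowed to run. */
static void model_call_rcu(struct rcu_model *m)
{
    m->pending = (1u << NR_CPUS) - 1;
    m->callback_ran = false;
}

/*
 * A CPU only gets here when an interrupt wakes it up and it checks for
 * RCU work. A CPU that sleeps in wfi forever never calls this.
 */
static void model_cpu_wakes(struct rcu_model *m, unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    m->pending &= ~(1u << cpu);
    if ( m->pending == 0 )
        m->callback_ran = true;
}
```

If five of the six CPUs wake up and report but the sixth stays asleep, `pending` keeps one bit set and the callback never runs, however long the model is left alone. That is the unbounded delay the thread is about.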
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-25 12:38 ` Julien Grall @ 2017-01-25 12:40 ` Andrew Cooper 2017-01-25 14:23 ` Julien Grall 2017-01-25 16:00 ` Dario Faggioli 1 sibling, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2017-01-25 12:40 UTC (permalink / raw) To: Julien Grall, Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan, Ian Jackson, xen-devel On 25/01/17 12:38, Julien Grall wrote: > Hi Dario, > > On 25/01/17 11:10, Dario Faggioli wrote: >> On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote: >>> On 24/01/17 14:16, Dario Faggioli wrote: >>>> There, we have tracing (BTW, did that made it to ARM eventually?) >>>> and >>>> there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of >>>> your >>>> printk-s. >>> >>> There is patch on the ML for xentrace support (see [1]) but nothing >>> has >>> been upstreamed yet. Waiting for a new version from the contributor. >>> >> Yep, that was I was remembering, and referring to. Thanks for the >> update. >> >>>> And if I look at it, I do see even totally idle (from the scheduler >>>> point of view) pCPUs, I indeed see them going back and forth from >>>> and >>>> to C3. >>> >>> My knowledge on x86 is limited. When does a CPU decides to leave the >>> idle mode? >>> >> I'm not an expert of that part either. Jan and Andrew for sure know >> best how monitor/mwait works (both in general, and our own >> implementation). >> >> What I know (and can quickly infer from glancing at the code), is that >> timers are certainly involved. >> >> In fact, we wake up when the most imminent timer would expire (see >> mwait_idle_with_hints()), and a timer set by the scheduler fully >> qualifies as being the one (if it's the most imminent). >> >> My point was that, still from scheduling perspective, neither Credit1 >> nor Credit2 sets a wakeup timer for idle pCPUs. 
>> >> Well, in Credit1, the master_ticker timer is never stopped (while, >> e.g., the per-pCPU tick is stopped before entering deep sleep, >> via sched_tick_suspend(), see commit 964fae8ac), but that's only 1 >> pCPU. > > The function sched_tick_suspend is never called on ARM. The power > saving in Xen ARM is still very limited and this would need to be > updated in the future. > > So I guess that's why I still see interrupt coming on the idle pCPU > when credit1 is used. Looking at credit2, the callback tick_suspend is > not called. Does it mean there is no per-pCPU timer? > > Now, from my understanding, if we decide to call sched_tick_suspend on > ARM before idling. We will likely have the same problem with credit1 > because there is no more interrupt to wake-up the pCPU. > > But I don't think this is an issue in the scheduler. IHMO, the problem > is in the RCU. Indeed a CPU in lower power mode (i.e wfi on ARM or > pm_idle on x86 is been executed) will never get out to tell to the RCU > : "I am quiet, go ahead". So the RCU will never be able to reclaim the > memory and will result on a memory exhaustion if the pCPU never > receive an interrupt (this could happen if pCPU has never ran a guest). Yes. This is a core problem, not ARM specific. x86 is saved by the time calibration rendezvous which IPIs all cores every 1s. > > The question now, is how to fix it? This is going to involve a better understanding of how RCU is supposed to work. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-25 12:40 ` Andrew Cooper @ 2017-01-25 14:23 ` Julien Grall 0 siblings, 0 replies; 33+ messages in thread From: Julien Grall @ 2017-01-25 14:23 UTC (permalink / raw) To: Andrew Cooper, Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan, Ian Jackson, xen-devel On 25/01/17 12:40, Andrew Cooper wrote: > On 25/01/17 12:38, Julien Grall wrote: >> Hi Dario, >> >> On 25/01/17 11:10, Dario Faggioli wrote: >>> On Tue, 2017-01-24 at 15:06 +0000, Julien Grall wrote: >>>> On 24/01/17 14:16, Dario Faggioli wrote: >>>>> There, we have tracing (BTW, did that made it to ARM eventually?) >>>>> and >>>>> there's TRC_PM_IDLE_ENTRY/EXIT which do pretty much the same of >>>>> your >>>>> printk-s. >>>> >>>> There is patch on the ML for xentrace support (see [1]) but nothing >>>> has >>>> been upstreamed yet. Waiting for a new version from the contributor. >>>> >>> Yep, that was I was remembering, and referring to. Thanks for the >>> update. >>> >>>>> And if I look at it, I do see even totally idle (from the scheduler >>>>> point of view) pCPUs, I indeed see them going back and forth from >>>>> and >>>>> to C3. >>>> >>>> My knowledge on x86 is limited. When does a CPU decides to leave the >>>> idle mode? >>>> >>> I'm not an expert of that part either. Jan and Andrew for sure know >>> best how monitor/mwait works (both in general, and our own >>> implementation). >>> >>> What I know (and can quickly infer from glancing at the code), is that >>> timers are certainly involved. >>> >>> In fact, we wake up when the most imminent timer would expire (see >>> mwait_idle_with_hints()), and a timer set by the scheduler fully >>> qualifies as being the one (if it's the most imminent). >>> >>> My point was that, still from scheduling perspective, neither Credit1 >>> nor Credit2 sets a wakeup timer for idle pCPUs. 
>>> >>> Well, in Credit1, the master_ticker timer is never stopped (while, >>> e.g., the per-pCPU tick is stopped before entering deep sleep, >>> via sched_tick_suspend(), see commit 964fae8ac), but that's only 1 >>> pCPU. >> >> The function sched_tick_suspend is never called on ARM. The power >> saving in Xen ARM is still very limited and this would need to be >> updated in the future. >> >> So I guess that's why I still see interrupt coming on the idle pCPU >> when credit1 is used. Looking at credit2, the callback tick_suspend is >> not called. Does it mean there is no per-pCPU timer? >> >> Now, from my understanding, if we decide to call sched_tick_suspend on >> ARM before idling. We will likely have the same problem with credit1 >> because there is no more interrupt to wake-up the pCPU. >> >> But I don't think this is an issue in the scheduler. IHMO, the problem >> is in the RCU. Indeed a CPU in lower power mode (i.e wfi on ARM or >> pm_idle on x86 is been executed) will never get out to tell to the RCU >> : "I am quiet, go ahead". So the RCU will never be able to reclaim the >> memory and will result on a memory exhaustion if the pCPU never >> receive an interrupt (this could happen if pCPU has never ran a guest). > > Yes. This is a core problem, not ARM specific. > > x86 is saved by the time calibration rendezvous which IPIs all cores > every 1s. > >> >> The question now, is how to fix it? > > > This is going to involve a better understanding of how RCU is supposed > to work. I think we all agree that someone needs to kick the other pCPUs to check whether the RCU is being used. Looking at the documentation of our RCU code ([1]), section "RCU Implementations", it seems that we are expecting a timer to periodically kick each pCPU and check if there is some RCU work pending.
[1] http://lse.sourceforge.net/locking/rcupdate.html > > ~Andrew > -- Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-25 12:38 ` Julien Grall 2017-01-25 12:40 ` Andrew Cooper @ 2017-01-25 16:00 ` Dario Faggioli 2017-01-31 16:30 ` Julien Grall 1 sibling, 1 reply; 33+ messages in thread From: Dario Faggioli @ 2017-01-25 16:00 UTC (permalink / raw) To: Julien Grall, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel [-- Attachment #1.1: Type: text/plain, Size: 2679 bytes --] On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote: > Hi Dario, > Hey, > On 25/01/17 11:10, Dario Faggioli wrote: > > My point was that, still from scheduling perspective, neither > > Credit1 > > nor Credit2 sets a wakeup timer for idle pCPUs. > > > > Well, in Credit1, the master_ticker timer is never stopped (while, > > e.g., the per-pCPU tick is stopped before entering deep sleep, > > via sched_tick_suspend(), see commit 964fae8ac), but that's only 1 > > pCPU. > > The function sched_tick_suspend is never called on ARM. The power > saving > in Xen ARM is still very limited and this would need to be updated > in > the future. > > So I guess that's why I still see interrupt coming on the idle pCPU > when > credit1 is used. > Yes. If you don't suspend the tick before going to wfi/hlt/whatever, there will be a timer firing --and AFAICT waking you up from the low power state-- every 10ms (with default Credit1 timeslice), even for idle pCPUs. > Looking at credit2, the callback tick_suspend is not > called. Does it mean there is no per-pCPU timer? > Exactly, we (happily) don't need that in Credit2. :-) > Now, from my understanding, if we decide to call sched_tick_suspend > on > ARM before idling. We will likely have the same problem with credit1 > because there is no more interrupt to wake-up the pCPU. > Basing on what you've said so far in this thread, I tend to think that, yes, that would be the case. > But I don't think this is an issue in the scheduler. > Agreed. 
> IHMO, the problem > is in the RCU. Indeed a CPU in lower power mode (i.e wfi on ARM or > pm_idle on x86 is been executed) will never get out to tell to the > RCU : > "I am quiet, go ahead". So the RCU will never be able to reclaim the > memory and will result on a memory exhaustion if the pCPU never > receive > an interrupt (this could happen if pCPU has never ran a guest). > > The question now, is how to fix it? > And a good one. I may be wrong (I certainly wasn't around at the time), but ISTR our RCU code is imported from/inspired by Linux... Looking there again may help, but, nowadays, the Linux RCU subsystem is a Lernaean Hydra monster, with 100 heads and sharp claws! :-O And, while, in there, it has to be like that, I don't think we need all that complexity, and hence we can't just re-sync. :-/ Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-25 16:00 ` Dario Faggioli @ 2017-01-31 16:30 ` Julien Grall 2017-01-31 22:10 ` Stefano Stabellini 2017-02-01 18:21 ` Wei Liu 0 siblings, 2 replies; 33+ messages in thread From: Julien Grall @ 2017-01-31 16:30 UTC (permalink / raw) To: Dario Faggioli, Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, xen-devel Hi Dario, On 25/01/17 16:00, Dario Faggioli wrote: > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote: >> On 25/01/17 11:10, Dario Faggioli wrote: > And a good one. I may be wrong (I certainly wasn't around at the time), > but ISTR out RCU code is imported/inspired by Linux... Looking there > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra > monster, with 100 heads and sharpen claws! :-O > > And, while, in there, it has to be like that, I don't think we need all > such complexity, and hence we can't just re-sync. :-/ Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU code and noticed there is a link in the header to [1]. It seems to be documentation for the RCU code we use. From my understanding of the "RCU Implementations" section, the authors expect a timer to periodically kick each pCPU and check if there is some RCU work pending. We could add this timer, but it would prevent an idle pCPU from staying in low-power mode for a long time. Another solution would be to send an interrupt to each pCPU when call_rcu is called, rather than depending on a mark. However, this would still wake up the pCPU even if it was doing nothing. Any better ideas? Cheers, [1] http://lse.sourceforge.net/locking/rcupdate.html -- Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-31 16:30 ` Julien Grall @ 2017-01-31 22:10 ` Stefano Stabellini 2017-02-01 18:21 ` Wei Liu 1 sibling, 0 replies; 33+ messages in thread From: Stefano Stabellini @ 2017-01-31 22:10 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, Jan Beulich, xen-devel On Tue, 31 Jan 2017, Julien Grall wrote: > Hi Dario, > > On 25/01/17 16:00, Dario Faggioli wrote: > > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote: > > > On 25/01/17 11:10, Dario Faggioli wrote: > > And a good one. I may be wrong (I certainly wasn't around at the time), > > but ISTR out RCU code is imported/inspired by Linux... Looking there > > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra > > monster, with 100 heads and sharpen claws! :-O > > > > And, while, in there, it has to be like that, I don't think we need all > > such complexity, and hence we can't just re-sync. :-/ > > Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU code > and noticed there is a link in the header to [1]. > > It seems to be a documentation about the RCU code we used. From my > understanding of the "RCU Implementations", the authors are expecting a timer > to kick periodically pCPU and check if there is some RCU work pending. > > We could add this timer but it would prevent an idle pCPU to stay in low power > mode for a long time. Another solution would be to send an interrupt to each > pCPU when call_rcu is called rather depending on a mark. Although this would > still wake-up the pCPU even it was doing nothing. > > Any better ideas? Julien, thanks for looking into this. Instead of the RCU, could we send an interrupt to all pCPU *not* in idle mode? We could have a shared bitmask in memory with all pCPUs currently sleeping. 
> Cheers, > > [1] http://lse.sourceforge.net/locking/rcupdate.html > > -- > Julien Grall
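Stefano's shared-bitmask suggestion can be made concrete with a small sketch. This is only an illustration under stated assumptions: `idle_mask`, `cpu_enter_idle()`, `cpu_exit_idle()` and `cpus_to_kick()` are hypothetical names, not existing Xen functions, and `NR_CPUS` is set to 6 only to mirror the test board described at the start of the thread. Each pCPU sets its bit before sleeping and clears it on wake-up; the code that would send IPIs only targets online pCPUs whose bit is clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 6u  /* assumption: the 6-core board from the original report */

/* Hypothetical shared state: bit n set means pCPU n is in its idle loop. */
static _Atomic uint32_t idle_mask;

/* Called by a pCPU just before it goes to sleep (e.g. before wfi). */
static void cpu_enter_idle(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    atomic_fetch_or(&idle_mask, UINT32_C(1) << cpu);
}

/* Called by a pCPU as soon as it leaves the idle loop. */
static void cpu_exit_idle(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    atomic_fetch_and(&idle_mask, ~(UINT32_C(1) << cpu));
}

/* Online pCPUs that are not sleeping, i.e. the only ones worth an IPI. */
static uint32_t cpus_to_kick(uint32_t online_mask)
{
    return online_mask & ~atomic_load(&idle_mask);
}
```

With pCPUs 2 and 5 idle and all six online (mask 0x3f), `cpus_to_kick(0x3f)` yields 0x1b, so the sleeping pCPUs are left alone. The sketch ignores the race between a pCPU setting its bit and actually going to sleep, which a real implementation would have to close.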
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-01-31 16:30 ` Julien Grall 2017-01-31 22:10 ` Stefano Stabellini @ 2017-02-01 18:21 ` Wei Liu 2017-02-02 11:22 ` Jan Beulich 1 sibling, 1 reply; 33+ messages in thread From: Wei Liu @ 2017-02-01 18:21 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli, Ian Jackson, Tim Deegan, Jan Beulich, xen-devel On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote: > Hi Dario, > > On 25/01/17 16:00, Dario Faggioli wrote: > > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote: > > > On 25/01/17 11:10, Dario Faggioli wrote: > > And a good one. I may be wrong (I certainly wasn't around at the time), > > but ISTR out RCU code is imported/inspired by Linux... Looking there > > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra > > monster, with 100 heads and sharpen claws! :-O > > > > And, while, in there, it has to be like that, I don't think we need all > > such complexity, and hence we can't just re-sync. :-/ > > Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU > code and noticed there is a link in the header to [1]. > > It seems to be a documentation about the RCU code we used. From my > understanding of the "RCU Implementations", the authors are expecting a > timer to kick periodically pCPU and check if there is some RCU work pending. > > We could add this timer but it would prevent an idle pCPU to stay in low > power mode for a long time. Another solution would be to send an interrupt > to each pCPU when call_rcu is called rather depending on a mark. Although > this would still wake-up the pCPU even it was doing nothing. > > Any better ideas? > Worth checking all the RCU docs in Linux (Documentation/RCU). I think there are descriptions about idle or no-tick variants. It would be useful to know how Linux handles this. I suspect RCU in Linux is more capable than the one in Xen... Wei. 
> Cheers, > > [1] http://lse.sourceforge.net/locking/rcupdate.html > > -- > Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-02-01 18:21 ` Wei Liu @ 2017-02-02 11:22 ` Jan Beulich 2017-02-02 11:53 ` Wei Liu 2017-02-02 12:01 ` Dario Faggioli 0 siblings, 2 replies; 33+ messages in thread From: Jan Beulich @ 2017-02-02 11:22 UTC (permalink / raw) To: Julien Grall, Wei Liu Cc: Stefano Stabellini, George Dunlap, AndrewCooper, Dario Faggioli, Ian Jackson, Tim Deegan, xen-devel >>> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote: > On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote: >> Hi Dario, >> >> On 25/01/17 16:00, Dario Faggioli wrote: >> > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote: >> > > On 25/01/17 11:10, Dario Faggioli wrote: >> > And a good one. I may be wrong (I certainly wasn't around at the time), >> > but ISTR out RCU code is imported/inspired by Linux... Looking there >> > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra >> > monster, with 100 heads and sharpen claws! :-O >> > >> > And, while, in there, it has to be like that, I don't think we need all >> > such complexity, and hence we can't just re-sync. :-/ >> >> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU >> code and noticed there is a link in the header to [1]. >> >> It seems to be a documentation about the RCU code we used. From my >> understanding of the "RCU Implementations", the authors are expecting a >> timer to kick periodically pCPU and check if there is some RCU work pending. >> >> We could add this timer but it would prevent an idle pCPU to stay in low >> power mode for a long time. Another solution would be to send an interrupt >> to each pCPU when call_rcu is called rather depending on a mark. Although >> this would still wake-up the pCPU even it was doing nothing. >> >> Any better ideas? > > Worth checking all the RCU docs in Linux (Documentation/RCU). > > I think there are descriptions about idle or no-tick variants. It would > be useful to know how Linux handles this. 
I suspect RCU in Linux is more > capable than the one in Xen... Isn't all we need an rcu_idle_{enter,exit}() implementation (and of course calls to them placed where needed)? Jan
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-02-02 11:22 ` Jan Beulich @ 2017-02-02 11:53 ` Wei Liu 2017-02-02 12:18 ` Julien Grall 2017-02-02 12:01 ` Dario Faggioli 1 sibling, 1 reply; 33+ messages in thread From: Wei Liu @ 2017-02-02 11:53 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Wei Liu, George Dunlap, AndrewCooper, Dario Faggioli, Ian Jackson, Tim Deegan, Julien Grall, xen-devel On Thu, Feb 02, 2017 at 04:22:53AM -0700, Jan Beulich wrote: > >>> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote: > > On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote: > >> Hi Dario, > >> > >> On 25/01/17 16:00, Dario Faggioli wrote: > >> > On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote: > >> > > On 25/01/17 11:10, Dario Faggioli wrote: > >> > And a good one. I may be wrong (I certainly wasn't around at the time), > >> > but ISTR out RCU code is imported/inspired by Linux... Looking there > >> > again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra > >> > monster, with 100 heads and sharpen claws! :-O > >> > > >> > And, while, in there, it has to be like that, I don't think we need all > >> > such complexity, and hence we can't just re-sync. :-/ > >> > >> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU > >> code and noticed there is a link in the header to [1]. > >> > >> It seems to be a documentation about the RCU code we used. From my > >> understanding of the "RCU Implementations", the authors are expecting a > >> timer to kick periodically pCPU and check if there is some RCU work pending. > >> > >> We could add this timer but it would prevent an idle pCPU to stay in low > >> power mode for a long time. Another solution would be to send an interrupt > >> to each pCPU when call_rcu is called rather depending on a mark. Although > >> this would still wake-up the pCPU even it was doing nothing. > >> > >> Any better ideas? 
> > > > Worth checking all the RCU docs in Linux (Documentation/RCU). > > > > I think there are descriptions about idle or no-tick variants. It would > > be useful to know how Linux handles this. I suspect RCU in Linux is more > > capable than the one in Xen... > > Isn't all we need an rcu_idle_{enter,exit}() implementation (and of > course calls to them placed where needed)? > I'm no RCU expert, but having checked Linux source code and the documentation of rcu_idle_{enter,exit}, what you said makes sense to me. Wei. > Jan
* Re: xen/arm: Domain not fully destroyed when using credit2 2017-02-02 11:53 ` Wei Liu @ 2017-02-02 12:18 ` Julien Grall 2017-02-02 12:51 ` Dario Faggioli 0 siblings, 1 reply; 33+ messages in thread From: Julien Grall @ 2017-02-02 12:18 UTC (permalink / raw) To: Wei Liu, Jan Beulich, Dario Faggioli Cc: Stefano Stabellini, George Dunlap, AndrewCooper, Tim Deegan, xen-devel, Ian Jackson Hi, On 02/02/17 11:53, Wei Liu wrote: > On Thu, Feb 02, 2017 at 04:22:53AM -0700, Jan Beulich wrote: >>>>> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote: >>> On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote: >>>> Hi Dario, >>>> >>>> On 25/01/17 16:00, Dario Faggioli wrote: >>>>> On Wed, 2017-01-25 at 12:38 +0000, Julien Grall wrote: >>>>>> On 25/01/17 11:10, Dario Faggioli wrote: >>>>> And a good one. I may be wrong (I certainly wasn't around at the time), >>>>> but ISTR out RCU code is imported/inspired by Linux... Looking there >>>>> again may help, but, nowadays, Linux RCU subsystem is a Lernaean Hydra >>>>> monster, with 100 heads and sharpen claws! :-O >>>>> >>>>> And, while, in there, it has to be like that, I don't think we need all >>>>> such complexity, and hence we can't just re-sync. :-/ >>>> >>>> Yeah, even the tiny RCU code is quite complex :/. I've looked at our RCU >>>> code and noticed there is a link in the header to [1]. >>>> >>>> It seems to be a documentation about the RCU code we used. From my >>>> understanding of the "RCU Implementations", the authors are expecting a >>>> timer to kick periodically pCPU and check if there is some RCU work pending. >>>> >>>> We could add this timer but it would prevent an idle pCPU to stay in low >>>> power mode for a long time. Another solution would be to send an interrupt >>>> to each pCPU when call_rcu is called rather depending on a mark. Although >>>> this would still wake-up the pCPU even it was doing nothing. >>>> >>>> Any better ideas? >>> >>> Worth checking all the RCU docs in Linux (Documentation/RCU). 
>>> >>> I think there are descriptions about idle or no-tick variants. It would >>> be useful to know how Linux handles this. I suspect RCU in Linux is more >>> capable than the one in Xen... >> >> Isn't all we need an rcu_idle_{enter,exit}() implementation (and of >> course calls to them placed where needed)? >> > > I'm no RCU expert, but having checked Linux source code and the > documentation of rcu_idle_{enter,exit}, what you said makes sense to me. And the doc seems to confirm that (see Documentation/RCU/rcu.txt): "Just as with spinlocks, RCU readers are not permitted to block, switch to user-mode execution, or enter the idle loop. Therefore, as soon as a CPU is seen passing through any of these three states, we know that that CPU has exited any previous RCU read-side critical sections. So, if we remove an item from a linked list, and then wait until all CPUs have switched context, executed in user mode, or executed in the idle loop, we can safely free up that item." Dario, are you going to look into the issue? Or shall I try to write a patch for it? Cheers, -- Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 12:18 ` Julien Grall
@ 2017-02-02 12:51 ` Dario Faggioli
  2017-02-02 13:26 ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-02-02 12:51 UTC (permalink / raw)
To: Julien Grall, Wei Liu, Jan Beulich
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Ian Jackson

On Thu, 2017-02-02 at 12:18 +0000, Julien Grall wrote:
> Dario, are you going to look into the issue? Or shall I try to write
> a patch for it?
>
I'd be up for looking into this. BUT, I'm travelling this weekend, and
am probably going to be busy next week (sorry).

So, I expect to be able to do something useful only, let's say, from
Mon 13th. If that's ok, do sign me up. If you're more in a hurry, feel
free to beat me. :-)

Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 12:51 ` Dario Faggioli
@ 2017-02-02 13:26 ` Julien Grall
  2017-02-02 13:32 ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-02-02 13:26 UTC (permalink / raw)
To: Dario Faggioli, Wei Liu, Jan Beulich
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Ian Jackson

Hi Dario,

On 02/02/17 12:51, Dario Faggioli wrote:
> On Thu, 2017-02-02 at 12:18 +0000, Julien Grall wrote:
>> Dario, are you going to look into the issue? Or shall I try to write
>> a patch for it?
>>
> I'd be up for looking into this. BUT, I'm travelling this weekend, and
> am probably going to be busy next week (sorry).
>
> So, I expect to be able to do something useful only, let's say, from
> Mon 13th. If that's ok, do sign me up. If you're more in a hurry, feel
> free to beat me. :-)

I have plenty of other things to do, and will happily let you handle
this. It is not urgent, though it would be good to have it fixed for
Xen 4.9 :).

Cheers,

--
Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 13:26 ` Julien Grall
@ 2017-02-02 13:32 ` Dario Faggioli
  2017-03-28 18:30 ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Dario Faggioli @ 2017-02-02 13:32 UTC (permalink / raw)
To: Julien Grall, Wei Liu, Jan Beulich
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Ian Jackson

On Thu, 2017-02-02 at 13:26 +0000, Julien Grall wrote:
> On 02/02/17 12:51, Dario Faggioli wrote:
> > So, I expect to be able to do something useful only, let's say,
> > from Mon 13th. If that's ok, do sign me up. If you're more in a
> > hurry, feel free to beat me. :-)
>
> I have plenty of other things to do, and will happily let you handle
> this. It is not urgent, though it would be good to have it fixed for
> Xen 4.9 :).
>
Ok, sign me up for it then. We absolutely want it for 4.9, I agree.

Track it in your RM emails, with my name on it, if you want. I'll cry
if I need help. :-)

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 13:32 ` Dario Faggioli
@ 2017-03-28 18:30 ` Julien Grall
  2017-03-30 7:38 ` Dario Faggioli
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2017-03-28 18:30 UTC (permalink / raw)
To: Dario Faggioli, Wei Liu, Jan Beulich
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Ian Jackson

Hi Dario,

On 02/02/17 13:32, Dario Faggioli wrote:
> On Thu, 2017-02-02 at 13:26 +0000, Julien Grall wrote:
>> On 02/02/17 12:51, Dario Faggioli wrote:
>>> So, I expect to be able to do something useful only, let's say,
>>> from Mon 13th. If that's ok, do sign me up. If you're more in a
>>> hurry, feel free to beat me. :-)
>>
>> I have plenty of other things to do, and will happily let you handle
>> this. It is not urgent, though it would be good to have it fixed for
>> Xen 4.9 :).
>>
> Ok, sign me up for it then. We absolutely want it for 4.9, I agree.

Do you have any update on this? This would allow us to use credit2 on
ARM when physical processors are idle.

Cheers,

--
Julien Grall
* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-03-28 18:30 ` Julien Grall
@ 2017-03-30 7:38 ` Dario Faggioli
  0 siblings, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2017-03-30 7:38 UTC (permalink / raw)
To: Julien Grall, Wei Liu, Jan Beulich
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Ian Jackson

On Tue, 2017-03-28 at 19:30 +0100, Julien Grall wrote:
> Hi Dario,
>
Hey,

> On 02/02/17 13:32, Dario Faggioli wrote:
> > On Thu, 2017-02-02 at 13:26 +0000, Julien Grall wrote:
> > >
> > Ok, sign me up for it then. We absolutely want it for 4.9, I agree.
>
> Do you have any update on this? This would allow us to use credit2
> on ARM when physical processors are idle.
>
Yes, sorry for the delay. I've started working on this, and I have it
half done, but then had to switch to something else.

I most likely will be able to get back to it tomorrow, and finish and
send something soon.

Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: xen/arm: Domain not fully destroyed when using credit2
  2017-02-02 11:22 ` Jan Beulich
  2017-02-02 11:53 ` Wei Liu
@ 2017-02-02 12:01 ` Dario Faggioli
  1 sibling, 0 replies; 33+ messages in thread
From: Dario Faggioli @ 2017-02-02 12:01 UTC (permalink / raw)
To: Jan Beulich, Julien Grall, Wei Liu
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Ian Jackson

On Thu, 2017-02-02 at 04:22 -0700, Jan Beulich wrote:
> On 01.02.17 at 19:21, <wei.liu2@citrix.com> wrote:
> > On Tue, Jan 31, 2017 at 04:30:50PM +0000, Julien Grall wrote:
> > > Yeah, even the tiny RCU code is quite complex :/. I've looked at
> > > our RCU code and noticed there is a link in the header to [1].
> > >
> > > It seems to be a documentation about the RCU code we used. From
> > > my understanding of the "RCU Implementations", the authors are
> > > expecting a timer to kick periodically pCPU and check if there is
> > > some RCU work pending.
> > Worth checking all the RCU docs in Linux (Documentation/RCU).
> >
> > I think there are descriptions about idle or no-tick variants.
>
It surely is worth it, bearing in mind that, as said before, Linux RCU
is indeed more powerful than what we have, but also much much much much
more complex than what we probably need.

And (for Julien), perhaps it's me, but I don't think I see references
to, or hints at, using a timer in the docs you linked, nor in other RCU
doc material.

As a matter of fact, I agree with Jan, i.e.,

> Isn't all we need an rcu_idle_{enter,exit}() implementation (and of
> course calls to them placed where needed)?
>
This is what I think we're missing. And, AFAIUI, it's sort of similar
to what Stefano (I think) was saying, that a CPU going idle is a step
toward a grace period, because RCU critical sections can't occur on it.
As per what Julien said about softirqs (which also looks right to me),
this is how Linux handles the issue:

http://lxr.free-electrons.com/source/kernel/rcu/tree.c#L733

/**
 * rcu_idle_enter - inform RCU that current CPU is entering idle
 *
 * Enter idle mode, in other words, -leave- the mode in which RCU
 * read-side critical sections can occur. (Though RCU read-side
 * critical sections can occur in irq handlers in idle, a possibility
 * handled by irq_enter() and irq_exit().)
 */

So we may also need rcu_irq_enter() and rcu_irq_exit().

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
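The idea Jan and Dario converge on, that a pCPU entering idle counts as quiescent and should not hold up a grace period, can be sketched as a toy model. This is a single-threaded illustration under stated assumptions, not Xen's or Linux's implementation: `struct toy_rcu` and all `toy_*` functions are hypothetical names, plain integers stand in for cpumasks, and there is no locking. It also leaves out interrupts taken while idle, which is the case rcu_irq_enter() and rcu_irq_exit() are mentioned for above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy grace-period state; every name here is hypothetical. */
struct toy_rcu {
    uint32_t online;   /* pCPUs that exist */
    uint32_t idle;     /* pCPUs currently in the idle loop */
    uint32_t pending;  /* pCPUs the current grace period still waits for */
};

/* Start a grace period: idle pCPUs are already quiescent, so skip them. */
static void toy_start_batch(struct toy_rcu *r)
{
    r->pending = r->online & ~r->idle;
}

/* A running pCPU reports that it passed through a quiescent state. */
static void toy_quiescent(struct toy_rcu *r, unsigned int cpu)
{
    assert(cpu < 32);
    r->pending &= ~(UINT32_C(1) << cpu);
}

/* Entering idle implies quiescence, so it also clears the pending bit. */
static void toy_idle_enter(struct toy_rcu *r, unsigned int cpu)
{
    assert(cpu < 32);
    r->idle |= UINT32_C(1) << cpu;
    toy_quiescent(r, cpu);
}

/* Leaving idle: the pCPU may enter read-side critical sections again. */
static void toy_idle_exit(struct toy_rcu *r, unsigned int cpu)
{
    assert(cpu < 32);
    r->idle &= ~(UINT32_C(1) << cpu);
}

/* Callbacks such as complete_domain_destroy could run once this is true. */
static bool toy_grace_period_done(const struct toy_rcu *r)
{
    return r->pending == 0;
}
```

In the scenario from the original report (6 pCPUs, 4 of them never leaving idle), a grace period started with `idle = 0x3c` only waits for pCPUs 0 and 1, so it can complete without any interrupt reaching the sleeping pCPUs.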
end of thread, other threads:[~2017-03-30 7:38 UTC | newest]

Thread overview: 33+ messages
2017-01-23 19:42 xen/arm: Domain not fully destroyed when using credit2 Julien Grall
2017-01-24 0:16 ` Stefano Stabellini
2017-01-24 12:52 ` Julien Grall
2017-01-24 8:20 ` Jan Beulich
2017-01-24 10:50 ` Julien Grall
2017-01-24 11:02 ` Jan Beulich
2017-01-24 12:30 ` Julien Grall
2017-01-24 12:53 ` Dario Faggioli
2017-01-24 13:04 ` Julien Grall
2017-01-24 13:05 ` Julien Grall
2017-01-24 13:19 ` Dario Faggioli
2017-01-24 13:24 ` Julien Grall
2017-01-24 13:40 ` Dario Faggioli
2017-01-24 13:49 ` Julien Grall
2017-01-24 14:16 ` Dario Faggioli
2017-01-24 15:06 ` Julien Grall
2017-01-25 11:10 ` Dario Faggioli
2017-01-25 12:38 ` Julien Grall
2017-01-25 12:40 ` Andrew Cooper
2017-01-25 14:23 ` Julien Grall
2017-01-25 16:00 ` Dario Faggioli
2017-01-31 16:30 ` Julien Grall
2017-01-31 22:10 ` Stefano Stabellini
2017-02-01 18:21 ` Wei Liu
2017-02-02 11:22 ` Jan Beulich
2017-02-02 11:53 ` Wei Liu
2017-02-02 12:18 ` Julien Grall
2017-02-02 12:51 ` Dario Faggioli
2017-02-02 13:26 ` Julien Grall
2017-02-02 13:32 ` Dario Faggioli
2017-03-28 18:30 ` Julien Grall
2017-03-30 7:38 ` Dario Faggioli
2017-02-02 12:01 ` Dario Faggioli