xen-devel.lists.xenproject.org archive mirror
* xen/arm: Domain not fully destroyed when using credit2
@ 2017-01-23 19:42 Julien Grall
  2017-01-24  0:16 ` Stefano Stabellini
  2017-01-24  8:20 ` Jan Beulich
From: Julien Grall @ 2017-01-23 19:42 UTC
  To: Stefano Stabellini, Dario Faggioli, George Dunlap, Andrew Cooper,
	Jan Beulich, Konrad Rzeszutek Wilk, Wei Liu, Ian Jackson,
	Tim Deegan
  Cc: xen-devel

Hi all,

Before someone digs into the scheduler: I don't think this is an issue
in credit2 itself, but using it highlights a bug in another component
(I think RCU).

Whilst testing other patches today, I noticed that some of the
resources allocated to a guest were not released during its destruction.

The configuration of the test is:
	- ARM platform with 6 cores
	- staging Xen with credit2 enabled by default
	- DOM0 using 2 pinned vCPUs

The test creates a guest and then destroys it. After the test, some
resources are not released (or may only be released a long time
after).

Looking at the code, domain resources are released in 2 phases:
	- domain_destroy: called when there are no more references to the
domain (see put_domain)
	- complete_domain_destroy: called when the RCU is quiescent
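
To make the two phases concrete, here is a toy model in C. It is only a
sketch and not the Xen code: every toy_ name is made up for
illustration, and the RCU machinery is reduced to a single flag.

```c
#include <stdbool.h>

/* Toy model only: names echo the ones in this mail, not Xen's code. */
struct toy_domain {
    int refcnt;            /* outstanding references to the domain */
    bool destroy_queued;   /* phase 1 done: callback handed to RCU */
    bool resources_freed;  /* phase 2 done: callback has run */
};

/* Phase 2: stands in for complete_domain_destroy. */
void toy_complete_domain_destroy(struct toy_domain *d)
{
    d->resources_freed = true;
}

/* Phase 1: stands in for domain_destroy; it only queues phase 2. */
void toy_domain_destroy(struct toy_domain *d)
{
    d->destroy_queued = true;
}

/* Dropping the last reference triggers phase 1. */
void toy_put_domain(struct toy_domain *d)
{
    if (--d->refcnt == 0)
        toy_domain_destroy(d);
}

/* Stand-in for RCU running its callbacks once all CPUs are quiescent. */
void toy_rcu_run_callbacks(struct toy_domain *d)
{
    if (d->destroy_queued && !d->resources_freed)
        toy_complete_domain_destroy(d);
}
```

In this model, the resources stay allocated between the last
toy_put_domain() and the call to toy_rcu_run_callbacks(). If nothing
ever makes that second call, the domain is never fully destroyed.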

The function domain_destroy sets up the RCU callback
(complete_domain_destroy) by calling call_rcu. call_rcu adds the
callback to the RCU list and may then send an IPI (see
force_quiescent_state) if the threshold is reached. This IPI is there
to make sure all CPUs are quiescent before calling the callbacks (e.g.
complete_domain_destroy). In my case, the threshold has not been
reached and therefore no IPI is sent.
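
The threshold behaviour described above can be sketched as follows.
This is a simplified model, not the Xen implementation: TOY_QHIMARK and
the toy_ names are hypothetical, and the value 3 is chosen only to keep
the example short.

```c
#include <stdbool.h>

#define TOY_QHIMARK 3   /* hypothetical threshold, not Xen's value */

struct toy_rcu {
    long qlen;       /* callbacks currently queued */
    int ipis_sent;   /* times other CPUs were nudged */
};

/*
 * Queue one callback. Other CPUs are only nudged (the IPI) once the
 * queue grows past the threshold. Returns true if an IPI was sent.
 */
bool toy_call_rcu(struct toy_rcu *r)
{
    r->qlen++;
    if (r->qlen > TOY_QHIMARK) {
        r->ipis_sent++;   /* models force_quiescent_state */
        return true;
    }
    return false;
}
```

With a single domain destruction, only one callback is queued, so the
model never reaches the branch that sends the IPI.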

On ARM, the idle loop runs when the pCPU has no work to do. This loop
waits to receive an interrupt (see wfi) and checks whether there is
some work to do once the CPU has woken up (i.e. an interrupt was
received).

The problem I encountered is that the idle CPU never receives an
interrupt (no timer, no IPI...) and therefore never checks whether RCU
has some work to do.
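
A rough way to picture the idle pCPU is the toy model below. It is an
illustration under my own assumptions, not the ARM idle loop: the toy_
names are invented, and "waiting in wfi" is modelled as simply doing
nothing when no interrupt arrives.

```c
#include <stdbool.h>

struct toy_cpu {
    bool rcu_pending;   /* RCU work waiting for this CPU to notice */
    int rcu_checks;     /* times the idle loop looked at the RCU state */
};

/*
 * One pass of the idle loop. Without an interrupt the CPU stays asleep
 * (as in wfi) and nothing is checked. Only a woken CPU looks at RCU.
 */
void toy_idle_iteration(struct toy_cpu *c, bool interrupt_arrived)
{
    if (!interrupt_arrived)
        return;                   /* still asleep: no RCU check */

    c->rcu_checks++;
    if (c->rcu_pending)
        c->rcu_pending = false;   /* this CPU reports it is quiescent */
}
```

However many times toy_idle_iteration() is called with
interrupt_arrived set to false, rcu_pending stays set. A single
interrupt is enough to clear it, which matches what I see: the pCPU
only notices the RCU work if something happens to wake it up.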

From my understanding, this is a bug in how RCU is handled (see the
comment above rcu_start_batch): it expects each CPU (no broadcast) to
check whether there is RCU work, but this relies on something else (a
timer?) firing an interrupt.

Any incoming interrupt will make a pCPU check the RCU state. On ARM,
the biggest source of interrupts was IPIs from credit1, or the timer if
a guest vCPU was scheduled on that pCPU. But it looks like the IPI
traffic with credit2 has been reduced to none (which is a really good
thing :)), and no guest timer was scheduled because no vCPU ever ran on
this pCPU.

I think the bug has always been there (on both ARM and x86), but was
never detected because any incoming interrupt makes the pCPU check the
RCU state.

However, I am not sure how to resolve this issue. Any thoughts?

Cheers,

-- 
Julien Grall



Thread overview: 33+ messages
2017-01-23 19:42 xen/arm: Domain not fully destroyed when using credit2 Julien Grall
2017-01-24  0:16 ` Stefano Stabellini
2017-01-24 12:52   ` Julien Grall
2017-01-24  8:20 ` Jan Beulich
2017-01-24 10:50   ` Julien Grall
2017-01-24 11:02     ` Jan Beulich
2017-01-24 12:30       ` Julien Grall
2017-01-24 12:53     ` Dario Faggioli
2017-01-24 13:04       ` Julien Grall
2017-01-24 13:05         ` Julien Grall
2017-01-24 13:19         ` Dario Faggioli
2017-01-24 13:24           ` Julien Grall
2017-01-24 13:40             ` Dario Faggioli
2017-01-24 13:49               ` Julien Grall
2017-01-24 14:16                 ` Dario Faggioli
2017-01-24 15:06                   ` Julien Grall
2017-01-25 11:10                     ` Dario Faggioli
2017-01-25 12:38                       ` Julien Grall
2017-01-25 12:40                         ` Andrew Cooper
2017-01-25 14:23                           ` Julien Grall
2017-01-25 16:00                         ` Dario Faggioli
2017-01-31 16:30                           ` Julien Grall
2017-01-31 22:10                             ` Stefano Stabellini
2017-02-01 18:21                             ` Wei Liu
2017-02-02 11:22                               ` Jan Beulich
2017-02-02 11:53                                 ` Wei Liu
2017-02-02 12:18                                   ` Julien Grall
2017-02-02 12:51                                     ` Dario Faggioli
2017-02-02 13:26                                       ` Julien Grall
2017-02-02 13:32                                         ` Dario Faggioli
2017-03-28 18:30                                           ` Julien Grall
2017-03-30  7:38                                             ` Dario Faggioli
2017-02-02 12:01                                 ` Dario Faggioli
