* Multiple issues with event channel on Xen on ARM
@ 2014-02-04 23:18 Julien Grall
2014-02-05 10:09 ` David Vrabel
2014-02-05 10:45 ` David Vrabel
0 siblings, 2 replies; 8+ messages in thread
From: Julien Grall @ 2014-02-04 23:18 UTC (permalink / raw)
To: david.vrabel; +Cc: xen-devel, Ian Campbell, Stefano Stabellini
Hello David,
I'm currently trying to use Linux 3.14-rc1 as a Linux guest on Xen on ARM (Xen 4.4-rc3).
I have multiple issues with your event channel patch series, on both the Linux and Xen sides.
I tried to use Linux 3.14-rc1 as dom0 but it was worse (unable to create guests).
I'm using a simple guest config:
kernel="/root/zImage"
memory=32
name="test"
vcpus=1
autoballon="off"
extra="console=hvc0"
If everything is OK, I should see that Linux is unable to find the root filesystem.
But here, Linux is stuck.
From the Linux side, after bisecting, I found that the offending commit is:
xen/events: remove unnecessary init_evtchn_cpu_bindings()
Because the guest-side binding of an event to a VCPU (i.e., setting
the local per-cpu masks) is always explicitly done after an event
channel is bound to a port, there is no need to initialize all
possible events as bound to VCPU 0 at start of day or after a resume.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
With this patch, the function __xen_evtchn_do_upcall won't be able
to find any events (pending_bits == 0 every time).
It seems the second part of init_evtchn_cpu_bindings is necessary on ARM.
Now, if I use Linux 3.14-rc1 as a guest and try to destroy the domain,
I get the following Xen trace:
(XEN) Assertion 'slot >= 0 && slot < DOMHEAP_ENTRIES' failed, line 334, file mm.c
(XEN) Xen BUG at mm.c:334
(XEN) CPU1: Unexpected Trap: Undefined Instruction
(XEN) ----[ Xen-4.4-rc2 arm32 debug=y Tainted: C ]----
(XEN) CPU: 1
(XEN) PC: 0023f7d0 __bug+0x28/0x44
(XEN) CPSR: 2000015a MODE:Hypervisor
(XEN) R0: 002646dc R1: 00000003 R2: 3fd21d80 R3: 00000fff
(XEN) R4: 002612b4 R5: 0000014e R6: 00000c00 R7: 00000000
(XEN) R8: 4007f080 R9: 9ed7e000 R10:7e9ed6e8 R11:7ffdfd64 R12:00000004
(XEN) HYP: SP: 7ffdfd5c LR: 0023f7d0
(XEN)
(XEN) VTCR_EL2: 80002558
(XEN) VTTBR_EL2: 00010002f9ffc000
(XEN)
(XEN) SCTLR_EL2: 30cd187f
(XEN) HCR_EL2: 0000000000282835
(XEN) TTBR0_EL2: 00000000be016000
(XEN)
(XEN) ESR_EL2: 00000000
(XEN) HPFAR_EL2: 0000000000fff110
(XEN) HDFAR: a0800f00
(XEN) HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=7ffdfd5c:
(XEN) 00000001 7ffdfd74 00247d6c 40076000 10011000 7ffdfd84 0020b17c 40076000
(XEN) 4007f000 7ffdfd94 0020b1d0 40025b70 40076000 7ffdfda4 0020bd3c 4007f000
(XEN) 00000000 7ffdfdc4 0020b024 00000000 8000da84 4007f000 76f9a004 7ffdff58
(XEN) 00000005 7ffdfddc 00207f80 00000001 8000da84 4007f000 76f9a004 7ffdfeec
(XEN) 00206050 00002002 00000000 00002100 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 000be077 00000000 2e3022e0 00000000 00000001
(XEN) 00000000 00000000 238c3fd2 604c3b53 fe4aec89 e4988389 00000000 00000002
(XEN) 00000009 76ef0003 76fb7680 76ecf000 00000000 7e9ed7cc 00000001 76fb3140
(XEN) 00000001 00000005 00000000 00036718 76df3018 0003e740 00000001 7e9ed79c
(XEN) 76fb2000 76f20000 0005e770 76f23c70 76fb24c0 00000000 76c00740 7e9ed80c
(XEN) 76fa5857 00000000 00000001 00000005 00000000 7e9ed7bc 76efe04c 00000001
(XEN) 00035830 00035030 00000003 7ffdfedc 8000da84 00000ea1 00000005 7ffdff58
(XEN) 00000005 9ed7e000 7e9ed6e8 7ffdff54 0024cee0 76fb7578 0022c540 0022c334
(XEN) 00000001 002ae000 002b1ff0 002e7614 400238d8 0000000d 00000000 7ffdff3c
(XEN) 7e9ed594 9ecad680 00000005 00305000 00000005 9ed7e000 76fb7578 9ecad680
(XEN) 00000005 00305000 00000005 9ed7e000 7e9ed6e8 7ffdff58 0024f6d0 76f9a004
(XEN) 76f23c70 76fb7578 76fb3140 76fb7578 9ecad680 00000005 00305000 00000005
(XEN) 9ed7e000 7e9ed6e8 9f4172d0 00000024 ffffffff 76f141e4 8000da84 60000013
(XEN) 00000000 7e9ed6ac 80551900 80011cc0 9ed7feac 801f6e9c 8055190c 80011fc0
(XEN) 80551918 80011ea0 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) Xen call trace:
(XEN) [<0023f7d0>] __bug+0x28/0x44 (PC)
(XEN) [<0023f7d0>] __bug+0x28/0x44 (LR)
(XEN) [<00247d6c>] domain_page_map_to_mfn+0x50/0xb4
(XEN) [<0020b17c>] unmap_guest_page+0x20/0x54
(XEN) [<0020b1d0>] cleanup_control_block+0x20/0x34
(XEN) [<0020bd3c>] evtchn_fifo_destroy+0x2c/0x6c
(XEN) [<0020b024>] evtchn_destroy+0x1a8/0x1b0
(XEN) [<00207f80>] domain_kill+0x60/0x128
(XEN) [<00206050>] do_domctl+0xa7c/0x1104
(XEN) [<0024cee0>] do_trap_hypervisor+0xad8/0xd78
(XEN) [<0024f6d0>] return_from_trap+0/0x4
(XEN)
I will try to give more input tomorrow for the Xen bug.
Sincerely yours,
--
Julien Grall
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Multiple issues with event channel on Xen on ARM
2014-02-04 23:18 Multiple issues with event channel on Xen on ARM Julien Grall
@ 2014-02-05 10:09 ` David Vrabel
2014-02-05 10:33 ` Julien Grall
2014-02-05 10:45 ` David Vrabel
1 sibling, 1 reply; 8+ messages in thread
From: David Vrabel @ 2014-02-05 10:09 UTC (permalink / raw)
To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini
On 04/02/14 23:18, Julien Grall wrote:
>
> Now, if I use Linux 3.14-rc1 as a guest and try to destroy the domain,
> I get the following Xen trace:
>
> (XEN) Assertion 'slot >= 0 && slot < DOMHEAP_ENTRIES' failed, line 334, file mm.c
[...]
> (XEN) Xen call trace:
> (XEN) [<0023f7d0>] __bug+0x28/0x44 (PC)
> (XEN) [<0023f7d0>] __bug+0x28/0x44 (LR)
> (XEN) [<00247d6c>] domain_page_map_to_mfn+0x50/0xb4
> (XEN) [<0020b17c>] unmap_guest_page+0x20/0x54
> (XEN) [<0020b1d0>] cleanup_control_block+0x20/0x34
> (XEN) [<0020bd3c>] evtchn_fifo_destroy+0x2c/0x6c
> (XEN) [<0020b024>] evtchn_destroy+0x1a8/0x1b0
> (XEN) [<00207f80>] domain_kill+0x60/0x128
> (XEN) [<00206050>] do_domctl+0xa7c/0x1104
> (XEN) [<0024cee0>] do_trap_hypervisor+0xad8/0xd78
> (XEN) [<0024f6d0>] return_from_trap+0/0x4
This is because ARM's domain_page_map_to_mfn() doesn't work with pages
mapped with map_domain_page_global(), which uses vmap().
x86's implementation has:
    if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
    {
        pl1e = virt_to_xen_l1e(va);
        BUG_ON(!pl1e);
    }
    // ...
    return l1e_get_pfn(*pl1e);
So I think ARM's implementation needs something similar.
David
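[Editorial note] The range check described above can be modelled outside Xen. The following is a minimal sketch, not Xen code: the address layout, the lookup tables and the names va_to_mfn() and domheap_slot_valid() are all assumptions made for illustration. It only shows why an address from the vmap region fails a domheap slot check unless the vmap range is tested first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout for this sketch only; not Xen's real memory map. */
#define PAGE_SHIFT       12
#define VMAP_START       0x10000000UL
#define VMAP_ENTRIES     16UL
#define VMAP_END         (VMAP_START + (VMAP_ENTRIES << PAGE_SHIFT))
#define DOMHEAP_START    0x80000000UL
#define DOMHEAP_ENTRIES  1024UL

typedef unsigned long vaddr_t;
typedef unsigned long mfn_t;

/* Stand-ins for the real lookups: one MFN per mapped page in each region. */
static mfn_t vmap_mfn[VMAP_ENTRIES];
static mfn_t domheap_mfn[DOMHEAP_ENTRIES];

/* True if va falls in a domheap slot (the kind of check that failed). */
static int domheap_slot_valid(vaddr_t va)
{
    return va >= DOMHEAP_START &&
           ((va - DOMHEAP_START) >> PAGE_SHIFT) < DOMHEAP_ENTRIES;
}

/* Translate va to an MFN, testing the vmap region before the domheap one. */
static int va_to_mfn(vaddr_t va, mfn_t *mfn)
{
    assert(mfn != NULL);

    if (va >= VMAP_START && va < VMAP_END) {
        *mfn = vmap_mfn[(va - VMAP_START) >> PAGE_SHIFT];
        return 0;
    }
    if (domheap_slot_valid(va)) {
        *mfn = domheap_mfn[(va - DOMHEAP_START) >> PAGE_SHIFT];
        return 0;
    }
    return -1; /* neither region: the caller must treat this as an error */
}
```

In this model, domheap_slot_valid(VMAP_START) is false, which corresponds to the failed assertion in the trace, while va_to_mfn(VMAP_START, &mfn) succeeds because the vmap range is handled separately.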
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Multiple issues with event channel on Xen on ARM
2014-02-05 10:09 ` David Vrabel
@ 2014-02-05 10:33 ` Julien Grall
0 siblings, 0 replies; 8+ messages in thread
From: Julien Grall @ 2014-02-05 10:33 UTC (permalink / raw)
To: David Vrabel; +Cc: xen-devel, Ian Campbell, Stefano Stabellini
On 05/02/14 10:09, David Vrabel wrote:
> On 04/02/14 23:18, Julien Grall wrote:
>>
>> Now, if I use Linux 3.14-rc1 as a guest and try to destroy the domain,
>> I get the following Xen trace:
>>
>> (XEN) Assertion 'slot >= 0 && slot < DOMHEAP_ENTRIES' failed, line 334, file mm.c
> [...]
>> (XEN) Xen call trace:
>> (XEN) [<0023f7d0>] __bug+0x28/0x44 (PC)
>> (XEN) [<0023f7d0>] __bug+0x28/0x44 (LR)
>> (XEN) [<00247d6c>] domain_page_map_to_mfn+0x50/0xb4
>> (XEN) [<0020b17c>] unmap_guest_page+0x20/0x54
>> (XEN) [<0020b1d0>] cleanup_control_block+0x20/0x34
>> (XEN) [<0020bd3c>] evtchn_fifo_destroy+0x2c/0x6c
>> (XEN) [<0020b024>] evtchn_destroy+0x1a8/0x1b0
>> (XEN) [<00207f80>] domain_kill+0x60/0x128
>> (XEN) [<00206050>] do_domctl+0xa7c/0x1104
>> (XEN) [<0024cee0>] do_trap_hypervisor+0xad8/0xd78
>> (XEN) [<0024f6d0>] return_from_trap+0/0x4
>
> This is because ARM's domain_page_map_to_mfn() doesn't work with pages
> mapped with map_domain_page_global(), which uses vmap().
>
> x86's implementation has:
>
>     if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
>     {
>         pl1e = virt_to_xen_l1e(va);
>         BUG_ON(!pl1e);
>     }
>     // ...
>     return l1e_get_pfn(*pl1e);
>
> So I think ARM's implementation needs something similar.
Thanks David, I will take a look at it. For the first bug (in Linux), do
you have any input?
Regards,
--
Julien Grall
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Multiple issues with event channel on Xen on ARM
2014-02-04 23:18 Multiple issues with event channel on Xen on ARM Julien Grall
2014-02-05 10:09 ` David Vrabel
@ 2014-02-05 10:45 ` David Vrabel
2014-02-05 13:34 ` David Vrabel
1 sibling, 1 reply; 8+ messages in thread
From: David Vrabel @ 2014-02-05 10:45 UTC (permalink / raw)
To: Julien Grall; +Cc: Ian Campbell, Stefano Stabellini, david.vrabel, xen-devel
On 04/02/14 23:18, Julien Grall wrote:
> Hello David,
>
> I'm currently trying to use Linux 3.14-rc1 as a Linux guest on Xen on ARM (Xen 4.4-rc3).
>
> I have multiple issues with your event channel patch series, on both the Linux and Xen sides.
> I tried to use Linux 3.14-rc1 as dom0 but it was worse (unable to create guests).
I think there must be two issues here as both 2-level and FIFO events
are broken.
> I'm using a simple guest config:
> kernel="/root/zImage"
> memory=32
> name="test"
> vcpus=1
> autoballon="off"
> extra="console=hvc0"
>
> If everything is OK, I should see that Linux is unable to find the root filesystem.
> But here, Linux is stuck.
>
> From the Linux side, after bisecting, I found that the offending commit is:
> xen/events: remove unnecessary init_evtchn_cpu_bindings()
>
> Because the guest-side binding of an event to a VCPU (i.e., setting
> the local per-cpu masks) is always explicitly done after an event
> channel is bound to a port, there is no need to initialize all
> possible events as bound to VCPU 0 at start of day or after a resume.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
> With this patch, the function __xen_evtchn_do_upcall won't be able
> to find any events (pending_bits == 0 every time).
> It seems the second part of init_evtchn_cpu_bindings is necessary on ARM.
I think this is because binding an interdomain or allocating an unbound
event channel does not call bind_evtchn_to_cpu(evtchn, 0), which is
required to set the local VCPU masks.
I think this happened to work on x86 because, during the generic irq
setup, the irq affinity is always set, which then binds the event channel
to the right VCPU. I guess ARM's irq setup misses this step.
This shouldn't affect the FIFO-based events though since
evtchn_fifo_bind_to_cpu() is a no-op.
David
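[Editorial note] The effect of the missing per-VCPU binding can be illustrated with a small model of 2-level event selection. This is a hedged sketch, not the kernel's implementation: it assumes a single 64-bit word of events and two VCPUs, and the names bind_to_cpu() and active_events() are hypothetical. It shows that a pending, unmasked event is never seen while no VCPU mask has its bit set.

```c
#include <assert.h>
#include <stdint.h>

#define NR_EVENTS 64u /* assumption: one 64-bit word of event channels */
#define NR_CPUS   2u  /* assumption: two VCPUs */

static uint64_t evtchn_pending;            /* bits set when an event is raised */
static uint64_t evtchn_mask;               /* 1 = event is masked */
static uint64_t cpu_evtchn_mask[NR_CPUS];  /* kernel-side per-VCPU binding */

/* Move the kernel-side binding of one event to the given VCPU. */
static void bind_to_cpu(unsigned int port, unsigned int cpu)
{
    assert(port < NR_EVENTS && cpu < NR_CPUS);

    for (unsigned int i = 0; i < NR_CPUS; i++)
        cpu_evtchn_mask[i] &= ~(UINT64_C(1) << port);
    cpu_evtchn_mask[cpu] |= UINT64_C(1) << port;
}

/* Events a VCPU should handle: pending, not masked, and bound to it. */
static uint64_t active_events(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return evtchn_pending & ~evtchn_mask & cpu_evtchn_mask[cpu];
}
```

With event 5 pending and every per-VCPU mask still clear, active_events(0) is 0, which matches the "no pending bits" symptom reported at the start of the thread. After bind_to_cpu(5, 0), the same call returns bit 5.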
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Multiple issues with event channel on Xen on ARM
2014-02-05 10:45 ` David Vrabel
@ 2014-02-05 13:34 ` David Vrabel
2014-02-05 13:44 ` Julien Grall
0 siblings, 1 reply; 8+ messages in thread
From: David Vrabel @ 2014-02-05 13:34 UTC (permalink / raw)
To: David Vrabel; +Cc: Julien Grall, Ian Campbell, Stefano Stabellini, xen-devel
On 05/02/14 10:45, David Vrabel wrote:
> On 04/02/14 23:18, Julien Grall wrote:
>> Hello David,
>>
>> I'm currently trying to use Linux 3.14-rc1 as a Linux guest on Xen on ARM (Xen 4.4-rc3).
>>
>> I have multiple issues with your event channel patch series, on both the Linux and Xen sides.
>> I tried to use Linux 3.14-rc1 as dom0 but it was worse (unable to create guests).
>
> I think there must be two issues here as both 2-level and FIFO events
> are broken.
>
>> I'm using a simple guest config:
>> kernel="/root/zImage"
>> memory=32
>> name="test"
>> vcpus=1
>> autoballon="off"
>> extra="console=hvc0"
>>
>> If everything is OK, I should see that Linux is unable to find the root filesystem.
>> But here, Linux is stuck.
>>
>> From the Linux side, after bisecting, I found that the offending commit is:
>> xen/events: remove unnecessary init_evtchn_cpu_bindings()
>>
>> Because the guest-side binding of an event to a VCPU (i.e., setting
>> the local per-cpu masks) is always explicitly done after an event
>> channel is bound to a port, there is no need to initialize all
>> possible events as bound to VCPU 0 at start of day or after a resume.
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>
>> With this patch, the function __xen_evtchn_do_upcall won't be able
>> to find any events (pending_bits == 0 every time).
>> It seems the second part of init_evtchn_cpu_bindings is necessary on ARM.
>
> I think this is because binding an interdomain or allocating an unbound
> event channel does not call bind_evtchn_to_cpu(evtchn, 0), which is
> required to set the local VCPU masks.
>
> I think this happened to work on x86 because, during the generic irq
> setup, the irq affinity is always set, which then binds the event channel
> to the right VCPU. I guess ARM's irq setup misses this step.
>
> This shouldn't affect the FIFO-based events though since
> evtchn_fifo_bind_to_cpu() is a no-op.
I think the following patch should fix the 2-level problems.
You can force the use of 2-level events by using the xen.fifo_events=0
Linux command line option.
8<-------------------------------------------------
xen/events: bind all new interdomain events to VCPU0
From: David Vrabel <david.vrabel@citrix.com>
Commit fc087e10734a4d3e40693fc099461ec1270b3fff (xen/events: remove
unnecessary init_evtchn_cpu_bindings()) causes a regression.
The kernel-side VCPU binding was not being correctly set for newly
allocated or bound interdomain events. In ARM guests where 2-level
events were used, this would result in no interdomain events being
handled because the local VCPU masks would all be clear.
x86 guests would work because the irq affinity was set during irq
setup and this would set the correct kernel-side VCPU binding.
Fix this by properly initializing the kernel-side VCPU binding in
bind_evtchn_to_irq().
Reported-by: Julian Grall <julien.grall@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
drivers/xen/events/events_base.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 4672e00..5cc1f78 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -862,6 +862,9 @@ int bind_evtchn_to_irq(unsigned int evtchn)
 			irq = ret;
 			goto out;
 		}
+
+		/* Newly bound event channels start off on VCPU0. */
+		bind_evtchn_to_cpu(evtchn, 0);
 	} else {
 		struct irq_info *info = info_for_irq(irq);
 		WARN_ON(info == NULL || info->type != IRQT_EVTCHN);
--
1.7.2.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Multiple issues with event channel on Xen on ARM
2014-02-05 13:34 ` David Vrabel
@ 2014-02-05 13:44 ` Julien Grall
2014-02-05 13:52 ` David Vrabel
0 siblings, 1 reply; 8+ messages in thread
From: Julien Grall @ 2014-02-05 13:44 UTC (permalink / raw)
To: David Vrabel; +Cc: Stefano Stabellini, Ian Campbell, xen-devel
On 02/05/2014 01:34 PM, David Vrabel wrote:
Hello David,
> I think the following patch should fix the 2-level problems.
>
> You can force the use of 2-level events by using the xen.fifo_events=0
> Linux command line option.
Thanks for the patch, I'm now able to use 2-level events without issue
for a guest.
Now, I need to look at the fifo events when the domain is killed.
> 8<-------------------------------------------------
> xen/events: bind all new interdomain events to VCPU0
>
> From: David Vrabel <david.vrabel@citrix.com>
>
> Commit fc087e10734a4d3e40693fc099461ec1270b3fff (xen/events: remove
> unnecessary init_evtchn_cpu_bindings()) causes a regression.
>
> The kernel-side VCPU binding was not being correctly set for newly
> allocated or bound interdomain events. In ARM guests where 2-level
> events were used, this would result in no interdomain events being
> handled because the local VCPU masks would all be clear.
>
> x86 guests would work because the irq affinity was set during irq
> setup and this would set the correct kernel-side VCPU binding.
>
> Fix this by properly initializing the kernel-side VCPU binding in
> bind_evtchn_to_irq().
>
> Reported-by: Julian Grall <julien.grall@linaro.org>
s/Julian/Julien/
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Julien Grall <julien.grall@linaro.org>
Regards,
> ---
> drivers/xen/events/events_base.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> index 4672e00..5cc1f78 100644
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -862,6 +862,9 @@ int bind_evtchn_to_irq(unsigned int evtchn)
>  			irq = ret;
>  			goto out;
>  		}
> +
> +		/* Newly bound event channels start off on VCPU0. */
> +		bind_evtchn_to_cpu(evtchn, 0);
>  	} else {
>  		struct irq_info *info = info_for_irq(irq);
>  		WARN_ON(info == NULL || info->type != IRQT_EVTCHN);
>
--
Julien Grall
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Multiple issues with event channel on Xen on ARM
2014-02-05 13:44 ` Julien Grall
@ 2014-02-05 13:52 ` David Vrabel
2014-02-05 13:58 ` Julien Grall
0 siblings, 1 reply; 8+ messages in thread
From: David Vrabel @ 2014-02-05 13:52 UTC (permalink / raw)
To: Julien Grall; +Cc: Stefano Stabellini, Ian Campbell, xen-devel
On 05/02/14 13:44, Julien Grall wrote:
> On 02/05/2014 01:34 PM, David Vrabel wrote:
>
> Hello David,
>
>> I think the following patch should fix the 2-level problems.
>>
>> You can force the use of 2-level events by using the xen.fifo_events=0
>> Linux command line option.
>
> Thanks for the patch, I'm now able to use 2-level events without issue
> for a guest.
Good. Thanks for testing.
> Now, I need to look at the fifo events when the domain is killed.
Do FIFO events work apart from the crash on domain shutdown?
>> Reported-by: Julian Grall <julien.grall@linaro.org>
>
> s/Julian/Julien/
Oops. Sorry!
David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Multiple issues with event channel on Xen on ARM
2014-02-05 13:52 ` David Vrabel
@ 2014-02-05 13:58 ` Julien Grall
0 siblings, 0 replies; 8+ messages in thread
From: Julien Grall @ 2014-02-05 13:58 UTC (permalink / raw)
To: David Vrabel; +Cc: Stefano Stabellini, Ian Campbell, xen-devel
On 02/05/2014 01:52 PM, David Vrabel wrote:
> On 05/02/14 13:44, Julien Grall wrote:
>> On 02/05/2014 01:34 PM, David Vrabel wrote:
>>
>> Hello David,
>>
>>> I think the following patch should fix the 2-level problems.
>>>
>>> You can force the use of 2-level events by using the xen.fifo_events=0
>>> Linux command line option.
>>
>> Thanks for the patch, I'm now able to use 2-level events without issue
>> for a guest.
>
> Good. Thanks for testing.
>
>> Now, I need to look at the fifo events when the domain is killed.
>
> Do FIFO events work apart from the crash on domain shutdown?
In the guest yes. I have to try dom0 now.
--
Julien Grall
^ permalink raw reply [flat|nested] 8+ messages in thread