xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] Improvements for domain_crash_synchronous
@ 2013-09-04 18:18 Andrew Cooper
  2013-09-04 18:18 ` [PATCH 1/2] x86/traps: Record last extable faulting address Andrew Cooper
  2013-09-04 18:18 ` [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous Andrew Cooper
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Cooper @ 2013-09-04 18:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

This patch series is RFC because the second patch would benifit greatly from
following Jan's patch introducing UNLIKELY_DONE(), my GLOBAL() patch, backed
onto that series, and a further improvement to unconditionally provide a label
to the entry of an UNLIKELY section.

If there are no strong objections to the change in the meantime, I do intend
to rebase this series when the other patches are committed.

~Andrew

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] x86/traps: Record last extable faulting address
  2013-09-04 18:18 [PATCH RFC 0/2] Improvements for domain_crash_synchronous Andrew Cooper
@ 2013-09-04 18:18 ` Andrew Cooper
  2013-09-04 18:18 ` [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous Andrew Cooper
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Cooper @ 2013-09-04 18:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

... so the following patch can identify the location of faults leading to a
decision to crash a domain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/traps.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 9db42c82..50fb6ba 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -89,6 +89,7 @@ static char __read_mostly opt_nmi[10] = "fatal";
 string_param("nmi", opt_nmi);
 
 DEFINE_PER_CPU(u64, efer);
+static DEFINE_PER_CPU(unsigned long, last_extable_addr);
 
 DEFINE_PER_CPU_READ_MOSTLY(u32, ler_msr);
 
@@ -550,6 +551,7 @@ static inline void do_trap(
     {
         dprintk(XENLOG_ERR, "Trap %d: %p -> %p\n",
                 trapnr, _p(regs->eip), _p(fixup));
+        this_cpu(last_extable_addr) = regs->eip;
         regs->eip = fixup;
         return;
     }
@@ -1038,6 +1040,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
  die:
     if ( (fixup = search_exception_table(regs->eip)) != 0 )
     {
+        this_cpu(last_extable_addr) = regs->eip;
         regs->eip = fixup;
         return;
     }
@@ -1370,6 +1373,7 @@ void do_page_fault(struct cpu_user_regs *regs)
             perfc_incr(copy_user_faults);
             if ( unlikely(regs->error_code & PFEC_reserved_bit) )
                 reserved_bit_page_fault(addr, regs);
+            this_cpu(last_extable_addr) = regs->eip;
             regs->eip = fixup;
             return;
         }
@@ -3062,6 +3066,7 @@ void do_general_protection(struct cpu_user_regs *regs)
     {
         dprintk(XENLOG_INFO, "GPF (%04x): %p -> %p\n",
                 regs->error_code, _p(regs->eip), _p(fixup));
+        this_cpu(last_extable_addr) = regs->eip;
         regs->eip = fixup;
         return;
     }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous
  2013-09-04 18:18 [PATCH RFC 0/2] Improvements for domain_crash_synchronous Andrew Cooper
  2013-09-04 18:18 ` [PATCH 1/2] x86/traps: Record last extable faulting address Andrew Cooper
@ 2013-09-04 18:18 ` Andrew Cooper
  2013-09-05  8:15   ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-09-04 18:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

As it currently stands, the string "domain_crash_sync called from entry.S" is
not helpful at identifying why the domain was crashed, and a debug build of
Xen doesn't help the matter

This patch improves the information printed, by pointing to where the crash
decision was made.

Specific improvements include:
 * Moving the ascii string "domain_crash_sync called from entry.S\n" away from
   some semi-hot code cache lines.
 * Moving the printk into C code (especially as this_cpu() is miserable to use
   in assembly code)
 * Undo the previous confusing situation of having the
   domain_crash_synchronous() as a macro in C code, yet a global symbol in
   assembly code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>

---

There were 4 jmps to the old domain_crash_synchronous which have been replaced
by unlikely sections, due to now providing an address in %rdi.  None of them
appear to need the recovery back to a sensible stack which is possibly
required through an extable redirection.
---
 xen/arch/x86/traps.c               |   12 ++++++++
 xen/arch/x86/x86_64/compat/entry.S |   16 ++++++++---
 xen/arch/x86/x86_64/entry.S        |   53 ++++++++++++++++++++----------------
 xen/include/xen/sched.h            |    7 +++++
 4 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 50fb6ba..225dda9 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3748,6 +3748,18 @@ unsigned long do_get_debugreg(int reg)
     return -EINVAL;
 }
 
+void asm_domain_crash_synchronous(unsigned long addr)
+{
+    if ( addr == 0 )
+        addr = this_cpu(last_extable_addr);
+
+    printk("domain_crash_sync called from entry.S\n"
+           "  fault at %p", _p(addr));
+    print_symbol(" %s\n", addr);
+
+    __domain_crash_synchronous();
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index c0afe2c..2a6f1cf 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -263,6 +263,7 @@ ENTRY(compat_int80_direct_trap)
 /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}                             */
 /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
 /* On return only %rbx and %rdx are guaranteed non-clobbered.            */
+.globl compat_create_bounce_frame
 compat_create_bounce_frame:
         ASSERT_INTERRUPTS_ENABLED
         mov   %fs,%edi
@@ -329,7 +330,12 @@ UNLIKELY_END(compat_bounce_failsafe)
         movzwl TRAPBOUNCE_cs(%rdx),%eax
         /* Null selectors (0-3) are not allowed. */
         testl $~3,%eax
-        jz    domain_crash_synchronous
+.Lcompat_bounce_null_selector:
+UNLIKELY_START(z, compat_bounce_null_selector)
+        lea   .Lcompat_bounce_null_selector(%rip), %rdi
+        jmp   asm_domain_crash_synchronous
+        ud2a
+UNLIKELY_END(compat_bounce_null_selector)
         movl  %eax,UREGS_cs+8(%rsp)
         movl  TRAPBOUNCE_eip(%rdx),%eax
         movl  %eax,UREGS_rip+8(%rsp)
@@ -339,10 +345,10 @@ UNLIKELY_END(compat_bounce_failsafe)
         xorl  %edi,%edi
         jmp   .Lft13
 .previous
-        _ASM_EXTABLE(.Lft1,  domain_crash_synchronous)
+        _ASM_EXTABLE(.Lft1,  dom_crash_sync_extable)
         _ASM_EXTABLE(.Lft2,  compat_crash_page_fault)
         _ASM_EXTABLE(.Lft3,  compat_crash_page_fault_4)
-        _ASM_EXTABLE(.Lft4,  domain_crash_synchronous)
+        _ASM_EXTABLE(.Lft4,  dom_crash_sync_extable)
         _ASM_EXTABLE(.Lft5,  compat_crash_page_fault_4)
         _ASM_EXTABLE(.Lft6,  compat_crash_page_fault_8)
         _ASM_EXTABLE(.Lft7,  compat_crash_page_fault)
@@ -363,7 +369,9 @@ compat_crash_page_fault:
 .Lft14: mov   %edi,%fs
         movl  %esi,%edi
         call  show_page_walk
-        jmp   domain_crash_synchronous
+        xorl  %edi,%edi
+        jmp   asm_domain_crash_synchronous
+        ud2a
 .section .fixup,"ax"
 .Lfx14:
         xorl  %edi,%edi
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 5beeccb..fdd2d3c 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -350,6 +350,7 @@ int80_slow_path:
 /*   { RCX, R11, [DS-GS,] [CR2,] [ERRCODE,] RIP, CS, RFLAGS, RSP, SS }   */
 /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
 /* On return only %rbx and %rdx are guaranteed non-clobbered.            */
+.globl create_bounce_frame
 create_bounce_frame:
         ASSERT_INTERRUPTS_ENABLED
         testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
@@ -371,7 +372,12 @@ create_bounce_frame:
         sbb   %ecx,%ecx                 # In +ve address space? Then okay.
         cmpq  %rax,%rsi
         adc   %ecx,%ecx                 # Above Xen private area? Then okay.
-        jg    domain_crash_synchronous
+.Lbad_sp:
+UNLIKELY_START(g, create_bounce_frame_bad_sp)
+        lea   .Lbad_sp(%rip), %rdi
+        jmp   asm_domain_crash_synchronous
+        ud2a
+UNLIKELY_END(create_bounce_frame_bad_sp)
         movb  TRAPBOUNCE_flags(%rdx),%cl
         subq  $40,%rsi
         movq  UREGS_ss+8(%rsp),%rax
@@ -430,26 +436,28 @@ UNLIKELY_END(bounce_failsafe)
         movq  $FLAT_KERNEL_CS,UREGS_cs+8(%rsp)
         movq  TRAPBOUNCE_eip(%rdx),%rax
         testq %rax,%rax
-        jz    domain_crash_synchronous
+.Lbad_ip:
+UNLIKELY_START(z, create_bounce_frame_bad_ip)
+        lea   .Lbad_ip(%rip), %rdi
+        jmp   asm_domain_crash_synchronous
+        ud2a
+UNLIKELY_END(create_bounce_frame_bad_ip)
         movq  %rax,UREGS_rip+8(%rsp)
         ret
-        _ASM_EXTABLE(.Lft2,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft3,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft4,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft5,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft6,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft7,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft8,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft9,  domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft10, domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft11, domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft12, domain_crash_synchronous)
-        _ASM_EXTABLE(.Lft13, domain_crash_synchronous)
-
-domain_crash_synchronous_string:
-        .asciz "domain_crash_sync called from entry.S\n"
-
-ENTRY(domain_crash_synchronous)
+        _ASM_EXTABLE(.Lft2,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft3,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft4,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft5,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft6,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft7,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft8,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft9,  dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft10, dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft11, dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft12, dom_crash_sync_extable)
+        _ASM_EXTABLE(.Lft13, dom_crash_sync_extable)
+
+ENTRY(dom_crash_sync_extable)
         # Get out of the guest-save area of the stack.
         GET_STACK_BASE(%rax)
         leaq  STACK_CPUINFO_FIELD(guest_cpu_user_regs)(%rax),%rsp
@@ -460,11 +468,8 @@ ENTRY(domain_crash_synchronous)
         setz  %al
         leal  (%rax,%rax,2),%eax
         orb   %al,UREGS_cs(%rsp)
-        # printk(domain_crash_synchronous_string)
-        leaq  domain_crash_synchronous_string(%rip),%rdi
-        xorl  %eax,%eax
-        call  printk
-        jmp  __domain_crash_synchronous
+        xorl  %edi,%edi
+        jmp   asm_domain_crash_synchronous
 
 /* No special register assumptions. */
 ENTRY(ret_from_intr)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ae6a3b8..8e66d6b 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -541,6 +541,13 @@ void __domain_crash_synchronous(void) __attribute__((noreturn));
     __domain_crash_synchronous();                                         \
 } while (0)
 
+/*
+ * Called from assembly code, with an optional address to help indicate why
+ * the crash occured.  If addr is 0, look up address from last extable
+ * redirection.
+ */
+void asm_domain_crash_synchronous(unsigned long addr) __attribute__((noreturn));
+
 #define set_current_state(_s) do { current->state = (_s); } while (0)
 void scheduler_init(void);
 int  sched_init_vcpu(struct vcpu *v, unsigned int processor);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous
  2013-09-04 18:18 ` [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous Andrew Cooper
@ 2013-09-05  8:15   ` Jan Beulich
  2013-09-05  9:44     ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-09-05  8:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser

>>> On 04.09.13 at 20:18, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> As it currently stands, the string "domain_crash_sync called from entry.S" is
> not helpful at identifying why the domain was crashed, and a debug build of
> Xen doesn't help the matter
> 
> This patch improves the information printed, by pointing to where the crash
> decision was made.

Looks quite useful.

> --- a/xen/arch/x86/x86_64/compat/entry.S
> +++ b/xen/arch/x86/x86_64/compat/entry.S
> @@ -263,6 +263,7 @@ ENTRY(compat_int80_direct_trap)
>  /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}                             */
>  /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
>  /* On return only %rbx and %rdx are guaranteed non-clobbered.            */
> +.globl compat_create_bounce_frame
>  compat_create_bounce_frame:

Is the addition above a left-over? I don't see any use of the
label outside of this file.

> @@ -329,7 +330,12 @@ UNLIKELY_END(compat_bounce_failsafe)
>          movzwl TRAPBOUNCE_cs(%rdx),%eax
>          /* Null selectors (0-3) are not allowed. */
>          testl $~3,%eax
> -        jz    domain_crash_synchronous
> +.Lcompat_bounce_null_selector:
> +UNLIKELY_START(z, compat_bounce_null_selector)
> +        lea   .Lcompat_bounce_null_selector(%rip), %rdi
> +        jmp   asm_domain_crash_synchronous
> +        ud2a
> +UNLIKELY_END(compat_bounce_null_selector)

Here and further down you don't really need the label at the
start of the unlikely section - the place can as well be identified
by using

        lea   (%rip), %rdi

inside that section (the place is still unique, just outside the
original code stream, i.e. just slightly more difficult to
re-associate).

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous
  2013-09-05  8:15   ` Jan Beulich
@ 2013-09-05  9:44     ` Andrew Cooper
  2013-09-05  9:51       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-09-05  9:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

On 05/09/13 09:15, Jan Beulich wrote:
>>>> On 04.09.13 at 20:18, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> As it currently stands, the string "domain_crash_sync called from entry.S" is
>> not helpful at identifying why the domain was crashed, and a debug build of
>> Xen doesn't help the matter
>>
>> This patch improves the information printed, by pointing to where the crash
>> decision was made.
> Looks quite useful.
>
>> --- a/xen/arch/x86/x86_64/compat/entry.S
>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>> @@ -263,6 +263,7 @@ ENTRY(compat_int80_direct_trap)
>>  /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}                             */
>>  /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
>>  /* On return only %rbx and %rdx are guaranteed non-clobbered.            */
>> +.globl compat_create_bounce_frame
>>  compat_create_bounce_frame:
> Is the addition above a left-over? I don't see any use of the
> label outside of this file.

This is for the benifit of print_symbol(), which will now give
create_bounce_frame()+some rather than trap_nop()+loads.

>
>> @@ -329,7 +330,12 @@ UNLIKELY_END(compat_bounce_failsafe)
>>          movzwl TRAPBOUNCE_cs(%rdx),%eax
>>          /* Null selectors (0-3) are not allowed. */
>>          testl $~3,%eax
>> -        jz    domain_crash_synchronous
>> +.Lcompat_bounce_null_selector:
>> +UNLIKELY_START(z, compat_bounce_null_selector)
>> +        lea   .Lcompat_bounce_null_selector(%rip), %rdi
>> +        jmp   asm_domain_crash_synchronous
>> +        ud2a
>> +UNLIKELY_END(compat_bounce_null_selector)
> Here and further down you don't really need the label at the
> start of the unlikely section - the place can as well be identified
> by using
>
>         lea   (%rip), %rdi
>
> inside that section (the place is still unique, just outside the
> original code stream, i.e. just slightly more difficult to
> re-associate).
>
> Jan
>

But in an unlikely section, %rip is shifted quite a lot from %rip of the
code immediately before.  This is also for the benefit of print_symbol()
which will pick up the {compat_,}create_bounce_frame rather than the
global symbol surrounding the unlikely section.

~Andrew

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous
  2013-09-05  9:44     ` Andrew Cooper
@ 2013-09-05  9:51       ` Jan Beulich
  2013-09-05  9:57         ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-09-05  9:51 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser

>>> On 05.09.13 at 11:44, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 05/09/13 09:15, Jan Beulich wrote:
>>>>> On 04.09.13 at 20:18, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> As it currently stands, the string "domain_crash_sync called from entry.S" 
> is
>>> not helpful at identifying why the domain was crashed, and a debug build of
>>> Xen doesn't help the matter
>>>
>>> This patch improves the information printed, by pointing to where the crash
>>> decision was made.
>> Looks quite useful.
>>
>>> --- a/xen/arch/x86/x86_64/compat/entry.S
>>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>>> @@ -263,6 +263,7 @@ ENTRY(compat_int80_direct_trap)
>>>  /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}                             */
>>>  /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
>>>  /* On return only %rbx and %rdx are guaranteed non-clobbered.            */
>>> +.globl compat_create_bounce_frame
>>>  compat_create_bounce_frame:
>> Is the addition above a left-over? I don't see any use of the
>> label outside of this file.
> 
> This is for the benifit of print_symbol(), which will now give
> create_bounce_frame()+some rather than trap_nop()+loads.

That shouldn't be happening - whether a symbol is local or global
should not matter to symbol table generation and consumption.
The matter would be different is the label started with .L...

>>> @@ -329,7 +330,12 @@ UNLIKELY_END(compat_bounce_failsafe)
>>>          movzwl TRAPBOUNCE_cs(%rdx),%eax
>>>          /* Null selectors (0-3) are not allowed. */
>>>          testl $~3,%eax
>>> -        jz    domain_crash_synchronous
>>> +.Lcompat_bounce_null_selector:
>>> +UNLIKELY_START(z, compat_bounce_null_selector)
>>> +        lea   .Lcompat_bounce_null_selector(%rip), %rdi
>>> +        jmp   asm_domain_crash_synchronous
>>> +        ud2a
>>> +UNLIKELY_END(compat_bounce_null_selector)
>> Here and further down you don't really need the label at the
>> start of the unlikely section - the place can as well be identified
>> by using
>>
>>         lea   (%rip), %rdi
>>
>> inside that section (the place is still unique, just outside the
>> original code stream, i.e. just slightly more difficult to
>> re-associate).
> 
> But in an unlikely section, %rip is shifted quite a lot from %rip of the
> code immediately before.  This is also for the benefit of print_symbol()
> which will pick up the {compat_,}create_bounce_frame rather than the
> global symbol surrounding the unlikely section.

I understand that, but stray labels are at clear risk of getting
deleted by a subsequent cleanup patch anyway. Hence either
we need a solution without stray labels, or live with the need
to re-associate the address pointed to be the crash log
messages to the original function.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous
  2013-09-05  9:51       ` Jan Beulich
@ 2013-09-05  9:57         ` Andrew Cooper
  2013-09-05 10:17           ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2013-09-05  9:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

On 05/09/13 10:51, Jan Beulich wrote:
>>>> On 05.09.13 at 11:44, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 05/09/13 09:15, Jan Beulich wrote:
>>>>>> On 04.09.13 at 20:18, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> As it currently stands, the string "domain_crash_sync called from entry.S" 
>> is
>>>> not helpful at identifying why the domain was crashed, and a debug build of
>>>> Xen doesn't help the matter
>>>>
>>>> This patch improves the information printed, by pointing to where the crash
>>>> decision was made.
>>> Looks quite useful.
>>>
>>>> --- a/xen/arch/x86/x86_64/compat/entry.S
>>>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>>>> @@ -263,6 +263,7 @@ ENTRY(compat_int80_direct_trap)
>>>>  /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}                             */
>>>>  /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
>>>>  /* On return only %rbx and %rdx are guaranteed non-clobbered.            */
>>>> +.globl compat_create_bounce_frame
>>>>  compat_create_bounce_frame:
>>> Is the addition above a left-over? I don't see any use of the
>>> label outside of this file.
>> This is for the benifit of print_symbol(), which will now give
>> create_bounce_frame()+some rather than trap_nop()+loads.
> That shouldn't be happening - whether a symbol is local or global
> should not matter to symbol table generation and consumption.
> The matter would be different is the label started with .L...

Hmm - this was caught by my testing.  I had initially assumed that
print_symbol() would DTRT, but it didn't.  Perhaps it is been fed off
the global symbol table rather than the debug symbol table.

>
>>>> @@ -329,7 +330,12 @@ UNLIKELY_END(compat_bounce_failsafe)
>>>>          movzwl TRAPBOUNCE_cs(%rdx),%eax
>>>>          /* Null selectors (0-3) are not allowed. */
>>>>          testl $~3,%eax
>>>> -        jz    domain_crash_synchronous
>>>> +.Lcompat_bounce_null_selector:
>>>> +UNLIKELY_START(z, compat_bounce_null_selector)
>>>> +        lea   .Lcompat_bounce_null_selector(%rip), %rdi
>>>> +        jmp   asm_domain_crash_synchronous
>>>> +        ud2a
>>>> +UNLIKELY_END(compat_bounce_null_selector)
>>> Here and further down you don't really need the label at the
>>> start of the unlikely section - the place can as well be identified
>>> by using
>>>
>>>         lea   (%rip), %rdi
>>>
>>> inside that section (the place is still unique, just outside the
>>> original code stream, i.e. just slightly more difficult to
>>> re-associate).
>> But in an unlikely section, %rip is shifted quite a lot from %rip of the
>> code immediately before.  This is also for the benefit of print_symbol()
>> which will pick up the {compat_,}create_bounce_frame rather than the
>> global symbol surrounding the unlikely section.
> I understand that, but stray labels are at clear risk of getting
> deleted by a subsequent cleanup patch anyway. Hence either
> we need a solution without stray labels, or live with the need
> to re-associate the address pointed to be the crash log
> messages to the original function.
>
> Jan
>

That was eluded to in my patch 0 (perhaps not well enough), where I
intend to augment UNLIKELY_START() to automatically generate this
symbol, and provide an __UNLIKLEY_ENTRY_SYM() accessor.  The code for
that was rather tangled with your UNLIKELY_DONE() patch, which is why I
left it and was going to fix up after your series is committed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous
  2013-09-05  9:57         ` Andrew Cooper
@ 2013-09-05 10:17           ` Jan Beulich
  2013-09-05 12:57             ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-09-05 10:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser

>>> On 05.09.13 at 11:57, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 05/09/13 10:51, Jan Beulich wrote:
>>>>> On 05.09.13 at 11:44, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 05/09/13 09:15, Jan Beulich wrote:
>>>>>>> On 04.09.13 at 20:18, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> --- a/xen/arch/x86/x86_64/compat/entry.S
>>>>> +++ b/xen/arch/x86/x86_64/compat/entry.S
>>>>> @@ -263,6 +263,7 @@ ENTRY(compat_int80_direct_trap)
>>>>>  /*   {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]}                             */
>>>>>  /* %rdx: trap_bounce, %rbx: struct vcpu                                  */
>>>>>  /* On return only %rbx and %rdx are guaranteed non-clobbered.            */
>>>>> +.globl compat_create_bounce_frame
>>>>>  compat_create_bounce_frame:
>>>> Is the addition above a left-over? I don't see any use of the
>>>> label outside of this file.
>>> This is for the benifit of print_symbol(), which will now give
>>> create_bounce_frame()+some rather than trap_nop()+loads.
>> That shouldn't be happening - whether a symbol is local or global
>> should not matter to symbol table generation and consumption.
>> The matter would be different is the label started with .L...
> 
> Hmm - this was caught by my testing.  I had initially assumed that
> print_symbol() would DTRT, but it didn't.  Perhaps it is been fed off
> the global symbol table rather than the debug symbol table.

The debug symbol table never gets used, but local symbols
should always end up in the normal ELF symbol table. I just
checked - they do here.

>>>>> @@ -329,7 +330,12 @@ UNLIKELY_END(compat_bounce_failsafe)
>>>>>          movzwl TRAPBOUNCE_cs(%rdx),%eax
>>>>>          /* Null selectors (0-3) are not allowed. */
>>>>>          testl $~3,%eax
>>>>> -        jz    domain_crash_synchronous
>>>>> +.Lcompat_bounce_null_selector:
>>>>> +UNLIKELY_START(z, compat_bounce_null_selector)
>>>>> +        lea   .Lcompat_bounce_null_selector(%rip), %rdi
>>>>> +        jmp   asm_domain_crash_synchronous
>>>>> +        ud2a
>>>>> +UNLIKELY_END(compat_bounce_null_selector)
>>>> Here and further down you don't really need the label at the
>>>> start of the unlikely section - the place can as well be identified
>>>> by using
>>>>
>>>>         lea   (%rip), %rdi
>>>>
>>>> inside that section (the place is still unique, just outside the
>>>> original code stream, i.e. just slightly more difficult to
>>>> re-associate).
>>> But in an unlikely section, %rip is shifted quite a lot from %rip of the
>>> code immediately before.  This is also for the benefit of print_symbol()
>>> which will pick up the {compat_,}create_bounce_frame rather than the
>>> global symbol surrounding the unlikely section.
>> I understand that, but stray labels are at clear risk of getting
>> deleted by a subsequent cleanup patch anyway. Hence either
>> we need a solution without stray labels, or live with the need
>> to re-associate the address pointed to be the crash log
>> messages to the original function.
> 
> That was eluded to in my patch 0 (perhaps not well enough), where I
> intend to augment UNLIKELY_START() to automatically generate this
> symbol, and provide an __UNLIKLEY_ENTRY_SYM() accessor.  The code for
> that was rather tangled with your UNLIKELY_DONE() patch, which is why I
> left it and was going to fix up after your series is committed.

I did realize those intentions, but whether an orphan label gets
added here or in the macro doesn't matter - the label remains
orphaned, and hence would be a likely subject to janitorial work.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous
  2013-09-05 10:17           ` Jan Beulich
@ 2013-09-05 12:57             ` Andrew Cooper
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Cooper @ 2013-09-05 12:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser

On 05/09/13 11:17, Jan Beulich wrote:
>>> That shouldn't be happening - whether a symbol is local or global
>>> should not matter to symbol table generation and consumption.
>>> The matter would be different is the label started with .L...
>> Hmm - this was caught by my testing.  I had initially assumed that
>> print_symbol() would DTRT, but it didn't.  Perhaps it is been fed off
>> the global symbol table rather than the debug symbol table.
> The debug symbol table never gets used, but local symbols
> should always end up in the normal ELF symbol table. I just
> checked - they do here.

So they are.  I have just double checked my debugging, and I was getting
mixed up with the issue from below, which would leave these .globl as
failed debugging.  I shall remove them.

>
>>>>>> @@ -329,7 +330,12 @@ UNLIKELY_END(compat_bounce_failsafe)
>>>>>>          movzwl TRAPBOUNCE_cs(%rdx),%eax
>>>>>>          /* Null selectors (0-3) are not allowed. */
>>>>>>          testl $~3,%eax
>>>>>> -        jz    domain_crash_synchronous
>>>>>> +.Lcompat_bounce_null_selector:
>>>>>> +UNLIKELY_START(z, compat_bounce_null_selector)
>>>>>> +        lea   .Lcompat_bounce_null_selector(%rip), %rdi
>>>>>> +        jmp   asm_domain_crash_synchronous
>>>>>> +        ud2a
>>>>>> +UNLIKELY_END(compat_bounce_null_selector)
>>>>> Here and further down you don't really need the label at the
>>>>> start of the unlikely section - the place can as well be identified
>>>>> by using
>>>>>
>>>>>         lea   (%rip), %rdi
>>>>>
>>>>> inside that section (the place is still unique, just outside the
>>>>> original code stream, i.e. just slightly more difficult to
>>>>> re-associate).
>>>> But in an unlikely section, %rip is shifted quite a lot from %rip of the
>>>> code immediately before.  This is also for the benefit of print_symbol()
>>>> which will pick up the {compat_,}create_bounce_frame rather than the
>>>> global symbol surrounding the unlikely section.
>>> I understand that, but stray labels are at clear risk of getting
>>> deleted by a subsequent cleanup patch anyway. Hence either
>>> we need a solution without stray labels, or live with the need
>>> to re-associate the address pointed to be the crash log
>>> messages to the original function.
>> That was eluded to in my patch 0 (perhaps not well enough), where I
>> intend to augment UNLIKELY_START() to automatically generate this
>> symbol, and provide an __UNLIKLEY_ENTRY_SYM() accessor.  The code for
>> that was rather tangled with your UNLIKELY_DONE() patch, which is why I
>> left it and was going to fix up after your series is committed.
> I did realize those intentions, but whether an orphan label gets
> added here or in the macro doesn't matter - the label remains
> orphaned, and hence would be a likely subject to janitorial work.
>
> Jan
>

But any janitorial work which removes the proposed new label from
UNLIKELY_START() will cause a subsequent assemble error when
__UNLIKELY_ENTRY_SYM() references a non-existant label.

~Andrew

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-09-05 12:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-04 18:18 [PATCH RFC 0/2] Improvements for domain_crash_synchronous Andrew Cooper
2013-09-04 18:18 ` [PATCH 1/2] x86/traps: Record last extable faulting address Andrew Cooper
2013-09-04 18:18 ` [PATCH 2/2] Xen/x86: Improve information from domain_crash_synchronous Andrew Cooper
2013-09-05  8:15   ` Jan Beulich
2013-09-05  9:44     ` Andrew Cooper
2013-09-05  9:51       ` Jan Beulich
2013-09-05  9:57         ` Andrew Cooper
2013-09-05 10:17           ` Jan Beulich
2013-09-05 12:57             ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).