current not very current (vs curr

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* current not very current (vs curr_vcpu)
@ 2010-02-19  4:21 Mukesh Rathor
  2010-02-19  8:12 ` Keir Fraser
  0 siblings, 1 reply; 13+ messages in thread
From: Mukesh Rathor @ 2010-02-19  4:21 UTC (permalink / raw)
  To: Xen-devel@lists.xensource.com

Hi,

This on xen 4.0. 

I noticed while debugging something that current is not pointing to the
current vcpu upon serial interrupt. The regs->SP clearly shows 64bit dom0
stack, guest_mode(regs) returns 1, but current is pointing
to idle vcpu. I'm not able to figure how this is possible. I can come up
with scenario where dom0.vcpu yields cpu to idle vcpu in which case 
curr_vcpu will point to dom0.vcpu and current to idle vcpu. But in that
case guest_mode(regs) will be false, and regs.SP will show hyp stack.
Correct?

Am I correct that if guest_mode() then current should always point to 
guest vcpu? If yes, then I will debug this further.

Here's what I see in ns16550_interrupt: 
      regs.SP: ffffffff8041bf50  (my 64bit dom0 stack)
      regs.IP: ffffffff800053aa (dom0 return from hypercall)
      regs: ffff82c48030ff28 (hyp cpu 0 stack)
      SP in ns16550_interrupt(): ffff82c48030fd50 (hyp stack again)
      guest_mode == 1
      current == idle domain ??????

thanks in advance,
Mukesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-19  4:21 current not very current (vs curr_vcpu) Mukesh Rathor
@ 2010-02-19  8:12 ` Keir Fraser
  2010-02-19 19:23   ` Mukesh Rathor
  0 siblings, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2010-02-19  8:12 UTC (permalink / raw)
  To: Mukesh Rathor, Xen-devel@lists.xensource.com

On 19/02/2010 04:21, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> I noticed while debugging something that current is not pointing to the
> current vcpu upon serial interrupt. The regs->SP clearly shows 64bit dom0
> stack, guest_mode(regs) returns 1, but current is pointing
> to idle vcpu. I'm not able to figure how this is possible. I can come up
> with scenario where dom0.vcpu yields cpu to idle vcpu in which case
> curr_vcpu will point to dom0.vcpu and current to idle vcpu. But in that
> case guest_mode(regs) will be false, and regs.SP will show hyp stack.
> Correct?
> 
> Am I correct that if guest_mode() then current should always point to
> guest vcpu? If yes, then I will debug this further.

The behaviour you see is expected. Guest_mode() is meaningless when running
an idle vcpu. This is because the guest regs at the bottom of the stack are
junk for an idle vcpu (and also we do lazy state synchronisation, so they
may be the valid active regs for the last scheduled non-idle vcpu).

 -- Keir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-19  8:12 ` Keir Fraser
@ 2010-02-19 19:23   ` Mukesh Rathor
  2010-02-19 21:34     ` Keir Fraser
  0 siblings, 1 reply; 13+ messages in thread
From: Mukesh Rathor @ 2010-02-19 19:23 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel@lists.xensource.com

On Fri, 19 Feb 2010 08:12:18 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 19/02/2010 04:21, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> 
> > I noticed while debugging something that current is not pointing to
> > the current vcpu upon serial interrupt. The regs->SP clearly shows
> > 64bit dom0 stack, guest_mode(regs) returns 1, but current is
> > pointing to idle vcpu. I'm not able to figure how this is possible.
> > I can come up with scenario where dom0.vcpu yields cpu to idle vcpu
> > in which case curr_vcpu will point to dom0.vcpu and current to idle
> > vcpu. But in that case guest_mode(regs) will be false, and regs.SP
> > will show hyp stack. Correct?
> > 
> > Am I correct that if guest_mode() then current should always point
> > to guest vcpu? If yes, then I will debug this further.
> 
> The behaviour you see is expected. Guest_mode() is meaningless when
> running an idle vcpu. This is because the guest regs at the bottom of
> the stack are junk for an idle vcpu (and also we do lazy state
> synchronisation, so they may be the valid active regs for the last
> scheduled non-idle vcpu).
> 
>  -- Keir
> 

Yes, but my point is it doesn't appear to be running idle vcpu as
indicated by regs->rsp and regs->rip. They both point to dom0 context.
This from printk in ns16550_interrupt().

To rephrase the question, if regs->rip and regs->rsp show guest context
in do_IRQ(), then current must always point to guest vcpu, correct?

thanks,
Mukesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-19 19:23   ` Mukesh Rathor
@ 2010-02-19 21:34     ` Keir Fraser
  2010-02-20  3:50       ` Mukesh Rathor
  0 siblings, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2010-02-19 21:34 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel@lists.xensource.com

On 19/02/2010 19:23, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> Yes, but my point is it doesn't appear to be running idle vcpu as
> indicated by regs->rsp and regs->rip. They both point to dom0 context.
> This from printk in ns16550_interrupt().
> 
> To rephrase the question, if regs->rip and regs->rsp show guest context
> in do_IRQ(), then current must always point to guest vcpu, correct?

Oh, so the serial interrupt interrupted a guest's execution? Then yes,
current should point at that guest.

 -- Keir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-19 21:34     ` Keir Fraser
@ 2010-02-20  3:50       ` Mukesh Rathor
  2010-02-20  7:45         ` Keir Fraser
  0 siblings, 1 reply; 13+ messages in thread
From: Mukesh Rathor @ 2010-02-20  3:50 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel@lists.xensource.com

On Fri, 19 Feb 2010 21:34:39 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 19/02/2010 19:23, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> 
> > Yes, but my point is it doesn't appear to be running idle vcpu as
> > indicated by regs->rsp and regs->rip. They both point to dom0
> > context. This from printk in ns16550_interrupt().
> > 
> > To rephrase the question, if regs->rip and regs->rsp show guest
> > context in do_IRQ(), then current must always point to guest vcpu,
> > correct?
> 
> Oh, so the serial interrupt interrupted a guest's execution? Then yes,
> current should point at that guest.
> 
>  -- Keir
> 

ah, I see what's going on. context_switch() is scheduling idle vcpu, and
calls continue_idle_domain() to reset_stack_and_jump(idle_loop).
well, reset_stack_and_jump() is setting rsp to guest_cpu_user_regs(),
and interrupt is coming right at that instant. so: 

diff = (char *)guest_cpu_user_regs() - (char *)(r)  is 0

and as a result, guest_mode(regs) == true.

It appears to me, that guest_mode() needs to check if current
rsp is already guest_cpu_user_regs() in which case a stack switch
has not happened, and guest_mode should be false. 
Or, perhaps, just not enable interrupts in context_switch and do 
it in idle_loop().

what do you think?

thanks,
Mukesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-20  3:50       ` Mukesh Rathor
@ 2010-02-20  7:45         ` Keir Fraser
  2010-02-22 18:59           ` Mukesh Rathor
  2010-02-23 19:46           ` Mukesh Rathor
  0 siblings, 2 replies; 13+ messages in thread
From: Keir Fraser @ 2010-02-20  7:45 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel@lists.xensource.com

On 20/02/2010 03:50, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> ah, I see what's going on. context_switch() is scheduling idle vcpu, and
> calls continue_idle_domain() to reset_stack_and_jump(idle_loop).
> well, reset_stack_and_jump() is setting rsp to guest_cpu_user_regs(),
> and interrupt is coming right at that instant. so:
> 
> diff = (char *)guest_cpu_user_regs() - (char *)(r)  is 0
> 
> and as a result, guest_mode(regs) == true.

Well, I don't see how this scenario works. If rsp==g_c_u_r() at the instant
the interrupt comes in, then the stack frame for the interrupt will be
*above* g_c_u_r(). Thus 'diff' in guest_mode() will evaluate non-zero and
positive, and regs->{rip,rsp} should point at hypervisor code/stack.

Also: in your original email you said regs.rsp pointed at dom0 stack. That
doesn't tally with you saying that rsp==g_c_u_r() (an address in hypervisor
space) immediately before the interrupt, in this email. Regs->rsp in the
scenario you describe here should be exactly equal to g_c_u_r().

 -- Keir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-20  7:45         ` Keir Fraser
@ 2010-02-22 18:59           ` Mukesh Rathor
  2010-02-23 19:46           ` Mukesh Rathor
  1 sibling, 0 replies; 13+ messages in thread
From: Mukesh Rathor @ 2010-02-22 18:59 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel@lists.xensource.com

On Sat, 20 Feb 2010 07:45:26 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 20/02/2010 03:50, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> 
> > ah, I see what's going on. context_switch() is scheduling idle
> > vcpu, and calls continue_idle_domain() to
> > reset_stack_and_jump(idle_loop). well, reset_stack_and_jump() is
> > setting rsp to guest_cpu_user_regs(), and interrupt is coming right
> > at that instant. so:
> > 
> > diff = (char *)guest_cpu_user_regs() - (char *)(r)  is 0
> > 
> > and as a result, guest_mode(regs) == true.
> 
> Well, I don't see how this scenario works. If rsp==g_c_u_r() at the
> instant the interrupt comes in, then the stack frame for the
> interrupt will be *above* g_c_u_r(). Thus 'diff' in guest_mode() will
> evaluate non-zero and positive, and regs->{rip,rsp} should point at
> hypervisor code/stack.
> 
> Also: in your original email you said regs.rsp pointed at dom0 stack.
> That doesn't tally with you saying that rsp==g_c_u_r() (an address in
> hypervisor space) immediately before the interrupt, in this email.
> Regs->rsp in the scenario you describe here should be exactly equal
> to g_c_u_r().
> 
>  -- Keir
> 

yes, you are right! Interrupt coming in would make rsp go up, 
my brain missed it. regs->rsp and rip clearly indicate cpu in
dom0, I thought friday that may be it was showing stale entries. I'll
continue debugging more, instrumented hypervisor seems to indicate
interrupt coming in during context switch, but I need to
fine grain it. Will keep you posted.

thanks,
Mukesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-20  7:45         ` Keir Fraser
  2010-02-22 18:59           ` Mukesh Rathor
@ 2010-02-23 19:46           ` Mukesh Rathor
  2010-02-23 21:03             ` Keir Fraser
  1 sibling, 1 reply; 13+ messages in thread
From: Mukesh Rathor @ 2010-02-23 19:46 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel@lists.xensource.com

On Sat, 20 Feb 2010 07:45:26 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 20/02/2010 03:50, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> 
> > ah, I see what's going on. context_switch() is scheduling idle
> > vcpu, and calls continue_idle_domain() to
> > reset_stack_and_jump(idle_loop). well, reset_stack_and_jump() is
> > setting rsp to guest_cpu_user_regs(), and interrupt is coming right
> > at that instant. so:
> > 
> > diff = (char *)guest_cpu_user_regs() - (char *)(r)  is 0
> > 
> > and as a result, guest_mode(regs) == true.
> 
> Well, I don't see how this scenario works. If rsp==g_c_u_r() at the
> instant the interrupt comes in, then the stack frame for the
> interrupt will be *above* g_c_u_r(). Thus 'diff' in guest_mode() will
> evaluate non-zero and positive, and regs->{rip,rsp} should point at
> hypervisor code/stack.
> 
> Also: in your original email you said regs.rsp pointed at dom0 stack.
> That doesn't tally with you saying that rsp==g_c_u_r() (an address in
> hypervisor space) immediately before the interrupt, in this email.
> Regs->rsp in the scenario you describe here should be exactly equal
> to g_c_u_r().
> 
>  -- Keir
> 

Ok, I think I found it.  Initially, my printk in serial_rx() showed 
  regs == ffff82c48030ff28 == guest_cpu_user_regs

This led me down to reset_stack_and_jump where sp is set to g_c_u_r.

Anyways, on this big box, I'm using virtual serial via the service 
processor. So, it looks like serial interrupts are not going thru
do_IRQ(), but ns16550_poll(). 

__do_softirq -> execute_timer -> ns16550_poll -> serial_rx_interrupt.

However, in ns16550_poll():

struct cpu_user_regs *regs = guest_cpu_user_regs(); <------

The cpu is clearly running idle_vcpu, so current is correctly pointing
to idle vcpu. But guest_mode() is showing guest mode incorrectly.

I'm not much familiar with ns16550 stuff, so cant' think of a fix other
than just setting regs to current stack pointer in ns16550_poll().

__asm__ ( "movq %%rsp,%0" : "=r" (val));
struct cpu_user_regs *regs = val;


Let me know if you like the fix and I'll submit a patch.

thanks,
Mukesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-23 19:46           ` Mukesh Rathor
@ 2010-02-23 21:03             ` Keir Fraser
  2010-02-24  3:55               ` Mukesh Rathor
  0 siblings, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2010-02-23 21:03 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel@lists.xensource.com

On 23/02/2010 19:46, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> struct cpu_user_regs *regs = guest_cpu_user_regs(); <------
> 
> The cpu is clearly running idle_vcpu, so current is correctly pointing
> to idle vcpu. But guest_mode() is showing guest mode incorrectly.
> 
> I'm not much familiar with ns16550 stuff, so cant' think of a fix other
> than just setting regs to current stack pointer in ns16550_poll().
> 
> __asm__ ( "movq %%rsp,%0" : "=r" (val));
> struct cpu_user_regs *regs = val;
> 
> Let me know if you like the fix and I'll submit a patch.

Given the only thing this apparently affected was some of your own ad-hoc
debug code, do we really care about this at all? We can probably happily
just leave it.

 -- Keir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-23 21:03             ` Keir Fraser
@ 2010-02-24  3:55               ` Mukesh Rathor
  2010-02-24 10:45                 ` Keir Fraser
  0 siblings, 1 reply; 13+ messages in thread
From: Mukesh Rathor @ 2010-02-24  3:55 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel@lists.xensource.com

On Tue, 23 Feb 2010 21:03:04 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 23/02/2010 19:46, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> 
> > struct cpu_user_regs *regs = guest_cpu_user_regs(); <------
> > 
> > The cpu is clearly running idle_vcpu, so current is correctly
> > pointing to idle vcpu. But guest_mode() is showing guest mode
> > incorrectly.
> > 
> > I'm not much familiar with ns16550 stuff, so cant' think of a fix
> > other than just setting regs to current stack pointer in
> > ns16550_poll().
> > 
> > __asm__ ( "movq %%rsp,%0" : "=r" (val));
> > struct cpu_user_regs *regs = val;
> > 
> > Let me know if you like the fix and I'll submit a patch.
> 
> Given the only thing this apparently affected was some of your own
> ad-hoc debug code, do we really care about this at all? We can
> probably happily just leave it.

Well, I'm afraid not. It breaks the debug code to debug the hang. More
importantly, it also breaks my debuggers, which some people from outside
oracle are also using. Most of our new high end servers are accessed
via virtual serial port, so if ns16550_poll() call is related to it,
then it'll only get worse. Moreover, anybody reading and copying that
code to do something similar will be misled.

I realize it's ugly, and the fix is more than just setting regs to 
current stack pointer. But I'm willing to do the work and come up
with something. Since, it's broken already, it'll be low risk to you.

I can come up with some sort of glue code that'll push 
eflgas/cs/eip/cpu_user_regs on the current stack and then set regs. Let
me know if you think of better idea, or think that is the best approach.

execute_timers -> ns16550_poll_glue -> ns16550_poll(data, regs)
   where ns16550_poll_glue in some file.S will just push context on
   stack.

thanks,
Mukesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-24  3:55               ` Mukesh Rathor
@ 2010-02-24 10:45                 ` Keir Fraser
  2010-02-25  1:06                   ` Mukesh Rathor
  0 siblings, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2010-02-24 10:45 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel@lists.xensource.com

On 24/02/2010 03:55, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

>> Given the only thing this apparently affected was some of your own
>> ad-hoc debug code, do we really care about this at all? We can
>> probably happily just leave it.
> 
> Well, I'm afraid not. It breaks the debug code to debug the hang. More
> importantly, it also breaks my debuggers, which some people from outside
> oracle are also using. Most of our new high end servers are accessed
> via virtual serial port, so if ns16550_poll() call is related to it,
> then it'll only get worse. Moreover, anybody reading and copying that
> code to do something similar will be misled.

Okay, see how xen-unstable:20969 works for you.

 -- Keir

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-24 10:45                 ` Keir Fraser
@ 2010-02-25  1:06                   ` Mukesh Rathor
  2010-02-25  8:07                     ` Keir Fraser
  0 siblings, 1 reply; 13+ messages in thread
From: Mukesh Rathor @ 2010-02-25  1:06 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel@lists.xensource.com

On Wed, 24 Feb 2010 10:45:36 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 24/02/2010 03:55, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
.... 
> > Well, I'm afraid not. It breaks the debug code to debug the hang.
> > More importantly, it also breaks my debuggers, which some people
> > from outside oracle are also using. Most of our new high end
> > servers are accessed via virtual serial port, so if ns16550_poll()
> > call is related to it, then it'll only get worse. Moreover, anybody
> > reading and copying that code to do something similar will be
> > misled.
> 
> Okay, see how xen-unstable:20969 works for you.


Yup, better. Thanks a lot.

BTW, since debuggers only care about BUG and ASSERT, perhaps 
DEBUGGER_trap_entry could be moved after BUGFRAME_warn, next time
you are in do_invalid_op().

Mukesh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: current not very current (vs curr_vcpu)
  2010-02-25  1:06                   ` Mukesh Rathor
@ 2010-02-25  8:07                     ` Keir Fraser
  0 siblings, 0 replies; 13+ messages in thread
From: Keir Fraser @ 2010-02-25  8:07 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: Xen-devel@lists.xensource.com

On 25/02/2010 01:06, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

>> Okay, see how xen-unstable:20969 works for you.
> 
> 
> Yup, better. Thanks a lot.
> 
> BTW, since debuggers only care about BUG and ASSERT, perhaps
> DEBUGGER_trap_entry could be moved after BUGFRAME_warn, next time
> you are in do_invalid_op().

DEBUGGER_trap_entry is done early in all trap handlers. There are
DEBUGGER_trap_fatal hooks in the ASSERT and BUG paths already which can be
hooked in preference. The place to express that policy is in the debugger.h
header or the debugger itself. And I think it already is done correctly.

 -- Keir

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-02-25  8:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-19  4:21 current not very current (vs curr_vcpu) Mukesh Rathor
2010-02-19  8:12 ` Keir Fraser
2010-02-19 19:23   ` Mukesh Rathor
2010-02-19 21:34     ` Keir Fraser
2010-02-20  3:50       ` Mukesh Rathor
2010-02-20  7:45         ` Keir Fraser
2010-02-22 18:59           ` Mukesh Rathor
2010-02-23 19:46           ` Mukesh Rathor
2010-02-23 21:03             ` Keir Fraser
2010-02-24  3:55               ` Mukesh Rathor
2010-02-24 10:45                 ` Keir Fraser
2010-02-25  1:06                   ` Mukesh Rathor
2010-02-25  8:07                     ` Keir Fraser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).