All of lore.kernel.org
 help / color / mirror / Atom feed
* shadow2 corrupting PV guest state
@ 2006-10-13 23:27 Jeremy Fitzhardinge
  2006-10-14  7:05 ` Keir Fraser
  2006-10-20 13:42 ` Doi.Tsunehisa
  0 siblings, 2 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2006-10-13 23:27 UTC (permalink / raw)
  To: Michael A Fetterman, Tim Deegan, Keir Fraser; +Cc: Chris Wright, Xen-devel

I've been fighting random crashes in the paravirt tree for a while.  
After a fair amount of head-banging, it  looks to me like the shadow2 
code is trashing the guest stack (and maybe register state) at random 
points.

If I boot a kernel with CONFIG_DEBUG_PAGEALLOC enabled (which 
dramatically increases the rate of pagetable modifications), it rarely 
makes it through early boot without some random crash.  The crashes are 
often at the same place, but they move around; however they tend to be 
near places where the pagetable is touched.  It may also interact with 
timer events; certainly masking events seems to help a bit.

I tend to see this a lot more when running under qemu, but I've also 
seen strange things happen on real hardware. 

If I roll Xen back to pre-shadow2 (change fda70200da01), all these 
mysterious crashes disappear. 

Looking into it a bit more deeply, the kind of crash I'm seeing are 
along the lines of:

    mov (%ebx), %eax       # works; %ebx is a valid pointer
    call xen_enable_irq
    mov %eax, (%ebx)   # crashes; %ebx will equal 0, 1, or something bad


where xen_enable_irq will have pushed %ebx, set the flag state, polled 
for pending events and popped %ebx.  My suspicion is that something 
about re-enabling interrupts is causing the on-stack version of %ebx to 
get trashed, rather than the actual %ebx register state (in general the 
corrupted register is the one near or at the top of the stack).  
Sometimes the corruption shows up as %eip off in the weeds (either at 
NULL-ish addresses, or executing the stack).

I'm speculating that the sequence is:

   1. change pagetable; this creates a deferred pagetable update
   2. enable events
   3. handle pending timer interrupt, which also does a deferred
      pagetable update
   4. resume running with corrupted stack

But I don't really know enough about how shadow2 works to know if that's 
really plausible.  Maybe a vcpu/guest context switch is a part of the 
sequence.  

I wonder if the stack corruption is caused by a mismatch of exception 
frame formats between exception->iret?

All a bit handwavy, but I haven't really managed to make much headway.  
I spent some time assuming it was a bug on my side, but the fact that 
all these symptoms go away with pre-shadow2 Xen makes me point the 
finger over the wall.

Or perhaps it really is just a qemu bug, but I can't imagine that 
shadow2 exercises qemu's emulated CPU exception stuff in a way which 
normal Xen doesn't...  I think its more likely that there's some race 
which is much more easily triggered by qemu's slow speed.

Any thoughts or ideas?

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-13 23:27 shadow2 corrupting PV guest state Jeremy Fitzhardinge
@ 2006-10-14  7:05 ` Keir Fraser
  2006-10-16 22:09   ` Jeremy Fitzhardinge
  2006-10-20 13:42 ` Doi.Tsunehisa
  1 sibling, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2006-10-14  7:05 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Michael A Fetterman, Tim Deegan
  Cc: Chris Wright, Xen-devel

On 14/10/06 12:27 am, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> All a bit handwavy, but I haven't really managed to make much headway.
> I spent some time assuming it was a bug on my side, but the fact that
> all these symptoms go away with pre-shadow2 Xen makes me point the
> finger over the wall.
> 
> Or perhaps it really is just a qemu bug, but I can't imagine that
> shadow2 exercises qemu's emulated CPU exception stuff in a way which
> normal Xen doesn't...  I think its more likely that there's some race
> which is much more easily triggered by qemu's slow speed.
> 
> Any thoughts or ideas?

The shadow-refcount code in shadow2 is very new. It's likely to have bugs
that need shaking out. It won't be a deferred pagetable update because they
don't exist in shadow2 (updates are always synchronously emulated).

 -- Keir

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-14  7:05 ` Keir Fraser
@ 2006-10-16 22:09   ` Jeremy Fitzhardinge
  2006-10-17  6:51     ` Keir Fraser
  0 siblings, 1 reply; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2006-10-16 22:09 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Chris Wright, Michael A Fetterman, Tim Deegan, Xen-devel

Keir Fraser wrote:
> The shadow-refcount code in shadow2 is very new. It's likely to have bugs
> that need shaking out.

Yep, that's pretty much what I'd expect.

>  It won't be a deferred pagetable update because they
> don't exist in shadow2 (updates are always synchronously emulated).
>   

OK, so every write to a pagetable page will take the fault&emulate 
path?  What's your confidence level in the emulator?

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-16 22:09   ` Jeremy Fitzhardinge
@ 2006-10-17  6:51     ` Keir Fraser
  2006-10-17 18:54       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2006-10-17  6:51 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Chris Wright, Michael A Fetterman, Tim Deegan, Xen-devel

On 16/10/06 11:09 pm, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

>>  It won't be a deferred pagetable update because they
>> don't exist in shadow2 (updates are always synchronously emulated).
>>   
> 
> OK, so every write to a pagetable page will take the fault&emulate
> path?  What's your confidence level in the emulator?

Pretty high. It's used *a lot*. I'm >95% confident the bug is elsewhere.

 -- Keir

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-17  6:51     ` Keir Fraser
@ 2006-10-17 18:54       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 18+ messages in thread
From: Jeremy Fitzhardinge @ 2006-10-17 18:54 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Chris Wright, Michael A Fetterman, Tim Deegan, Xen-devel

Keir Fraser wrote:
> Pretty high. It's used *a lot*. I'm >95% confident the bug is elsewhere.
>   

I guess all that code is also used in the hvm case.  What paths are 
unique to non-hvm shadow users?

    J

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-13 23:27 shadow2 corrupting PV guest state Jeremy Fitzhardinge
  2006-10-14  7:05 ` Keir Fraser
@ 2006-10-20 13:42 ` Doi.Tsunehisa
  2006-10-20 13:57   ` Tim Deegan
  1 sibling, 1 reply; 18+ messages in thread
From: Doi.Tsunehisa @ 2006-10-20 13:42 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Chris Wright, Michael A Fetterman, Tim Deegan, Xen-devel

Hi,

You (jeremy) said:
> I've been fighting random crashes in the paravirt tree for a while.  
> After a fair amount of head-banging, it  looks to me like the shadow2 
> code is trashing the guest stack (and maybe register state) at random 
> points.

  I have a question about shadow2 in another point of view.

  I've been porting PV-on-HVM driver for ia64 platform. In my jobs,
I had a doubt that shadow2 might occur a problem of memory corruption.

  At first, I had found the problem as a hypervisor crash during
destruction of HVM domain with active VNIF on ia64 platform. The
reason of crash was that hypervisor detected P2M table used by 
gnttab_copy in the HVM domain destruction. Thus I looked for a way
to avoid hypervisor crash in x86 code.

  So, I found that:

  * Before shadow2 age, x86 and ia64 use same logic for domain
    destruction.
    - at first, release gnttab references
    - destruct page table for VCPU
    - destruct P2M table for domain
    - relinquish memory for domain

  * After shadow2 age, x86 introduces delayed P2M table destruction.
    - release gnttab references
    - destruct page table for VCPU
    - relinquish memory for domain
    - destruct P2M table for domain in domain_destroy()
    *** I don't have confidence in my investigation. 
    *** Am I right ?

  I try to show the code that...

[common/domain.c]
   203  void domain_kill(struct domain *d)
   204  {
   205      domain_pause(d);
   206
   207      if ( test_and_set_bit(_DOMF_dying, &d->domain_flags) )
   208          return;
   209
   210      gnttab_release_mappings(d);
   211      domain_relinquish_resources(d);
   212      put_domain(d);
   213
   214      send_guest_global_virq(dom0, VIRQ_DOM_EXC);
   215  }

[arch/x86/domain.c]
   930  void domain_relinquish_resources(struct domain *d)
   931  {
   932      struct vcpu *v;
   933      unsigned long pfn;
       ....
   937      /* Drop the in-use references to page-table bases. */
   938      for_each_vcpu ( d, v )
       ....
   979      /* Relinquish every page of memory. */
   980      relinquish_memory(d, &d->xenpage_list);
   981      relinquish_memory(d, &d->page_list);
       ....

  This is the code for domain_kill phase. I think that hypervisor
relinquishes memory for domain in this code.

  In the other hand...

[common/domain.c]
   322  /* Release resources belonging to task @p. */
   323  void domain_destroy(struct domain *d)
   324  {
   325      struct domain **pd;
   326      atomic_t      old, new;
       ....
   354      arch_domain_destroy(d);
   355
   356      free_domain(d);
   357
   358      send_guest_global_virq(dom0, VIRQ_DOM_EXC);
   359  }

[arch/x86/domain.c]
   237  void arch_domain_destroy(struct domain *d)
   238  {
   239      shadow_final_teardown(d);
      ....

[arch/x86/mm/shadow/common.c]
  2580  void shadow_final_teardown(struct domain *d)
  2581  /* Called by arch_domain_destroy(), when it's safe to pull down the p2m
map. */
  2582  {
      ....
  2597      /* It is now safe to pull down the p2m map. */
  2598      if ( d->arch.shadow.p2m_pages != 0 )
  2599          shadow_p2m_teardown(d);

  In this code, P2M table are released.

  If my speculation is correct, shadow2 may occur a problem of memory
corruption.

  What do you think about this point ?

Thanks,
- Tsunehisa Doi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-20 13:42 ` Doi.Tsunehisa
@ 2006-10-20 13:57   ` Tim Deegan
  2006-10-23  5:45     ` Doi.Tsunehisa
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2006-10-20 13:57 UTC (permalink / raw)
  To: Doi.Tsunehisa
  Cc: Jeremy Fitzhardinge, Xen-devel, Tim Deegan, Chris Wright,
	Michael A Fetterman

Hi,

At 22:42 +0900 on 20 Oct (1161384159), Doi.Tsunehisa@jp.fujitsu.com wrote:
>   So, I found that:
> 
>   * Before shadow2 age, x86 and ia64 use same logic for domain
>     destruction.
>     - at first, release gnttab references
>     - destruct page table for VCPU
>     - destruct P2M table for domain
>     - relinquish memory for domain
> 
>   * After shadow2 age, x86 introduces delayed P2M table destruction.
>     - release gnttab references
>     - destruct page table for VCPU
>     - relinquish memory for domain
>     - destruct P2M table for domain in domain_destroy()
>     *** I don't have confidence in my investigation. 
>     *** Am I right ?

Yep.  The P2M table can't be destroyed in domain_relinquish_resources,
as it is needed when pulling down grant references, and foreign domains
may have outstanding grant references to the dying domain's memory even
after domain_relinquish_resources.

>   If my speculation is correct, shadow2 may occur a problem of memory
> corruption.

I don't follow quite why this would lead to memory corruption.  Can you
explain?

Tim.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-20 13:57   ` Tim Deegan
@ 2006-10-23  5:45     ` Doi.Tsunehisa
  2006-10-23 10:26       ` Tim Deegan
  0 siblings, 1 reply; 18+ messages in thread
From: Doi.Tsunehisa @ 2006-10-23  5:45 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Jeremy Fitzhardinge, Xen-devel, Chris Wright, Michael A Fetterman,
	Doi.Tsunehisa

Hi,

You (Tim.Deegan) said:
>>   * Before shadow2 age, x86 and ia64 use same logic for domain
>>     destruction.
>>     - at first, release gnttab references
>>     - destruct page table for VCPU
>>     - destruct P2M table for domain
>>     - relinquish memory for domain
>> 
>>   * After shadow2 age, x86 introduces delayed P2M table destruction.
>>     - release gnttab references
>>     - destruct page table for VCPU
>>     - relinquish memory for domain
>>     - destruct P2M table for domain in domain_destroy()
>>     *** I don't have confidence in my investigation. 
>>     *** Am I right ?
> 
> Yep.  The P2M table can't be destroyed in domain_relinquish_resources,
> as it is needed when pulling down grant references, and foreign domains
> may have outstanding grant references to the dying domain's memory even
> after domain_relinquish_resources.

  Thanks.

 I've supposed that the introduce of P2M table delayed destruction was
for gnttab_copy feature indeed. It seems that other grant table
references are released with gnttab_release_mapping() in domain_kill().

>>   If my speculation is correct, shadow2 may occur a problem of memory
>> corruption.
> 
> I don't follow quite why this would lead to memory corruption.  Can you
> explain?

  I'll try to explain it.

  Basically, the referencee should not be released during to exist the
referencer, I think.

  In domain_kill phase, domain_relinquish_resource releases a memory
of destroying domain. So, the memory may use other domain. But, P2M
table of the domain exists, then the memory might be corrupted by
gnttab_copy.

  In __gnttab_copy code, it will avoid to corrupt a memory that was
used in destroying domain with __acquire_grant_for_copy and get_page.
But, I think that it has atomicity issue of owner.

  In my opinion, P2M table should be destroyed before releasing memory.
So, if gnttab_copy will use it, it shall be failed as invalid access,
because the gnttab_copy doesn't have to succeed during the domain
destruction.

  What do you think about ?

Thanks,
- Tsunehisa Doi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-23  5:45     ` Doi.Tsunehisa
@ 2006-10-23 10:26       ` Tim Deegan
  2006-10-23 11:21         ` Doi.Tsunehisa
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2006-10-23 10:26 UTC (permalink / raw)
  To: Doi.Tsunehisa
  Cc: Chris Wright, Michael A Fetterman, Jeremy Fitzhardinge, Xen-devel

At 14:45 +0900 on 23 Oct (1161614732), Doi.Tsunehisa@jp.fujitsu.com wrote:
>   Basically, the referencee should not be released during to exist the
> referencer, I think.
> 
>   In domain_kill phase, domain_relinquish_resource releases a memory
> of destroying domain. So, the memory may use other domain. But, P2M
> table of the domain exists, then the memory might be corrupted by
> gnttab_copy.
> 
>   In __gnttab_copy code, it will avoid to corrupt a memory that was
> used in destroying domain with __acquire_grant_for_copy and get_page.
> But, I think that it has atomicity issue of owner.

Are you worried about a race where the foreign domain is destroyed and
another domain created, with the same struct domain pointer, and which
owns the same frame, between the __acquire_grant_for_copy() and the
get_page()?

Earlier in __gnttab_copy, we call find_domain_by_id() on the foreign
domain, which calls get_domain(), so we're safe from that.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-23 10:26       ` Tim Deegan
@ 2006-10-23 11:21         ` Doi.Tsunehisa
  2006-10-23 12:42           ` Tim Deegan
  0 siblings, 1 reply; 18+ messages in thread
From: Doi.Tsunehisa @ 2006-10-23 11:21 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Jeremy Fitzhardinge, Xen-devel, Chris Wright, Michael A Fetterman,
	Doi.Tsunehisa

  Hi,

You (Tim.Deegan) said:
>>   Basically, the referencee should not be released during to exist the
>> referencer, I think.
>> 
>>   In domain_kill phase, domain_relinquish_resource releases a memory
>> of destroying domain. So, the memory may use other domain. But, P2M
>> table of the domain exists, then the memory might be corrupted by
>> gnttab_copy.
>> 
>>   In __gnttab_copy code, it will avoid to corrupt a memory that was
>> used in destroying domain with __acquire_grant_for_copy and get_page.
>> But, I think that it has atomicity issue of owner.
> 
> Are you worried about a race where the foreign domain is destroyed and
> another domain created, with the same struct domain pointer, and which
> owns the same frame, between the __acquire_grant_for_copy() and the
> get_page()?

  No, I'm worried that two domains use with same page frame.

  The released pages can be used by new domain, but old domain sturct
exists between domain_kill and domain_destroy.

> Earlier in __gnttab_copy, we call find_domain_by_id() on the foreign
> domain, which calls get_domain(), so we're safe from that.

  I suppose that find_domain_by_id doesn't ensure not to be used by
both domains.

Thanks,
- Tsunehisa Doi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-23 11:21         ` Doi.Tsunehisa
@ 2006-10-23 12:42           ` Tim Deegan
  2006-10-24  7:18             ` Doi.Tsunehisa
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2006-10-23 12:42 UTC (permalink / raw)
  To: Doi.Tsunehisa; +Cc: Xen-devel

At 20:21 +0900 on 23 Oct (1161634872), Doi.Tsunehisa@jp.fujitsu.com wrote:
> > Are you worried about a race where the foreign domain is destroyed and
> > another domain created, with the same struct domain pointer, and which
> > owns the same frame, between the __acquire_grant_for_copy() and the
> > get_page()?
> 
>   No, I'm worried that two domains use with same page frame.
> 
>   The released pages can be used by new domain, but old domain sturct
> exists between domain_kill and domain_destroy.

If the released frames are used by a new domain, get_page() will fail:
the old domain still exists (we have a reference to it), so the new
owner's domain pointer must be different from the one we pass to
get_page.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-23 12:42           ` Tim Deegan
@ 2006-10-24  7:18             ` Doi.Tsunehisa
  2006-10-24  9:09               ` Tim Deegan
  0 siblings, 1 reply; 18+ messages in thread
From: Doi.Tsunehisa @ 2006-10-24  7:18 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Xen-devel, Doi.Tsunehisa

Hi Tim,

You (Tim.Deegan) said:
>>> Are you worried about a race where the foreign domain is destroyed and
>>> another domain created, with the same struct domain pointer, and which
>>> owns the same frame, between the __acquire_grant_for_copy() and the
>>> get_page()?
>> 
>>   No, I'm worried that two domains use with same page frame.
>> 
>>   The released pages can be used by new domain, but old domain sturct
>> exists between domain_kill and domain_destroy.
> 
> If the released frames are used by a new domain, get_page() will fail:
> the old domain still exists (we have a reference to it), so the new
> owner's domain pointer must be different from the one we pass to
> get_page.

  In my investigation, get_page() assumes that the page frame is in
use. But, the page_info structure of released page frame should not
be treated as inuse. Thus the nd value might be invalid value in this
situation, I think.

[xen/include/asm-x86/mm.h]
static inline int get_page(struct page_info *page,
                           struct domain *domain)
{
    u32 x, nx, y = page->count_info;
    u32 d, nd = page->u.inuse._domain;    /* <<=== THIS LINE */
    u32 _domain = pickle_domptr(domain);

    do {
        x  = y;
        nx = x + 1;
        d  = nd;
        if ( unlikely((x & PGC_count_mask) == 0) ||  /* Not allocated? */
             unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */
             unlikely(d != _domain) )                /* Wrong owner? */
        {
            if ( !_shadow_mode_refcounts(domain) )
                DPRINTK("Error pfn %lx: rd=%p, od=%p, caf=%08x, taf=%"
                        PRtype_info "\n",
                        page_to_mfn(page), domain, unpickle_domptr(d),
                        x, page->u.inuse.type_info);
            return 0;
        }
        __asm__ __volatile__(
            LOCK_PREFIX "cmpxchg8b %3"
            : "=d" (nd), "=a" (y), "=c" (d),
              "=m" (*(volatile u64 *)(&page->count_info))
            : "0" (d), "1" (x), "c" (d), "b" (nx) );    /* <<=== THIS LINE */
    }
    while ( unlikely(nd != d) || unlikely(y != x) );

    return 1;
}

Thanks,
- Tsunehisa Doi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-24  7:18             ` Doi.Tsunehisa
@ 2006-10-24  9:09               ` Tim Deegan
  2006-10-24  9:39                 ` Doi.Tsunehisa
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2006-10-24  9:09 UTC (permalink / raw)
  To: Doi.Tsunehisa; +Cc: Tim Deegan, Xen-devel

Hi,

At 16:18 +0900 on 24 Oct (1161706722), Doi.Tsunehisa@jp.fujitsu.com wrote:
>   In my investigation, get_page() assumes that the page frame is in
> use. But, the page_info structure of released page frame should not
> be treated as inuse. Thus the nd value might be invalid value in this
> situation, I think.

True, but we don't look at nd unless the page is allocated...

>         if ( unlikely((x & PGC_count_mask) == 0) ||  /* Not allocated? */
>              unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */
>              unlikely(d != _domain) )                /* Wrong owner? */

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-24  9:09               ` Tim Deegan
@ 2006-10-24  9:39                 ` Doi.Tsunehisa
  2006-10-24  9:44                   ` Keir Fraser
  0 siblings, 1 reply; 18+ messages in thread
From: Doi.Tsunehisa @ 2006-10-24  9:39 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Xen-devel, Doi.Tsunehisa

Hi,

You (Tim.Deegan) said:
>>   In my investigation, get_page() assumes that the page frame is in
>> use. But, the page_info structure of released page frame should not
>> be treated as inuse. Thus the nd value might be invalid value in this
>> situation, I think.
> 
> True, but we don't look at nd unless the page is allocated...

  Between domain_relinquish_resources() in domain_kill() and
shadow_final_teadown() in domain_destroy(), nd might be looked
with gnttab_copy(), I think.

 Is it true ?

Thanks,
- Tsunehisa Doi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-24  9:39                 ` Doi.Tsunehisa
@ 2006-10-24  9:44                   ` Keir Fraser
  2006-10-24 10:05                     ` Doi.Tsunehisa
  0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2006-10-24  9:44 UTC (permalink / raw)
  To: Doi.Tsunehisa, Tim Deegan; +Cc: Xen-devel




On 24/10/06 10:39, "Doi.Tsunehisa@jp.fujitsu.com"
<Doi.Tsunehisa@jp.fujitsu.com> wrote:

>> True, but we don't look at nd unless the page is allocated...
> 
>   Between domain_relinquish_resources() in domain_kill() and
> shadow_final_teadown() in domain_destroy(), nd might be looked
> with gnttab_copy(), I think.

Domain_destroy() is only called when the domain refcnt reaches zero. This
can only happen when all its page refcnts have reached zero. When a page's
refcnt reaches zero, get_page() no longer succeeds on it. So there is no
race between gnttab_copy() and domain_destroy().

 -- Keir

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-24  9:44                   ` Keir Fraser
@ 2006-10-24 10:05                     ` Doi.Tsunehisa
  2006-10-24 10:08                       ` Keir Fraser
  0 siblings, 1 reply; 18+ messages in thread
From: Doi.Tsunehisa @ 2006-10-24 10:05 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel, Tim Deegan, Doi.Tsunehisa

You (Keir.Fraser) said:
>>> True, but we don't look at nd unless the page is allocated...
>> 
>>   Between domain_relinquish_resources() in domain_kill() and
>> shadow_final_teadown() in domain_destroy(), nd might be looked
>> with gnttab_copy(), I think.
> 
> Domain_destroy() is only called when the domain refcnt reaches zero. This
> can only happen when all its page refcnts have reached zero. When a page's
> refcnt reaches zero, get_page() no longer succeeds on it. So there is no
> race between gnttab_copy() and domain_destroy().

  I see.

  I want to confirm that...

  In free_domheap_pages(), if the page counts of each section are zero,
then domain refcount is decreased. Finaly the domain refcount is zero,
and domain_destroy() is called.

  In the other hand, get_page checks page count like below...

    .....
        if ( unlikely((x & PGC_count_mask) == 0) ||  /* Not allocated? */
             unlikely((nx & PGC_count_mask) == 0) || /* Count overflow? */
             unlikely(d != _domain) )                /* Wrong owner? */
    .....

  Thus, get_page can't succeeds on it.

  Is my understaning is right ?

Thanks,
- Tsunehisa Doi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-24 10:05                     ` Doi.Tsunehisa
@ 2006-10-24 10:08                       ` Keir Fraser
  2006-10-24 10:16                         ` Doi.Tsunehisa
  0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2006-10-24 10:08 UTC (permalink / raw)
  To: Doi.Tsunehisa; +Cc: Tim Deegan, Xen-devel




On 24/10/06 11:05, "Doi.Tsunehisa@jp.fujitsu.com"
<Doi.Tsunehisa@jp.fujitsu.com> wrote:

>  Thus, get_page can't succeeds on it.
> 
>   Is my understaning is right ?

Indeed it is.

 -- Keir

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: shadow2 corrupting PV guest state
  2006-10-24 10:08                       ` Keir Fraser
@ 2006-10-24 10:16                         ` Doi.Tsunehisa
  0 siblings, 0 replies; 18+ messages in thread
From: Doi.Tsunehisa @ 2006-10-24 10:16 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Xen-devel, Tim Deegan, Doi.Tsunehisa

You (Keir.Fraser) said:
>>  Thus, get_page can't succeeds on it.
>> 
>>   Is my understaning is right ?
> 
> Indeed it is.

  Thank.

- Tsunehisa Doi

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2006-10-24 10:16 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-13 23:27 shadow2 corrupting PV guest state Jeremy Fitzhardinge
2006-10-14  7:05 ` Keir Fraser
2006-10-16 22:09   ` Jeremy Fitzhardinge
2006-10-17  6:51     ` Keir Fraser
2006-10-17 18:54       ` Jeremy Fitzhardinge
2006-10-20 13:42 ` Doi.Tsunehisa
2006-10-20 13:57   ` Tim Deegan
2006-10-23  5:45     ` Doi.Tsunehisa
2006-10-23 10:26       ` Tim Deegan
2006-10-23 11:21         ` Doi.Tsunehisa
2006-10-23 12:42           ` Tim Deegan
2006-10-24  7:18             ` Doi.Tsunehisa
2006-10-24  9:09               ` Tim Deegan
2006-10-24  9:39                 ` Doi.Tsunehisa
2006-10-24  9:44                   ` Keir Fraser
2006-10-24 10:05                     ` Doi.Tsunehisa
2006-10-24 10:08                       ` Keir Fraser
2006-10-24 10:16                         ` Doi.Tsunehisa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.