* [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
@ 2012-11-29 7:02 Kouya Shimura
2012-11-29 15:40 ` Tim Deegan
0 siblings, 1 reply; 5+ messages in thread
From: Kouya Shimura @ 2012-11-29 7:02 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 2534 bytes --]
There is a race condition between XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY
and HVMOP_track_dirty_vram hypercall.
HVMOP_track_dirty_vram is called many times by qemu-dm (which is
connected via VNC), whereas XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY is
called only once, by a migration process (e.g. xc_save, libxl-save-helper).
So the race seldom happens, but the following interleavings are possible.
========================================================================
[case-1]
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY hypercall
-> paging_enable_logdirty()
-> hap_logdirty_init()
-> paging_log_dirty_disable()
dirty_vram = NULL
-> paging_log_dirty_init(d, hap_enable_log_dirty, ...) ---> (A)
-> paging_log_dirty_enable()
**************************************************************************
/* <--- (B) */
-> hap_enable_vram_tracking() // should be hap_enable_log_dirty() !!!
return -EINVAL
<- return -EINVAL // live-migration failure!!!
**************************************************************************
HVMOP_track_dirty_vram
-> hap_track_dirty_vram()
/* (!paging_mode_log_dirty(d) && !dirty_vram) */
-> hap_vram_tracking_init()
/* <--- (A) */
-> paging_log_dirty_init(d, hap_enable_vram_tracking, ...) ---> (B)
-> paging_log_dirty_enable()
========================================================================
[case-2]
XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY hypercall
-> paging_enable_logdirty()
-> hap_logdirty_init()
-> paging_log_dirty_disable()
/* d->arch.hvm_domain.dirty_vram != NULL */
d->arch.paging.mode &= ~PG_log_dirty; ---> (C)
/* d->arch.hvm_domain.dirty_vram is freed */
dereference dirty_vram // access to freed memory !!! <--- (D)
HVMOP_track_dirty_vram
-> hap_track_dirty_vram()
/* (!paging_mode_log_dirty(d) && dirty_vram) */ <---(C)
rc = -EINVAL
xfree(dirty_vram); ---> (D)
dirty_vram = d->arch.hvm_domain.dirty_vram = NULL;
return rc;
========================================================================
Actually, I encountered a Xen crash caused by a NULL-pointer dereference in xen-3.4.
FYI, in xen-4.x, c/s 19738:8dd5c3cae086 mitigated it.
I'm not sure why paging_lock() is used partially in hap_XXX_vram_tracking
functions. Thus, this patch introduces a new lock.
It would be better to use paging_lock() instead of the new lock
since shadow paging mode (not HAP mode) uses paging_lock to avoid
this race condition.
Thanks,
Kouya
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
[-- Attachment #2: hap_dirty_vram.patch --]
[-- Type: text/x-patch, Size: 3076 bytes --]
diff -r d1d05cb59a76 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c Wed Nov 14 16:27:58 2012 +0000
+++ b/xen/arch/x86/hvm/hvm.c Wed Nov 28 16:04:19 2012 +0900
@@ -513,6 +513,7 @@ int hvm_domain_initialise(struct domain
spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
spin_lock_init(&d->arch.hvm_domain.irq_lock);
spin_lock_init(&d->arch.hvm_domain.uc_lock);
+ spin_lock_init(&d->arch.hvm_domain.dirty_vram_lock);
INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
diff -r d1d05cb59a76 xen/arch/x86/mm/hap/hap.c
--- a/xen/arch/x86/mm/hap/hap.c Wed Nov 14 16:27:58 2012 +0000
+++ b/xen/arch/x86/mm/hap/hap.c Wed Nov 28 16:04:19 2012 +0900
@@ -122,7 +122,12 @@ int hap_track_dirty_vram(struct domain *
XEN_GUEST_HANDLE_64(uint8) dirty_bitmap)
{
long rc = 0;
- struct sh_dirty_vram *dirty_vram = d->arch.hvm_domain.dirty_vram;
+ struct sh_dirty_vram *dirty_vram;
+
+ if ( !spin_trylock(&d->arch.hvm_domain.dirty_vram_lock) )
+ return -ENODATA;
+
+ dirty_vram = d->arch.hvm_domain.dirty_vram;
if ( nr )
{
@@ -174,6 +179,7 @@ int hap_track_dirty_vram(struct domain *
rc = 0;
}
+ spin_unlock(&d->arch.hvm_domain.dirty_vram_lock);
return rc;
param_fail:
@@ -182,6 +188,8 @@ param_fail:
xfree(dirty_vram);
dirty_vram = d->arch.hvm_domain.dirty_vram = NULL;
}
+
+ spin_unlock(&d->arch.hvm_domain.dirty_vram_lock);
return rc;
}
diff -r d1d05cb59a76 xen/arch/x86/mm/paging.c
--- a/xen/arch/x86/mm/paging.c Wed Nov 14 16:27:58 2012 +0000
+++ b/xen/arch/x86/mm/paging.c Wed Nov 28 16:04:19 2012 +0900
@@ -697,14 +697,21 @@ int paging_domctl(struct domain *d, xen_
break;
/* Else fall through... */
case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
+ spin_lock(&d->arch.hvm_domain.dirty_vram_lock);
if ( hap_enabled(d) )
hap_logdirty_init(d);
- return paging_log_dirty_enable(d);
+ rc = paging_log_dirty_enable(d);
+ spin_unlock(&d->arch.hvm_domain.dirty_vram_lock);
+ return rc;
case XEN_DOMCTL_SHADOW_OP_OFF:
+ spin_lock(&d->arch.hvm_domain.dirty_vram_lock);
+ rc = 0;
if ( paging_mode_log_dirty(d) )
- if ( (rc = paging_log_dirty_disable(d)) != 0 )
- return rc;
+ rc = paging_log_dirty_disable(d);
+ spin_unlock(&d->arch.hvm_domain.dirty_vram_lock);
+ if ( rc != 0 )
+ return rc;
break;
case XEN_DOMCTL_SHADOW_OP_CLEAN:
diff -r d1d05cb59a76 xen/include/asm-x86/hvm/domain.h
--- a/xen/include/asm-x86/hvm/domain.h Wed Nov 14 16:27:58 2012 +0000
+++ b/xen/include/asm-x86/hvm/domain.h Wed Nov 28 16:04:19 2012 +0900
@@ -75,6 +75,7 @@ struct hvm_domain {
/* VRAM dirty support. */
struct sh_dirty_vram *dirty_vram;
+ spinlock_t dirty_vram_lock;
/* If one of vcpus of this domain is in no_fill_mode or
* mtrr/pat between vcpus is not the same, set is_in_uc_mode
* Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
2012-11-29 7:02 [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall Kouya Shimura
@ 2012-11-29 15:40 ` Tim Deegan
2012-12-03 17:59 ` Robert Phillips
0 siblings, 1 reply; 5+ messages in thread
From: Tim Deegan @ 2012-11-29 15:40 UTC (permalink / raw)
To: Kouya Shimura; +Cc: Robert Phillips, xen-devel
At 16:02 +0900 on 29 Nov (1354204941), Kouya Shimura wrote:
> I'm not sure why paging_lock() is used partially in hap_XXX_vram_tracking
> functions. Thus, this patch introduces a new lock.
> It would be better to use paging_lock() instead of the new lock
> since shadow paging mode (not HAP mode) uses paging_lock to avoid
> this race condition.
I think you're right - it would be better to use the paging_lock.
Cc'ing Robert Phillips, who's got a big patch outstanding that touches
the locking in this code. I think the right thing to do is make sure
his patch fixes the issue, and then backport just the locking parts of it
to older trees.
Robert, in your patch you do wrap this all in the paging_lock, but then
unlock to call various enable and disable routines. Is there a version
of this race condition there, where some other CPU might call
LOG_DIRTY_ENABLE while you've temporarily dropped the lock?
Cheers,
Tim.
* Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
2012-11-29 15:40 ` Tim Deegan
@ 2012-12-03 17:59 ` Robert Phillips
2012-12-06 9:32 ` Tim Deegan
0 siblings, 1 reply; 5+ messages in thread
From: Robert Phillips @ 2012-12-03 17:59 UTC (permalink / raw)
To: Tim (Xen.org), Kouya Shimura; +Cc: xen-devel@lists.xen.org
Tim,
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, November 29, 2012 10:41 AM
> To: Kouya Shimura
> Cc: Robert Phillips; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] x86/hap: fix race condition between
> ENABLE_LOGDIRTY and track_dirty_vram hypercall
>
> At 16:02 +0900 on 29 Nov (1354204941), Kouya Shimura wrote:
> > I'm not sure why paging_lock() is used partially in hap_XXX_vram_tracking
> > functions. Thus, this patch introduces a new lock.
> > It would be better to use paging_lock() instead of the new lock
> > since shadow paging mode (not HAP mode) uses paging_lock to avoid
> > this race condition.
>
> I think you're right - it would be better to use the paging_lock.
>
> Cc'ing Robert Phillips, who's got a big patch outstanding that touches
> the locking in this code. I think the right thing to do is make sure
> his patch fixes the issue, and then backport just the locking parts of it
> to older trees.
>
> Robert, in your patch you do wrap this all in the paging_lock, but then
> unlock to call various enable and disable routines. Is there a version
> of this race condition there, where some other CPU might call
> LOG_DIRTY_ENABLE while you've temporarily dropped the lock?
My proposed patch does not modify the problematic locking code so,
unfortunately, it preserves the race condition that Kouya Shimura
has discovered.
I question whether his proposed patch would be suitable for the
multiple frame buffer situation that my proposed patch addresses.
It is possible that a guest might be updating its frame buffers when
live migration starts, and the same race would result.
I think the domain.arch.paging.log_dirty function pointers are problematic.
They are modified and executed without benefit of locking.
I am uncomfortable with adding another lock.
I will look at updating my patch to avoid the race and will (hopefully)
avoid adding another lock.
-- rsp
>
> Cheers,
>
> Tim.
* Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
2012-12-03 17:59 ` Robert Phillips
@ 2012-12-06 9:32 ` Tim Deegan
2012-12-06 10:36 ` Robert Phillips
0 siblings, 1 reply; 5+ messages in thread
From: Tim Deegan @ 2012-12-06 9:32 UTC (permalink / raw)
To: Robert Phillips; +Cc: Kouya Shimura, xen-devel@lists.xen.org
Hi,
At 12:59 -0500 on 03 Dec (1354539567), Robert Phillips wrote:
> > Robert, in your patch you do wrap this all in the paging_lock, but then
> > unlock to call various enable and disable routines. Is there a version
> > of this race condition there, where some other CPU might call
> > LOG_DIRTY_ENABLE while you've temporarily dropped the lock?
>
> My proposed patch does not modify the problematic locking code so,
> unfortunately, it preserves the race condition that Kouya Shimura
> has discovered.
>
> I question whether his proposed patch would be suitable for the
> multiple frame buffer situation that my proposed patch addresses.
> It is possible that a guest might be updating its frame buffers when
> live migration starts, and the same race would result.
>
> I think the domain.arch.paging.log_dirty function pointers are problematic.
> They are modified and executed without benefit of locking.
>
> I am uncomfortable with adding another lock.
>
> I will look at updating my patch to avoid the race and will (hopefully)
> avoid adding another lock.
Thanks. I think the paging_lock can probably cover everything we need
here. These are toolstack operations and should be fairly rare, and HAP
can do most of its work without the paging_lock.
Also, in the next version can you please update this section:
+int hap_track_dirty_vram(struct domain *d,
+ unsigned long begin_pfn,
+ unsigned long nr,
+ XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
+{
+ long rc = 0;
+ dv_dirty_vram_t *dirty_vram;
+
+ paging_lock(d);
+ dirty_vram = d->arch.hvm_domain.dirty_vram;
+ if ( nr )
+ {
+ dv_range_t *range = NULL;
+ int size = ( nr + BITS_PER_LONG - 1 ) & ~( BITS_PER_LONG - 1 );
+ uint8_t dirty_bitmap[size];
not to allocate a guest-specified amount of stack memory. This is one
of the things recently found and fixed in the existing code as XSA-27.
http://xenbits.xen.org/hg/staging/xen-4.1-testing.hg/rev/53ef1f35a0f8
Cheers,
Tim.
* Re: [PATCH] x86/hap: fix race condition between ENABLE_LOGDIRTY and track_dirty_vram hypercall
2012-12-06 9:32 ` Tim Deegan
@ 2012-12-06 10:36 ` Robert Phillips
0 siblings, 0 replies; 5+ messages in thread
From: Robert Phillips @ 2012-12-06 10:36 UTC (permalink / raw)
To: Tim (Xen.org); +Cc: Kouya Shimura, xen-devel@lists.xen.org
Hi Tim,
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, December 06, 2012 4:32 AM
> To: Robert Phillips
> Cc: Kouya Shimura; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH] x86/hap: fix race condition between
> ENABLE_LOGDIRTY and track_dirty_vram hypercall
>
> Hi,
>
> At 12:59 -0500 on 03 Dec (1354539567), Robert Phillips wrote:
> > > Robert, in your patch you do wrap this all in the paging_lock, but
> > > then unlock to call various enable and disable routines. Is there a
> > > version of this race condition there, where some other CPU might
> > > call LOG_DIRTY_ENABLE while you've temporarily dropped the lock?
> >
> > My proposed patch does not modify the problematic locking code so,
> > unfortunately, it preserves the race condition that Kouya Shimura has
> > discovered.
> >
> > I question whether his proposed patch would be suitable for the
> > multiple frame buffer situation that my proposed patch addresses.
> > It is possible that a guest might be updating its frame buffers when
> > live migration starts, and the same race would result.
> >
> > I think the domain.arch.paging.log_dirty function pointers are problematic.
> > They are modified and executed without benefit of locking.
> >
> > I am uncomfortable with adding another lock.
> >
> > I will look at updating my patch to avoid the race and will
> > (hopefully) avoid adding another lock.
>
> Thanks. I think the paging_lock can probably cover everything we need here.
> These are toolstack operations and should be fairly rare, and HAP can do
> most of its work without the paging_lock.
I agree. I am currently working on a fix; it should be ready shortly.
>
> Also, in the next version can you please update this section:
>
> +int hap_track_dirty_vram(struct domain *d,
> + unsigned long begin_pfn,
> + unsigned long nr,
> + XEN_GUEST_HANDLE_64(uint8) guest_dirty_bitmap)
> +{
> + long rc = 0;
> + dv_dirty_vram_t *dirty_vram;
> +
> + paging_lock(d);
> + dirty_vram = d->arch.hvm_domain.dirty_vram;
> + if ( nr )
> + {
> + dv_range_t *range = NULL;
> + int size = ( nr + BITS_PER_LONG - 1 ) & ~( BITS_PER_LONG - 1 );
> + uint8_t dirty_bitmap[size];
>
> not to allocate a guest-specified amount of stack memory. This is one of the
> things recently found and fixed in the existing code as XSA-27.
> http://xenbits.xen.org/hg/staging/xen-4.1-testing.hg/rev/53ef1f35a0f8
I'll include this improvement too.
-- rsp
>
> Cheers,
>
> Tim.