* [PATCH] KVM, XEN: Fix potential race in pvclock code
@ 2014-01-16 14:13 Julian Stecklina
2014-01-16 15:04 ` [Xen-devel] " Jan Beulich
2014-01-24 18:08 ` Konrad Rzeszutek Wilk
0 siblings, 2 replies; 11+ messages in thread
From: Julian Stecklina @ 2014-01-16 14:13 UTC (permalink / raw)
To: kvm; +Cc: Julian Stecklina, xen-devel
The paravirtualized clock used in KVM and Xen uses a version field to
allow the guest to see when the shared data structure is inconsistent.
The code reads the version field twice (before and after the data
structure is copied) and checks whether they haven't
changed and that the lowest bit is not set. As the second access is not
synchronized, the compiler could generate code that accesses version
more than two times and you end up with inconsistent data.
An example using pvclock_get_time_values:
host starts updating data, sets src->version to 1
guest reads src->version (1) and stores it into dst->version.
guest copies inconsistent data
guest reads src->version (1) and computes xor with dst->version.
host finishes updating data and sets src->version to 2
guest reads src->version (2) and checks whether lower bit is not set.
while loop exits with inconsistent data!
AFAICS the compiler is allowed to optimize the given code this way.
Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
---
arch/x86/kernel/pvclock.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..f62b41c 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
struct pvclock_vcpu_time_info *src)
{
+ u32 nversion;
+
do {
dst->version = src->version;
rmb(); /* fetch version before data */
@@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
dst->tsc_shift = src->tsc_shift;
dst->flags = src->flags;
rmb(); /* test version after fetching data */
- } while ((src->version & 1) || (dst->version != src->version));
+ nversion = ACCESS_ONCE(src->version);
+ } while ((nversion & 1) || (dst->version != nversion));
return dst->version;
}
@@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
struct pvclock_vcpu_time_info *vcpu_time,
struct timespec *ts)
{
- u32 version;
+ u32 version, nversion;
u64 delta;
struct timespec now;
@@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
now.tv_sec = wall_clock->sec;
now.tv_nsec = wall_clock->nsec;
rmb(); /* fetch time before checking version */
- } while ((wall_clock->version & 1) || (version != wall_clock->version));
+ nversion = ACCESS_ONCE(wall_clock->version);
+ } while ((nversion & 1) || (version != nversion));
delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */
delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
--
1.8.4.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-16 14:13 [PATCH] KVM, XEN: Fix potential race in pvclock code Julian Stecklina
@ 2014-01-16 15:04 ` Jan Beulich
2014-01-16 16:04 ` Julian Stecklina
2014-01-24 18:08 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2014-01-16 15:04 UTC (permalink / raw)
To: Julian Stecklina; +Cc: xen-devel, kvm
>>> On 16.01.14 at 15:13, Julian Stecklina <jsteckli@os.inf.tu-dresden.de> wrote:
> The paravirtualized clock used in KVM and Xen uses a version field to
> allow the guest to see when the shared data structure is inconsistent.
> The code reads the version field twice (before and after the data
> structure is copied) and checks whether they haven't
> changed and that the lowest bit is not set. As the second access is not
> synchronized, the compiler could generate code that accesses version
> more than two times and you end up with inconsistent data.
>
> An example using pvclock_get_time_values:
>
> host starts updating data, sets src->version to 1
> guest reads src->version (1) and stores it into dst->version.
> guest copies inconsistent data
> guest reads src->version (1) and computes xor with dst->version.
> host finishes updating data and sets src->version to 2
> guest reads src->version (2) and checks whether lower bit is not set.
> while loop exits with inconsistent data!
>
> AFAICS the compiler is allowed to optimize the given code this way.
I don't think so - this would only be an issue if the conditions used
| instead of ||. || implies a sequence point between evaluating the
left and right sides, and the standard says: "The presence of a
sequence point between the evaluation of expressions A and B
implies that every value computation and side effect associated
with A is sequenced before every value computation and side
effect associated with B."
And even if there was a problem (i.e. my interpretation of the
above being incorrect), I don't think you'd need ACCESS_ONCE()
here: The same local variable can't have two different values in
two different use sites when there was no intermediate
assignment to it.
Interestingly the old XenoLinux code uses | instead of || in
both cases, yet only one of the two also entertains the use
of a local variable. I shall fix this (read: Thanks for pointing
out even if in the upstream incarnation this is not a problem).
Jan
> Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
> ---
> arch/x86/kernel/pvclock.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..f62b41c 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct
> pvclock_shadow_time *shadow)
> static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
> struct pvclock_vcpu_time_info *src)
> {
> + u32 nversion;
> +
> do {
> dst->version = src->version;
> rmb(); /* fetch version before data */
> @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct
> pvclock_shadow_time *dst,
> dst->tsc_shift = src->tsc_shift;
> dst->flags = src->flags;
> rmb(); /* test version after fetching data */
> - } while ((src->version & 1) || (dst->version != src->version));
> + nversion = ACCESS_ONCE(src->version);
> + } while ((nversion & 1) || (dst->version != nversion));
>
> return dst->version;
> }
> @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock
> *wall_clock,
> struct pvclock_vcpu_time_info *vcpu_time,
> struct timespec *ts)
> {
> - u32 version;
> + u32 version, nversion;
> u64 delta;
> struct timespec now;
>
> @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock
> *wall_clock,
> now.tv_sec = wall_clock->sec;
> now.tv_nsec = wall_clock->nsec;
> rmb(); /* fetch time before checking version */
> - } while ((wall_clock->version & 1) || (version != wall_clock->version));
> + nversion = ACCESS_ONCE(wall_clock->version);
> + } while ((nversion & 1) || (version != nversion));
>
> delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */
> delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
> --
> 1.8.4.2
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-16 15:04 ` [Xen-devel] " Jan Beulich
@ 2014-01-16 16:04 ` Julian Stecklina
2014-01-17 9:41 ` Jan Beulich
0 siblings, 1 reply; 11+ messages in thread
From: Julian Stecklina @ 2014-01-16 16:04 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, kvm
[-- Attachment #1: Type: text/plain, Size: 948 bytes --]
On 01/16/2014 04:04 PM, Jan Beulich wrote:
> I don't think so - this would only be an issue if the conditions used
> | instead of ||. || implies a sequence point between evaluating the
> left and right sides, and the standard says: "The presence of a
> sequence point between the evaluation of expressions A and B
> implies that every value computation and side effect associated
> with A is sequenced before every value computation and side
> effect associated with B."
This only applies to single-threaded code. Multithreaded code must be
data-race free for that to be true. See
https://lwn.net/Articles/508991/
> And even if there was a problem (i.e. my interpretation of the
> above being incorrect), I don't think you'd need ACCESS_ONCE()
> here: The same local variable can't have two different values in
> two different use sites when there was no intermediate
> assignment to it.
Same comment as above.
Julian
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-16 16:04 ` Julian Stecklina
@ 2014-01-17 9:41 ` Jan Beulich
2014-01-17 9:50 ` Julian Stecklina
2014-01-27 17:47 ` Paolo Bonzini
0 siblings, 2 replies; 11+ messages in thread
From: Jan Beulich @ 2014-01-17 9:41 UTC (permalink / raw)
To: Julian Stecklina; +Cc: xen-devel, kvm
>>> On 16.01.14 at 17:04, Julian Stecklina <jsteckli@os.inf.tu-dresden.de> wrote:
> On 01/16/2014 04:04 PM, Jan Beulich wrote:
>> I don't think so - this would only be an issue if the conditions used
>> | instead of ||. || implies a sequence point between evaluating the
>> left and right sides, and the standard says: "The presence of a
>> sequence point between the evaluation of expressions A and B
>> implies that every value computation and side effect associated
>> with A is sequenced before every value computation and side
>> effect associated with B."
>
> This only applies to single-threaded code. Multithreaded code must be
> data-race free for that to be true. See
>
> https://lwn.net/Articles/508991/
>
>> And even if there was a problem (i.e. my interpretation of the
>> above being incorrect), I don't think you'd need ACCESS_ONCE()
>> here: The same local variable can't have two different values in
>> two different use sites when there was no intermediate
>> assignment to it.
>
> Same comment as above.
One half of this doesn't apply here, due to the explicit barriers
that are there. The half about converting local variable accesses
back to memory reads (i.e. eliding the local variable), however,
is only a theoretical issue afaict: If a compiler really did this, I
think there'd be far more places where this would hurt.
Jan
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-17 9:41 ` Jan Beulich
@ 2014-01-17 9:50 ` Julian Stecklina
2014-01-27 17:47 ` Paolo Bonzini
1 sibling, 0 replies; 11+ messages in thread
From: Julian Stecklina @ 2014-01-17 9:50 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, kvm
[-- Attachment #1: Type: text/plain, Size: 472 bytes --]
On 01/17/2014 10:41 AM, Jan Beulich wrote:
> The half about converting local variable accesses
> back to memory reads (i.e. eliding the local variable), however,
> is only a theoretical issue afaict: If a compiler really did this, I
> think there'd be far more places where this would hurt.
It happens rarely, but it does happen. Not fixing those issues is
inviting trouble with new compiler generations. And these issues are
terribly hard to debug.
Julian
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-16 14:13 [PATCH] KVM, XEN: Fix potential race in pvclock code Julian Stecklina
2014-01-16 15:04 ` [Xen-devel] " Jan Beulich
@ 2014-01-24 18:08 ` Konrad Rzeszutek Wilk
2014-01-27 12:33 ` Julian Stecklina
1 sibling, 1 reply; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-24 18:08 UTC (permalink / raw)
To: Julian Stecklina; +Cc: kvm, xen-devel
On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
> The paravirtualized clock used in KVM and Xen uses a version field to
> allow the guest to see when the shared data structure is inconsistent.
> The code reads the version field twice (before and after the data
> structure is copied) and checks whether they haven't
> changed and that the lowest bit is not set. As the second access is not
> synchronized, the compiler could generate code that accesses version
> more than two times and you end up with inconsistent data.
Could you paste in the code that the 'bad' compiler generates
vs the compiler that generate 'good' code please?
>
> An example using pvclock_get_time_values:
>
> host starts updating data, sets src->version to 1
> guest reads src->version (1) and stores it into dst->version.
> guest copies inconsistent data
> guest reads src->version (1) and computes xor with dst->version.
> host finishes updating data and sets src->version to 2
> guest reads src->version (2) and checks whether lower bit is not set.
> while loop exits with inconsistent data!
>
> AFAICS the compiler is allowed to optimize the given code this way.
>
> Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
> ---
> arch/x86/kernel/pvclock.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..f62b41c 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
> static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
> struct pvclock_vcpu_time_info *src)
> {
> + u32 nversion;
> +
> do {
> dst->version = src->version;
> rmb(); /* fetch version before data */
> @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
> dst->tsc_shift = src->tsc_shift;
> dst->flags = src->flags;
> rmb(); /* test version after fetching data */
> - } while ((src->version & 1) || (dst->version != src->version));
> + nversion = ACCESS_ONCE(src->version);
> + } while ((nversion & 1) || (dst->version != nversion));
>
> return dst->version;
> }
> @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
> struct pvclock_vcpu_time_info *vcpu_time,
> struct timespec *ts)
> {
> - u32 version;
> + u32 version, nversion;
> u64 delta;
> struct timespec now;
>
> @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
> now.tv_sec = wall_clock->sec;
> now.tv_nsec = wall_clock->nsec;
> rmb(); /* fetch time before checking version */
> - } while ((wall_clock->version & 1) || (version != wall_clock->version));
> + nversion = ACCESS_ONCE(wall_clock->version);
> + } while ((nversion & 1) || (version != nversion));
>
> delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */
> delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
> --
> 1.8.4.2
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-24 18:08 ` Konrad Rzeszutek Wilk
@ 2014-01-27 12:33 ` Julian Stecklina
2014-01-28 15:16 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 11+ messages in thread
From: Julian Stecklina @ 2014-01-27 12:33 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: kvm, xen-devel
[-- Attachment #1: Type: text/plain, Size: 4001 bytes --]
On 01/24/2014 07:08 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
>> The paravirtualized clock used in KVM and Xen uses a version field to
>> allow the guest to see when the shared data structure is inconsistent.
>> The code reads the version field twice (before and after the data
>> structure is copied) and checks whether they haven't
>> changed and that the lowest bit is not set. As the second access is not
>> synchronized, the compiler could generate code that accesses version
>> more than two times and you end up with inconsistent data.
>
> Could you paste in the code that the 'bad' compiler generates
> vs the compiler that generate 'good' code please?
At least 4.8 and probably older compilers compile this as intended. The
point is that the standard does not guarantee the indented behavior,
i.e. the code is wrong.
I can refer to this lwn article:
https://lwn.net/Articles/508991/
The whole point of ACCESS_ONCE is to avoid time bombs like that. There
are lots of place where ACCESS_ONCE is used in the kernel:
http://lxr.free-electrons.com/ident?i=ACCESS_ONCE
See for example the check_zero function here:
http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559
Julian
>
>>
>> An example using pvclock_get_time_values:
>>
>> host starts updating data, sets src->version to 1
>> guest reads src->version (1) and stores it into dst->version.
>> guest copies inconsistent data
>> guest reads src->version (1) and computes xor with dst->version.
>> host finishes updating data and sets src->version to 2
>> guest reads src->version (2) and checks whether lower bit is not set.
>> while loop exits with inconsistent data!
>>
>> AFAICS the compiler is allowed to optimize the given code this way.
>>
>> Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
>> ---
>> arch/x86/kernel/pvclock.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
>> index 42eb330..f62b41c 100644
>> --- a/arch/x86/kernel/pvclock.c
>> +++ b/arch/x86/kernel/pvclock.c
>> @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
>> static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
>> struct pvclock_vcpu_time_info *src)
>> {
>> + u32 nversion;
>> +
>> do {
>> dst->version = src->version;
>> rmb(); /* fetch version before data */
>> @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
>> dst->tsc_shift = src->tsc_shift;
>> dst->flags = src->flags;
>> rmb(); /* test version after fetching data */
>> - } while ((src->version & 1) || (dst->version != src->version));
>> + nversion = ACCESS_ONCE(src->version);
>> + } while ((nversion & 1) || (dst->version != nversion));
>>
>> return dst->version;
>> }
>> @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
>> struct pvclock_vcpu_time_info *vcpu_time,
>> struct timespec *ts)
>> {
>> - u32 version;
>> + u32 version, nversion;
>> u64 delta;
>> struct timespec now;
>>
>> @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
>> now.tv_sec = wall_clock->sec;
>> now.tv_nsec = wall_clock->nsec;
>> rmb(); /* fetch time before checking version */
>> - } while ((wall_clock->version & 1) || (version != wall_clock->version));
>> + nversion = ACCESS_ONCE(wall_clock->version);
>> + } while ((nversion & 1) || (version != nversion));
>>
>> delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */
>> delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
>> --
>> 1.8.4.2
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-17 9:41 ` Jan Beulich
2014-01-17 9:50 ` Julian Stecklina
@ 2014-01-27 17:47 ` Paolo Bonzini
2014-01-28 15:25 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-01-27 17:47 UTC (permalink / raw)
To: Jan Beulich, Julian Stecklina; +Cc: xen-devel, kvm
Il 17/01/2014 10:41, Jan Beulich ha scritto:
> One half of this doesn't apply here, due to the explicit barriers
> that are there. The half about converting local variable accesses
> back to memory reads (i.e. eliding the local variable), however,
> is only a theoretical issue afaict: If a compiler really did this, I
> think there'd be far more places where this would hurt.
Perhaps. But for example seqlocks get it right.
> I don't think so - this would only be an issue if the conditions used
> | instead of ||. || implies a sequence point between evaluating the
> left and right sides, and the standard says: "The presence of a
> sequence point between the evaluation of expressions A and B
> implies that every value computation and side effect associated
> with A is sequenced before every value computation and side
> effect associated with B."
I suspect this is widely ignored by compilers if A is not
side-effecting. The above wording would imply that
x = a || b => x = (a | b) != 0
(where "a" and "b" are non-volatile globals) would be an invalid
change. The compiler would have to do:
temp = a;
barrier();
x = (temp | b) != 0
and I'm pretty sure that no compiler does it this way unless C11/C++11
atomics are involved (at which point accesses become side-effecting).
The code has changed and pvclock_get_time_values moved to
__pvclock_read_cycles, but I think the problem remains. Another approach
to fixing this (and one I prefer) is to do the same thing as seqlocks:
turn off the low bit in the return value of __pvclock_read_cycles,
and drop the || altogether. Untested patch after my name.
Paolo
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index d6b078e9fa28..5aec80adaf54 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -75,7 +75,7 @@ unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
cycle_t ret, offset;
u8 ret_flags;
- version = src->version;
+ version = src->version & ~1;
/* Note: emulated platforms which do not advertise SSE2 support
* result in kvmclock not using the necessary RDTSC barriers.
* Without barriers, it is possible that RDTSC instruction reads from
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 2f355d229a58..a5052a87d55e 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -66,7 +66,7 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
do {
version = __pvclock_read_cycles(src, &ret, &flags);
- } while ((src->version & 1) || version != src->version);
+ } while (version != src->version);
return flags & valid_flags;
}
@@ -80,7 +80,7 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
do {
version = __pvclock_read_cycles(src, &ret, &flags);
- } while ((src->version & 1) || version != src->version);
+ } while (version != src->version);
if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
src->flags &= ~PVCLOCK_GUEST_STOPPED;
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index eb5d7a56f8d4..f09b09bcb515 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -117,7 +117,6 @@ static notrace cycle_t vread_pvclock(int *mode)
*/
cpu1 = __getcpu() & VGETCPU_CPU_MASK;
} while (unlikely(cpu != cpu1 ||
- (pvti->pvti.version & 1) ||
pvti->pvti.version != version));
if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-27 12:33 ` Julian Stecklina
@ 2014-01-28 15:16 ` Konrad Rzeszutek Wilk
2014-01-28 16:06 ` Julian Stecklina
0 siblings, 1 reply; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-28 15:16 UTC (permalink / raw)
To: Julian Stecklina, jbeulich; +Cc: kvm, xen-devel
On Mon, Jan 27, 2014 at 01:33:01PM +0100, Julian Stecklina wrote:
> On 01/24/2014 07:08 PM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
> >> The paravirtualized clock used in KVM and Xen uses a version field to
> >> allow the guest to see when the shared data structure is inconsistent.
> >> The code reads the version field twice (before and after the data
> >> structure is copied) and checks whether they haven't
> >> changed and that the lowest bit is not set. As the second access is not
> >> synchronized, the compiler could generate code that accesses version
> >> more than two times and you end up with inconsistent data.
> >
> > Could you paste in the code that the 'bad' compiler generates
> > vs the compiler that generate 'good' code please?
>
> At least 4.8 and probably older compilers compile this as intended. The
> point is that the standard does not guarantee the indented behavior,
> i.e. the code is wrong.
Perhaps I misunderstood Jan's response but it sounded to me like
that the compiler was not adhering to the standard?
>
> I can refer to this lwn article:
> https://lwn.net/Articles/508991/
>
> The whole point of ACCESS_ONCE is to avoid time bombs like that. There
> are lots of place where ACCESS_ONCE is used in the kernel:
>
> http://lxr.free-electrons.com/ident?i=ACCESS_ONCE
>
> See for example the check_zero function here:
> http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559
>
In other words, you don't have a sample of 'bad' compiler code.
> Julian
>
> >
> >>
> >> An example using pvclock_get_time_values:
> >>
> >> host starts updating data, sets src->version to 1
> >> guest reads src->version (1) and stores it into dst->version.
> >> guest copies inconsistent data
> >> guest reads src->version (1) and computes xor with dst->version.
> >> host finishes updating data and sets src->version to 2
> >> guest reads src->version (2) and checks whether lower bit is not set.
> >> while loop exits with inconsistent data!
> >>
> >> AFAICS the compiler is allowed to optimize the given code this way.
> >>
> >> Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
> >> ---
> >> arch/x86/kernel/pvclock.c | 10 +++++++---
> >> 1 file changed, 7 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> >> index 42eb330..f62b41c 100644
> >> --- a/arch/x86/kernel/pvclock.c
> >> +++ b/arch/x86/kernel/pvclock.c
> >> @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
> >> static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
> >> struct pvclock_vcpu_time_info *src)
> >> {
> >> + u32 nversion;
> >> +
> >> do {
> >> dst->version = src->version;
> >> rmb(); /* fetch version before data */
> >> @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
> >> dst->tsc_shift = src->tsc_shift;
> >> dst->flags = src->flags;
> >> rmb(); /* test version after fetching data */
> >> - } while ((src->version & 1) || (dst->version != src->version));
> >> + nversion = ACCESS_ONCE(src->version);
> >> + } while ((nversion & 1) || (dst->version != nversion));
> >>
> >> return dst->version;
> >> }
> >> @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
> >> struct pvclock_vcpu_time_info *vcpu_time,
> >> struct timespec *ts)
> >> {
> >> - u32 version;
> >> + u32 version, nversion;
> >> u64 delta;
> >> struct timespec now;
> >>
> >> @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
> >> now.tv_sec = wall_clock->sec;
> >> now.tv_nsec = wall_clock->nsec;
> >> rmb(); /* fetch time before checking version */
> >> - } while ((wall_clock->version & 1) || (version != wall_clock->version));
> >> + nversion = ACCESS_ONCE(wall_clock->version);
> >> + } while ((nversion & 1) || (version != nversion));
> >>
> >> delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */
> >> delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
> >> --
> >> 1.8.4.2
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@lists.xen.org
> >> http://lists.xen.org/xen-devel
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-27 17:47 ` Paolo Bonzini
@ 2014-01-28 15:25 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-28 15:25 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Jan Beulich, Julian Stecklina, kvm, xen-devel
On Mon, Jan 27, 2014 at 06:47:58PM +0100, Paolo Bonzini wrote:
> Il 17/01/2014 10:41, Jan Beulich ha scritto:
> > One half of this doesn't apply here, due to the explicit barriers
> > that are there. The half about converting local variable accesses
> > back to memory reads (i.e. eliding the local variable), however,
> > is only a theoretical issue afaict: If a compiler really did this, I
> > think there'd be far more places where this would hurt.
>
> Perhaps. But for example seqlocks get it right.
>
> > I don't think so - this would only be an issue if the conditions used
> > | instead of ||. || implies a sequence point between evaluating the
> > left and right sides, and the standard says: "The presence of a
> > sequence point between the evaluation of expressions A and B
> > implies that every value computation and side effect associated
> > with A is sequenced before every value computation and side
> > effect associated with B."
>
> I suspect this is widely ignored by compilers if A is not
> side-effecting. The above wording would imply that
>
> x = a || b => x = (a | b) != 0
>
> (where "a" and "b" are non-volatile globals) would be an invalid
> change. The compiler would have to do:
>
> temp = a;
> barrier();
> x = (temp | b) != 0
>
> and I'm pretty sure that no compiler does it this way unless C11/C++11
> atomics are involved (at which point accesses become side-effecting).
>
> The code has changed and pvclock_get_time_values moved to
> __pvclock_read_cycles, but I think the problem remains. Another approach
> to fixing this (and one I prefer) is to do the same thing as seqlocks:
> turn off the low bit in the return value of __pvclock_read_cycles,
> and drop the || altogether. Untested patch after my name.
Is there a good test-case to confirm that this patch does not introduce
any regressions?
>
> Paolo
>
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index d6b078e9fa28..5aec80adaf54 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -75,7 +75,7 @@ unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
> cycle_t ret, offset;
> u8 ret_flags;
>
> - version = src->version;
> + version = src->version & ~1;
> /* Note: emulated platforms which do not advertise SSE2 support
> * result in kvmclock not using the necessary RDTSC barriers.
> * Without barriers, it is possible that RDTSC instruction reads from
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 2f355d229a58..a5052a87d55e 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -66,7 +66,7 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
>
> do {
> version = __pvclock_read_cycles(src, &ret, &flags);
> - } while ((src->version & 1) || version != src->version);
> + } while (version != src->version);
>
> return flags & valid_flags;
> }
> @@ -80,7 +80,7 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>
> do {
> version = __pvclock_read_cycles(src, &ret, &flags);
> - } while ((src->version & 1) || version != src->version);
> + } while (version != src->version);
>
> if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
> src->flags &= ~PVCLOCK_GUEST_STOPPED;
> diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
> index eb5d7a56f8d4..f09b09bcb515 100644
> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -117,7 +117,6 @@ static notrace cycle_t vread_pvclock(int *mode)
> */
> cpu1 = __getcpu() & VGETCPU_CPU_MASK;
> } while (unlikely(cpu != cpu1 ||
> - (pvti->pvti.version & 1) ||
> pvti->pvti.version != version));
>
> if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code
2014-01-28 15:16 ` Konrad Rzeszutek Wilk
@ 2014-01-28 16:06 ` Julian Stecklina
0 siblings, 0 replies; 11+ messages in thread
From: Julian Stecklina @ 2014-01-28 16:06 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, jbeulich; +Cc: kvm, xen-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 01/28/2014 04:16 PM, Konrad Rzeszutek Wilk wrote:
>> At least 4.8 and probably older compilers compile this as
>> intended. The point is that the standard does not guarantee the
>> indented behavior, i.e. the code is wrong.
>
> Perhaps I misunderstood Jan's response but it sounded to me like
> that the compiler was not adhering to the standard?
Compilers are free to generate code that breaks the current clock code
while staying within the standards. If the compiler ignores the
standard, all bets are off anyway...
>>
>> I can refer to this lwn article:
>> https://lwn.net/Articles/508991/
>>
>> The whole point of ACCESS_ONCE is to avoid time bombs like that.
>> There are lots of place where ACCESS_ONCE is used in the kernel:
>>
>> http://lxr.free-electrons.com/ident?i=ACCESS_ONCE
>>
>> See for example the check_zero function here:
>> http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559
>>
>
> In other words, you don't have a sample of 'bad' compiler code.
Nope.
Julian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEUEARECAAYFAlLn1ZgACgkQ2EtjUdW3H9n/DgCSAmrVCyxrs42aFFB3Ug+aw4kN
7wCgmEO4CbOI3BkOXKSorY91td9u7H8=
=fgrl
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-01-28 16:07 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-16 14:13 [PATCH] KVM, XEN: Fix potential race in pvclock code Julian Stecklina
2014-01-16 15:04 ` [Xen-devel] " Jan Beulich
2014-01-16 16:04 ` Julian Stecklina
2014-01-17 9:41 ` Jan Beulich
2014-01-17 9:50 ` Julian Stecklina
2014-01-27 17:47 ` Paolo Bonzini
2014-01-28 15:25 ` Konrad Rzeszutek Wilk
2014-01-24 18:08 ` Konrad Rzeszutek Wilk
2014-01-27 12:33 ` Julian Stecklina
2014-01-28 15:16 ` Konrad Rzeszutek Wilk
2014-01-28 16:06 ` Julian Stecklina
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).