* What's kvmclock's custom sched_clock for?
From: Andy Lutomirski @ 2016-01-07 7:18 UTC
To: Marcelo Tosatti, Radim Krcmar, kvm list

AFAICT KVM reliably passes a monotonic TSC through to guests, even if
the host suspends. That's all that sched_clock needs, I think.

So why does kvmclock have a custom sched_clock?

On a related note, KVM doesn't pass the "invariant TSC" feature
through to guests on my machine even though "invtsc" is set in QEMU
and the kernel host code appears to support it. What gives?

--Andy
* Re: What's kvmclock's custom sched_clock for?
From: Andy Lutomirski @ 2016-01-07 8:41 UTC
To: Marcelo Tosatti, Radim Krcmar, kvm list

On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
> the host suspends. That's all that sched_clock needs, I think.
>
> So why does kvmclock have a custom sched_clock?
>
> On a related note, KVM doesn't pass the "invariant TSC" feature
> through to guests on my machine even though "invtsc" is set in QEMU
> and the kernel host code appears to support it. What gives?

I think I solved part of the puzzle. KVM doesn't like to advertise
invtsc by default because that breaks migration. (Oddly, the end
result seems wrong -- with migration, the TSC doesn't stop, but it's
not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
whatever.) So the scheduler clock doesn't get marked stable.

Is that it?

This still doesn't explain why even explicitly trying to set invtsc
doesn't seem to work.

--Andy
* Re: What's kvmclock's custom sched_clock for?
From: Marcelo Tosatti @ 2016-01-07 10:59 UTC
To: Andy Lutomirski
Cc: Radim Krcmar, kvm list

On Thu, Jan 07, 2016 at 12:41:34AM -0800, Andy Lutomirski wrote:
> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > AFAICT KVM reliably passes a monotonic TSC through to guests, even if
> > the host suspends. That's all that sched_clock needs, I think.
> >
> > So why does kvmclock have a custom sched_clock?
> >
> > On a related note, KVM doesn't pass the "invariant TSC" feature
> > through to guests on my machine even though "invtsc" is set in QEMU
> > and the kernel host code appears to support it. What gives?
>
> I think I solved part of the puzzle. KVM doesn't like to advertise
> invtsc by default because that breaks migration. (Oddly, the end
> result seems wrong -- with migration, the TSC doesn't stop, but it's
> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
> whatever.) So the scheduler clock doesn't get marked stable.

Can you break down this sentence?

QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e

    target-i386: block migration and savevm if invariant tsc is exposed

    Invariant TSC documentation mentions that "invariant TSC will run at a
    constant rate in all ACPI P-, C-, and T-states".

    This is not the case if migration to a host with different TSC frequency
    is allowed, or if savevm is performed. So block migration/savevm.

> Is that it?
>
> This still doesn't explain why even explicitly trying to set invtsc
> doesn't seem to work.
>
> --Andy
* Re: What's kvmclock's custom sched_clock for?
From: Radim Krcmar @ 2016-01-07 15:18 UTC
To: Andy Lutomirski
Cc: Marcelo Tosatti, kvm list

2016-01-07 00:41-0800, Andy Lutomirski:
> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
>> the host suspends. That's all that sched_clock needs, I think.
>>
>> So why does kvmclock have a custom sched_clock?

If the host CPU has enough features, then yes, KVM can take care of
everything and kvmclock has no advantage over TSC, even when migrating
to TSC with different frequency as modern CPUs support TSC offset +
scaling in guests.

The problem is with antiques. Guests on old CPUs need to have more
information on top of TSC to be able to get useful system time.
And old KVM doesn't provide good information, so we have legacy layers
everywhere.

kvmclock in the guest can simply equal rdtsc() with modern CPUs, but we
still want to use kvmclock wrapper, because kvmclock can provide a
stable clock regardless of underlying TSC (in theory).

>> On a related note, KVM doesn't pass the "invariant TSC" feature
>> through to guests on my machine even though "invtsc" is set in QEMU
>> and the kernel host code appears to support it. What gives?
>
> I think I solved part of the puzzle. KVM doesn't like to advertise
> invtsc by default because that breaks migration. (Oddly, the end
> result seems wrong -- with migration, the TSC doesn't stop, but it's
> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
> whatever.)

QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
of family/model. (CONSTANT_TSC is the same as invariant TSC as KVM
guests don't have c-states.)

> So the scheduler clock doesn't get marked stable.

Stable sched clock is quite unrelated to TSC features. KVMs from last
few years should always give good enough result to allow stable sched
clock. We wanted realtime guests and realtime linux needs no_hz=full
that depends on stable sched clock. The result is a huge hack.

We'd need to say that migration creates powerful gravity fields to
faithfully migrate constant/invariant TSC, but stable sched clock
doesn't have such strict expectations about time.

> Is that it?
>
> This still doesn't explain why even explicitly trying to set invtsc
> doesn't seem to work.

Seems like a bug. My cpuid is
  0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
and QEMU says
  warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]

I'll see if it's in KVM or QEMU. (We should only forbid migrations to
hosts with different frequency and without guest TSC scaling.)
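[Editorial sketch] The cpuid dump and the QEMU warning in the message above
refer to the same bit: invtsc is bit 8 of EDX in CPUID leaf 0x80000007, so
edx=0x00000100 means the host CPU itself reports the feature. A minimal
Python sketch of that decoding follows. The helper name has_invtsc is made up
for illustration, and the EDX value is simply copied from the dump; nothing
here queries real hardware.

```python
# Decode the invariant-TSC bit from a raw CPUID.80000007H:EDX value.
# The bit position comes from the QEMU warning quoted in the message:
#   CPUID.80000007H:EDX.invtsc [bit 8]
INVTSC_BIT = 8


def has_invtsc(edx: int) -> bool:
    """Return True if bit 8 (invariant TSC) is set in the given EDX value.

    Hypothetical helper for illustration only; it works on a number that
    was already read from CPUID, it does not execute the instruction.
    """
    return bool((edx >> INVTSC_BIT) & 1)


# EDX value taken from the cpuid dump in the message above.
host_edx = 0x00000100
print("host advertises invtsc:", has_invtsc(host_edx))
```

With the bit set on the host, a warning that the host "doesn't support" the
feature points at filtering somewhere in the KVM/QEMU stack rather than at
the CPU, which is what the rest of the thread goes on to confirm.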
* Re: What's kvmclock's custom sched_clock for?
From: Andy Lutomirski @ 2016-01-07 17:27 UTC
To: Radim Krcmar
Cc: Marcelo Tosatti, kvm list

On Thu, Jan 7, 2016 at 2:56 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Wed, Jan 06, 2016 at 11:18:51PM -0800, Andy Lutomirski wrote:
>> AFAICT KVM reliably passes a monotonic TSC through to guests,
>
> It does not.

Under what circumstances does it go backwards? All hosts support tsc
offsets, I think, and the host code knows how to prevent the clock
from going backwards even on host suspend.

Does migration make the TSC go backwards? If so, that's impolite and
it would be nice to fix it.

On Thu, Jan 7, 2016 at 7:18 AM, Radim Krcmar <rkrcmar@redhat.com> wrote:
> 2016-01-07 00:41-0800, Andy Lutomirski:
>> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
>>> the host suspends. That's all that sched_clock needs, I think.
>>>
>>> So why does kvmclock have a custom sched_clock?
>
> If the host CPU has enough features, then yes, KVM can take care of
> everything and kvmclock has no advantage over TSC, even when migrating
> to TSC with different frequency as modern CPUs support TSC offset +
> scaling in guests.
>
> The problem is with antiques. Guests on old CPUs need to have more
> information on top of TSC to be able to get useful system time.
> And old KVM doesn't provide good information, so we have legacy layers
> everywhere.
>
> kvmclock in the guest can simply equal rdtsc() with modern CPUs, but we
> still want to use kvmclock wrapper, because kvmclock can provide a
> stable clock regardless of underlying TSC (in theory).

OK, makes sense.

>
>>> On a related note, KVM doesn't pass the "invariant TSC" feature
>>> through to guests on my machine even though "invtsc" is set in QEMU
>>> and the kernel host code appears to support it. What gives?
>>
>> I think I solved part of the puzzle. KVM doesn't like to advertise
>> invtsc by default because that breaks migration. (Oddly, the end
>> result seems wrong -- with migration, the TSC doesn't stop, but it's
>> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
>> whatever.)
>
> QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
> of family/model. (CONSTANT_TSC is the same as invariant TSC as KVM
> guests don't have c-states.)
>
>> So the scheduler clock doesn't get marked stable.
>
> Stable sched clock is quite unrelated to TSC features. KVMs from last
> few years should always give good enough result to allow stable sched
> clock. We wanted realtime guests and realtime linux needs no_hz=full
> that depends on stable sched clock. The result is a huge hack.
>
> We'd need to say that migration creates powerful gravity fields to
> faithfully migrate constant/invariant TSC, but stable sched clock
> doesn't have such strict expectations about time.
>
>> Is that it?
>>
>> This still doesn't explain why even explicitly trying to set invtsc
>> doesn't seem to work.
>
> Seems like a bug. My cpuid is
>   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
> and QEMU says
>   warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]
>
> I'll see if it's in KVM or QEMU. (We should only forbid migrations to
> hosts with different frequency and without guest TSC scaling.)

If I do -cpu host,migratable=off,+invtsc, then it works. Maybe QEMU
is just being too strict. This is Skylake.

--
Andy Lutomirski
AMA Capital Management, LLC
* Re: What's kvmclock's custom sched_clock for?
From: Radim Krcmar @ 2016-01-07 17:48 UTC
To: Andy Lutomirski
Cc: Marcelo Tosatti, kvm list

2016-01-07 09:27-0800, Andy Lutomirski:
> On Thu, Jan 7, 2016 at 7:18 AM, Radim Krcmar <rkrcmar@redhat.com> wrote:
> > 2016-01-07 00:41-0800, Andy Lutomirski:
>>> This still doesn't explain why even explicitly trying to set invtsc
>>> doesn't seem to work.
>>
>> Seems like a bug. My cpuid is
>>   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
>> and QEMU says
>>   warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]
>>
>> I'll see if it's in KVM or QEMU. (We should only forbid migrations to
>> hosts with different frequency and without guest TSC scaling.)
>
> If I do -cpu host,migratable=off,+invtsc, then it works. Maybe QEMU
> is just being too strict. This is Skylake.

It does, thanks. It's mainly a misleading warning then; stripping flags
at the beginning instead of denying migration later on makes some sense.
* Re: What's kvmclock's custom sched_clock for?
From: Marcelo Tosatti @ 2016-01-07 20:15 UTC
To: Andy Lutomirski
Cc: Radim Krcmar, kvm list

On Thu, Jan 07, 2016 at 09:27:30AM -0800, Andy Lutomirski wrote:
> On Thu, Jan 7, 2016 at 2:56 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Wed, Jan 06, 2016 at 11:18:51PM -0800, Andy Lutomirski wrote:
> >> AFAICT KVM reliably passes a monotonic TSC through to guests,
> >
> > It does not.
>
> Under what circumstances does it go backwards? All hosts support tsc
> offsets, I think, and the host code knows how to prevent the clock
> from going backwards even on host suspend.
>
> Does migration make the TSC go backwards? If so, that's impolite and
> it would be nice to fix it.

TSC clocksource in the host is required for TSC masterclock scheme.
A change from TSC clocksource to a different clocksource, in the host,
invalidates TSC masterclock scheme.

If you change from TSC clocksource to HPET clocksource, for example,
TSC masterclock scheme stops functioning and it's necessary to stop
exposing PVCLOCK_TSC_STABLE_CLOCK.

Please send a fix, your patch is causing breakage.
* Re: What's kvmclock's custom sched_clock for?
From: Marcelo Tosatti @ 2016-01-07 20:10 UTC
To: Radim Krcmar
Cc: Andy Lutomirski, kvm list

On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
> 2016-01-07 00:41-0800, Andy Lutomirski:
> > On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
> >> the host suspends. That's all that sched_clock needs, I think.
> >>
> >> So why does kvmclock have a custom sched_clock?
>
> If the host CPU has enough features, then yes, KVM can take care of
> everything and kvmclock has no advantage over TSC, even when migrating
> to TSC with different frequency as modern CPUs support TSC offset +
> scaling in guests.
>
> The problem is with antiques. Guests on old CPUs need to have more
> information on top of TSC to be able to get useful system time.
> And old KVM doesn't provide good information, so we have legacy layers
> everywhere.
>
> kvmclock in the guest can simply equal rdtsc() with modern CPUs, but we
> still want to use kvmclock wrapper, because kvmclock can provide a
> stable clock regardless of underlying TSC (in theory).
>
> >> On a related note, KVM doesn't pass the "invariant TSC" feature
> >> through to guests on my machine even though "invtsc" is set in QEMU
> >> and the kernel host code appears to support it. What gives?
> >
> > I think I solved part of the puzzle. KVM doesn't like to advertise
> > invtsc by default because that breaks migration. (Oddly, the end
> > result seems wrong -- with migration, the TSC doesn't stop, but it's
> > not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
> > whatever.)
>
> QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
> of family/model. (CONSTANT_TSC is the same as invariant TSC as KVM
> guests don't have c-states.)
>
> > So the scheduler clock doesn't get marked stable.
>
> Stable sched clock is quite unrelated to TSC features. KVMs from last
> few years should always give good enough result to allow stable sched
> clock. We wanted realtime guests and realtime linux needs no_hz=full
> that depends on stable sched clock. The result is a huge hack.
>
> We'd need to say that migration creates powerful gravity fields to
> faithfully migrate constant/invariant TSC, but stable sched clock
> doesn't have such strict expectations about time.

Was that supposed to be a joke?

> > Is that it?
> >
> > This still doesn't explain why even explicitly trying to set invtsc
> > doesn't seem to work.
>
> Seems like a bug. My cpuid is
>   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
> and QEMU says
>   warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]
>
> I'll see if it's in KVM or QEMU. (We should only forbid migrations to
> hosts with different frequency and without guest TSC scaling.)
* Re: What's kvmclock's custom sched_clock for?
From: Radim Krcmar @ 2016-01-08 14:13 UTC
To: Marcelo Tosatti
Cc: Andy Lutomirski, kvm list

2016-01-07 18:10-0200, Marcelo Tosatti:
> On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
>> Stable sched clock is quite unrelated to TSC features. KVMs from last
>> few years should always give good enough result to allow stable sched
>> clock. We wanted realtime guests and realtime linux needs no_hz=full
>> that depends on stable sched clock. The result is a huge hack.
>>
>> We'd need to say that migration creates powerful gravity fields to
>> faithfully migrate constant/invariant TSC, but stable sched clock
>> doesn't have such strict expectations about time.
>
> Was that supposed to be a joke?

Yes, if you mean the first sentence of the second paragraph.
(I think that we'll use a different disclaimer when we enable
best-effort migration with invariant TSC.)
* Re: What's kvmclock's custom sched_clock for?
From: Marcelo Tosatti @ 2016-01-11 21:00 UTC
To: Radim Krcmar
Cc: Andy Lutomirski, kvm list

On Fri, Jan 08, 2016 at 03:13:16PM +0100, Radim Krcmar wrote:
> 2016-01-07 18:10-0200, Marcelo Tosatti:
> > On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
> >> Stable sched clock is quite unrelated to TSC features. KVMs from last
> >> few years should always give good enough result to allow stable sched
> >> clock. We wanted realtime guests and realtime linux needs no_hz=full
> >> that depends on stable sched clock. The result is a huge hack.
> >>
> >> We'd need to say that migration creates powerful gravity fields to
> >> faithfully migrate constant/invariant TSC, but stable sched clock
> >> doesn't have such strict expectations about time.
> >
> > Was that supposed to be a joke?
>
> Yes, if you mean the first sentence of the second paragraph.
> (I think that we'll use a different disclaimer when we enable
> best-effort migration with invariant TSC.)

About getting rid of kvmclock,
problem is steal time. Should
separate steal time reporting from rest of kvmclock, so that you
can use TSC clocksource and have steal time reporting.

Also, it's very clear why migration was disabled, because
invariant tsc man page says:

QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e

    target-i386: block migration and savevm if invariant tsc is exposed

    Invariant TSC documentation mentions that "invariant TSC will run at a
    constant rate in all ACPI P-, C-, and T-states".

    This is not the case if migration to a host with different TSC frequency
    is allowed, or if savevm is performed. So block migration/savevm.

The issue is, even with migration to a host with
proper frequency, TSC counting will stop for the duration of migration.

But I suppose you can document the fact (that "invariant TSC" behaviour
as documented is different than what is exposed by virtualization),
and go for it.
* Re: What's kvmclock's custom sched_clock for?
From: Radim Krcmar @ 2016-01-12 15:33 UTC
To: Marcelo Tosatti
Cc: Andy Lutomirski, kvm list

2016-01-11 19:00-0200, Marcelo Tosatti:
> On Fri, Jan 08, 2016 at 03:13:16PM +0100, Radim Krcmar wrote:
>> 2016-01-07 18:10-0200, Marcelo Tosatti:
>>> On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
>>>> Stable sched clock is quite unrelated to TSC features. KVMs from last
>>>> few years should always give good enough result to allow stable sched
>>>> clock. We wanted realtime guests and realtime linux needs no_hz=full
>>>> that depends on stable sched clock. The result is a huge hack.
>>>>
>>>> We'd need to say that migration creates powerful gravity fields to
>>>> faithfully migrate constant/invariant TSC, but stable sched clock
>>>> doesn't have such strict expectations about time.
>>>
>>> Was that supposed to be a joke?
>>
>> Yes, if you mean the first sentence of the second paragraph.
>> (I think that we'll use a different disclaimer when we enable
>> best-effort migration with invariant TSC.)
>
> About getting rid of kvmclock,

I never wanted to get rid of kvmclock. In the first part of the email
in question, I meant that the shift and scale can be accelerated by
VMX-TSC hardware, leaving only a check that kvmclock is in the expected
mode and rdtsc to get the result.

> problem is steal time. Should
> separate steal time reporting from rest of kvmclock, so that you
> can use TSC clocksource and have steal time reporting.

We can already do that, steal time doesn't depend on guest sched clock.
Steal time uses a MSR+memory based interface that is related to kvmclock
only by shared notion of a second.

> Also, it's very clear why migration was disabled, because
> invariant tsc man page says:
>
> QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e
>
>     target-i386: block migration and savevm if invariant tsc is exposed
>
>     Invariant TSC documentation mentions that "invariant TSC will run at a
>     constant rate in all ACPI P-, C-, and T-states".
>
>     This is not the case if migration to a host with different TSC frequency
>     is allowed, or if savevm is performed. So block migration/savevm.
>
> The issue is, even with migration to a host with
> proper frequency, TSC counting will stop for the duration of migration.

Stopping is the easiest solution. We can also try to mitigate the
difference by synchronizing time on source and destination hosts,
sharing what UTC/TAI/... time there was at one TSC read on the source,
and setting the appropriate TSC shift on the destination. (And solve
accumulation of the error, maybe by always using the initial pair.)

The result should be less off than when stopping and the guest couldn't
tell that TSC rate varied as it can't have more reliable time source
than the host.

The issue doesn't have a good solution and I think that some people will
prefer drawbacks associated with invariant TSC migration.
(They do so for other time sources and all have the issue + we already
migrate constant TSC, which can only match the spec if we make some
excuses, like "migration forces CPUs into a deep C-state".)

> But I suppose you can document the fact (that "invariant TSC" behaviour
> as documented is different than what is exposed by virtualization),

Yep, that generic explanation is quite likely, next to no documentation.

(There are some lawyerish explanations that don't need to violate the
spec, but I prefer the physics-based one.)

> and
> go for it.

I definitely won't be proactive.
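[Editorial sketch] The mitigation described in the message above (share one
wall-clock/TSC pair from the source, then set a matching TSC adjustment on
the destination) boils down to a small piece of arithmetic. The Python
sketch below shows one way it could be written. It is only an illustration
under stated assumptions: both hosts have synchronized wall clocks, the
guest sees the same TSC frequency on both sides (natively or via TSC
scaling), and all function and parameter names are hypothetical rather than
taken from KVM or QEMU.

```python
NSEC_PER_SEC = 1_000_000_000


def expected_guest_tsc(src_guest_tsc: int, src_wall_ns: int,
                       dst_wall_ns: int, guest_tsc_hz: int) -> int:
    """Guest TSC value had the counter kept running during migration.

    src_guest_tsc / src_wall_ns form the (TSC, wall-clock) pair sampled
    once on the source host; dst_wall_ns is the destination's wall clock
    at the moment the guest resumes. Assumes synchronized host clocks.
    """
    elapsed_ns = dst_wall_ns - src_wall_ns
    return src_guest_tsc + elapsed_ns * guest_tsc_hz // NSEC_PER_SEC


def destination_tsc_offset(src_guest_tsc: int, src_wall_ns: int,
                           dst_wall_ns: int, dst_host_tsc: int,
                           guest_tsc_hz: int) -> int:
    """Offset to add to the destination host TSC (read at dst_wall_ns)
    so that the guest TSC continues as if it had never stopped."""
    target = expected_guest_tsc(src_guest_tsc, src_wall_ns,
                                dst_wall_ns, guest_tsc_hz)
    return target - dst_host_tsc


# Example: 3 GHz guest TSC, 2 ms of downtime between the two samples.
offset = destination_tsc_offset(
    src_guest_tsc=1_000_000,
    src_wall_ns=0,
    dst_wall_ns=2_000_000,
    dst_host_tsc=500,
    guest_tsc_hz=3_000_000_000,
)
print(offset)
```

Always computing from the initial pair, as the message suggests, would keep
the clock-synchronization error from accumulating over repeated migrations,
since each hop would be derived from the same reference sample instead of
from the previous hop's estimate.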
* Re: What's kvmclock's custom sched_clock for?
From: Marcelo Tosatti @ 2016-01-12 20:48 UTC
To: Radim Krcmar
Cc: Andy Lutomirski, kvm list

On Tue, Jan 12, 2016 at 04:33:28PM +0100, Radim Krcmar wrote:
> 2016-01-11 19:00-0200, Marcelo Tosatti:
> > On Fri, Jan 08, 2016 at 03:13:16PM +0100, Radim Krcmar wrote:
> >> 2016-01-07 18:10-0200, Marcelo Tosatti:
> >>> On Thu, Jan 07, 2016 at 04:18:11PM +0100, Radim Krcmar wrote:
> >>>> Stable sched clock is quite unrelated to TSC features. KVMs from last
> >>>> few years should always give good enough result to allow stable sched
> >>>> clock. We wanted realtime guests and realtime linux needs no_hz=full
> >>>> that depends on stable sched clock. The result is a huge hack.
> >>>>
> >>>> We'd need to say that migration creates powerful gravity fields to
> >>>> faithfully migrate constant/invariant TSC, but stable sched clock
> >>>> doesn't have such strict expectations about time.
> >>>
> >>> Was that supposed to be a joke?
> >>
> >> Yes, if you mean the first sentence of the second paragraph.
> >> (I think that we'll use a different disclaimer when we enable
> >> best-effort migration with invariant TSC.)
> >
> > About getting rid of kvmclock,
>
> I never wanted to get rid of kvmclock. In the first part of the email
> in question, I meant that the shift and scale can be accelerated by
> VMX-TSC hardware, leaving only a check that kvmclock is in the expected
> mode and rdtsc to get the result.

If host TSC can be used, then it's not necessary to have the kvmclock
complication.

> > problem is steal time. Should
> > separate steal time reporting from rest of kvmclock, so that you
> > can use TSC clocksource and have steal time reporting.
>
> We can already do that, steal time doesn't depend on guest sched clock.
> Steal time uses a MSR+memory based interface that is related to kvmclock
> only by shared notion of a second.

Err, I meant "guest stop notification" which is done via flags field.

> > Also, it's very clear why migration was disabled, because
> > invariant tsc man page says:
> >
> > QEMU commit 68bfd0ad4a1dcc4c328d5db85dc746b49c1ec07e
> >
> >     target-i386: block migration and savevm if invariant tsc is exposed
> >
> >     Invariant TSC documentation mentions that "invariant TSC will run at a
> >     constant rate in all ACPI P-, C-, and T-states".
> >
> >     This is not the case if migration to a host with different TSC frequency
> >     is allowed, or if savevm is performed. So block migration/savevm.
> >
> > The issue is, even with migration to a host with
> > proper frequency, TSC counting will stop for the duration of migration.
>
> Stopping is the easiest solution. We can also try to mitigate the
> difference by synchronizing time on source and destination hosts,
> sharing what UTC/TAI/... time there was at one TSC read on the source,
> and setting the appropriate TSC shift on the destination. (And solve
> accumulation of the error, maybe by always using the initial pair.)
>
> The result should be less off than when stopping and the guest couldn't
> tell that TSC rate varied as it can't have more reliable time source
> than the host.
>
> The issue doesn't have a good solution and I think that some people will
> prefer drawbacks associated with invariant TSC migration.
> (They do so for other time sources and all have the issue + we already
> migrate constant TSC, which can only match the spec if we make some
> excuses, like "migration forces CPUs into a deep C-state".)
>
> > But I suppose you can document the fact (that "invariant TSC" behaviour
> > as documented is different than what is exposed by virtualization),
>
> Yep, that generic explanation is quite likely, next to no documentation.
>
> (There are some lawyerish explanations that don't need to violate the
> spec, but I prefer the physics-based one.)
>
> > and
> > go for it.
>
> I definitely won't be proactive.
* Re: What's kvmclock's custom sched_clock for?
From: Radim Krcmar @ 2016-01-13 14:59 UTC
To: Marcelo Tosatti
Cc: Andy Lutomirski, kvm list

2016-01-12 18:48-0200, Marcelo Tosatti:
> On Tue, Jan 12, 2016 at 04:33:28PM +0100, Radim Krcmar wrote:
>> 2016-01-11 19:00-0200, Marcelo Tosatti:
>> > About getting rid of kvmclock,
>>
>> I never wanted to get rid of kvmclock. In the first part of the email
>> in question, I meant that the shift and scale can be accelerated by
>> VMX-TSC hardware, leaving only a check that kvmclock is in the expected
>> mode and rdtsc to get the result.
>
> If host TSC can be used, then it's not necessary to have the kvmclock
> complication.

Yes, it's just easier to have an indirection until all hosts can be
used. (And that condition may never be true, so we'll just hide
obsoleted code in an unlikely path.)

>> > problem is steal time. Should
>> > separate steal time reporting from rest of kvmclock, so that you
>> > can use TSC clocksource and have steal time reporting.
>>
>> We can already do that, steal time doesn't depend on guest sched clock.
>> Steal time uses a MSR+memory based interface that is related to kvmclock
>> only by shared notion of a second.
>
> Err, I meant "guest stop notification" which is done via flags field.

True, we read the bit without looking at time, so a split wouldn't be
unnatural. (The current code probably works with any clocksource if
kvmclock is set up first :/)
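[Editorial sketch] For readers unfamiliar with the "shift and scale" and the
"flags field" mentioned above, the following Python sketch models how a
guest turns a raw TSC read into nanoseconds from the per-vCPU pvclock data,
and how the flag bits are tested independently of the time calculation. It
is a simplified model, not kernel code: the field names follow the pvclock
ABI structure as I understand it, the flag constants are assumed to be
PVCLOCK_TSC_STABLE_BIT (bit 0) and PVCLOCK_GUEST_STOPPED (bit 1), and the
version/seqcount retry loop that the real reader needs is left out.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1

# Flag bits, assumed to match the kernel's pvclock ABI header.
PVCLOCK_TSC_STABLE_BIT = 1 << 0
PVCLOCK_GUEST_STOPPED = 1 << 1


@dataclass
class PvclockTimeInfo:
    """Subset of the per-vCPU time info the host publishes to the guest."""
    tsc_timestamp: int      # host-provided TSC value at the last update
    system_time: int        # nanoseconds at tsc_timestamp
    tsc_to_system_mul: int  # 32.32 fixed-point multiplier
    tsc_shift: int          # signed pre-shift applied to the TSC delta
    flags: int


def pvclock_read_ns(info: PvclockTimeInfo, tsc: int) -> int:
    """Convert a raw TSC read to nanoseconds using shift-and-scale."""
    delta = (tsc - info.tsc_timestamp) & MASK64
    if info.tsc_shift >= 0:
        delta = (delta << info.tsc_shift) & MASK64
    else:
        delta >>= -info.tsc_shift
    scaled = (delta * info.tsc_to_system_mul) >> 32
    return (info.system_time + scaled) & MASK64


def guest_was_stopped(info: PvclockTimeInfo) -> bool:
    """Check the stop notification bit; no time value is consulted."""
    return bool(info.flags & PVCLOCK_GUEST_STOPPED)


# Example: a 2 GHz TSC, so each tick is 0.5 ns (mul = 2**31, shift = 0).
info = PvclockTimeInfo(tsc_timestamp=1000, system_time=5_000_000,
                       tsc_to_system_mul=1 << 31, tsc_shift=0,
                       flags=PVCLOCK_TSC_STABLE_BIT)
print(pvclock_read_ns(info, tsc=5000), guest_was_stopped(info))
```

The model also shows why a split is plausible: guest_was_stopped() touches
only the flags, which matches the remark above that the bit is read
"without looking at time".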
* Re: What's kvmclock's custom sched_clock for?
From: Marcelo Tosatti @ 2016-01-07 10:56 UTC
To: Andy Lutomirski
Cc: Radim Krcmar, kvm list

On Wed, Jan 06, 2016 at 11:18:51PM -0800, Andy Lutomirski wrote:
> AFAICT KVM reliably passes a monotonic TSC through to guests,

It does not.

> even if the host suspends. That's all that sched_clock needs, I think.
>
> So why does kvmclock have a custom sched_clock?

Migration between hosts with different TSC frequencies.

> On a related note, KVM doesn't pass the "invariant TSC" feature
> through to guests on my machine even though "invtsc" is set in QEMU
> and the kernel host code appears to support it. What gives?
>
> --Andy