* [PATCH] x86/xen: Add "xen_timer_slop" command line option @ 2019-03-22 18:29 thibodux 2019-03-22 22:10 ` Boris Ostrovsky 2019-04-24 18:47 ` Boris Ostrovsky 0 siblings, 2 replies; 22+ messages in thread From: thibodux @ 2019-03-22 18:29 UTC (permalink / raw) To: xen-devel, linux-kernel Cc: boris.ostrovsky, oleksandr_andrushchenko, tglx, jgross, thibodux, ryan.thibodeaux From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> Add a new command-line option "xen_timer_slop=<INT>" that sets the minimum delta of virtual Xen timers. This commit does not change the default timer slop value for virtual Xen timers. Lowering the timer slop value should improve the accuracy of virtual timers (e.g., better process dispatch latency), but it will likely increase the number of virtual timer interrupts (relative to the original slop setting). The original timer slop value has not changed since the introduction of the Xen-aware Linux kernel code. This commit provides users an opportunity to tune timer performance given the refinements to hardware and the Xen event channel processing. It also mirrors a feature in the Xen hypervisor - the "timer_slop" Xen command line option. Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> --- Documentation/admin-guide/kernel-parameters.txt | 7 +++++++ arch/x86/xen/time.c | 18 ++++++++++++++++-- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 858b6c0..fb58c84 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5129,6 +5129,13 @@ with /sys/devices/system/xen_memory/xen_memory0/scrub_pages. Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT. + xen_timer_slop= [X86-64,XEN] + Set the timer slop (in nanoseconds) for the virtual Xen + timers (default is 100000). 
This adjusts the minimum + delta of virtualized Xen timers, where lower values + improve timer resolution at the expense of processing + more timer interrupts. + xirc2ps_cs= [NET,PCMCIA] Format: <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]] diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index 6e29794..0393968 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -212,7 +212,7 @@ static int xen_timerop_set_next_event(unsigned long delta, return 0; } -static const struct clock_event_device xen_timerop_clockevent = { +static struct clock_event_device xen_timerop_clockevent __ro_after_init = { .name = "xen", .features = CLOCK_EVT_FEAT_ONESHOT, @@ -273,7 +273,7 @@ static int xen_vcpuop_set_next_event(unsigned long delta, return ret; } -static const struct clock_event_device xen_vcpuop_clockevent = { +static struct clock_event_device xen_vcpuop_clockevent __ro_after_init = { .name = "xen", .features = CLOCK_EVT_FEAT_ONESHOT, @@ -570,3 +570,17 @@ void __init xen_hvm_init_time_ops(void) x86_platform.set_wallclock = xen_set_wallclock; } #endif + +/* Kernel parameter to specify Xen timer slop */ +static int __init parse_xen_timer_slop(char *ptr) +{ + unsigned long slop = memparse(ptr, NULL); + + xen_timerop_clockevent.min_delta_ns = slop; + xen_timerop_clockevent.min_delta_ticks = slop; + xen_vcpuop_clockevent.min_delta_ns = slop; + xen_vcpuop_clockevent.min_delta_ticks = slop; + + return 0; +} +early_param("xen_timer_slop", parse_xen_timer_slop); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-22 18:29 [PATCH] x86/xen: Add "xen_timer_slop" command line option thibodux @ 2019-03-22 22:10 ` Boris Ostrovsky 2019-03-23 2:58 ` Dario Faggioli 2019-03-23 12:00 ` Ryan Thibodeaux 2019-04-24 18:47 ` Boris Ostrovsky 1 sibling, 2 replies; 22+ messages in thread From: Boris Ostrovsky @ 2019-03-22 22:10 UTC (permalink / raw) To: thibodux, xen-devel, linux-kernel Cc: oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On 3/22/19 2:29 PM, thibodux@gmail.com wrote: > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> > > Add a new command-line option "xen_timer_slop=<INT>" that sets the > minimum delta of virtual Xen timers. This commit does not change the > default timer slop value for virtual Xen timers. > > Lowering the timer slop value should improve the accuracy of virtual > timers (e.g., better process dispatch latency), but it will likely > increase the number of virtual timer interrupts (relative to the > original slop setting). > > The original timer slop value has not changed since the introduction > of the Xen-aware Linux kernel code. This commit provides users an > opportunity to tune timer performance given the refinements to > hardware and the Xen event channel processing. It also mirrors > a feature in the Xen hypervisor - the "timer_slop" Xen command line > option. Is there any data that shows effects of using this new parameter? 
-boris > > Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> > --- > Documentation/admin-guide/kernel-parameters.txt | 7 +++++++ > arch/x86/xen/time.c | 18 ++++++++++++++++-- > 2 files changed, 23 insertions(+), 2 deletions(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 858b6c0..fb58c84 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -5129,6 +5129,13 @@ > with /sys/devices/system/xen_memory/xen_memory0/scrub_pages. > Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT. > > + xen_timer_slop= [X86-64,XEN] > + Set the timer slop (in nanoseconds) for the virtual Xen > + timers (default is 100000). This adjusts the minimum > + delta of virtualized Xen timers, where lower values > + improve timer resolution at the expense of processing > + more timer interrupts. > + > xirc2ps_cs= [NET,PCMCIA] > Format: > <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]] > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c > index 6e29794..0393968 100644 > --- a/arch/x86/xen/time.c > +++ b/arch/x86/xen/time.c > @@ -212,7 +212,7 @@ static int xen_timerop_set_next_event(unsigned long delta, > return 0; > } > > -static const struct clock_event_device xen_timerop_clockevent = { > +static struct clock_event_device xen_timerop_clockevent __ro_after_init = { > .name = "xen", > .features = CLOCK_EVT_FEAT_ONESHOT, > > @@ -273,7 +273,7 @@ static int xen_vcpuop_set_next_event(unsigned long delta, > return ret; > } > > -static const struct clock_event_device xen_vcpuop_clockevent = { > +static struct clock_event_device xen_vcpuop_clockevent __ro_after_init = { > .name = "xen", > .features = CLOCK_EVT_FEAT_ONESHOT, > > @@ -570,3 +570,17 @@ void __init xen_hvm_init_time_ops(void) > x86_platform.set_wallclock = xen_set_wallclock; > } > #endif > + > +/* Kernel parameter to specify Xen timer 
slop */ > +static int __init parse_xen_timer_slop(char *ptr) > +{ > + unsigned long slop = memparse(ptr, NULL); > + > + xen_timerop_clockevent.min_delta_ns = slop; > + xen_timerop_clockevent.min_delta_ticks = slop; > + xen_vcpuop_clockevent.min_delta_ns = slop; > + xen_vcpuop_clockevent.min_delta_ticks = slop; > + > + return 0; > +} > +early_param("xen_timer_slop", parse_xen_timer_slop); ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-22 22:10 ` Boris Ostrovsky @ 2019-03-23 2:58 ` Dario Faggioli 2019-03-23 10:41 ` luca abeni 2019-03-23 12:00 ` Ryan Thibodeaux 1 sibling, 1 reply; 22+ messages in thread From: Dario Faggioli @ 2019-03-23 2:58 UTC (permalink / raw) To: Boris Ostrovsky, thibodux, xen-devel, linux-kernel Cc: oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux, luca abeni On Fri, 2019-03-22 at 18:10 -0400, Boris Ostrovsky wrote: > On 3/22/19 2:29 PM, thibodux@gmail.com wrote: > > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> > > > > The original timer slop value has not changed since the > > introduction > > of the Xen-aware Linux kernel code. This commit provides users an > > opportunity to tune timer performance given the refinements to > > hardware and the Xen event channel processing. It also mirrors > > a feature in the Xen hypervisor - the "timer_slop" Xen command line > > option. > > Is there any data that shows effects of using this new parameter? > Yes, I've done some research and experiments on this. I did it together with a friend, which I'm Cc-ing, as I'm not sure we're ready/capable to share the results, yet (Luca?). What I think I can anticipate is that having such a high value for timer slop in the kernel, for the Xen clockevent device is (together with the also quite high default value of timer_slop in Xen itself) responsible for really high vcpu wakeup latencies. Lowering those two values, reduces such latencies dramatically.
Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-23 2:58 ` Dario Faggioli @ 2019-03-23 10:41 ` luca abeni 2019-03-25 12:05 ` luca abeni 0 siblings, 1 reply; 22+ messages in thread From: luca abeni @ 2019-03-23 10:41 UTC (permalink / raw) To: Dario Faggioli Cc: Boris Ostrovsky, thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux Hi all, On Sat, 23 Mar 2019 03:58:16 +0100 Dario Faggioli <dfaggioli@suse.com> wrote: > On Fri, 2019-03-22 at 18:10 -0400, Boris Ostrovsky wrote: > > On 3/22/19 2:29 PM, thibodux@gmail.com wrote: > > > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> > > > > > > The original timer slop value has not changed since the > > > introduction > > > of the Xen-aware Linux kernel code. This commit provides users an > > > opportunity to tune timer performance given the refinements to > > > hardware and the Xen event channel processing. It also mirrors > > > a feature in the Xen hypervisor - the "timer_slop" Xen command > > > line option. > > > > Is there any data that shows effects of using this new parameter? > > > Yes, I've done some research and experiments on this. I did it > together with a friend, which I'm Cc-ing, as I'm not sure we're > ready/capable to share the results, yet (Luca?). I think we can easily share the experimental data (cyclictest output and plots). Moreover, we can share the scripts and tools for running the experiments (so, everyone can easily reproduce the numbers by simply typing "make" and waiting for some time :) I'll try to package the results and the scripts/tools this evening, and I'll send them. Luca > > What I think I can anticipate is that having such a high value for > timer slop in the kernel, for the Xen clockevent device is (together > with the also quite high default value of timer_slop in Xen itself) > responsible for really high vcpu wakeup latencies. > > Lowering those two values, reduces such latencies dramatically. 
> > Regards, > Dario ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-23 10:41 ` luca abeni @ 2019-03-25 12:05 ` luca abeni 2019-03-25 13:43 ` Boris Ostrovsky 0 siblings, 1 reply; 22+ messages in thread From: luca abeni @ 2019-03-25 12:05 UTC (permalink / raw) To: Dario Faggioli Cc: Boris Ostrovsky, thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux Hi all, On Sat, 23 Mar 2019 11:41:51 +0100 luca abeni <luca.abeni@santannapisa.it> wrote: [...] > > > Is there any data that shows effects of using this new parameter? > > > > > Yes, I've done some research and experiments on this. I did it > > together with a friend, which I'm Cc-ing, as I'm not sure we're > > ready/capable to share the results, yet (Luca?). > > I think we can easily share the experimental data (cyclictest output > and plots). > > Moreover, we can share the scripts and tools for running the > experiments (so, everyone can easily reproduce the numbers by simply > typing "make" and waiting for some time :) > > > I'll try to package the results and the scripts/tools this evening, > and I'll send them. Sorry for the delay. I put some quick results here: http://retis.santannapisa.it/luca/XenTimers/ (there also is a link to the scripts to be used for reproducing the results). The latencies have been measured by running cyclictest in the guest (see the scripts for details). The picture shows the latencies measured with an unpatched guest kernel and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small value :). All the experiments have been performed booting the hypervisor with a small timer_slop (the hypervisor's one) value. So, they show that decreasing the hypervisor's timer_slop is not enough to measure low latencies with cyclictest. Luca ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-25 12:05 ` luca abeni @ 2019-03-25 13:43 ` Boris Ostrovsky 2019-03-25 14:07 ` luca abeni ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Boris Ostrovsky @ 2019-03-25 13:43 UTC (permalink / raw) To: luca abeni, Dario Faggioli Cc: thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On 3/25/19 8:05 AM, luca abeni wrote: > Hi all, > > On Sat, 23 Mar 2019 11:41:51 +0100 > luca abeni <luca.abeni@santannapisa.it> wrote: > [...] >>>> Is there any data that shows effects of using this new parameter? >>>> >>> Yes, I've done some research and experiments on this. I did it >>> together with a friend, which I'm Cc-ing, as I'm not sure we're >>> ready/capable to share the results, yet (Luca?). >> I think we can easily share the experimental data (cyclictest output >> and plots). >> >> Moreover, we can share the scripts and tools for running the >> experiments (so, everyone can easily reproduce the numbers by simply >> typing "make" and waiting for some time :) >> >> >> I'll try to package the results and the scripts/tools this evening, >> and I'll send them. > Sorry for the delay. I put some quick results here: > http://retis.santannapisa.it/luca/XenTimers/ > (there also is a link to the scripts to be used for reproducing the > results). The latencies have been measured by running cyclictest in the > guest (see the scripts for details). > > The picture shows the latencies measured with an unpatched guest kernel > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small > value :). > All the experiments have been performed booting the hypervisor with a > small timer_slop (the hypervisor's one) value. So, they show that > decreasing the hypervisor's timer_slop is not enough to measure low > latencies with cyclictest. I have a couple of questions: * Does it make sense to make this a tunable for other clockevent devices as well? 
* This patch adjusts min value. Could max value (ever) need a similar adjustment? -boris ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-25 13:43 ` Boris Ostrovsky @ 2019-03-25 14:07 ` luca abeni 2019-03-25 14:11 ` Ryan Thibodeaux 2019-03-26 9:13 ` Dario Faggioli 2 siblings, 0 replies; 22+ messages in thread From: luca abeni @ 2019-03-25 14:07 UTC (permalink / raw) To: Boris Ostrovsky Cc: Dario Faggioli, thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux Hi, On Mon, 25 Mar 2019 09:43:20 -0400 Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: [...] > > http://retis.santannapisa.it/luca/XenTimers/ > > (there also is a link to the scripts to be used for reproducing the > > results). The latencies have been measured by running cyclictest in > > the guest (see the scripts for details). > > > > The picture shows the latencies measured with an unpatched guest > > kernel and with a guest kernel having TIMER_SLOP set to 1000 > > (arbitrary small value :). > > All the experiments have been performed booting the hypervisor with > > a small timer_slop (the hypervisor's one) value. So, they show that > > decreasing the hypervisor's timer_slop is not enough to measure low > > latencies with cyclictest. > > > > I have a couple of questions: > * Does it make sense to make this a tunable for other clockevent > devices as well? > * This patch adjusts min value. Could max value (ever) need a similar > adjustment? Sorry, I do not know much about clockevent devices, so I have no answers to these questions... What I can say is that when I repeated the cyclictest experiments on VMs using a different clockevent device (lapic) I did not measure large latencies. So, I guess the "lapic" clockevent device already defaults to a smaller min value (not sure about other clockevent devices, I do not know how to test them). Luca ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-25 13:43 ` Boris Ostrovsky 2019-03-25 14:07 ` luca abeni @ 2019-03-25 14:11 ` Ryan Thibodeaux 2019-03-25 17:36 ` Ryan Thibodeaux 2019-03-25 18:31 ` Boris Ostrovsky 2019-03-26 9:13 ` Dario Faggioli 2 siblings, 2 replies; 22+ messages in thread From: Ryan Thibodeaux @ 2019-03-25 14:11 UTC (permalink / raw) To: Boris Ostrovsky Cc: luca abeni, Dario Faggioli, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Mon, Mar 25, 2019 at 09:43:20AM -0400, Boris Ostrovsky wrote: > On 3/25/19 8:05 AM, luca abeni wrote: > > Hi all, > > > > On Sat, 23 Mar 2019 11:41:51 +0100 > > luca abeni <luca.abeni@santannapisa.it> wrote: > > [...] > >>>> Is there any data that shows effects of using this new parameter? > >>>> > >>> Yes, I've done some research and experiments on this. I did it > >>> together with a friend, which I'm Cc-ing, as I'm not sure we're > >>> ready/capable to share the results, yet (Luca?). > >> I think we can easily share the experimental data (cyclictest output > >> and plots). > >> > >> Moreover, we can share the scripts and tools for running the > >> experiments (so, everyone can easily reproduce the numbers by simply > >> typing "make" and waiting for some time :) > >> > >> > >> I'll try to package the results and the scripts/tools this evening, > >> and I'll send them. > > Sorry for the delay. I put some quick results here: > > http://retis.santannapisa.it/luca/XenTimers/ > > (there also is a link to the scripts to be used for reproducing the > > results). The latencies have been measured by running cyclictest in the > > guest (see the scripts for details). > > > > The picture shows the latencies measured with an unpatched guest kernel > > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small > > value :). > > All the experiments have been performed booting the hypervisor with a > > small timer_slop (the hypervisor's one) value. 
So, they show that > > decreasing the hypervisor's timer_slop is not enough to measure low > > latencies with cyclictest. > > > > I have a couple of questions: > * Does it make sense to make this a tunable for other clockevent devices > as well? I gather that would be on a case-by-case basis for very specific ones. For many timers in the kernel, the minimums are determined by the actual hardware backing the timer, and the minimum can be adjusted by the clockevent code. This case is special since it is entirely a software-based implementation in the kernel, where the actual timer implementation is in the Xen hypervisor. > * This patch adjusts min value. Could max value (ever) need a similar > adjustment? I cannot think of such a case where that would be helpful, but I cannot rule that out or speak as an authority. - Ryan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-25 14:11 ` Ryan Thibodeaux @ 2019-03-25 17:36 ` Ryan Thibodeaux 2019-03-25 18:31 ` Boris Ostrovsky 1 sibling, 0 replies; 22+ messages in thread From: Ryan Thibodeaux @ 2019-03-25 17:36 UTC (permalink / raw) To: Boris Ostrovsky Cc: luca abeni, Dario Faggioli, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Mon, Mar 25, 2019 at 10:11:38AM -0400, Ryan Thibodeaux wrote: > > > [...] > > >>>> Is there any data that shows effects of using this new parameter? > > >>>> Continuing with the experimental data conversation (thanks to Luca and Dario for being so generous), I am providing more results from quick tests this morning. I ran the same sequence of tests four times with the same hardware, hypervisor, and Linux guest setup. The only changes between runs were adjustments to the slop settings in Xen and Linux. This was on a build of Xen 4.10 and a Linux guest running the current Xen tip.git kernel with my patch. For each sequence, I ran two variations of cyclictest on an isolated processor. The first test used an interval of 50 microseconds and the second test used an interval of 1000 microseconds, passing "-i50" and "-i1000" arguments to cyclictest, respectively. The variations of the sequences are as follows:

#1 - default slops: Xen@50000, Linux@100000
#2 - lowering Linux: Xen@50000, Linux@5000
#3 - lowering Xen: Xen@5000, Linux@100000
#4 - lowering both: Xen@5000, Linux@5000

The cleaned-up test output is below. I am only showing the total stats for each run and the number of spikes, i.e., samples that went over 100 microseconds. I do not record each sample value like Luca and Dario, because I want to eliminate as many variables as possible, e.g., eliminating the overhead of writing out raw results.
Looking at the results, you can see that lowering only the Linux slop (with my proposed patch) does reduce the overall PDL stats for the shorter interval, but it especially lowers the spikes, in both cases. Even in test #3, where the Xen slop was lowered, the spikes are a problem at the default Linux slop. Reiterating what Luca and Dario said, lowering both slops is the way to consistent results for both interval configurations. Note: even better stats can likely be achieved with more tuning and using the RT patchset. These results just focus on a non-specialized configuration.

...
##############################
# Timer Slop: Xen (default, 50000) | Guest (default, 100000)

# Cyclictest Interval (-i50)
Min: 62 Avg: 127 Max: 212
Spikes (over 100): 3892034

# Cyclictest Interval (-i1000)
Min: 24 Avg: 45 Max: 156
Spikes (over 100): 27

##############################
# Timer Slop: Xen (default, 50000) | Guest (5000)

# Cyclictest Interval (-i50)
Min: 25 Avg: 78 Max: 230
Spikes (over 100): 274549

# Cyclictest Interval (-i1000)
Min: 37 Avg: 45 Max: 82
Spikes (over 100): 0

##############################
# Timer Slop: Xen (5000) | Guest (default, 100000)

# Cyclictest Interval (-i50)
Min: 61 Avg: 126 Max: 226
Spikes (over 100): 3877860

# Cyclictest Interval (-i1000)
Min: 37 Avg: 45 Max: 74
Spikes (over 100): 0

##############################
# Timer Slop: Xen (5000) | Guest (5000)

# Cyclictest Interval (-i50)
Min: 13 Avg: 30 Max: 150
Spikes (over 100): 120

# Cyclictest Interval (-i1000)
Min: 37 Avg: 45 Max: 97
Spikes (over 100): 0
...

- Ryan ^ permalink raw reply [flat|nested] 22+ messages in thread
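For reference, the four sequences above map onto the two slop knobs roughly as follows (a sketch only; the exact bootloader entries depend on the setup, and the defaults shown are the ones cited in this thread):

```
# #1 - default slops:  no extra options
#                      (Xen timer_slop=50000, guest xen_timer_slop=100000)
# #2 - lowering Linux: guest kernel command line:  xen_timer_slop=5000
# #3 - lowering Xen:   Xen command line:           timer_slop=5000
# #4 - lowering both:  timer_slop=5000 (Xen) and xen_timer_slop=5000 (guest)
```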
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-25 14:11 ` Ryan Thibodeaux 2019-03-25 17:36 ` Ryan Thibodeaux @ 2019-03-25 18:31 ` Boris Ostrovsky 1 sibling, 0 replies; 22+ messages in thread From: Boris Ostrovsky @ 2019-03-25 18:31 UTC (permalink / raw) To: Ryan Thibodeaux Cc: luca abeni, Dario Faggioli, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On 3/25/19 10:11 AM, Ryan Thibodeaux wrote: > On Mon, Mar 25, 2019 at 09:43:20AM -0400, Boris Ostrovsky wrote: >> On 3/25/19 8:05 AM, luca abeni wrote: >>> Hi all, >>> >>> On Sat, 23 Mar 2019 11:41:51 +0100 >>> luca abeni <luca.abeni@santannapisa.it> wrote: >>> [...] >>>>>> Is there any data that shows effects of using this new parameter? >>>>>> >>>>> Yes, I've done some research and experiments on this. I did it >>>>> together with a friend, which I'm Cc-ing, as I'm not sure we're >>>>> ready/capable to share the results, yet (Luca?). >>>> I think we can easily share the experimental data (cyclictest output >>>> and plots). >>>> >>>> Moreover, we can share the scripts and tools for running the >>>> experiments (so, everyone can easily reproduce the numbers by simply >>>> typing "make" and waiting for some time :) >>>> >>>> >>>> I'll try to package the results and the scripts/tools this evening, >>>> and I'll send them. >>> Sorry for the delay. I put some quick results here: >>> http://retis.santannapisa.it/luca/XenTimers/ >>> (there also is a link to the scripts to be used for reproducing the >>> results). The latencies have been measured by running cyclictest in the >>> guest (see the scripts for details). >>> >>> The picture shows the latencies measured with an unpatched guest kernel >>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary small >>> value :). >>> All the experiments have been performed booting the hypervisor with a >>> small timer_slop (the hypervisor's one) value. 
So, they show that >>> decreasing the hypervisor's timer_slop is not enough to measure low >>> latencies with cyclictest. >> >> >> I have a couple of questions: >> * Does it make sense to make this a tunable for other clockevent devices >> as well? > I gather that would be on a case-by-case basis for very specific > ones. > > For many timers in the kernel, the minimums are determined by the > actual hardware backing the timer, and the minimum can be > adjusted by the clockevent code. This case is special since it > is entirely a software-based implementation in the kernel, where > the actual timer implementation is in the Xen hypervisor. > >> * This patch adjusts min value. Could max value (ever) need a similar >> adjustment? > I cannot think of such a case where that would be helpful, but I > cannot rule that out or speak as an authority. I am asking mostly because you are introducing new interface and I don't want it to change in the future. I suppose if later we decide to add control for the max value we could just expand your current proposal to xen_timer_slop=[min],[max] and keep it to be back-compatible. For the patch: Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-25 13:43 ` Boris Ostrovsky 2019-03-25 14:07 ` luca abeni 2019-03-25 14:11 ` Ryan Thibodeaux @ 2019-03-26 9:13 ` Dario Faggioli 2019-03-26 11:12 ` luca abeni 2019-03-26 23:21 ` Boris Ostrovsky 2 siblings, 2 replies; 22+ messages in thread From: Dario Faggioli @ 2019-03-26 9:13 UTC (permalink / raw) To: Boris Ostrovsky, luca abeni Cc: thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote: > On 3/25/19 8:05 AM, luca abeni wrote: > > > > The picture shows the latencies measured with an unpatched guest > > kernel > > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary > > small > > value :). > > All the experiments have been performed booting the hypervisor with > > a > > small timer_slop (the hypervisor's one) value. So, they show that > > decreasing the hypervisor's timer_slop is not enough to measure low > > latencies with cyclictest. > > I have a couple of questions: > * Does it make sense to make this a tunable for other clockevent > devices > as well? > So, AFAIUI, the thing is as follows. In clockevents_program_event(), we keep the delta between now and the next timer event within dev->max_delta_ns and dev->min_delta_ns:

delta = min(delta, (int64_t) dev->max_delta_ns);
delta = max(delta, (int64_t) dev->min_delta_ns);

For Xen (well, for the Xen clock) we have:

.max_delta_ns = 0xffffffff,
.min_delta_ns = TIMER_SLOP,

which means a guest can't ask for a timer to fire earlier than 100us ahead, which is a bit too coarse, especially on contemporary hardware.
For "lapic_deadline" (which was what was in use in KVM guests, in our experiments) we have: lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent); lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent); Which means max is 0x7FFFFF device ticks, and min is 0xF. clockevent_delta2ns() does the conversion from ticks to ns, basing on the results of the APIC calibration process. It calls cev_delta2ns() which does some scaling, shifting, divs, etc, and, at the very end, this: /* Deltas less than 1usec are pointless noise */ return clc > 1000 ? clc : 1000; So, as Ryan is also saying, the actual minimum, in this case, depends on hardware, with a sanity check of "never below 1us" (which is quite smaller than 100us!) Of course, the actual granularity depends on hardware in the Xen case as well, but that is handled in Xen itself. And we have mechanisms in place in there to avoid timer interrupt storms (like, ahem, the Xen's 'timer_slop' boot parameter... :-P) And this is basically why I was also thinking we can/should lower the default value of TIMER_SLOP, here in the Xen clock implementation in Linux. > * This patch adjusts min value. Could max value (ever) need a similar > adjustment? > Well, for Xen, it's already 0xffffffff. I don't see use cases when one would want a smaller max. Wanting an higher max *might* be of some interest, e.g., for power management, if the first timer event is 1min ahead, and you don't want to be woken up every (if my math is right) 4 secs. But we'd have to see if that actually works, not to mention that 4 secs is already large enough, IMHO, that it's unlikely we'll be really sleeping for that much time without having to wake up for one reason or another. 
:-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-26 9:13 ` Dario Faggioli @ 2019-03-26 11:12 ` luca abeni 2019-03-26 11:41 ` Ryan Thibodeaux 2019-03-26 23:21 ` Boris Ostrovsky 1 sibling, 1 reply; 22+ messages in thread From: luca abeni @ 2019-03-26 11:12 UTC (permalink / raw) To: Dario Faggioli Cc: Boris Ostrovsky, thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux Hi all, On Tue, 26 Mar 2019 10:13:32 +0100 Dario Faggioli <dfaggioli@suse.com> wrote: > On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote: > > On 3/25/19 8:05 AM, luca abeni wrote: > > > > > > The picture shows the latencies measured with an unpatched guest > > > kernel > > > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary > > > small > > > value :). > > > All the experiments have been performed booting the hypervisor > > > with a > > > small timer_slop (the hypervisor's one) value. So, they show that > > > decreasing the hypervisor's timer_slop is not enough to measure > > > low latencies with cyclictest. > > > > I have a couple of questions: > > * Does it make sense to make this a tunable for other clockevent > > devices > > as well? > > > So, AFAIUI, the thing is as follows. In clockevents_program_event(), > we keep the delta between now and the next timer event within > dev->max_delta_ns and dev->min_delta_ns: > > delta = min(delta, (int64_t) dev->max_delta_ns); > delta = max(delta, (int64_t) dev->min_delta_ns); > > For Xen (well, for the Xen clock) we have: > > .max_delta_ns = 0xffffffff, > .min_delta_ns = TIMER_SLOP, > > which means a guest can't ask for a timer to fire earlier than 100us [...] I know this is not fully related with the current discussion, but in these days I had a look at the code again, and... The comment for TIMER_SLOP in arch/x86/xen/time.c says: /* Xen may fire a timer up to this many ns early */ Isn't the comment wrong? shouldn't it be "...many ns late" instead of "early"? 
Thanks,
Luca
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-26 11:12 ` luca abeni @ 2019-03-26 11:41 ` Ryan Thibodeaux 0 siblings, 0 replies; 22+ messages in thread From: Ryan Thibodeaux @ 2019-03-26 11:41 UTC (permalink / raw) To: luca abeni Cc: Dario Faggioli, Boris Ostrovsky, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Tue, Mar 26, 2019 at 12:12:56PM +0100, luca abeni wrote: > Hi all, > > On Tue, 26 Mar 2019 10:13:32 +0100 > Dario Faggioli <dfaggioli@suse.com> wrote: > > > On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote: > > > On 3/25/19 8:05 AM, luca abeni wrote: > > > > > > > > The picture shows the latencies measured with an unpatched guest > > > > kernel > > > > and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary > > > > small > > > > value :). > > > > All the experiments have been performed booting the hypervisor > > > > with a > > > > small timer_slop (the hypervisor's one) value. So, they show that > > > > decreasing the hypervisor's timer_slop is not enough to measure > > > > low latencies with cyclictest. > > > > > > I have a couple of questions: > > > * Does it make sense to make this a tunable for other clockevent > > > devices > > > as well? > > > > > So, AFAIUI, the thing is as follows. In clockevents_program_event(), > > we keep the delta between now and the next timer event within > > dev->max_delta_ns and dev->min_delta_ns: > > > > delta = min(delta, (int64_t) dev->max_delta_ns); > > delta = max(delta, (int64_t) dev->min_delta_ns); > > > > For Xen (well, for the Xen clock) we have: > > > > .max_delta_ns = 0xffffffff, > > .min_delta_ns = TIMER_SLOP, > > > > which means a guest can't ask for a timer to fire earlier than 100us > [...] > > I know this is not fully related with the current discussion, but in > these days I had a look at the code again, and... 
> The comment for TIMER_SLOP in arch/x86/xen/time.c says:
> /* Xen may fire a timer up to this many ns early */
>
> Isn't the comment wrong? shouldn't it be "...many ns late" instead of
> "early"?

I would say it is something else entirely. If you look at
"clockevents_program_event()" in kernel/time/clockevents.c, you see
that the min_delta_ns value sets the limit or granularity for the
clock's sleep time. Basically, it is the minimum amount of sleep one
can set for the next event for the clock in question.

- Ryan
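The clamping Ryan is referring to can be sketched in userspace C. This is a simplified stand-in for the kernel's clockevents_program_event(), not the actual kernel code; the struct keeps only the two fields relevant here, and the values used below are the ones discussed in this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct clock_event_device: only the two
 * fields that bound the programmable delta. */
struct clock_event_dev {
    int64_t min_delta_ns;
    int64_t max_delta_ns;
};

/* Mirrors the clamping in clockevents_program_event(): the requested
 * delta (ns until the next event) is bounded by the device limits. */
static int64_t clamp_delta(const struct clock_event_dev *dev, int64_t delta)
{
    if (delta > dev->max_delta_ns)
        delta = dev->max_delta_ns;
    if (delta < dev->min_delta_ns)
        delta = dev->min_delta_ns;
    return delta;
}
```

With the default slop of 100000 ns, a request to fire in 50 us is silently stretched to 100 us; with xen_timer_slop=10000 the same request is honored as-is.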
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-26 9:13 ` Dario Faggioli 2019-03-26 11:12 ` luca abeni @ 2019-03-26 23:21 ` Boris Ostrovsky 2019-03-27 10:00 ` Ryan Thibodeaux 1 sibling, 1 reply; 22+ messages in thread From: Boris Ostrovsky @ 2019-03-26 23:21 UTC (permalink / raw) To: Dario Faggioli, luca abeni Cc: thibodux, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux [-- Attachment #1.1: Type: text/plain, Size: 2744 bytes --] On 3/26/19 5:13 AM, Dario Faggioli wrote: > On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote: >> On 3/25/19 8:05 AM, luca abeni wrote: >>> The picture shows the latencies measured with an unpatched guest >>> kernel >>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary >>> small >>> value :). >>> All the experiments have been performed booting the hypervisor with >>> a >>> small timer_slop (the hypervisor's one) value. So, they show that >>> decreasing the hypervisor's timer_slop is not enough to measure low >>> latencies with cyclictest. >> I have a couple of questions: >> * Does it make sense to make this a tunable for other clockevent >> devices >> as well? >> > So, AFAIUI, the thing is as follows. In clockevents_program_event(), we > keep the delta between now and the next timer event within > dev->max_delta_ns and dev->min_delta_ns: > > delta = min(delta, (int64_t) dev->max_delta_ns); > delta = max(delta, (int64_t) dev->min_delta_ns); > > For Xen (well, for the Xen clock) we have: > > .max_delta_ns = 0xffffffff, > .min_delta_ns = TIMER_SLOP, > > which means a guest can't ask for a timer to fire earlier than 100us > ahead, which is a bit too coarse, especially on contemporary hardware. 
> > For "lapic_deadline" (which was what was in use in KVM guests, in our > experiments) we have: > > lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent); > lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent); > > Which means max is 0x7FFFFF device ticks, and min is 0xF. > clockevent_delta2ns() does the conversion from ticks to ns, basing on > the results of the APIC calibration process. It calls cev_delta2ns() > which does some scaling, shifting, divs, etc, and, at the very end, > this: > > /* Deltas less than 1usec are pointless noise */ > return clc > 1000 ? clc : 1000; > > So, as Ryan is also saying, the actual minimum, in this case, depends > on hardware, with a sanity check of "never below 1us" (which is quite > smaller than 100us!) > > Of course, the actual granularity depends on hardware in the Xen case > as well, but that is handled in Xen itself. And we have mechanisms in > place in there to avoid timer interrupt storms (like, ahem, the Xen's > 'timer_slop' boot parameter... :-P) > > And this is basically why I was also thinking we can/should lower the > default value of TIMER_SLOP, here in the Xen clock implementation in > Linux. What do you think would be a sane value? 10us? Should we then still keep this patch? My concern would be that if we change the current value and it turns out to be very wrong we'd then have no recourse. -boris [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
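For reference, the tick-to-ns conversion and the 1 us floor Dario quotes can be sketched like this. This is a simplified model, not the kernel function: the real cev_delta2ns() also handles rounding modes and 64-bit overflow, and the mult/shift pair comes from APIC calibration rather than being chosen by hand:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified sketch of cev_delta2ns(): scale device ticks up by the
 * calibrated shift, divide by the calibrated mult to get nanoseconds,
 * then apply the floor quoted above. */
static uint64_t cev_delta2ns_sketch(uint64_t ticks, uint32_t mult, uint32_t shift)
{
    uint64_t clc = (ticks << shift) / mult;

    /* Deltas less than 1usec are pointless noise */
    return clc > 1000 ? clc : 1000;
}
```

So even a device whose minimum tick count converts to a few hundred nanoseconds ends up with min_delta_ns of 1000 — two orders of magnitude below the Xen clockevent's hardcoded 100000.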
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-26 23:21 ` Boris Ostrovsky @ 2019-03-27 10:00 ` Ryan Thibodeaux 2019-03-27 14:46 ` Boris Ostrovsky 0 siblings, 1 reply; 22+ messages in thread From: Ryan Thibodeaux @ 2019-03-27 10:00 UTC (permalink / raw) To: Boris Ostrovsky Cc: Dario Faggioli, luca abeni, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote: > On 3/26/19 5:13 AM, Dario Faggioli wrote: > > On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote: > >> On 3/25/19 8:05 AM, luca abeni wrote: > >>> The picture shows the latencies measured with an unpatched guest > >>> kernel > >>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary > >>> small > >>> value :). > >>> All the experiments have been performed booting the hypervisor with > >>> a > >>> small timer_slop (the hypervisor's one) value. So, they show that > >>> decreasing the hypervisor's timer_slop is not enough to measure low > >>> latencies with cyclictest. > >> I have a couple of questions: > >> * Does it make sense to make this a tunable for other clockevent > >> devices > >> as well? > >> > > So, AFAIUI, the thing is as follows. In clockevents_program_event(), we > > keep the delta between now and the next timer event within > > dev->max_delta_ns and dev->min_delta_ns: > > > > delta = min(delta, (int64_t) dev->max_delta_ns); > > delta = max(delta, (int64_t) dev->min_delta_ns); > > > > For Xen (well, for the Xen clock) we have: > > > > .max_delta_ns = 0xffffffff, > > .min_delta_ns = TIMER_SLOP, > > > > which means a guest can't ask for a timer to fire earlier than 100us > > ahead, which is a bit too coarse, especially on contemporary hardware. 
> > > > For "lapic_deadline" (which was what was in use in KVM guests, in our > > experiments) we have: > > > > lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent); > > lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent); > > > > Which means max is 0x7FFFFF device ticks, and min is 0xF. > > clockevent_delta2ns() does the conversion from ticks to ns, basing on > > the results of the APIC calibration process. It calls cev_delta2ns() > > which does some scaling, shifting, divs, etc, and, at the very end, > > this: > > > > /* Deltas less than 1usec are pointless noise */ > > return clc > 1000 ? clc : 1000; > > > > So, as Ryan is also saying, the actual minimum, in this case, depends > > on hardware, with a sanity check of "never below 1us" (which is quite > > smaller than 100us!) > > > > Of course, the actual granularity depends on hardware in the Xen case > > as well, but that is handled in Xen itself. And we have mechanisms in > > place in there to avoid timer interrupt storms (like, ahem, the Xen's > > 'timer_slop' boot parameter... :-P) > > > > And this is basically why I was also thinking we can/should lower the > > default value of TIMER_SLOP, here in the Xen clock implementation in > > Linux. > > What do you think would be a sane value? 10us? Should we then still keep > this patch? > > My concern would be that if we change the current value and it turns out > to be very wrong we'd then have no recourse. > > > -boris > Speaking out of turn but as a participant in this thread, I would not assume to change the default value for all cases without significant testing by the community, touching a variety of configurations. It feels like changing the default has a non-trivial amount of unknowns that would need to be addressed. Not surprisingly, I am biased to the approach of my patch which does not change the default but offers flexibility to all. - Ryan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-27 10:00 ` Ryan Thibodeaux @ 2019-03-27 14:46 ` Boris Ostrovsky 2019-03-27 14:59 ` Ryan Thibodeaux 2019-03-27 15:19 ` Dario Faggioli 0 siblings, 2 replies; 22+ messages in thread From: Boris Ostrovsky @ 2019-03-27 14:46 UTC (permalink / raw) To: Ryan Thibodeaux Cc: Dario Faggioli, luca abeni, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On 3/27/19 6:00 AM, Ryan Thibodeaux wrote: > On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote: >> On 3/26/19 5:13 AM, Dario Faggioli wrote: >>> On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote: >>>> On 3/25/19 8:05 AM, luca abeni wrote: >>>>> The picture shows the latencies measured with an unpatched guest >>>>> kernel >>>>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary >>>>> small >>>>> value :). >>>>> All the experiments have been performed booting the hypervisor with >>>>> a >>>>> small timer_slop (the hypervisor's one) value. So, they show that >>>>> decreasing the hypervisor's timer_slop is not enough to measure low >>>>> latencies with cyclictest. >>>> I have a couple of questions: >>>> * Does it make sense to make this a tunable for other clockevent >>>> devices >>>> as well? >>>> >>> So, AFAIUI, the thing is as follows. In clockevents_program_event(), we >>> keep the delta between now and the next timer event within >>> dev->max_delta_ns and dev->min_delta_ns: >>> >>> delta = min(delta, (int64_t) dev->max_delta_ns); >>> delta = max(delta, (int64_t) dev->min_delta_ns); >>> >>> For Xen (well, for the Xen clock) we have: >>> >>> .max_delta_ns = 0xffffffff, >>> .min_delta_ns = TIMER_SLOP, >>> >>> which means a guest can't ask for a timer to fire earlier than 100us >>> ahead, which is a bit too coarse, especially on contemporary hardware. 
>>> >>> For "lapic_deadline" (which was what was in use in KVM guests, in our >>> experiments) we have: >>> >>> lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent); >>> lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent); >>> >>> Which means max is 0x7FFFFF device ticks, and min is 0xF. >>> clockevent_delta2ns() does the conversion from ticks to ns, basing on >>> the results of the APIC calibration process. It calls cev_delta2ns() >>> which does some scaling, shifting, divs, etc, and, at the very end, >>> this: >>> >>> /* Deltas less than 1usec are pointless noise */ >>> return clc > 1000 ? clc : 1000; >>> >>> So, as Ryan is also saying, the actual minimum, in this case, depends >>> on hardware, with a sanity check of "never below 1us" (which is quite >>> smaller than 100us!) >>> >>> Of course, the actual granularity depends on hardware in the Xen case >>> as well, but that is handled in Xen itself. And we have mechanisms in >>> place in there to avoid timer interrupt storms (like, ahem, the Xen's >>> 'timer_slop' boot parameter... :-P) >>> >>> And this is basically why I was also thinking we can/should lower the >>> default value of TIMER_SLOP, here in the Xen clock implementation in >>> Linux. >> What do you think would be a sane value? 10us? Should we then still keep >> this patch? >> >> My concern would be that if we change the current value and it turns out >> to be very wrong we'd then have no recourse. >> >> >> -boris >> > Speaking out of turn but as a participant in this thread, I would not > assume to change the default value for all cases without significant > testing by the community, touching a variety of configurations. > > It feels like changing the default has a non-trivial amount of > unknowns that would need to be addressed. > > Not surprisingly, I am biased to the approach of my patch which > does not change the default but offers flexibility to all. 
If we are to change the default it would be good to at least collect
some data on the distribution of delta values in
clockevents_program_event(). But as I said, I'd keep the patch.

Also, as far as the comment describing TIMER_SLOP, I agree that it is
rather misleading. I can replace it with
/* Minimum amount of time until next clock event fires */; I can do it
while committing, so no need to resend.

-boris
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-27 14:46 ` Boris Ostrovsky @ 2019-03-27 14:59 ` Ryan Thibodeaux 2019-03-27 15:19 ` Dario Faggioli 1 sibling, 0 replies; 22+ messages in thread From: Ryan Thibodeaux @ 2019-03-27 14:59 UTC (permalink / raw) To: Boris Ostrovsky Cc: Dario Faggioli, luca abeni, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Wed, Mar 27, 2019 at 10:46:21AM -0400, Boris Ostrovsky wrote: > On 3/27/19 6:00 AM, Ryan Thibodeaux wrote: > > On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote: > >> On 3/26/19 5:13 AM, Dario Faggioli wrote: > >>> On Mon, 2019-03-25 at 09:43 -0400, Boris Ostrovsky wrote: > >>>> On 3/25/19 8:05 AM, luca abeni wrote: > >>>>> The picture shows the latencies measured with an unpatched guest > >>>>> kernel > >>>>> and with a guest kernel having TIMER_SLOP set to 1000 (arbitrary > >>>>> small > >>>>> value :). > >>>>> All the experiments have been performed booting the hypervisor with > >>>>> a > >>>>> small timer_slop (the hypervisor's one) value. So, they show that > >>>>> decreasing the hypervisor's timer_slop is not enough to measure low > >>>>> latencies with cyclictest. > >>>> I have a couple of questions: > >>>> * Does it make sense to make this a tunable for other clockevent > >>>> devices > >>>> as well? > >>>> > >>> So, AFAIUI, the thing is as follows. In clockevents_program_event(), we > >>> keep the delta between now and the next timer event within > >>> dev->max_delta_ns and dev->min_delta_ns: > >>> > >>> delta = min(delta, (int64_t) dev->max_delta_ns); > >>> delta = max(delta, (int64_t) dev->min_delta_ns); > >>> > >>> For Xen (well, for the Xen clock) we have: > >>> > >>> .max_delta_ns = 0xffffffff, > >>> .min_delta_ns = TIMER_SLOP, > >>> > >>> which means a guest can't ask for a timer to fire earlier than 100us > >>> ahead, which is a bit too coarse, especially on contemporary hardware. 
> >>> > >>> For "lapic_deadline" (which was what was in use in KVM guests, in our > >>> experiments) we have: > >>> > >>> lapic_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFF, &lapic_clockevent); > >>> lapic_clockevent.min_delta_ns = clockevent_delta2ns(0xF, &lapic_clockevent); > >>> > >>> Which means max is 0x7FFFFF device ticks, and min is 0xF. > >>> clockevent_delta2ns() does the conversion from ticks to ns, basing on > >>> the results of the APIC calibration process. It calls cev_delta2ns() > >>> which does some scaling, shifting, divs, etc, and, at the very end, > >>> this: > >>> > >>> /* Deltas less than 1usec are pointless noise */ > >>> return clc > 1000 ? clc : 1000; > >>> > >>> So, as Ryan is also saying, the actual minimum, in this case, depends > >>> on hardware, with a sanity check of "never below 1us" (which is quite > >>> smaller than 100us!) > >>> > >>> Of course, the actual granularity depends on hardware in the Xen case > >>> as well, but that is handled in Xen itself. And we have mechanisms in > >>> place in there to avoid timer interrupt storms (like, ahem, the Xen's > >>> 'timer_slop' boot parameter... :-P) > >>> > >>> And this is basically why I was also thinking we can/should lower the > >>> default value of TIMER_SLOP, here in the Xen clock implementation in > >>> Linux. > >> What do you think would be a sane value? 10us? Should we then still keep > >> this patch? > >> > >> My concern would be that if we change the current value and it turns out > >> to be very wrong we'd then have no recourse. > >> > >> > >> -boris > >> > > Speaking out of turn but as a participant in this thread, I would not > > assume to change the default value for all cases without significant > > testing by the community, touching a variety of configurations. > > > > It feels like changing the default has a non-trivial amount of > > unknowns that would need to be addressed. 
> >
> > Not surprisingly, I am biased to the approach of my patch which
> > does not change the default but offers flexibility to all.
>
> If we are to change the default it would be good to at least collect
> some data on distribution of delta values in
> clockevents_program_event(). But as I said, I'd keep the patch.
>
> Also, as far as the comment describing TIMER_SLOP, I agree that it is
> rather misleading.
>
> I can replace it with /* Minimum amount of time until next clock event
> fires */, I can do it while committing so no need to resend.
>
> -boris

I like that. Thanks Boris!

- Ryan
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-27 14:46 ` Boris Ostrovsky 2019-03-27 14:59 ` Ryan Thibodeaux @ 2019-03-27 15:19 ` Dario Faggioli 1 sibling, 0 replies; 22+ messages in thread From: Dario Faggioli @ 2019-03-27 15:19 UTC (permalink / raw) To: Boris Ostrovsky, Ryan Thibodeaux Cc: luca abeni, xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Wed, 2019-03-27 at 10:46 -0400, Boris Ostrovsky wrote: > On 3/27/19 6:00 AM, Ryan Thibodeaux wrote: > > On Tue, Mar 26, 2019 at 07:21:31PM -0400, Boris Ostrovsky wrote: > > > On 3/26/19 5:13 AM, Dario Faggioli wrote: > > > > > > > > And this is basically why I was also thinking we can/should > > > > lower the > > > > default value of TIMER_SLOP, here in the Xen clock > > > > implementation in > > > > Linux. > > > What do you think would be a sane value? 10us? Should we then > > > still keep > > > this patch? > > > > > > My concern would be that if we change the current value and it > > > turns out > > > to be very wrong we'd then have no recourse. > > > > > Speaking out of turn but as a participant in this thread, I would > > not > > assume to change the default value for all cases without > > significant > > testing by the community, touching a variety of configurations. > > > If we are to change the default it would be good to at least collect > some data on distribution of delta values in > clockevents_program_event(). But as I said, I'd keep the patch. > I would definitely take/keep this patch. Choosing a more sane (IMO) default and making things flexible and configurable are not mutually exclusive things. :-) I think that having this set to 100us stands in the way of a lot of people wanting to do time sensitive stuff in Xen VMs. I'd at least halve that to 50us, but 10us is even better. But sure we can do this at a later point. 
And even at that point, a patch like this is valuable, because there
may be people who, probably after some testing of their own setup,
want to lower it even further.

> Also, as far as the comment describing TIMER_SLOP, I agree that it is
> rather misleading.
>
> I can replace it with /* Minimum amount of time until next clock event
> fires */, I can do it while committing so no need to resend.

Yeah, much better, yes.

Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-22 22:10 ` Boris Ostrovsky 2019-03-23 2:58 ` Dario Faggioli @ 2019-03-23 12:00 ` Ryan Thibodeaux 2019-03-24 18:07 ` Boris Ostrovsky 1 sibling, 1 reply; 22+ messages in thread From: Ryan Thibodeaux @ 2019-03-23 12:00 UTC (permalink / raw) To: Boris Ostrovsky Cc: xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Fri, Mar 22, 2019 at 06:10:16PM -0400, Boris Ostrovsky wrote: > On 3/22/19 2:29 PM, thibodux@gmail.com wrote: > > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> > > > > Add a new command-line option "xen_timer_slop=<INT>" that sets the > > minimum delta of virtual Xen timers. This commit does not change the > > default timer slop value for virtual Xen timers. > > > > Lowering the timer slop value should improve the accuracy of virtual > > timers (e.g., better process dispatch latency), but it will likely > > increase the number of virtual timer interrupts (relative to the > > original slop setting). > > > > The original timer slop value has not changed since the introduction > > of the Xen-aware Linux kernel code. This commit provides users an > > opportunity to tune timer performance given the refinements to > > hardware and the Xen event channel processing. It also mirrors > > a feature in the Xen hypervisor - the "timer_slop" Xen command line > > option. > > Is there any data that shows effects of using this new parameter? > > -boris > For our own testing using "cyclictest" from the rt-tests project, lowering the timer slop helped produce the best test runs, especially in terms of maximum process dispatch latency (PDL). Here is the output from one such test that ran overnight. The Xen timer slop in this case was 10000 or 10 microseconds. ... 
[root@slop1 ~]# cset shield -c 3
[root@slop1 ~]# echo ; date ; echo ; \
./rt-tests-1.3/cyclictest -p95 -a3 -t1 -m; echo ; date

Thu Mar 14 19:45:36 UTC 2019

# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 0.00 0.02 0.00 1/91 4260
T: 0 ( 3212) P:95 I:1000 C:57077313 Min: 27 Act: 44 Avg: 43 Max: 145
^C
Fri Mar 15 11:36:53 UTC 2019
...

This test system was configured to use a TSC clocksource, disabled
C states, and lowered the timer slop. I am not claiming the timer
slop change was solely responsible for the best results. In other
testing with the default timer slop setting of 100000 (100
microseconds), the average PDL would run slightly higher, but the
spikes were much higher and more numerous, often near the 1000s and
happening multiple times per 10 minutes of testing.

- Ryan
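The trade-off stated in the patch description — finer slop, but more timer interrupts — follows from simple arithmetic. This sketch (illustrative only, not from the kernel) computes the worst-case one-shot timer rate implied by a given slop, i.e. when every reprogram asks for the minimum delta:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Back-of-envelope upper bound on one-shot timer events per second:
 * the slop is the shortest allowed gap between consecutive events. */
static uint64_t max_events_per_sec(uint64_t slop_ns)
{
    return NSEC_PER_SEC / slop_ns;
}
```

A 100 us slop caps a busy timer at roughly 10,000 events per second, while 10 us allows up to 100,000 — one reason lowering the default for everyone would need broad testing, as discussed below.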
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-23 12:00 ` Ryan Thibodeaux @ 2019-03-24 18:07 ` Boris Ostrovsky 2019-03-25 10:36 ` Dario Faggioli 0 siblings, 1 reply; 22+ messages in thread From: Boris Ostrovsky @ 2019-03-24 18:07 UTC (permalink / raw) To: Ryan Thibodeaux Cc: xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On Sat, Mar 23, 2019 at 08:00:52AM -0400, Ryan Thibodeaux wrote: > On Fri, Mar 22, 2019 at 06:10:16PM -0400, Boris Ostrovsky wrote: > > On 3/22/19 2:29 PM, thibodux@gmail.com wrote: > > > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> > > > > > > Add a new command-line option "xen_timer_slop=<INT>" that sets the > > > minimum delta of virtual Xen timers. This commit does not change the > > > default timer slop value for virtual Xen timers. > > > > > > Lowering the timer slop value should improve the accuracy of virtual > > > timers (e.g., better process dispatch latency), but it will likely > > > increase the number of virtual timer interrupts (relative to the > > > original slop setting). > > > > > > The original timer slop value has not changed since the introduction > > > of the Xen-aware Linux kernel code. This commit provides users an > > > opportunity to tune timer performance given the refinements to > > > hardware and the Xen event channel processing. It also mirrors > > > a feature in the Xen hypervisor - the "timer_slop" Xen command line > > > option. > > > > Is there any data that shows effects of using this new parameter? > > > > -boris > > > > For our own testing using "cyclictest" from the rt-tests project, > lowering the timer slop helped produce the best test runs, especially > in terms of maximum process dispatch latency (PDL). > > Here is the output from one such test that ran overnight. The Xen > timer slop in this case was 10000 or 10 microseconds. > > ... 
> [root@slop1 ~]# cset shield -c 3 > [root@slop1 ~]# echo ; date ; echo ; \ > ./rt-tests-1.3/cyclictest -p95 -a3 -t1 -m; echo ; date > > Thu Mar 14 19:45:36 UTC 2019 > > # /dev/cpu_dma_latency set to 0us > policy: fifo: loadavg: 0.00 0.02 0.00 1/91 4260 > T: 0 ( 3212) P:95 I:1000 C:57077313 Min: 27 Act: 44 Avg: 43 Max: 145 > ^C > Fri Mar 15 11:36:53 UTC 2019 > ... > > This test system was configured to use a TSC clocksource, disabled > C states, and lowered the timer slop. I am not claiming the timer > slop change was solely responsible for the best results. How can we then be sure that the proposed change will indeed provide some sort of benefit? Were there any other changes between your tests to think that slop time modification may not be responsible for better results? -boris > In other > testing with the default timer slop setting of 100000 (100 > microseconds), the average PDL would run slightly higher, but the > spikes were much higher and more in number, often near the 1000s > and happening multiple times per 10 minutes of testing. > > - Ryan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-24 18:07 ` Boris Ostrovsky @ 2019-03-25 10:36 ` Dario Faggioli 0 siblings, 0 replies; 22+ messages in thread From: Dario Faggioli @ 2019-03-25 10:36 UTC (permalink / raw) To: Boris Ostrovsky, Ryan Thibodeaux Cc: xen-devel, linux-kernel, oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux, luca.abeni [-- Attachment #1: Type: text/plain, Size: 1940 bytes --] On Sun, 2019-03-24 at 14:07 -0400, Boris Ostrovsky wrote: > On Sat, Mar 23, 2019 at 08:00:52AM -0400, Ryan Thibodeaux wrote: > > This test system was configured to use a TSC clocksource, disabled > > C states, and lowered the timer slop. I am not claiming the timer > > slop change was solely responsible for the best results. > > How can we then be sure that the proposed change will indeed provide > some sort of benefit? > > Were there any other changes between your tests to think that slop > time modification may not be responsible for better results? > FWIW, in mine and Luca's experiments, changing lowering both timer_slop in Xen and TIMER_SLOP in Linux, improved latency dramatically, without any other change. We also tried _only_ playing with the Xen tunable (as there's a Xen boot parameter for it already) but that wasn't enough. It was only when we also tuned TIMER_SLOP in Linux's Xen clockevent device that we got decent numbers (i.e., comparable to KVM ones). Reason why we had not share these results yet was that we were still "polishing" them, and because we also found a couple of other issues, and we were trying to understand them better, before sending anything out. But those other issues were --although still about achieving low latencies-- orthogonal from this, and lowering the default slop is absolute prerequisite for even talking about having a reasonable vcpu response time. A patch like this one, was something we were thinking to submit ourself sooner or later (backed up by our results). 
Personally, in addition to making the value tunable, which I think is
a good thing, I also would lower the default.

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
* Re: [PATCH] x86/xen: Add "xen_timer_slop" command line option 2019-03-22 18:29 [PATCH] x86/xen: Add "xen_timer_slop" command line option thibodux 2019-03-22 22:10 ` Boris Ostrovsky @ 2019-04-24 18:47 ` Boris Ostrovsky 1 sibling, 0 replies; 22+ messages in thread From: Boris Ostrovsky @ 2019-04-24 18:47 UTC (permalink / raw) To: thibodux, xen-devel, linux-kernel Cc: oleksandr_andrushchenko, tglx, jgross, ryan.thibodeaux On 3/22/19 2:29 PM, thibodux@gmail.com wrote: > From: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> > > Add a new command-line option "xen_timer_slop=<INT>" that sets the > minimum delta of virtual Xen timers. This commit does not change the > default timer slop value for virtual Xen timers. > > Lowering the timer slop value should improve the accuracy of virtual > timers (e.g., better process dispatch latency), but it will likely > increase the number of virtual timer interrupts (relative to the > original slop setting). > > The original timer slop value has not changed since the introduction > of the Xen-aware Linux kernel code. This commit provides users an > opportunity to tune timer performance given the refinements to > hardware and the Xen event channel processing. It also mirrors > a feature in the Xen hypervisor - the "timer_slop" Xen command line > option. > > Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io> Applied to for-linus-5.2 ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2019-04-24 18:48 UTC | newest]

Thread overview: 22+ messages:
2019-03-22 18:29 [PATCH] x86/xen: Add "xen_timer_slop" command line option thibodux
2019-03-22 22:10 ` Boris Ostrovsky
2019-03-23  2:58 ` Dario Faggioli
2019-03-23 10:41 ` luca abeni
2019-03-25 12:05 ` luca abeni
2019-03-25 13:43 ` Boris Ostrovsky
2019-03-25 14:07 ` luca abeni
2019-03-25 14:11 ` Ryan Thibodeaux
2019-03-25 17:36 ` Ryan Thibodeaux
2019-03-25 18:31 ` Boris Ostrovsky
2019-03-26  9:13 ` Dario Faggioli
2019-03-26 11:12 ` luca abeni
2019-03-26 11:41 ` Ryan Thibodeaux
2019-03-26 23:21 ` Boris Ostrovsky
2019-03-27 10:00 ` Ryan Thibodeaux
2019-03-27 14:46 ` Boris Ostrovsky
2019-03-27 14:59 ` Ryan Thibodeaux
2019-03-27 15:19 ` Dario Faggioli
2019-03-23 12:00 ` Ryan Thibodeaux
2019-03-24 18:07 ` Boris Ostrovsky
2019-03-25 10:36 ` Dario Faggioli
2019-04-24 18:47 ` Boris Ostrovsky