* Sketch of an idea for handling the "mixed workload" problem
From: George Dunlap @ 2023-09-29 16:42 UTC
To: Xen-devel
Cc: Juergen Gross, Demi Marie Obenour, Marek Marczykowski-Górecki

The basic credit2 algorithm goes something like this:

1. All vcpus start with the same number of credits; about 10ms worth
if everyone has the same weight.

2. vcpus burn credits as they consume cpu, based on the relative
weights: higher weights burn slower, lower weights burn faster.

3. At any given point in time, the runnable vcpu with the highest
credit is allowed to run.

4. When the "next runnable vcpu" on a runqueue has negative credit,
credit is reset: everyone gets another 10ms, and can carry over at
most 2ms of credit across the reset.

Generally speaking, vcpus that use less than their quota and have lots
of interrupts are scheduled immediately, since when they wake up they
always have more credit than the vcpus that are burning through their
slices.

But what about a situation like the one described recently on Matrix,
where a VM uses a non-negligible amount of cpu doing un-accelerated
encryption and decryption, which can tolerate being delayed by a few
ms, as well as handling audio events? How can we make sure that:

1. We can run whenever interrupts happen
2. We get no more than our fair share of the cpu?

The counter-intuitive key here is that in order to achieve the above,
you need to *deschedule or preempt early*, so that when the interrupt
comes, you have spare credit to run the interrupt handler. How do we
manage that?

The idea I'm working out comes from a phrase I used in the Matrix
discussion, about a vcpu that "foolishly burned all its credits".
Naturally, if you want to have credits available, you need to save
them up.

So the idea would be this. Each vcpu would have a "boost credit
ratio" and a "default boost interval"; there would be sensible
defaults based on typical workloads, but these could be tweaked for
individual VMs.

When credit is assigned, all VMs would get the same amount of credit,
but divided into two "buckets", according to the boost credit ratio.

Under certain conditions, a vcpu would be considered "boosted"; this
state would last either until the default boost interval expires, or
until some other event (such as a de-boost yield).

The queue would be sorted thus:

* Boosted vcpus, by boost credit available
* Non-boosted vcpus, by non-boost credit available

Getting more boost credit means having lower priority when not
boosted; and burning through your boost credit means not being
scheduled when you need to be.

Other ways we could consider putting a vcpu into a boosted state (some
discussed on Matrix or emails linked from Matrix):
* Xen is about to preempt, but finds that the vcpu's interrupts are
blocked (this sort of overlaps with the "when we deliver an interrupt"
one)
* Xen is about to preempt, but finds that the (currently out-of-tree)
"dont_desched" bit has been set in the shared memory area

Other ways to consider de-boosting:
* There's a way to trigger a VMEXIT when interrupts have been
re-enabled; we could set this up when the VM is in the boost state

Getting the defaults right might take some thinking. If you set the
default "boost credit ratio" to 25% and the "default boost interval"
to 500ms, then you'd basically have five "boosts" per scheduling
window. The window depends on how active other vcpus are, but if it's
longer than 20ms your system is too overloaded.

Thoughts?

Demi, what kinds of interrupt counts are you getting for your VM?

 -George
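A minimal sketch of the two-bucket proposal, in C since that is what Xen is written in. Everything here is hypothetical: `struct vcpu_credit`, `credit_assign()`, `runq_compare()` and `runq_sort()` are invented for illustration, credit is assumed to be measured in microseconds, and this is not Xen's credit2 code. The proposal does not say how carried-over credit interacts with the two buckets, so the sketch leaves carry-over out. It shows credit being split by the boost credit ratio at assignment time, and the runqueue order described above: boosted vcpus first by boost credit, then non-boosted vcpus by non-boost credit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-vcpu state for the two-bucket proposal (not Xen code). */
struct vcpu_credit {
    int id;
    bool boosted;          /* currently in the "boosted" state */
    int64_t boost_credit;  /* microseconds of boost credit left */
    int64_t normal_credit; /* microseconds of non-boost credit left */
};

/*
 * At credit assignment every vcpu gets the same total credit, split
 * between the two buckets according to its boost credit ratio (percent).
 */
static void credit_assign(struct vcpu_credit *v, int64_t total_us,
                          unsigned int boost_ratio_pct)
{
    assert(boost_ratio_pct <= 100);
    v->boost_credit = total_us * boost_ratio_pct / 100;
    v->normal_credit = total_us - v->boost_credit;
}

/*
 * qsort() comparator: boosted vcpus sort ahead of non-boosted ones;
 * within each group, more credit in the relevant bucket sorts first.
 */
static int runq_compare(const void *a, const void *b)
{
    const struct vcpu_credit *x = a, *y = b;
    int64_t cx, cy;

    if (x->boosted != y->boosted)
        return x->boosted ? -1 : 1;

    cx = x->boosted ? x->boost_credit : x->normal_credit;
    cy = y->boosted ? y->boost_credit : y->normal_credit;
    return (cx < cy) - (cx > cy); /* descending by credit */
}

static void runq_sort(struct vcpu_credit *q, size_t n)
{
    qsort(q, n, sizeof(*q), runq_compare);
}
```

For example, with 10ms (10000us) of credit, a 25% ratio gives 2500us of boost credit and 7500us of normal credit, while a 50% ratio leaves only 5000us of normal credit; the second vcpu therefore sorts behind the first whenever neither is boosted, which is the tradeoff described in the message.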
* Re: Sketch of an idea for handling the "mixed workload" problem
From: Demi Marie Obenour @ 2023-09-30 23:28 UTC
To: George Dunlap, Xen-devel
Cc: Juergen Gross, Marek Marczykowski-Górecki

On Fri, Sep 29, 2023 at 05:42:16PM +0100, George Dunlap wrote:
> The basic credit2 algorithm goes something like this:
>
> 1. All vcpus start with the same number of credits; about 10ms worth
> if everyone has the same weight.
[...]
> 4. When the "next runnable vcpu" on a runqueue has negative credit,
> credit is reset: everyone gets another 10ms, and can carry over at
> most 2ms of credit across the reset.

One relevant aspect of Qubes OS is that it is very, very heavily
oversubscribed: having more VMs running than physical CPUs is (at least
in my usage) not uncommon, and each of those VMs will typically have at
least two vCPUs. With a credit of 10ms and 36 vCPUs, I could easily see
a vCPU not being allowed to execute for 200ms or more. For audio or
video workloads, this is a disaster.

10ms is a LOT for desktop workloads or for anyone who cares about
latency. At 60Hz it is 3/5 of a frame, and with a 120Hz monitor and a
heavily contended system, frame drops are guaranteed.

[...]
> Other ways we could consider putting a vcpu into a boosted state (some
> discussed on Matrix or emails linked from Matrix):
> * Xen is about to preempt, but finds that the vcpu's interrupts are
> blocked (this sort of overlaps with the "when we deliver an interrupt"
> one)

This is also a good heuristic for "vCPU owns a spinlock", which is
definitely a bad time to preempt.

> * Xen is about to preempt, but finds that the (currently out-of-tree)
> "dont_desched" bit has been set in the shared memory area
>
> Other ways to consider de-boosting:
> * There's a way to trigger a VMEXIT when interrupts have been
> re-enabled; we could set this up when the VM is in the boost state

This is a good idea.

> Getting the defaults right might take some thinking. If you set the
> default "boost credit ratio" to 25% and the "default boost interval"
> to 500ms, then you'd basically have five "boosts" per scheduling
> window. The window depends on how active other vcpus are, but if it's
> longer than 20ms your system is too overloaded.

An interval of 500ms seems rather long to me. Did you mean 500μs?

> Thoughts?

My first thought when I had the problem was that Xen's scheduling
quantum was too long. This is consistent with the observation that dom0
(which was not very busy, IIRC) fell behind in its delivery of audio
samples. Presumably it had plenty of credit, but simply did not get
scheduled in time, perhaps because Xen did not preempt soon enough.

It’s also worth noting that Qubes makes heavy use of vchans, and I
expect the latency of these to be directly proportional to the time
between preemption interrupts.

Audio is not very demanding on throughput, but is extremely sensitive
to latency. Therefore, the top priority is making sure that every
runnable vCPU gets a chance to execute periodically.

One way to solve this would be for the credits (both the initial credit
and the maximum credit carried over) and the interval between
preemptions to be inversely proportional to the number of runnable
vCPUs, so that the time needed to cycle through all runnable vCPUs is
roughly constant. Specifically, they would be proportional to
Lmax/runnable_vCPUs, where Lmax is the latency target (1ms or so). This
also ensures that even Xen-unaware VMs (such as a Windows guest running
Microsoft Teams or Skype) get to run periodically.

There would need to be a limit to prevent Xen from spending more than,
e.g., 10% of CPU time just doing preemption, but if this limit is hit,
Xen should log something and possibly notify dom0 so that a warning can
be displayed to the user.

Additionally, a certain amount of CPU time (such as 10%) should be
reserved for dom0, so that the system remains responsive.

Qubes OS could also help here. If a VM is allowed to record audio, it
(and the VMs providing network to it, transitively) should get a boost
in priority, so that if the system is overloaded, other guests are more
likely to be delayed in their execution.

> Demi, what kinds of interrupt counts are you getting for your VM?

I didn't measure it, but I can check the next time I am on a video call
or doing audio recording.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
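The Lmax/runnable_vCPUs idea can be sketched as follows. This is a hypothetical illustration only: `compute_slice()`, `struct slice_result` and the per-preemption cost parameter are invented here, preemption overhead is approximated as (switch cost / slice length), and the 5us switch cost used in the example is an arbitrary placeholder rather than a measurement. It covers only the preemption interval, not the matching scaling of initial and carried-over credit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of sizing one time slice (hypothetical, not a Xen interface). */
struct slice_result {
    int64_t slice_us;   /* time slice to hand out, in microseconds */
    bool target_missed; /* true if the overhead floor kicked in */
};

/*
 * Size each vCPU's slice so that cycling through all runnable vCPUs takes
 * about lmax_us, but never let preemption overhead (approximated as
 * switch_cost / slice) exceed max_overhead_pct percent of CPU time.
 */
static struct slice_result compute_slice(int64_t lmax_us,
                                         unsigned int runnable,
                                         int64_t switch_cost_us,
                                         unsigned int max_overhead_pct)
{
    struct slice_result r;
    int64_t floor_us;

    assert(runnable > 0);
    assert(max_overhead_pct > 0 && max_overhead_pct <= 100);

    /* overhead <= pct  <=>  slice >= switch_cost * 100 / pct */
    floor_us = switch_cost_us * 100 / max_overhead_pct;

    r.slice_us = lmax_us / runnable;
    r.target_missed = r.slice_us < floor_us;
    if (r.target_missed)
        r.slice_us = floor_us; /* latency target can't be met: log / notify dom0 */
    return r;
}
```

With Lmax = 1ms and 4 runnable vCPUs, each gets 250us. With 36 runnable vCPUs, 1000/36 rounds down to 27us, which is below the 50us floor implied by a 10% overhead cap, so the slice is clamped to 50us and `target_missed` is set; that is the point at which the proposal says Xen should log something and possibly notify dom0.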
* Re: Sketch of an idea for handling the "mixed workload" problem
From: George Dunlap @ 2023-10-02 11:20 UTC
To: Demi Marie Obenour
Cc: Xen-devel, Juergen Gross, Marek Marczykowski-Górecki

On Sun, Oct 1, 2023 at 12:28 AM Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Fri, Sep 29, 2023 at 05:42:16PM +0100, George Dunlap wrote:
[...]
> One relevant aspect of Qubes OS is that it is very, very heavily
> oversubscribed: having more VMs running than physical CPUs is (at least
> in my usage) not uncommon, and each of those VMs will typically have at
> least two vCPUs. With a credit of 10ms and 36 vCPUs, I could easily see
> a vCPU not being allowed to execute for 200ms or more. For audio or
> video workloads, this is a disaster.
>
> 10ms is a LOT for desktop workloads or for anyone who cares about
> latency. At 60Hz it is 3/5 of a frame, and with a 120Hz monitor and a
> heavily contended system, frame drops are guaranteed.

You'd probably benefit from a better understanding of how the various
algorithms actually work. I'm sorry I don't have any really good
"virtualization scheduling for dummies" resources; the best I have is
a few talks I gave on the subject, e.g.:

https://www.youtube.com/watch?v=C3jjvkr6fgQ

For one, when I say "oversubscribed", I don't mean "vcpus / pcpus"; I
mean "requested vcpu execution time / available pcpu time". If you
have 18 vcpus on a single pcpu, and all of them *on an empty system*
would have run at 5%, you're totally fine. If you have 18 vcpus on a
single pcpu, and all of them on an empty system would have averaged
100%, there's only so much the scheduler can do to avoid problems.

Secondly, while on credit1 a vcpu is allowed to run for 10ms without
stopping (and then must wait for 18x that time to get the same credit
back, if there are 18 other vcpus running on that same pcpu), this is
not the case for credit2. The exact calculation can be found in
xen/common/sched/credit2.c:sched2_runtime(), but here's the general
algorithm from the comment:

    /* General algorithm:
     * 1) Run until snext's credit will be 0.
     * 2) But if someone is waiting, run until snext's credit is equal
     *    to his.
     * 3) But, if we are capped, never run more than our budget.
     * 4) And never run longer than MAX_TIMER or shorter than MIN_TIMER or
     *    the ratelimit time.
     */

Default MIN_TIMER is 500us, and is configurable via sysctl; default
MAX_TIMER is... hmm, I'm pretty sure this started out as 2ms, but now
it seems to be 10ms. Looks like this was changed in da92ec5bd1 ("xen:
credit2: "relax" CSCHED2_MAX_TIMER") in 2016. (MAX_TIMER isn't
configurable, but arguably it should be; and making it configurable
should just be a matter of duplicating the logic around MIN_TIMER.)

That's not yet the last word, though: if a VM that was asleep wakes
up, and it has more credit than the running vcpu, then it will
generally preempt that vcpu.

All of which is to say, it should be very rare for a vcpu to run for
a full 10ms under credit2.
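The four rules in that comment can be read as a chain of clamps. The function below is a hypothetical, simplified rendering for illustration only: it is not the real code in xen/common/sched/credit2.c, its name and parameters are invented, and it assumes equal weights so that credit burns 1:1 with runtime (both in microseconds).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified, hypothetical reading of the four rules in the credit2
 * comment (not Xen's implementation). Assumes equal weights, so one
 * microsecond of runtime burns one microsecond of credit.
 */
static int64_t runtime_sketch(int64_t snext_credit,
                              bool someone_waiting, int64_t waiter_credit,
                              bool capped, int64_t budget,
                              int64_t min_timer, int64_t max_timer,
                              int64_t ratelimit)
{
    int64_t t;

    assert(min_timer <= max_timer);

    /* 1) Run until snext's credit would reach 0. */
    t = snext_credit;

    /* 2) If someone is waiting, run only until our credit equals theirs. */
    if (someone_waiting && snext_credit - waiter_credit < t)
        t = snext_credit - waiter_credit;

    /* 3) If capped, never run for more than the remaining budget. */
    if (capped && budget < t)
        t = budget;

    /* 4) Never longer than MAX_TIMER, never shorter than MIN_TIMER or ratelimit. */
    if (t > max_timer)
        t = max_timer;
    if (t < min_timer)
        t = min_timer;
    if (t < ratelimit)
        t = ratelimit;
    return t;
}
```

With the defaults mentioned above (MIN_TIMER 500us, MAX_TIMER 10ms), a vcpu whose nearest waiting competitor has 1ms less credit gets a 1ms slice under this reading, and a tiny credit gap is rounded up to 500us; only a vcpu with no competition and plenty of credit would run the full 10ms.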
> > Other ways we could consider putting a vcpu into a boosted state (some
> > discussed on Matrix or emails linked from Matrix):
> > * Xen is about to preempt, but finds that the vcpu's interrupts are
> > blocked (this sort of overlaps with the "when we deliver an interrupt"
> > one)
>
> This is also a good heuristic for "vCPU owns a spinlock", which is
> definitely a bad time to preempt.

Not all spinlocks disable IRQs, but certainly some do.

> > Getting the defaults right might take some thinking. If you set the
> > default "boost credit ratio" to 25% and the "default boost interval"
> > to 500ms, then you'd basically have five "boosts" per scheduling
> > window. The window depends on how active other vcpus are, but if it's
> > longer than 20ms your system is too overloaded.
>
> An interval of 500ms seems rather long to me. Did you mean 500μs?

Yes, I did mean 500us, sorry.

I'll respond to the other suggestions later.

> > Demi, what kinds of interrupt counts are you getting for your VM?
>
> I didn't measure it, but I can check the next time I am on a video call
> or doing audio recording.

Running xentrace would be really interesting too; traces like that are
another good way to nerd-snipe me. :-)

 -George
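Worked through with the corrected 500us interval, the "five boosts" figure follows from the roughly 10ms of credit handed out per reset (point 1 of the credit2 summary), assuming equal weights and that every boost runs for the full default interval:

```latex
\begin{aligned}
C_{\text{boost}} &= 0.25 \times 10\,\text{ms} = 2.5\,\text{ms} \\
N_{\text{boosts}} &= \frac{C_{\text{boost}}}{500\,\mu\text{s}}
                   = \frac{2500\,\mu\text{s}}{500\,\mu\text{s}} = 5
\end{aligned}
```

Boosts that end early (for example through a de-boost yield) would consume less boost credit each, leaving room for more of them in the same window.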
* Re: Sketch of an idea for handling the "mixed workload" problem
From: Demi Marie Obenour @ 2024-01-21 23:46 UTC
To: George Dunlap
Cc: Xen-devel, Juergen Gross, Marek Marczykowski-Górecki

On Mon, Oct 02, 2023 at 12:20:31PM +0100, George Dunlap wrote:
[...]
> For one, when I say "oversubscribed", I don't mean "vcpus / pcpus"; I
> mean "requested vcpu execution time / available pcpu time". If you
> have 18 vcpus on a single pcpu, and all of them *on an empty system*
> would have run at 5%, you're totally fine. If you have 18 vcpus on a
> single pcpu, and all of them on an empty system would have averaged
> 100%, there's only so much the scheduler can do to avoid problems.

If each vCPU would have spent 4% of its time doing realtime tasks, it
should be possible to give all of the realtime tasks all the time they
need, while the remaining 100 - 4 * 18 = 28% of time is available to
non-realtime tasks. That’s not awesome, but it might be enough to
prevent audio from glitching.

[...]
> Default MIN_TIMER is 500us, and is configurable via sysctl; default
> MAX_TIMER is... hmm, I'm pretty sure this started out as 2ms, but now
> it seems to be 10ms. Looks like this was changed in da92ec5bd1 ("xen:
> credit2: "relax" CSCHED2_MAX_TIMER") in 2016. (MAX_TIMER isn't
> configurable, but arguably it should be; and making it configurable
> should just be a matter of duplicating the logic around MIN_TIMER.)

Maybe MAX_TIMER should be lowered to, e.g., 1ms?

> That's not yet the last word, though: if a VM that was asleep wakes
> up, and it has more credit than the running vcpu, then it will
> generally preempt that vcpu.
>
> All of which is to say, it should be very rare for a vcpu to run for
> a full 10ms under credit2.

That’s good.

[...]
> Running xentrace would be really interesting too; traces like that are
> another good way to nerd-snipe me. :-)

That would certainly be a good idea!
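The 28% figure is a special case of a simple single-pcpu feasibility condition. As a sketch, assuming each of the n vcpus needs a fraction r_i of the pcpu for realtime work and ignoring scheduling overhead:

```latex
\sum_{i=1}^{n} r_i \le 1,
\qquad
1 - \sum_{i=1}^{18} 0.04 = 1 - 0.72 = 0.28
```

At 6% per vcpu the realtime demand alone would be 18 × 6% = 108%, at which point no scheduling policy could meet every realtime deadline on that pcpu.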
* Re: Sketch of an idea for handling the "mixed workload" problem
From: Demi Marie Obenour @ 2024-01-22 00:31 UTC
To: George Dunlap, Xen-devel
Cc: Juergen Gross, Marek Marczykowski-Górecki

On Fri, Sep 29, 2023 at 05:42:16PM +0100, George Dunlap wrote:
[...]
> Other ways we could consider putting a vcpu into a boosted state (some
> discussed on Matrix or emails linked from Matrix):
> * Xen is about to preempt, but finds that the vcpu's interrupts are
> blocked (this sort of overlaps with the "when we deliver an interrupt"
> one)
> * Xen is about to preempt, but finds that the (currently out-of-tree)
> "dont_desched" bit has been set in the shared memory area

I think both of these would be good. Another one would be when Xen is
about to deliver an interrupt to a guest, provided that there is no
storm of interrupts. I’ve seen a USB webcam cause a system-wide latency
spike through what I presume is an interrupt storm, and I suspect that
others have observed similar behavior with USB external drives.

> Other ways to consider de-boosting:
> * There's a way to trigger a VMEXIT when interrupts have been
> re-enabled; we could set this up when the VM is in the boost state

That’s a good idea, but should be conditional on “dont_desched” _not_
being set. This handles the case where the guest is running a realtime
thread.

Generally, I’d like to see something like this:

- A vCPU with sufficient boost credit is boosted by Xen under the
  following conditions:

  1. Xen interrupts the guest.
  2. Xen is about to preempt, but detects that “dont_desched” is set.
  3. Xen is about to preempt, but detects that interrupts are disabled.

- A vCPU is deboosted if:

  1. It runs out of boost credit, even if “dont_desched” is set.
  2. An interrupt handler returns, but only if “dont_desched” is not set.
  3. Interrupts are re-enabled, but only if “dont_desched” is not set.

  The first case is an abnormal condition and typically means that
  either the system is overloaded or a vCPU is running boosted for too
  long. To help debug this situation, Xen will log a warning and
  increment both a system-wide and a per-domain counter. dom0 can
  retrieve counters for any domain, and a domain can read its own
  counter.

- When to set “dont_desched” is entirely up to the guest kernel, but
  there are some general rules guests should follow:

  - Only set “dont_desched” if there is a good reason, and unset it as
    soon as possible. Xen gives vCPUs with “dont_desched” set priority
    over all other vCPUs on the system, but the amount of time a vCPU is
    allowed to run with an elevated priority is limited. Xen will log a
    warning if a guest tries to run with elevated priority for too long.

  - Xen boosts vCPUs before delivering an interrupt, but there should be
    a way for a vCPU to deboost itself even before returning from the
    interrupt handler.

  - Guests should always set “dont_desched” when running hard-realtime
    threads (used for e.g. audio processing), even when the thread is in
    userspace. This ensures that Xen gives the underlying vCPU priority
    over other vCPUs.

  - Guests should always set “dont_desched” when holding a spin lock,
    but it is even better to use paravirtualized spin locks (which make
    a hypercall into Xen and therefore allow other vCPUs to run).

  - Xen does not implement priority inheritance, so guests need to do
    that.

- Max boost credits can be set by dom0 via a hypercall.

The advantage of this approach is that it keeps almost all policy out of
Xen. The only exception is the boosting when an interrupt is received,
but a well-behaved guest will deboost itself very quickly (by enabling
interrupts) if the boost was not actually needed, so this should have
very limited impact. I think this should be enough for realtime audio,
and it is somewhat related to (but hopefully simpler than) the KVM RFC
from Google [1].

Any thoughts on this?

[1]: https://lore.kernel.org/kvm/20231214024727.3503870-1-vineeth@bitbyteword.org/
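The boost and deboost rules in the list above map onto a small state machine. This is a hypothetical sketch: the enum, struct and function names are invented, "sufficient boost credit" is taken to mean any positive balance, the warning log is reduced to a single counter (the system-wide versus per-domain split is not modelled), and the scheduler itself is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical events corresponding to the conditions in the list. */
enum boost_event {
    EV_IRQ_DELIVERED,      /* Xen delivers an interrupt to the guest */
    EV_PREEMPT_ATTEMPT,    /* Xen is about to preempt the vCPU */
    EV_IRQ_HANDLER_RETURN, /* guest returns from an interrupt handler */
    EV_IRQS_REENABLED,     /* guest re-enables interrupts */
};

/* Hypothetical per-vCPU state; dont_desched mirrors the shared-memory bit. */
struct boost_state {
    bool boosted;
    bool dont_desched;
    bool irqs_disabled;
    int64_t boost_credit_us;
    unsigned long overrun_count; /* stands in for the warning + counters */
};

static void boost_handle_event(struct boost_state *s, enum boost_event ev)
{
    switch (ev) {
    case EV_IRQ_DELIVERED:
        /* Boost case 1: needs some boost credit left. */
        if (s->boost_credit_us > 0)
            s->boosted = true;
        break;
    case EV_PREEMPT_ATTEMPT:
        /* Boost cases 2 and 3. */
        if (s->boost_credit_us > 0 && (s->dont_desched || s->irqs_disabled))
            s->boosted = true;
        break;
    case EV_IRQS_REENABLED:
        s->irqs_disabled = false;
        /* fall through: same deboost rule as handler return */
    case EV_IRQ_HANDLER_RETURN:
        /* Deboost cases 2 and 3: ignored while dont_desched is set. */
        if (!s->dont_desched)
            s->boosted = false;
        break;
    }
}

/* Charge runtime to a boosted vCPU; deboost case 1 overrides dont_desched. */
static void boost_charge(struct boost_state *s, int64_t ran_us)
{
    assert(ran_us >= 0);
    if (!s->boosted)
        return;
    s->boost_credit_us -= ran_us;
    if (s->boost_credit_us <= 0) {
        s->boost_credit_us = 0;
        s->boosted = false;
        s->overrun_count++; /* Xen would also log a warning here */
    }
}
```

Only running out of boost credit overrides `dont_desched`; the other two deboost paths are ignored while the bit is set, matching the rules as written.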
* Re: Sketch of an idea for handling the "mixed workload" problem 2024-01-22 0:31 ` Demi Marie Obenour @ 2024-01-22 11:54 ` George Dunlap 2024-01-22 12:17 ` Marek Marczykowski-Górecki 2024-01-23 16:58 ` Demi Marie Obenour 0 siblings, 2 replies; 12+ messages in thread From: George Dunlap @ 2024-01-22 11:54 UTC (permalink / raw) To: Demi Marie Obenour Cc: Xen-devel, Juergen Gross, Marek Marczykowski-Górecki On Mon, Jan 22, 2024 at 12:31 AM Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > On Fri, Sep 29, 2023 at 05:42:16PM +0100, George Dunlap wrote: > > The basic credit2 algorithm goes something like this: > > > > 1. All vcpus start with the same number of credits; about 10ms worth > > if everyone has the same weight > > > > 2. vcpus burn credits as they consume cpu, based on the relative > > weights: higher weights burn slower, lower weights burn faster > > > > 3. At any given point in time, the runnable vcpu with the highest > > credit is allowed to run > > > > 4. When the "next runnable vcpu" on a runqueue is negative, credit is > > reset: everyone gets another 10ms, and can carry over at most 2ms of > > credit over the reset. > > > > Generally speaking, vcpus that use less than their quota and have lots > > of interrupts are scheduled immediately, since when they wake up they > > always have more credit than the vcpus who are burning through their > > slices. > > > > But what about a situation as described recently on Matrix, where a VM > > uses a non-negligible amount of cpu doing un-accelerated encryption > > and decryption, which can be delayed by a few MS, as well as handling > > audio events? How can we make sure that: > > > > 1. We can run whenever interrupts happen > > 2. We get no more than our fair share of the cpu? > > > > The counter-intuitive key here is that in order to achieve the above, > > you need to *deschedule or preempt early*, so that when the interrupt > > comes, you have spare credit to run the interrupt handler. 
How do we > > manage that? > > > > The idea I'm working out comes from a phrase I used in the Matrix > > discussion, about a vcpu that "foolishly burned all its credits". > > Naturally the thing you want to do to have credits available is to > > save them up. > > > > So the idea would be this. Each vcpu would have a "boost credit > > ratio" and a "default boost interval"; there would be sensible > > defaults based on typical workloads, but these could be tweaked for > > individual VMs. > > > > When credit is assigned, all VMs would get the same amount of credit, > > but divided into two "buckets", according to the boost credit ratio. > > > > Under certain conditions, a vcpu would be considered "boosted"; this > > state would last either until the default boost interval, or until > > some other event (such as a de-boost yield). > > > > The queue would be sorted thus: > > > > * Boosted vcpus, by boost credit available > > * Non-boosted vcpus, by non-boost credit available > > > > Getting more boost credit means having lower priority when not > > boosted; and burning through your boost credit means not being > > scheduled when you need to be. > > > > Other ways we could consider putting a vcpu into a boosted state (some > > discussed on Matrix or emails linked from Matrix): > > * Xen is about to preempt, but finds that the vcpu interrupts are > > blocked (this sort of overlaps with the "when we deliver an interrupt" > > one) > > * Xen is about to preempt, but finds that the (currently out-of-tree) > > "dont_desched" bit has been set in the shared memory area > > I think both of these would be good. Another one would be when Xen is > about to deliver an interrupt to a guest, provided that there is no > storm of interrupts. I’ve seen a USB webcam cause a system-wide latency > spike through what I presume is an interrupt storm, and I suspect that > others have observed similar behavior with USB external drives. 

How would you determine that a given interrupt was part of a "storm",
and what would you do differently as a result of determining that?

> > Other ways to consider de-boosting:
> > * There's a way to trigger a VMEXIT when interrupts have been
> > re-enabled; setting this up when the VM is in the boost state
>
> That’s a good idea, but should be conditional on “dont_desched” _not_
> being set. This handles the case where the guest is running a realtime
> thread.

In which case we need some way for the "enlightened" guest to know how
to de-boost itself; a yield might do.

> Generally, I’d like to see something like this:
>
> - A vCPU with sufficient boost credit is boosted by Xen under the
>   following conditions:
>
>   1. Xen interrupts the guest.

I take it you mean, "delivers an interrupt to the guest"?

>   2. Xen is about to preempt, but detects that “dont_desched” is set.
>   3. Xen is about to preempt, but detects that interrupts are disabled.
>
> - A vCPU is deboosted if:
>
>   1. It runs out of boost credit, even if “dont_desched” is set.
>   2. An interrupt handler returns, but only if “dont_desched” is not set.
>   3. Interrupts are re-enabled, but only if “dont_desched” is not set.
>
>   The first case is an abnormal condition and typically means that
>   either the system is overloaded or a vCPU is running boosted for too
>   long. To help debug this situation, Xen will log a warning and
>   increment both a system-wide and a per-domain counter. dom0 can
>   retrieve counters for any domain, and a domain can read its own
>   counter.
>
> - When to set “dont_desched” is entirely up to the guest kernel, but
>   there are some general rules guests should follow:
>
>   - Only set “dont_desched” if there is a good reason, and unset it as
>     soon as possible. Xen gives vCPUs with “dont_desched” set priority
>     over all other vCPUs on the system, but the amount of time a vCPU is
>     allowed to run with an elevated priority is limited.
>     Xen will log a warning if a guest tries to run with elevated
>     priority for too long.
>
>   - Xen boosts vCPUs before delivering an interrupt, but there should be
>     a way for a vCPU to deboost itself even before returning from the
>     interrupt handler.
>
>   - Guests should always set “dont_desched” when running hard-realtime
>     threads (used for e.g. audio processing), even when the thread is in
>     userspace. This ensures that Xen gives the underlying vCPU priority
>     over vCPUs
>
>   - Guests should always set “dont_desched” when holding a spin lock,
>     but it is even better to use paravirtualized spin locks (which make
>     a hypercall into Xen and therefore allow other vCPUs to run).
>
>   - Xen does not implement priority inheritance, so guests need to do
>     that.
>
> - Max boost credits can be set by dom0 via a hypercall.
>
> The advantage of this approach is that it keeps almost all policy out of
> Xen. The only exception is the boosting when an interrupt is received,
> but a well-behaved guest will deboost itself very quickly (by enabling
> interrupts) if the boost was not actually needed, so this should have
> very limited impact. I think this should be enough for realtime audio,
> and it is somewhat related to (but hopefully simpler than) the KVM RFC
> from Google [1].
>
> Any thoughts on this?

Overall sounds good. I think a good approach would be to start by
implementing it without the "dont_desched" flag, and then add that on
top later. It sounds like you have a clear vision for what you want,
so it shouldn't be too hard to write it such that adding the
"dont_desched" doesn't require a lot of pointless refactoring.

The other issue I have with this (and essentially where I got stuck
developing credit2 in the first place) is testing: how do you ensure
that it has the properties that you expect? How do you develop a
"regression test" to make sure that server-based workloads don't have
issues in this sort of case?

-George
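A minimal C sketch of the scheme discussed above, assuming a simplified model rather than Xen's actual credit2 code: George's two credit buckets and run-queue ordering, the credit reset arithmetic from the quoted credit2 summary, and the boost/de-boost triggers from Demi's list (plus the yield George suggests). Every name here (`struct vcpu_sketch`, `on_event`, `runq_sort`, and so on) is hypothetical. Weights, the reset trigger and the default boost interval are left out, and "sufficient boost credit" is assumed to mean any positive balance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-vcpu state; names are illustrative, not Xen's. */
struct vcpu_sketch {
    int id;
    long credit;       /* non-boost bucket, microseconds */
    long boost_credit; /* boost bucket, microseconds */
    bool boosted;
    bool dont_desched; /* guest-set hint (out-of-tree in the thread) */
};

#define CREDIT_INIT_US   10000L /* "about 10ms worth" per reset */
#define CARRYOVER_MAX_US  2000L /* "at most 2ms" carried across a reset */

/* Reset: carry over at most 2ms (assumption: debt carries in full), then
 * split the fresh 10ms between the two buckets by the boost ratio. */
void credit_reset(struct vcpu_sketch *v, long boost_ratio_pct)
{
    long carry = v->credit < CARRYOVER_MAX_US ? v->credit : CARRYOVER_MAX_US;
    long boost = CREDIT_INIT_US * boost_ratio_pct / 100;

    v->boost_credit = boost; /* assumption: boost credit does not carry over */
    v->credit = carry + (CREDIT_INIT_US - boost);
}

/* Boosted vcpus first, by boost credit; then the rest, by normal credit. */
int runq_cmp(const void *a, const void *b)
{
    const struct vcpu_sketch *x = a, *y = b;
    long cx, cy;

    if (x->boosted != y->boosted)
        return x->boosted ? -1 : 1;
    cx = x->boosted ? x->boost_credit : x->credit;
    cy = y->boosted ? y->boost_credit : y->credit;
    return (cx < cy) - (cx > cy); /* higher credit sorts first */
}

void runq_sort(struct vcpu_sketch *rq, size_t n)
{
    qsort(rq, n, sizeof(*rq), runq_cmp);
}

/* Charge runtime (weights omitted). A boosted vcpu pays from its boost
 * bucket and is de-boosted when it runs dry, even if dont_desched is set. */
void burn(struct vcpu_sketch *v, long us)
{
    assert(us >= 0);
    if (v->boosted) {
        long from_boost = us < v->boost_credit ? us : v->boost_credit;

        v->boost_credit -= from_boost;
        v->credit -= us - from_boost; /* any overrun hits the normal bucket */
        if (v->boost_credit == 0)
            v->boosted = false;
    } else {
        v->credit -= us;
    }
}

enum sched_event {
    EV_IRQ_DELIVER,   /* Xen is about to deliver an interrupt */
    EV_PREEMPT_CHECK, /* Xen is about to preempt this vcpu */
    EV_IRQ_RETURN,    /* guest returned from an interrupt handler */
    EV_IRQS_ENABLED,  /* guest re-enabled interrupts (trapped via VMEXIT) */
    EV_YIELD,         /* enlightened guest de-boosts itself */
};

void on_event(struct vcpu_sketch *v, enum sched_event ev, bool irqs_disabled)
{
    switch (ev) {
    case EV_IRQ_DELIVER:
        if (v->boost_credit > 0)
            v->boosted = true;
        break;
    case EV_PREEMPT_CHECK:
        if (v->boost_credit > 0 && (v->dont_desched || irqs_disabled))
            v->boosted = true;
        break;
    case EV_IRQ_RETURN:
    case EV_IRQS_ENABLED:
        if (!v->dont_desched)
            v->boosted = false;
        break;
    case EV_YIELD:
        v->boosted = false;
        break;
    }
}
```

In this model a vcpu that has spent most of its normal credit but kept its boost bucket jumps ahead of much richer vcpus when an interrupt is delivered, and drops back into normal-credit order as soon as the handler returns, unless `dont_desched` is set.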

* Re: Sketch of an idea for handling the "mixed workload" problem
From: Marek Marczykowski-Górecki @ 2024-01-22 12:17 UTC
  To: George Dunlap
  Cc: Demi Marie Obenour, Xen-devel, Juergen Gross

On Mon, Jan 22, 2024 at 11:54:14AM +0000, George Dunlap wrote:
> The other issue I have with this (and essentially where I got stuck
> developing credit2 in the first place) is testing: how do you ensure
> that it has the properties that you expect?

Audio is actually quite a nice use case for this, since it's quite
sensitive to scheduling jitter. I think even a simple "PCI passthrough a
sound card and play/record something" should show results. In
particular, you can measure how hard you can push the system (for
example, with artificial load in other domains) until it breaks.

> How do you develop a
> "regression test" to make sure that server-based workloads don't have
> issues in this sort of case?

For this I believe there are several benchmarking methods already,
starting with the trusty old "Linux kernel build time".

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

* Re: Sketch of an idea for handling the "mixed workload" problem
From: George Dunlap @ 2024-01-22 12:25 UTC
  To: Marek Marczykowski-Górecki
  Cc: Demi Marie Obenour, Xen-devel, Juergen Gross

On Mon, Jan 22, 2024 at 12:17 PM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Mon, Jan 22, 2024 at 11:54:14AM +0000, George Dunlap wrote:
> > The other issue I have with this (and essentially where I got stuck
> > developing credit2 in the first place) is testing: how do you ensure
> > that it has the properties that you expect?
>
> Audio is actually quite a nice use case for this, since it's quite
> sensitive to scheduling jitter. I think even a simple "PCI passthrough a
> sound card and play/record something" should show results. In
> particular, you can measure how hard you can push the system (for
> example, with artificial load in other domains) until it breaks.

Are we going to have a gitlab runner which says, "Marek sits in front of
his test machine and listens to audio for pops"? :-)

>
> > How do you develop a
> > "regression test" to make sure that server-based workloads don't have
> > issues in this sort of case?
>
> For this I believe there are several benchmarking methods already,
> starting with the trusty old "Linux kernel build time".

First of all, AFAICT "Linux kernel build time" is not representative
of almost any actual server workload; and the end-to-end throughput
completely misses what most server workloads will actually care about,
like latency.

Secondly, what you're testing isn't the performance of a single
workload on an empty system; you're testing how workloads *interact*.
If you want ideal throughput for a single workload on an empty system,
use the null scheduler; more complex schedulers are only necessary
when multiple different workloads interact.

FWIW this was my first stab at trying to be systematic about testing
the scheduler:

https://github.com/gwd/schedbench

The rump kernel project has basically died AFAIK, so anyone trying to
resurrect this would probably have to try to rebase that bit of it
against something like XTF or unikernels.

-George

* Re: Sketch of an idea for handling the "mixed workload" problem
From: Marek Marczykowski-Górecki @ 2024-01-22 12:50 UTC
  To: George Dunlap
  Cc: Demi Marie Obenour, Xen-devel, Juergen Gross

On Mon, Jan 22, 2024 at 12:25:58PM +0000, George Dunlap wrote:
> On Mon, Jan 22, 2024 at 12:17 PM Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com> wrote:
> >
> > On Mon, Jan 22, 2024 at 11:54:14AM +0000, George Dunlap wrote:
> > > The other issue I have with this (and essentially where I got stuck
> > > developing credit2 in the first place) is testing: how do you ensure
> > > that it has the properties that you expect?
> >
> > Audio is actually quite a nice use case for this, since it's quite
> > sensitive to scheduling jitter. I think even a simple "PCI passthrough a
> > sound card and play/record something" should show results. In
> > particular, you can measure how hard you can push the system (for
> > example, with artificial load in other domains) until it breaks.
>
> Are we going to have a gitlab runner which says, "Marek sits in front of
> his test machine and listens to audio for pops"? :-)

Kinda ;)
We already have audio tests in Qubes CI. They do more or less the above,
but using our audio virtualization: play something, record in another
domain, and compare. Running the very same thing in gitlab-ci may be too
complicated (it would require bringing in some Qubes infrastructure to
make PV audio work), but maybe a similar test could be done based on
QEMU-emulated audio or another PV audio solution?

> > > How do you develop a
> > > "regression test" to make sure that server-based workloads don't have
> > > issues in this sort of case?
> > >
> > For this I believe there are several benchmarking methods already,
> > starting with the trusty old "Linux kernel build time".
>
> First of all, AFAICT "Linux kernel build time" is not representative
> of almost any actual server workload; and the end-to-end throughput
> completely misses what most server workloads will actually care about,
> like latency.
>
> Secondly, what you're testing isn't the performance of a single
> workload on an empty system; you're testing how workloads *interact*.
> If you want ideal throughput for a single workload on an empty system,
> use the null scheduler; more complex schedulers are only necessary
> when multiple different workloads interact.

I should have clarified I meant `make -jNN`. But still, that's the same
workload on multiple vCPUs.

> FWIW this was my first stab at trying to be systematic about testing
> the scheduler:
>
> https://github.com/gwd/schedbench
>
> The rump kernel project has basically died AFAIK, so anyone trying to
> resurrect this would probably have to try to rebase that bit of it
> against something like XTF or unikernels.
>
> -George

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

* Re: Sketch of an idea for handling the "mixed workload" problem
From: George Dunlap @ 2024-01-22 13:02 UTC
  To: Marek Marczykowski-Górecki
  Cc: Demi Marie Obenour, Xen-devel, Juergen Gross

On Mon, Jan 22, 2024 at 12:50 PM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Mon, Jan 22, 2024 at 12:25:58PM +0000, George Dunlap wrote:
> > On Mon, Jan 22, 2024 at 12:17 PM Marek Marczykowski-Górecki
> > <marmarek@invisiblethingslab.com> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 11:54:14AM +0000, George Dunlap wrote:
> > > > The other issue I have with this (and essentially where I got stuck
> > > > developing credit2 in the first place) is testing: how do you ensure
> > > > that it has the properties that you expect?
> > >
> > > Audio is actually quite a nice use case for this, since it's quite
> > > sensitive to scheduling jitter. I think even a simple "PCI passthrough a
> > > sound card and play/record something" should show results. In
> > > particular, you can measure how hard you can push the system (for
> > > example, with artificial load in other domains) until it breaks.
> >
> > Are we going to have a gitlab runner which says, "Marek sits in front of
> > his test machine and listens to audio for pops"? :-)
>
> Kinda ;)
> We already have audio tests in Qubes CI. They do more or less the above,
> but using our audio virtualization: play something, record in another
> domain, and compare. Running the very same thing in gitlab-ci may be too
> complicated (it would require bringing in some Qubes infrastructure to
> make PV audio work), but maybe a similar test could be done based on
> QEMU-emulated audio or another PV audio solution?
>
> > > > How do you develop a
> > > > "regression test" to make sure that server-based workloads don't have
> > > > issues in this sort of case?
> > > >
> > > For this I believe there are several benchmarking methods already,
> > > starting with the trusty old "Linux kernel build time".
> >
> > First of all, AFAICT "Linux kernel build time" is not representative
> > of almost any actual server workload; and the end-to-end throughput
> > completely misses what most server workloads will actually care about,
> > like latency.
> >
> > Secondly, what you're testing isn't the performance of a single
> > workload on an empty system; you're testing how workloads *interact*.
> > If you want ideal throughput for a single workload on an empty system,
> > use the null scheduler; more complex schedulers are only necessary
> > when multiple different workloads interact.
>
> I should have clarified I meant `make -jNN`. But still, that's the same
> workload on multiple vCPUs.

See, you're still not getting it. :-) What you need is not multiple
vcpus across a single VM, but multiple instances of different
workloads across different VMs.

For example:

1. One VM running kernbench
2. two VMs running kernbench, but not competing for vcpu
3. four VMs running kernbench, competing for vcpus
4. three VMs running kernbench, and one playing audio
5. four VMs running kernbench, one of which is *also* playing audio

And then you have to collect several metrics:

1. Total kernbench throughput of entire system
2. Kernbench performance of each VM, compared with expected "fair share"
3. Some measure of latency for the audio VM

And figure out how to compare trade-offs -- how much total throughput
hit should we tolerate to increase fairness? How much fairness hit
should we take to decrease latency?

And as I said, kernbench isn't really a great server workload; you
should do something request-based, measuring both throughput and
latency.

-George

* Re: Sketch of an idea for handling the "mixed workload" problem
From: George Dunlap @ 2024-01-22 13:03 UTC
  To: Marek Marczykowski-Górecki
  Cc: Demi Marie Obenour, Xen-devel, Juergen Gross

On Mon, Jan 22, 2024 at 1:02 PM George Dunlap <george.dunlap@cloud.com> wrote:
> 2. two VMs running kernbench, but not competing for vcpu
> 3. four VMs running kernbench, competing for vcpus

Sorry, this should be competing for *P*cpus

-George
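George's metric 2 (each VM's performance compared with its expected "fair share") can be made concrete with a little arithmetic. The sketch below assumes all VMs are CPU-bound and competing for the same pcpus, so each VM's fair share is its weight's fraction of the measured system total. It then summarises the per-VM ratios with Jain's fairness index, a standard fairness metric swapped in here for illustration, not something proposed in the thread. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Measured throughput of VM i divided by its weight-proportional share of
 * the system-wide total: 1.0 is exactly fair, below 1.0 means it got less. */
double fair_share_ratio(const double *tput, const double *weight, size_t n,
                        size_t i)
{
    double total_tput = 0.0, total_weight = 0.0;

    assert(i < n);
    for (size_t k = 0; k < n; k++) {
        total_tput += tput[k];
        total_weight += weight[k];
    }
    return tput[i] / (total_tput * weight[i] / total_weight);
}

/* Jain's index over the per-VM ratios: 1.0 when every VM gets the same
 * fraction of its fair share, falling to 1/n when one VM gets everything. */
double jain_index(const double *tput, const double *weight, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        double r = fair_share_ratio(tput, weight, n, i);

        sum += r;
        sum_sq += r * r;
    }
    return (sum * sum) / ((double)n * sum_sq);
}
```

For scenario 3 (four kernbench VMs competing for pcpus) one would pass the four measured build rates with equal weights; a result well below 1.0 flags unfairness. Total throughput (metric 1) and audio latency (metric 3) still have to be reported separately, since this index says nothing about either.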

* Re: Sketch of an idea for handling the "mixed workload" problem
From: Demi Marie Obenour @ 2024-01-23 16:58 UTC
  To: George Dunlap
  Cc: Xen-devel, Juergen Gross, Marek Marczykowski-Górecki

On Mon, Jan 22, 2024 at 11:54:14AM +0000, George Dunlap wrote:
> On Mon, Jan 22, 2024 at 12:31 AM Demi Marie Obenour
> <demi@invisiblethingslab.com> wrote:
> >
> > On Fri, Sep 29, 2023 at 05:42:16PM +0100, George Dunlap wrote:
> > > The basic credit2 algorithm goes something like this:
> > >
> > > 1. All vcpus start with the same number of credits; about 10ms worth
> > > if everyone has the same weight
> > >
> > > 2. vcpus burn credits as they consume cpu, based on the relative
> > > weights: higher weights burn slower, lower weights burn faster
> > >
> > > 3. At any given point in time, the runnable vcpu with the highest
> > > credit is allowed to run
> > >
> > > 4. When the "next runnable vcpu" on a runqueue is negative, credit is
> > > reset: everyone gets another 10ms, and can carry over at most 2ms of
> > > credit over the reset.
> > >
> > > Generally speaking, vcpus that use less than their quota and have lots
> > > of interrupts are scheduled immediately, since when they wake up they
> > > always have more credit than the vcpus who are burning through their
> > > slices.
> > >
> > > But what about a situation as described recently on Matrix, where a VM
> > > uses a non-negligible amount of cpu doing un-accelerated encryption
> > > and decryption, which can be delayed by a few ms, as well as handling
> > > audio events? How can we make sure that:
> > >
> > > 1. We can run whenever interrupts happen
> > > 2. We get no more than our fair share of the cpu?
> > >
> > > The counter-intuitive key here is that in order to achieve the above,
> > > you need to *deschedule or preempt early*, so that when the interrupt
> > > comes, you have spare credit to run the interrupt handler. How do we
> > > manage that?
> > >
> > > The idea I'm working out comes from a phrase I used in the Matrix
> > > discussion, about a vcpu that "foolishly burned all its credits".
> > > Naturally the thing you want to do to have credits available is to
> > > save them up.
> > >
> > > So the idea would be this. Each vcpu would have a "boost credit
> > > ratio" and a "default boost interval"; there would be sensible
> > > defaults based on typical workloads, but these could be tweaked for
> > > individual VMs.
> > >
> > > When credit is assigned, all VMs would get the same amount of credit,
> > > but divided into two "buckets", according to the boost credit ratio.
> > >
> > > Under certain conditions, a vcpu would be considered "boosted"; this
> > > state would last either until the default boost interval, or until
> > > some other event (such as a de-boost yield).
> > >
> > > The queue would be sorted thus:
> > >
> > > * Boosted vcpus, by boost credit available
> > > * Non-boosted vcpus, by non-boost credit available
> > >
> > > Getting more boost credit means having lower priority when not
> > > boosted; and burning through your boost credit means not being
> > > scheduled when you need to be.
> > >
> > > Other ways we could consider putting a vcpu into a boosted state (some
> > > discussed on Matrix or emails linked from Matrix):
> > > * Xen is about to preempt, but finds that the vcpu interrupts are
> > > blocked (this sort of overlaps with the "when we deliver an interrupt"
> > > one)
> > > * Xen is about to preempt, but finds that the (currently out-of-tree)
> > > "dont_desched" bit has been set in the shared memory area
> >
> > I think both of these would be good.
> > Another one would be when Xen is
> > about to deliver an interrupt to a guest, provided that there is no
> > storm of interrupts. I’ve seen a USB webcam cause a system-wide latency
> > spike through what I presume is an interrupt storm, and I suspect that
> > others have observed similar behavior with USB external drives.
>
> How would you determine that a given interrupt was part of a "storm",
> and what would you do differently as a result of determining that?

I’m not sure. One heuristic might be that if a device assigned to a VM
is interrupting Xen too many times while Xen is running other VMs,
interrupts from that device are blocked as needed to ensure other VMs
get to execute. Theoretically, an interrupt from a USB storage device
should be safe to block until Xen is no longer running boosted
workloads, but an interrupt from a USB microphone or speaker is not.

> > > Other ways to consider de-boosting:
> > > * There's a way to trigger a VMEXIT when interrupts have been
> > > re-enabled; setting this up when the VM is in the boost state
> >
> > That’s a good idea, but should be conditional on “dont_desched” _not_
> > being set. This handles the case where the guest is running a realtime
> > thread.
>
> In which case we need some way for the "enlightened" guest to know how
> to de-boost itself; a yield might do.

That would be sufficient.

> > Generally, I’d like to see something like this:
> >
> > - A vCPU with sufficient boost credit is boosted by Xen under the
> >   following conditions:
> >
> >   1. Xen interrupts the guest.
>
> I take it you mean, "delivers an interrupt to the guest"?

Yes.

> >   2. Xen is about to preempt, but detects that “dont_desched” is set.
> >   3. Xen is about to preempt, but detects that interrupts are disabled.
> >
> > - A vCPU is deboosted if:
> >
> >   1. It runs out of boost credit, even if “dont_desched” is set.
> >   2. An interrupt handler returns, but only if “dont_desched” is not set.
> >   3. Interrupts are re-enabled, but only if “dont_desched” is not set.
> >
> >   The first case is an abnormal condition and typically means that
> >   either the system is overloaded or a vCPU is running boosted for too
> >   long. To help debug this situation, Xen will log a warning and
> >   increment both a system-wide and a per-domain counter. dom0 can
> >   retrieve counters for any domain, and a domain can read its own
> >   counter.
> >
> > - When to set “dont_desched” is entirely up to the guest kernel, but
> >   there are some general rules guests should follow:
> >
> >   - Only set “dont_desched” if there is a good reason, and unset it as
> >     soon as possible. Xen gives vCPUs with “dont_desched” set priority
> >     over all other vCPUs on the system, but the amount of time a vCPU is
> >     allowed to run with an elevated priority is limited.
The only exception is the boosting when an interrupt is received, > > but a well-behaved guest will deboost itself very quickly (by enabling > > interrupts) if the boost was not actually needed, so this should have > > very limited impact. I think this should be enough for realtime audio, > > and it is somewhat related to (but hopefully simpler than) the KVM RFC > > from Google [1]. > > > > Any thoughts on this? > > Overall sounds good. I think a good approach would be to start by > implementing it without the "dont_desched" flag, and then add that on > top later. It sounds like you have a clear vision for what you want, > so it shouldn't be too hard to write such that adding the > "dont_desched" doesn't require a lot of pointless refactoring. > > The other issue I have with this (and essentially where I got stuck > developing credit2 in the first place) is testing: how do you ensure > that it has the properties that you expect? How do you develop a > "regression test" to make sure that server-based workloads don't have > issues in this sort of case? I don’t have any server workloads myself. Would it be reasonable to ask those who do have such workloads to develop such a test? They would be in a much better position to check for regressions on these workloads, and have server hardware that they can use to benchmark such workloads. I just have my laptop and a test laptop, both running Qubes OS. It’s also possible that some of these changes will improve latency at the expense of throughput. In that case, I could add a Xen command-line option (or even a runtime toggle) that controls whether Xen honors the boost state. I do expect that the rest of the logic should have very little overhead in this case. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread

Thread overview: 12+ messages
2023-09-29 16:42 Sketch of an idea for handling the "mixed workload" problem George Dunlap
2023-09-30 23:28 ` Demi Marie Obenour
2023-10-02 11:20 ` George Dunlap
2024-01-21 23:46 ` Demi Marie Obenour
2024-01-22 0:31 ` Demi Marie Obenour
2024-01-22 11:54 ` George Dunlap
2024-01-22 12:17 ` Marek Marczykowski-Górecki
2024-01-22 12:25 ` George Dunlap
2024-01-22 12:50 ` Marek Marczykowski-Górecki
2024-01-22 13:02 ` George Dunlap
2024-01-22 13:03 ` George Dunlap
2024-01-23 16:58 ` Demi Marie Obenour