From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB94D6A032 for ; Thu, 14 Dec 2023 20:13:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="3uXO53yS" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-1d32b4a8ea0so52019595ad.3 for ; Thu, 14 Dec 2023 12:13:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1702584816; x=1703189616; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=o6UuF2I+ksa0nRBkfH8g4L0LiVSfO6AHAS12VBA0d84=; b=3uXO53yS0xx9GMoa+komR8PI861qPpzCKzKGgx56phJ8Hs2Bgv1uJfdXktZ1DtAf+h LMpt+mcZiChHv4qbuyM9LI+CGrJNAriiPtatLimiZp9Y41osIH4nu3+QaYQa9ylrtp9c 5XFOnAdGlNNMZmSwqtIBU87d/AbBuy+Zx0egjOtZmvDQZuot3Y92Q/+KUbJ4NWxR31Ci TY1i5HPd93n+XrZ9a+LE2NPhE1cX9lYO0t+sHD/vPL0IxZus8SjQF/u/RkJCZZfEzpgP Sl25R2eTWqEldgUPSw2vtHsi792+1hzII1t1YpSE4U10WnU/KUCMqeRZhMckVE/88TC4 +8Og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702584816; x=1703189616; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=o6UuF2I+ksa0nRBkfH8g4L0LiVSfO6AHAS12VBA0d84=; b=COL0xAYvRLNVQYnEsgW5HbsCMYTuJAyfGgFBXV7xZ7WaFPROxJsKTVpKyuuTfLc+3J ilFjZmhTPV1FEeamfXiPpldpikbyY2t5UEkBdVrHTCPMFFiCenYf3r/tO1tet+rmMsCd yxE8jRbef4fCFYN3zkN68W+7pSj4jCh6RG9JiF8ocDqLKm1Emvw9DsN6vjigNP7FlraZ 921y/Ta6f0kuS9e8WmRFqh4H6FreGvod+Sc6Cpv+032tWOFzbDF/5EEtCkgCk96DHhLc 1oG5W+UUPJ5C7VBo/f8nT2LYm9/ghb4Pb9LCJ0/bPi0zweTfpdeSki/4+kibowFzaNee UMBg== X-Gm-Message-State: AOJu0YwDLzuThjsbl/m3yTWulVnzyy2pyntqnz58Cc6jT3R4n7OoppAi SzVj60KnIpSym5Q+oOqMzLGD0CWjuB0= X-Google-Smtp-Source: AGHT+IGEiwa1cSoRDflydMN7++Q6taulTF4HV3L/RNxde17wAejKqSXkzjYYxVXM2/0619dGhrSppCrZiaM= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:902:b18f:b0:1d0:9b2d:5651 with SMTP id s15-20020a170902b18f00b001d09b2d5651mr1570045plr.1.1702584816052; Thu, 14 Dec 2023 12:13:36 -0800 (PST) Date: Thu, 14 Dec 2023 12:13:34 -0800 In-Reply-To: Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20231214024727.3503870-1-vineeth@bitbyteword.org> Message-ID: Subject: Re: [RFC PATCH 0/8] Dynamic vcpu priority management in kvm From: Sean Christopherson To: Vineeth Remanan Pillai Cc: Ben Segall , Borislav Petkov , Daniel Bristot de Oliveira , Dave Hansen , Dietmar Eggemann , "H . Peter Anvin" , Ingo Molnar , Juri Lelli , Mel Gorman , Paolo Bonzini , Andy Lutomirski , Peter Zijlstra , Steven Rostedt , Thomas Gleixner , Valentin Schneider , Vincent Guittot , Vitaly Kuznetsov , Wanpeng Li , Suleiman Souhlal , Masami Hiramatsu , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, Tejun Heo , Josh Don , Barret Rhoden , David Vernet , Joel Fernandes Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable On Thu, Dec 14, 2023, Vineeth Remanan Pillai wrote: > On Thu, Dec 14, 2023 at 11:38=E2=80=AFAM Sean Christopherson wrote: > Now when I think about it, the implementation seems to > suggest that we are putting policies in kvm. Ideally, the goal is: > - guest scheduler communicates the priority requirements of the workload > - kvm applies the priority to the vcpu task. Why? Tasks are tasks, why does KVM need to get involved? E.g. if the prob= lem is that userspace doesn't have the right knobs to adjust the priority of a = task quickly and efficiently, then wouldn't it be better to solve that problem i= n a generic way? > - Now that vcpu is appropriately prioritized, host scheduler can make > the right choice of picking the next best task. >=20 > We have an exception of proactive boosting for interrupts/nmis. I > don't expect these proactive boosting cases to grow. And I think this > also to be controlled by the guest where the guest can say what > scenarios would it like to be proactive boosted. >=20 > That would make kvm just a medium to communicate the scheduler > requirements from guest to host and not house any policies. What do > you think? ... =20 > > Pushing the scheduling policies to host userspace would allow for far m= ore control > > and flexibility. E.g. a heavily paravirtualized environment where host= userspace > > knows *exactly* what workloads are being run could have wildly differen= t policies > > than an environment where the guest is a fairly vanilla Linux VM that h= as received > > a small amount of enlightment. > > > > Lastly, if the concern/argument is that userspace doesn't have the righ= t knobs > > to (quickly) boost vCPU tasks, then the proposed sched_ext functionalit= y seems > > tailor made for the problems you are trying to solve. > > > > https://lkml.kernel.org/r/20231111024835.2164816-1-tj%40kernel.org > > > You are right, sched_ext is a good choice to have policies > implemented. In our case, we would need a communication mechanism as > well and hence we thought kvm would work best to be a medium between > the guest and the host. Making KVM be the medium may be convenient and the quickest way to get a Po= C out the door, but effectively making KVM a middle-man is going to be a huge= net negative in the long term. Userspace can communicate with the guest just a= s easily as KVM, and if you make KVM the middle-man, then you effectively *mu= st* define a relatively rigid guest/host ABI. If instead the contract is between host userspace and the guest, the ABI ca= n be much more fluid, e.g. if you (or any setup) can control at least some amoun= t of code that runs in the guest, then the contract between the guest and host d= oesn't even need to be formally defined, it could simply be a matter of bundling h= ost and guest code appropriately. If you want to land support for a given contract in upstream repositories, = e.g. to broadly enable paravirt scheduling support across a variety of usersepac= e VMMs and/or guests, then yeah, you'll need a formal ABI. But that's still not a= good reason to have KVM define the ABI. Doing it in KVM might be a wee bit easi= er because it's largely just a matter of writing code, and LKML provides a centralized= channel for getting buyin from all parties. But defining an ABI that's independent= of the kernel is absolutely doable, e.g. see the many virtio specs. I'm not saying KVM can't help, e.g. if there is information that is known o= nly to KVM, but the vast majority of the contract doesn't need to be defined by= KVM.