From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <416dfbf8-f765-442e-b6de-6fc0fe1a4b5f@linux.dev>
Date: Thu, 25 Jun 2026 20:51:45 +0800
X-Mailing-List: loongarch@lists.linux.dev
MIME-Version: 1.0
Cc: cui.tao@linux.dev, kernel@xen0n.name, kvm@vger.kernel.org, Tao Cui
Subject: Re: [PATCH v4 2/3] LoongArch: KVM: Implement guest-side PV TLB flush
To: Bibo Mao, zhaotianrui@loongson.cn, chenhuacai@kernel.org, loongarch@lists.linux.dev
References: <20260615082154.42144-1-cui.tao@linux.dev>
 <20260615082154.42144-3-cui.tao@linux.dev>
 <0c47ce21-9a4d-4cdc-9bec-ce749e31512e@loongson.cn>
 <1bfa4941-b94b-410a-9b64-c13f2712edf9@linux.dev>
 <9ec9d22b-93fd-3dcb-c6b8-19563f1b7c0a@loongson.cn>
 <404180d0-c734-4465-8752-f43279730692@linux.dev>
 <6f837e00-e606-ddaa-b22b-dd30348a18fb@loongson.cn>
 <084d7bec-592c-31b5-aa44-099bff87af9a@loongson.cn>
 <013459e1-f817-42b5-a0fd-20b38d9b6140@linux.dev>
 <835a9c5c-2f66-293b-d093-fc59bac26a01@loongson.cn>
From: Tao Cui
In-Reply-To: <835a9c5c-2f66-293b-d093-fc59bac26a01@loongson.cn>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 2026/6/25 15:36, Bibo Mao wrote:
>
>
> On 2026/6/25 3:15 PM, Tao Cui wrote:
>>
>>
>> On 2026/6/25 14:11, Bibo Mao wrote:
>>>
>>>
>>> On 2026/6/25 11:31 AM, Bibo Mao wrote:
>>>>
>>>>
>>>> On 2026/6/25 10:27 AM, Tao Cui wrote:
>>>>>
>>>>> Hi Bibo,
>>>>>
>>>>> On 2026/6/17 09:05, Bibo Mao wrote:
>>>>>>
>>>>>>> Rather than argue from intuition, I'd like to try the hypercall approach
>>>>>>> you suggested and measure the performance improvement against the current
>>>>>>> path.
>>>>>>> I'll share the results with you once the testing is done, so we
>>>>>>> can decide the direction based on the numbers.
>>>>>> well, that is the best. It is my pleasure to discuss this with you.
>>>>>>
>>>>>
>>>>> A quick update on the testing. I put both the hypercall and the
>>>>> steal-time variants through two benchmarks on an 8-core host with a
>>>>> 4:1 overcommitted guest (32 vCPUs), and wanted to share where things
>>>>> stand.
>>>>>
>>>>> The two workloads:
>>>>>    - ebizzy (all threads busy, mm-flush heavy)
>>>>>    - tlb_bench in sleep-idle mode (1 flusher + 31 sleeping idle threads,
>>>>>      so the idle vCPUs get preempted)
>>>>>
>>>>> ebizzy (records/s, higher is better), 32 vCPUs:
>>>>>     no-PV       ~103,737
>>>>>     hypercall   ~105,779
>>>>>     steal-time  ~105,872
>>>>>     -> all within noise (±2%); no measurable difference.
>>>> Hi Tao,
>>>>
>>>> what is ebizzy command? ebizzy -m or ebizzy -M.
>>>>
>>>> could you try command on host and one VM without over-committed at first, and then two VMs and three VMs?
>>>>
>>>> Here is result on my 3C5000 Dual-way machines with 32 cores and two numa nodes:
>>>>                   ./ebizzy -m          ./ebizzy -M
>>>> host             8633                 158898
>>>> VM(32 vCPUs)     6610                 133153
>>>> VM/host          76%                  83%
>>>>
>>> just ./ebizzy -M is enough, it seems that CPU number is one key factor.
>>>
>>
>> Sorry for the delay - it turned out my ebizzy command was wrong. I had
>> been running `ebizzy -t -S 10`, which is neither -m nor -M, so
>> neither mmap mode was active and the workload wasn't really stressing
>> the TLB-flush path. Thanks for catching it.
>>
>> I re-ran with -m and -M on host and a single VM (8-core LoongArch
>> KVM host, 8 vCPU guest, 1:1, no overcommit).
>>
>>                   ebizzy -m        ebizzy -M
>> host             ~20,000          ~55,000
>> VM (8 vCPU,1:1)  ~17,000          ~53,000
>> VM/host          ~86%             ~97%
>>
>> The -m ratio (86%) is close to your 76%.
> On my 3C5000 Dual-way machine, VM has the same CPU/memory topology with physical machine, the kernel is mainline without any patch.
>                           ./ebizzy -M
> Host (32 pCPUs)           158898
> One VM(32 vCPUs)          133153                 83% of host
> Two VMs(32 vCPUs each)    9083 + 9630 = 18713    11% of host
>
> It seems that with ebizzy benchmark, there is big difference if vCPU is preempted. Even if vCPU is not preempted, the performance is only 83% of host on my 3C5000 Dual-way machine.

After fixing the ebizzy command, I have multi-VM overcommit results
for both approaches.

Setup: 8-core LoongArch (single-socket, single NUMA), KVM,
linux-next-20260623, 8 vCPU per VM. All VMs run ebizzy -M
simultaneously, 3 runs each. The PV-off baseline uses a guest kernel
with CONFIG_PARAVIRT=y but without the PV TLB flush patches, so PV IPI
and steal-time are active in all three columns.

ebizzy -M, total records/s across all VMs:

               PV-off    steal-time   hypercall
  1:1 (1VM)    53,600    53,800       53,900
  2:1 (2VM)     2,600    42,600       45,300
  3:1 (3VM)     2,800    44,700       46,000

At 1:1 there is no difference, since no vCPU gets preempted.

Under overcommit, without PV TLB flush the throughput drops to ~5%
of the single-VM case, because every remote TLB flush sends IPIs to
preempted vCPUs. With either PV TLB flush variant, preempted vCPUs are
skipped, and total throughput stays at ~80-85% of single-VM (bounded
by physical cores).

Hypercall is consistently 3-6% above steal-time in the overcommit
cases. A possible reason is that the hypercall hands the entire target
set to the host in one call, while steal-time still IPIs the running
vCPUs and only defers the preempted ones.
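
For anyone who wants to re-derive the ratios from the table above, here
is a small Python sketch. It is not part of the benchmark setup; the
names RESULTS, pct_of_single_vm and hypercall_gain_pct are made up for
illustration, and the inputs are the rounded table values, so the
outputs are only as precise as those.

```python
# Figures copied from the "ebizzy -M, total records/s" table in this mail.
# The dictionary layout and function names are illustrative assumptions.
RESULTS = {
    "1:1": {"PV-off": 53_600, "steal-time": 53_800, "hypercall": 53_900},
    "2:1": {"PV-off": 2_600, "steal-time": 42_600, "hypercall": 45_300},
    "3:1": {"PV-off": 2_800, "steal-time": 44_700, "hypercall": 46_000},
}


def pct_of_single_vm(results):
    """Each overcommit case as a percentage of the same column at 1:1."""
    base = results["1:1"]
    return {
        ratio: {name: 100.0 * value / base[name] for name, value in row.items()}
        for ratio, row in results.items()
        if ratio != "1:1"
    }


def hypercall_gain_pct(results):
    """Hypercall throughput relative to steal-time, in percent."""
    return {
        ratio: 100.0 * (row["hypercall"] / row["steal-time"] - 1.0)
        for ratio, row in results.items()
        if ratio != "1:1"
    }


if __name__ == "__main__":
    for ratio, row in pct_of_single_vm(RESULTS).items():
        cells = "  ".join(f"{name}={pct:5.1f}%" for name, pct in row.items())
        print(f"{ratio}  {cells}")
    for ratio, gain in hypercall_gain_pct(RESULTS).items():
        print(f"{ratio}  hypercall vs steal-time: {gain:+.1f}%")
```

With the rounded inputs, the PV-off column comes out at roughly 5% of
its 1:1 figure in both overcommit cases.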

On our 8-core machine the collapse is more severe than on your 3C5000
(~5% vs 11% of host at 2 VMs), likely due to the smaller core count
and single-NUMA topology.

Thanks,
Tao

>
> Regards
> Bibo Mao
>>
>> I then tried multi-VM overcommit (2 and 3 VMs, all running ebizzy -M
>> simultaneously). The initial result showed a large gap between the
>> PV-TLB-flush kernel and the baseline under overcommit. Both kernels
>> have CONFIG_PARAVIRT enabled (PV IPI and steal-time are active in
>> both), so PV TLB flush should be the main differentiator - but since
>> they are two separate kernel images rather than a clean on/off toggle,
>> there may be some noise from other differences. I'm now re-running
>> with a QEMU CPU property (kvm-pv-tlb-flush on/off) on the same kernel
>> to isolate the effect cleanly.
>>
>> I'll share the verified numbers once I have them.
>>
>> Thanks,
>> Tao
>>
>>>> Regards
>>>> Bibo Mao
>>>>
>>>>>
>>>>> tlb_bench sleep-idle (ns/flush, lower is better), 1 flusher + 31 idle:
>>>>>     no-PV       ~166,536
>>>>>     steal-time  ~149,553
>>>>>     hypercall    ~88,686
>>>>>
>>>>> ebizzy's workload is mostly threads staying busy with alloc/copy/free,
>>>>> which drives remote TLB flushes against running vCPUs - that may not be
>>>>> the path this feature is meant to optimize, so the flat result there
>>>>> probably says more about the workload mismatch than about the feature
>>>>> itself. I need to take another look at whether the benchmark actually
>>>>> exercises the cases PV TLB flush targets before reading too much into
>>>>> the numbers, including the tlb_bench figure above.
>>>>>
>>>>> So I'd hold off on any conclusion for now. Next I'll re-examine the
>>>>> test setup / pick a workload that better matches the feature, and keep
>>>>> you posted once I have something more representative.
>>>>>
>>>>> Best,
>>>>> Tao
>>>>>
>>>
>