From: Atish Kumar Patra
Date: Tue, 20 Sep 2022 01:36:28 -0700
Subject: Re: [PATCH v14 0/5] Improve PMU support
To: Alistair Francis
Cc: "qemu-devel@nongnu.org Developers", Alistair Francis, Bin Meng, Palmer Dabbelt, "open list:RISC-V"
References: <20220824221701.41932-1-atishp@rivosinc.com>


On Mon, Sep 19, 2022 at 3:08 PM Alistair Francis <alistair23@gmail.com> wrote:
> On Thu, Aug 25, 2022 at 8:22 AM Atish Patra <atishp@rivosinc.com> wrote:
>
> The latest version of the SBI specification includes a Performance Monitoring
> Unit (PMU) extension[1] which allows the supervisor to start/stop/configure
> various PMU events. The Sscofpmf ('Ss' for Privileged arch and Supervisor-level
> extensions, and 'cofpmf' for Count OverFlow and Privilege Mode Filtering)
> extension[2] allows perf-like tools to handle overflow interrupts and
> supports privilege mode filtering.
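
As a rough illustration of the privilege mode filtering described here, below is a minimal Python sketch of the mhpmevent control bits as I read them from the Sscofpmf specification (OF in bit 63, then MINH, SINH, UINH, VSINH and VUINH in bits 62 down to 58). The helper names are hypothetical and this is not how QEMU implements it.

```python
# Control bits of mhpmevent per my reading of the Sscofpmf spec.
# This is an illustrative model, not code from QEMU.
OF = 1 << 63      # overflow status / interrupt disable
MINH = 1 << 62    # inhibit counting in M-mode
SINH = 1 << 61    # inhibit counting in S/HS-mode
UINH = 1 << 60    # inhibit counting in U-mode
VSINH = 1 << 59   # inhibit counting in VS-mode
VUINH = 1 << 58   # inhibit counting in VU-mode

INHIBIT = {"M": MINH, "S": SINH, "U": UINH, "VS": VSINH, "VU": VUINH}


def counts_in_mode(mhpmevent: int, mode: str) -> bool:
    """True if the counter keeps counting while the hart runs in `mode`."""
    return not (mhpmevent & INHIBIT[mode])


def has_overflowed(mhpmevent: int) -> bool:
    """True if the OF bit is set."""
    return bool(mhpmevent & OF)


def kernel_only(event_id: int) -> int:
    """Hypothetical helper: count `event_id` only in S-mode."""
    return event_id | MINH | UINH | VSINH | VUINH


event = kernel_only(0x2)
print({mode: counts_in_mode(event, mode) for mode in INHIBIT})
```

Something like perf's `:k` modifier would map onto a value built the way `kernel_only` does, though the exact mapping is up to the SBI implementation.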
>
> This series implements the remaining PMU infrastructure to support
> the PMU in the virt machine. The first seven patches from the original series
> have been merged already.
>
> This will allow us to add any PMU events in the future.
> Currently, this series enables the following PMU events:
> 1. cycle count
> 2. instruction count
> 3. DTLB load/store miss
> 4. ITLB prefetch miss
>
> The first two are computed using host ticks while the last three are counted during
> cpu_tlb_fill. We can do both sampling and counting from guest userspace.
> This series has been tested on both RV64 and RV32. Both Linux[3] and OpenSBI[4]
> patches are required to get perf working.
>
> Here is the output of perf stat/report while running hackbench with the latest
> OpenSBI & Linux kernel.
>
> Perf stat:
> ==========
> [root@fedora-riscv ~]# perf stat -e cycles -e instructions -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>      Total time: 0.265 [sec]
>
>  Performance counter stats for 'perf bench sched messaging -g 1 -l 10':
>
>      4,167,825,362      cycles
>      4,166,609,256      instructions              #    1.00  insn per cycle
>          3,092,026      dTLB-load-misses
>            258,280      dTLB-store-misses
>          2,068,966      iTLB-load-misses
>
>        0.585791767 seconds time elapsed
>
>        0.373802000 seconds user
>        1.042359000 seconds sys
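
The derived figures in this output can be recomputed from the raw counts. Here is a small Python sketch that parses the counter lines quoted above and works out instructions per cycle plus a dTLB load miss rate. The parsing regex and the "misses per 1000 instructions" metric are my own choices for illustration, not something perf prints for this event.

```python
import re

# Counter lines copied from the perf stat output above.
PERF_STAT = """\
     4,167,825,362      cycles
     4,166,609,256      instructions
         3,092,026      dTLB-load-misses
           258,280      dTLB-store-misses
         2,068,966      iTLB-load-misses
"""


def parse_counts(text):
    """Map event name -> integer count for each 'N,NNN event' line."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+(\S+)", line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


counts = parse_counts(PERF_STAT)
ipc = counts["instructions"] / counts["cycles"]
dtlb_mpki = 1000 * counts["dTLB-load-misses"] / counts["instructions"]

print(f"insn per cycle: {ipc:.2f}")
print(f"dTLB load misses per 1000 instructions: {dtlb_mpki:.3f}")
```

The instructions-per-cycle value rounds to the 1.00 that perf reports, which is expected given that both counters are derived from host ticks.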
>
> Perf record:
> ============
> [root@fedora-riscv ~]# perf record -e cycles -e instructions \
> > -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000 \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>      Total time: 1.397 [sec]
> [ perf record: Woken up 10 times to write data ]
> Check IO/CPU overload!
> [ perf record: Captured and wrote 8.211 MB perf.data (214486 samples) ]
>
> [root@fedora-riscv riscv]# perf report
> Available samples
> 107K cycles
> 107K instructions
> 250 dTLB-load-misses
> 13 dTLB-store-misses
> 172 iTLB-load-misses
> ..
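
With a fixed sample period (`-c 10000`), each sample stands for roughly 10000 occurrences of its event, so the sample counts in the report give a coarse estimate of event totals. A minimal Python sketch of that arithmetic follows. It uses only the exact sample counts shown above (the 107K figures are rounded, so they are left out), and the estimate ignores any events that did not reach a full period.

```python
# Sample period taken from the `-c 10000` option in the perf record command.
SAMPLE_PERIOD = 10_000

# Exact sample counts from the perf report listing above.
samples = {
    "dTLB-load-misses": 250,
    "dTLB-store-misses": 13,
    "iTLB-load-misses": 172,
}

# Each sample represents about SAMPLE_PERIOD events.
estimates = {event: n * SAMPLE_PERIOD for event, n in samples.items()}

for event, total in estimates.items():
    print(f"{event}: ~{total:,} events")
```

These estimates come from a separate run than the perf stat numbers, so the two sets of totals are not expected to match exactly.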
>
> Changes from v13->v14:
> 1. Added sanity check for the hashtable in pmu.c
>
> Changes from v12->v13:
> 1. Rebased on top of the apply-next.
> 2. Addressed comments about space & comment block.
>
> Changes from v11->v12:
> 1. Rebased on top of the apply-next.
> 2. Aligned the write function & .min_priv to the previous line.
> 3. Fixed the FDT generation for the multi-socket scenario.
> 4. Dropped the interrupt property from the DT.
> 5. Generate an illegal instruction fault instead of a virtual instruction fault
>    for VS/VU access while mcounteren is not set.
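
To make item 5 concrete, here is a minimal Python sketch of the counter access check as I understand it from the RISC-V privileged specification: a clear mcounteren bit always yields an illegal instruction fault, and a virtual instruction fault is only raised when mcounteren allows the access but hcounteren (or scounteren in VU-mode) does not. The function name and the string return values are hypothetical, and this is not the predicate code in csr.c.

```python
def counter_access_fault(mode, ctr, mcounteren, hcounteren=0, scounteren=0):
    """Return None if counter `ctr` is readable in `mode`, else the fault name.

    Illustrative model only; `mode` is one of "M", "S", "U", "VS", "VU".
    """
    bit = 1 << ctr
    if mode == "M":
        return None
    # mcounteren is checked first, for virtualized modes as well.
    if not (mcounteren & bit):
        return "illegal instruction"
    # Only once mcounteren permits the access can hcounteren turn it
    # into a virtual instruction fault.
    if mode in ("VS", "VU") and not (hcounteren & bit):
        return "virtual instruction"
    if mode in ("U", "VU") and not (scounteren & bit):
        return "virtual instruction" if mode == "VU" else "illegal instruction"
    return None


# VS-mode read of hpmcounter3 with mcounteren clear: illegal instruction.
print(counter_access_fault("VS", 3, mcounteren=0, hcounteren=0b1000))
# Same read with mcounteren set but hcounteren clear: virtual instruction.
print(counter_access_fault("VS", 3, mcounteren=0b1000, hcounteren=0))
```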
>
> Changes from v10->v11:
> 1. Rebased on top of the master where first 7 patches were already merged.
> 2. Removed unnecessary additional check in ctr predicate function.
> 3. Removed unnecessary priv version checks in mcountinhibit read/write.
> 4. Added Heiko's reviewed-by/tested-by tags.
>
> Changes from v8->v9:
> 1. Added the write_done flags to the vmstate.
> 2. Fixed the hpmcounter read access from M-mode.
>
> Changes from v7->v8:
> 1. Removed ordering constraints for mhpmcounter & mhpmevent.
>
> Changes from v6->v7:
> 1. Fixed all the compilation errors for the usermode.
>
> Changes from v5->v6:
> 1. Fixed compilation issue with PATCH 1.
> 2. Addressed other comments.
>
> Changes from v4->v5:
> 1. Rebased on top of the -next with following patches.
>    - isa extension
>    - priv 1.12 spec
> 2. Addressed all the comments on v4
> 3. Removed additional isa-ext DT node in favor of riscv,isa string update
>
> Changes from v3->v4:
> 1. Removed the dummy events from pmu DT node.
> 2. Fixed pmu_avail_counters mask generation.
> 3. Added a patch to simplify the predicate function for counters.
>
> Changes from v2->v3:
> 1. Addressed all the comments on PATCH1-4.
> 2. Split patch1 into two separate patches.
> 3. Added explicit comments to explain the event types in DT node.
> 4. Rebased on latest Qemu.
>
> Changes from v1->v2:
> 1. Dropped the Acks from v1 as significant changes happened after v1.
> 2. sscofpmf support.
> 3. A generic counter management framework.
>
> [1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> [2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> [3] https://github.com/atishp04/qemu/tree/riscv_pmu_v14
>
> Atish Patra (5):
> target/riscv: Add sscofpmf extension support
> target/riscv: Simplify counter predicate function
> target/riscv: Add few cache related PMU events
> hw/riscv: virt: Add PMU DT node to the device tree
> target/riscv: Update the privilege field for sscofpmf CSRs

> Sorry, but this doesn't apply. Are you able to rebase it?


I am a bit confused. Your PULL request on Sep 7th already included this and the sstc series.
I can see the patches in upstream QEMU as well.

> Alistair

>
> hw/riscv/virt.c           |  16 ++
> target/riscv/cpu.c        |  12 ++
> target/riscv/cpu.h        |  25 +++
> target/riscv/cpu_bits.h   |  55 +++++
> target/riscv/cpu_helper.c |  25 +++
> target/riscv/csr.c        | 304 +++++++++++++++++----------
> target/riscv/machine.c    |   1 +
> target/riscv/pmu.c        | 425 +++++++++++++++++++++++++++++++++++++-
> target/riscv/pmu.h        |   8 +
> 9 files changed, 760 insertions(+), 111 deletions(-)
>
> --
> 2.25.1
>
>