From: Richard Henderson <richard.henderson@linaro.org>
To: Max Chou <max.chou@sifive.com>, qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org, qemu-ppc@nongnu.org, qemu-s390x@nongnu.org,
	qemu-riscv@nongnu.org, balaton@eik.bme.hu
Subject: Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
Date: Tue, 16 Jul 2024 07:42:51 +1000	[thread overview]
Message-ID: <3d86bfd4-e6ef-464a-a663-21e95e1f6cf6@linaro.org> (raw)
In-Reply-To: <5a83ce12-9a9c-4c9d-9fee-e37fb6afd19d@sifive.com>

On 7/15/24 17:06, Max Chou wrote:
>> +                /* Probe nonfault on subsequent elements. */
>> +                flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
>> +                                           mmu_index, true, &host, 0);
>> +                if (flags) {
> According to section 7.7, Unit-stride Fault-Only-First Loads, in the V spec (v1.0):
> 
>      When the fault-only-first instruction would trigger a debug
>      data-watchpoint trap on an element after the first, implementations
>      should not reduce vl but instead should trigger the debug trap as
>      otherwise the event might be lost.

Hmm, ok.  Interesting.


> And I think that there is a potential issue in the original implementation that
> maybe we can fix in this patch.
> 
> We need to pass the correct element load size to the probe_access_internal
> function, which is called by tlb_vaddr_to_host in the original implementation
> and is called directly in this patch.
> The size parameter is used by the pmp_hart_has_privs function to do the
> physical memory protection (PMP) checking.
> If we set the size parameter to the remaining page size, we may get an
> unexpected trap caused by PMP rules that cover the regions of masked-off
> elements.
> 
> Maybe we can replace the while loop with something like the code below.
> 
> 
> vext_ldff(void *vd, void *v0, target_ulong base,
>            ...
> {
>      ...
>      uint32_t size = nf << log2_esz;
> 
>      VSTART_CHECK_EARLY_EXIT(env);
> 
>      /* probe every access */
>      for (i = env->vstart; i < env->vl; i++) {
>          if (!vm && !vext_elem_mask(v0, i)) {
>              continue;
>          }
>          addr = adjust_addr(env, base + i * size);
>          if (i == 0) {
>              probe_pages(env, addr, size, ra, MMU_DATA_LOAD);
>          } else {
>              /* if it triggers an exception, no need to check watchpoint */
>              void *host;
>              int flags;
> 
>              /* Probe nonfault on subsequent elements. */
>              flags = probe_access_flags(env, addr, size, MMU_DATA_LOAD,
>                      mmu_index, true, &host, 0);
>              if (flags & ~TLB_WATCHPOINT) {
>                  /*
>                   * Stop if any flag bit is set:
>                   *   invalid (unmapped)
>                   *   mmio (transaction failed)
>                   * In all cases, handle as the first load next time.
>                   */
>                  vl = i;
>                  break;
>              }
>          }
>      }

No, I don't think repeated probing is a good idea.
You'll lose everything you attempted to gain with the other improvements.

It seems, to handle watchpoints, you need to start by probing the entire length
non-fault.  That will tell you whether any portion of the length hits any of the
problem cases; the fast path, of course, will hit none.
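
Roughly, continuing from the variables in your sketch (untested; host1/len1/
flags1 and friends are just names for illustration):

    void *host1 = NULL, *host2 = NULL;
    uint32_t msize = nf << log2_esz;
    intptr_t total = env->vl * msize;
    intptr_t len1 = MIN(total, -(base | TARGET_PAGE_MASK));
    int flags1, flags2 = 0;

    /* Probe the entire length nonfault, split across one or two pages. */
    flags1 = probe_access_flags(env, adjust_addr(env, base), len1,
                                MMU_DATA_LOAD, mmu_index, true, &host1, 0);
    if (len1 < total) {
        flags2 = probe_access_flags(env, adjust_addr(env, base + len1),
                                    total - len1, MMU_DATA_LOAD,
                                    mmu_index, true, &host2, 0);
    }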

After probing, you have flags for the one or two pages, and you can make a
choice about the actual load length (rough sketch after the list below):

   - invalid on first page: either the first element faults,
     or you need to check PMP via some alternate mechanism.
     Do not be afraid to add something to CPUTLBEntryFull.extra.riscv
     during tlb_fill in order to accelerate this, if needed.

   - mmio on first page: just one element, as the second might fault
     during the transaction.

     It would be possible to enhance riscv_cpu_do_transaction_failed to
     suppress the fault and set a flag noting the fault.  This would allow
     multiple elements to be loaded, at the expense of another check after
     each element within the slow tlb-load path.  I don't know if this is
     desirable, really.  Using vector operations on mmio is usually a
     programming error.  :-)

   - invalid or mmio on second page, continue to the end of the first page.
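
Putting those cases together, something like (untested):

    intptr_t load_len;

    if (flags1 & TLB_INVALID_MASK) {
        /*
         * Faulting probe of the first element only: either this raises
         * (the first element faults), or it succeeds because the page
         * was merely PMP-restricted to a smaller extent.
         */
        probe_pages(env, base, msize, ra, MMU_DATA_LOAD);
        load_len = msize;
    } else if (flags1 & TLB_MMIO) {
        load_len = msize;          /* just one element */
    } else if (flags2 & (TLB_INVALID_MASK | TLB_MMIO)) {
        load_len = len1;           /* stop at the end of the first page */
    } else {
        load_len = total;
    }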

Once we have the actual load length, handle watchpoints by hand.
See sve_cont_ldst_watchpoints.
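
With the length in hand, that's roughly (untested, system mode only):

    if ((flags1 | flags2) & TLB_WATCHPOINT) {
        for (i = env->vstart; i < load_len / msize; i++) {
            if (vm || vext_elem_mask(v0, i)) {
                cpu_check_watchpoint(env_cpu(env),
                                     adjust_addr(env, base + i * msize),
                                     msize, MEMTXATTRS_UNSPECIFIED,
                                     BP_MEM_READ, ra);
            }
        }
    }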

Finally, the loop loading the elements, likely in ram via host pointer.
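
For the all-ram case that can be as simple as (untested; byte elements only,
and glossing over the nf field interleave -- wider elements want the usual
endian-aware per-element loads):

    for (i = env->vstart; i < load_len / msize; i++) {
        if (!vm && !vext_elem_mask(v0, i)) {
            continue;
        }
        intptr_t off = i * msize;
        void *host = off < len1 ? host1 + off : host2 + (off - len1);
        memcpy((uint8_t *)vd + off, host, msize);
    }
    env->vstart = 0;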


r~

