Re: Re: Re: Re: Re: Re: [PATCH V2 1/3] riscv: Add Zicbop instruction definitions & cpufeature

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Leonardo Bras <leobras@redhat.com>
To: Andrew Jones <ajones@ventanamicro.com>
Cc: Leonardo Bras <leobras@redhat.com>,
	guoren@kernel.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	panqinglin2020@iscas.ac.cn, bjorn@rivosinc.com,
	conor.dooley@microchip.com, peterz@infradead.org,
	keescook@chromium.org, wuwei2016@iscas.ac.cn,
	xiaoguang.xing@sophgo.com, chao.wei@sophgo.com,
	unicorn_wang@outlook.com, uwu@icenowy.me, jszhang@kernel.org,
	wefu@redhat.com, atishp@atishpatra.org,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	Guo Ren <guoren@linux.alibaba.com>
Subject: Re: Re: Re: Re: Re: Re: [PATCH V2 1/3] riscv: Add Zicbop instruction definitions & cpufeature
Date: Mon,  8 Jan 2024 13:14:23 -0300	[thread overview]
Message-ID: <ZZwfX6PMqeKDsR-z@LeoBras> (raw)
In-Reply-To: <20240108-c47754d958917d5d5e1efc75@orel>

On Mon, Jan 08, 2024 at 04:24:59PM +0100, Andrew Jones wrote:
> On Mon, Jan 08, 2024 at 11:34:15AM -0300, Leonardo Bras wrote:
> > On Fri, Jan 05, 2024 at 02:24:45PM +0100, Andrew Jones wrote:
> > > On Thu, Jan 04, 2024 at 02:43:04PM -0300, Leonardo Bras wrote:
> > > ...
> > > > > > > > I don't think we can detect a caller with non-zero offset at compile time, 
> > > > > > > > since it will be used in locks which can be at (potentially) any place in 
> > > > > > > > the block size. (if you have any idea though, please let me know :) )
> > > > > 
> > > > > I forgot to reply to this before. The reason I think it may be possible to
> > > > > validate offset at compile time is because it must be a constant, i.e.
> > > > > __builtin_constant_p(offset) must return true. So maybe something like
> > > > > 
> > > > >  static_assert(__builtin_constant_p(offset) && !(offset & 0x1f))
> > > > > 
> > > > > I'll try to find time to play with it.
> > > > > 
> > > > 
> > > > Let me know if you find anything.
> > > 
> > > There's nothing we can do in this file (insn-def.h), other than maybe
> > > masking, since all magic must happen at preprocessor time, other than
> > > a tiny bit of constant arithmetic allowed at assembly time. For C, using
> > > a wrapper, like patch 2 of this series introduces, we could add the
> > > static assert above. I'll suggest that in patch 2, since I've already
> > > thought it through, but it sort of feels like overkill to me.
> > 
> > It makes sense.
> > 
> > > 
> > > > 
> > > > > > > > 
> > > > > > > > On the other hand, we could create a S-Type macro which deliberately 
> > > > > > > > ignores imm[4:0], like  
> > > > > > > > 
> > > > > > > > + INSN_S_TRUNCATE(OPCODE_OP_IMM, FUNC3(6), __RS2(3),               \
> > > > > > > > +                 SIMM12(offset), RS1(base))
> > > > > > > > 
> > > > > > > > Which saves the bits 11:5 of offset  into imm[11:5], and zero-fill 
> > > > > > > > imm[4:0], like
> > > > > > > > 
> > > > > > > > +#define DEFINE_INSN_S                                                    \
> > > > > > > > + __DEFINE_ASM_GPR_NUMS                                           \
> > > > > > > > +"        .macro insn_s, opcode, func3, rs2, simm12, rs1\n"               \
> > > > > > > > +"        .4byte  ((\\opcode << " __stringify(INSN_S_OPCODE_SHIFT) ") |"  \
> > > > > > > > +"                 (\\func3 << " __stringify(INSN_S_FUNC3_SHIFT) ") |"    \
> > > > > > > > +"                 (.L__gpr_num_\\rs2 << " __stringify(INSN_S_RS2_SHIFT) ") |" \
> > > > > > > > +"                 (.L__gpr_num_\\rs1 << " __stringify(INSN_S_RS1_SHIFT) ") |" \
> > > > > > > > +"                 (((\\simm12 >> 5) & 0x7f) << " __stringify(INSN_S_SIMM7_SHIFT) "))\n" \
> > > > > > > > +"        .endm\n"
> > > > > > > > +
> > > > > > > > 
> > > > > > > > Does this make sense?
> > > > > > > 
> > > > > > > If we create a special version of INSN_S, then I suggest we create one
> > > > > > > where its two SIMM fields are independent and then define prefetch
> > > > > > > instructions like this
> > > > > > > 
> > > > > > >  #define PREFETCH_W(base, offset) \
> > > > > > >      INSN_S_SPLIT_IMM(OPCODE_OP_IMM, FUNC3(6), __RS2(3), \
> > > > > > >          SIMM_11_5(offset >> 5), SIMM_4_0(0), RS1(base))
> > > > > > > 
> > > > > > > which would allow simple review against the spec and potentially
> > > > > > > support other instructions which use hard coded values in the
> > > > > > > immediate fields.
> > > > > > > 
> > > > > > 
> > > > > > I agree, it looks better this way.
> > > > > > 
> > > > > > We could have:
> > > > > > INSN_S_SPLIT_IMM(OPCODE, FUNC3, RS1, RS2, SIMM_11_5, SIMM_4_0)
> > > > > > 
> > > > > > and implement INSN_S like:
> > > > > > #define INSN_S(OPCODE, FUNC3, RS1, RS2, SIMM_11_0) \
> > > > > > 	INSN_S_SPLIT_IMM(OPCODE, FUNC3, RS1, RS2,  \
> > > > > > 		SIMM_11_0 >> 5, SIMM_11_0 & 0x1f)
> > > > > 
> > > > > That won't work since SIMM_11_0 will be a string. Actually, with
> > > > > stringification in mind, I don't think defining INSN_S_SPLIT_IMM()
> > > > > is a good idea.
> > > > 
> > > > I don't see how SIMM_11_0 will be a string here. Is this due to using it 
> > > > on asm code?
> > > > 
> > > > I understand a user will call 
> > > > ---
> > > > PREFETCH_W(base, offset), which becomes:
> > > > 
> > > > INSN_S(OPCODE_OP_IMM, 6, base, 3, offset) , which becomes:
> > > > 
> > > > INSN_S_SPLIT_IMM(OPCODE_OP_IMM, FUNC3(6), RS1(base), RS2(3), \
> > > > 	SIMM_11_5(offset >> 5), SIMM_4_0(offset & 0x1f))
> > > 
> > > The other annotations, like SIMM12, stringify their arguments. So, if
> > > SIMM_11_5 and SIMM_4_0 also stringified, then it wouldn't be possible
> > > to recombine them into a simm12 for the '.insn s' directive. I suppose
> > > SIMM_11_5 and SIMM_4_0 could just expand their arguments without
> > > stringifying. With that, along with throwing even more ugly at it, then
> > > it is possible to get the end result we want, which is
> > > 
> > >  - PREFETCH_* instructions are defined with annotations and have a
> > >    SIMM_4_0(0) in their definitions to explicitly point out that field
> > > 
> > >  - the INSN_S definition still directly maps to the .insn s directive
> > > 
> > > 
> > > I got that to work with this
> > > 
> > > #define __RV_SIMM(v)           v
> > > #define RV___SIMM_11_5(v)      __RV_SIMM(v)
> > > #define RV___SIMM_4_0(v)       __RV_SIMM(v)
> > > 
> > > #define __INSN_S_SPLIT_IMM(opcode, func3, rs2, simm12, rs1) \
> > >         INSN_S(opcode, func3, rs2, SIMM12(simm12), rs1)
> > > 
> > > #define INSN_S_SPLIT_IMM(opcode, func3, rs2, simm_11_5, simm_4_0, rs1) \
> > >         __INSN_S_SPLIT_IMM(opcode, func3, rs2, (RV_##simm_11_5 << 5) | RV_##simm_4_0, rs1)
> > > 
> > > #define CBO_PREFETCH_W(base, offset)                            \
> > >         INSN_S_SPLIT_IMM(OPCODE_OP_IMM, FUNC3(6), __RS2(3),     \
> > >                 __SIMM_11_5((offset) >> 5), __SIMM_4_0(0), RS1(base))
> > > 
> > > 
> > > But, again, I feel like it's probably overkill...
> > 
> > I though our intention was to avoid the extra IMM masking in asm, while 
> > keeping the 5 lower bits zeroed at all times.
> > 
> > But IIUC here you are writing a insn_s_split_imm in terms of a plain 
> > insn_s, which guarantees the zeroed 5 lower bits but still does an 
> > unnecessaty masking in asm. In fact, if we use the split version, it 
> > concatenates the two IMM parts, to then split them again in order to build 
> > the instruction.
> 
> That's backwards.
> 
> INSN_S should map to its namesake directive '.insn s', which takes the
> immediate as a single simm12. simm7 and simm5 are only used in the
> fallback path.

Wait, but is not the final instruction build with a split simm7 and simm5 ?

Oh, ok, we are probably creating an asm with mnemonics first, and leaving 
it to the assembler to split the simm7 and simm5 from simm12. It does make 
sense.

Sorry for the confusion, I was wrongly thinking on terms of final bytecode.

> Also, it's likely few instructions care about the split.
> Other S-type instructions would want to treat the immediate as 12 bits,
> so INSN_S should not be written in terms of INSN_S_SPLIT_IMM, since we'd
> split a 12-bit immediate for those instructions just to have it merged
> again for the .insn s directive.

Yeah, makes sense now.

> 
> > 
> > In my suggestion above, we make INSN_S_SPLIT_IMM() the helper / standard 
> > way to write an s-type and write a INSN_S() in terms of the SPLIT version.
> > 
> > This allows using the split version when we don't need one of the halfs, 
> > thus avoiding a masking or a rotation. The full version just splits the 
> > halfs and pass to the split version that directly builds the instruction.
> > 
> > Well, I think I was under the wrong impression that we wanted to avoid the 
> > rotation and masking, but I just noticed that we are just dealing with 
> > splitting the offset, which is supposed to be a constant during the 
> > generation of the instruction, so we can just guarantee the value being 
> > masked at no runtime cost.
> 
> Right, there's no reason to avoid the rotation and masking with respect to
> performance, as it's all done at compile time. Avoiding the operations as
> much as possible is nice, though, since they're ugly and, with macro
> arguments, getting the parentheses right to ensure the correct order of
> operations is error prone.
> 
> > 
> > So in the end we are just thinking on how it could look better to the user, 
> > and maybe the split version is unnecessary if the user guarantees the 
> > masking to be correct. But if we are going to have it, I suggest we do 
> > INSN_S in terms of INSN_S_SPLIT_IMM() instead of the other way around.
> 
> Yes, the goal was to have __SIM_4_0(0) for the prefetch instructions to
> make it simple to review against the spec, but ((offset) & ~0x1f) is also
> simple to review.

Agreed

> 
> For the next revision of this series, I'd be happy with just the masking
> or even to just leave it up to the callers as this version does.
> 
> Thanks,
> drew
> 

Thank you!
Leo

next prev parent reply	other threads:[~2024-01-08 16:14 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-31  8:29 [PATCH V2 0/3] riscv: Add Zicbop & prefetchw support guoren
2023-12-31  8:29 ` [PATCH V2 1/3] riscv: Add Zicbop instruction definitions & cpufeature guoren
2024-01-02 10:32   ` Andrew Jones
2024-01-03  6:13     ` Guo Ren
2024-01-03  6:49       ` Andrew Jones
2024-01-03 19:44         ` Andrew Jones
2024-01-03 19:06     ` Leonardo Bras
2024-01-03  9:31   ` Clément Léger
2024-01-03 12:00     ` Andrew Jones
2024-01-11 10:31       ` Clément Léger
2024-01-11 10:45         ` Andrew Jones
2024-01-11 10:49           ` Clément Léger
2024-01-11 11:12             ` Conor Dooley
2024-01-03 18:52   ` Leonardo Bras
2024-01-03 19:29     ` Andrew Jones
2024-01-03 20:33       ` Leonardo Bras
2024-01-04  9:47         ` Andrew Jones
2024-01-04 15:03           ` Leonardo Bras
2024-01-04 16:40             ` Andrew Jones
2024-01-04 17:43               ` Leonardo Bras
2024-01-05 13:24                 ` Andrew Jones
2024-01-08 14:34                   ` Leonardo Bras
2024-01-08 15:24                     ` Andrew Jones
2024-01-08 16:14                       ` Leonardo Bras [this message]
2024-01-03 19:48   ` Andrew Jones
2024-01-03 20:34     ` Leonardo Bras
2023-12-31  8:29 ` [PATCH V2 2/3] riscv: Add ARCH_HAS_PRETCHW support with Zibop guoren
2024-01-01  2:29   ` Guo Ren
2024-01-03 19:04     ` Leonardo Bras
2024-01-02 10:45   ` Andrew Jones
2024-01-03  6:19     ` Guo Ren
2024-01-03 19:56       ` Andrew Jones
2024-01-05 13:31     ` Andrew Jones
2023-12-31  8:29 ` [PATCH V2 3/3] riscv: xchg: Prefetch the destination word for sc.w guoren
2024-01-02 11:18   ` Andrew Jones
2024-01-03  6:15     ` Guo Ren
2024-01-03 19:45       ` Leonardo Bras
2024-01-04  1:24         ` Guo Ren
2024-01-04  3:56           ` Leonardo Bras
2024-01-04  8:14             ` Guo Ren
2024-01-04 14:17               ` Leonardo Bras
2024-01-05  1:13                 ` Guo Ren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZwfX6PMqeKDsR-z@LeoBras \
    --to=leobras@redhat.com \
    --cc=ajones@ventanamicro.com \
    --cc=atishp@atishpatra.org \
    --cc=bjorn@rivosinc.com \
    --cc=chao.wei@sophgo.com \
    --cc=conor.dooley@microchip.com \
    --cc=guoren@kernel.org \
    --cc=guoren@linux.alibaba.com \
    --cc=jszhang@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=palmer@dabbelt.com \
    --cc=panqinglin2020@iscas.ac.cn \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=unicorn_wang@outlook.com \
    --cc=uwu@icenowy.me \
    --cc=wefu@redhat.com \
    --cc=wuwei2016@iscas.ac.cn \
    --cc=xiaoguang.xing@sophgo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox