public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Paul Mackerras <paulus@samba.org>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
Date: Fri, 13 Aug 2021 16:08:13 +1000	[thread overview]
Message-ID: <1628834356.pr4zgn1xf1.astroid@bobo.none> (raw)
In-Reply-To: <b286e07fb771a664b631cd07a40b09c06f26e64b.1618331881.git.christophe.leroy@csgroup.eu>

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
> 
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
> 
> The most efficient solution would be to use __builtin_trap() because
> GCC is able to optimise the use of the different trap instructions
> based on the requested condition, but this is complex if not
> impossible for the following reasons:
> - __builtin_trap() is a non-recoverable instruction, so it can't be
> used for WARN_ON
> - Knowing which line of code generated the trap would require the
> analysis of DWARF information. This is not a feature we have today.
> 
> As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
> the way WARN_ON() is implemented is suboptimal. That commit also
> mentions an issue with 'long long' condition. It fixed it for
> WARN_ON() but the same problem still exists today with BUG_ON() on
> PPC32. It will be fixed by using the generic implementation.
> 
> By using the generic implementation, gcc will naturally generate a
> branch to the unconditional trap generated by BUG().
> 
> As modern powerpc implement zero-cycle branch,
> that's even more efficient.
> 
> And for the functions using WARN_ON() and its return, the test
> on return from WARN_ON() is now also used for the WARN_ON() itself.
> 
> On PPC64 we don't want it because we want to be able to use CFAR
> register to track how we entered the code that trapped. The CFAR
> register would be clobbered by the branch.
> 
> A simple test function:
> 
> 	unsigned long test9w(unsigned long a, unsigned long b)
> 	{
> 		if (WARN_ON(!b))
> 			return 0;
> 		return a / b;
> 	}
> 
> Before the patch:
> 
> 	0000046c <test9w>:
> 	 46c:	7c 89 00 34 	cntlzw  r9,r4
> 	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
> 	 474:	0f 09 00 00 	twnei   r9,0
> 	 478:	2c 04 00 00 	cmpwi   r4,0
> 	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
> 	 480:	7c 63 23 96 	divwu   r3,r3,r4
> 	 484:	4e 80 00 20 	blr
> 
> 	 488:	38 60 00 00 	li      r3,0
> 	 48c:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 	00000468 <test9w>:
> 	 468:	2c 04 00 00 	cmpwi   r4,0
> 	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
> 	 470:	7c 63 23 96 	divwu   r3,r3,r4
> 	 474:	4e 80 00 20 	blr
> 
> 	 478:	0f e0 00 00 	twui    r0,0
> 	 47c:	38 60 00 00 	li      r3,0
> 	 480:	4e 80 00 20 	blr

That's clearly better because we have a branch anyway.

> 
> So we see before the patch we need 3 instructions on the likely path
> to handle the WARN_ON(). With the patch the trap goes on the unlikely
> path.
> 
> See below the difference at the entry of system_call_exception where
> we have several BUG_ON(), allthough less impressing.
> 
> With the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	81 6a 00 84 	lwz     r11,132(r10)
> 	   4:	90 6a 00 88 	stw     r3,136(r10)
> 	   8:	71 60 00 02 	andi.   r0,r11,2
> 	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
> 	  10:	71 60 40 00 	andi.   r0,r11,16384
> 	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
> 	  18:	71 6b 80 00 	andi.   r11,r11,32768
> 	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
> 	  20:	94 21 ff e0 	stwu    r1,-32(r1)
> 	  24:	93 e1 00 1c 	stw     r31,28(r1)
> 	  28:	7d 8c 42 e6 	mftb    r12
> 	...
> 	  7c:	0f e0 00 00 	twui    r0,0
> 	  80:	0f e0 00 00 	twui    r0,0
> 	  84:	0f e0 00 00 	twui    r0,0
> 
> Without the patch:
> 
> 	00000000 <system_call_exception>:
> 	   0:	94 21 ff e0 	stwu    r1,-32(r1)
> 	   4:	93 e1 00 1c 	stw     r31,28(r1)
> 	   8:	90 6a 00 88 	stw     r3,136(r10)
> 	   c:	81 6a 00 84 	lwz     r11,132(r10)
> 	  10:	69 60 00 02 	xori    r0,r11,2
> 	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
> 	  18:	0f 00 00 00 	twnei   r0,0
> 	  1c:	69 60 40 00 	xori    r0,r11,16384
> 	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
> 	  24:	0f 00 00 00 	twnei   r0,0
> 	  28:	69 6b 80 00 	xori    r11,r11,32768
> 	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
> 	  30:	0f 0b 00 00 	twnei   r11,0
> 	  34:	7d 8c 42 e6 	mftb    r12

This one possibly the branches end up in predictors, whereas conditional 
trap is always just speculated not to hit. Branches may also have a
throughput limit on execution whereas trap could be more (1 per cycle
vs 4 per cycle on POWER9).

On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
is the CFAR issue as well which makes it a problem for 64s. It would
have been nice if it could use the same code though.

Maybe one day gcc's __builtin_trap() will become smart enough around
conditional statements that it it generates better code and tries to
avoid branches.

Thanks,
Nick

  parent reply	other threads:[~2021-08-13  6:08 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-13 16:38 [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Christophe Leroy
2021-04-13 16:38 ` [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto Christophe Leroy
2021-08-13  6:19   ` Nicholas Piggin
2021-08-15  3:49   ` Michael Ellerman
2021-08-25 21:25   ` Nathan Chancellor
2021-08-26  3:21     ` Michael Ellerman
2021-08-26  6:37       ` Christophe Leroy
2021-08-26 13:47         ` Segher Boessenkool
2021-08-26 14:45         ` Michael Ellerman
2021-08-26 14:53           ` Christophe Leroy
2021-08-26 14:12       ` Segher Boessenkool
2021-08-26 18:54       ` Nathan Chancellor
2021-08-26 23:55         ` Nathan Chancellor
2021-08-27  7:53           ` Michael Ellerman
2021-08-13  6:08 ` Nicholas Piggin [this message]
2021-08-18 15:06   ` [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32 Segher Boessenkool
2021-08-26  3:26     ` Nicholas Piggin
2021-08-26 12:49       ` Segher Boessenkool
2021-08-26 13:57         ` Nicholas Piggin
2021-08-26 14:37           ` Segher Boessenkool
2021-08-26 15:04             ` Nicholas Piggin
2021-08-26 15:30               ` Segher Boessenkool
2021-08-27  1:28                 ` Nicholas Piggin
2021-08-18 13:38 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1628834356.pr4zgn1xf1.astroid@bobo.none \
    --to=npiggin@gmail.com \
    --cc=benh@kernel.crashing.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox