All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
To: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: David Miller <davem@davemloft.net>,
	eric.dumazet@gmail.com, netdev@vger.kernel.org,
	ogerlitz@mellanox.com, pshelar@nicira.com, jesse@nicira.com,
	discuss@openvswitch.org
Subject: Re: [PATCH net-next] fast_hash: clobber registers correctly for inline function use
Date: Fri, 14 Nov 2014 23:37:42 +0100	[thread overview]
Message-ID: <1416004662.15154.76.camel@localhost> (raw)
In-Reply-To: <18948.1416003002@famine>

Hi Jay,

On Fr, 2014-11-14 at 14:10 -0800, Jay Vosburgh wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> [...]
> >I created it via the function calling convention documented in
> >arch/x86/include/asm/calling.h, so I specified each register which a
> >function is allowed to clobber with.
> >
> >I currently cannot see how I can resolve the invalid constraints error
> >easily. :(
> >
> >So either go with my first patch, which I puts the alternative_call
> >switch point into its own function without ever inlining or the patch
> >needs to be reverted. :/
> 
> 	As a data point, I tested the first patch as well, and the
> system does not panic with it in place.  Inspection shows that it's
> using %r14 in place of %r8 in the prior (crashing) implementation.

Yes, I also could reproduce your oops and the first unoffical patch  and
the first offical one both fixed it. After that, I thought that just
adding more clobbers cannot introduce bugs, so I only did compile
testing until I hit a window where gcc got mad with the excessive use of
clobbered registers but haven't tested the inline call sites that much
(sorry). :(

> 	Disassembly of the call site (on the non-sse4_1 system) in
> ovs_flow_tbl_insert with the first patch applied looks like this:
> 
> 0xffffffffa00b6bb9 <ovs_flow_tbl_insert+0xb9>:	mov    %r15,0x348(%r14)
> 0xffffffffa00b6bc0 <ovs_flow_tbl_insert+0xc0>:	movzwl 0x28(%r15),%ecx
> 0xffffffffa00b6bc5 <ovs_flow_tbl_insert+0xc5>:	movzwl 0x2a(%r15),%esi
> 0xffffffffa00b6bca <ovs_flow_tbl_insert+0xca>:	movzwl %cx,%eax
> 0xffffffffa00b6bcd <ovs_flow_tbl_insert+0xcd>:	sub    %ecx,%esi
> 0xffffffffa00b6bcf <ovs_flow_tbl_insert+0xcf>:	lea    0x38(%r14,%rax,1),%rdi
> 0xffffffffa00b6bd4 <ovs_flow_tbl_insert+0xd4>:	sar    $0x2,%esi
> 0xffffffffa00b6bd7 <ovs_flow_tbl_insert+0xd7>:	callq  0xffffffff813a7810 <__jhash2>
> 0xffffffffa00b6bdc <ovs_flow_tbl_insert+0xdc>:	mov    %eax,0x30(%r14)
> 0xffffffffa00b6be0 <ovs_flow_tbl_insert+0xe0>:	mov    (%rbx),%r13
> 0xffffffffa00b6be3 <ovs_flow_tbl_insert+0xe3>:	mov    %r14,%rsi
> 0xffffffffa00b6be6 <ovs_flow_tbl_insert+0xe6>:	mov    %r13,%rdi
> 0xffffffffa00b6be9 <ovs_flow_tbl_insert+0xe9>:	callq  0xffffffffa00b61a0 <table_instance_insert>
> 
> 	Compared to the panicking version's function:
> 
> 0xffffffffa01a55c9 <ovs_flow_tbl_insert+0xb9>:  mov    %r15,0x348(%r8)
> 0xffffffffa01a55d0 <ovs_flow_tbl_insert+0xc0>:  movzwl 0x28(%r15),%ecx
> 0xffffffffa01a55d5 <ovs_flow_tbl_insert+0xc5>:  movzwl 0x2a(%r15),%esi
> 0xffffffffa01a55da <ovs_flow_tbl_insert+0xca>:  movzwl %cx,%eax
> 0xffffffffa01a55dd <ovs_flow_tbl_insert+0xcd>:  sub    %ecx,%esi
> 0xffffffffa01a55df <ovs_flow_tbl_insert+0xcf>:  lea    0x38(%r8,%rax,1),%rdi
> 0xffffffffa01a55e4 <ovs_flow_tbl_insert+0xd4>:  sar    $0x2,%esi
> 0xffffffffa01a55e7 <ovs_flow_tbl_insert+0xd7>:  callq  0xffffffff813a75c0 <__jhash2>
> 0xffffffffa01a55ec <ovs_flow_tbl_insert+0xdc>:  mov    %eax,0x30(%r8)
> 0xffffffffa01a55f0 <ovs_flow_tbl_insert+0xe0>:  mov    (%rbx),%r13
> 0xffffffffa01a55f3 <ovs_flow_tbl_insert+0xe3>:  mov    %r8,%rsi
> 0xffffffffa01a55f6 <ovs_flow_tbl_insert+0xe6>:  mov    %r13,%rdi
> 0xffffffffa01a55f9 <ovs_flow_tbl_insert+0xe9>:  callq  0xffffffffa01a4ba0 <table_instance_insert>
> 
> 	It appears to generate the same instructions, but allocates
> registers differently (using %r14 instead of %r8).

Exactly and that makes sense. While %r8 must be available for the callee
to be clobbered with, %r14 must be saved by the callee and restored
before returning. So you pass the responsibility down to the other
functions, which tries not to touch %r14 because it knows it will have
to generate code for saving and restoring.

That's the reason why I actually like the the static inline clobbering
approach so much, it gives gcc possibilities to move around the
save/restore cycles and decide itself just by aligning which registers
to use.

Also the first version does work flawlessly (which I didn't send as a
patch but only as a diff in the mail). Here gcc synthesizes a full
function call which has the same effect as the long clobber list, only
it does chain two calls right behind each other.

> 	The __jhash2 disassembly appears to be unchanged between the two
> versions.

Thanks for looking into this!

It is actually pretty hairy to come up with a good solution for this,
because with the alternative interface you are only allowed to alter one
instruction. jump_tables also don't work because I currently have the
opinion that they do the switch way too late. I absolutely don't want to
have inserts into a hashtable with different hashing functions depending
how early during boot they took place.

Bye,
Hannes

  reply	other threads:[~2014-11-14 22:37 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-14 14:06 [PATCH net-next] fast_hash: clobber registers correctly for inline function use Hannes Frederic Sowa
2014-11-14 14:40 ` [PATCH net-next v2] " Hannes Frederic Sowa
2014-11-14 14:50 ` [PATCH net-next] " Eric Dumazet
2014-11-14 15:13   ` Hannes Frederic Sowa
2014-11-14 15:33     ` Eric Dumazet
2014-11-14 15:46       ` Hannes Frederic Sowa
2014-11-14 18:38         ` David Miller
2014-11-14 19:02           ` Cong Wang
2014-11-14 20:42             ` Hannes Frederic Sowa
2014-11-14 21:35               ` David Miller
2014-11-14 19:05           ` [PATCH net-next] Revert "fast_hash: avoid indirect function calls" Jay Vosburgh
2014-11-14 21:36             ` David Miller
2014-11-14 21:43             ` Hannes Frederic Sowa
2014-11-14 20:04           ` [PATCH net-next] fast_hash: clobber registers correctly for inline function use Hannes Frederic Sowa
2014-11-14 20:10             ` Hannes Frederic Sowa
2014-11-14 20:15             ` Jay Vosburgh
2014-11-14 20:35               ` Hannes Frederic Sowa
2014-11-14 22:10                 ` Jay Vosburgh
2014-11-14 22:37                   ` Hannes Frederic Sowa [this message]
2014-11-14 15:17   ` [PATCH net-next v3] " Hannes Frederic Sowa
2014-11-14 17:57     ` Jay Vosburgh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1416004662.15154.76.camel@localhost \
    --to=hannes@stressinduktion.org \
    --cc=davem@davemloft.net \
    --cc=discuss@openvswitch.org \
    --cc=eric.dumazet@gmail.com \
    --cc=jay.vosburgh@canonical.com \
    --cc=jesse@nicira.com \
    --cc=netdev@vger.kernel.org \
    --cc=ogerlitz@mellanox.com \
    --cc=pshelar@nicira.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.