public inbox for linux-kernel@vger.kernel.org
* Stack trace of csum_partial_copy_generic
@ 2016-05-13 11:07 Nikolay Borisov
  2016-05-16 18:28 ` Josh Poimboeuf
  0 siblings, 1 reply; 2+ messages in thread
From: Nikolay Borisov @ 2016-05-13 11:07 UTC
  To: Josh Poimboeuf; +Cc: linux-kernel@vger.kernel.org

Hello Josh, 

I'd like to ask you whether objtool is supposed to produce a
warning for arch/x86/lib/csum-copy_64.o (built from
arch/x86/lib/csum-copy_64.S), since I cannot see rbp being used
there to set up a stack frame. I'm chasing down poor performance
of a network benchmark, and this is what perf produces:

# Overhead          Command          Shared Object                                         Symbol
# ........  ...............  .....................  .............................................
#
    37.30%            iperf  [kernel.kallsyms]      [k] csum_partial_copy_generic                
                      |
                      --- csum_partial_copy_generic
                         |          
                         |--99.98%-- 0x7f809108b7cd
                         |          |          
                         |          |--69.72%-- 0x20000
                         |          |          
                         |           --30.28%-- 0x7f809108b7c2
                         |                     0x20000
                          --0.02%-- [...]

This is not very helpful for tracing where the function is being
called from, presumably somewhere in the networking layer. Should
objtool catch this, or, since csum_partial_copy_generic is a leaf
function, is a reliable stack trace not needed? Furthermore, this
function is called from C wrappers in csum-wrappers_64.c; shouldn't
at least those be present in the call stack?

This is on 4.6 master from Linus, with CONFIG_FRAME_POINTER enabled.

Regards, 
Nikolay


* Re: Stack trace of csum_partial_copy_generic
  2016-05-13 11:07 Stack trace of csum_partial_copy_generic Nikolay Borisov
@ 2016-05-16 18:28 ` Josh Poimboeuf
  0 siblings, 0 replies; 2+ messages in thread
From: Josh Poimboeuf @ 2016-05-16 18:28 UTC
  To: Nikolay Borisov; +Cc: linux-kernel@vger.kernel.org

Hi Nikolay,

On Fri, May 13, 2016 at 02:07:47PM +0300, Nikolay Borisov wrote:
> Hello Josh, 
> 
> I'd like to ask you whether objtool is supposed to produce a
> warning for arch/x86/lib/csum-copy_64.o (built from
> arch/x86/lib/csum-copy_64.S), since I cannot see rbp being used
> there to set up a stack frame. I'm chasing down poor performance
> of a network benchmark, and this is what perf produces:
> 
> # Overhead          Command          Shared Object                                         Symbol
> # ........  ...............  .....................  .............................................
> #
>     37.30%            iperf  [kernel.kallsyms]      [k] csum_partial_copy_generic                
>                       |
>                       --- csum_partial_copy_generic
>                          |          
>                          |--99.98%-- 0x7f809108b7cd
>                          |          |          
>                          |          |--69.72%-- 0x20000
>                          |          |          
>                          |           --30.28%-- 0x7f809108b7c2
>                          |                     0x20000
>                           --0.02%-- [...]
> 
> This is not very helpful for tracing where the function is being
> called from, presumably somewhere in the networking layer. Should
> objtool catch this, or, since csum_partial_copy_generic is a leaf
> function, is a reliable stack trace not needed?

Right, since it's a leaf function, objtool ignores it and lets it do
whatever it wants with the frame pointer.

> Furthermore, this function is called from C wrappers in
> csum-wrappers_64.c; shouldn't at least those be present in the
> call stack?

I suspect the problem is that it can't walk the stack because the
function overwrites the rbp register.  Try replacing all uses of rbp in
that function with another register.  r15?
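
To illustrate why a clobbered rbp defeats a frame-pointer unwinder,
here is a minimal user-space sketch in C. It is not kernel code: the
names build_stack and walk_frames, the STACK_WORDS size, and the
0xAAA/0xBBB/0xCCC return addresses are all made up for illustration,
and the "stack" is just an array indexed by word. Each frame stores
the caller's saved rbp followed by the return address, which is the
layout a frame-pointer walk relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16  /* hypothetical size of the simulated stack */

/*
 * Build three nested frames in a simulated stack and return the
 * innermost frame pointer. Each frame occupies two words:
 *   stack[fp]     = caller's saved frame pointer (0 ends the chain)
 *   stack[fp + 1] = return address into the caller
 */
static uint64_t build_stack(uint64_t *stack)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        stack[i] = 0;

    stack[10] = 0;   stack[11] = 0xAAA;  /* outermost frame */
    stack[6]  = 10;  stack[7]  = 0xBBB;  /* middle frame */
    stack[2]  = 6;   stack[3]  = 0xCCC;  /* innermost frame */
    return 2;
}

/*
 * Follow the saved-frame-pointer chain starting at fp and store each
 * return address in out. Returns the number of frames recovered.
 * The walk stops when fp is 0, points outside the stack, or fails to
 * move toward older (higher-index) frames.
 */
static size_t walk_frames(const uint64_t *stack, uint64_t fp,
                          uint64_t *out, size_t max)
{
    size_t n = 0;

    assert(stack != NULL && out != NULL);

    while (n < max && fp != 0 && fp + 1 < STACK_WORDS) {
        uint64_t next = stack[fp];   /* caller's saved frame pointer */

        out[n++] = stack[fp + 1];    /* return address of this frame */
        if (next != 0 && next <= fp)
            break;                   /* chain is corrupt, give up */
        fp = next;
    }
    return n;
}
```

Starting from the frame pointer returned by build_stack, the walk
recovers all three return addresses. Starting from a value such as
0x20000, which stands in for whatever data a leaf function left in
rbp while using it as a scratch register, the walk recovers nothing,
because that value does not point at a saved frame. A real unwinder
has less information to validate against, so it may report bogus
addresses instead of stopping cleanly.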

(Another solution would be to tell perf to use DWARF unwinding instead
of frame pointers, but currently, kernel asm code doesn't have any DWARF
annotations.  I'm planning on adding support for that soon in the 4.8
timeframe by generating DWARF metadata using objtool.)

-- 
Josh
