Linux DTrace development list
 help / color / mirror / Atom feed
From: Nick Alcock <nick.alcock@oracle.com>
To: Kris Van Hees <kris.van.hees@oracle.com>
Cc: Nick Alcock <nick.alcock@oracle.com>,
	Eugene Loh <eugene.loh@oracle.com>, <dtrace@lists.linux.dev>,
	<dtrace-devel@oss.oracle.com>
Subject: Re: [PATCH 01/14] Fix stack-skip counts for caller and stackdepth
Date: Mon, 23 Jun 2025 15:04:00 +0100	[thread overview]
Message-ID: <87o6uey2pb.fsf@esperi.org.uk> (raw)
In-Reply-To: <aFQ7jfnA5eKnDMRc@oracle.com> (Kris Van Hees's message of "Thu, 19 Jun 2025 12:32:13 -0400")

On 19 Jun 2025, Kris Van Hees uttered the following:

> On Thu, Jun 19, 2025 at 12:20:22PM -0400, Kris Van Hees wrote:
>> On Thu, Jun 19, 2025 at 02:03:56PM +0100, Nick Alcock wrote:
>> > On 16 Jun 2025, Eugene Loh verbalised:
>> > 
>> > > On 6/13/25 10:33, Nick Alcock wrote:
>> > >
>> > >>> Note that we declare the skip count volatile.  The compiler might
>> > >>> optimize code that uses the STACK_SKIP value, but we will subsequently
>> > >>> perform relocations that adjust this value.
>> > >> ... why doesn't this apply to every other extern global variable in
>> > >> get_bvar()? They're all similarly relocated...
>> > >
>> > > Right.  There is potentially a broader problem.  But we simply do not have evidence of misbehavior in other cases.  Ruggedizing
>> > > other cases could be the subject of a different patch.
>> > 
>> > Aha, OK. I was just wondering if there was some extra reason.
>> > 
>> > > The problem in this case is that the compiler seems to assume &symbol!=0, which is reasonable except that we violate that behavior
>> > > for our relocation tricks.
>> > 
>> > I wonder where the code for that is... plenty of symbols have value
>> > zero.
>> > 
>> > But, really...
>> > 
>> > > Consider the C code:
>> > >
>> > >     uint64_t dt_bvar_stackdepth(const dt_dctx_t *dctx)
>> > >     {
>> > >         uint32_t          bufsiz = (uint32_t) (uint64_t) (&STKSIZ);
>> > >         char              *buf = dctx->mem + (uint64_t)(&STACK_OFF);
>> > 
>> > Hm...
>> > 
>> > extern uint64_t STACK_SKIP;
>> > 
>> > So we encode information about the stack size and skip value by encoding
>> > it in the *address* of the variable? Is there some reason we don't use
>> > its value? unlike the stack offset, we're *using* it as a value, not an
>> > address...
>> 
>> The issue seems to be (and perhaps this is a cross compiler problem) that e.g.
>> 
>> extern uint64_t PC;
>> 
>> and then code accessing the value of PC (e.g. foo(PC) as a call argument) will
>> yield:
>> 
>>   50:   18 02 00 00 00 00 00 00         lddw %r2,0
>>   58:   00 00 00 00 00 00 00 00 
>>                         50: R_BPF_INSN_64       PC
>>   60:   79 22 00 00 00 00 00 00         ldxdw %r2,[%r2+0]
>> 
>> which shows that it is interpreting PC as an address to a symbol, because it
>> loads the address of the symbol and then dereferences it with offset 0.  So,
>> we cannot plug in the value during relocation because the only value we can
>> put there would be an address where the vlaue can be found.  To get around
>> this, we "use" Tthe address as the entity to store the value in, knowing that
>> we *never* will interpret it as an address for these specific externs.
>
> Not a compiler error - since PC is extern uint64_t PC it *is* a variable and
> so it is present (and accessible) as an address in .data in the location where
> it is actually defined.  Since we never define it, we don't have a .data (which
> is fine because we only use this for constants known at link time) BUT the
> compiler of course is free to assume that we *do* have an address to the
> storage location for PC and thus that we get the value that way.  It does not
> know that the value is constant.  So, the trick I use is needed to make this
> work.

I'm just not sure why it's assuming that its value cannot be NULL, given
that you're not dereferencing it anywhere. I guess it's because it's a
symbol, but that's only a guess: I haven't managed to find the code that
implements this (?)optimization.

Anyway, this is not an objection to this patch, and a patch changing
this approach to something that didn't require us to throw volatiles
around and hope the compiler doesn't optimize things wrong would be a
separate patch anyway: you have my

Revieed-by: Nick Alcock <nick.alcock@oracle.com>

      reply	other threads:[~2025-06-23 14:04 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-22 18:01 [PATCH 01/14] Fix stack-skip counts for caller and stackdepth eugene.loh
2025-05-22 18:01 ` [PATCH 02/14] Add stack-skip frame count for rawtp provider eugene.loh
2025-06-13 14:35   ` Nick Alcock
2025-05-22 18:01 ` [PATCH 03/14] Test: remove unnecessary "unstable" tag eugene.loh
2025-06-13 14:35   ` [DTrace-devel] " Nick Alcock
2025-05-22 18:01 ` [PATCH 04/14] Test: caller and stackdepth tests for fbt provider eugene.loh
2025-06-13 14:40   ` [DTrace-devel] " Nick Alcock
2025-06-16 19:43     ` Eugene Loh
2025-06-25  4:14       ` Kris Van Hees
2025-05-22 18:01 ` [PATCH 05/14] Test: caller and stackdepth tests for dtrace provider eugene.loh
2025-06-13 14:42   ` Nick Alcock
2025-05-22 18:01 ` [PATCH 06/14] Test: caller and stackdepth tests for rawtp provider eugene.loh
2025-06-13 14:45   ` Nick Alcock
2025-05-22 18:01 ` [PATCH 07/14] Test: caller and stackdepth tests for cpc provider eugene.loh
2025-06-13 14:48   ` [DTrace-devel] " Nick Alcock
2025-05-22 18:01 ` [PATCH 08/14] Test: caller and stackdepth tests for ip provider eugene.loh
2025-06-13 14:51   ` [DTrace-devel] " Nick Alcock
2025-06-17  3:38     ` Eugene Loh
2025-06-25  4:15       ` Kris Van Hees
2025-05-22 18:01 ` [PATCH 09/14] Test: caller and stackdepth tests for profile provider eugene.loh
2025-06-13 14:52   ` [DTrace-devel] " Nick Alcock
2025-05-22 18:01 ` [PATCH 10/14] Test: caller and stackdepth tests for sched provider eugene.loh
2025-06-13 14:52   ` Nick Alcock
2025-05-22 18:01 ` [PATCH 11/14] Test: caller and stackdepth tests for proc provider eugene.loh
2025-06-13 14:53   ` Nick Alcock
2025-06-15 17:50     ` Eugene Loh
2025-06-19 12:52       ` Nick Alcock
2025-06-25  4:18       ` Kris Van Hees
2025-05-22 18:01 ` [PATCH 12/14] Test: caller and stackdepth tests for rawfbt provider eugene.loh
2025-06-13 14:54   ` [DTrace-devel] " Nick Alcock
2025-06-17 23:17     ` Eugene Loh
2025-05-22 18:01 ` [PATCH 13/14] Test: caller and stackdepth tests for io provider eugene.loh
2025-06-13 14:56   ` [DTrace-devel] " Nick Alcock
2025-05-22 18:01 ` [PATCH 14/14] Test: caller and stackdepth tests for lockstat provider eugene.loh
2025-06-13 14:57   ` Nick Alcock
2025-06-13 14:33 ` [PATCH 01/14] Fix stack-skip counts for caller and stackdepth Nick Alcock
2025-06-16 19:21   ` Eugene Loh
2025-06-19 13:03     ` Nick Alcock
2025-06-19 16:20       ` Kris Van Hees
2025-06-19 16:32         ` Kris Van Hees
2025-06-23 14:04           ` Nick Alcock [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o6uey2pb.fsf@esperi.org.uk \
    --to=nick.alcock@oracle.com \
    --cc=dtrace-devel@oss.oracle.com \
    --cc=dtrace@lists.linux.dev \
    --cc=eugene.loh@oracle.com \
    --cc=kris.van.hees@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox