bpf.vger.kernel.org archive mirror
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Alan Maguire <alan.maguire@oracle.com>
Cc: dwarves@vger.kernel.org, "Jiri Olsa" <jolsa@kernel.org>,
	"Clark Williams" <williams@redhat.com>,
	"Kate Carcia" <kcarcia@redhat.com>,
	bpf@vger.kernel.org, "Kui-Feng Lee" <kuifeng@fb.com>,
	"Thomas Weißschuh" <linux@weissschuh.net>
Subject: Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
Date: Fri, 12 Apr 2024 17:36:13 -0300
Message-ID: <ZhmbPVj62mMK1NZq@x1>
In-Reply-To: <ZhQBpAGIDU_koQnp@x1>

On Mon, Apr 08, 2024 at 11:39:32AM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Apr 08, 2024 at 01:00:59PM +0100, Alan Maguire wrote:
> > On 04/04/2024 09:58, Alan Maguire wrote:
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
> > >     at /home/almagui/src/dwarves/dwarves.c:612
> > > 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> > > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> > > (gdb) print *(struct ptr_table *)0x7f8c60001e70
> > > $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
> > > (gdb)
> 
> > > So it looks like the ptr_table has 2979 entries but entries is NULL;
> > > could there be an issue where CU initialization is not yet complete
> > > for some threads (it also happens very early in processing)? Can you
> > > reproduce this failure at your end? Thanks!
>  
> > the following (when applied on top of the series) resolves the
> > segmentation fault for me:
>  
> > diff --git a/pahole.c b/pahole.c
> > index 6c7e738..5ff0eaf 100644
> > --- a/pahole.c
> > +++ b/pahole.c
> > @@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
> >                 if (conf_load->reproducible_build) {
> >                         ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
> > -                       // Equivalent to LSK__DELETE since we processed this
> > -                       cus__remove(cus, cu);
> > -                       cu__delete(cu);
> >                 }
> >  out_btf:
> >                 if (!thr_data) // See comment about reproducibe_build above
> > 
> 
> Yeah, Jiri also pointed out this call to cu__delete() was new, I was
> trying to avoid having unprocessed 'struct cu' using too much memory, so
> after processing it, delete them, but as you found out there are
> references to that memory...
> 
> > In other words, the problem is we remove/delete CUs when finished with
> > them in each thread (when BTF is generated).  However because the
> > save/add_saved_funcs stashes CU references in the associated struct
> > function * (to allow prototype comparison for the same function in
> > different CUs), we end up with stale CU references and in this case the
> > freed/nulled ptr_table caused an issue. As far as I can see we need to
> > retain CUs until all BTF has been merged from threads.
>  
> > With the fix in place, I'm seeing less than 100msec difference between
> > reproducible/non-reproducible vmlinux BTF generation; that's great!
> 
> Excellent!
> 
> I'll remove it and add a note crediting you with the removal and
> explaining why it's not possible to delete it at that point
> (references to the associated 'struct function').

So I removed that cus__remove + cu__delete and also the other one at the
flush operation, leaving all cleaning up to cus__delete() time:

⬢[acme@toolbox pahole]$ git diff
diff --git a/dwarves.c b/dwarves.c
index fbc8d8aa0060b7d0..1ec259f50dbd3778 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -489,8 +489,12 @@ struct cu *cus__get_next_processable_cu(struct cus *cus)
                        cu->state = CU__PROCESSING;
                        goto found;
                case CU__PROCESSING:
-                       // This will only happen when we get to parallel
-                       // reproducible BTF encoding, libbpf dedup work needed here.
+                       // This will happen when we get to parallel
+                       // reproducible BTF encoding, libbpf dedup work needed
+                       // here. The other possibility is when we're flushing
+                       // the DWARF processed CUs when the parallel DWARF
+                       // loading stopped and we still have CUs to encode to
+                       // BTF because of ordering requirements.
                        continue;
                case CU__UNPROCESSED:
                        // The first entry isn't loaded, signal the
diff --git a/pahole.c b/pahole.c
index 6c7e73835b3e9139..77772bb42bb443ce 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3347,9 +3347,9 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 
                if (conf_load->reproducible_build) {
                        ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
-                       // Equivalent to LSK__DELETE since we processed this
-                       cus__remove(cus, cu);
-                       cu__delete(cu);
+                       // Kinda equivalent to LSK__DELETE since we processed this, but we can't delete it
+                       // as we stash references to entries in CUs for 'struct function' in btf_encoder__add_saved_funcs()
+                       // and btf_encoder__save_func(), so we can't delete them here. - Alan Maguire
                }
 out_btf:
                if (!thr_data) // See comment about reproducibe_build above
@@ -3667,9 +3667,6 @@ static int cus__flush_reproducible_build(struct cus *cus, struct btf_encoder *en
                err = btf_encoder__encode_cu(encoder, cu, conf_load);
                if (err < 0)
                        break;
-
-               cus__remove(cus, cu);
-               cu__delete(cu);
⬢[acme@toolbox pahole]$
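
To illustrate why the deletion has to wait, here is a minimal sketch
using toy stand-ins. Every toy_* name is hypothetical and is not
pahole's actual API; only the bounds check is copied from the
ptr_table__entry() line in the backtrace above. The saved functions
keep a pointer to their CU, so the CUs are freed in one place only
after the last lookup through those pointers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; all toy_* names are hypothetical, not pahole's API. */
struct toy_ptr_table {
	void **entries;
	unsigned int nr_entries;
};

struct toy_cu {
	struct toy_ptr_table types;
	struct toy_cu *next;
};

struct toy_cus {
	struct toy_cu *head;
};

/* A saved function stashes a reference to the CU it came from. */
struct toy_func {
	struct toy_cu *cu;
	unsigned int type_id;
};

/* Same check as in the backtrace: it guards 'id', not a freed table. */
static void *toy_ptr_table__entry(const struct toy_ptr_table *pt, unsigned int id)
{
	return id >= pt->nr_entries ? NULL : pt->entries[id];
}

static struct toy_cu *toy_cus__new_cu(struct toy_cus *cus, unsigned int nr_types)
{
	struct toy_cu *cu = calloc(1, sizeof(*cu));

	if (cu == NULL)
		return NULL;
	cu->types.entries = calloc(nr_types, sizeof(void *));
	if (cu->types.entries == NULL) {
		free(cu);
		return NULL;
	}
	cu->types.nr_entries = nr_types;
	for (unsigned int i = 0; i < nr_types; i++)
		cu->types.entries[i] = cu; /* any non-NULL placeholder */
	cu->next = cus->head;
	cus->head = cu;
	return cu;
}

/* Only valid while the stashed CU is still alive. */
static void *toy_func__type(const struct toy_func *func)
{
	return toy_ptr_table__entry(&func->cu->types, func->type_id);
}

/* Single cleanup point: frees every CU, returns how many were freed. */
static unsigned int toy_cus__delete(struct toy_cus *cus)
{
	unsigned int nr = 0;

	while (cus->head != NULL) {
		struct toy_cu *cu = cus->head;

		cus->head = cu->next;
		free(cu->types.entries);
		free(cu);
		nr++;
	}
	return nr;
}

/* Returns how many saved functions resolved a type before cleanup. */
static unsigned int toy_demo(void)
{
	struct toy_cus cus = { NULL };
	struct toy_func saved[3];
	unsigned int n = 0, resolved = 0, freed;

	for (; n < 3; n++) {
		struct toy_cu *cu = toy_cus__new_cu(&cus, 100);

		if (cu == NULL)
			break;
		saved[n].cu = cu;
		saved[n].type_id = 77;
		/* Freeing 'cu' here would leave saved[n].cu dangling. */
	}
	for (unsigned int i = 0; i < n; i++)
		if (toy_func__type(&saved[i]) != NULL)
			resolved++;

	freed = toy_cus__delete(&cus); /* deferred until all lookups are done */
	assert(freed == n);
	return resolved;
}
```

Calling toy_demo() from a main() exercises the deferred path. Moving
the free into the first loop would make the later toy_func__type()
calls read freed memory, which is the kind of stale reference behind
the SIGSEGV above.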


It ends up taking a bit more time on this 14700K with 32GB. Later I'll
try to remove the need to keep everything in memory and also double
check my hunch that the slowdown is due to keeping everything in memory.

Can I take this (with the above patch, which is a bit bigger than yours)
as a Tested-by + Reviewed-by from you?

- Arnaldo
