From: Omar Sandoval <osandov@osandov.com>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: elfutils-devel@sourceware.org, linux-debuggers@vger.kernel.org
Subject: Re: [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x
Date: Thu, 11 Jul 2024 16:00:00 -0700
Message-ID: <ZpBj8Bmqnltd-cfj@telecaster>
In-Reply-To: <20240711201625.GD2826@redhat.com>

On Thu, Jul 11, 2024 at 04:16:25PM -0400, Frank Ch. Eigler wrote:
> Hi, Omar -
> 
> Thanks.  I wish this sort of amazing kludge weren't necessary, but
> given that it helps, so be it.
> 
> I'd like to commend you on the effort needed to match your code up
> with the stylistic idiosyncrasies of the debuginfod c++ code.  It
> looks just like the other code.  My only reservation is the schema
> change.  Reindexing some of our large repos takes WEEKS.  Here's a
> possible way to avoid that:
> 
> - Preserve the current BUILDID schema id and tables as is.
> 
> - Add a new table for the intra-archive coordinates.  Think of it like a cache.
>   Index it with archive-file-name and content-file-name (source0, source1 IIRC).
> 
> - During a fetch out of the archive-file-name, check whether the new
>   table has a record for that file.  If yes, cache hit, go through to
>   the xz extraction stuff, winner!
> 
> - If not, try the is_seekable() check on the archive.  If it is true, we have an
>   archive that should be seekable, but we don't have it in the intra-archive cache.
>   So take this opportunity to index that archive (only) and populate the cache table
>   as the archive is being extracted.  (No need to use the new cache data then, since
>   we've just paid the effort of decompressing/reading the whole thing already.)
> 
> - Need to confirm that during grooming, a disappeared
>   archive-file-name would also drop the corresponding intra-archive
>   rows.
> 
> - Heck, during grooming or scanning, maybe the tool could preemptively
>   do the intra-archive coordinate cache thing if it's not already
>   done, just to defeat the latency of doing it on demand.
> 
> 
> What do you think?
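Frank's proposal above can be sketched roughly as follows. This is a minimal Python/sqlite3 illustration under stated assumptions, not debuginfod's actual C++ code: the table name `archive_entry_cache`, its column names, and the callbacks `is_seekable`, `extract_all`, and `extract_at` are all hypothetical stand-ins. The columns mirror the per-entry data that patch 2/3 in this series proposed storing (uncompressed offset, size, mtime), but here they live in a separate cache table so the existing BUILDID tables stay untouched.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical cache table, kept separate from the existing BUILDID tables so
# that adding it needs no schema-version bump or full reindex.
SCHEMA = """
CREATE TABLE IF NOT EXISTS archive_entry_cache (
    archive             TEXT    NOT NULL,  -- archive-file-name
    member              TEXT    NOT NULL,  -- content-file-name inside the archive
    uncompressed_offset INTEGER NOT NULL,
    size                INTEGER NOT NULL,
    mtime               INTEGER NOT NULL,
    PRIMARY KEY (archive, member)
);
"""

# (member, uncompressed_offset, size, mtime)
Entry = Tuple[str, int, int, int]


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def lookup(conn, archive: str, member: str) -> Optional[Tuple[int, int]]:
    """Return (uncompressed_offset, size) on a cache hit, else None."""
    return conn.execute(
        "SELECT uncompressed_offset, size FROM archive_entry_cache"
        " WHERE archive = ? AND member = ?",
        (archive, member),
    ).fetchone()


def populate(conn, archive: str, entries: Iterable[Entry]) -> None:
    """Record the coordinates of every member of one archive."""
    conn.executemany(
        "INSERT OR REPLACE INTO archive_entry_cache VALUES (?, ?, ?, ?, ?)",
        [(archive, m, off, size, mtime) for m, off, size, mtime in entries],
    )
    conn.commit()


def fetch(conn, archive, member, is_seekable, extract_all, extract_at) -> bytes:
    hit = lookup(conn, archive, member)
    if hit is not None:
        # Cache hit: seek straight to the member inside the seekable archive.
        offset, size = hit
        return extract_at(archive, offset, size)
    # Cache miss: read the whole archive once; return that data directly.
    entries, data = extract_all(archive, member)
    if is_seekable(archive):
        # Index this archive only, so later requests take the fast path.
        populate(conn, archive, entries)
    return data


def groom(conn, existing_archives: set) -> None:
    """Drop cached rows whose archive-file-name has disappeared."""
    archives = [a for (a,) in conn.execute(
        "SELECT DISTINCT archive FROM archive_entry_cache")]
    conn.executemany(
        "DELETE FROM archive_entry_cache WHERE archive = ?",
        [(a,) for a in archives if a not in existing_archives],
    )
    conn.commit()


# Toy usage: the "archive" is a flat byte string plus a member table.
BLOB = b"AAAAvmlinuxBBBB"
MEMBERS = {"vmlinux": (4, 7, 0), "header": (0, 4, 0)}  # name -> (offset, size, mtime)
calls = {"full": 0, "seek": 0}


def extract_all(archive, member):
    calls["full"] += 1
    entries = [(m, off, size, mt) for m, (off, size, mt) in MEMBERS.items()]
    off, size, _ = MEMBERS[member]
    return entries, BLOB[off:off + size]


def extract_at(archive, offset, size):
    calls["seek"] += 1
    return BLOB[offset:offset + size]


conn = open_cache()
first = fetch(conn, "kernel-debuginfo.rpm", "vmlinux", lambda a: True, extract_all, extract_at)
second = fetch(conn, "kernel-debuginfo.rpm", "vmlinux", lambda a: True, extract_all, extract_at)
```

In this sketch the first request reads the whole toy archive and indexes it, and the second request for the same member is served by a single seek. Because the cache lives in its own table, an existing database can fill it lazily, archive by archive, instead of requiring the full reindex Frank warns about.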

Hi, Frank,

I didn't realize how expensive reindexing could be; thank you for
pointing that out.  Your proposal makes sense to me, and I'll rework this.

Thanks,
Omar


Thread overview: 6+ messages
2024-07-10 20:47 [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x Omar Sandoval
2024-07-10 20:47 ` [PATCH 1/3] debuginfod: factor out common code for responding from an archive Omar Sandoval
2024-07-10 20:47 ` [PATCH 2/3] debuginfod: add archive entry size, mtime, and uncompressed offset to database Omar Sandoval
2024-07-10 20:47 ` [PATCH 3/3] debuginfod: optimize extraction from seekable xz archives Omar Sandoval
2024-07-11 20:16 ` [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x Frank Ch. Eigler
2024-07-11 23:00   ` Omar Sandoval [this message]
