Re: [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x


From: "Frank Ch. Eigler" <fche@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: elfutils-devel@sourceware.org, linux-debuggers@vger.kernel.org
Subject: Re: [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x
Date: Thu, 11 Jul 2024 16:16:25 -0400
Message-ID: <20240711201625.GD2826@redhat.com>
In-Reply-To: <cover.1720644134.git.osandov@fb.com>

Hi, Omar -

Thanks.  I wish this sort of amazing kludge weren't necessary, but
given that it helps, so be it.

I'd like to commend you on the effort needed to match your code up
with the stylistic idiosyncrasies of the debuginfod C++ code.  It
looks just like the other code.  My only reservation is the schema
change.  Reindexing some of our large repos takes WEEKS.  Here's a
possible way to avoid that:

- Preserve the current BUILDID schema id and tables as is.

- Add a new table for the intra-archive coordinates.  Think of it like a cache.
  Index it with archive-file-name and content-file-name (source0, source1 IIRC).

- During a fetch out of the archive-file-name, check whether the new
  table has a record for that file.  If yes, cache hit, go through to
  the xz extraction stuff, winner!

- If not, try the is_seekable() check on the archive.  If it is true, we have an
  archive that should be seekable, but we don't have it in the intra-archive cache.
  So take this opportunity to index that archive (only) and populate the cache table
  as the archive is being extracted.  (No need to use the new cache data then, since
  we've just paid the cost of decompressing/reading the whole thing already.)

- Need to confirm that, during grooming, a disappeared
  archive-file-name also causes the corresponding intra-archive
  rows to be dropped.

- Heck, during grooming or scanning, maybe the tool could preemptively
  do the intra-archive coordinate cache thing if it's not already
  done, just to defeat the latency of doing it on demand.
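A rough C++ sketch of the fetch/index/groom flow proposed above, under loudly stated assumptions: IntraArchiveCache, Archive, Coords, Path and fetch() are hypothetical names, not existing debuginfod code. The real cache table would live in debuginfod's sqlite database, is_seekable() is modeled here by a plain bool, and an "archive" is just an in-memory list of members laid end to end, so a cached (offset, size) pair can slice it directly in place of a seek into the xz stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical coordinates of one member in an archive's uncompressed stream.
struct Coords {
  std::uint64_t offset;
  std::uint64_t size;
};

// Hypothetical in-memory archive: members stored back to back in one stream.
struct Archive {
  std::string name;
  bool seekable;  // stands in for the is_seekable() check
  std::vector<std::pair<std::string, std::string>> members;  // (name, bytes)

  std::string stream() const {
    std::string s;
    for (const auto& m : members) s += m.second;
    return s;
  }
};

// Hypothetical model of the new cache table, keyed by
// (archive-file-name, content-file-name); BUILDID tables stay untouched.
class IntraArchiveCache {
 public:
  std::optional<Coords> lookup(const std::string& archive,
                               const std::string& file) const {
    auto it = rows_.find({archive, file});
    if (it == rows_.end()) return std::nullopt;
    return it->second;
  }
  void insert(const std::string& archive, const std::string& file, Coords c) {
    rows_[{archive, file}] = c;
  }
  // Grooming: when an archive disappears, drop all of its rows.
  // The empty string sorts first, so lower_bound finds the archive's first row.
  std::size_t drop_archive(const std::string& archive) {
    auto first = rows_.lower_bound({archive, std::string()});
    auto last = first;
    while (last != rows_.end() && last->first.first == archive) ++last;
    auto n = static_cast<std::size_t>(std::distance(first, last));
    rows_.erase(first, last);
    return n;
  }
  std::size_t size() const { return rows_.size(); }

 private:
  std::map<std::pair<std::string, std::string>, Coords> rows_;
};

enum class Path { CacheHit, FullScanIndexed, FullScan };

// Cache hit: jump straight to the member.  Miss: read sequentially,
// indexing every member on the way if the archive is seekable.
std::optional<std::string> fetch(IntraArchiveCache& cache, const Archive& ar,
                                 const std::string& file, Path& path) {
  if (auto c = cache.lookup(ar.name, file)) {
    path = Path::CacheHit;
    return ar.stream().substr(c->offset, c->size);  // models an xz seek
  }
  path = ar.seekable ? Path::FullScanIndexed : Path::FullScan;
  std::optional<std::string> result;
  std::uint64_t offset = 0;
  for (const auto& [name, bytes] : ar.members) {
    if (ar.seekable)
      cache.insert(ar.name, name,
                   {offset, static_cast<std::uint64_t>(bytes.size())});
    if (name == file) {
      result = bytes;
      if (!ar.seekable) break;  // nothing to index, so stop early
    }
    offset += bytes.size();
  }
  return result;
}
```

In this sketch a miss on a seekable archive keeps reading to the end so every member is indexed in one pass, while a miss on a non-seekable archive stops as soon as the requested member is found.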

What do you think?

- FChE

Thread overview: 6+ messages
2024-07-10 20:47 [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x Omar Sandoval
2024-07-10 20:47 ` [PATCH 1/3] debuginfod: factor out common code for responding from an archive Omar Sandoval
2024-07-10 20:47 ` [PATCH 2/3] debuginfod: add archive entry size, mtime, and uncompressed offset to database Omar Sandoval
2024-07-10 20:47 ` [PATCH 3/3] debuginfod: optimize extraction from seekable xz archives Omar Sandoval
2024-07-11 20:16 ` Frank Ch. Eigler [this message]
2024-07-11 23:00   ` [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x Omar Sandoval
