From: "Frank Ch. Eigler" <fche@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: elfutils-devel@sourceware.org, linux-debuggers@vger.kernel.org
Subject: Re: [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x
Date: Thu, 11 Jul 2024 16:16:25 -0400
Message-ID: <20240711201625.GD2826@redhat.com>
In-Reply-To: <cover.1720644134.git.osandov@fb.com>
Hi, Omar -
Thanks. I wish this sort of amazing kludge weren't necessary, but
given that it helps, so be it.
I'd like to commend you on the effort needed to match your code up
with the stylistic idiosyncrasies of the debuginfod C++ code. It
looks just like the other code. My only reservation is the schema
change. Reindexing some of our large repos takes WEEKS. Here's a
possible way to avoid that:
- Preserve the current BUILDID schema id and tables as is.
- Add a new table for the intra-archive coordinates. Think of it like a cache.
  Index it with archive-file-name and content-file-name (source0, source1 IIRC).
- During a fetch out of the archive-file-name, check whether the new
  table has a record for that file. If yes, cache hit, go through to
  the xz extraction stuff, winner!
- If not, try the is_seekable() check on the archive. If it is true, we have an
  archive that should be seekable, but we don't have it in the intra-archive cache.
  So take this opportunity to index that archive (only), populate the cache table,
  as the archive is being extracted. (No need to use the new cache data then, since
  we've just paid the effort of decompressing/reading the whole thing already.)
- Need to confirm that during grooming, a disappeared
  archive-file-name would also drop the corresponding intra-archive
  rows.
- Heck, during grooming or scanning, maybe the tool could preemptively
  do the intra-archive coordinate cache thing if it's not already
  done, just to defeat the latency of doing it on demand.
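
The flow above could be sketched roughly as follows. This is only an
illustration in Python with sqlite3, not debuginfod code: the table name
archive_seek_cache, its columns, and the fetch()/groom() helpers are all
hypothetical, and is_seekable, extract_at, and extract_all are stand-in
callbacks for the real archive handling.

```python
import sqlite3

def open_cache(path=":memory:"):
    # Hypothetical side table; the existing BUILDID tables are left untouched.
    db = sqlite3.connect(path)
    db.execute("""create table if not exists archive_seek_cache (
                      archive text not null,   -- archive-file-name
                      member  text not null,   -- content-file-name
                      offset  integer not null,
                      size    integer not null,
                      mtime   integer not null,
                      primary key (archive, member))""")
    return db

def fetch(db, archive, member, is_seekable, extract_at, extract_all):
    row = db.execute("select offset, size from archive_seek_cache "
                     "where archive = ? and member = ?",
                     (archive, member)).fetchone()
    if row is not None:
        # Cache hit: seek straight to the member.
        return extract_at(archive, row[0], row[1])
    # Cache miss: do the full extraction, and if the archive is seekable,
    # record the coordinates of every member on the way through.
    seekable = is_seekable(archive)
    wanted = None
    for name, offset, size, mtime, data in extract_all(archive):
        if seekable:
            db.execute("insert or replace into archive_seek_cache "
                       "values (?, ?, ?, ?, ?)",
                       (archive, name, offset, size, mtime))
        if name == member:
            wanted = data  # served from this pass, not from the new rows
    db.commit()
    return wanted

def groom(db, archive_exists):
    # Drop cache rows whose archive has disappeared.
    archives = db.execute(
        "select distinct archive from archive_seek_cache").fetchall()
    for (archive,) in archives:
        if not archive_exists(archive):
            db.execute("delete from archive_seek_cache where archive = ?",
                       (archive,))
    db.commit()
```

The first fetch from a given archive pays for the full decompression once;
later fetches from the same archive hit the table and take the seek path.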
What do you think?
- FChE
Thread overview:
2024-07-10 20:47 [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x Omar Sandoval
2024-07-10 20:47 ` [PATCH 1/3] debuginfod: factor out common code for responding from an archive Omar Sandoval
2024-07-10 20:47 ` [PATCH 2/3] debuginfod: add archive entry size, mtime, and uncompressed offset to database Omar Sandoval
2024-07-10 20:47 ` [PATCH 3/3] debuginfod: optimize extraction from seekable xz archives Omar Sandoval
2024-07-11 20:16 ` Frank Ch. Eigler [this message]
2024-07-11 23:00 ` [PATCH 0/3] debuginfod: speed up extraction from kernel debuginfo packages by 200x Omar Sandoval