From: Omar Sandoval <osandov@osandov.com>
To: elfutils-devel@sourceware.org
Cc: "Frank Ch . Eigler" <fche@redhat.com>, linux-debuggers@vger.kernel.org
Subject: [PATCH v3 7/7] debuginfod: populate _r_seekable on request
Date: Fri, 19 Jul 2024 01:32:03 -0700
Message-ID: <99e9dcbd8c29a1be4ac46c74b9e59499fc0fce07.1721377314.git.osandov@fb.com>
In-Reply-To: <cover.1721377314.git.osandov@fb.com>

From: Omar Sandoval <osandov@fb.com>

Since the schema change adding _r_seekable was done in a
backward-compatible way, seekable archives that were previously scanned will not
be in _r_seekable.  Whenever an archive is going to be extracted to
satisfy a request, check if it is seekable.  If so, populate _r_seekable
while extracting it so that future requests use the optimized path.

The next time that BUILDIDS is bumped, all archives will be checked at
scan time.  At that point, checking again will be unnecessary and this
commit (including the test case modification) can be reverted.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
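Editorial note, not part of the patch: the five-stage traversal that the
diff below introduces in handle_buildid_r_match can be hard to follow from
the hunks alone, so here is a minimal, standalone sketch of just the loop
control.  Everything in it is hypothetical scaffolding: simulate_traversal,
TraversalResult, and the plain string list stand in for the real libarchive
iteration, the sqlite insert, and the fdcache, none of which are modeled.
Only the loop condition and the order of the stage checks are taken from
the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one simulated traversal; not a debuginfod type.
struct TraversalResult {
  std::vector<std::string> recorded;    // entries that would go into _r_seekable
  std::string matched;                  // entry returned to the client (stage 2)
  std::vector<std::string> prefetched;  // entries cached after the match (stage 3)
  std::size_t visited = 0;              // entries read before the loop stopped
};

// Simulates the loop control only: no libarchive, sqlite, or fdcache involved.
TraversalResult simulate_traversal(const std::vector<std::string>& entries,
                                   const std::string& wanted,
                                   unsigned prefetch_count,
                                   bool populate_seekable)
{
  TraversalResult out;
  bool found = false;  // stands in for "r != 0" in the patch
  std::size_t i = 0;

  while (!found || prefetch_count > 0 || populate_seekable) {  // stages 1-4
    if (i == entries.size())
      break;  // end of archive
    const std::string& fn = entries[i++];
    out.visited++;

    if (populate_seekable) {
      out.recorded.push_back(fn);        // the patch runs the insert here
      if (found && prefetch_count == 0)  // stage 4: record only
        continue;
    }

    if (!found && fn != wanted)  // stage 1: skip until the match
      continue;

    if (!found) {  // stage 2: the matching entry
      out.matched = fn;
      found = true;
      continue;
    }

    out.prefetched.push_back(fn);  // stage 3: prefetch after the match
    prefetch_count--;
  }
  return out;  // stage 5: stop reading the archive
}
```

In this sketch, with populate_seekable false the loop stops as soon as the
match and any prefetches are done, as before the patch.  With it true, every
entry is visited once, which corresponds to the one full extraction per
archive that the test case below allows for.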
 debuginfod/debuginfod.cxx        | 76 +++++++++++++++++++++++++++++---
 tests/run-debuginfod-seekable.sh | 45 +++++++++++++++++++
 2 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 677eca30..d8a02fb5 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2740,6 +2740,7 @@ handle_buildid_r_match (bool internal_req_p,
     }
 
   // no match ... look for a seekable entry
+  bool populate_seekable = ! passive_p;
   unique_ptr<sqlite_ps> pp (new sqlite_ps (internal_req_p ? db : dbq,
                                            "rpm-seekable-query",
                                            "select type, size, offset, mtime from " BUILDIDS "_r_seekable "
@@ -2749,6 +2750,9 @@ handle_buildid_r_match (bool internal_req_p,
     {
       if (rc != SQLITE_ROW)
         throw sqlite_exception(rc, "step");
+      // if we found a match in _r_seekable but we fail to extract it, don't
+      // bother populating it again
+      populate_seekable = false;
       const char* seekable_type = (const char*) sqlite3_column_text (*pp, 0);
       if (seekable_type != NULL && strcmp (seekable_type, "xz") == 0)
         {
@@ -2840,16 +2844,39 @@ handle_buildid_r_match (bool internal_req_p,
       throw archive_exception(a, "cannot open archive from pipe");
     }
 
-  // archive traversal is in three stages, no, four stages:
-  // 1) skip entries whose names do not match the requested one
-  // 2) extract the matching entry name (set r = result)
-  // 3) extract some number of prefetched entries (just into fdcache)
-  // 4) abort any further processing
+  // If the archive was scanned in a version without _r_seekable, then we may
+  // need to populate _r_seekable now.  This can be removed the next time
+  // BUILDIDS is updated.
+  if (populate_seekable)
+    {
+      populate_seekable = is_seekable_archive (b_source0, a);
+      if (populate_seekable)
+        {
+          // NB: the names are already interned
+          pp.reset(new sqlite_ps (db, "rpm-seekable-insert2",
+                                  "insert or ignore into " BUILDIDS "_r_seekable (file, content, type, size, offset, mtime) "
+                                  "values (?, "
+                                  "(select id from " BUILDIDS "_files "
+                                  "where dirname = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "and basename = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "), 'xz', ?, ?, ?)"));
+        }
+    }
+
+  // archive traversal is in five stages:
+  // 1) before we find a matching entry, insert it into _r_seekable if needed or
+  //    skip it otherwise
+  // 2) extract the matching entry (set r = result).  Also insert it into
+  //    _r_seekable if needed
+  // 3) extract some number of prefetched entries (just into fdcache).  Also
+  //    insert them into _r_seekable if needed
+  // 4) if needed, insert all of the remaining entries into _r_seekable
+  // 5) abort any further processing
   struct MHD_Response* r = 0;                 // will set in stage 2
   unsigned prefetch_count =
     internal_req_p ? 0 : fdcache_prefetch;    // will decrement in stage 3
 
-  while(r == 0 || prefetch_count > 0) // stage 1, 2, or 3
+  while(r == 0 || prefetch_count > 0 || populate_seekable) // stage 1-4
     {
       if (interrupted)
         break;
@@ -2863,6 +2890,43 @@ handle_buildid_r_match (bool internal_req_p,
         continue;
 
       string fn = canonicalized_archive_entry_pathname (e);
+
+      if (populate_seekable)
+        {
+          string dn, bn;
+          size_t slash = fn.rfind('/');
+          if (slash == std::string::npos) {
+            dn = "";
+            bn = fn;
+          } else {
+            dn = fn.substr(0, slash);
+            bn = fn.substr(slash + 1);
+          }
+
+          int64_t seekable_size = archive_entry_size (e);
+          int64_t seekable_offset = archive_filter_bytes (a, 0);
+          time_t seekable_mtime = archive_entry_mtime (e);
+
+          pp->reset();
+          pp->bind(1, b_id0);
+          pp->bind(2, dn);
+          pp->bind(3, bn);
+          pp->bind(4, seekable_size);
+          pp->bind(5, seekable_offset);
+          pp->bind(6, seekable_mtime);
+          rc = pp->step();
+          if (rc != SQLITE_DONE)
+            obatched(clog) << "recording seekable file=" << fn
+                           << " sqlite3 error: " << (sqlite3_errstr(rc) ?: "?") << endl;
+          else if (verbose > 2)
+            obatched(clog) << "recorded seekable file=" << fn
+                           << " size=" << seekable_size
+                           << " offset=" << seekable_offset
+                           << " mtime=" << seekable_mtime << endl;
+          if (r != 0 && prefetch_count == 0) // stage 4
+            continue;
+        }
+
       if ((r == 0) && (fn != b_source1)) // stage 1
         continue;
 
diff --git a/tests/run-debuginfod-seekable.sh b/tests/run-debuginfod-seekable.sh
index d546fa3d..c787428f 100755
--- a/tests/run-debuginfod-seekable.sh
+++ b/tests/run-debuginfod-seekable.sh
@@ -138,4 +138,49 @@ kill $PID1
 wait $PID1
 PID1=0
 
+if type sqlite3 2>/dev/null; then
+	# Emulate the case of upgrading from an old server without the seekable
+	# optimization by dropping the _r_seekable table.
+	sqlite3 "$DB" 'DROP TABLE buildids10_r_seekable'
+
+	env LD_LIBRARY_PATH=$ldpath ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -d $DB -p $PORT2 -t0 -g0 --fdcache-prefetch=0 -v -R -U R D > vlog$PORT2 2>&1 &
+	PID2=$!
+	tempfiles vlog$PORT2
+	errfiles vlog$PORT2
+
+	wait_ready $PORT2 'ready' 1
+
+	check_all $PORT2
+
+	# The first request per archive has to do a full extraction.  Check
+	# that the rest used the seekable optimization.
+	curl -s http://localhost:$PORT2/metrics | awk '
+/^http_responses_total\{result="seekable xz archive"\}/ {
+	print
+	seekable = $NF
+}
+
+/^http_responses_total\{result="(rpm|deb) archive"\}/ {
+	print
+	full = $NF
+}
+
+END {
+	if (seekable == 0) {
+		print "error: no seekable extractions" > "/dev/stderr"
+		exit 1
+	}
+	if (full > 4) {
+		print "error: too many (" full ") full extractions" > "/dev/stderr"
+		exit 1
+	}
+}'
+
+	tempfiles $DB*
+
+	kill $PID2
+	wait $PID2
+	PID2=0
+fi
+
 exit 0
-- 
2.45.2

