From: Jeff King <peff@peff.net>
To: Charles Bailey <charles@hashpling.org>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Add failing test for fetching from multiple packs over dumb httpd
Date: Tue, 27 Jan 2015 13:12:21 -0500
Message-ID: <20150127181220.GA17067@peff.net>
In-Reply-To: <1422372041-16474-1-git-send-email-charles@hashpling.org>
On Tue, Jan 27, 2015 at 03:20:41PM +0000, Charles Bailey wrote:
> From: Charles Bailey <cbailey32@bloomberg.net>
>
> When objects are spread across multiple packs, if an initial fetch does
> require all pack files, a subsequent fetch for objects in packs not
> retrieved in the initial fetch will fail.
s/does/does not/, I think?
> I'm not very familiar with the http client code so this analysis is based
> purely on observed behaviour.
Debugging the http code is a royal pain because all the work happens in
a separate helper. I use a git-remote-debug script like this:
#!/bin/sh
host=localhost:5001
proto=$(echo "${2:-$1}" | sed 's/:.*//')
prog=git-remote-$proto
echo >&2 "gdb -ex 'target remote $host' $prog"
gdbserver "$host" "$prog" "$@"
and then you can use:
git fetch debug::http://...
in the test script, cut-and-paste the gdb command printed to stderr, and
you're dropped into the appropriate debugger without worrying about all
of the stdio mess.
> When fetching only some refs from a repository served over dumb httpd Git
> appears to download all of the index files for the available packs but then
> only chooses the pack files that help it resolve the objects which we need.
Right. And it looks like we have special code in sha1_file.c to make
sure we do not trust an index which does not have a matching packfile.
So that's good.
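A minimal C sketch of that rule follows, assuming only git's usual on-disk naming (objects/pack/pack-<hex>.idx sitting next to pack-<hex>.pack). It is not the sha1_file.c code; idx_to_pack_path and idx_has_matching_pack are hypothetical names used for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: turn ".../pack-<hex>.idx" into ".../pack-<hex>.pack"
 * in buf. Returns 0 on success, -1 if the name does not end in ".idx"
 * or buf is too small.
 */
static int idx_to_pack_path(const char *idx_path, char *buf, size_t bufsz)
{
	size_t len, stem;

	assert(idx_path != NULL && buf != NULL);
	len = strlen(idx_path);
	if (len < 4 || strcmp(idx_path + len - 4, ".idx") != 0)
		return -1;
	stem = len - 4;
	if (stem + sizeof(".pack") > bufsz)	/* sizeof counts the NUL */
		return -1;
	memcpy(buf, idx_path, stem);
	memcpy(buf + stem, ".pack", sizeof(".pack"));
	return 0;
}

/* Trust an index only if a regular pack file sits next to it. */
static int idx_has_matching_pack(const char *idx_path)
{
	char pack_path[4096];
	struct stat st;

	if (idx_to_pack_path(idx_path, pack_path, sizeof(pack_path)))
		return 0;
	return stat(pack_path, &st) == 0 && S_ISREG(st.st_mode);
}
```

An index whose pack was never downloaded (the partial-fetch case above) therefore fails idx_has_matching_pack and is ignored rather than used.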
The http-walker code does its own check in fetch_and_setup_pack_index
for an existing valid copy of the index. If we don't have one, we
download the index and proceed. If we do, we skip straight to grabbing
the pack. But if we have it and it doesn't appear valid, we return an
error. And there seems to be a bug in the validity check.
It looks like the culprit is 7b64469 (Allow parse_pack_index on
temporary files, 2010-04-19). It added a new "idx_path" parameter to
parse_pack_index, which we pass as NULL. That causes its call to
check_packed_git_idx to fail (because it has no idea what file we are
talking about!).
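The three cases described above, and why a NULL path breaks the middle one, can be modelled in a few lines of C. This is a hypothetical model, not http-walker code: plan_index_fetch and idx_is_valid are made-up names, and idx_is_valid only stands in for the fact that check_packed_git_idx needs a real path to open.

```c
#include <stdbool.h>
#include <stddef.h>

/* What to do about one remote pack's index (hypothetical model). */
enum idx_action {
	IDX_DOWNLOAD,	/* no local .idx: download it and proceed */
	IDX_REUSE,	/* valid local .idx: go straight to the pack */
	IDX_ERROR	/* local .idx judged invalid: return an error */
};

/*
 * Stand-in for check_packed_git_idx(): without a path there is no
 * file to open, so the check can only fail.
 */
static bool idx_is_valid(const char *idx_path)
{
	if (idx_path == NULL)
		return false;
	/* A real check would open and verify the index file here. */
	return true;
}

static enum idx_action plan_index_fetch(bool have_local_idx,
					const char *idx_path)
{
	if (!have_local_idx)
		return IDX_DOWNLOAD;
	return idx_is_valid(idx_path) ? IDX_REUSE : IDX_ERROR;
}
```

With a NULL path the "already have it" case can never come out as reusable, so a later fetch that needs a pack whose .idx was saved by an earlier fetch ends in an error.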
This seems to fix it:

diff --git a/sha1_file.c b/sha1_file.c
index 30995e6..eda4d90 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1149,6 +1149,9 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path)
 	const char *path = sha1_pack_name(sha1);
 	struct packed_git *p = alloc_packed_git(strlen(path) + 1);
 
+	if (!idx_path)
+		idx_path = sha1_pack_index_name(sha1);
+
 	strcpy(p->pack_name, path);
 	hashcpy(p->sha1, sha1);
 	if (check_packed_git_idx(idx_path, p)) {
(Alternatively, we could pass in sha1_pack_index_name(sha1) instead of
NULL in the first place, but I think it is reasonable for
parse_pack_index to take care of this.)
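Stripped of git's internals, the defaulting step in the patch looks roughly like the sketch below. pack_index_name is a hypothetical stand-in for sha1_pack_index_name that formats <objdir>/pack/pack-<40 hex>.idx into a caller-supplied buffer; resolve_idx_path is likewise a made-up name. Rejecting a too-small buffer instead of truncating is a choice made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for sha1_pack_index_name(): writes
 * "<objdir>/pack/pack-<40 hex digits>.idx" into buf, or returns NULL
 * if buf is too small to hold the whole path.
 */
static const char *pack_index_name(const char *objdir,
				   const unsigned char sha1[20],
				   char *buf, size_t bufsz)
{
	char hex[41];
	int i;

	assert(objdir != NULL && buf != NULL);
	/* 40 hex digits plus "/pack/pack-" and ".idx" plus the NUL */
	if (strlen(objdir) + 40 + sizeof("/pack/pack-.idx") > bufsz)
		return NULL;
	for (i = 0; i < 20; i++)
		snprintf(hex + 2 * i, 3, "%02x", sha1[i]);
	snprintf(buf, bufsz, "%s/pack/pack-%s.idx", objdir, hex);
	return buf;
}

/* The defaulting step from the patch: NULL means "the usual .idx path". */
static const char *resolve_idx_path(const char *idx_path, const char *objdir,
				    const unsigned char sha1[20],
				    char *buf, size_t bufsz)
{
	if (!idx_path)
		idx_path = pack_index_name(objdir, sha1, buf, bufsz);
	return idx_path;
}
```

Callers that already know where their (possibly temporary) index lives, which is what 7b64469 added the parameter for, keep passing it explicitly; everyone else gets the standard location.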
I think it may also make sense for fetch_and_setup_pack_index to delete
and re-download a broken .idx file (rather than aborting), but I don't
think that's a big deal. It should only happen in the face of on-disk
data corruption, and the user can remove the broken .idx themselves.
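If someone did want that, one possible shape is sketched below. Everything here is hypothetical: the enum, next_idx_step and discard_broken_idx are made-up names, and capping it at a single re-download is an assumption of this sketch, so that a copy that is also broken on the server cannot loop forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical next step for one pack index during a dumb-http fetch. */
enum idx_step {
	STEP_USE,		/* local .idx is present and valid */
	STEP_DOWNLOAD,		/* no local .idx yet */
	STEP_REDOWNLOAD,	/* broken local .idx: delete it and fetch again */
	STEP_GIVE_UP		/* still broken after a re-download: report an error */
};

static enum idx_step next_idx_step(bool have_idx, bool idx_valid,
				   int redownloads)
{
	if (!have_idx)
		return STEP_DOWNLOAD;
	if (idx_valid)
		return STEP_USE;
	/* Assumed policy: retry at most once, then fail as we do today. */
	return redownloads < 1 ? STEP_REDOWNLOAD : STEP_GIVE_UP;
}

/* Delete a broken index; a file that is already gone counts as success. */
static int discard_broken_idx(const char *idx_path)
{
	assert(idx_path != NULL);
	if (remove(idx_path) != 0 && errno != ENOENT)
		return -1;
	return 0;
}
```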
-Peff