From mboxrd@z Thu Jan 1 00:00:00 1970 From: Junio C Hamano Subject: Re: [PATCH 16/16] write_sha1_file: freshen existing objects Date: Fri, 03 Oct 2014 14:29:58 -0700 Message-ID: References: <20141003202045.GA15205@peff.net> <20141003204110.GP16293@peff.net> Mime-Version: 1.0 Content-Type: text/plain Cc: git@vger.kernel.org, Michael Haggerty To: Jeff King X-From: git-owner@vger.kernel.org Fri Oct 03 23:30:09 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1XaAQT-0001to-1L for gcvg-git-2@plane.gmane.org; Fri, 03 Oct 2014 23:30:09 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750981AbaJCVaE (ORCPT ); Fri, 3 Oct 2014 17:30:04 -0400 Received: from smtp.pobox.com ([208.72.237.35]:52103 "EHLO smtp.pobox.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750850AbaJCVaD (ORCPT ); Fri, 3 Oct 2014 17:30:03 -0400 Received: from smtp.pobox.com (unknown [127.0.0.1]) by pb-smtp0.pobox.com (Postfix) with ESMTP id 6CD5D40A5D; Fri, 3 Oct 2014 17:30:02 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=v6FiRh49aLslDbEr5101rsMNk3k=; b=LBUGUl yybPjy3o6A+zSLZegZ9Pj0tixTl0sLrhlb7JbHVSeZ4HHXbiCnY1ViXjVto8Csnz HdB5AUimL4m6EYDZlRPMnvDiXyJIrlDJGG3AURsDsGYH2tgUt26NV7629fAQx76L haJyYQzVw4+H8tFOvlK6RseBpiBiwyophA8D4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; q=dns; s=sasl; b=ADVEBrb5gFfXQL/pGro0/qx6W5KadUlY N7hNNt1cFbmumQXw9fRsMUUB8vWaWDOK0JVH6wwVZjeE0uYXlMefwdf7Cj1xWqZw X6KMnYYqPy4YrPi22jR6UbeKOmsKmqbiZW7jwOskDghI6H0Hls8qUVVdLCqpww0C GbvYP3ElD8U= Received: from pb-smtp0.int.icgroup.com (unknown [127.0.0.1]) by pb-smtp0.pobox.com (Postfix) with ESMTP id 5954240A5B; Fri, 3 Oct 2014 17:30:02 -0400 (EDT) Received: from pobox.com (unknown [72.14.226.9]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by pb-smtp0.pobox.com (Postfix) with ESMTPSA id 3529340A59; Fri, 3 Oct 2014 17:30:00 -0400 (EDT) In-Reply-To: <20141003204110.GP16293@peff.net> (Jeff King's message of "Fri, 3 Oct 2014 16:41:10 -0400") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) X-Pobox-Relay-ID: 6E0C5416-4B44-11E4-B781-9E3FC4D60FE0-77302942!pb-smtp0.pobox.com Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Jeff King writes: > When we try to write a loose object file, we first check > whether that object already exists. If so, we skip the > write as an optimization. However, this can interfere with > prune's strategy of using mtimes to mark files in progress. > > For example, if a branch contains a particular tree object > and is deleted, that tree object may become unreachable, and > have an old mtime. If a new operation then tries to write > the same tree, this ends up as a noop; we notice we > already have the object and do nothing. A prune running > simultaneously with this operation will see the object as > old, and may delete it. > > We can solve this by "freshening" objects that we avoid > writing by updating their mtime. The algorithm for doing so > is essentially the same as that of has_sha1_file. Therefore > we provide a new (static) interface "check_and_freshen", > which finds and optionally freshens the object. It's trivial > to implement freshening and simple checking by tweaking a > single parameter. An old referent by a recent unreachable may be in pack. Is it expected that the same pack will have many similar old objects (in other words, is it worth trying to optimize check-and-freshen by bypassing access() and utime(), perhaps by keeping a "freshened in this process already" flag in struct packed_git)? Could check-and-freshen-nonlocal() ever be called with freshen set to true? Should it be? In other words, should we be mucking with objects in other people's repositories with utime()? Other than these two minor questions, I am happy with this patch. Thanks. > Signed-off-by: Jeff King > --- > sha1_file.c | 51 +++++++++++++++++++++++++++++++++++++++------- > t/t6501-freshen-objects.sh | 27 ++++++++++++++++++++++++ > 2 files changed, 71 insertions(+), 7 deletions(-) > > diff --git a/sha1_file.c b/sha1_file.c > index d017289..d263b38 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -442,27 +442,53 @@ void prepare_alt_odb(void) > read_info_alternates(get_object_directory(), 0); > } > > -static int has_loose_object_local(const unsigned char *sha1) > +static int freshen_file(const char *fn) > { > - return !access(sha1_file_name(sha1), F_OK); > + struct utimbuf t; > + t.actime = t.modtime = time(NULL); > + return utime(fn, &t); > } > > -int has_loose_object_nonlocal(const unsigned char *sha1) > +static int check_and_freshen_file(const char *fn, int freshen) > +{ > + if (access(fn, F_OK)) > + return 0; > + if (freshen && freshen_file(fn)) > + return 0; > + return 1; > +} > + > +static int check_and_freshen_local(const unsigned char *sha1, int freshen) > +{ > + return check_and_freshen_file(sha1_file_name(sha1), freshen); > +} > + > +static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen) > { > struct alternate_object_database *alt; > prepare_alt_odb(); > for (alt = alt_odb_list; alt; alt = alt->next) { > fill_sha1_path(alt->name, sha1); > - if (!access(alt->base, F_OK)) > + if (check_and_freshen_file(alt->base, freshen)) > return 1; > } > return 0; > } > > +static int check_and_freshen(const unsigned char *sha1, int freshen) > +{ > + return check_and_freshen_local(sha1, freshen) || > + check_and_freshen_nonlocal(sha1, freshen); > +} > + > +int has_loose_object_nonlocal(const unsigned char *sha1) > +{ > + return check_and_freshen_nonlocal(sha1, 0); > +} > + > static int has_loose_object(const unsigned char *sha1) > { > - return has_loose_object_local(sha1) || > - has_loose_object_nonlocal(sha1); > + return check_and_freshen(sha1, 0); > } > > static unsigned int pack_used_ctr; > @@ -2949,6 +2975,17 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen, > return move_temp_to_file(tmp_file, filename); > } > > +static int freshen_loose_object(const unsigned char *sha1) > +{ > + return check_and_freshen(sha1, 1); > +} > + > +static int freshen_packed_object(const unsigned char *sha1) > +{ > + struct pack_entry e; > + return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name); > +} > + > int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *returnsha1) > { > unsigned char sha1[20]; > @@ -2961,7 +2998,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign > write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen); > if (returnsha1) > hashcpy(returnsha1, sha1); > - if (has_sha1_file(sha1)) > + if (freshen_loose_object(sha1) || freshen_packed_object(sha1)) > return 0; > return write_loose_object(sha1, hdr, hdrlen, buf, len, 0); > } > diff --git a/t/t6501-freshen-objects.sh b/t/t6501-freshen-objects.sh > index e25c47d..157f3f9 100755 > --- a/t/t6501-freshen-objects.sh > +++ b/t/t6501-freshen-objects.sh > @@ -100,6 +100,33 @@ for repack in '' true; do > test_expect_success "repository passes fsck ($title)" ' > git fsck > ' > + > + test_expect_success "abandon objects again ($title)" ' > + git reset --hard HEAD^ && > + find .git/objects -type f | > + xargs test-chmtime -v -86400 > + ' > + > + test_expect_success "start writing new commit with same tree ($title)" ' > + tree=$( > + GIT_INDEX_FILE=index.tmp && > + export GIT_INDEX_FILE && > + git read-tree HEAD && > + add abandon && > + add unrelated && > + git write-tree > + ) > + ' > + > + test_expect_success "simultaneous gc ($title)" ' > + git gc --prune=12.hours.ago > + ' > + > + # tree should have been refreshed by write-tree > + test_expect_success "finish writing out commit ($title)" ' > + commit=$(echo foo | git commit-tree -p HEAD $tree) && > + git update-ref HEAD $commit > + ' > done > > test_done