git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCHv2] write_index: optionally allow broken null sha1s
Date: Sun, 25 Aug 2013 12:54:12 -0700	[thread overview]
Message-ID: <20130825195412.GA2752@elie.Belkin> (raw)
In-Reply-To: <20130825095818.GA12556@sigill.intra.peff.net>

On Sun, Aug 25, 2013 at 05:58:18AM -0400, Jeff King wrote:
> On Sat, Aug 24, 2013 at 11:15:00PM -0700, Jonathan Nieder wrote:

>>> I was tempted to not involve filter-branch in this commit at all, and
>>> instead require the user to manually invoke
>>>
>>>   GIT_ALLOW_NULL_SHA1=1 git filter-branch ...
>>>
>>> to perform such a filter.  That would be slightly safer, but requires
>>> some specialized knowledge from the user (and advice on using
>>> filter-branch to remove such entries already exists on places like
>>> stackoverflow, and this patch makes it Just Work on recent versions of
>>> git).
>>
>> The above few paragraphs explained the most mysterious part of the
>> patch to me.  I think they would be fine to include in the commit
>> message.
>
> I rolled them into the commit message.

Hm --- the most useful part was "advice on using filter-branch to
remove such entries already exists on places like stackoverflow",
which was dropped.  From my point of view, that is exactly the
motivation of the patch.

I also found the "I was tempted to ... That would be slightly safer,
but requires ..." structure easier to read.

In other words, why not use something like this?

	write_index: optionally allow broken null sha1s

	Commit 4337b58 (do not write null sha1s to on-disk index, 2012-07-28)
	added a safety check preventing git from writing null sha1s into the
	index. The intent was to catch errors in other parts of the code that
	might let such an entry slip into the index (or worse, a tree).

	Some existing repositories have some invalid trees that contain null
	sha1s already, though.  Until 4337b58, a common way to clean this up
	would be to use git-filter-branch's index-filter to repair such broken
	entries.  That now fails when filter-branch tries to write out the
	index.

	Introduce a GIT_ALLOW_NULL_SHA1 environment variable to relax this check
	and make it easier to recover from such a history.

	It is tempting to not involve filter-branch in this commit at all, and
	instead require the user to manually invoke

		GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

	to perform an index-filter on a history with trees with null sha1s.
	That would be slightly safer, but requires some specialized knowledge
	from the user.  So let's set the GIT_ALLOW_NULL_SHA1 variable
	automatically when checking out the to-be-filtered trees.  Advice on
	using filter-branch to remove such entries already exists on places like
	stackoverflow, and this patch makes it Just Work again on recent
	versions of git.

	Further commands that touch the index will still notice and fail,	unless
	they actually remove the broken entries.  A filter-branch whose filters
	do not touch the index at all will not error out (since we complain of
	the null sha1 only on writing, not when making a tree out of the index),
	but this is acceptable, as we still print a loud warning, so the problem
	is unlikely to go unnoticed.

With or without such a change,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

After this patch, do you think (in a separate change) it would make
sense for cache-tree.c::update_one() to check for null sha1 and error
out unless GIT_ALLOW_NULL_SHA1 is true?  That would let us get rid of
the caveat from the last paragraph.

[...]
> --- /dev/null
> +++ b/t/t7009-filter-branch-null-sha1.sh
> @@ -0,0 +1,52 @@
> +#!/bin/sh
> +
> +test_description='filter-branch removal of trees with null sha1'
> +. ./test-lib.sh
> +
> +test_expect_success 'create base commits' '
> +	test_commit one &&
> +	test_commit two &&
> +	test_commit three
> +'
> +
> +test_expect_success 'create a commit with a bogus null sha1 in the tree' '
> +	{
> +		git ls-tree HEAD &&
> +		printf "160000 commit $_z40\\tbroken\\n"
> +	} >broken-tree
> +	echo "add broken entry" >msg &&
> +
> +	tree=$(git mktree <broken-tree) &&
> +	test_tick &&
> +	commit=$(git commit-tree $tree -p HEAD <msg) &&
> +	git update-ref HEAD "$commit"
> +'
> +
> +# we have to make one more commit on top removing the broken
> +# entry, since otherwise our index does not match HEAD (and filter-branch will
> +# complain). We could make the index match HEAD, but doing so would involve
> +# writing a null sha1 into the index.
> +test_expect_success 'create a commit dropping the broken entry' '
> +	test_tick &&
> +	git commit -a -m "back to normal"
> +'

This is kind of an old-fashioned test, since each step of the setup is
treated as a separate test assertion.  I don't really mind until we
get better automation to make it easy to skip or rearrange tests.
Just for reference, I think the usual way to do this now is

	test_expect_success 'setup' '
		# create base commits
		...

		# create a commit with bogus null sha1 in the tree
		...

		# We have to make one more commit on top removing the broken
		# entry, since otherwise our index does not match HEAD and
		# filter-branch will complain. We could make the index match
		# HEAD, but doing so would involve writing a null sha1 into
		# the index.
		...
	'

	test_expect_success 'nontrivial filter-branch bails out on null sha1' '
		old_head=$(git rev-parse HEAD) &&
		test_must_fail git filter-branch ... &&
		test_cmp_rev "$old_head" HEAD
	'

	test_expect_success 'filter-branch can filter out null sha1' '
		git filter-branch ... &&

		# resulting history is clean
		echo three >expect &&
		git log -1 --format=%s >actual &&
		test_cmp expect actual
	'

Thanks,
Jonathan

  reply	other threads:[~2013-08-25 19:54 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-24  1:33 [PATCH] write_index: optionally allow broken null sha1s Jeff King
2013-08-25  6:15 ` Jonathan Nieder
2013-08-25  9:58   ` [PATCHv2] " Jeff King
2013-08-25 19:54     ` Jonathan Nieder [this message]
2013-08-26  6:03       ` Junio C Hamano
2013-08-26 14:31         ` Jeff King
2013-08-26 16:02           ` Junio C Hamano
2013-08-26 21:36             ` Jeff King
2013-08-26 14:27       ` Jeff King
2013-08-26 17:35         ` Jonathan Nieder
2013-08-26 21:20           ` Jeff King
2013-08-27  3:46             ` Junio C Hamano
2013-08-27 20:41             ` [PATCHv3] " Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130825195412.GA2752@elie.Belkin \
    --to=jrnieder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).