All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Đoàn Trần Công Danh" <congdanhqx@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	git@vger.kernel.org, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 2/2] diff: index-line: respect --abbrev in object's name
Date: Sat, 15 Aug 2020 09:27:59 +0700	[thread overview]
Message-ID: <20200815022759.GC12363@danh.dev> (raw)
In-Reply-To: <20200815002120.GQ8085@camp.crustytoothpaste.net>

On 2020-08-15 00:21:20+0000, "brian m. carlson" <sandals@crustytoothpaste.net> wrote:
> On 2020-08-14 at 18:59:53, Junio C Hamano wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> > 
> > > Ouch.  These apparently come from
> > >
> > > process_diffs () {
> > > ...
> > > }
> > >
> > > Hash-independence may be good, but it should not munge expected mode
> > > bits from 100644 to ffff44, which I think is a bug in the original
> > > introduced in 72f936b1 (t4013: make test hash independent,
> > > 2020-02-07).
> > >
> > > When we are adjusting the abbrev length of the index line, of course
> > > $_x07 would not be sufficient to match the abbreviated object name
> > > in full, so a79 vs 895 can be explained and is a bug in this patch
> > > that did not update the process_diffs helper.
> > >
> > > Another thing that I find somewhat problematic in the original
> > > (brian cc'ed) is that it does not special case all-zero object name
> > > specially.  By turning any and all instances of $_x40 to $ZERO_OID,
> > > we lose the distinction between a random-looking object name which
> > > got turned into $ZERO_OID by the processing, and an object name that
> > > was $ZERO_OID from the beginning, so we won't catch a possible
> > > future bug where new file's preimage object name is not $ZERO_OID
> > > (this is plausible when you try to show an intent-to-add entry; the
> > > diff between the index and the working tree would be "new file"
> > > patch, but the index entry records the object name for an empty
> > > blob, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 instead of $ZERO_OID
> > > can easily be emitted by a mistaken implementation).
> 
> Yeah, it wasn't intended that we munge mode bits, and I definitely agree
> we'd be better off distinguishing between all-zero and non-zero OIDs.
> 
> As you might imagine, this is not my favorite test because have a large
> amount of diff formats, and even I think the giant list of sed
> expressions I wrote is hideous.  It is, however, reasonably
> comprehensive, which is pretty much the only nice thing that can be said
> about it.
> 
> > So here is what I came up with as a possible starting point.  The
> > idea is to grab hexadecimal strings at locations the original tried
> > to isolate with various contexts, and
> > 
> >  - if the input happens to be all zero, use '0', otherwise use 'f'
> > 
> >  - if the input is 40-bytes (i.e. unabbreviated object name in the
> >    SHA-1 world), repeat the character chosen in the first step as
> >    many times as there are chars in $ZERO_OID
> > 
> >  - otherwise, repeat the character chosen in the first step as many
> >    times as there are chars in the input.
> > 
> >  - regardless of all of the above, special case possible in-tree
> >    blob modes (100644, 100755 and 120000) and don't munge them.
> > 
> > I haven't tried it with the patch that started this discussion
> > thread, nor with SHA-256 build, though.
> 
> This seems like a sane approach.
> 
> > diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> > index 43267d6024..b33e60ab9d 100755
> > --- a/t/t4013-diff-various.sh
> > +++ b/t/t4013-diff-various.sh
> > @@ -130,27 +130,43 @@ test_expect_success setup '
> >  EOF
> >  
> >  process_diffs () {
> > -	_x04="[0-9a-f][0-9a-f][0-9a-f][0-9a-f]" &&
> > -	_x07="$_x05[0-9a-f][0-9a-f]" &&
> > -	sed -e "s/$OID_REGEX/$ZERO_OID/g" \
> > -	    -e "s/From $_x40 /From $ZERO_OID /" \
> > -	    -e "s/from $_x40)/from $ZERO_OID)/" \
> > -	    -e "s/commit $_x40\$/commit $ZERO_OID/" \
> > -	    -e "s/commit $_x40 (/commit $ZERO_OID (/" \
> > -	    -e "s/$_x40 $_x40 $_x40/$ZERO_OID $ZERO_OID $ZERO_OID/" \
> > -	    -e "s/$_x40 $_x40 /$ZERO_OID $ZERO_OID /" \
> > -	    -e "s/^$_x40 $_x40$/$ZERO_OID $ZERO_OID/" \
> > -	    -e "s/^$_x40 /$ZERO_OID /" \
> > -	    -e "s/^$_x40$/$ZERO_OID/" \
> > -	    -e "s/$_x07\.\.$_x07/fffffff..fffffff/g" \
> > -	    -e "s/$_x07,$_x07\.\.$_x07/fffffff,fffffff..fffffff/g" \
> > -	    -e "s/$_x07 $_x07 $_x07/fffffff fffffff fffffff/g" \
> > -	    -e "s/$_x07 $_x07 /fffffff fffffff /g" \
> > -	    -e "s/Merge: $_x07 $_x07/Merge: fffffff fffffff/g" \
> > -	    -e "s/$_x07\.\.\./fffffff.../g" \
> > -	    -e "s/ $_x04\.\.\./ ffff.../g" \
> > -	    -e "s/ $_x04/ ffff/g" \
> > -	    "$1"
> > +	perl -e '
> > +		my $oid_length = length($ARGV[0]);
> > +		my $x40 = "[0-9a-f]{40}";
> > +		my $xab = "[0-9a-f]{4,16}";
> > +		my $orx = "[0-9a-f]" x $oid_length;
> > +
> > +		sub munge_oid {
> > +			my ($oid) = @_;
> > +			my $x;
> > +
> > +			if ($oid =~ /^(100644|100755|120000)$/) {
> > +				return $oid;
> > +			}
> > +
> > +			if ($oid =~ /^0*$/) {
> > +				$x = "0";
> > +			} else {
> > +				$x = "f";
> > +			}
> > +
> > +			if (length($oid) == 40) {
> > +				return $x x $oid_length;
> > +			} else {
> > +				return $x x length($oid);
> > +			}
> > +		}
> > +
> > +		while (<STDIN>) {
> > +			s/($orx)/munge_oid($1)/ge;
> > +			s/From ($x40)( |\))/"From " . munge_oid($1) . $2/ge;
> > +			s/commit ($x40)($| \()/"commit " . munge_oid($1) . $2/ge;
> > +			s/($x40) /munge_oid($1) . " "/ge;
> > +			s/^($x40)($| )/munge_oid($1) . $2/e;
> > +			s/($xab)(\.\.|,| |\.\.\.|$)/munge_oid($1) . $2/ge;
> > +			print;
> > +		}
> > +	' "$ZERO_OID" <"$1"
> >  }
> 
> This is much nicer, but I think we need the following on top of it
> because we have a couple of tricky cases the original didn't consider:
> 
> * Some of our 64-bit object IDs get processed twice, turning them into
>   88-character object IDs, which don't match.  We therefore need "\b".
> * The new unabbreviated index lines aren't accounted for, so I included
>   them by possibly matching "\.\.".
> * We have some lines that look like "commit $OID (from $OID)" that
>   aren't accounted for.  Because we now have an optional OID in
>   munge_oid, I had to account for that as well.
> 
> So this is what's on top:
> 
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index b5c7e1a63b..dfc87a0d19 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -140,6 +140,8 @@ process_diffs () {
>  			my ($oid) = @_;
>  			my $x;
>  
> +			return "" unless length $oid;
> +
>  			if ($oid =~ /^(100644|100755|120000)$/) {
>  				return $oid;
>  			}
> @@ -160,8 +162,8 @@ process_diffs () {
>  		while (<STDIN>) {
>  			s/($orx)/munge_oid($1)/ge;
>  			s/From ($x40)( |\))/"From " . munge_oid($1) . $2/ge;
> -			s/commit ($x40)($| \()/"commit " . munge_oid($1) . $2/ge;
> -			s/($x40) /munge_oid($1) . " "/ge;
> +			s/commit ($x40)($| \(from )($x40?)/"commit " .  munge_oid($1) . $2 . munge_oid($3)/ge;
> +			s/\b($x40)( |\.\.|$)/munge_oid($1) . $2/ge;
>  			s/^($x40)($| )/munge_oid($1) . $2/e;
>  			s/($xab)(\.\.|,| |\.\.\.|$)/munge_oid($1) . $2/ge;
>  			print;
> 
> Or, a fresh original version:
> 
> -- %< --
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index e6eb4dd4c7..dfc87a0d19 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -130,27 +130,45 @@ test_expect_success setup '
>  EOF
>  
>  process_diffs () {
> -	_x04="[0-9a-f][0-9a-f][0-9a-f][0-9a-f]" &&
> -	_x07="$_x05[0-9a-f][0-9a-f]" &&
> -	sed -e "s/$OID_REGEX/$ZERO_OID/g" \
> -	    -e "s/From $_x40 /From $ZERO_OID /" \
> -	    -e "s/from $_x40)/from $ZERO_OID)/" \
> -	    -e "s/commit $_x40\$/commit $ZERO_OID/" \
> -	    -e "s/commit $_x40 (/commit $ZERO_OID (/" \
> -	    -e "s/$_x40 $_x40 $_x40/$ZERO_OID $ZERO_OID $ZERO_OID/" \
> -	    -e "s/$_x40 $_x40 /$ZERO_OID $ZERO_OID /" \
> -	    -e "s/^$_x40 $_x40$/$ZERO_OID $ZERO_OID/" \
> -	    -e "s/^$_x40 /$ZERO_OID /" \
> -	    -e "s/^$_x40$/$ZERO_OID/" \
> -	    -e "s/$_x07\.\.$_x07/fffffff..fffffff/g" \
> -	    -e "s/$_x07,$_x07\.\.$_x07/fffffff,fffffff..fffffff/g" \
> -	    -e "s/$_x07 $_x07 $_x07/fffffff fffffff fffffff/g" \
> -	    -e "s/$_x07 $_x07 /fffffff fffffff /g" \
> -	    -e "s/Merge: $_x07 $_x07/Merge: fffffff fffffff/g" \
> -	    -e "s/$_x07\.\.\./fffffff.../g" \
> -	    -e "s/ $_x04\.\.\./ ffff.../g" \
> -	    -e "s/ $_x04/ ffff/g" \
> -	    "$1"
> +	perl -e '
> +		my $oid_length = length($ARGV[0]);
> +		my $x40 = "[0-9a-f]{40}";
> +		my $xab = "[0-9a-f]{4,16}";
> +		my $orx = "[0-9a-f]" x $oid_length;
> +
> +		sub munge_oid {
> +			my ($oid) = @_;
> +			my $x;
> +
> +			return "" unless length $oid;
> +
> +			if ($oid =~ /^(100644|100755|120000)$/) {
> +				return $oid;
> +			}
> +
> +			if ($oid =~ /^0*$/) {
> +				$x = "0";
> +			} else {
> +				$x = "f";
> +			}
> +
> +			if (length($oid) == 40) {
> +				return $x x $oid_length;
> +			} else {
> +				return $x x length($oid);
> +			}
> +		}
> +
> +		while (<STDIN>) {
> +			s/($orx)/munge_oid($1)/ge;
> +			s/From ($x40)( |\))/"From " . munge_oid($1) . $2/ge;
> +			s/commit ($x40)($| \(from )($x40?)/"commit " .  munge_oid($1) . $2 . munge_oid($3)/ge;
> +			s/\b($x40)( |\.\.|$)/munge_oid($1) . $2/ge;
> +			s/^($x40)($| )/munge_oid($1) . $2/e;
> +			s/($xab)(\.\.|,| |\.\.\.|$)/munge_oid($1) . $2/ge;
> +			print;
> +		}
> +	' "$ZERO_OID" <"$1"
>  }
>  
>  V=$(git version | sed -e 's/^git version //' -e 's/\./\\./g')
> -- %< --
> 
> Anyone is welcome to have my sign-off if they pick up any part of this
> patch.

This one seems to work with:

	make -j9 test GIT_TEST_DEFAULT_HASH=sha256

If noone step up and write this into a patch in some days,
I'll take this as first step in my series.

Also waiting some days so other people could come up with better idea,
sed's y seems to be able to work if we don't have the constraint
on all-0 oid.


-- 
Danh

  reply	other threads:[~2020-08-15 21:57 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-09  2:19 [RFC PATCH 0/2] extend --abbrev support to diff-patch format Đoàn Trần Công Danh
2020-08-09  2:19 ` [RFC PATCH 1/2] revision: differentiate if --no-abbrev asked explicitly Đoàn Trần Công Danh
2020-08-09  2:19 ` [RFC PATCH 2/2] diff: extend --abbrev support to diff-patch format Đoàn Trần Công Danh
2020-08-09 19:01 ` [RFC PATCH 0/2] " Junio C Hamano
2020-08-10 10:00   ` Jeff King
2020-08-10 12:31     ` Đoàn Trần Công Danh
2020-08-10 15:15       ` Junio C Hamano
2020-08-10 15:27         ` Jeff King
2020-08-11  0:33           ` Đoàn Trần Công Danh
2020-08-11  5:22             ` Jeff King
2020-08-11 12:07               ` Đoàn Trần Công Danh
2020-08-10 15:06     ` Junio C Hamano
2020-08-11 11:49 ` [PATCH v2 0/2] diff: index-line: respect --abbrev in object's name Đoàn Trần Công Danh
2020-08-11 11:49   ` [PATCH v2 1/2] revision: differentiate if --no-abbrev asked explicitly Đoàn Trần Công Danh
2020-08-11 18:54     ` Junio C Hamano
2020-08-11 11:49   ` [PATCH v2 2/2] diff: index-line: respect --abbrev in object's name Đoàn Trần Công Danh
2020-08-14  0:23 ` [PATCH v3 0/2] " Đoàn Trần Công Danh
2020-08-14  0:23   ` [PATCH v3 1/2] revision: differentiate if --no-abbrev asked explicitly Đoàn Trần Công Danh
2020-08-14  0:50     ` Junio C Hamano
2020-08-14  0:59       ` Đoàn Trần Công Danh
2020-08-14  1:06         ` Junio C Hamano
2020-08-14 14:50           ` Đoàn Trần Công Danh
2020-08-19 22:50             ` Junio C Hamano
2020-08-14  0:23   ` [PATCH v3 2/2] diff: index-line: respect --abbrev in object's name Đoàn Trần Công Danh
2020-08-14 15:18     ` SZEDER Gábor
2020-08-14 17:00       ` Junio C Hamano
2020-08-14 18:59         ` Junio C Hamano
2020-08-15  0:21           ` brian m. carlson
2020-08-15  2:27             ` Đoàn Trần Công Danh [this message]
2020-08-17 16:17               ` Junio C Hamano
2020-08-20  4:56               ` Junio C Hamano
2020-08-20 12:35 ` [PATCH v4 0/2] " Đoàn Trần Công Danh
2020-08-20 12:35   ` [PATCH v4 1/2] t4013: improve diff-post-processor logic Đoàn Trần Công Danh
2020-08-20 19:49     ` Junio C Hamano
2020-08-21 12:05       ` Đoàn Trần Công Danh
2020-08-21 15:44         ` Junio C Hamano
2020-08-20 12:35   ` [PATCH v4 2/2] diff: index-line: respect --abbrev in object's name Đoàn Trần Công Danh
2020-08-20 19:58     ` Junio C Hamano
2020-08-21 11:51 ` [PATCH v5 0/2] " Đoàn Trần Công Danh
2020-08-21 11:51   ` [PATCH v5 1/2] t4013: improve diff-post-processor logic Đoàn Trần Công Danh
2020-08-21 11:51   ` [PATCH v5 2/2] diff: index-line: respect --abbrev in object's name Đoàn Trần Công Danh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200815022759.GC12363@danh.dev \
    --to=congdanhqx@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.