From: Jerry Zhang <jerry@skydio.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>, lilinchao@oschina.cn
Subject: Re: [PATCH] git-apply: fix --3way with binary patch
Date: Wed, 28 Jul 2021 13:04:31 -0700 [thread overview]
Message-ID: <CAMKO5CsPK65wzkJSYv6dHO4wjsX1+U7yX5P95jS3UHXKThMjqw@mail.gmail.com> (raw)
In-Reply-To: <CAMKO5CvZCMHuzRLSs2aHJ3iUH-LBJfFP3fG+GgwtQvsKQPtT5Q@mail.gmail.com>
On Wed, Jul 28, 2021 at 12:38 PM Jerry Zhang <jerry@skydio.com> wrote:
>
> On Tue, Jul 27, 2021 at 9:30 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Jerry Zhang <jerry@skydio.com> writes:
> >
> > > Binary patches applied with "--3way" will
> > > always return a conflict even if the patch
> > > should cleanly apply because the low level
> > > merge function considers all binary merges
> > > without a variant to be conflicting.
> > >
> > > Fix by falling back to normal patch application
> > > for all binary patches.
> > >
> > > Add tests for --3way and normal applications
> > > of binary patches.
> > >
> > > Fixes: 923cd87ac8 ("git-apply: try threeway first when "--3way" is used")
> > > Signed-off-by: Jerry Zhang <jerry@skydio.com>
> > > ---
> > > apply.c | 3 ++-
> > > t/t4108-apply-threeway.sh | 45 +++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 47 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/apply.c b/apply.c
> > > index 1d2d7e124e..78e52f0dc1 100644
> > > --- a/apply.c
> > > +++ b/apply.c
> > > @@ -3638,7 +3638,8 @@ static int apply_data(struct apply_state *state, struct patch *patch,
> > > if (load_preimage(state, &image, patch, st, ce) < 0)
> > > return -1;
> > >
> > > - if (!state->threeway || try_threeway(state, &image, patch, st, ce) < 0) {
> > > + if (!state->threeway || patch->is_binary ||
> > > + try_threeway(state, &image, patch, st, ce) < 0) {
> >
> > Thanks for a quick turnaround. However.
> >
> > Because apply.c::three_way_merge() calls into ll_merge() that lets
> > the low-level custom merge drivers to take over the actual merge, I
> > do not think your "if binary, bypass and never call try_threway() at
> > all" is the right solution. The custom merge driver user uses for
> > the path may successfully perform such a "trivial" three-way merge
> > and return success.
> I understand now, thanks for the explanation
> >
> > Why does the current code that lets threeway tried first fails to
> > fall back to direct application? The code before your change, if
> > fed a binary patch that does not apply, would have failed the direct
> > application first *and* then fell back to the threeway (if only to
> > fail because we do not let binary files be merged), no?
> >
> > Is it that try_threeway()'s way to express failure slightly
> > different from how direct application reports failure, but your
> > change used the same "only if it is negative, we fail and fallback"
> > logic? IIRC, apply_fragments() which is the meat of the direct
> > application logic reports failures by negative, but try_threeway()
> > can return positive non-zero to signal a "recoverable" failure (aka
> > "conflicted merge"). Which should lead us to explore a different
> > approach, which is ...
> >
> > Would it be possible for a patch to leave conflicts when
> > try_threeway() was attempted, but will cleanly apply if direct
> > application is done?
> >
> > If so, perhaps
> >
> > - we first run try_threeway() and see if it cleanly resolves; if
> > so, we are done.
> >
> > - then we try direct application and see if it cleanly applies; if
> > so, we are done.
> >
> > - finally we run try_threeway() again and let it fail with
> > conflict.
> >
> > might be the right sequence? We theoretically could omit the first
> > of these three steps, but that would mean we'd write 923cd87a
> > (git-apply: try threeway first when "--3way" is used, 2021-04-06)
> > off as a failed experiment and revert it, which would not be ideal.
> >
> >
> > Also, independent from this "if we claim we try threeway first and
> > fall back to direct application, we really should do so" fix we are
> > discussing, I think our default binary merge can be a bit more
> > lenient and resolve this particular case of applying the binary
> > patch taken from itself (i.e. a patch that takes A to B gets applied
> > using --3way option to A). I wonder if it can be as simple as the
> > attached patch. FWIW, this change is sufficient (without the change
> > to apply.c we are reviewing here) to make your new tests in t4108
> > pass.
> So basically, another way of stating the problem would be that binary
> patches can apply cleanly with direct application in some cases where
> merge application is not clean. If i understand correctly this is unique
> to binary files, although it would be possible for a user to supply a custom
> merge driver for text files that is worse than direct application, that is
> most likely heavy user error that we shouldn't have to cater to. However
> the issue with binary is that the *default* merge driver is actually worse
> than direct application (in some cases). Therefore our options are
>
> 1. do as you suggest and run 3way -> direct -> 3way. I would modify
> this and say we should only attempt this for binary patches, since a text
> file that fails 3way would most likely also fail direct, so it would be a waste
> of time to try it. furthermore if we cache results from the first 3way and
> return them after attempting direct, it can save us from having to compute
> the 3way twice, so would be no worse than our current performance.
>
> 2. improve the default binary merge driver to be at least as good as direct
> application. this would allow us to say overall that "merge drivers should
> be at least as intelligent as direct patch application" and would greatly
> simplify logic in apply.c. Your change is a good first step in allowing it
> to handle more cases. A trivial way to make the binary merge driver
> at least as good as patch application is to generate a patch and apply
> it as part of the merge. I imagine this would have other consequences
> though as many parts of git use the binary merge driver.
Hmm I would have thought that binary patches allow context, similar to this
test snippet
"
test_expect_success 'apply complex binary file patch' '
git reset --hard main &&
cp $TEST_DIRECTORY/test-binary-1.png bin.png &&
git add bin.png &&
git commit -m "add binary file" &&
echo 1 >>bin.png &&
git diff --binary >bin.diff &&
git reset --hard &&
cat $TEST_DIRECTORY/test-binary-2.png
$TEST_DIRECTORY/test-binary-1.png >bin.png &&
git add bin.png &&
git commit -m "change binary file" &&
# Apply must succeed.
git apply bin.diff
'
"
but upon running it I see that normal patch application still requires the
preimage to match exactly.
"
error: the patch applies to 'bin.png'
(836481bd1b9b6bd7a1bb8939cf4ea01e05946850), which does not match the
current contents.
error: bin.png: patch does not apply
"
So at least in regards to making the default binary merge driver "at
least as intelligent" as
direct patch application, your patch ought to do it.
>
> Separately I think it would be a worthwhile follow-up patch to also handle
> trivial three-way merges in try_threeway(). This would:
> 1. Allow us to compare oid instead of the entire file buffer, which would be
> faster.
> 2. Handle trivial merges of all file types, which would save time.
>
> >
> > ---- >8 ------- >8 ------- >8 ------- >8 ------- >8 ------- >8 ----
> > Subject: ll-merge: teach ll_binary_merge() a trivial three-way merge
> >
> > The low-level binary merge code assumed that the caller will not
> > feed trivial merges that would have been resolved at the tree level;
> > because of this, ll_binary_merge() assumes the ancestor is different
> > from either side, always failing the merge in conflict unless -Xours
> > or -Xtheirs is in effect.
> >
> > But "git apply --3way" codepath could ask us to perform three-way
> > merge between two binaries A and B using A as the ancestor version.
> > The current code always fails such an application, but when given a
> > binary patch that turns A into B and asked to apply it to A, there
> > is no reason to fail such a request---we can trivially tell that the
> > result must be B.
> >
> > Arguably, this fix may belong to one level higher at ll_merge()
> > function, which dispatches to lower-level merge drivers, possibly
> > even before it renormalizes the three input buffers. But let's
> > first see how this goes.
> >
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> > ll-merge.c | 56 +++++++++++++++++++++++++++++++++++++++-----------------
> > 1 file changed, 39 insertions(+), 17 deletions(-)
> >
> > diff --git c/ll-merge.c w/ll-merge.c
> > index 261657578c..bc8038d404 100644
> > --- c/ll-merge.c
> > +++ w/ll-merge.c
> > @@ -46,6 +46,13 @@ void reset_merge_attributes(void)
> > merge_attributes = NULL;
> > }
> >
> > +static int same_mmfile(mmfile_t *a, mmfile_t *b)
> > +{
> > + if (a->size != b->size)
> > + return 0;
> > + return !memcmp(a->ptr, b->ptr, a->size);
> > +}
> > +
> > /*
> > * Built-in low-levels
> > */
> > @@ -58,9 +65,18 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
> > const struct ll_merge_options *opts,
> > int marker_size)
> > {
> > + int status;
> > mmfile_t *stolen;
> > assert(opts);
> >
> > + /*
> > + * With -Xtheirs or -Xours, we have cleanly merged;
> > + * otherwise we got a conflict, unless 3way trivially
> > + * resolves.
> > + */
> > + status = (opts->variant == XDL_MERGE_FAVOR_OURS ||
> > + opts->variant == XDL_MERGE_FAVOR_THEIRS) ? 0 : 1;
> > +
> > /*
> > * The tentative merge result is the common ancestor for an
> > * internal merge. For the final merge, it is "ours" by
> > @@ -68,18 +84,30 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
> > */
> > if (opts->virtual_ancestor) {
> > stolen = orig;
> > + status = 0;
> > } else {
> > - switch (opts->variant) {
> > - default:
> > - warning("Cannot merge binary files: %s (%s vs. %s)",
> > - path, name1, name2);
> > - /* fallthru */
> > - case XDL_MERGE_FAVOR_OURS:
> > - stolen = src1;
> > - break;
> > - case XDL_MERGE_FAVOR_THEIRS:
> > + if (same_mmfile(orig, src1)) {
> > stolen = src2;
> > - break;
> > + status = 0;
> > + } else if (same_mmfile(orig, src2)) {
> > + stolen = src1;
> > + status = 0;
> > + } else if (same_mmfile(src1, src2)) {
> > + stolen = src1;
> > + status = 0;
> > + } else {
> > + switch (opts->variant) {
> > + default:
> > + warning("Cannot merge binary files: %s (%s vs. %s)",
> > + path, name1, name2);
> > + /* fallthru */
> > + case XDL_MERGE_FAVOR_OURS:
> > + stolen = src1;
> > + break;
> > + case XDL_MERGE_FAVOR_THEIRS:
> > + stolen = src2;
> > + break;
> > + }
> > }
> > }
> >
> > @@ -87,13 +115,7 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
> > result->size = stolen->size;
> > stolen->ptr = NULL;
> >
> > - /*
> > - * With -Xtheirs or -Xours, we have cleanly merged;
> > - * otherwise we got a conflict.
> > - */
> > - return opts->variant == XDL_MERGE_FAVOR_OURS ||
> > - opts->variant == XDL_MERGE_FAVOR_THEIRS ?
> > - 0 : 1;
> > + return status;
> > }
> >
> > static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,
next prev parent reply other threads:[~2021-07-28 20:04 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-28 2:44 [PATCH] git-apply: fix --3way with binary patch Jerry Zhang
2021-07-28 4:29 ` Junio C Hamano
2021-07-28 4:30 ` Junio C Hamano
2021-07-28 17:55 ` [PATCH] ll-merge: teach ll_binary_merge() a trivial three-way merge Junio C Hamano
2021-07-28 23:49 ` Elijah Newren
2021-07-29 1:06 ` Junio C Hamano
2021-09-05 19:06 ` [PATCH v2] apply: resolve trivial merge without hitting ll-merge with "--3way" Junio C Hamano
2021-09-06 18:57 ` Elijah Newren
2021-09-06 21:59 ` Ævar Arnfjörð Bjarmason
2021-09-07 2:32 ` Junio C Hamano
2021-09-07 20:15 ` Junio C Hamano
2021-07-28 19:38 ` [PATCH] git-apply: fix --3way with binary patch Jerry Zhang
2021-07-28 20:04 ` Jerry Zhang [this message]
2021-07-28 20:08 ` Junio C Hamano
2021-07-28 20:37 ` Jerry Zhang
2021-07-28 21:01 ` Junio C Hamano
[not found] <ee1d1a82ef4d11ebbb4cd4ae5278bc123046@skydio.com>
2021-07-28 4:45 ` lilinchao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMKO5CsPK65wzkJSYv6dHO4wjsX1+U7yX5P95jS3UHXKThMjqw@mail.gmail.com \
--to=jerry@skydio.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=lilinchao@oschina.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).