git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jerry Zhang <jerry@skydio.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>, lilinchao@oschina.cn
Subject: Re: [PATCH] git-apply: fix --3way with binary patch
Date: Wed, 28 Jul 2021 12:38:20 -0700	[thread overview]
Message-ID: <CAMKO5CvZCMHuzRLSs2aHJ3iUH-LBJfFP3fG+GgwtQvsKQPtT5Q@mail.gmail.com> (raw)
In-Reply-To: <xmqqh7gfawlt.fsf@gitster.g>

On Tue, Jul 27, 2021 at 9:30 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jerry Zhang <jerry@skydio.com> writes:
>
> > Binary patches applied with "--3way" will
> > always return a conflict even if the patch
> > should cleanly apply because the low level
> > merge function considers all binary merges
> > without a variant to be conflicting.
> >
> > Fix by falling back to normal patch application
> > for all binary patches.
> >
> > Add tests for --3way and normal applications
> > of binary patches.
> >
> > Fixes: 923cd87ac8 ("git-apply: try threeway first when "--3way" is used")
> > Signed-off-by: Jerry Zhang <jerry@skydio.com>
> > ---
> >  apply.c                   |  3 ++-
> >  t/t4108-apply-threeway.sh | 45 +++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 47 insertions(+), 1 deletion(-)
> >
> > diff --git a/apply.c b/apply.c
> > index 1d2d7e124e..78e52f0dc1 100644
> > --- a/apply.c
> > +++ b/apply.c
> > @@ -3638,7 +3638,8 @@ static int apply_data(struct apply_state *state, struct patch *patch,
> >       if (load_preimage(state, &image, patch, st, ce) < 0)
> >               return -1;
> >
> > -     if (!state->threeway || try_threeway(state, &image, patch, st, ce) < 0) {
> > +     if (!state->threeway || patch->is_binary ||
> > +             try_threeway(state, &image, patch, st, ce) < 0) {
>
> Thanks for a quick turnaround.  However.
>
> Because apply.c::three_way_merge() calls into ll_merge() that lets
> the low-level custom merge drivers to take over the actual merge, I
> do not think your "if binary, bypass and never call try_threway() at
> all" is the right solution.  The custom merge driver user uses for
> the path may successfully perform such a "trivial" three-way merge
> and return success.
I understand now, thanks for the explanation
>
> Why does the current code that lets threeway tried first fails to
> fall back to direct application?  The code before your change, if
> fed a binary patch that does not apply, would have failed the direct
> application first *and* then fell back to the threeway (if only to
> fail because we do not let binary files be merged), no?
>
> Is it that try_threeway()'s way to express failure slightly
> different from how direct application reports failure, but your
> change used the same "only if it is negative, we fail and fallback"
> logic?  IIRC, apply_fragments() which is the meat of the direct
> application logic reports failures by negative, but try_threeway()
> can return positive non-zero to signal a "recoverable" failure (aka
> "conflicted merge").  Which should lead us to explore a different
> approach, which is ...
>
>     Would it be possible for a patch to leave conflicts when
>     try_threeway() was attempted, but will cleanly apply if direct
>     application is done?
>
> If so, perhaps
>
>  - we first run try_threeway() and see if it cleanly resolves; if
>    so, we are done.
>
>  - then we try direct application and see if it cleanly applies; if
>    so, we are done.
>
>  - finally we run try_threeway() again and let it fail with
>    conflict.
>
> might be the right sequence?  We theoretically could omit the first
> of these three steps, but that would mean we'd write 923cd87a
> (git-apply: try threeway first when "--3way" is used, 2021-04-06)
> off as a failed experiment and revert it, which would not be ideal.
>
>
> Also, independent from this "if we claim we try threeway first and
> fall back to direct application, we really should do so" fix we are
> discussing, I think our default binary merge can be a bit more
> lenient and resolve this particular case of applying the binary
> patch taken from itself (i.e. a patch that takes A to B gets applied
> using --3way option to A).  I wonder if it can be as simple as the
> attached patch.  FWIW, this change is sufficient (without the change
> to apply.c we are reviewing here) to make your new tests in t4108
> pass.
So basically, another way of stating the problem would be that binary
patches can apply cleanly with direct application in some cases where
merge application is not clean. If i understand correctly this is unique
to binary files, although it would be possible for a user to supply a custom
merge driver for text files that is worse than direct application, that is
most likely heavy user error that we shouldn't have to cater to. However
the issue with binary is that the *default* merge driver is actually worse
than direct application (in some cases). Therefore our options are

1. do as you suggest and run 3way -> direct -> 3way. I would modify
this and say we should only attempt this for binary patches, since a text
file that fails 3way would most likely also fail direct, so it would be a waste
of time to try it. furthermore if we cache results from the first 3way and
return them after attempting direct, it can save us from having to compute
the 3way twice, so would be no worse than our current performance.

2. improve the default binary merge driver to be at least as good as direct
application. this would allow us to say overall that "merge drivers should
be at least as intelligent as direct patch application" and would greatly
simplify logic in apply.c. Your change is a good first step in allowing it
to handle more cases. A trivial way to make the binary merge driver
at least as good as patch application is to generate a patch and apply
it as part of the merge. I imagine this would have other consequences
though as many parts of git use the binary merge driver.

Separately I think it would be a worthwhile follow-up patch to also handle
trivial three-way merges in try_threeway(). This would:
1. Allow us to compare oid instead of the entire file buffer, which would be
faster.
2. Handle trivial merges of all file types, which would save time.

>
> ---- >8 ------- >8 ------- >8 ------- >8 ------- >8 ------- >8 ----
> Subject: ll-merge: teach ll_binary_merge() a trivial three-way merge
>
> The low-level binary merge code assumed that the caller will not
> feed trivial merges that would have been resolved at the tree level;
> because of this, ll_binary_merge() assumes the ancestor is different
> from either side, always failing the merge in conflict unless -Xours
> or -Xtheirs is in effect.
>
> But "git apply --3way" codepath could ask us to perform three-way
> merge between two binaries A and B using A as the ancestor version.
> The current code always fails such an application, but when given a
> binary patch that turns A into B and asked to apply it to A, there
> is no reason to fail such a request---we can trivially tell that the
> result must be B.
>
> Arguably, this fix may belong to one level higher at ll_merge()
> function, which dispatches to lower-level merge drivers, possibly
> even before it renormalizes the three input buffers.  But let's
> first see how this goes.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  ll-merge.c | 56 +++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 39 insertions(+), 17 deletions(-)
>
> diff --git c/ll-merge.c w/ll-merge.c
> index 261657578c..bc8038d404 100644
> --- c/ll-merge.c
> +++ w/ll-merge.c
> @@ -46,6 +46,13 @@ void reset_merge_attributes(void)
>         merge_attributes = NULL;
>  }
>
> +static int same_mmfile(mmfile_t *a, mmfile_t *b)
> +{
> +       if (a->size != b->size)
> +               return 0;
> +       return !memcmp(a->ptr, b->ptr, a->size);
> +}
> +
>  /*
>   * Built-in low-levels
>   */
> @@ -58,9 +65,18 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
>                            const struct ll_merge_options *opts,
>                            int marker_size)
>  {
> +       int status;
>         mmfile_t *stolen;
>         assert(opts);
>
> +       /*
> +        * With -Xtheirs or -Xours, we have cleanly merged;
> +        * otherwise we got a conflict, unless 3way trivially
> +        * resolves.
> +        */
> +       status = (opts->variant == XDL_MERGE_FAVOR_OURS ||
> +                 opts->variant == XDL_MERGE_FAVOR_THEIRS) ? 0 : 1;
> +
>         /*
>          * The tentative merge result is the common ancestor for an
>          * internal merge.  For the final merge, it is "ours" by
> @@ -68,18 +84,30 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
>          */
>         if (opts->virtual_ancestor) {
>                 stolen = orig;
> +               status = 0;
>         } else {
> -               switch (opts->variant) {
> -               default:
> -                       warning("Cannot merge binary files: %s (%s vs. %s)",
> -                               path, name1, name2);
> -                       /* fallthru */
> -               case XDL_MERGE_FAVOR_OURS:
> -                       stolen = src1;
> -                       break;
> -               case XDL_MERGE_FAVOR_THEIRS:
> +               if (same_mmfile(orig, src1)) {
>                         stolen = src2;
> -                       break;
> +                       status = 0;
> +               } else if (same_mmfile(orig, src2)) {
> +                       stolen = src1;
> +                       status = 0;
> +               } else if (same_mmfile(src1, src2)) {
> +                       stolen = src1;
> +                       status = 0;
> +               } else {
> +                       switch (opts->variant) {
> +                       default:
> +                               warning("Cannot merge binary files: %s (%s vs. %s)",
> +                                       path, name1, name2);
> +                               /* fallthru */
> +                       case XDL_MERGE_FAVOR_OURS:
> +                               stolen = src1;
> +                               break;
> +                       case XDL_MERGE_FAVOR_THEIRS:
> +                               stolen = src2;
> +                               break;
> +                       }
>                 }
>         }
>
> @@ -87,13 +115,7 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
>         result->size = stolen->size;
>         stolen->ptr = NULL;
>
> -       /*
> -        * With -Xtheirs or -Xours, we have cleanly merged;
> -        * otherwise we got a conflict.
> -        */
> -       return opts->variant == XDL_MERGE_FAVOR_OURS ||
> -              opts->variant == XDL_MERGE_FAVOR_THEIRS ?
> -              0 : 1;
> +       return status;
>  }
>
>  static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,

  parent reply	other threads:[~2021-07-28 19:38 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-28  2:44 [PATCH] git-apply: fix --3way with binary patch Jerry Zhang
2021-07-28  4:29 ` Junio C Hamano
2021-07-28  4:30 ` Junio C Hamano
2021-07-28 17:55   ` [PATCH] ll-merge: teach ll_binary_merge() a trivial three-way merge Junio C Hamano
2021-07-28 23:49     ` Elijah Newren
2021-07-29  1:06       ` Junio C Hamano
2021-09-05 19:06         ` [PATCH v2] apply: resolve trivial merge without hitting ll-merge with "--3way" Junio C Hamano
2021-09-06 18:57           ` Elijah Newren
2021-09-06 21:59           ` Ævar Arnfjörð Bjarmason
2021-09-07  2:32             ` Junio C Hamano
2021-09-07 20:15             ` Junio C Hamano
2021-07-28 19:38   ` Jerry Zhang [this message]
2021-07-28 20:04     ` [PATCH] git-apply: fix --3way with binary patch Jerry Zhang
2021-07-28 20:08     ` Junio C Hamano
2021-07-28 20:37       ` Jerry Zhang
2021-07-28 21:01         ` Junio C Hamano
     [not found] <ee1d1a82ef4d11ebbb4cd4ae5278bc123046@skydio.com>
2021-07-28  4:45 ` lilinchao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMKO5CvZCMHuzRLSs2aHJ3iUH-LBJfFP3fG+GgwtQvsKQPtT5Q@mail.gmail.com \
    --to=jerry@skydio.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lilinchao@oschina.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).