From: <rsbecker@nexbridge.com>
To: "'Emanuel Czirai'" <correabuscar+gitML@gmail.com>, <git@vger.kernel.org>
Subject: RE: `git diff`/`git apply` can generate/apply ambiguous hunks (ie. in the wrong place) (just like gnu diff/patch)
Date: Wed, 3 Jul 2024 11:55:56 -0400 [thread overview]
Message-ID: <082b01dacd61$81174a80$8345df80$@nexbridge.com> (raw)
In-Reply-To: <CAFjaU5sAVaNHZ0amPXJcbSvsnaijo+3X5Otg_Mntkx2GbikZMA@mail.gmail.com>
On Wednesday, July 3, 2024 11:25 AM, Emanuel Czirai wrote:
>This doesn't affect `git rebase` as it's way more robust than simply extracting the
>commits as patches and re-applying them. (I haven't looked into `git merge` though,
>but I doubt it's affected)
>
>It seems to me that no matter which algorithm `git diff` uses (of the 4), it generates
>the same hunk, because it's really the context length that matters (which by default
>is 3), which is the same one that gnu `diff` generates, and its application via `git
>apply` is the same as gnu `patch`.
>
>side note:
>`diffy` is a simple rust-written library that behaves (at version 0.4.0) the same as
>normal diff and patch apply(with regards to generated diff contents and patch
>application; except it doesn't set the original/modified filenames in the patch), and
>since my limited experience, I've found it simpler to modify it to make it so that it
>generates unambiguous hunks, as a proof of concept that it can be done this way,
>here:
>https://github.com/bmwill/diffy/issues/31 , whereby increasing the context
>length(lines) of the whole patch(though ideally only of the affected hunks) the
>initially ambiguous hunk(s) cannot be applied anymore in more than 1 place inside
>the original file, thus avoiding both the diff creation and the patch application from
>generating and applying ambiguous hunks.
>
>But forgetting about that for a moment, I'm gonna show you about `git diff` and `git
>apply` below:
>1. clone cargo's repo:
>cd /tmp
>git clone https://github.com/rust-lang/cargo.git
>cd cargo
>2. checkout tag 0.76.0, or branch origin/rust-1.75.0 git checkout 0.76.0 3. manually
>edit this file ./src/cargo/core/workspace.rs at line 1118 or manually go to function:
>pub fn emit_warnings(&self) -> CargoResult<()> { right before that function's end
>which looks like:
>Ok(())
>}
>so there at line 1118, insert above that Ok(()) something, doesn't matter what,
>doesn't have to make sense, like:
>if seen_any_warnings {
> //comment
> bail!("reasons");
>}
>save the file
>4. try to generate a diff from it:
>git diff > /tmp/foo.patch
>you get this:
>```diff
>diff --git a/src/cargo/core/workspace.rs b/src/cargo/core/workspace.rs index
>4667c8029..8f0a74473 100644
>--- a/src/cargo/core/workspace.rs
>+++ b/src/cargo/core/workspace.rs
>@@ -1115,6 +1115,10 @@ impl<'cfg> Workspace<'cfg> {
> }
> }
> }
>+ if seen_any_warnings {
>+ //comment
>+ bail!("reasons");
>+ }
> Ok(())
> }
>
>```
>Now this is an ambiguous patch/hunk because if you try to apply this patch on the
>same original file, cumulatively, it applies successfully in 3 different places, but we
>won't do that now.
>5. now let's discard it(we already have it saved in /tmp/foo.patch) and pretend that
>something changed in the original code:
>git checkout .
>git checkout 0.80.0
>6. reapply that patch on the new changes:
>git apply /tmp/foo.patch
>(this shows no errors)
>7. look at the diff, for no good reason, just to see that it's the same(kind of):
>git diff > /tmp/foo2.patch
>contents:
>```diff
>diff --git a/src/cargo/core/workspace.rs b/src/cargo/core/workspace.rs index
>55bfb7a10..92709f224 100644
>--- a/src/cargo/core/workspace.rs
>+++ b/src/cargo/core/workspace.rs
>@@ -1099,6 +1099,10 @@ impl<'gctx> Workspace<'gctx> {
> }
> }
> }
>+ if seen_any_warnings {
>+ //comment
>+ bail!("reasons");
>+ }
> Ok(())
> }
>
>```
>8. now look at the file, where was the patch applied? that's right, it's at the end of
>the wrong function:
>fn validate_manifest(&mut self) -> CargoResult<()> { vim
>src/cargo/core/workspace.rs +1040 jump at the end of it at line 1107, you see it's
>our patch there, applied in the wrong spot!
>
>Hopefully, depending on the change, this kind of patch which applied in the wrong
>place, will be caught at (rust)compile time (in my case, it was this) or worse, at
>(binary)runtime.
>
>With the aforementioned `diffy` patch, the generated diff would actually be with a
>context of 4, to make it unambiguous, so it would've been this:
>```diff
>--- original
>+++ modified
>@@ -1186,8 +1186,12 @@
> self.gctx.shell().warn(msg)?
> }
> }
> }
>+ if seen_any_warnings {
>+ //use anyhow::bail;
>+ bail!("Compilation failed due to cargo warnings! Manually
>+ done
>this(via cargo patch) so that things like the following (ie. dep key packages= and
>using rust pre 1.26.0 which ignores it, downloads squatted
>package) will be avoided in the future:
>https://github.com/rust-lang/rust/security/advisories/GHSA-phjm-8x66-qw4r");
>+ }
> Ok(())
> }
>
> pub fn emit_lints(&self, pkg: &Package, path: &Path) -> CargoResult<()> { ``` this
>hunk is now unambiguous because it cannot be applied in more than 1 place in the
>original file, furthermore patching a modified future file will fail(with that `diffy`
>patch, and ideally, with `git apply` if any changes are implemented to "fix" this issue)
>if any of the hunks can be independently(and during the full patch application too)
>applied in more than 1 place.
>
>I consider that I don't know enough to understand how `git diff`/`git apply` works
>internally (and similarly, gnu `diff`/`patch`) to actually change them and make them
>generate unambiguous hunks where only the hunks that would've been ambiguous
>have increased context size, instead of the whole patch have increased context size
>for all hunks(which is what I did for `diffy` too so far, in that proof of concept patch),
>therefore if a "fix" is deemed necessary(it may not be, as I might've missed
>something and I'm unaware of it, so a fix may be messing other things up, who
>knows?!) then I hope someone much more knowledgeable could implement
>it(maybe even for gnu diff/patch too), and while I don't think that a "please" would
>be enough, I'm still gonna say it: please do so, if so inclined.
>
>Thank you for your time and consideration.
You make good points, but Rust code should not be put into the main git code base as it will break many non-GNU platforms. Perhaps rewriting it is C to be compatible with the git code-base.
--Randall
next prev parent reply other threads:[~2024-07-03 15:56 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-03 15:24 `git diff`/`git apply` can generate/apply ambiguous hunks (ie. in the wrong place) (just like gnu diff/patch) Emanuel Czirai
2024-07-03 15:55 ` rsbecker [this message]
2024-07-03 16:22 ` Emanuel Attila Czirai
2024-07-03 21:01 ` Johannes Sixt
2024-07-04 7:15 ` Emanuel Czirai
2024-07-04 20:07 ` Elijah Newren
2024-07-04 21:38 ` Emanuel Czirai
2024-07-06 5:43 ` Junio C Hamano
[not found] <CAFjaU5sU04-aUyfHLhYkR7jSqB18He-aEt=B_C41DkMnvm2zFQ@mail.gmail.com>
2024-07-04 10:24 ` Emanuel Czirai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='082b01dacd61$81174a80$8345df80$@nexbridge.com' \
--to=rsbecker@nexbridge.com \
--cc=correabuscar+gitML@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.