* renormalize histroy with smudge/clean-filter
@ 2025-02-05 21:47 Josef Wolf
  2025-02-05 22:55 ` brian m. carlson
  2025-02-11 23:57 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
  0 siblings, 2 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-05 21:47 UTC (permalink / raw)
  To: git

Hello all,

I have set up clean/smudge filters to normalize files from an application to
reduce the pain when those files are tracked by git.

The clean/smudge filters work well on new commits, and the result of
smudge+smudge+clean is the same as the result of a simple clean, so the filter
should be fine IMHO.

But whenever I do any operation which introduces not-yet-normalized commits, I
keep getting errors.

So to get rid of those errors, I'd like to also renormalize the history:

  $ git rebase --root --strategy renormalize
  error: Your local changes to the following files would be overwritten by merge:
          foo/bar/baz
  Please commit your changes or stash them before you merge.
  Aborting
  $ git add foo/bar/baz
  $ git commit -m renormalize foo/bar/baz
  $ git rebase --continue
  git: 'merge-renormalize' is not a git command. See 'git --help'.
  error: could not apply abcdef... Foo Bar Baz
  [ ... ]

Huh? I never entered a command "merge-renormalize"

BTW: It does not make any difference whether I add "-c merge.renormalze=true"

What would be the proper way to renormalize history?

Any help?

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-05 21:47 renormalize histroy with smudge/clean-filter Josef Wolf
@ 2025-02-05 22:55 ` brian m. carlson
  2025-02-05 23:59 ` Josef Wolf
  2025-02-06 7:55 ` Elijah Newren
  2025-02-11 23:57 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
  1 sibling, 2 replies; 44+ messages in thread
From: brian m. carlson @ 2025-02-05 22:55 UTC (permalink / raw)
  To: Josef Wolf, git; +Cc: Elijah Newren

On 2025-02-05 at 21:47:26, Josef Wolf wrote:
> Hello all,
>
> I have set up clean/smudge filters to normalize files from an application to
> reduce the pain when those files are tracked by git.
>
> The clean/smudge filters work well on new commits, and the result of
> smudge+smudge+clean is the same as the result of a simple clean, so the filter
> should be fine IMHO.
>
> But whenever I do any operation which introduces not-yet-normalized commits, I
> keep getting errors.

Yes, this is known to occur. It notably happens with Git LFS, which uses
smudge and clean filters, and suffers from this same problem.
Renormalizing is indeed the right solution.

> So to get rid of those errors, I'd like to also renormalize the history:
>
>   $ git rebase --root --strategy renormalize
>   error: Your local changes to the following files would be overwritten by
>   merge:
>           foo/bar/baz
>   Please commit your changes or stash them before you merge.
>   Aborting
>   $ git add foo/bar/baz
>   $ git commit -m renormalize foo/bar/baz
>   $ git rebase --continue
>   git: 'merge-renormalize' is not a git command. See 'git --help'.
>   error: could not apply abcdef... Foo Bar Baz
>   [ ... ]
>
> Huh? I never entered a command "merge-renormalize"

When you use a command like `--strategy foo` with a custom strategy, Git
calls a binary called `git merge-foo` to implement that strategy. So
while you didn't explicitly invoke that, when you used the nonstandard
strategy `renormalize` (which, by the way, does not exist), Git invoked
it when you rebased, since rebases by default use merges under the hood.

> BTW: It does not make any difference whether I add "-c merge.renormalze=true"

That option also does not exist. Can you tell us where you found such a
recommendation? If we've been misleading people in our documentation,
I'd like to fix it.

> What would be the proper way to renormalize history?

The command that needs to be run is `git add --renormalize .`

I think what you probably want to do is something like this:
`git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit'`.

You might also be able to use `git filter-repo` to do this in a nicer
way, but I'm not aware of how to do that. I've CCed the maintainer,
however, in case he or anyone else can provide an answer.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

^ permalink raw reply [flat|nested] 44+ messages in thread
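To illustrate what `git add --renormalize .` does on its own, here is a minimal sketch in a throwaway repository. The filter name `space`, the `tr -s ' '` clean command and the file name `file` are assumptions made up for this example; they are not Josef's actual filter. The sketch commits a file before any filter exists, then configures the filter and re-runs the clean side over the tracked files.

```shell
#!/bin/sh
# Sketch only: filter name, filter command and file name are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

# Commit a file before any filter is configured, so the stored blob
# still contains runs of spaces.
printf 'a  b   c\n' >file
git add file
git commit -q -m 'unnormalized content'

# Configure a clean filter that squeezes runs of spaces; smudge is a no-op.
git config filter.space.clean "tr -s ' '"
git config filter.space.smudge cat
echo 'file filter=space' >.gitattributes
git add .gitattributes

# Re-run the clean filter over every tracked file and stage the result,
# even though the working tree file itself has not been modified.
git add --renormalize .

# Read the staged blob straight from the index (no smudge is applied here).
staged=$(git show :file)
echo "staged content: $staged"
```

After this, `git commit` records the normalized blob. Note that this only fixes the commit being created; older commits keep their unnormalized blobs, which is why the thread goes on to discuss rewriting history.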
* Re: renormalize histroy with smudge/clean-filter
  2025-02-05 22:55 ` brian m. carlson
@ 2025-02-05 23:59 ` Josef Wolf
  2025-02-06 0:29 ` brian m. carlson
  2025-02-06 10:13 ` Phillip Wood
  2025-02-06 7:55 ` Elijah Newren
  1 sibling, 2 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-05 23:59 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson, Elijah Newren

Thanks for your help, Brian!

On Wed, Feb 05, 2025 at 10:55:24PM +0000, brian m. carlson wrote:
> On 2025-02-05 at 21:47:26, Josef Wolf wrote:
> >
> > Huh? I never entered a command "merge-renormalize"
>
> When you use a command like `--strategy foo` with a custom strategy, Git
> calls a binary called `git merge-foo` to implement that strategy. So
> while you didn't explicitly invoke that, when you used the nonstandard
> strategy `renormalize` (which, by the way, does not exist), Git invoked
> it when you rebased, since rebases by default use merges under the hood.

Uh, you're right: renormalize is not a merge strategy on its own but an option
to the ort strategy.

  $ git rebase --root --strategy ort -X renormalize
  Updating files: 100% (372/372), done.
  error: Your local changes to the following files would be overwritten by merge:
          gt8/P-0113/G
          gt8/P-0113/P-0113-0_A-2
          gt8/P-0113/U
  Please commit your changes or stash them before you merge.
  Aborting

Those are some of the files which are subject to the filtering. I can go
further with:

  $ git add --renormalize . && git commit --amend --no-edit && git rebase --continue

So this approach works, although it needs some manual intervention.

> > BTW: It does not make any difference whether I add "-c merge.renormalze=true"
>
> That option also does not exist.

Well, this is described in git(1) manpage:

  [ ... ]
  SYNOPSIS
  git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
  [ ... ]                                        ^^^^^^^^^^^^^^^^^^^

> git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit'

Unfortunately, this runs the command on every commit and gives a warning when
a commit doesn't touch a filtered file:

  $ git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit'
  [ ... ]
  No changes
  You asked to amend the most recent commit, but doing so would make
  it empty. You can repeat your command with --allow-empty, or you can
  remove the commit entirely with "git reset HEAD^".

Is there a way to run the command only when rebase halts?

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-05 23:59 ` Josef Wolf
@ 2025-02-06 0:29 ` brian m. carlson
  2025-02-06 8:07 ` Elijah Newren
  2025-02-06 10:13 ` Phillip Wood
  1 sibling, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2025-02-06 0:29 UTC (permalink / raw)
  To: Josef Wolf, git, Elijah Newren

On 2025-02-05 at 23:59:31, Josef Wolf wrote:
> > > BTW: It does not make any difference whether I add "-c merge.renormalze=true"
> >
> > That option also does not exist.
>
> Well, this is described in git(1) manpage:
>
>   [ ... ]
>   SYNOPSIS
>   git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
>   [ ... ]                                        ^^^^^^^^^^^^^^^^^^^
>

The -c option does exist, and apparently the merge.renormalize option
does as well, so I apologize. It looks like it's only used in
merge-recursive and not merge-ort.c, so I'm not sure if it's still
effective. Elijah would know for certain, since he's the author of
merge-ort as well.

> > git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit'
>
> Unfortunately, this runs the command on every commit and gives a warning when
> a commit doesn't touch a filtered file:
>
>   $ git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit'
>   [ ... ]
>   No changes
>   You asked to amend the most recent commit, but doing so would make
>   it empty. You can repeat your command with --allow-empty, or you can
>   remove the commit entirely with "git reset HEAD^".

Yeah, that's a problem with a rebase in general here. You could try
`git rebase --root -X renormalize` here, which will use the
`renormalize` option, but you may run into the same problem.

I _think_ with the default merge strategy in rebase that it will keep
the empty commits, so your linear parts of history won't be changed,
although you'll probably drop the merge commits (and any conflict
resolutions) unless you use `--rebase-merges`. If this is a small
project, that may not be a problem, but I would recommend `git
filter-repo` here if that's an option because it will preserve your
history in a nicer way.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-06 0:29 ` brian m. carlson
@ 2025-02-06 8:07 ` Elijah Newren
  2025-02-06 13:40 ` Josef Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2025-02-06 8:07 UTC (permalink / raw)
  To: brian m. carlson, Josef Wolf, git, Elijah Newren

On Wed, Feb 5, 2025 at 4:29 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-02-05 at 23:59:31, Josef Wolf wrote:
> > > > BTW: It does not make any difference whether I add "-c merge.renormalze=true"
> > >
> > > That option also does not exist.
> >
> > Well, this is described in git(1) manpage:
> >
> >   [ ... ]
> >   SYNOPSIS
> >   git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
> >   [ ... ]                                        ^^^^^^^^^^^^^^^^^^^
> >
>
> The -c option does exist, and apparently the merge.renormalize option
> does as well, so I apologize. It looks like it's only used in
> merge-recursive and not merge-ort.c, so I'm not sure if it's still
> effective. Elijah would know for certain, since he's the author of
> merge-ort as well.

init_*merge_options() are defined in merge-recursive.c, and these call
merge_recursive_config() which is also in merge-recursive.c, but the
parsed options are shared between the two backends; you'll note that
merge-ort.h includes merge-recursive.h to get all these. And
merge-ort does have the necessary code to use and understand the
merge.renormalize option. (Of course, the fact that renormalization
*requires* an index made it a bit nasty, because merge-ort was written
to avoid the index as a data structure, so I had to do some ugly
shenanigans in order to support that option --
https://lore.kernel.org/git/CABPp-BE1TvFJ1eOa8Ci5JTMET+dzZh3m3NxppqqWPyEp1UeAVg@mail.gmail.com/.
But that's beside the point here.)

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-06 8:07 ` Elijah Newren
@ 2025-02-06 13:40 ` Josef Wolf
  2025-02-06 20:04 ` Josef Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Josef Wolf @ 2025-02-06 13:40 UTC (permalink / raw)
  To: git

Thanks for all the insights and explanations!

I have to admit that I have a hard time understanding why the merges (and even
conflicts) happen. I have a totally linear history here. Thus, I'd expect the
rebase to do something like (in pseudo-code)

  foreach $commit original-branch-commits
      git cherry-pick $commit

So I tried this and I see that cherry-pick seems to ignore the clean-filter
setting and commits the smudged version?

My expectation would have been that every operation would run the clean filter
before storing content in the repo. Why is not everything going into the repo
cleaned?

On Thu, Feb 06, 2025 at 12:07:00AM -0800, Elijah Newren wrote:
> On Wed, Feb 5, 2025 at 4:29 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > On 2025-02-05 at 23:59:31, Josef Wolf wrote:
> > > > > BTW: It does not make any difference whether I add "-c merge.renormalze=true"
> > > >
> > > > That option also does not exist.
> > >
> > > Well, this is described in git(1) manpage:
> > >
> > >   [ ... ]
> > >   SYNOPSIS
> > >   git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
> > >   [ ... ]                                        ^^^^^^^^^^^^^^^^^^^
> > >
> >
> > The -c option does exist, and apparently the merge.renormalize option
> > does as well, so I apologize. It looks like it's only used in
> > merge-recursive and not merge-ort.c, so I'm not sure if it's still
> > effective. Elijah would know for certain, since he's the author of
> > merge-ort as well.
>
> init_*merge_options() are defined in merge-recursive.c, and these call
> merge_recursive_config() which is also in merge-recursive.c, but the
> parsed options are shared between the two backends; you'll note that
> merge-ort.h includes merge-recursive.h to get all these. And
> merge-ort does have the necessary code to use and understand the
> merge.renormalize option. (Of course, the fact that renormalization
> *requires* an index made it a bit nasty, because merge-ort was written
> to avoid the index as a data structure, so I had to do some ugly
> shenanigans in order to support that option --
> https://lore.kernel.org/git/CABPp-BE1TvFJ1eOa8Ci5JTMET+dzZh3m3NxppqqWPyEp1UeAVg@mail.gmail.com/.
> But that's beside the point here.)
>
>

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-06 13:40 ` Josef Wolf
@ 2025-02-06 20:04 ` Josef Wolf
  2025-02-07 6:10 ` Chris Torek
  0 siblings, 1 reply; 44+ messages in thread
From: Josef Wolf @ 2025-02-06 20:04 UTC (permalink / raw)
  To: git

On Thu, Feb 06, 2025 at 02:40:06PM +0100, Josef Wolf wrote:

>   foreach $commit original-branch-commits
>       git cherry-pick $commit

I've done a lot of trial and error with this approach and have come to the
conclusion that cherry-pick totally misbehaves in the presence of
clean/smudge filters.

IMHO, content should never have any chance to bypass the clean filter on its
way to the repository. git-cherry-pick violates this and commits the smudged
content, leading to problems which can be resolved only by using

  git add --renormalize . && git commit --amend --no-edit

But even when git-cherry-pick starts on a normalized commit, it tries to apply
the picked commit without cleaning it first, so again conflicts will be thrown.

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-06 20:04 ` Josef Wolf
@ 2025-02-07 6:10 ` Chris Torek
  2025-02-07 10:45 ` Josef Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Torek @ 2025-02-07 6:10 UTC (permalink / raw)
  To: Josef Wolf, git

[First]

> On Thu, Feb 06, 2025 at 02:40:06PM +0100, Josef Wolf wrote:
>
> >   foreach $commit original-branch-commits
> >       git cherry-pick $commit

[then]

> On Thu, Feb 6, 2025 at 12:07 PM Josef Wolf <jw@raven.inka.de> wrote:
> I've done a lot of trial and error with this approach and have come to the
> conclusion that cherry-pick totally misbehaves in the presence of
> clean/smudge filters.

I suspect, actually, that the biggest problem here is that cherry-pick
defaults to working by using merge. Given that you want to create
a new linear set of "cleaned" commits, you don't want to use
`git cherry-pick` at all. Just restore the files from the original
commit, then add and commit.

Chris

^ permalink raw reply [flat|nested] 44+ messages in thread
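One possible reading of Chris's suggestion above ("restore the files from the original commit, then add and commit") is the following rough sketch for a strictly linear history. Everything specific in it is an assumption for illustration: the `space` filter with `tr -s ' '`, the file name, and the choice to put the attribute in `.git/info/attributes` so that it applies to every commit regardless of whether that commit carries a `.gitattributes`. It uses `git commit-tree`, so author names and dates of the original commits are not carried over; only the commit messages are.

```shell
#!/bin/sh
# Sketch only: filter, file name and attribute location are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

# Two commits whose blobs contain unnormalized runs of spaces.
printf 'a  b\n' >file
git add file
git commit -q -m one
printf 'c   d\n' >>file
git commit -q -a -m two
orig=$(git rev-parse HEAD)

# Install the filter afterwards; the attribute lives outside the work tree
# so it is in effect for every commit we check out.
git config filter.space.clean "tr -s ' '"
git config filter.space.smudge cat
mkdir -p .git/info
echo 'file filter=space' >.git/info/attributes

# Replay the history oldest-first without any merge machinery.
parent=
for c in $(git rev-list --reverse "$orig"); do
    git checkout -q -f --detach "$c"   # restore the files of the original commit
    git add --renormalize .            # run the clean filter over all tracked files
    tree=$(git write-tree)             # snapshot the cleaned index
    msg=$(git log -1 --format=%B "$c") # reuse the original commit message
    if [ -n "$parent" ]; then
        parent=$(git commit-tree "$tree" -p "$parent" -m "$msg")
    else
        parent=$(git commit-tree "$tree" -m "$msg")
    fi
done
new=$parent

result=$(git show "$new:file")
echo "rewritten tip: $new"
echo "$result"
```

The rewritten chain is left unreferenced in this sketch; pointing a branch at `$new` (for example with `git branch cleaned "$new"`) would be a separate, deliberate step.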
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 6:10 ` Chris Torek
@ 2025-02-07 10:45 ` Josef Wolf
  2025-02-07 11:06 ` Torsten Bögershausen
  ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-07 10:45 UTC (permalink / raw)
  To: git

On Thu, Feb 06, 2025 at 10:10:26PM -0800, Chris Torek wrote:
> [First]
>
> > On Thu, Feb 06, 2025 at 02:40:06PM +0100, Josef Wolf wrote:
> >
> > >   foreach $commit original-branch-commits
> > >       git cherry-pick $commit
>
> [then]
>
> > On Thu, Feb 6, 2025 at 12:07 PM Josef Wolf <jw@raven.inka.de> wrote:
> > I've done a lot of trial and error with this approach and have come to the
> > conclusion that cherry-pick totally misbehaves in the presence of
> > clean/smudge filters.
>
> I suspect, actually, that the biggest problem here is that cherry-pick
> defaults to working by using merge. Given that you want to create
> a new linear set of "cleaned" commits, you don't want to use
> `git cherry-pick` at all. Just restore the files from the original
> commit, then add and commit.

Ummm... That's far beyond my git expertise...

I completely fail to understand why git insists on operating on smudged files
in many situations.

IIUC, once clean/smudge are installed, all internal operations should be done
on clean files. So why do I need this "git add --renormalize ." at all, and why
(in the case of cherry-pick) is there not even any way to renormalize before
picking?

But maybe my understanding is too simplistic here...

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 10:45 ` Josef Wolf
@ 2025-02-07 11:06 ` Torsten Bögershausen
  2025-02-07 11:12 ` Chris Torek
  2025-02-07 15:39 ` Junio C Hamano
  2 siblings, 0 replies; 44+ messages in thread
From: Torsten Bögershausen @ 2025-02-07 11:06 UTC (permalink / raw)
  To: Josef Wolf, git

On Fri, Feb 07, 2025 at 11:45:10AM +0100, Josef Wolf wrote:
> On Thu, Feb 06, 2025 at 10:10:26PM -0800, Chris Torek wrote:
> > [First]
> >
> > > On Thu, Feb 06, 2025 at 02:40:06PM +0100, Josef Wolf wrote:
[]
> Ummm... That's far beyond my git expertise...
>
> I completely fail to understand why git insists on operating on smudged files
> in many situations.
>
> IIUC, once clean/smudge are installed, all internal operations should be done
> on clean files. So why do I need this "git add --renormalize ." at all, and why
> (in the case of cherry-pick) is there not even any way to renormalize before
> picking?
>
> But maybe my understanding is too simplistic here...

Now, well, there is a lot of history here: why things work, and what is
working.
The short version: the '--renormalize' functionality came into Git much
later than all the other commands, if I simplify things.

There have been different answers here in this thread,
and I will try to be helpful.

In general, this could work, fully untested:

  Take the first commit from your svn import.
  Check out a branch.
  Add a proper (!) .gitattributes file.
  Run 'git add --renormalize .'
  'git commit'

Now the fun starts. From what I understand, the following could work:

  foreach $commit original-branch-commits
      git merge -X renormalize $commit

However, I don't have such a repo to test things.

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 10:45 ` Josef Wolf
  2025-02-07 11:06 ` Torsten Bögershausen
@ 2025-02-07 11:12 ` Chris Torek
  2025-02-07 11:17 ` Chris Torek
  ` (2 more replies)
  2025-02-07 15:39 ` Junio C Hamano
  2 siblings, 3 replies; 44+ messages in thread
From: Chris Torek @ 2025-02-07 11:12 UTC (permalink / raw)
  To: Josef Wolf, git

On Fri, Feb 7, 2025 at 2:46 AM Josef Wolf <jw@raven.inka.de> wrote:
> I completely fail to understand why git insists on operating on smudged files
> in many situations.

It doesn't, really, and that's not the basis of the problem with rebase
using merge. However:

> IIUC, once clean/smudge are installed, all internal operations should be done
> on clean files. So why do I need this "git add --renormalize ." at all ...

To simplify (perhaps oversimplify, but I'll hope not), you're running afoul
of an optimization trick.

Git is famously *fast* (as compared to most of the systems that came
before or at the same time anyway). In the old days when I used
CVS and Subversion and the like, we'd run a commit or update, and
then go out for coffee or lunch or whatever, because we knew we
were not going to be able to do anything for another ten minutes or
perhaps even an hour or more.

Then Git came along and we'd run "git checkout" or "git commit" and
it would say it was done, often without even a noticeable pause, and
we'd wonder if it actually did anything at all.

Git gets this speed through a lot of clever tricks, and one of them
interacts poorly with clean and smudge filters *if you ever change
the filter*. If the filter stays constant, the tricks still work -- but what
you are doing (in effect anyway) here is to change to a new filter
with each commit.

Running with an explicit `--renormalize` turns off the efficiency trick.
This is documented (indirectly) where

> and (in the case of cherry-pick) there is not even any way to
> renormalize before picking.

That's mostly correct. The problem here is that while `git merge`
(both recursive and the new ort) has a renormalize option
internally, it's not exposed to cherry-pick. Oddly, checkout obeys
it. Perhaps builtin/revert.c should as well?

Chris

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 11:12 ` Chris Torek
@ 2025-02-07 11:17 ` Chris Torek
  2025-02-07 14:01 ` Elijah Newren
  2025-02-07 20:21 ` Josef Wolf
  2 siblings, 0 replies; 44+ messages in thread
From: Chris Torek @ 2025-02-07 11:17 UTC (permalink / raw)
  To: Josef Wolf, git

Oops, I left something unfinished:

On Fri, Feb 7, 2025 at 3:12 AM Chris Torek <chris.torek@gmail.com> wrote:
> Running with an explicit `--renormalize` turns off the efficiency trick.
> This is documented (indirectly) where

I forgot to fill in the "where" part. It seems to be in both the FAQ
and in `gitattributes`:

  Documentation/gitfaq.txt:You will need to run `git add --renormalize` to have this take effect. Note
  Documentation/gitattributes.txt:Note: Whenever the clean filter is changed, the repo should be renormalized:

Chris

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 11:12 ` Chris Torek
  2025-02-07 11:17 ` Chris Torek
@ 2025-02-07 14:01 ` Elijah Newren
  2025-02-07 20:32 ` Josef Wolf
  2025-02-07 20:21 ` Josef Wolf
  2 siblings, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2025-02-07 14:01 UTC (permalink / raw)
  To: Chris Torek; +Cc: Josef Wolf, git

On Fri, Feb 7, 2025 at 3:13 AM Chris Torek <chris.torek@gmail.com> wrote:
>
> > and (in the case of cherry-pick) there is not even any way to
> > renormalize before picking.
>
> That's mostly correct. The problem here is that while `git merge`
> (both recursive and the new ort) has a renormalize option
> internally, it's not exposed to cherry-pick.

Perhaps not as a config option, but it can be selected via -Xrenormalize.

However, whether it is exposed or used doesn't matter.
Renormalization in the merge machinery (this is the same for both the
recursive and ort backends) is something passed to xdiff[1], for doing
3-way content merges of individual files. If a
merge/rebase/cherry-pick/revert doesn't need to do a 3-way content
merge for some file, then no normalization will be done for it. This
could happen, for example, if one side of history being merged
modified a file and the other side of history being merged didn't
touch that file. And as a special case, that includes when one side
of history adds the file and the other side of history doesn't have
the file.

In particular, for the cherry-picks or rebasing that Josef is doing
going back to the root of history, that is simply doing merges against
a side of history that hasn't modified any of his files, so there
isn't going to be any automatic renormalization.

The rest of what you write about optimizations is spot on, though.
This isn't a bug in cherry-pick (or merge or rebase); renormalizing
all files proactively in the merge machinery whenever a merge or
cherry-pick is done would be orders of magnitude slower for any
decently sized repository; it's simply out of the question.

I think Phillip's suggestion elsewhere in this thread

  git rebase --root -x 'git add --renormalize . && { git diff --quiet --cached || git commit --amend --no-edit; }'

would be what Josef needs to run, ASSUMING the history Josef is
operating on is linear.

Hope that helps,
Elijah

[1] Okay, technically renormalization is also used to turn
modify/delete conflicts into simple deletes, when the only
modification was a normalization of the file contents. I don't think
that's relevant to Josef's case, though, so I elided it in the
explanation.

^ permalink raw reply [flat|nested] 44+ messages in thread
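The `{ git diff --quiet --cached || git commit --amend --no-edit; }` part of the command quoted above relies on the exit status of `git diff --quiet --cached`: zero when the index matches HEAD, non-zero when something is staged. The amend therefore only runs when renormalizing actually staged a change, which avoids the "would make it empty" message Josef reported. A minimal sketch of just that exit-status behaviour, in a throwaway repository with a made-up file name:

```shell
#!/bin/sh
# Sketch only: demonstrates the exit status used as a guard, nothing more.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo one >file
git add file
git commit -q -m one

# Index equals HEAD: exit status 0, so `|| git commit --amend` is skipped.
if git diff --quiet --cached; then
    before="nothing staged"
else
    before="changes staged"
fi

# Stage a change: exit status becomes non-zero, so the amend would run.
echo two >>file
git add file
if git diff --quiet --cached; then
    after="nothing staged"
else
    after="changes staged"
fi

echo "before: $before"
echo "after:  $after"
```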
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 14:01 ` Elijah Newren
@ 2025-02-07 20:32 ` Josef Wolf
  2025-02-08 0:23 ` Elijah Newren
  0 siblings, 1 reply; 44+ messages in thread
From: Josef Wolf @ 2025-02-07 20:32 UTC (permalink / raw)
  To: git

On Fri, Feb 07, 2025 at 06:01:43AM -0800, Elijah Newren wrote:
> On Fri, Feb 7, 2025 at 3:13 AM Chris Torek <chris.torek@gmail.com> wrote:

> renormalizing
> all files proactively in the merge machinery whenever a merge or
> cherry-pick is done would be orders of magnitude slower for any
> decently sized repository; it's simply out of the question.

Sounds like a trade-off of time against correctness?

See, I have been sitting here trying to get this repo into a sane state for
about two weeks now, and I keep getting conflicts and/or errors thrown at me
on every single attempt.

I'd be happy to drink a whole can of coffee while some hypothetical
"git renormalize-this-repo --force" is running.

-- 
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 20:32 ` Josef Wolf
@ 2025-02-08 0:23 ` Elijah Newren
  2025-02-08 11:14 ` Phillip Wood
  2025-02-08 20:57 ` Josef Wolf
  0 siblings, 2 replies; 44+ messages in thread
From: Elijah Newren @ 2025-02-08 0:23 UTC (permalink / raw)
  To: Josef Wolf, git

On Fri, Feb 7, 2025 at 12:34 PM Josef Wolf <jw@raven.inka.de> wrote:
>
> On Fri, Feb 07, 2025 at 06:01:43AM -0800, Elijah Newren wrote:
> > On Fri, Feb 7, 2025 at 3:13 AM Chris Torek <chris.torek@gmail.com> wrote:
>
> > renormalizing
> > all files proactively in the merge machinery whenever a merge or
> > cherry-pick is done would be orders of magnitude slower for any
> > decently sized repository; it's simply out of the question.
>
> Sounds like a trade-off of time against correctness?

I may have misunderstood what folks were saying in my reading &
skimming of this thread. I thought some folks were suggesting

  git rebase --root -X renormalize

as a way to renormalize the history, assuming you have linear history.
I was arguing against that; it's not going to work and isn't meant
to[1]. I also see I didn't look closely enough at Phillip's
suggestion, which was:

  git rebase --root -x 'git add --renormalize . && { git diff --quiet --cached || git commit --amend --no-edit; }'

which will work if you do a lot of manual work to resolve line ending
difference conflicts. Since the git add at each step will modify the
files on which the next commit is based, that causes the application
of the subsequent commit to conflict, and you probably will have
difficulty seeing those conflicts since they tend to just be line
ending differences. But, mixing that with Brian's suggestion, you
get:

  git rebase --root -X renormalize -x 'git add --renormalize . && { git diff --quiet --cached || git commit --amend --no-edit; }'

which should probably work if you have a linear history (though I've
never tried it myself; I've never actually used the renormalization
stuff beyond making sure that merge-ort matched merge-recursive). The
`git add --renormalize .` does the work of changing files, and the `-X
renormalize` to git allows it to handle merging subsequent commits
with the munged line ending differences as it does its work.

Were you trying one of these three? Or something else?

Elijah

[1] The renormalize option to the merge machinery ensures that new
blobs produced by the merge have normalized content, and avoid
conflicts when the only differences between files are normalization
ones. This option does not ensure that new trees only reference new
content nor that they only reference normalized content; _any_
pre-existing blobs in the repository are fair game for new trees to
reference. As per the manual: "renormalize...This runs a virtual
check-out and check-in of all three stages of a file when resolving a
three-way merge..." So, the existing behavior of the renormalize
option to rebase/cherry-pick/merge is correct. It may not be what you
want, but I don't think cherry-picking/rebasing/merging with the
renormalize option is the right tool for this job.

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-08 0:23 ` Elijah Newren
@ 2025-02-08 11:14 ` Phillip Wood
  2025-02-08 21:08 ` Josef Wolf
  2025-02-08 21:43 ` Elijah Newren
  2025-02-08 20:57 ` Josef Wolf
  1 sibling, 2 replies; 44+ messages in thread
From: Phillip Wood @ 2025-02-08 11:14 UTC (permalink / raw)
  To: Elijah Newren, Josef Wolf, git; +Cc: brian m . carlson

Hi Elijah and Josef

On 08/02/2025 00:23, Elijah Newren wrote:
> On Fri, Feb 7, 2025 at 12:34 PM Josef Wolf <jw@raven.inka.de> wrote:
>> On Fri, Feb 07, 2025 at 06:01:43AM -0800, Elijah Newren wrote:
>>> On Fri, Feb 7, 2025 at 3:13 AM Chris Torek <chris.torek@gmail.com> wrote:
>>
> I also see I didn't look closely enough at Phillip's
> suggestion, which was:
>
>   git rebase --root -x 'git add --renormalize . && { git diff --quiet
> --cached || git commit --amend --no-edit; }'
>
> which will work if you do a lot of manual work to resolve line ending
> difference conflicts. Since the git add at each step will modify the
> files on which the next commit is based, that causes the application
> of the subsequent commit to conflict,

Indeed, I'd missed that (like you I've not actually used any
smudge/clean filters)

> and you probably will have
> difficulty seeing those conflicts since they tend to just be line
> ending differences. But, mixing that with Brian's suggestion, you
> get:
>
>   git rebase --root -X renormalize -x 'git add --renormalize . && {
> git diff --quiet --cached || git commit --amend --no-edit; }'
>
> which should probably work if you have a linear history

I've tried that out with a small modification in the script below which
seems to work. The modification is to add "--attr-source=$(git rev-parse
HEAD)" between "git" and "rebase" so that git always has a
.gitattributes file to read when rebasing commits that were made before
that file was added.

I wonder if we should add something about renormalizing a repository to
the FAQ based on your footnote

> [1] The renormalize option to the merge machinery ensures that new
> blobs produced by the merge have normalized content, and avoid
> conflicts when the only differences between files are normalization
> ones. This option does not ensure that new trees only reference new
> content nor that they only reference normalized content; _any_
> pre-existing blobs in the repository are fair game for new trees to
> reference. As per the manual: "renormalize...This runs a virtual
> check-out and check-in of all three stages of a file when resolving a
> three-way merge..." So, the existing behavior of the renormalize
> option to rebase/cherry-pick/merge is correct. It may not be what you
> want, but I don't think cherry-picking/rebasing/merging with the
> renormalize option is the right tool for this job.
>

Best Wishes

Phillip

--- >8 ---
#!/bin/sh

set -e
d="$(mktemp -d)"
cd "$d"
git init
echo "The quick brown" >file
git add file
git commit -m line-1
echo "fox jumps over" >>file
git commit -a -m line-2
echo "the lazy dog" >>file
git commit -a -m line-3
echo "file filter=space" >.gitattributes
git config filter.space.clean "sed -e 's/  */ /g'"
git config filter.space.smudge cat
git add .gitattributes
git commit -a -m 'add .gitattributes'
git reset --hard HEAD
git --attr-source=$(git rev-parse HEAD) rebase --root -X renormalize \
    -x 'git add --renormalize . && { git diff --cached --quiet || git commit --amend --no-edit; }'
git log -p

^ permalink raw reply [flat|nested] 44+ messages in thread
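The `git log -p` at the end of the script above is an eyeball check. As a rough mechanical alternative, the following sketch counts how many commits still contain a blob that the clean filter would change. The filter command (`tr -s ' '`, wrapped in a hypothetical `clean` shell function), the file name and the two demo commits are assumptions for illustration; for a real repository the function would have to call the actual clean filter and the loop would have to cover every filtered path.

```shell
#!/bin/sh
# Sketch only: `clean` is a stand-in for whatever clean filter is configured.
set -e
repo=$(mktemp -d)
tmp=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

clean() { tr -s ' '; }   # hypothetical clean filter: squeeze runs of spaces

# One already-normalized commit and one unnormalized commit.
printf 'a b\n' >file
git add file
git commit -q -m normalized
printf 'a   b\n' >file
git commit -q -a -m unnormalized

# For each commit, compare the stored blob with its cleaned form.
dirty=0
for c in $(git rev-list HEAD); do
    git show "$c:file" >"$tmp/raw"
    clean <"$tmp/raw" >"$tmp/cleaned"
    if ! cmp -s "$tmp/raw" "$tmp/cleaned"; then
        echo "not normalized: $c"
        dirty=$((dirty + 1))
    fi
done
echo "commits with unnormalized content: $dirty"
```

A count of zero after a history rewrite would suggest that every stored version of the file is already in the form the clean filter produces.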
* Re: renormalize histroy with smudge/clean-filter
  2025-02-08 11:14 ` Phillip Wood
@ 2025-02-08 21:08   ` Josef Wolf
  2025-02-08 21:43   ` Elijah Newren
  1 sibling, 0 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-08 21:08 UTC (permalink / raw)
  To: git

On Sat, Feb 08, 2025 at 11:14:57AM +0000, Phillip Wood wrote:

> The modification is to add "--attr-source=$(git rev-parse HEAD)"

Uh! My expectation would have been that this is the default? Why on earth
would one want a changing filter setting during a rebase?
Can anybody outline a use-case for changing filter during operaion?

If I define a filter, I'd rather want it to be in effect on every commit of
every branch.

--
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter 2025-02-08 11:14 ` Phillip Wood 2025-02-08 21:08 ` Josef Wolf @ 2025-02-08 21:43 ` Elijah Newren 2025-02-08 23:26 ` Josef Wolf 1 sibling, 1 reply; 44+ messages in thread From: Elijah Newren @ 2025-02-08 21:43 UTC (permalink / raw) To: phillip.wood; +Cc: Josef Wolf, git, brian m . carlson Hi Phillip, On Sat, Feb 8, 2025 at 3:15 AM Phillip Wood <phillip.wood123@gmail.com> wrote: > > Hi Elijah and Josef > > On 08/02/2025 00:23, Elijah Newren wrote: > > On Fri, Feb 7, 2025 at 12:34 PM Josef Wolf <jw@raven.inka.de> wrote: > >> On Fri, Feb 07, 2025 at 06:01:43AM -0800, Elijah Newren wrote: > >>> On Fri, Feb 7, 2025 at 3:13 AM Chris Torek <chris.torek@gmail.com> wrote: > >> > > I also see I didn't look closely enough at Phillip's > > suggestion, which was: > > > > git rebase --root -x 'git add --renormalize . && { git diff --quiet > > --cached || git commit --amend --no-edit; }' > > > > which will work if you do a lot of manual work to resolve line ending > > difference conflicts. Since the git add at each step will modify the > > files on which the next commit is based, that causes the application > > of the subsequent commit to conflict, > > Indeed, I'd missed that (like you I've not actually used any > smudge/clean filters) > > > and you probably will have > > difficulty seeing those conflicts since they tend to just be line > > ending differences. But, mixing that with Brian's suggestion, you > > get: > > > > git rebase --root -X renormalize -x 'git add --renormalize . && { > > git diff --quiet --cached || git commit --amend --no-edit; }' > > > > which should probably work if you have a linear history > > I've tried that out with a small modification in the script below which > seems to work. The modification is to add "--attr-source=$(git rev-parse > HEAD)" between "git" and "rebase" so that git always has a > .gitattributes file to read when rebasing commits that were made before > that file was added. 
Ooh, nice catch. If folks had an appropriate .gitattributes file in place in older versions of history, they probably wouldn't have gotten into the mess. > I wonder if we should add something about > renormalizing a repository to the FAQ based on your footnote. and perhaps your helpful example? (although it does assume linear history) :-) > > [1] The renormalize option to the merge machinery ensures that new > > blobs produced by the merge have normalized content, and avoid > > conflicts when the only differences between files are normalization > > ones. This option does not ensure that new trees only reference new > > content nor that they only reference normalized content; _any_ > > pre-existing blobs in the repository are fair game for new trees to > > reference. As per the manual: "renormalize...This runs a virtual > > check-out and check-in of all three stages of a file when resolving a > > three-way merge..." So, the existing behavior of the renormalize > > option to rebase/cherry-pick/merge is correct. It may not be what you > > want, but I don't think cherry-picking/rebasing/merging with the > > renormalize option is the right tool for this job. > > > > Best Wishes > > Phillip > > --- >8 --- > #!/bin/sh > set -e > d="$(mktemp -d)" > cd "$d" > git init > echo "The quick brown" >file > git add file > git commit -m line-1 > echo "fox jumps over" >>file > git commit -a -m line-2 > echo "the lazy dog" >>file > git commit -a -m line-3 > echo "file filter=space" >.gitattributes > git config filter.space.clean "sed -e 's/ */ /g'" > git config filter.space.smudge cat > git add .gitattributes > git commit -a -m 'add .gitattributes' > git reset --hard HEAD > git --attr-source=$(git rev-parse HEAD) rebase --root -X renormalize \ > -x 'git add --renormalize . && { git diff --cached --quiet || git > commit --amend --no-edit; }' So, I'm slightly surprised here. 
Does the --attr-source specified to the outer git become an environment variable or something for the inner git-add invocation? How does the git add subprocess know about it? ...<does some searches ending with>... $ git grep -5 GIT_ATTR_SOURCE -- git.c git.c- } else if (!strcmp(cmd, "--attr-source")) { git.c- if (*argc < 2) { git.c- fprintf(stderr, _("no attribute source given for --attr-source\n" )); git.c- usage(git_usage_string); git.c- } git.c: setenv(GIT_ATTR_SOURCE_ENVIRONMENT, (*argv)[1], 1); git.c- if (envchanged) git.c- *envchanged = 1; git.c- (*argv)++; git.c- (*argc)--; git.c- } else if (skip_prefix(cmd, "--attr-source=", &cmd)) { git.c- set_git_attr_source(cmd); git.c: setenv(GIT_ATTR_SOURCE_ENVIRONMENT, cmd, 1); git.c- if (envchanged) git.c- *envchanged = 1; git.c- } else if (!strcmp(cmd, "--no-advice")) { git.c- setenv(GIT_ADVICE_ENVIRONMENT, "0", 1); git.c- if (envchanged) ahah, so it is passed via environment variable to the subprocess. Anyway, nice catch. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-08 21:43 ` Elijah Newren
@ 2025-02-08 23:26   ` Josef Wolf
  2025-02-09  2:33     ` D. Ben Knoble
  2025-02-09  7:21     ` Elijah Newren
  0 siblings, 2 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-08 23:26 UTC (permalink / raw)
  To: git

Hi Elijah,

On Sat, Feb 08, 2025 at 01:43:05PM -0800, Elijah Newren wrote:

> Ooh, nice catch. If folks had an appropriate .gitattributes file in
> place in older versions of history, they probably wouldn't have gotten
> into the mess.

Well, you can't assume that people get it right from the very start. An
important use case of git is fixing errors made in the past, right?

In my case, I had no choice. I HAD to commit those proprietary data files
as-is, because I had no clue how they are structured and how those hashes are
calculated. As time passed, I learned what I need to do to smudge+clean those
files. But by that time a whole bunch of commits had already been made.

Along the way, I had to modify the .gitattributes file in various ways.

The only variant of the .gitattributes file that works properly is the
newest one. And it is also the variant which will work for all the older
commits.

So no, I don't see why using any of the older variants of this .gitattributes
would make any sense.

> ahah, so it is passed via environment variable to the subprocess.

I find this confusing: the primary call should not need this parameter,
since it is invoked from HEAD anyway. Everything else gets it via env-vars.
I'd assume this variable will also be passed to the commands which are invoked
by the -x switch?

--
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter 2025-02-08 23:26 ` Josef Wolf @ 2025-02-09 2:33 ` D. Ben Knoble 2025-02-09 8:53 ` Josef Wolf 2025-02-09 7:21 ` Elijah Newren 1 sibling, 1 reply; 44+ messages in thread From: D. Ben Knoble @ 2025-02-09 2:33 UTC (permalink / raw) To: Josef Wolf, git On Sat, Feb 8, 2025 at 6:28 PM Josef Wolf <jw@raven.inka.de> wrote: > > Hi Elijah, > > On Sat, Feb 08, 2025 at 01:43:05PM -0800, Elijah Newren wrote: > > > Ooh, nice catch. If folks had an appropriate .gitattributes file in > > place in older versions of history, they probably wouldn't have gotten > > into the mess. > > Well, you can't assume that paople get it right from the very start. An > important use case of git is fixing errors made in the past, right? > > In my case, I had no choice. I HAD to commit those propritary data files > as-is, because I had no clue how they are structured and how those hashes are > calculated. As time passed, I learned what I need to do to smudge+clean those > files. But at that time a whole bunch of commits were already done. > > On this roadtrip, I had to modify those .gitattributes files in various ways. > > The only variant of those .gitattributes file which will work properly is the > newest one. And this is also the variant wich will work for all the olter > commits. > > So no, I don't see why using any of the older variants of this .gitattributes > would make any sense. > The original question said > Why on earth would one want a changing filter setting during a rebase? > Can anybody outline a use-case for changing filter during operaion? [sic] But I'll answer this one—general operations on older history can't use a newer gitattributes declaration without explicit instruction because they'd have to know from which future to pull. Remember Git can branch, so (even assuming we had a fast way to calculate this, which AIUI we don't) from a single commit there can be multiple valid future commits with different gitattributes. 
For the starting point of an operation that eventually invokes other operations, where the start clearly uses one gitattributes, it _might_ be reasonable to assume that would propagate down to the other operations. But when subsequent operations logically operate on older history, it also seems reasonable (and unsurprising) to "do what the repository intended at that specific commit." Git assumes the latter and provides a way for you to indicate the former. Perhaps it's worth an explainer somewhere? -- D. Ben Knoble ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-09  2:33 ` D. Ben Knoble
@ 2025-02-09  8:53   ` Josef Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-09 8:53 UTC (permalink / raw)
  To: git

On Sat, Feb 08, 2025 at 09:33:05PM -0500, D. Ben Knoble wrote:

> > So no, I don't see why using any of the older variants of this .gitattributes
> > would make any sense.
>
> The original question said
>
> > Why on earth would one want a changing filter setting during a rebase?
> > Can anybody outline a use-case for changing filter during operaion? [sic]
>
> But I'll answer this one: general operations on older history can't use
> a newer gitattributes declaration without explicit instruction because
> they'd have to know from which future to pull. Remember Git can
> branch, so (even assuming we had a fast way to calculate this, which
> AIUI we don't) from a single commit there can be multiple valid future
> commits with different gitattributes.

Yes, this is the way .gitattributes works. And to be honest, I always found it
strange that this setting travels along the history.

But there is also ~/.gitconfig, which has the drawback that it won't travel
with the repo.

IMHO, it would make more sense to have some sort of global storage which
travels along with the repo.

--
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter 2025-02-08 23:26 ` Josef Wolf 2025-02-09 2:33 ` D. Ben Knoble @ 2025-02-09 7:21 ` Elijah Newren 2025-02-09 8:57 ` Josef Wolf 1 sibling, 1 reply; 44+ messages in thread From: Elijah Newren @ 2025-02-09 7:21 UTC (permalink / raw) To: Josef Wolf, git On Sat, Feb 8, 2025 at 3:28 PM Josef Wolf <jw@raven.inka.de> wrote: > > Hi Elijah, > > On Sat, Feb 08, 2025 at 01:43:05PM -0800, Elijah Newren wrote: > > > Ooh, nice catch. If folks had an appropriate .gitattributes file in > > place in older versions of history, they probably wouldn't have gotten > > into the mess. > > Well, you can't assume that paople get it right from the very start. An > important use case of git is fixing errors made in the past, right? [...] Sorry if it sounded like that was passing judgement; that was not what I intended. I've been in a lot of messes too. I mean, I wrote git-filter-repo because of how many things there were to clean up. I get it, life is messy. Hindsight is 20/20. You can't let perfect be the enemy of the good. You can't prioritize "everything", you have to pick your battles. Iterative improvement, etc. > > ahah, so it is passed via environment variable to the subprocess. > > I find this to be confusing: the primary call should not need this parameter, > since it is invoked from HEAD anyway. No, the primary call I think would need the parameter too; it changes HEAD immediately when it starts rebasing, and continues changing it with each commit it rebases; since it's operating on older versions, by default it'd likely pick the .gitattributes from those older versions as it goes. > Everything else gets it via env-vars. > I'd assume this variable will also be passed to the commands which are invoked > by the -x switch? 
Yes, I was surprised Phillip's command with --attr-source on the outer-level git invocation worked until I discovered that the code indeed sets the environment variable (which subprocesses, like those created by the --exec/-x switch, will inherit). So, yes, the -x switch stuff seems to inherit that environment variable that the primary call sets in response to that parameter. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter 2025-02-09 7:21 ` Elijah Newren @ 2025-02-09 8:57 ` Josef Wolf 2025-02-10 17:51 ` D. Ben Knoble 0 siblings, 1 reply; 44+ messages in thread From: Josef Wolf @ 2025-02-09 8:57 UTC (permalink / raw) To: git On Sat, Feb 08, 2025 at 11:21:12PM -0800, Elijah Newren wrote: > On Sat, Feb 8, 2025 at 3:28 PM Josef Wolf <jw@raven.inka.de> wrote: > > > ahah, so it is passed via environment variable to the subprocess. > > > > I find this to be confusing: the primary call should not need this parameter, > > since it is invoked from HEAD anyway. > > No, the primary call I think would need the parameter too; it changes > HEAD immediately when it starts rebasing, and continues changing it > with each commit it rebases; since it's operating on older versions, > by default it'd likely pick the .gitattributes from those older > versions as it goes. OK. I see... > > Everything else gets it via env-vars. > > I'd assume this variable will also be passed to the commands which are invoked > > by the -x switch? > > Yes, I was surprised Phillip's command with --attr-source on the > outer-level git invocation worked until I discovered that the code > indeed sets the environment variable (which subprocesses, like those > created by the --exec/-x switch, will inherit). So, yes, the -x > switch stuff seems to inherit that environment variable that the > primary call sets in response to that parameter. Umm... OK... This means that specifying --attr-source to the commands for the -x switch is wrong, since they have a different HEAD? -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-09  8:57 ` Josef Wolf
@ 2025-02-10 17:51   ` D. Ben Knoble
  0 siblings, 0 replies; 44+ messages in thread
From: D. Ben Knoble @ 2025-02-10 17:51 UTC (permalink / raw)
  To: Josef Wolf, git

On Sun, Feb 9, 2025 at 3:58 AM Josef Wolf <jw@raven.inka.de> wrote:
>
> On Sat, Feb 08, 2025 at 11:21:12PM -0800, Elijah Newren wrote:
> > Yes, I was surprised Phillip's command with --attr-source on the
> > outer-level git invocation worked until I discovered that the code
> > indeed sets the environment variable (which subprocesses, like those
> > created by the --exec/-x switch, will inherit). So, yes, the -x
> > switch stuff seems to inherit that environment variable that the
> > primary call sets in response to that parameter.
>
> Umm... OK... This means that specifying --attr-source to the commands for the
> -x switch is wrong, since they have a different HEAD?

Not quite: the command actually did

> git --attr-source=$(git rev-parse HEAD) […]

So the subprocesses will see the attributes source as a full-length
commit hash, not "HEAD".

--
D. Ben Knoble

^ permalink raw reply [flat|nested] 44+ messages in thread
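The reason a hash ends up in there is ordinary shell behaviour: the $(...) substitution runs before git is started, so the value is fixed at that moment and does not follow HEAD afterwards. A minimal sketch in a throwaway repository (the variable names "pinned" and "current" are only for illustration):

```shell
#!/bin/sh
# Sketch: a command substitution captures HEAD once; later commits
# move HEAD but not the captured value.
set -e
d="$(mktemp -d)"
cd "$d"
git init -q
git config user.name Example
git config user.email example@example.invalid

git commit -q --allow-empty -m first
pinned="$(git rev-parse HEAD)"    # what the shell would hand to --attr-source

git commit -q --allow-empty -m second
current="$(git rev-parse HEAD)"   # HEAD has moved on in the meantime

echo "pinned:  $pinned"
echo "current: $current"
```

So even while a rebase rewrites HEAD commit by commit, an --attr-source given this way keeps naming the commit that was checked out when the command line was typed.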
* Re: renormalize histroy with smudge/clean-filter 2025-02-08 0:23 ` Elijah Newren 2025-02-08 11:14 ` Phillip Wood @ 2025-02-08 20:57 ` Josef Wolf 2025-02-08 21:56 ` Elijah Newren 2025-02-09 9:25 ` Josef Wolf 1 sibling, 2 replies; 44+ messages in thread From: Josef Wolf @ 2025-02-08 20:57 UTC (permalink / raw) To: git On Fri, Feb 07, 2025 at 04:23:45PM -0800, Elijah Newren wrote: > I may have misunderstood what folks were saying in my reading & > skimming of this thread. I thought some folks were suggesting > > git rebase --root -X renormalize > > as a way to renormalize the history, assuming you have linear history. Yes. And this did not work. Then there was Brian's suggenstion, so I tried: git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit' which won't work because not every commit touches a filtered file, so I also tried: git rebase --root -x 'git add --renormalize . && git status --quiet -uno | git commit --amend --no-edit' which also did not work. Looks like git-status always exits with success. Why? > I was arguing against that; it's not going to work and isn't meant > to[1]. I also see I didn't look closely enough at Phillip's > suggestion, which was: > > git rebase --root -x 'git add --renormalize . && { git diff --quiet > --cached || git commit --amend --no-edit; }' > > which will work if you do a lot of manual work to resolve line ending > difference conflicts. Since the git add at each step will modify the > files on which the next commit is based, that causes the application > of the subsequent commit to conflict, and you probably will have > difficulty seeing those conflicts since they tend to just be line > ending differences. This did not work also: generated LOTS of conflicts. Oh, have I mentioned that I am not only about line endings? Yes, I mentioned it in the very first mail. 
In addition to line endings, I am also about XML files from a proprietary application which reorders the XML-elements into a random order every time it ist run. So the clean-filter needs to sort the XML elements into some "canonical" order. > But, mixing that with Brian's suggestion, you get: > > git rebase --root -X renormalize -x 'git add --renormalize . && { git diff --quiet --cached || git commit --amend --no-edit; }' Yes, this finally works, IF git add --renormalize . && git commit --amend --no-edit is run before starting the rebase process. BTW: why won't git rebase --root -X renormalize \ -x 'git add --renormalize .' \ -x 'git diff --quiet --cached || git commit --amend --no-edit' work? > Were you trying one of these three? Or something else? Yes. And even more... Oh, the application I am talking about also tracks changes in those XML files in corresponding hash files. I added those hash files into .gitignore and re-create them in the smudge-filter. This works fine so far, but it also generates lots of conflicts during renormalization. So I created a helper for the -x parameter of the renormalize-process to also remove those hash files: #! /bin/sh -e find gt8/ETS/Projekte/* -maxdepth 1 \ -name "[BDGIUP].ets5hash" -o \ -name "P-*.ets5hash" \ -print0 \ | xargs -r0 git status --short -uno \ | sed -n "s/^...\(.*\.ets5hash\)$/\1/p" \ | xargs -r git rm -f git --attr-source=$(git rev-parse HEAD) diff --quiet --cached || \ git --attr-source=$(git rev-parse HEAD) commit --amend --no-edit git --attr-source=$(git rev-parse HEAD) add --renormalize . git --attr-source=$(git rev-parse HEAD) diff --quiet --cached || \ git --attr-source=$(git rev-parse HEAD) commit --amend --no-edit But no matter how I construt this, the renormalize keeps conflicting on these files. Whehn I do git rm -f gt8/ETS/Projekte/XXX/U.ets5hash git --attr-source=$(git rev-parse HEAD) commit --amend --no-edit git rebase --continue manually, it works fine. 
Why won't the git-rm work when called from git-rebase directly? > [1] The renormalize option to the merge machinery ensures that new > blobs produced by the merge have normalized content, and avoid > conflicts when the only differences between files are normalization > ones. This option does not ensure that new trees only reference new > content nor that they only reference normalized content; _any_ > pre-existing blobs in the repository are fair game for new trees to > reference. OK. But then, non-normalized content should go through the clean-filter before it is handed over to diff/merge when filtering is active. At least when --renormalize is in effect. Using smudged content for diff/merge operations is a sure recipe for failure. > As per the manual: "renormalize...This runs a virtual > check-out and check-in of all three stages of a file when resolving a > three-way merge..." So, the existing behavior of the renormalize > option to rebase/cherry-pick/merge is correct. A virtual check-out and check-in should result in smudge+clean. Running this on smudged content results in smudge+smudge+clean. Which by definition is equivalent to a simple clean. No conflicts shoud happen, then. So the _description_ looks correct. But where do the conflicts coming from? > It may not be what you want I don't see how the description matches actual behaviour -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 44+ messages in thread
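On the exit-code question raised earlier in this message: as far as I can tell, git status reports its findings as text and exits successfully whether or not anything is staged, whereas git diff --quiet implies --exit-code and returns 1 when there are differences. That is why the "git diff --quiet --cached || git commit --amend" form works as a guard. A small sketch in a scratch repository to compare the two:

```shell
#!/bin/sh
# Sketch: compare the exit status of "git status" and
# "git diff --cached --quiet" when something is staged.
set -e
d="$(mktemp -d)"
cd "$d"
git init -q
git config user.name Example
git config user.email example@example.invalid

echo one >file
git add file
git commit -q -m one

echo two >>file
git add file                      # now there is a staged change

status_rc=0
git status --short -uno >/dev/null || status_rc=$?
diff_rc=0
git diff --cached --quiet || diff_rc=$?

echo "git status exit code:               $status_rc"   # 0
echo "git diff --cached --quiet exit code: $diff_rc"    # 1
```

Piping git status into git commit has the additional problem that a pipeline does not make the second command conditional on the first at all.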
* Re: renormalize histroy with smudge/clean-filter 2025-02-08 20:57 ` Josef Wolf @ 2025-02-08 21:56 ` Elijah Newren 2025-02-09 9:25 ` Josef Wolf 1 sibling, 0 replies; 44+ messages in thread From: Elijah Newren @ 2025-02-08 21:56 UTC (permalink / raw) To: Josef Wolf, git On Sat, Feb 8, 2025 at 12:58 PM Josef Wolf <jw@raven.inka.de> wrote: > > On Fri, Feb 07, 2025 at 04:23:45PM -0800, Elijah Newren wrote: [...] > > [1] The renormalize option to the merge machinery ensures that new > > blobs produced by the merge have normalized content, and avoid > > conflicts when the only differences between files are normalization > > ones. This option does not ensure that new trees only reference new > > content nor that they only reference normalized content; _any_ > > pre-existing blobs in the repository are fair game for new trees to > > reference. > > OK. > > But then, non-normalized content should go through the clean-filter before it > is handed over to diff/merge when filtering is active. Not quite; if the diff/merge doesn't need to look at the content of the file to resolve the merge (i.e. the merge can simply use the file's already known hash as the resolution), then, since that content isn't read it shouldn't go through any filters. Whenever you merge two trees, only the files modified on both sides need to be inspected; the rest can be resolved without looking at their content. > > As per the manual: "renormalize...This runs a virtual > > check-out and check-in of all three stages of a file when resolving a > > three-way merge..." So, the existing behavior of the renormalize > > option to rebase/cherry-pick/merge is correct. > > A virtual check-out and check-in should result in smudge+clean. Running this > on smudged content results in smudge+smudge+clean. Which by definition is > equivalent to a simple clean. No conflicts shoud happen, then. > > So the _description_ looks correct. But where do the conflicts coming from? 
> > > It may not be what you want > > I don't see how the description matches actual behaviour The description says "This runs a virtual check-out and check-in of all three stages of a file when resolving a three-way merge..." So, when a file needs a three-way merge to be resolved, then the virtual check-out and check-in is done. When no three-way merge is needed for a file, no virtual check-out and check-in is done. Perhaps the documentation would be clearer if it read: renormalize This runs a virtual check-out and check-in of all three stages of any file which needs a three-way merge. ? ^ permalink raw reply [flat|nested] 44+ messages in thread
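To see the "only files that need a three-way merge" behaviour in isolation, here is a sketch that uses CRLF normalization via "text=auto" as a stand-in for a custom clean filter; that substitution is my assumption, chosen because it needs no filter program. The file crlf.txt is committed un-normalized, neither side of the merge touches it afterwards, and the merge is run with -X renormalize.

```shell
#!/bin/sh
# Sketch: -X renormalize leaves a blob alone when the merge can resolve
# the path without looking at its content.
set -e
d="$(mktemp -d)"
cd "$d"
git init -q
git config user.name Example
git config user.email example@example.invalid
git config core.autocrlf false

printf 'one\r\ntwo\r\n' >crlf.txt     # committed with CRLF, not normalized
echo base >other.txt
git add crlf.txt other.txt
git commit -q -m base
base_branch="$(git symbolic-ref --short HEAD)"

git checkout -q -b side
echo '* text=auto' >.gitattributes    # normalization rule arrives later
git add .gitattributes
git commit -q -m 'add attributes'

git checkout -q "$base_branch"
echo change >>other.txt
git commit -q -a -m 'change other'

git merge -q -X renormalize -m 'merge side' side
eol_info="$(git ls-files --eol crlf.txt)"
echo "$eol_info"
```

The "i/" column of the output describes the blob in the index; if it still reports crlf after the merge, the old blob was reused as-is, which is the situation that later produces surprises once a commit does touch the file.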
* Re: renormalize histroy with smudge/clean-filter
  2025-02-08 20:57 ` Josef Wolf
  2025-02-08 21:56   ` Elijah Newren
@ 2025-02-09  9:25   ` Josef Wolf
  2025-02-09 11:14     ` Torsten Bögershausen
  1 sibling, 1 reply; 44+ messages in thread
From: Josef Wolf @ 2025-02-09 9:25 UTC (permalink / raw)
  To: git

On Sat, Feb 08, 2025 at 09:57:09PM +0100, Josef Wolf wrote:

I just stumbled over another weirdness:

> Oh, have I mentioned that I am not only about line endings? Yes, I mentioned
> it in the very first mail. In addition to line endings, I am also about XML
> files from a proprietary application which reorders the XML-elements into a
> random order every time it ist run. So the clean-filter needs to sort the
> XML elements into some "canonical" order.

This application stores the bulk of the data as text files and XML files with
CRLF. But there are also some binary files. So I set gitattributes like this:

  # Catch bulk as text=crlf, rely on git to detect binary
  */*           text=auto eol=crlf
  #
  # those are known to be text=crlf
  */B           text eol=crlf
  */P-*         text eol=crlf
  #
  # smudge-clean filter
  */B           filter=etsfile
  */P-*         filter=etsfile
  #
  # files I don't want to touch (mostly binaries)
  */*.dll       -filter -text
  */*.ver       -filter -text
  */*.lang      -filter -text
  */*.store     -filter -text
  */*.ets5hash  -filter -text

But "git ls-files --eol" gives me this:

  i/lf w/lf attr/text eol=crlf gt8/ETS/Projekte/P-0113/B

Why is git ignoring my explicit CRLF setting?

This is on Linux and on Windows+MSYS2. I don't have $GIT_DIR/info/attributes,
and ~/.gitconfig also doesn't specify any line-ending settings.

--
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter 2025-02-09 9:25 ` Josef Wolf @ 2025-02-09 11:14 ` Torsten Bögershausen 2025-02-09 15:09 ` Josef Wolf 0 siblings, 1 reply; 44+ messages in thread From: Torsten Bögershausen @ 2025-02-09 11:14 UTC (permalink / raw) To: Josef Wolf, git On Sun, Feb 09, 2025 at 10:25:14AM +0100, Josef Wolf wrote: > On Sat, Feb 08, 2025 at 09:57:09PM +0100, Josef Wolf wrote: > > I just stumbled over another wirdeness: > > > Oh, have I mentioned that I am not only about line endings? Yes, I mentioned > > it in the very first mail. In addition to line endings, I am also about XML > > files from a proprietary application which reorders the XML-elements into a > > random order every time it ist run. So the clean-filter needs to sort the > > XML elements into some "canonical" order. > > This application stores the bulk of the data as text files and XML files with > CRLF. But there are also some binary files. So I set gitattributes like this: > > # Catch bulk as text=crlf, rely on git to detect binary > */* text=auto eol=crlf This looks a little bit strange to me. What happens if you replace "*/*" with "*" like this. * text=auto eol=crlf > # > # those are known to be text=crlf > */B text eol=crlf > */P-* text eol=crlf Same here. What is B ? Is it a directory ? > # > # smudge-clean filter > */B filter=etsfile > */P-* filter=etsfile > # > # files I dont't want to touch (mostly binaries) > */*.dll -filter -text > */*.ver -filter -text > */*.lang -filter -text > */*.store -filter -text > */*.ets5hash -filter -text *.dll -filter -text (and the same for everything else) > > But "git ls-files --eol" gives me this: > > i/lf w/lf attr/text eol=crlf gt8/ETS/Projekte/P-0113/B > > Why is git ignoring my explicit CRLF setting? > > This is on linux and on Windows+MSYS2. 
I don't have $GIT_DIR/info/attributes > and ~/.gitconfig also doesn't specify any line ending things > > -- > Josef Wolf > jw@raven.inka.de > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter 2025-02-09 11:14 ` Torsten Bögershausen @ 2025-02-09 15:09 ` Josef Wolf 2025-02-09 17:54 ` Josef Wolf 0 siblings, 1 reply; 44+ messages in thread From: Josef Wolf @ 2025-02-09 15:09 UTC (permalink / raw) To: git Hello Torsten, On Sun, Feb 09, 2025 at 12:14:06PM +0100, Torsten Bögershausen wrote: > On Sun, Feb 09, 2025 at 10:25:14AM +0100, Josef Wolf wrote: > > This application stores the bulk of the data as text files and XML files with > > CRLF. But there are also some binary files. So I set gitattributes like this: > > > > # Catch bulk as text=crlf, rely on git to detect binary > > */* text=auto eol=crlf > > > This looks a little bit strange to me. This should match all files in directories one level deeper than the directory where .gitattributes live: If there is a separator at the beginning or middle (or both) of the pattern, then the pattern is relative to the directory level of the particular .gitignore file itself. > What happens if you replace "*/*" with "*" like this. > * text=auto eol=crlf Same result, but when I commit .gitattributes, I get a warning that git will do lf->crl conversion. But even after commit, no conversion is done and git-ls-files still shows: i/lf w/lf attr/text=auto eol=crlf gt8/ETS/Projekte/.gitignore Only after removal followed by "git reset --hard", I get: i/lf w/lfcr attr/text=auto eol=crlf gt8/ETS/Projekte/.gitignore > > # > > # those are known to be text=crlf > > */B text eol=crlf > > */P-* text eol=crlf > Same here. What is B ? Is it a directory ? No. It is one of the XML files I want to smudge+clean -- Josef Wolf jw@raven.inka.de ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-09 15:09 ` Josef Wolf
@ 2025-02-09 17:54   ` Josef Wolf
  2025-02-09 18:01     ` Josef Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Josef Wolf @ 2025-02-09 17:54 UTC (permalink / raw)
  To: git

Uh! It starts getting really weird. After one more change to .gitattributes,
one of the files marked as binary checks out as an EMPTY file and I can't find
any git command to fix the situation:

  $ cat .gitattributes
  # Most files in ETS ProjectStore are XML with CRLF
  #
  *               text=auto eol=crlf
  .gitignore      text
  .gitattributes  text

  # Binary files
  #
  *.dat           -filter -text
  *.dll           -filter -text
  *.ver           -filter -text
  *.lang          -filter -text
  *.store         -filter -text    # <--- this is the problematic file
  *.ets5hash      -filter -text

  # Smudge/clean filter
  #
  */B             filter=etsfile
  */D             filter=etsfile
  */G             filter=etsfile
  */I             filter=etsfile
  */P             filter=etsfile
  */U             filter=etsfile
  */P-*           filter=etsfile

  $ git diff
  diff --git a/gt8/ETS/Projekte/P-0113/P-0113.store b/gt8/ETS/Projekte/P-0113/P-0113.store
  index c33a5239..e69de29b 100755
  --- a/gt8/ETS/Projekte/P-0113/P-0113.store
  +++ b/gt8/ETS/Projekte/P-0113/P-0113.store
  @@ -1 +0,0 @@
  -4TamRjepVNV8F+bC4nBcBwXIymvb2IQdu0qEuMSB0o0=
  \ No newline at end of file

  $ git reset --hard
  HEAD is now at 6fba03d9 Fix .gitattributes again

  $ git diff
  diff --git a/gt8/ETS/Projekte/P-0113/P-0113.store b/gt8/ETS/Projekte/P-0113/P-0113.store
  index c33a5239..e69de29b 100755
  --- a/gt8/ETS/Projekte/P-0113/P-0113.store
  +++ b/gt8/ETS/Projekte/P-0113/P-0113.store
  @@ -1 +0,0 @@
  -4TamRjepVNV8F+bC4nBcBwXIymvb2IQdu0qEuMSB0o0=
  \ No newline at end of file

  $ rm -rf P-0113/ ; git checkout P-0113/
  Updated 382 paths from the index

  $ git diff
  diff --git a/gt8/ETS/Projekte/P-0113/P-0113.store b/gt8/ETS/Projekte/P-0113/P-0113.store
  index c33a5239..e69de29b 100755
  --- a/gt8/ETS/Projekte/P-0113/P-0113.store
  +++ b/gt8/ETS/Projekte/P-0113/P-0113.store
  @@ -1 +0,0 @@
  -4TamRjepVNV8F+bC4nBcBwXIymvb2IQdu0qEuMSB0o0=
  \ No newline at end of file

  $ git ls-files --eol |grep P-0113.store
  i/none  w/none  attr/-text                P-0113/P-0113.store
  i/lf    w/crlf  attr/text=auto eol=crlf   P-0113/storeVersion
  $

--
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: renormalize histroy with smudge/clean-filter
  2025-02-09 17:54 ` Josef Wolf
@ 2025-02-09 18:01   ` Josef Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-09 18:01 UTC (permalink / raw)
  To: git

On Sun, Feb 09, 2025 at 06:54:50PM +0100, Josef Wolf wrote:

Oops. That's probably an ordering problem:

> *.store -filter -text # <--- this is the problematic file
> [ ... ]
> */P-* filter=etsfile

The later line overrides the earlier one, so P-0113.store gets the etsfile
filter after all. Please ignore my last mail.

--
Josef Wolf
jw@raven.inka.de

^ permalink raw reply [flat|nested] 44+ messages in thread
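The ordering effect can be confirmed with git check-attr, which reports what git concludes for a path without checking anything out. A minimal sketch using the two patterns quoted above in a scratch repository; the filter "etsfile" does not need to be configured for the lookup to work.

```shell
#!/bin/sh
# Sketch: when two patterns match the same path, the later line in
# .gitattributes wins for each attribute it mentions.
set -e
d="$(mktemp -d)"
cd "$d"
git init -q
mkdir -p P-0113
: >P-0113/P-0113.store

# Order as in the problematic file: "*/P-*" comes last and wins.
printf '%s\n' '*.store -filter -text' '*/P-* filter=etsfile' >.gitattributes
filter_first="$(git check-attr filter -- P-0113/P-0113.store)"

# Swapped order: the "-filter" line comes last and wins.
printf '%s\n' '*/P-* filter=etsfile' '*.store -filter -text' >.gitattributes
filter_second="$(git check-attr filter -- P-0113/P-0113.store)"

echo "$filter_first"     # P-0113/P-0113.store: filter: etsfile
echo "$filter_second"    # P-0113/P-0113.store: filter: unset
```

Putting the broad filter patterns first and the "-filter -text" exceptions after them keeps the binary files out of the filter.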
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 11:12 ` Chris Torek
  2025-02-07 11:17 ` Chris Torek
  2025-02-07 14:01 ` Elijah Newren
@ 2025-02-07 20:21 ` Josef Wolf
  2 siblings, 0 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-07 20:21 UTC (permalink / raw)
  To: git

On Fri, Feb 07, 2025 at 03:12:44AM -0800, Chris Torek wrote:
> On Fri, Feb 7, 2025 at 2:46 AM Josef Wolf <jw@raven.inka.de> wrote:
> Git is famously *fast* (as compared to most of the systems that came
> before or at the same time anyway). In the old days when I used CVS
> and Subversion and the like, we'd run a commit or update, and then go
> out for coffee or lunch or whatever, because we knew we were not
> going to be able to do anything for another ten minutes or perhaps
> even an hour or more. Then Git came along and we'd run "git checkout"
> or "git commit" and it would say it was done, often without even a
> noticeable pause, and we'd wonder if it actually did anything at all.

Well, I know the days of CVS. I even know the days of RCS. And yeah, back in
those days you used to cross your fingers, hoping that all would go well
while you were drinking the coffee.

I think there is more to git than speed.

-- 
Josef Wolf
jw@raven.inka.de
* Re: renormalize histroy with smudge/clean-filter
  2025-02-07 10:45 ` Josef Wolf
  2025-02-07 11:06 ` Torsten Bögershausen
  2025-02-07 11:12 ` Chris Torek
@ 2025-02-07 15:39 ` Junio C Hamano
  2 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2025-02-07 15:39 UTC (permalink / raw)
  To: Josef Wolf; +Cc: git

Josef Wolf <jw@raven.inka.de> writes:

> I completely fail to understand why git insists on operating on smudged
> files in many situations.
>
> IIUC, once clean/smudge are installed, all internal operations should be
> done on clean files. So why do I need this "git add --renormalize ." at
> all and (in the case of cherry-pick) there is not even any way to
> renormalize before picking.
>
> But maybe my understanding is too simplistic here...

Nah, I suspect that the reason is much simpler. Many tools in the Git
toolset (like cherry-pick) were written long before clean-smudge got
popular, and they were written by those who did not need clean-smudge.
Those capable of updating them still have not felt the need for
clean-smudge for themselves.

Motivate them and we may see responses ;-)
* Re: renormalize histroy with smudge/clean-filter
  2025-02-05 23:59 ` Josef Wolf
  2025-02-06  0:29 ` brian m. carlson
@ 2025-02-06 10:13 ` Phillip Wood
  1 sibling, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2025-02-06 10:13 UTC (permalink / raw)
  To: Josef Wolf, git, brian m. carlson, Elijah Newren

Hi Josef

On 05/02/2025 23:59, Josef Wolf wrote:
>
>> git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit'
>
> Unfortunately, this runs the command on every commit and gives a warning
> when a commit doesn't touch a filtered file:
>
>    $ git rebase --root -x 'git add --renormalize . && git commit --amend --no-edit'
>    [ ... ]
>    No changes
>    You asked to amend the most recent commit, but doing so would make
>    it empty. You can repeat your command with --allow-empty, or you can
>    remove the commit entirely with "git reset HEAD^".
>
> Is there a way to run the command only when rebase halts?

You could try using "git diff --cached --quiet" to avoid running
"git commit" if there are no changes. It exits with a non-zero status only
when the index differs from HEAD, so the amend runs only in that case:

  git rebase --root -x 'git add --renormalize . && { git diff --quiet --cached || git commit --amend --no-edit; }'

Best Wishes

Phillip
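The guard suggested in this message relies on the exit status of `git diff --cached --quiet`. The following is a small sketch of that behaviour, assuming a throwaway repository created with `mktemp`; the file name and identity settings are placeholders and not part of the original mail.

```shell
#!/bin/sh
# Sketch: "git diff --cached --quiet" exits 0 when the index matches HEAD
# and 1 when something is staged, which is what the "||" guard depends on.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"

echo one > file
git add file
git commit -q -m "initial"

# Nothing staged: status 0, so "|| git commit --amend" would be skipped.
clean_status=0
git diff --cached --quiet || clean_status=$?

# Something staged: status 1, so the amend would run.
echo two > file
git add file
staged_status=0
git diff --cached --quiet || staged_status=$?

echo "clean=$clean_status staged=$staged_status"
```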
* Re: renormalize histroy with smudge/clean-filter
  2025-02-05 22:55 ` brian m. carlson
  2025-02-05 23:59 ` Josef Wolf
@ 2025-02-06  7:55 ` Elijah Newren
  2025-02-06 19:00 ` Junio C Hamano
  1 sibling, 1 reply; 44+ messages in thread
From: Elijah Newren @ 2025-02-06 7:55 UTC (permalink / raw)
  To: brian m. carlson, Josef Wolf, git, Elijah Newren

On Wed, Feb 5, 2025 at 2:55 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-02-05 at 21:47:26, Josef Wolf wrote:
> > Hello all,
> >
> > I have set up clean/smudge filters to normalize files from an
> > application to reduce the pain when those files are tracked by git.
> >
> > The clean/smudge filters work well on new commits and the result of
> > smudge+smudge+clean is the same as the result of a simple clean, so the
> > filter should be fine IMHO.
> >
> > But whenever I do any operations which introduce not yet normalized
> > commits, I keep getting errors.
>
> Yes, this is known to occur. It notably happens with Git LFS, which
> uses smudge and clean filters, and suffers from this same problem.
> Renormalizing is indeed the right solution.
>
> > So to get rid of those errors, I'd like to also renormalize the history:
> >
> >   $ git rebase --root --strategy renormalize
> >   error: Your local changes to the following files would be overwritten by
> >   merge:
> >           foo/bar/baz
> >   Please commit your changes or stash them before you merge.
> >   Aborting
> >   $ git add foo/bar/baz
> >   $ git commit -m renormalize foo/bar/baz
> >   $ git rebase --continue
> >   git: 'merge-renormalize' is not a git command. See 'git --help'.
> >   error: could not apply abcdef... Foo Bar Baz
> >   [ ... ]
> >
> > Huh? I never entered a command "merge-renormalize"
>
> When you use a command like `--strategy foo` with a custom strategy, Git
> calls a binary called `git merge-foo` to implement that strategy. So
> while you didn't explicitly invoke that, when you used the nonstandard
> strategy `renormalize` (which, by the way, does not exist), Git invoked
> it when you rebased, since rebases by default use merges under the hood.
>
> > BTW: It does not make any difference whether I add "-c merge.renormalze=true"
>
> That option also does not exist. Can you tell us where you found such a
> recommendation? If we've been misleading people in our documentation,
> I'd like to fix that.
>
> > What would be the proper way to renormalize history?
>
> The command that needs to be run is `git add --renormalize .` I think
> what you probably want to do is something like this: `git rebase --root -x
> 'git add --renormalize . && git commit --amend --no-edit'`.
>
> You might also be able to use `git filter-repo` to do this in a nicer
> way, but I'm not aware of how to do that. I've CCed the maintainer,
> however, in case he or anyone else can provide an answer.

`git add --renormalize .` requires a full checkout and an index.
filter-repo was written to not require checkouts or an index; it
should be able to operate in a bare repository as well. So, these
simply don't go that well together. If we had a way to ask git "how
would renormalization modify this buffer if it were at this path" we
might be able to provide something (though that might require having a
whole bunch of .gitattributes contents available, which might also
make it tricky).

Folks have requested it
(https://github.com/newren/git-filter-repo/issues/375), and the final
commenter provided a workaround that might be good enough for you, but
I kind of think we need a way to ask git "how would renormalization
modify this buffer if it were at this path" short of creating a full
index and checkout.
* Re: renormalize histroy with smudge/clean-filter
  2025-02-06  7:55 ` Elijah Newren
@ 2025-02-06 19:00 ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2025-02-06 19:00 UTC (permalink / raw)
  To: Elijah Newren; +Cc: brian m. carlson, Josef Wolf, git

Elijah Newren <newren@gmail.com> writes:

> I kind of think we need a way to ask git "how would renormalization
> modify this buffer if it were at this path" short of creating a full
> index and checkout.

That makes it sound as if you are asking for "diff/patch" between
pre- and post-renormalization operation, but wouldn't the question
be more like "pretend this buffer content were at this path in a
checkout of this tree-ish. Now compute what 'git add --renormalize'
would give us for that path".

What would it take?

 - An equivalent of the in-core index (but you need to specify which
   tree-ish it should be taken from) so that you can learn what
   attributes are attached to the path in question. You may want to
   grab `filter`, `ident`, `working-tree-encoding`, etc. out of the
   attribute subsystem.

 - Access to the "config" data, to learn what exact commands to spawn
   to filter the buffer, and what encoding and line terminating
   conventions are used for the given path. You may want to grab values
   of "filter.<name>.{clean,smudge}", "core.eol", etc., for example.

 - A sandbox to safely run these external commands needed for
   smudge/clean filters.

It does not sound entirely trivial, but it does not look too much
like rocket science, either.
* Re: renormalize histroy with smudge/clean-filter, again
  2025-02-05 21:47 renormalize histroy with smudge/clean-filter Josef Wolf
  2025-02-05 22:55 ` brian m. carlson
@ 2025-02-11 23:57 ` Josef Wolf
  2025-02-12  6:12 ` Torsten Bögershausen
  ` (3 more replies)
  1 sibling, 4 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-11 23:57 UTC (permalink / raw)
  To: git

Still struggling with my filter problem.

Here is what I do:

- Set up a clean filter which enforces CRLF (yes, for this specific use
  case I want CRLF even on linux)

- Smudge filter does not modify the file at all

- Set up git to fail when the filter fails, so I can double-check that the
  filter is actually running:

    $ grep -A3 filter..etsfile ~/.gitconfig
    [filter "etsfile"]
            required = true
            clean = ets-utils -c
            smudge = ets-utils -s %f

- Specify file as non-text and install the filter:

    $ grep etsfile .gitattributes
    */P -text filter=etsfile
    $ git commit .gitattributes

- Check that git gets attributes as I want them:

    $ git --attr-source=$(git rev-parse HEAD) check-attr -a P-0113/P
    P-0113/P: text: unset
    P-0113/P: filter: etsfile
    $ git ls-files --eol P-0113/P
    i/lf    w/      attr/-text              P-0113/P

- Create helper for renormalization

    $ cat renormalization-helper
    #! /bin/sh -e
    git add --renormalize .
    git diff --quiet --cached || \
        git commit --amend --no-edit

- Run the renormalization for the linear history:

    $ git --attr-source=$(git rev-parse HEAD) \
          rebase --root -X renormalize \
          -x $(dirname $0)/renormalize-helper

So at this point, I'd expect the file to have CRLF line endings. But it
doesn't, so I do:

    $ rm -rf P-0113
      git checkout --attr-source=$(git rev-parse HEAD) P-0113

Still no CRLF, so I look at what is stored by git:

    $ git --attr-source=$(git rev-parse HEAD) show 873a9b:P-0113/P |less -U

Again, no CRLF.

So I check all revisions in the history. Result: no revision has CRLF.

So the renormalization process does not work for me at all.

Any ideas?

-- 
Josef Wolf
jw@raven.inka.de
* Re: renormalize histroy with smudge/clean-filter, again
  2025-02-11 23:57 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
@ 2025-02-12  6:12 ` Torsten Bögershausen
  2025-02-12  8:18 ` Josef Wolf
  2025-02-13 11:36 ` Collisions while cloning (was: Re: renormalize histroy with smudge/clean-filter, again) Josef Wolf
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Torsten Bögershausen @ 2025-02-12 6:12 UTC (permalink / raw)
  To: Josef Wolf, git

On Wed, Feb 12, 2025 at 12:57:07AM +0100, Josef Wolf wrote:
> Still struggling with my filter problem.
>
> Here is what I do:
>
> - Set up a clean filter which enforces CRLF (yes, for this specific use
>   case I want CRLF even on linux)

In general, clean filters do their work when 'git add' or 'git commit file'
is run.
Does the filter do the CRLF conversion?
Or is it done in .gitattributes?

>
> - Smudge filter does not modify the file at all
>
> - Set up git to fail when filter fails, so I can double-check that the
>   filter is actually running:
>
>     $ grep -A3 filter..etsfile ~/.gitconfig
>     [filter "etsfile"]
>             required = true
>             clean = ets-utils -c
>             smudge = ets-utils -s %f
>
> - Specify file as non-text and install the filter:
>
>     $ grep etsfile .gitattributes
>     */P -text filter=etsfile
>     $ git commit .gitattributes
>
> - Check that git gets attributes as I want them:
>
>     $ git --attr-source=$(git rev-parse HEAD) check-attr -a P-0113/P
>     P-0113/P: text: unset
>     P-0113/P: filter: etsfile
>     $ git ls-files --eol P-0113/P
>     i/lf    w/      attr/-text              P-0113/P
>
> - Create helper for renormalization
>
>     $ cat renormalization-helper
>     #! /bin/sh -e
>     git add --renormalize .
>     git diff --quiet --cached || \
>         git commit --amend --no-edit
>
> - Run the renormalization for the linear history:
>
>     $ git --attr-source=$(git rev-parse HEAD) \
>           rebase --root -X renormalize \
>           -x $(dirname $0)/renormalize-helper

That will change the index and the repo, but not the working tree on disk,
right?

>
> So at this point, I'd expect the file to have CRLF line endings. But it
> doesn't, so I do:
>
>     $ rm -rf P-0113
>       git checkout --attr-source=$(git rev-parse HEAD) P-0113
>
> Still no CRLF, so I look at what is stored by git:
>
>     $ git --attr-source=$(git rev-parse HEAD) show 873a9b:P-0113/P |less -U
>
> Again, no CRLF.

Just to make sure:
You want to see the CRLF in the files on disk?
Do you have a valid .gitattributes file on disk now?
If yes, what does 'git ls-files --eol P-0113' say?
What does 'git status' say?

>
> So I check all revisions in the history. Result: no revision has CRLF.

> So the renormalization process does not work for me at all.

In general, renormalization is about the content inside the repo.
If a filter is applied, or .gitattributes are changed, the files
on disk are not updated automatically.
'mv -f P-0113 /tmp && git checkout P-0113' may be needed.

>
> Any ideas?

Yes. The best thing to do (tm) would be to create a dummy repo,
do all the operations from scratch and post the stuff here.
In other words, write a shell script that creates an empty repo,
fills it with content, and does all the operations.
That would enable people to reproduce it and look at what is going on.
Hope that makes sense.

>
> --
> Josef Wolf
> jw@raven.inka.de
>
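A reproducer of the kind requested here could start from a skeleton like the one below. It is only a sketch under stated assumptions: `ets-utils` is not available, so a hypothetical `awk` one-liner stands in for the clean filter (it strips any CR and then ends every line with CRLF), `cat` stands in for the smudge filter, and the file content is invented. In a fresh repository like this one, the renormalized blob is expected to contain CRLF, so the script mainly provides a known-good baseline to which the steps from the real repository can be added one at a time.

```shell
#!/bin/sh
# Sketch of a self-contained reproducer: commit a file before the filter
# exists, add the attribute, renormalize, then inspect the stored blob.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Repro Script"
git config user.email "repro@example.invalid"

# Hypothetical stand-in for "ets-utils -c" (an assumption, not the real tool).
cat > .git/crlf-clean <<'EOF'
awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }'
EOF
git config filter.etsfile.clean "sh '$repo/.git/crlf-clean'"
git config filter.etsfile.smudge cat
git config filter.etsfile.required true

# History that predates the filter: the blob is stored with plain LF.
mkdir P-0113
printf 'line one\nline two\n' > P-0113/P
git add P-0113/P
git commit -q -m "Add data file before the filter exists"

# Install the attribute, as in the original report.
printf '%s\n' '*/P -text filter=etsfile' > .gitattributes
git add .gitattributes
git commit -q -m "Add .gitattributes"

# Renormalize and commit only if something actually changed.
git add --renormalize .
git diff --cached --quiet || git commit -q -m "Renormalize"

# Count CR bytes in the committed blob (not in the working tree file).
cr_count=$(git cat-file blob HEAD:P-0113/P | tr -cd '\r' | wc -c | tr -d ' ')
echo "CR characters in HEAD:P-0113/P: $cr_count"
```

Checking the blob with `git cat-file` rather than the file on disk keeps the smudge side out of the picture, which matches the point above that renormalization is about the content inside the repo.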
* Re: renormalize histroy with smudge/clean-filter, again
  2025-02-12  6:12 ` Torsten Bögershausen
@ 2025-02-12  8:18 ` Josef Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-12 8:18 UTC (permalink / raw)
  To: git

Hi Torsten,

On Wed, Feb 12, 2025 at 07:12:36AM +0100, Torsten Bögershausen wrote:
> > - Set up a clean filter which enforces CRLF (yes, for this specific use
> >   case I want CRLF even on linux)
>
> In general, clean filters do their work when 'git add' or 'git commit file'
> is run.

Yes. This is done in the renormalize-helper shell script, which I included
in my description below:

> >     $ cat renormalization-helper
> >     #! /bin/sh -e
> >     git add --renormalize .
> >     git diff --quiet --cached || \
> >         git commit --amend --no-edit

> Does the filter do the CRLF conversion?

As I wrote above: yes, the clean filter enforces CRLF

> Or is it done in .gitattributes?

No. .gitattributes states that git should not modify the file, since I have
set it to -text, as I wrote:

> >     */P -text filter=etsfile

> > - Run the renormalization for the linear history:
> >
> >     $ git --attr-source=$(git rev-parse HEAD) \
> >           rebase --root -X renormalize \
> >           -x $(dirname $0)/renormalize-helper
>
> That will change the index, the repo, but not the working tree on disk,
> right?

"git reset --hard" or even "rm -rf P-0113; git checkout P-0113" also do not
bring the CRLF into the file, see below.

> > So at this point, I'd expect the file to have CRLF line endings. But it
> > doesn't, so I do:
> >
> >     $ rm -rf P-0113
> >       git checkout --attr-source=$(git rev-parse HEAD) P-0113
> >
> > Still no CRLF, so I look at what is stored by git:
> >
> >     $ git --attr-source=$(git rev-parse HEAD) show 873a9b:P-0113/P |less -U
> >
> > Again, no CRLF.
>
> Just to make sure:
> You want to see the CRLF in the files on disk?

In the first place I want to see them in the repo. And a fresh checkout
should bring them into the files on disk, since -text is in effect.

> Do you have a valid .gitattributes file on disk now?

git recognizes my settings -text and filter=etsfile, as I wrote:

> >     $ git --attr-source=$(git rev-parse HEAD) check-attr -a P-0113/P
> >     P-0113/P: text: unset
> >     P-0113/P: filter: etsfile

> If yes, what does 'git ls-files --eol P-0113' say?

As I wrote above:

> >     $ git ls-files --eol P-0113/P
> >     i/lf    w/      attr/-text              P-0113/P

> What does 'git status' say?

Nothing, since

   git add --renormalize . && git commit --amend --no-edit

has been done by the helper script on every commit of the history

> > So I check all revisions in the history. Result: no revision has CRLF.
> > So the renormalization process does not work for me at all.
>
> In general, renormalization is about the content inside the repo.
> If a filter is applied, or .gitattributes are changed, the files
> on disk are not updated automatically.

This is why I checked the contents which are stored in the repo:

> >     $ git --attr-source=$(git rev-parse HEAD) show 873a9b:P-0113/P |less -U

> 'mv -f P-0113 /tmp && git checkout P-0113' may be needed.

Well, I did this instead:

> >     $ rm -rf P-0113
> >       git checkout --attr-source=$(git rev-parse HEAD) P-0113

> Yes. The best thing to do (tm) would be to create a dummy repo,
> do all the operations from scratch and post the stuff here.
> In other words, write a shell script that creates an empty repo,
> fills it with content, and does all the operations.
> That would enable people to reproduce it and look at what is going on.
> Hope that makes sense.

Well, if I _knew_ what triggers the problem, I could create such a script.
As long as I cannot figure out what triggers the problem, I have to dig into
the internals of this old repo with its long-running history.

-- 
Josef Wolf
jw@raven.inka.de
* Collisions while cloning (was: Re: renormalize histroy with smudge/clean-filter, again)
  2025-02-11 23:57 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
  2025-02-12  6:12 ` Torsten Bögershausen
@ 2025-02-13 11:36 ` Josef Wolf
  2025-02-13 16:40 ` Torsten Bögershausen
  2025-02-14 20:03 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
  2025-02-14 20:21 ` brian m. carlson
  3 siblings, 1 reply; 44+ messages in thread
From: Josef Wolf @ 2025-02-13 11:36 UTC (permalink / raw)
  To: git

Hi folks,

while investigating/recovering from my problems with renormalizing with
clean/smudge filtering, I stumbled on collisions while creating a fresh
clone of the repo from the server:

  $ LANG= git clone ssh://gitrepos@my.server/repo smart-home-ets5hashes-removed
  Cloning into 'smart-home-ets5hashes-removed'...
  remote: Enumerating objects: 7499, done.
  remote: Counting objects: 100% (7499/7499), done.
  remote: Compressing objects: 100% (3263/3263), done.
  remote: Total 7499 (delta 3955), reused 7109 (delta 3594), pack-reused 0
  Receiving objects: 100% (7499/7499), 140.12 MiB | 10.54 MiB/s, done.
  Resolving deltas: 100% (3955/3955), done.
  Updating files: 100% (1423/1423), done.
  warning: the following paths have collided (e.g. case-sensitive paths
  on a case-insensitive filesystem) and only one from the same
  colliding group is in the working tree:

    'Projects/P-0113/B.ets5hash'
    [more files deleted]

This is on linux, so the FS is _not_ case-insensitive.

The list of files given here is almost identical to the list of files which
always give me collisions during the renormalization process.

Here is an explanation of how and why those files ended up in the repo and a
hypothesis of why they might be in a conflicting state.

Those files contain hash values of the real data files for a proprietary
application and are re-calculated on every invocation of the application. The
application won't even start up if those hashes don't match. And it won't
tell why it won't start, it just says "Corrupt data".

At the time this repository started, I had no knowledge of how the hashes of
those files are calculated, so I had to commit them along with the associated
data files to keep the application happy. This results in conflicts with many
git operations, of course.

Then I learned how those files can be re-calculated and wrote a smudge filter
to keep them in sync with the data files. Since I was now able to recreate
those files, I put them into .gitignore and installed the smudge filter to
recalculate them. But I left the files in the repo as a fallback, just to be
sure. And I kept committing them every now and then whenever git showed
differences, although they already were in .gitignore.

So I guess those collisions might come from committing the ignored files.
Unfortunately, I could not reproduce this effect on a fresh repo, yet.

And the next question is: why do those conflicts cause the renormalization
process to fail completely, even when the conflicts are resolved during the
renormalization rebase? This, too, I could not reproduce on a fresh repo.

-- 
Josef Wolf
jw@raven.inka.de
* Re: Collisions while cloning (was: Re: renormalize histroy with smudge/clean-filter, again)
  2025-02-13 11:36 ` Collisions while cloning (was: Re: renormalize histroy with smudge/clean-filter, again) Josef Wolf
@ 2025-02-13 16:40 ` Torsten Bögershausen
  0 siblings, 0 replies; 44+ messages in thread
From: Torsten Bögershausen @ 2025-02-13 16:40 UTC (permalink / raw)
  To: Josef Wolf, git

On Thu, Feb 13, 2025 at 12:36:14PM +0100, Josef Wolf wrote:
> Hi folks,
>
> while investigating/recovering my problems with renormalizing with
> clean/smudge filtering, I stumbled on collisions while creating a fresh clone
> of the repo from the server:
>
>   $ LANG= git clone ssh://gitrepos@my.server/repo smart-home-ets5hashes-removed
>   Cloning into 'smart-home-ets5hashes-removed'...
>   remote: Enumerating objects: 7499, done.
>   remote: Counting objects: 100% (7499/7499), done.
>   remote: Compressing objects: 100% (3263/3263), done.
>   remote: Total 7499 (delta 3955), reused 7109 (delta 3594), pack-reused 0
>   Receiving objects: 100% (7499/7499), 140.12 MiB | 10.54 MiB/s, done.
>   Resolving deltas: 100% (3955/3955), done.
>   Updating files: 100% (1423/1423), done.
>   warning: the following paths have collided (e.g. case-sensitive paths
>   on a case-insensitive filesystem) and only one from the same
>   colliding group is in the working tree:
>
>     'Projects/P-0113/B.ets5hash'
>     [more files deleted]
>
> This is on linux, so the FS is _not_ case-insensitive.
>

That sounds fishy (tm)
Does 'git ls-files' give any hints?
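One way to look for hints in the `git ls-files` output is to check whether the index holds paths that differ only in letter case, which is the situation the clone warning names as an example. The sketch below is an illustration only: it builds a throwaway repository, and the lowercase `b.ets5hash` entry is a made-up path that was not reported in the thread. The index entries are created with `git update-index --cacheinfo`, so no files are written to disk.

```shell
#!/bin/sh
# Sketch: list index paths that collide when compared case-insensitively.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.ignorecase false

# Register three index entries pointing at an empty blob.
empty=$(git hash-object -w --stdin < /dev/null)
for path in Projects/P-0113/B.ets5hash \
            Projects/P-0113/b.ets5hash \
            Projects/P-0113/P
do
    git update-index --add --cacheinfo "100644,$empty,$path"
done

# Fold case, then print every path that occurs more than once.
dupes=$(git ls-files | tr '[:upper:]' '[:lower:]' | sort | uniq -d)
echo "$dupes"
```

If the pipeline prints nothing when run in the real clone, case differences can probably be ruled out and the cause has to be looked for elsewhere.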
* Re: renormalize histroy with smudge/clean-filter, again
  2025-02-11 23:57 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
  2025-02-12  6:12 ` Torsten Bögershausen
  2025-02-13 11:36 ` Collisions while cloning (was: Re: renormalize histroy with smudge/clean-filter, again) Josef Wolf
@ 2025-02-14 20:03 ` Josef Wolf
  2025-02-14 20:21 ` brian m. carlson
  3 siblings, 0 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-14 20:03 UTC (permalink / raw)
  To: git

Since none of the methods using plain git worked, my next try was to reach
out to git-filter-repo:

Again, using my renormalize-helper script:

  $ cat renormalize-helper
  #! /bin/sh -e
  git add --renormalize .
  git diff --quiet --cached || \
      git commit --amend --no-edit

So I go with git-filter-repo:

  $ git clone ssh://gitrepos@my.server/repo fresh-clone
  $ cd fresh-clone
  $ git-filter-repo \
        --prune-empty always \
        --invert-paths --use-base-name \
        --path-regex '\.ets5hash$'
  $ for branch in branch-1 branch-2 branch-3 ; do
        git checkout -b $branch-renormalized $branch
        git add --renormalize .
        git diff --quiet --cached || \
            git commit -m"Renormalize HEAD"
        git rebase \
            --root -X renormalize \
            -x $renormalize_helper
    done

This went without problems and the contents looked fine, so I really thought
I had finally got it.

But then I tried to move .gitattributes to the very beginning of history:

  $ git rebase -i --root

AGAIN conflicts due to line ending errors. Adding
'--attr-source=$(git rev-parse HEAD)' and '-x renormalize-helper' did not
help, apart from moving the conflicts to another location.

Thus, although the renormalization process finished successfully, there are
_still_ commits with unclean content in the repository.

I REALLY REALLY REALLY think there should be an option

  --always-apply-clean-filter-to-all-content-before-feeding-to-merge-or-diff

or something!

-- 
Josef Wolf
jw@raven.inka.de
* Re: renormalize histroy with smudge/clean-filter, again
  2025-02-11 23:57 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
  ` (2 preceding siblings ...)
  2025-02-14 20:03 ` renormalize histroy with smudge/clean-filter, again Josef Wolf
@ 2025-02-14 20:21 ` brian m. carlson
  2025-02-14 20:55 ` Josef Wolf
  3 siblings, 1 reply; 44+ messages in thread
From: brian m. carlson @ 2025-02-14 20:21 UTC (permalink / raw)
  To: Josef Wolf, git

On 2025-02-11 at 23:57:07, Josef Wolf wrote:
> Still struggling with my filter problem.
>
> Here is what I do:
>
> - Set up a clean filter which enforces CRLF (yes, for this specific use
>   case I want CRLF even on linux)

Is there a reason you can't use `eol=crlf` instead of a smudge/clean
filter? That looks like this in the Git repo:

  *.bat text eol=crlf

That might be an easier way to accomplish what you want, and it will
always result in CRLF in the working tree, regardless of operating
system, even though in the repository it will still use LF[0].

Note that if you need a specific encoding, there's also
`working-tree-encoding`.

[0] Okay, technically someone can override it with
`.git/info/attributes`, but if they do that and it doesn't work, that's
their own fault. We don't worry about that case in this project.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
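The behaviour of `eol=crlf` described in this message can be observed in a scratch repository. The following sketch assumes a temporary directory and a made-up file `run.bat`; it commits a file written with LF endings, checks it out again, and counts CR bytes in the working tree copy and in the stored blob.

```shell
#!/bin/sh
# Sketch: with "text eol=crlf" the blob keeps LF while the checkout gets CRLF.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"

printf '%s\n' '*.bat text eol=crlf' > .gitattributes
printf 'echo one\necho two\n' > run.bat
git add .gitattributes run.bat
git commit -q -m "Add batch file with eol=crlf"

# Force a fresh checkout of the file so the attribute is applied on disk.
rm run.bat
git checkout -- run.bat

worktree_cr=$(tr -cd '\r' < run.bat | wc -c | tr -d ' ')
blob_cr=$(git cat-file blob HEAD:run.bat | tr -cd '\r' | wc -c | tr -d ' ')
echo "working tree CRs: $worktree_cr, blob CRs: $blob_cr"
```

This only covers files whose line endings are uniform; content that mixes CRLF and LF within one file is a different case.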
* Re: renormalize histroy with smudge/clean-filter, again
  2025-02-14 20:21 ` brian m. carlson
@ 2025-02-14 20:55 ` Josef Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Josef Wolf @ 2025-02-14 20:55 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson

On Fri, Feb 14, 2025 at 08:21:28PM +0000, brian m. carlson wrote:
> On 2025-02-11 at 23:57:07, Josef Wolf wrote:
> > Still struggling with my filter problem.
> >
> > Here is what I do:
> >
> > - Set up a clean filter which enforces CRLF (yes, for this specific use
> >   case I want CRLF even on linux)
>
> Is there a reason you can't use `eol=crlf` instead of a smudge/clean
> filter? That looks like this in the Git repo:

Yes. Most of the data files of this (proprietary) application are XML files
using mostly CRLF, but there is also LF-encoded content. Like this:

  [ ... ]
  <foo>^M
    <bar>
      only LF in contents of bar
    </bar>^M
  </foo>^M

In addition, it randomly shuffles the XML elements at every startup, even if
no changes are made. To prevent conflicts from this, I need to sort the XML
elements into a canonical ordering in the clean filter.

-- 
Josef Wolf
jw@raven.inka.de