* git smudge filter fails @ 2016-03-09 18:29 Stephen Morton 2016-03-10 1:59 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Stephen Morton @ 2016-03-09 18:29 UTC (permalink / raw) To: git@vger.kernel.org A git smudge filter, at least one that relies on the results from 'git log' does not seem to work on file A when doing a 'git update' from a revision where file A doesn't exist to a revision where it does exist. Below is a simple recipe to reproduce. This appears to me to be a bug. If not, why is it expected and is there anything I can do to work around this behaviour? Steve mkdir git_test cd git_test/ git init . touch bar.c git add . git commit -am "Initial commit. foo.c not here yet." git tag no_foo touch foo.c git add . git commit -am "Add foo, no content" echo 'Date is $Date$' >> foo.c git commit -am "Add date to foo.c" echo 'foo.c filter=dater' > .git/info/attributes git config --local filter.dater.smudge 'myDate=`git log --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"' git config --local filter.dater.clean 'sed -e "s/\(\\$\)Date[^\\$]*\\$/\1Date\\$/g"' rm -f foo.c git checkout -- foo.c cat foo.c # observe keyword expansion git checkout no_foo git checkout master cat foo.c #observe keyword expansion lost ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: git smudge filter fails 2016-03-09 18:29 git smudge filter fails Stephen Morton @ 2016-03-10 1:59 ` Jeff King 2016-03-10 14:45 ` Stephen Morton 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2016-03-10 1:59 UTC (permalink / raw) To: Stephen Morton; +Cc: git@vger.kernel.org On Wed, Mar 09, 2016 at 01:29:31PM -0500, Stephen Morton wrote: > git config --local filter.dater.smudge 'myDate=`git log > --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e > "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"' Your filter is running "git log" without a revision parameter, which means it is looking at HEAD. And here.... > git checkout no_foo > git checkout master > cat foo.c > #observe keyword expansion lost You are expecting this second one to do: 1. Switch HEAD to "master". 2. Checkout files which need updating. Looking at HEAD in your filter then examines "master", and you see the commit timestamp of the destination. But that isn't how it is implemented. Checkout will handle the file checkout _first_, as that is the part that is likely to run into problems (e.g., rejecting a switch because it would lose changes in the working tree). Only at the very end, after the change to the working tree has succeeded, do we update HEAD. I think the order you are expecting is conceptually cleaner, but I don't think we would want to switch it around (for reasons of efficiency and robustness). And I don't think we would want to make a promise about the ordering to callers either way, as it binds our implementation. So is there a way to reliably know the destination of a checkout? My first thought was that we could add a placeholder similar to "%f" that your filter could use. I'm not sure how we would handle the corner cases there, though (e.g., do we always have a "destination" to report? If not, what do we give the script?). I suspect you could also hack something together with a post-checkout script, though it would probably be a lot less efficient (and might also have some weird corner cases). -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: git smudge filter fails 2016-03-10 1:59 ` Jeff King @ 2016-03-10 14:45 ` Stephen Morton 2016-03-10 21:05 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Stephen Morton @ 2016-03-10 14:45 UTC (permalink / raw) To: Jeff King; +Cc: git@vger.kernel.org I am a bit confused because this is basically the example used in ProGit [1] and it is fundamentally broken. In fact, if I understand correctly, this means that smudge filters cannot be relied upon to provide any 'keyword expansion' type tasks because they will all by nature have to query the file with 'git log'. (Note that although in my example I used 'git checkout', with an only slightly more complicated example I can make it fail on 'git pull' which is perhaps a much more realistic use case. That was probably implied in your answer, I just wanted to mention it.) Steve [1] https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes On Wed, Mar 9, 2016 at 8:59 PM, Jeff King <peff@peff.net> wrote: > On Wed, Mar 09, 2016 at 01:29:31PM -0500, Stephen Morton wrote: > >> git config --local filter.dater.smudge 'myDate=`git log >> --pretty=format:"%cd" --date=iso -1 -- %f`; sed -e >> "s/\(\\$\)Date[^\\$]*\\$/\1Date: $myDate \\$/g"' > > Your filter is running "git log" without a revision parameter, which > means it is looking at HEAD. > > And here.... > >> git checkout no_foo >> git checkout master >> cat foo.c >> #observe keyword expansion lost > > You are expecting this second one to do: > > 1. Switch HEAD to "master". > > 2. Checkout files which need updating. Looking at HEAD in your filter > then examines "master", and you see the commit timestamp of the > destination. > > But that isn't how it is implemented. Checkout will handle the file > checkout _first_, as that is the part that is likely to run into > problems (e.g., rejecting a switch because it would lose changes in the > working tree). Only at the very end, after the change to the working > tree has succeeded, do we update HEAD. > > I think the order you are expecting is conceptually cleaner, but I don't > think we would want to switch it around (for reasons of efficiency and > robustness). And I don't think we would want to make a promise about the > ordering to callers either way, as it binds our implementation. > > So is there a way to reliably know the destination of a checkout? My > first thought was that we could add a placeholder similar to "%f" that > your filter could use. I'm not sure how we would handle the corner cases > there, though (e.g., do we always have a "destination" to report? If > not, what do we give the script?). > > I suspect you could also hack something together with a post-checkout > script, though it would probably be a lot less efficient (and might also > have some weird corner cases). > > -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: git smudge filter fails 2016-03-10 14:45 ` Stephen Morton @ 2016-03-10 21:05 ` Jeff King 2016-03-10 22:04 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2016-03-10 21:05 UTC (permalink / raw) To: Stephen Morton; +Cc: git@vger.kernel.org On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote: > I am a bit confused because this is basically the example used in > ProGit [1] and it is fundamentally broken. In fact, if I understand > correctly, this means that smudge filters cannot be relied upon to > provide any 'keyword expansion' type tasks because they will all by > nature have to query the file with 'git log'. Interesting. Perhaps I am missing something (I am far from an expert in clean/smudge filters, which I do not generally use myself), but the example in ProGit looks kind of bogus to me. I don't think it ever would have worked reliably, under any version of git. > (Note that although in my example I used 'git checkout', with an only > slightly more complicated example I can make it fail on 'git pull' > which is perhaps a much more realistic use case. That was probably > implied in your answer, I just wanted to mention it.) Yeah, I think the issue is basically the same for several commands which update the worktree and the HEAD. Most of them are going to do the worktree first. -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: git smudge filter fails 2016-03-10 21:05 ` Jeff King @ 2016-03-10 22:04 ` Junio C Hamano 2016-03-15 16:17 ` Stephen Morton 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2016-03-10 22:04 UTC (permalink / raw) To: Jeff King; +Cc: Stephen Morton, git@vger.kernel.org Jeff King <peff@peff.net> writes: > On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote: > >> I am a bit confused because this is basically the example used in >> ProGit [1] and it is fundamentally broken. In fact, if I understand >> correctly, this means that smudge filters cannot be relied upon to >> provide any 'keyword expansion' type tasks because they will all by >> nature have to query the file with 'git log'. > > Interesting. Perhaps I am missing something (I am far from an expert in > clean/smudge filters, which I do not generally use myself), but the > example in ProGit looks kind of bogus to me. I don't think it ever would > have worked reliably, under any version of git. > >> (Note that although in my example I used 'git checkout', with an only >> slightly more complicated example I can make it fail on 'git pull' >> which is perhaps a much more realistic use case. That was probably >> implied in your answer, I just wanted to mention it.) > > Yeah, I think the issue is basically the same for several commands which > update the worktree and the HEAD. Most of them are going to do the > worktree first. You can have a pair of branches A and B that have forked long time ago, and have a path F that has been changed identically since they forked, perhaps by cherry-picking the same change. This happens all the time. If there were some mechanism that modifies the checked out version of F with information that depends on the history that leads to A (e.g. "which commit that is an ancestor of A last modified F?") when you check out branch A, it will become invalid when you do "git checkout B", because the path F will not change because they are the same between the branches. In short, CVS $Id$-style substitutions that depend on the history fundamentally does not work, unless you are willing to always rewrite the whole working tree every time you switch branches. The smudge and clean filters are given _only_ the blob contents and nothing else, not "which commit (or tree) the blob is taken from", not "which path this blob sits in that tree-ish", not "what branch am I on" and this is a very much deliberate design decision made in order to avoid leading people to a misguided attempt to mimick CVS $Id$-style substitutions. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: git smudge filter fails 2016-03-10 22:04 ` Junio C Hamano @ 2016-03-15 16:17 ` Stephen Morton 2016-03-15 16:48 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Stephen Morton @ 2016-03-15 16:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git@vger.kernel.org On Thu, Mar 10, 2016 at 5:04 PM, Junio C Hamano <gitster@pobox.com> wrote: > Jeff King <peff@peff.net> writes: > >> On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote: >> >>> I am a bit confused because this is basically the example used in >>> ProGit [1] and it is fundamentally broken. In fact, if I understand >>> correctly, this means that smudge filters cannot be relied upon to >>> provide any 'keyword expansion' type tasks because they will all by >>> nature have to query the file with 'git log'. >> >> Interesting. Perhaps I am missing something (I am far from an expert in >> clean/smudge filters, which I do not generally use myself), but the >> example in ProGit looks kind of bogus to me. I don't think it ever would >> have worked reliably, under any version of git. >> >>> (Note that although in my example I used 'git checkout', with an only >>> slightly more complicated example I can make it fail on 'git pull' >>> which is perhaps a much more realistic use case. That was probably >>> implied in your answer, I just wanted to mention it.) >> >> Yeah, I think the issue is basically the same for several commands which >> update the worktree and the HEAD. Most of them are going to do the >> worktree first. > > You can have a pair of branches A and B that have forked long time > ago, and have a path F that has been changed identically since they > forked, perhaps by cherry-picking the same change. This happens all > the time. > > If there were some mechanism that modifies the checked out version > of F with information that depends on the history that leads to A > (e.g. "which commit that is an ancestor of A last modified F?") > when you check out branch A, it will become invalid when you do "git > checkout B", because the path F will not change because they are the > same between the branches. In short, CVS $Id$-style substitutions > that depend on the history fundamentally does not work, unless you > are willing to always rewrite the whole working tree every time you > switch branches. > > The smudge and clean filters are given _only_ the blob contents and > nothing else, not "which commit (or tree) the blob is taken from", > not "which path this blob sits in that tree-ish", not "what branch > am I on" and this is a very much deliberate design decision made in > order to avoid leading people to a misguided attempt to mimick CVS > $Id$-style substitutions. > I will raise an Issue with ProGit. It's perhaps beyond the scope of my original question, but for situations where I need a "last change date" embedded in a file (e.g. because a protocol standard requires it), is there any recommended way to do so? We've the hard way that hardcoding makes merging/cherry-picking a bit of a nightmare and should be avoided. Is a post-checkout hook the way to go? I've actually found the smudge filter to be very slow for this application as each file is processed in series; a post-commit hook that could operate on files in parallel would likely be substantially faster. Stephen (Sorry about the earlier top-posting. I didn't realize what gmail was doing until after it had happened.) ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: git smudge filter fails 2016-03-15 16:17 ` Stephen Morton @ 2016-03-15 16:48 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2016-03-15 16:48 UTC (permalink / raw) To: Stephen Morton; +Cc: Jeff King, git@vger.kernel.org Stephen Morton <stephen.c.morton@gmail.com> writes: > It's perhaps beyond the scope of my original question, but for > situations where I need a "last change date" embedded in a file (e.g. > because a protocol standard requires it), is there any recommended way > to do so? We've the hard way that hardcoding makes > merging/cherry-picking a bit of a nightmare and should be avoided. Does that "last change date" have to be embedded in a file with other stuff in there, or can it be a standalone file by itself (which may be used by other things via linking or inclusion)? If it can be a standalone file, a custom ll-merge driver that knows how yoru datestring looks like and takes the later of the versions in the two branches being merged would not be too hard to write to eliminate the "nightmare", I would think. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-03-15 16:48 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-09 18:29 git smudge filter fails Stephen Morton 2016-03-10 1:59 ` Jeff King 2016-03-10 14:45 ` Stephen Morton 2016-03-10 21:05 ` Jeff King 2016-03-10 22:04 ` Junio C Hamano 2016-03-15 16:17 ` Stephen Morton 2016-03-15 16:48 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).