* gitattributes - clean filter invoked on pull?
@ 2011-04-11 8:42 Miklos Vajna
2011-04-11 9:19 ` Ramkumar Ramachandra
2011-04-11 10:04 ` Dmitry Potapov
0 siblings, 2 replies; 10+ messages in thread
From: Miklos Vajna @ 2011-04-11 8:42 UTC (permalink / raw)
To: git; +Cc: timar74
Hi,
Background: We at LibreOffice are trying to use the 'filter'
gitattributes feature to clean up line wrappings in po files.
The problem is that it seems the clean filter - which is supposed to be
invoked only in case a new blob is created - is invoked even on
clone/pull, and other developers are claiming that it slows down their
workflow.
Is this a bug? I don't exactly understand why this would be necessary.
Here is a short script to reproduce the issue:
----
rm -rf client*
mkdir client
cd client
git init
git config filter.po.clean 'echo foo >&2 && cat'
git config filter.po.smudge cat
echo '*.po filter=po' > .gitattributes
touch foo.po
git add .gitattributes foo.po
git commit -m foo
cd ..
git clone client client2
cd client2
git config filter.po.clean 'echo foo >&2 && cat'
git config filter.po.smudge cat
cd ..
cd client
echo aaa > foo.po
git commit -am second
cd ..
cd client2/
git pull
----
Its output here with 1.7.4.4:
----
$ sh test.sh
Initialized empty Git repository in /home/vmiklos/git/t/client/.git/
foo
foo
[master (root-commit) bbf8490] foo
foo
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 .gitattributes
create mode 100644 foo.po
Cloning into client2...
done.
foo
[master foo
foo
e37f5ab] second
foo
foo
1 files changed, 1 insertions(+), 0 deletions(-)
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /home/vmiklos/git/t/client
bbf8490..e37f5ab master -> origin/master
Updating bbf8490..e37f5ab
foo
Fast-forward
foo
foo.po | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
----
Any thoughts why the clean filter is invoked on pull?
Thanks.
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 8:42 gitattributes - clean filter invoked on pull? Miklos Vajna
@ 2011-04-11 9:19 ` Ramkumar Ramachandra
2011-04-11 9:31 ` Miklos Vajna
2011-04-11 9:50 ` Michael J Gruber
2011-04-11 10:04 ` Dmitry Potapov
1 sibling, 2 replies; 10+ messages in thread
From: Ramkumar Ramachandra @ 2011-04-11 9:19 UTC (permalink / raw)
To: Miklos Vajna; +Cc: git, timar74
Hi Miklos,
Miklos Vajna writes:
> Background: We at LibreOffice are trying to use the 'filter'
> gitattributes feature to clean up line wrappings in po files.
>
> The problem is that it seems the clean filter - which is supposed to be
> invoked only in case a new blob is created - is invoked even on
> clone/pull, and other developers are claiming that it slows down their
> workflow.
>
> Is this a bug? I don't exactly understand why this would be necessary.
From config.txt:
- 'clean' is "The command which is used to convert the content of a
worktree file to a blob upon checkin".
- 'smudge' is "The command which is used to convert the content of a
blob object to a worktree file upon checkout."
According to the documentation, 'smudge' is *supposed* to be invoked
on a clone/ pull, since it involves a checkout. I don't see how you
can avoid running these filters on every checkin/ checkout unless you
cache the result somewhere.
-- Ram
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 9:19 ` Ramkumar Ramachandra
@ 2011-04-11 9:31 ` Miklos Vajna
2011-04-11 9:50 ` Ramkumar Ramachandra
2011-04-11 10:00 ` Johannes Sixt
2011-04-11 9:50 ` Michael J Gruber
1 sibling, 2 replies; 10+ messages in thread
From: Miklos Vajna @ 2011-04-11 9:31 UTC (permalink / raw)
To: Ramkumar Ramachandra; +Cc: git, timar74
On Mon, Apr 11, 2011 at 02:49:21PM +0530, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> > Is this a bug? I don't exactly understand why this would be necessary.
>
> From config.txt:
> - 'clean' is "The command which is used to convert the content of a
> worktree file to a blob upon checkin".
> - 'smudge' is "The command which is used to convert the content of a
> blob object to a worktree file upon checkout."
>
> According to the documentation, 'smudge' is *supposed* to be invoked
> on a clone/ pull, since it involves a checkout. I don't see how you
> can avoid running these filters on every checkin/ checkout unless you
> cache the result somewhere.
That's not a problem - the issue I pointed out is that the 'clean' one
is invoked on pull/clone, and it takes time if it's applied to several
files.
'smudge' is just a 'cat', I don't care about it. :)
Thanks.
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 9:31 ` Miklos Vajna
@ 2011-04-11 9:50 ` Ramkumar Ramachandra
2011-04-11 10:00 ` Johannes Sixt
1 sibling, 0 replies; 10+ messages in thread
From: Ramkumar Ramachandra @ 2011-04-11 9:50 UTC (permalink / raw)
To: Miklos Vajna; +Cc: git, timar74
Hi Miklos,
Miklos Vajna writes:
> On Mon, Apr 11, 2011 at 02:49:21PM +0530, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> > > Is this a bug? I don't exactly understand why this would be necessary.
> >
> > From config.txt:
> > - 'clean' is "The command which is used to convert the content of a
> > worktree file to a blob upon checkin".
> > - 'smudge' is "The command which is used to convert the content of a
> > blob object to a worktree file upon checkout."
> >
> > According to the documentation, 'smudge' is *supposed* to be invoked
> > on a clone/ pull, since it involves a checkout. I don't see how you
> > can avoid running these filters on every checkin/ checkout unless you
> > cache the result somewhere.
>
> That's not a problem - the issue I pointed out is that the 'clean' one
> is invoked on pull/clone, and it takes time if it's applied to several
> files.
>
> 'smudge' is just a 'cat', I don't care about it. :)
Ah, sorry about that. There actually seems to be a bug :|
-- Ram
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 9:31 ` Miklos Vajna
2011-04-11 9:50 ` Ramkumar Ramachandra
@ 2011-04-11 10:00 ` Johannes Sixt
1 sibling, 0 replies; 10+ messages in thread
From: Johannes Sixt @ 2011-04-11 10:00 UTC (permalink / raw)
To: Miklos Vajna; +Cc: Ramkumar Ramachandra, git, timar74
Am 4/11/2011 11:31, schrieb Miklos Vajna:
> On Mon, Apr 11, 2011 at 02:49:21PM +0530, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
>>> Is this a bug? I don't exactly understand why this would be necessary.
>>
>> From config.txt:
>> - 'clean' is "The command which is used to convert the content of a
>> worktree file to a blob upon checkin".
>> - 'smudge' is "The command which is used to convert the content of a
>> blob object to a worktree file upon checkout."
>>
>> According to the documentation, 'smudge' is *supposed* to be invoked
>> on a clone/ pull, since it involves a checkout. I don't see how you
>> can avoid running these filters on every checkin/ checkout unless you
>> cache the result somewhere.
>
> That's not a problem - the issue I pointed out is that the 'clean' one
> is invoked on pull/clone, and it takes time if it's applied to several
> files.
The invocation is only needed when files are marked as "racily clean",
because in this case git has to check whether the worktree contents are
what is recorded in the index or not. This can happen a lot when you have
a fast machine where many worktree files and the index itself can be
written within the same (wall clock) second. Your example is so short that
it triggers this case almost reliably.
When git pull merges the fetched commit, it has to determine whether there
are no changes in any of the files that are to be updated by the merge. If
one such file is marked as racily clean, the worktree contents must be
inspected, which in turn means that the clean filter has to be used.
If you insert before the final 'git pull':
sleep 1
git reset
sleep 1
you will notice that some clean filter calls happen before the 'git pull'
because 'git reset' rectifies the racily-clean entry.
This just explains what you observed. I haven't thought about how you
should change your workflow to avoid this behavior. My guess is that the
extra clean filters called by 'git pull' don't actually happen that
frequently during normal interactive work that touches the index.
Perhaps you are also bitten by a regression in 'git status', which does
not correct the racily-clean entries even though it should (fixed in git
1.7.4.4), and therefore the clean filter is run more often than necessary.
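The 'sleep 1; git reset; sleep 1' experiment described above can be
scripted so that the clean filter calls are counted instead of being read
off stderr. This is only a sketch under assumptions: the log file, the
calls() helper and the temporary directory layout are made up for
illustration, and the numbers it prints depend on timing and on the git
version in use.

```shell
#!/bin/sh
# Sketch: count clean-filter runs around 'git pull', after first
# refreshing the index with 'sleep 1; git reset; sleep 1'.
# The log file and the calls() helper are illustrative, not part of git.
set -e
top=$(mktemp -d)
log="$top/clean.log"
: > "$log"
calls() { wc -l < "$log" | tr -d ' '; }

git init -q "$top/client"
cd "$top/client"
git config user.name "Example"
git config user.email "example@example.invalid"
# Each clean run appends one line to the log, then passes the data through.
git config filter.po.clean "echo clean >> '$log' && cat"
echo '*.po filter=po' > .gitattributes
echo first > foo.po
git add .gitattributes foo.po
after_add=$(calls)
git commit -q -m first

git clone -q "$top/client" "$top/client2"
git -C "$top/client2" config filter.po.clean "echo clean >> '$log' && cat"

echo second > foo.po
git commit -q -am second

cd "$top/client2"
sleep 1
git reset > /dev/null   # gives git a chance to fix racily-clean entries
sleep 1
before_pull=$(calls)
git pull -q
after_pull=$(calls)
echo "clean runs during add:  $after_add"
echo "clean runs during pull: $((after_pull - before_pull))"
```

Comparing the second number with and without the three lines before
'git pull' shows how much of the filter work is moved out of the pull.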
>
> 'smudge' is just a 'cat', I don't care about it. :)
Then you can just remove it from the config and save a fork(). You don't
have to configure both clean and smudge filters.
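Dropping the pass-through smudge filter is a one-line config change. A
minimal sketch, assuming the filter.po.* names from the original script
and a throwaway repository; the 'cat' clean command is only a stand-in
for the real one:

```shell
#!/bin/sh
# Sketch: remove a no-op smudge filter and keep only the clean filter.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config filter.po.clean cat     # stand-in for the real clean command
git config filter.po.smudge cat    # pass-through smudge, as in the script
git config --unset filter.po.smudge
# List what is still configured for the 'po' filter.
remaining=$(git config --get-regexp '^filter\.po\.')
echo "$remaining"
```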
-- Hannes
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 9:19 ` Ramkumar Ramachandra
2011-04-11 9:31 ` Miklos Vajna
@ 2011-04-11 9:50 ` Michael J Gruber
2011-04-11 10:16 ` Miklos Vajna
1 sibling, 1 reply; 10+ messages in thread
From: Michael J Gruber @ 2011-04-11 9:50 UTC (permalink / raw)
To: Ramkumar Ramachandra; +Cc: Miklos Vajna, git, timar74
Ramkumar Ramachandra venit, vidit, dixit 11.04.2011 11:19:
> Hi Miklos,
>
> Miklos Vajna writes:
>> Background: We at LibreOffice are trying to use the 'filter'
>> gitattributes feature to clean up line wrappings in po files.
>>
>> The problem is that it seems the clean filter - which is supposed to be
>> invoked only in case a new blob is created - is invoked even on
>> clone/pull, and other developers are claiming that it slows down their
>> workflow.
>>
>> Is this a bug? I don't exactly understand why this would be necessary.
>
> From config.txt:
> - 'clean' is "The command which is used to convert the content of a
> worktree file to a blob upon checkin".
> - 'smudge' is "The command which is used to convert the content of a
> blob object to a worktree file upon checkout."
>
> According to the documentation, 'smudge' is *supposed* to be invoked
> on a clone/ pull, since it involves a checkout. I don't see how you
> can avoid running these filters on every checkin/ checkout unless you
> cache the result somewhere.
That is exactly why it's surprising that the clean filter is invoked on
pull - clean is about checking in, pull only checks out.
If you run your script with GIT_TRACE=1 you see that the last two clean
invocations come from merge and gc. They go away when you do a fetch only.
Note that with clean/smudge, a check "worktree == repo" requires a
conversion of "worktree" to what would be checked in, and that uses
"clean". That's why it's invoked not only for "commit".
But maybe there is a better solution for your actual use case? Do you
want to ignore line wrap or normalise it?
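The point about the "worktree == repo" check can be observed directly.
What follows is a sketch under assumptions: the lower-casing filter is a
hypothetical stand-in, chosen so that the worktree file can differ from
the blob while keeping the same size, and the log file only serves to
count filter calls.

```shell
#!/bin/sh
# Sketch: 'git status' has to run the clean filter to compare a rewritten
# worktree file with the index. The tr-based filter and the log file are
# illustrative stand-ins, not the LibreOffice setup.
set -e
top=$(mktemp -d)
log="$top/clean.log"
: > "$log"
git init -q "$top/repo"
cd "$top/repo"
git config user.name "Example"
git config user.email "example@example.invalid"
git config filter.po.clean "echo clean >> '$log' && tr A-Z a-z"
echo '*.po filter=po' > .gitattributes
echo hello > foo.po
git add .gitattributes foo.po
git commit -q -m first

before=$(wc -l < "$log" | tr -d ' ')
sleep 1
echo HELLO > foo.po     # differs on disk, same size, cleans to "hello"
status=$(git status --porcelain)
after=$(wc -l < "$log" | tr -d ' ')
echo "status output: '$status'"
echo "clean runs caused by status: $((after - before))"
```

No commit is involved in the second half, yet the filter runs, because
the cleaned content is what gets compared with the index entry.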
Michael
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 9:50 ` Michael J Gruber
@ 2011-04-11 10:16 ` Miklos Vajna
2011-04-11 10:41 ` Michael J Gruber
0 siblings, 1 reply; 10+ messages in thread
From: Miklos Vajna @ 2011-04-11 10:16 UTC (permalink / raw)
To: Michael J Gruber; +Cc: Ramkumar Ramachandra, git, timar74
On Mon, Apr 11, 2011 at 11:50:09AM +0200, Michael J Gruber <git@drmicha.warpmail.net> wrote:
> But maybe there is a better solution for your actual use case? Do you
> want to ignore line wrap or normalise it?
I want to avoid those, so we get readable diffs, whatever line width the
different translator tools are using. (If tool1 is using 72 and tool2 is
using 80, then it would reformat the whole file.)
Do you have a better idea than
git config filter.po.clean 'msgcat - --no-wrap'
?
Thanks.
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 10:16 ` Miklos Vajna
@ 2011-04-11 10:41 ` Michael J Gruber
2011-04-11 11:14 ` Miklos Vajna
0 siblings, 1 reply; 10+ messages in thread
From: Michael J Gruber @ 2011-04-11 10:41 UTC (permalink / raw)
To: Miklos Vajna; +Cc: Ramkumar Ramachandra, git, timar74
Miklos Vajna venit, vidit, dixit 11.04.2011 12:16:
> On Mon, Apr 11, 2011 at 11:50:09AM +0200, Michael J Gruber <git@drmicha.warpmail.net> wrote:
>> But maybe there is a better solution for your actual use case? Do you
>> want to ignore line wrap or normalise it?
>
> I want to avoid those, so we get readable diffs, whatever line width the
> different translator tools are using. (If tool1 is using 72 and tool2 is
> using 80, then it would reformat the whole file.)
>
> Do you have a better idea than
>
> git config filter.po.clean 'msgcat - --no-wrap'
>
> ?
git config diff.po.textconv 'msgcat - --no-wrap'
git config diff.po.cachetextconv true
If you want to normalise the repo, you may want to look at hooks instead
of clean/smudge if they are a performance problem.
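For the textconv route, the diff driver also has to be attached to the
files through gitattributes. A hedged sketch: the '*.po diff=po' line is
an assumption about how the driver would be wired up. The gitattributes
documentation describes the textconv program as taking the name of the
file to convert as its single argument, so the sketch leaves out the '-'
(stdin) argument; the exact msgcat arguments may need adjusting.

```shell
#!/bin/sh
# Sketch: wire up a 'po' diff driver so that diffs are computed on
# unwrapped text while the stored blobs stay untouched.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
# Assumption: select the driver for *.po files via gitattributes.
echo '*.po diff=po' > .gitattributes
# git appends the file name to this command when it runs textconv.
git config diff.po.textconv 'msgcat --no-wrap'
git config diff.po.cachetextconv true
# Ask git which diff driver it would use for a given path.
attr=$(git check-attr diff -- foo.po)
echo "$attr"
```

Unlike a clean filter, this only affects what 'git diff' and friends
display, so it is not run when the worktree is compared with the index.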
Michael
* Re: gitattributes - clean filter invoked on pull?
2011-04-11 8:42 gitattributes - clean filter invoked on pull? Miklos Vajna
2011-04-11 9:19 ` Ramkumar Ramachandra
@ 2011-04-11 10:04 ` Dmitry Potapov
1 sibling, 0 replies; 10+ messages in thread
From: Dmitry Potapov @ 2011-04-11 10:04 UTC (permalink / raw)
To: Miklos Vajna; +Cc: git, timar74
Hi,
On Mon, Apr 11, 2011 at 12:42 PM, Miklos Vajna <vmiklos@frugalware.org> wrote:
>
> The problem is that it seems the clean filter - which is supposed to be
> invoked only in case a new blob is created - is invoked even on
> clone/pull, and other developers are claiming that it slows down their
> workflow.
>
> Is this a bug? I don't exactly understand why this would be necessary.
No, it is not a bug. Git may invoke the clean filter on an unchanged file
to make sure that the file really is unchanged. This is necessary to
prevent a race when a file is changed so quickly that its timestamp does
not change. So what git does is compare the timestamp of your file with
that of the index file. Because the index file is written after all the
files, its timestamp should be later than that of any file in the
repository. However, if the timestamp resolution is not sufficient (i.e.
the timestamps are the same), git may re-read a recently checked-out file
to make sure that there were no changes to it. During this read, the
clean filter is invoked.
So the clean filter may be invoked an extra time, but the smudge filter
should not.
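The timestamp comparison described above can be approximated from the
shell. This is a rough sketch, not git's actual implementation: it
assumes GNU stat ('-c %Y') and whole-second resolution, and the variable
names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the mtime of a worktree file with that of the index to
# see whether stat data alone could prove the file unchanged.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo aaa > foo.po
git add foo.po                        # writes .git/index after foo.po
file_mtime=$(stat -c %Y foo.po)       # seconds since the epoch (GNU stat)
index_mtime=$(stat -c %Y .git/index)
# A file that is not older than the index cannot be trusted on stat data
# alone, so its contents have to be re-read (and passed through clean).
if [ "$file_mtime" -ge "$index_mtime" ]; then
    racy=yes
else
    racy=no
fi
echo "foo.po mtime=$file_mtime index mtime=$index_mtime racy=$racy"
```

On a fast machine this usually prints racy=yes, which matches the
behaviour seen in the short reproduction script.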
Dmitry
Thread overview: 10+ messages
2011-04-11 8:42 gitattributes - clean filter invoked on pull? Miklos Vajna
2011-04-11 9:19 ` Ramkumar Ramachandra
2011-04-11 9:31 ` Miklos Vajna
2011-04-11 9:50 ` Ramkumar Ramachandra
2011-04-11 10:00 ` Johannes Sixt
2011-04-11 9:50 ` Michael J Gruber
2011-04-11 10:16 ` Miklos Vajna
2011-04-11 10:41 ` Michael J Gruber
2011-04-11 11:14 ` Miklos Vajna
2011-04-11 10:04 ` Dmitry Potapov