Git development
 help / color / mirror / Atom feed
* .gitattributes glob matching broken
@ 2008-11-02 16:33 Hannu Koivisto
  2008-11-03  9:09 ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Hannu Koivisto @ 2008-11-02 16:33 UTC (permalink / raw)
  To: git

Greetings,

It seems that, for example, glob pattern *.s matches files with .sh
extension at least with checkout and reset --hard but git status
thinks otherwise:

mkdir test
cd test
git init
echo -e "*.sh -crlf\n*.s crlf" > .gitattributes
echo -e "foobar\nfoobar\nfoobar" > kala.s
echo -e "foobar\nfoobar\nfoobar" > kala.sh
git add .gitattributes kala.s kala.sh
git commit -m "Foo."
cd ..
git clone -n test test2
cd test2
git config core.autocrlf true
git checkout
git status

# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working
# directory)
#
#       modified:   kala.sh
#
no changes added to commit (use "git add" and/or "git commit -a")

file kala.s kala.sh

kala.s:  ASCII text, with CRLF line terminators
kala.sh: ASCII text, with CRLF line terminators

Tested in Linux with git 1.6.0.3.535.g933bb (master as of this
writing) but also witnessed in Windows and with slightly older
git versions.

This makes git use in a Windows environment pretty much impossible
if you don't want to / can't rely on git guessing "text"
vs. "binary" files correctly so I hope a solution is found soon.

It would also be good to document what kind of glob patterns git
actually supports.  I made the assumption that at least on Linux it
supports whatever glob(7) says but even if that assumption is
correct (which it may not be, of course) for example Windows users
may not realize to look for such a manual page.

-- 
Hannu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: .gitattributes glob matching broken
  2008-11-02 16:33 .gitattributes glob matching broken Hannu Koivisto
@ 2008-11-03  9:09 ` Jeff King
  2008-11-03 15:05   ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Hannu Koivisto
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2008-11-03  9:09 UTC (permalink / raw)
  To: Hannu Koivisto; +Cc: git

On Sun, Nov 02, 2008 at 06:33:51PM +0200, Hannu Koivisto wrote:

> It seems that, for example, glob pattern *.s matches files with .sh
> extension at least with checkout and reset --hard but git status
> thinks otherwise:

I think your analysis is incorrect. I will try to explain what is
happening.

> mkdir test
> cd test
> git init
> echo -e "*.sh -crlf\n*.s crlf" > .gitattributes
> echo -e "foobar\nfoobar\nfoobar" > kala.s
> echo -e "foobar\nfoobar\nfoobar" > kala.sh
> git add .gitattributes kala.s kala.sh
> git commit -m "Foo."

OK, so here we have two files, one of which we are telling git is text
and one of which we are telling git is not text. Since we don't have
autocrlf set at all, of course nothing happens here.

> git clone -n test test2

And here we clone without checking out, so there are no files yet.

> cd test2
> git config core.autocrlf true
> git checkout

And now we do check out the files, with autocrlf applied. But what are
we left with? When I run this, _both_ files were detected as text and
have CRLF line endings. So here I think is where git didn't do what you
expected: kala.sh should not have had CRLF conversion applied.

This is a known limitation of the attributes mechanism: it only reads
from .gitattributes in the filesystem (or from .git/info/attributes),
and not from the tree that is being checked out. This is something that
should be addressed, but nobody has stepped up with a patch yet (though
there has been some preliminary discussion).

> git status
> 
> # On branch master
> # Changed but not updated:
> #   (use "git add <file>..." to update what will be committed)
> #   (use "git checkout -- <file>..." to discard changes in working
> # directory)
> #
> #       modified:   kala.sh
> #
> no changes added to commit (use "git add" and/or "git commit -a")

So yes, this status makes perfect sense, then. The file "kala.sh" has
CRLFs in the filesystem, but we have told git that it is not a file
which gets converted. So it looks like those CRs have been added.

The problem, again, is that we have inconsistently applied the
gitattributes. They were _not_ applied during checkout (because
.gitattributes did not exist yet), but they _are_ being applied here.

To "fix" this, you can then do a "git reset --hard" which will respect
your .gitattributes (since it is now checked out). And further file
creation and checkout should work OK.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* CRLF support bugs (was: Re: .gitattributes glob matching broken)
  2008-11-03  9:09 ` Jeff King
@ 2008-11-03 15:05   ` Hannu Koivisto
  2008-11-03 15:25     ` CRLF support bugs Hannu Koivisto
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Hannu Koivisto @ 2008-11-03 15:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Jeff King <peff@peff.net> writes:

> I think your analysis is incorrect. I will try to explain what is
> happening.

Yes, you are right.  The behaviour I saw in my actual use case was
so odd that I got completely confused.

I suspect one part of that "oddness" was caused by git applying its
heuristics in checkout as it doesn't use .gitattributes at that
time.  For example, it seems that it recognized some of my .sh
files as text files and the rest as binary files.  I suppose I was
correct to assume that it would be stupid to rely on git guessing
file type and the only sensible way is to use .gitattributes.  If
it was supported in checkout too, that is.

I don't know what purpose the autodetection aims to serve but I'd
add a big warning in the core.autocrlf documentation about it and
instructions on how to configure things so that it is never applied
but instead the types must always be specified explicitly.

> The problem, again, is that we have inconsistently applied the
> gitattributes. They were _not_ applied during checkout (because
> .gitattributes did not exist yet), but they _are_ being applied here.
>
> To "fix" this, you can then do a "git reset --hard" which will respect
> your .gitattributes (since it is now checked out). And further file
> creation and checkout should work OK.

Since I'm trying to launch git in a company environment, I think I
can't rely on people remembering to do that.

Actually, even if .gitattributes were applied in checkout, I think
the whole CRLF support is broken by design because people will have
to remember to use -n in clone, then enable core.autocrlf support
and then checkout.  This makes it unneccessarily complicated to
create "quick local clones" as well.  You might suggest that
Windows users should enable core.autocrlf globally but it may not
be the right thing to do for all projects/repositories either.

I think CRLF conversion support should have some attribute (be it
.gitattributes attribute or something else) that is somehow
inherited from the parent repository.  It would basically say that
"you should use platform's native line end type for text files with
this repository and its children".  To go with that, one would
maybe have a configuration option to tell what that platform
default line end type is (just in case someone wants to pretend
Cygwin is Unix or something like that).

I also observed this problem:

# Pretend someone does this on Unix
mkdir test1
cd test1
git init
echo "*.c crlf" > .gitattributes
echo -en "foo\r\nfoo\r\nfoo\r\n" > kala.c
git add .gitattributes kala.c
git commit -m "* Initial checkin."
cd ..
# Pretend someone else does this on Windows
git clone -n test1 test2
cd test2
git config core.autocrlf true
git checkout
git status

...
#       modified:   kala.c
...

git reset --hard
git status
...
#       modified:   kala.c
...

Now, even if .gitattributes were obeyed by checkout, I suspect the end
result would be the same(?)  I'm sure someone argues that this makes
sense.  But try to put yourself in the position of a random Window
user.  I think it's far from obvious what is going on and what
should be done in this situation.

-- 
Hannu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: CRLF support bugs
  2008-11-03 15:05   ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Hannu Koivisto
@ 2008-11-03 15:25     ` Hannu Koivisto
  2008-11-03 16:46     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Dmitry Potapov
  2008-11-04  5:14     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Jeff King
  2 siblings, 0 replies; 9+ messages in thread
From: Hannu Koivisto @ 2008-11-03 15:25 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Hannu Koivisto <azure@iki.fi> writes:

> Actually, even if .gitattributes were applied in checkout, I think
> the whole CRLF support is broken by design because people will have
> to remember to use -n in clone, then enable core.autocrlf support
> and then checkout.  This makes it unneccessarily complicated to

I forgot one thing: so what if someone forgets to use -n or just
imagines that you can set core.autocrlf afterwards?

# Pretend someone does this on Unix
mkdir test1
cd test1
git init
echo "*.c crlf" > .gitattributes
echo -e "foo\nfoo\nfoo" > kala.c
git add .gitattributes kala.c
git commit -m "Initial checkin."
cd ..
# Pretend test1 is not a local repository and someone else does this on Windows
git clone test1 test2
cd test2
git config core.autocrlf true
git status

# On branch master
nothing to commit (working directory clean)

Now the user would have to know that even though git status claims
everything is ok, that is not the case.  The user would have to
know to say (according to #git):

rm .git/index
git reset --hard

Just for the record, when I started to learn git, one of the first
questions I had was "how do I undo checkout?"  It wasn't until now
that I learned I need to remove .git/index (in addition to all
files).

-- 
Hannu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: CRLF support bugs (was: Re: .gitattributes glob matching broken)
  2008-11-03 15:05   ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Hannu Koivisto
  2008-11-03 15:25     ` CRLF support bugs Hannu Koivisto
@ 2008-11-03 16:46     ` Dmitry Potapov
  2008-11-03 22:24       ` CRLF support bugs Hannu Koivisto
  2008-11-04  5:14     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Jeff King
  2 siblings, 1 reply; 9+ messages in thread
From: Dmitry Potapov @ 2008-11-03 16:46 UTC (permalink / raw)
  To: Hannu Koivisto; +Cc: git, Jeff King

On Mon, Nov 03, 2008 at 05:05:24PM +0200, Hannu Koivisto wrote:
> 
> Actually, even if .gitattributes were applied in checkout, I think
> the whole CRLF support is broken by design because people will have
> to remember to use -n in clone, then enable core.autocrlf support
> and then checkout.  This makes it unneccessarily complicated to
> create "quick local clones" as well.  You might suggest that
> Windows users should enable core.autocrlf globally but it may not
> be the right thing to do for all projects/repositories either.

core.autocrlf was exactly meant to be set globally. Basically,
it says what end-of-line should be on your system. It is strange
to have it different for different repositories.

> 
> I think CRLF conversion support should have some attribute (be it
> .gitattributes attribute or something else) that is somehow
> inherited from the parent repository.  It would basically say that
> "you should use platform's native line end type for text files with
> this repository and its children".

It is already so. All text files are treated accordingly to users
preferences, and for those files where automatic heuristic produces
undesirable result, it can be disabled using .gitattributes. (Alas,
there is a bug with checkout that you encountered earlier).

> To go with that, one would
> maybe have a configuration option to tell what that platform
> default line end type is (just in case someone wants to pretend
> Cygwin is Unix or something like that).

That is exactly what core.autocrlf is about. One user may want to
have LF on Windows and another wants CRLF. So, core.autocrlf defines
how text files should be treated. It is what each user may have in
his/her own ~/.gitconfig.

> 
> I also observed this problem:
> 
> # Pretend someone does this on Unix
> mkdir test1
> cd test1
> git init
> echo "*.c crlf" > .gitattributes
> echo -en "foo\r\nfoo\r\nfoo\r\n" > kala.c

The 'crlf' attribute means that the file should be treated as 'text'
without applying heuristic. The correct ending for text files on Unix
is '\n', not '\r\n'.  So, you put a text file with incorrect ending,
not surprisingly it causes problems for Windows users later.


Dmitry

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: CRLF support bugs
  2008-11-03 16:46     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Dmitry Potapov
@ 2008-11-03 22:24       ` Hannu Koivisto
  0 siblings, 0 replies; 9+ messages in thread
From: Hannu Koivisto @ 2008-11-03 22:24 UTC (permalink / raw)
  To: git; +Cc: Dmitry Potapov, Jeff King

Dmitry Potapov <dpotapov@gmail.com> writes:

> On Mon, Nov 03, 2008 at 05:05:24PM +0200, Hannu Koivisto wrote:
>
> core.autocrlf was exactly meant to be set globally. Basically,
> it says what end-of-line should be on your system. It is strange
> to have it different for different repositories.

Maybe so from the point of view of what it was intended for, but
since there is nothing else that could be used to control end of
line conversion on a repository basis, it certainly doesn't feel
strange to me to use it like that.

When you clone & checkout, say, the official git repository on
Windows, are you comfortable doing it with core.autocrlf globally
set to true?  Maybe you know it's fine to do that so you actually
are but I'm not.  How about some other random free software
mainly-Unix project you would like to develop / build under
Windows?  Even if text vs. binary autodetection worked perfectly
(and it doesn't), CRLF line ends may still be the wrong choice for
some project.  I recall one such project and while admittedly the
situation with it may have changed since I last used it, that
doesn't change the point.

I certainly wouldn't want to have core.autocrlf globally set to
true on Windows.  No automatic conversion is a much safer default.
I only want CRLF conversion to happen with projects that have
actually considered such checkouts and if necessary have been
carefully set up to support it by using .gitattributes.

>> I also observed this problem:
>> 
>> # Pretend someone does this on Unix
>> mkdir test1
>> cd test1
>> git init
>> echo "*.c crlf" > .gitattributes
>> echo -en "foo\r\nfoo\r\nfoo\r\n" > kala.c
>
> The 'crlf' attribute means that the file should be treated as 'text'
> without applying heuristic. The correct ending for text files on Unix
> is '\n', not '\r\n'.  So, you put a text file with incorrect ending,
> not surprisingly it causes problems for Windows users later.

It seems to me you are looking at this, too, from the technical
point of view.  Yes, given the way CRLF support is implemented, the
end result was expected.  But that doesn't mean it was ok from the
user's point of view.  Consider usability instead.  A user makes a
mistake and adds a file from a colleague who uses Windows without
first converting it.  Are you really saying "so he made a mistake,
who cares if repository users face problems"?  I think it's just
very bad usability that by making such a small mistake you cause
the system to end up in a state that doesn't make any sense,
i.e. git claims you have modifications right after clone & checkout
even though you haven't modified anything.

-- 
Hannu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: CRLF support bugs (was: Re: .gitattributes glob matching broken)
  2008-11-03 15:05   ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Hannu Koivisto
  2008-11-03 15:25     ` CRLF support bugs Hannu Koivisto
  2008-11-03 16:46     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Dmitry Potapov
@ 2008-11-04  5:14     ` Jeff King
  2008-11-04 12:37       ` CRLF support bugs (was: Re: .gitattributes glob matchingbroken) Kelly F. Hickel
  2 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2008-11-04  5:14 UTC (permalink / raw)
  To: Hannu Koivisto; +Cc: git

On Mon, Nov 03, 2008 at 05:05:24PM +0200, Hannu Koivisto wrote:

> I suspect one part of that "oddness" was caused by git applying its
> heuristics in checkout as it doesn't use .gitattributes at that
> time.

It _does_ apply them in checkout, you just didn't have a .gitattributes
file yet. So it is part of the same problem.

> For example, it seems that it recognized some of my .sh
> files as text files and the rest as binary files.  I suppose I was
> correct to assume that it would be stupid to rely on git guessing
> file type and the only sensible way is to use .gitattributes.  If

I think it depends on what's in your scripts, since many people have not
had trouble with the auto-detection. Perhaps some are UTF-16 which
contain NULs?

If the auto-detection is not working, I am sure people would love to see
samples of what fooled it (since it is, after all, just a guess, and we
would like to make the guess more accurate).

> > To "fix" this, you can then do a "git reset --hard" which will respect
> > your .gitattributes (since it is now checked out). And further file
> > creation and checkout should work OK.
> 
> Since I'm trying to launch git in a company environment, I think I
> can't rely on people remembering to do that.

Oh, absolutely. I think this is a shortcoming in git. The reset is
simply a workaround until it is actually fixed.

> Actually, even if .gitattributes were applied in checkout, I think
> the whole CRLF support is broken by design because people will have
> to remember to use -n in clone, then enable core.autocrlf support
> and then checkout.  This makes it unneccessarily complicated to

Yes, that is a little bit annoying. I think there are four options:

  - people set core.autocrlf in their global ~/.gitconfig. The downside,
    as you mentioned, is that you might not want it for all projects

  - clone should take an extra "options" parameter which can set this up
    after doing the 'init'. Like:

      git clone -O core.autocrlf=true /path/to/repo

  - after setting autocrlf, people need to tell git to re-checkout with
    the updated settings. I don't know of a straightforward way to tell
    git everything needs to be updated. So I would do:

      git ls-files | xargs touch
      git reset --hard

    which is not ideal. Probably some sort of "re-checkout" option to
    git-checkout would be better.

  - you could do this "re-checkout" automagically when core.autocrlf is
    set via "git config". There are two obvious problems with this
    magic, though:

      - that may not be what the user wants, if they have work in
        progress in the directory. And normally calling "git config"
        has no such side effects, so it is certainly unexpected.

      - we don't even know when the config is updated, since the user
        may simply edit the file behind git's back

    So that is a little too magic for my taste.

> I think CRLF conversion support should have some attribute (be it
> .gitattributes attribute or something else) that is somehow
> inherited from the parent repository.  It would basically say that
> "you should use platform's native line end type for text files with
> this repository and its children".  To go with that, one would
> maybe have a configuration option to tell what that platform
> default line end type is (just in case someone wants to pretend
> Cygwin is Unix or something like that).

I think others have complained before about something like this, in that
it really is a _local_ decision and not a _project_ decision to make. I
am fortunate enough to work exclusively on platforms with sane line
endings, so I don't know what is normal.

But if you really wanted to do such a thing for some set of corporate
users, maybe it would make sense to have a "clone" hook that runs after
init and can set up any relevant config (e.g., by copying certain config
values from the parent repo).

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: CRLF support bugs (was: Re: .gitattributes glob matchingbroken)
  2008-11-04  5:14     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Jeff King
@ 2008-11-04 12:37       ` Kelly F. Hickel
  2008-11-05  3:07         ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Kelly F. Hickel @ 2008-11-04 12:37 UTC (permalink / raw)
  To: Jeff King, Hannu Koivisto; +Cc: git

> -----Original Message-----
> From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On
> Behalf Of Jeff King
> Sent: Monday, November 03, 2008 11:15 PM
> To: Hannu Koivisto
> Cc: git@vger.kernel.org
> Subject: Re: CRLF support bugs (was: Re: .gitattributes glob
> matchingbroken)
> 
> On Mon, Nov 03, 2008 at 05:05:24PM +0200, Hannu Koivisto wrote:
> 
<snip>
> > I think CRLF conversion support should have some attribute (be it
> > .gitattributes attribute or something else) that is somehow
> > inherited from the parent repository.  It would basically say that
> > "you should use platform's native line end type for text files with
> > this repository and its children".  To go with that, one would
> > maybe have a configuration option to tell what that platform
> > default line end type is (just in case someone wants to pretend
> > Cygwin is Unix or something like that).
> 
> I think others have complained before about something like this, in
> that
> it really is a _local_ decision and not a _project_ decision to make. I
> am fortunate enough to work exclusively on platforms with sane line
> endings, so I don't know what is normal.

From my point of view, the factoid that a particular file should be subjected to having its line endings munged is a _project_ decision.  Whether or not to munge them on any given platform is a _local_ decision.

I work on various UNIXes, Linux, Windows, z/OS, etc, etc, and I want the tool to just do the right thing so that I don't have to think about it on a daily basis.

My $0.02....

-Kelly 

> 
> But if you really wanted to do such a thing for some set of corporate
> users, maybe it would make sense to have a "clone" hook that runs after
> init and can set up any relevant config (e.g., by copying certain
> config
> values from the parent repo).
> 
> -Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: CRLF support bugs (was: Re: .gitattributes glob matchingbroken)
  2008-11-04 12:37       ` CRLF support bugs (was: Re: .gitattributes glob matchingbroken) Kelly F. Hickel
@ 2008-11-05  3:07         ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2008-11-05  3:07 UTC (permalink / raw)
  To: Kelly F. Hickel; +Cc: Hannu Koivisto, git

On Tue, Nov 04, 2008 at 06:37:27AM -0600, Kelly F. Hickel wrote:

> From my point of view, the factoid that a particular file should be
> subjected to having its line endings munged is a _project_ decision.
> Whether or not to munge them on any given platform is a _local_
> decision.

Now that you spell it out, I am reminded that that is the argument I
remember having seen in past discussions. And of course, that argues for
.gitattributes being carried by the project, but the core.autocrlf
_config_ being a local decision.

Which, perhaps not coincidentally, is how it is currently implemented. :)

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-11-05  3:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-02 16:33 .gitattributes glob matching broken Hannu Koivisto
2008-11-03  9:09 ` Jeff King
2008-11-03 15:05   ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Hannu Koivisto
2008-11-03 15:25     ` CRLF support bugs Hannu Koivisto
2008-11-03 16:46     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Dmitry Potapov
2008-11-03 22:24       ` CRLF support bugs Hannu Koivisto
2008-11-04  5:14     ` CRLF support bugs (was: Re: .gitattributes glob matching broken) Jeff King
2008-11-04 12:37       ` CRLF support bugs (was: Re: .gitattributes glob matchingbroken) Kelly F. Hickel
2008-11-05  3:07         ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox