* What's cooking extra @ 2010-05-19 14:33 Junio C Hamano 2010-05-19 15:12 ` A Large Angry SCM ` (3 more replies) 0 siblings, 4 replies; 35+ messages in thread From: Junio C Hamano @ 2010-05-19 14:33 UTC (permalink / raw) To: git First, let me thank everybody who participated in the list traffic for the past few weeks, reporting problems, answering questions, raising issues, discussing them, sending patches and giving feedback to them, while I was away. I took a new day-job (yesterday was my second day) and haven't quite adjusted yet, but I finally managed to find some time and energy to browse through the list traffic and even queued a handful of topics. I expect I'll be more productive and back to speed in a week or two, but until then I expect to still be slower than my usual self. Here are the topics I've picked up so far (excluding the ones that were trivially and obviously correct that went directly to 'maint' or 'master'): * ec/diff-noprefix-config (2010-05-02) 1 commit - diff: add configuration option for disabling diff prefixes. * jk/diff-m-doc (2010-05-08) 1 commit - docs: clarify meaning of -M for git-log * mc/maint-zoneparse (2010-05-17) 1 commit - Add "Z" as an alias for the timezone "UTC" * mg/notes-dry-run (2010-05-14) 1 commit - notes: dry-run and verbose options for prune * mg/rev-parse-lrbranches-locals (2010-05-14) 1 commit - revlist: Introduce --lrbranches and --locals revision specifiers (this branch uses mg/rev-parse-option-sifter-deprecation.) * mg/rev-parse-option-sifter-deprecation (2010-05-14) 3 commits - t6018: make sure all tested symbolic names are different revs - t6018: add tests for rev-list's --branches and --tags - rev-parse: deprecate use as an option sifter (this branch is used by mg/rev-parse-lrbranches-locals.) I am aware of the following topics, that are probably all worthy of inclusion at some point, but am unclear in what status their discussions are. I'd appreciate it if people can help me come up with a list of topics that are fully discussed, and if patch submitters of these topics can re-send the final "to apply" copy. * (Eyvind Bernhardsen and Linus) Fixing the behaviour of crlf attribute; ignoring them when core.autocrlf is not in effect was a wrong design decision. I agree with what Linus said in the thread; I haven't yet looked at the discussion in the past few days. Also I don't know where '[PATCH v2] Add "core.eol" config variable' fits in the picture. * (Chris Lamb, Jeff King, Thomas Rast) "rebase -i" mishandles a patch with backslash in the title * (Rene) grep on binary files * (Linus) "git show ':/this is now a regex'" * (Gary V. Vaughan) Portability patches * (Ævar Arnfjörð Bjarmason) cvsserver updates * (Bo Yang, Thomas Rast) "log --graph" improvements * (Pavan Kumar Sunkara) instaweb and web--browse * (Yann Droneaud, Linus) matching utf8 locale in t9129 ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-19 14:33 What's cooking extra Junio C Hamano @ 2010-05-19 15:12 ` A Large Angry SCM 2010-05-19 17:06 ` Finn Arne Gangstad ` (2 subsequent siblings) 3 siblings, 0 replies; 35+ messages in thread From: A Large Angry SCM @ 2010-05-19 15:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Junio C Hamano wrote: [...] > > I took a new day-job (yesterday was my second day) and haven't quite > adjusted yet, but I finally managed to find some time and energy to browse > through the list traffic and even queued a handful of topics. I expect > I'll be more productive and back to speed in a week or two, but until then > I expect to still be slower than my usual self. > This is interesting news. Congratulations on the new job! It must be a good one to get you to leave southern California. Is your new employer subsidizing your git work similar to your previous employer? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-19 14:33 What's cooking extra Junio C Hamano 2010-05-19 15:12 ` A Large Angry SCM @ 2010-05-19 17:06 ` Finn Arne Gangstad 2010-05-19 20:09 ` Eyvind Bernhardsen 2010-05-22 13:09 ` Clemens Buchacher 2010-05-21 16:16 ` Ævar Arnfjörð Bjarmason 2010-05-22 21:24 ` René Scharfe 3 siblings, 2 replies; 35+ messages in thread From: Finn Arne Gangstad @ 2010-05-19 17:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, May 19, 2010 at 07:33:32AM -0700, Junio C Hamano wrote: > * (Eyvind Bernhardsen and Linus) Fixing the behaviour of crlf attribute; > ignoring them when core.autocrlf is not in effect was a wrong design > decision. > > I agree with what Linus said in the thread; I haven't yet looked at the > discussion in the past few days. Also I don't know where '[PATCH v2] > Add "core.eol" config variable' fits in the picture. I think this one is pretty much discussed by now, with the latest changes this should do pretty much what Linus wanted. Together with the autocrlf patch, this should make CRLF handling on Windows and mixed Linux/Windows setups a lot better! - Finn Arne ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-19 17:06 ` Finn Arne Gangstad @ 2010-05-19 20:09 ` Eyvind Bernhardsen 2010-05-22 13:09 ` Clemens Buchacher 1 sibling, 0 replies; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-19 20:09 UTC (permalink / raw) To: Junio C Hamano; +Cc: git@vger.kernel.org List, Finn Arne Gangstad On 19. mai 2010, at 19.06, Finn Arne Gangstad wrote: > On Wed, May 19, 2010 at 07:33:32AM -0700, Junio C Hamano wrote: > >> * (Eyvind Bernhardsen and Linus) Fixing the behaviour of crlf attribute; >> ignoring them when core.autocrlf is not in effect was a wrong design >> decision. >> >> I agree with what Linus said in the thread; I haven't yet looked at the >> discussion in the past few days. Also I don't know where '[PATCH v2] >> Add "core.eol" config variable' fits in the picture. > > I think this one is pretty much discussed by now, with the latest > changes this should do pretty much what Linus wanted. Together with > the autocrlf patch, this should make CRLF handling on Windows and > mixed Linux/Windows setups a lot better! Yes, it's done. I'll resend the series with the core.eol patch included. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-19 17:06 ` Finn Arne Gangstad 2010-05-19 20:09 ` Eyvind Bernhardsen @ 2010-05-22 13:09 ` Clemens Buchacher 2010-05-22 19:42 ` Eyvind Bernhardsen 1 sibling, 1 reply; 35+ messages in thread From: Clemens Buchacher @ 2010-05-22 13:09 UTC (permalink / raw) To: Finn Arne Gangstad; +Cc: Junio C Hamano, git, Eyvind Bernhardsen On Wed, May 19, 2010 at 07:06:56PM +0200, Finn Arne Gangstad wrote: > On Wed, May 19, 2010 at 07:33:32AM -0700, Junio C Hamano wrote: > > > * (Eyvind Bernhardsen and Linus) Fixing the behaviour of crlf attribute; > > ignoring them when core.autocrlf is not in effect was a wrong design > > decision. > > > > I agree with what Linus said in the thread; I haven't yet looked at the > > discussion in the past few days. Also I don't know where '[PATCH v2] > > Add "core.eol" config variable' fits in the picture. > > I think this one is pretty much discussed by now, with the latest > changes this should do pretty much what Linus wanted. That is not the impression I got. Linus was objecting to the idea of new attribute and configuration variables, which essentially do the same thing but with slightly different semantics. As soon as the existing crlf attribute is given priority over core.autocrlf, all the problems discussed originally go away. So what exactly are the new attributes supposed to do? Also, could you post a truth table for all the parameters involved (eol, crlf, core.autocrlf, core.eol). The documentation in the patches is too confusing for me to understand even that. And, renaming the crlf attribute to text? Where did Linus suggest that? If we do that, we don't even have to talk about backwards compatibility any more. Clemens ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-22 13:09 ` Clemens Buchacher @ 2010-05-22 19:42 ` Eyvind Bernhardsen 2010-05-22 22:27 ` Clemens Buchacher 0 siblings, 1 reply; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-22 19:42 UTC (permalink / raw) To: Clemens Buchacher; +Cc: Finn Arne Gangstad, Junio C Hamano, git On 22. mai 2010, at 15.09, Clemens Buchacher wrote: > On Wed, May 19, 2010 at 07:06:56PM +0200, Finn Arne Gangstad wrote: >> On Wed, May 19, 2010 at 07:33:32AM -0700, Junio C Hamano wrote: >> >>> * (Eyvind Bernhardsen and Linus) Fixing the behaviour of crlf attribute; >>> ignoring them when core.autocrlf is not in effect was a wrong design >>> decision. >>> >>> I agree with what Linus said in the thread; I haven't yet looked at the >>> discussion in the past few days. Also I don't know where '[PATCH v2] >>> Add "core.eol" config variable' fits in the picture. >> >> I think this one is pretty much discussed by now, with the latest >> changes this should do pretty much what Linus wanted. > > That is not the impression I got. Linus was objecting to the idea of new > attribute and configuration variables, which essentially do the same thing > but with slightly different semantics. > > As soon as the existing crlf attribute is given priority over core.autocrlf, > all the problems discussed originally go away. So what exactly are the new > attributes supposed to do? There is one new attribute, "eol", that is used for files which need a specific line ending. Being able to "force" LF or CRLF line endings has been requested several times on the list, and is already sort of provided for by "crlf=input". Linus had lots of strong objections, but I also made lots of changes during the course of the discussion. > Also, could you post a truth table for all the parameters involved (eol, > crlf, core.autocrlf, core.eol). The documentation in the patches is too > confusing for me to understand even that. I'll do my best. Basically there are two things to keep track of: "should this file be normalized as text?" and "which line endings should this file have in the working directory?". There is an attribute for each, and two configuration variables that set the default values of the attributes. Unfortunately my mail client mangles ascii art, so I can't do a table. This will have to suffice: Any file with the "text" attribute set will have its line endings normalized to LF in the repository. If "text" is set to the special value "auto", git will only convert the file if it looks like a text file. The "eol" attribute is used for files that need a specific line ending. Setting it also sets "text". core.eol controls which line endings to use for normalized files that don't have the "eol" attribute set, and defaults to the platform native line ending. When core.autocrlf is set, the default value of the "text" attribute is set to "auto" but with an extra safety feature: if a file contains CRs in the index, it won't be normalized. The extra feature comes from Finn Arne's "safe autocrlf" patch. There is a backwards compatibility wrinkle in that core.autocrlf will override core.eol if the latter isn't explicitly set, so that "core.autocrlf=true" still results in CRLFs in the working directory on Linux. > And, renaming the crlf attribute to text? Where did Linus suggest that? If > we do that, we don't even have to talk about backwards compatibility any > more. In <alpine.LFD.2.00.1005121824260.3711@i5.linux-foundation.org>: > So if you rename these things, keep them separate. Make the "am I a > text-file" boolean be a boolean (plus "auto"), and just call it "text". > And make the "what end of line to use" be just "eol" then. So he didn't suggest renaming it, but he did suggest a better name and UI than the one I came up with. I expected objections to renaming the attribute, which is why that is a separate commit, but people seemed supportive overall. The "crlf" attribute will be used if it is present so backwards compatibility is preserved to a degree. Scripts that test for the "crlf" attribute explicitly (such as git-cvsserver, which I fixed) will break. I don't know how big a problem that is going to be in practice, but nobody raised it as an issue during the discussion. Compatibility with older clients is a valid concern, but older clients will ignore "crlf" as well as "text" unless core.autocrlf is set, so they will cause problems no matter what. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-22 19:42 ` Eyvind Bernhardsen @ 2010-05-22 22:27 ` Clemens Buchacher 2010-05-23 10:36 ` Eyvind Bernhardsen 0 siblings, 1 reply; 35+ messages in thread From: Clemens Buchacher @ 2010-05-22 22:27 UTC (permalink / raw) To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, Junio C Hamano, git Hi Eyvind, Thanks for the extended summary. I still have several doubts, as detailed below. But I understand that this has been heavily discussed and if the discussion has indeed come to a conclusion, then I will not complain about it now. On Sat, May 22, 2010 at 09:42:14PM +0200, Eyvind Bernhardsen wrote: > On 22. mai 2010, at 15.09, Clemens Buchacher wrote: > > > As soon as the existing crlf attribute is given priority over > > core.autocrlf, all the problems discussed originally go away. > > So what exactly are the new attributes supposed to do? For all my comments below I am assuming that the behavior of autocrlf will be changed to respect the crlf/text attribute by default. > There is one new attribute, "eol", that is used for files which > need a specific line ending. Being able to "force" LF or CRLF > line endings has been requested several times on the list, and is > already sort of provided for by "crlf=input". [...] > Any file with the "text" attribute set will have its line endings > normalized to LF in the repository. If "text" is set to the > special value "auto", git will only convert the file if it looks > like a text file. > > The "eol" attribute is used for files that need a specific line > ending. Setting it also sets "text". If a file needs specific line endings, why enable conversion for this file at all? Just make sure the repository contains the correct version and unset the crlf attribute. > core.eol controls which line endings to use for normalized files > that don't have the "eol" attribute set, and defaults to the > platform native line ending. That makes sense to me, except for the part where I need a per-file attribute. > When core.autocrlf is set, the default value of the "text" > attribute is set to "auto" but with an extra safety feature: if a > file contains CRs in the index, it won't be normalized. The > extra feature comes from Finn Arne's "safe autocrlf" patch. > > There is a backwards compatibility wrinkle in that core.autocrlf > will override core.eol if the latter isn't explicitly set, so > that "core.autocrlf=true" still results in CRLFs in the working > directory on Linux. This also makes sense. I just fear that making this frequently misunderstood feature even more complex will only confuse users further. I do see the value of a global core.eol option, however, since it allows me to convert to LF instead of CRLF, which AFAIK is not currently possible. On the other hand, this will cause users to stop caring whether or not a file in the repository has LF or CRLF line endings. I am not sure if that is a good thing, but I suppose it is better than what we have now. > > And, renaming the crlf attribute to text? Where did Linus suggest that? If > > we do that, we don't even have to talk about backwards compatibility any > > more. > > In <alpine.LFD.2.00.1005121824260.3711@i5.linux-foundation.org>: > > So if you rename these things, keep them separate. Make the "am I a > > text-file" boolean be a boolean (plus "auto"), and just call it "text". > > And make the "what end of line to use" be just "eol" then. I see. Well, if we rename the "crlf" attribute then we will have a macro attribute "binary" and an attribute "text", which are not the opposite of each other. That is a bit strange. > The "crlf" attribute will be used if it is present so backwards > compatibility is preserved to a degree. Ah, ok. That is fine then. > Scripts that test for > the "crlf" attribute explicitly (such as git-cvsserver, which I > fixed) will break. I don't know how big a problem that is going > to be in practice, but nobody raised it as an issue during the > discussion. I agree that should not be a big issue. Regards, Clemens ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-22 22:27 ` Clemens Buchacher @ 2010-05-23 10:36 ` Eyvind Bernhardsen 2010-05-23 11:51 ` Clemens Buchacher 0 siblings, 1 reply; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-23 10:36 UTC (permalink / raw) To: Clemens Buchacher; +Cc: Finn Arne Gangstad, Junio C Hamano, git On 23. mai 2010, at 00.27, Clemens Buchacher wrote: > Hi Eyvind, > > Thanks for the extended summary. I still have several doubts, as > detailed below. But I understand that this has been heavily > discussed and if the discussion has indeed come to a conclusion, > then I will not complain about it now. I appreciate your comments. The more people who take a critical look at the series, the better :) I think most of your concerns were covered during the discussion, but it's always good to have more sanity checks. [...] > For all my comments below I am assuming that the behavior of > autocrlf will be changed to respect the crlf/text attribute by > default. That's right, the attribute is used regardless of what autocrlf is set to. >> The "eol" attribute is used for files that need a specific line >> ending. Setting it also sets "text". > > If a file needs specific line endings, why enable conversion for > this file at all? Just make sure the repository contains the > correct version and unset the crlf attribute. Yeah, that's what I initially thought too, but it makes sense to be able to use normalization to prevent line ending breakages in your repository. If a file needs CRLFs for some tool to work, you don't want anyone to inadvertently convert it to LF, and "eol=crlf" makes git enforce that. [...] >> There is a backwards compatibility wrinkle in that core.autocrlf >> will override core.eol if the latter isn't explicitly set, so >> that "core.autocrlf=true" still results in CRLFs in the working >> directory on Linux. > > This also makes sense. I just fear that making this frequently > misunderstood feature even more complex will only confuse users > further. I agree, but I really wanted to avoid breaking any settings. I'm thinking of submitting a patch to remove the backwards compatibility for 1.8, leaving core.autocrlf as a simple boolean meaning "* text=auto". > I do see the value of a global core.eol option, however, since it > allows me to convert to LF instead of CRLF, which AFAIK is not > currently possible. Actually, since git normalizes to LF, "eol=lf" simply means "convert on input but not on output", which is what "core.autocrlf=input" currently does. The fact that you didn't know this reflects the poor usability of core.autocrlf, which is one of the things this series is trying to rectify :) It also allows for the possibility of supporting other line endings such as the classic Mac "CR only". That might not be a good thing. > On the other hand, this will cause users to stop caring whether or > not a file in the repository has LF or CRLF line endings. I am not > sure if that is a good thing, but I suppose it is better than what > we have now. The problem is that users don't care anyway, and they've been trained by other VCSes that they shouldn't need to care. This series allows me to put "* text=auto" in my repository and git will automatically normalize text files regardless of the user's configuration, which wasn't possible before. [...] > I see. Well, if we rename the "crlf" attribute then we will have a > macro attribute "binary" and an attribute "text", which are not the > opposite of each other. That is a bit strange. Yes, but I don't think it should cause any problems in practice. Since the "binary" attribute just means "-text -diff" it will override a "* text=auto", which I is the only relevant interaction I can see. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-23 10:36 ` Eyvind Bernhardsen @ 2010-05-23 11:51 ` Clemens Buchacher 2010-05-23 12:53 ` Eyvind Bernhardsen 2010-05-24 12:12 ` Dmitry Potapov 0 siblings, 2 replies; 35+ messages in thread From: Clemens Buchacher @ 2010-05-23 11:51 UTC (permalink / raw) To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, Junio C Hamano, git On Sun, May 23, 2010 at 12:36:51PM +0200, Eyvind Bernhardsen wrote: > On 23. mai 2010, at 00.27, Clemens Buchacher wrote: > > >> The "eol" attribute is used for files that need a specific line > >> ending. Setting it also sets "text". > > > > If a file needs specific line endings, why enable conversion for > > this file at all? Just make sure the repository contains the > > correct version and unset the crlf attribute. > > Yeah, that's what I initially thought too, but it makes sense to > be able to use normalization to prevent line ending breakages in > your repository. If a file needs CRLFs for some tool to work, > you don't want anyone to inadvertently convert it to LF, and > "eol=crlf" makes git enforce that. Unsetting crlf/text already disables converting it to LF. The user would have to change the line endings in his work tree and commit the file with wrong line endings. I do not see how this can happen inadvertently. > > I do see the value of a global core.eol option, however, since it > > allows me to convert to LF instead of CRLF, which AFAIK is not > > currently possible. > > Actually, since git normalizes to LF, "eol=lf" simply means > "convert on input but not on output", which is what > "core.autocrlf=input" currently does. The fact that you didn't > know this reflects the poor usability of core.autocrlf, which is > one of the things this series is trying to rectify :) No, I am aware of autocrlf=input, but apparently I did not understand the meaning of eol=lf correctly. So if a file has CRLF endings in the repository, and eol=lf, it will _not_ be converted to LF in the work tree? Conversely, if it has LF endings in the repository, and eol=crlf, it _will_ be converted to CRLF in the work tree? I was expecting eol=lf and eol=crlf to be symmetric, which is also the reason for my reply to Finn's safe crlf patch. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-23 11:51 ` Clemens Buchacher @ 2010-05-23 12:53 ` Eyvind Bernhardsen 2010-05-23 13:26 ` Ævar Arnfjörð Bjarmason 2010-05-24 9:49 ` Clemens Buchacher 2010-05-24 12:12 ` Dmitry Potapov 1 sibling, 2 replies; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-23 12:53 UTC (permalink / raw) To: Clemens Buchacher; +Cc: Finn Arne Gangstad, Junio C Hamano, git On 23. mai 2010, at 13.51, Clemens Buchacher wrote: > On Sun, May 23, 2010 at 12:36:51PM +0200, Eyvind Bernhardsen wrote: >> Yeah, that's what I initially thought too, but it makes sense to >> be able to use normalization to prevent line ending breakages in >> your repository. If a file needs CRLFs for some tool to work, >> you don't want anyone to inadvertently convert it to LF, and >> "eol=crlf" makes git enforce that. > > Unsetting crlf/text already disables converting it to LF. The user > would have to change the line endings in his work tree and commit > the file with wrong line endings. I do not see how this can happen > inadvertently. That's because you don't use (or work with people who use) editors that mangle line endings without asking. Modern versions of vi and Emacs don't do this, but the problem still exists in popular Windows editors. I don't have strong feelings about the need for this feature, but it has been requested so it's probably useful. [...] > No, I am aware of autocrlf=input, but apparently I did not > understand the meaning of eol=lf correctly. So if a file has CRLF > endings in the repository, and eol=lf, it will _not_ be converted > to LF in the work tree? Conversely, if it has LF endings in the > repository, and eol=crlf, it _will_ be converted to CRLF in the > work tree? That is correct, but "eol=lf" means that the file _should_ be LF-only in the repository. If it isn't, the repository is "corrupted". Such a file is marked as dirty when it is checked out and will be normalized to LF-only line endings when it is committed, at which point the repository will be fixed. This should only be a problem if you set the "text" or "eol" attributes in an existing repository, or if someone adds CRLFs to a normalized file using an older version of git. I'm sorry if I was unclear about Finn Arne's patch: "safe autocrlf" only applies when files are normalized due to core.autocrlf. If "text" or "eol" is set on a file, the file is normalized to LF-only regardless of whether it contains CRs in the repository. > I was expecting eol=lf and eol=crlf to be symmetric, which is also > the reason for my reply to Finn's safe crlf patch. I think they are symmetric in practice. While it's possible to get the repository into a state where you will get CRLFs in a text file that should be LF-only, it is both unlikely and easily fixed. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-23 12:53 ` Eyvind Bernhardsen @ 2010-05-23 13:26 ` Ævar Arnfjörð Bjarmason 2010-05-24 9:49 ` Clemens Buchacher 1 sibling, 0 replies; 35+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-05-23 13:26 UTC (permalink / raw) To: Eyvind Bernhardsen Cc: Clemens Buchacher, Finn Arne Gangstad, Junio C Hamano, git On Sun, May 23, 2010 at 12:53, Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> wrote: > I don't have strong feelings about the need for this feature, but it > has been requested so it's probably useful. It is, sanity checks for broken user tools like these are unfortunately important for Windows adoption of Git. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-23 12:53 ` Eyvind Bernhardsen 2010-05-23 13:26 ` Ævar Arnfjörð Bjarmason @ 2010-05-24 9:49 ` Clemens Buchacher 2010-05-24 12:47 ` Dmitry Potapov 2010-05-24 21:11 ` Eyvind Bernhardsen 1 sibling, 2 replies; 35+ messages in thread From: Clemens Buchacher @ 2010-05-24 9:49 UTC (permalink / raw) To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, Junio C Hamano, git On Sun, May 23, 2010 at 02:53:18PM +0200, Eyvind Bernhardsen wrote: > On 23. mai 2010, at 13.51, Clemens Buchacher wrote: > > > On Sun, May 23, 2010 at 12:36:51PM +0200, Eyvind Bernhardsen wrote: > >> Yeah, that's what I initially thought too, but it makes sense to > >> be able to use normalization to prevent line ending breakages in > >> your repository. If a file needs CRLFs for some tool to work, > >> you don't want anyone to inadvertently convert it to LF, and > >> "eol=crlf" makes git enforce that. > > > > Unsetting crlf/text already disables converting it to LF. The user > > would have to change the line endings in his work tree and commit > > the file with wrong line endings. I do not see how this can happen > > inadvertently. > > That's because you don't use (or work with people who use) > editors that mangle line endings without asking. That I can imagine. But due to their change, the file will not work any more, not even on their own platform. What did they even make the change for then? > I don't have strong feelings about the need for this feature, but > it has been requested so it's probably useful. Just because a feature is requested doesn't mean it's useful, or even harmless. This has nothing to do with version control in the first place, so I do not see why we should suffer the additional complication. > > No, I am aware of autocrlf=input, but apparently I did not > > understand the meaning of eol=lf correctly. So if a file has CRLF > > endings in the repository, and eol=lf, it will _not_ be converted > > to LF in the work tree? Conversely, if it has LF endings in the > > repository, and eol=crlf, it _will_ be converted to CRLF in the > > work tree? > > That is correct, but "eol=lf" means that the file _should_ be > LF-only in the repository. If it isn't, the repository is > "corrupted". Such a file is marked as dirty when it is checked > out and will be normalized to LF-only line endings when it is > committed, at which point the repository will be fixed. With CRLF file in the repository, core.autocrlf=true and core.eol=lf, I tested the patches currently in pu (0ed6711a) in the following three scenarios: 1. no attributes are set 2. text attribute is set or auto 3. eol attribute is set to lf In the first scenario, the behavior is completely asymmetric. LF files will be converted to CRLF, if core.eol=crlf, but CRLF files will _not_ be converted to LF, even if core.eol=lf. And it will not be marked as dirty either. Yet this is the default behavior in terms of attributes. The only justification I can think of for this behavior is the fact that on platforms with LF line endings, most tools can deal with CRLF line endings. Not very convincing. In the second scenario, the file is marked as dirty. Neither reset --hard nor checkout HEAD . fix the problem. The file has to be added and committed, after which the line endings are _still_ CRLF. This appears to be the old autocrlf=true behavior. Is this intentional? The third scenario is similar to the second scenario, only it warns me about CRLF conversion during add and commit. The file still ends up having CRLF line endings in the work tree. git reset --hard does not fix the line endings. touch'ing the file finally makes git diff notice that the file is dirty, but git status still does not list the file. So to me, end of line conversion is still as confusing as it gets. > This should only be a problem if you set the "text" or "eol" > attributes in an existing repository, or if someone adds CRLFs to > a normalized file using an older version of git. In other words, this will be a problem all the time, since by default people will not even know about text or eol. Clemens ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 9:49 ` Clemens Buchacher @ 2010-05-24 12:47 ` Dmitry Potapov 2010-05-24 20:45 ` Eyvind Bernhardsen 2010-05-24 20:56 ` Clemens Buchacher 2010-05-24 21:11 ` Eyvind Bernhardsen 1 sibling, 2 replies; 35+ messages in thread From: Dmitry Potapov @ 2010-05-24 12:47 UTC (permalink / raw) To: Clemens Buchacher Cc: Eyvind Bernhardsen, Finn Arne Gangstad, Junio C Hamano, git On Mon, May 24, 2010 at 11:49:05AM +0200, Clemens Buchacher wrote: > > Just because a feature is requested doesn't mean it's useful, or > even harmless. This has nothing to do with version control in the > first place, so I do not see why we should suffer the additional > complication. IMHO, ability to enforce a specific for particular types of files is a good thing, and if you can say that these files must have CRLF, it should be possible to say that those files must have LF. At least, it would be a logic thing to do. I don't see how it can be harmful. > > With CRLF file in the repository, core.autocrlf=true and > core.eol=lf, I wonder if this combination should be allowed. core.autocrlf=true always implied that the native EOL is CRLF. So I do not think any reasonable behavior can be deduced for this combination. Can you imagine _anyone_ who would want to have such settings? Otherwise, it is better to error out if this combination is encountered. Dmitry ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 12:47 ` Dmitry Potapov @ 2010-05-24 20:45 ` Eyvind Bernhardsen 2010-05-24 20:56 ` Clemens Buchacher 1 sibling, 0 replies; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-24 20:45 UTC (permalink / raw) To: Dmitry Potapov; +Cc: Clemens Buchacher, Finn Arne Gangstad, Junio C Hamano, git On 24. mai 2010, at 14.47, Dmitry Potapov wrote: > On Mon, May 24, 2010 at 11:49:05AM +0200, Clemens Buchacher wrote: >> With CRLF file in the repository, core.autocrlf=true and >> core.eol=lf, > > I wonder if this combination should be allowed. core.autocrlf=true > always implied that the native EOL is CRLF. So I do not think any > reasonable behavior can be deduced for this combination. Can you > imagine _anyone_ who would want to have such settings? Otherwise, > it is better to error out if this combination is encountered. It errors out when core.autocrlf=input conflicts with core.eol, but allows an explicitly set core.eol to override (with no warning) when core.autocrlf=true. That way, the meaning of core.autocrlf can later be changed to simply enable normalization without touching the output format--unfortunately removing any sense to the name. Leaving core.autocrlf to mean what its name implies would require a new config variable (core.autotext?) for people who want automatic normalization but LFs in their working directory. I'll be semi-offline for the next week or so, so any update of the series will have to wait until I get back. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 12:47 ` Dmitry Potapov 2010-05-24 20:45 ` Eyvind Bernhardsen @ 2010-05-24 20:56 ` Clemens Buchacher 2010-05-24 21:09 ` Eyvind Bernhardsen 1 sibling, 1 reply; 35+ messages in thread From: Clemens Buchacher @ 2010-05-24 20:56 UTC (permalink / raw) To: Dmitry Potapov Cc: Eyvind Bernhardsen, Finn Arne Gangstad, Junio C Hamano, git On Mon, May 24, 2010 at 04:47:34PM +0400, Dmitry Potapov wrote: > On Mon, May 24, 2010 at 11:49:05AM +0200, Clemens Buchacher wrote: > > > With CRLF file in the repository, core.autocrlf=true and > > core.eol=lf, > > I wonder if this combination should be allowed. core.autocrlf=true > always implied that the native EOL is CRLF. I just did what the commit message suggested: If core.eol is set explicitly (including setting it to "native"), it will override core.autocrlf so that [core] autocrlf = true eol = lf normalizes all files that look like text, but does not put CRLFs in the working directory. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 20:56 ` Clemens Buchacher @ 2010-05-24 21:09 ` Eyvind Bernhardsen 0 siblings, 0 replies; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-24 21:09 UTC (permalink / raw) To: Clemens Buchacher; +Cc: Dmitry Potapov, Finn Arne Gangstad, Junio C Hamano, git On 24. mai 2010, at 22.56, Clemens Buchacher wrote: > On Mon, May 24, 2010 at 04:47:34PM +0400, Dmitry Potapov wrote: > >> On Mon, May 24, 2010 at 11:49:05AM +0200, Clemens Buchacher wrote: >> >>> With CRLF file in the repository, core.autocrlf=true and >>> core.eol=lf, >> >> I wonder if this combination should be allowed. core.autocrlf=true >> always implied that the native EOL is CRLF. > > I just did what the commit message suggested: > > If core.eol is set explicitly (including setting it to "native"), it > will override core.autocrlf so that > > [core] > autocrlf = true > eol = lf > > normalizes all files that look like text, but does not put CRLFs in the > working directory. Right. While technically true, this is misleading in that it implies that you will get LF line endings on all your files. I will update the documentation to reflect that core.autocrlf is only useful if you want to work with CRLF line endings in a repository that is not normalized. I won't be able to do that for a week or so, unfortunately. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 9:49 ` Clemens Buchacher 2010-05-24 12:47 ` Dmitry Potapov @ 2010-05-24 21:11 ` Eyvind Bernhardsen 2010-05-24 22:11 ` Clemens Buchacher 1 sibling, 1 reply; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-24 21:11 UTC (permalink / raw) To: Clemens Buchacher; +Cc: Finn Arne Gangstad, Junio C Hamano, git On 24. mai 2010, at 11.49, Clemens Buchacher wrote: > With CRLF file in the repository, core.autocrlf=true and > core.eol=lf, I tested the patches currently in pu (0ed6711a) in the > following three scenarios: > > 1. no attributes are set > 2. text attribute is set or auto > 3. eol attribute is set to lf > > In the first scenario, the behavior is completely asymmetric. LF > files will be converted to CRLF, if core.eol=crlf, but CRLF files > will _not_ be converted to LF, even if core.eol=lf. And it will not > be marked as dirty either. Yet this is the default behavior in > terms of attributes. Default in terms of attributes maybe, but not in terms of configuration (you need to explicitly set core.eol to lf to see this). But I see your point, core.autocrlf just doesn't work very well unless you want CRLFs. > The only justification I can think of for this behavior is the fact > that on platforms with LF line endings, most tools can deal with > CRLF line endings. Not very convincing. Well, without "safe autocrlf", this would have worked as you expected, marking files as dirty and converting on checkin. Unfortunately, without "safe autocrlf", core.autocrlf just doesn't work in practice, PRECISELY because the person with core.autocrlf set has to clean up after everybody else. > In the second scenario, the file is marked as dirty. Neither reset > --hard nor checkout HEAD . fix the problem. The file has to be > added and committed, after which the line endings are _still_ CRLF. > This appears to be the old autocrlf=true behavior. Is this > intentional? Yes, since it uses the old autocrlf implementation. The documentation states that you need to convert your repository when enabling normalization, but is this going to be big problem in practice? Perhaps the working directory file should be converted to LF, but when? When it is added? When it's committed? If you checkout -f after you commit the change, the file should have the correct line ending, because then it is normalized in the repository and git knows what to do with it. > The third scenario is similar to the second scenario, only it warns > me about CRLF conversion during add and commit. The file still ends > up having CRLF line endings in the work tree. git reset --hard does > not fix the line endings. touch'ing the file finally makes git diff > notice that the file is dirty, but git status still does not list > the file. > > So to me, end of line conversion is still as confusing as it gets. All of your confusion seems to stem from starting with a repository that is not normalized and adding normalization to it. Yes, that is a pain, but there is no way to avoid that, and there is a recipe for fixing it in the documentation. >> This should only be a problem if you set the "text" or "eol" >> attributes in an existing repository, or if someone adds CRLFs to >> a normalized file using an older version of git. > > In other words, this will be a problem all the time, since by > default people will not even know about text or eol. By default people will not enable core.autocrlf either. I just don't see the huge problem here. If your repository does not need normalization, _good_! Nothing will ever be normalized or converted. If somebody wants to work on that repository but prefers CRLFs, they can enable core.autocrlf, and it will work sanely (thanks to "safe autocrlf"), converting text files that can safely be normalized back but leaving all other files alone. If you later discover that you want normalized text files in your repository, you _will_ have to convert all your files. That's a bit tricky, but it's not the huge problem you're making it out to be, and it only has to be done once. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 21:11 ` Eyvind Bernhardsen @ 2010-05-24 22:11 ` Clemens Buchacher 2010-05-25 6:41 ` Eyvind Bernhardsen 0 siblings, 1 reply; 35+ messages in thread From: Clemens Buchacher @ 2010-05-24 22:11 UTC (permalink / raw) To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, Junio C Hamano, git On Mon, May 24, 2010 at 11:11:40PM +0200, Eyvind Bernhardsen wrote: > If you later discover that you want normalized text files in your > repository, you _will_ have to convert all your files. That's a > bit tricky, but it's not the huge problem you're making it out to > be, and it only has to be done once. I am not just making this stuff up. These things have bitten me in the past, and there have been complaints about it in #git. And even after finding the solution I always felt like crlf handling in git was really broken. I was hoping that after enabling the new eol handling, these weird effects would go away, but obviously they don't. Maybe for you it does not seem like such a big deal, because you are now so familiar with the inner workings of this algorithm. But to a naive user like me it is very counter-intuitive. I am not saying that the new features are all wrong. Some of them really are a major improvement. My main point is that it is still confusing and that in itself will cause issues for many people. And I don't see why we cannot do better. In the first scenario of my previous post (no attributes set), since I already enable core.eol = lf, couldn't we handle that as if text=auto were set on every file? Isn't that what core.autocrlf = true means in this case? And once we normalized the file to LF, why don't we also checkout that version, or at least mark it as dirty in the index, so a reset --hard will fix it up? Regards, Clemens ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 22:11 ` Clemens Buchacher @ 2010-05-25 6:41 ` Eyvind Bernhardsen 2010-05-25 8:27 ` Anthony Youngman 2010-05-25 8:33 ` Clemens Buchacher 0 siblings, 2 replies; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-05-25 6:41 UTC (permalink / raw) To: Clemens Buchacher; +Cc: Finn Arne Gangstad, Junio C Hamano, git On 25. mai 2010, at 00.11, Clemens Buchacher wrote: > I am not just making this stuff up. These things have bitten me in > the past, and there have been complaints about it in #git. And even > after finding the solution I always felt like crlf handling in git > was really broken. I was hoping that after enabling the new eol > handling, these weird effects would go away, but obviously they > don't. Sorry, I was in a hurry and shouldn't have been so dismissive. I do think that the new features will help more than they cause trouble, though. [...] > And I don't see why we cannot do better. In the first scenario of > my previous post (no attributes set), since I already enable > core.eol = lf, couldn't we handle that as if text=auto were set on > every file? Isn't that what core.autocrlf = true means in this > case? Isn't that just the current core.autocrlf=input behaviour? The reason autocrlf was changed is because it breaks badly on repositories that have text files (ie, files that the autocrlf mechanism wants to normalize) that contain CRLFs in the repository. If you see someone who has problems with autocrlf in #git, it's almost certainly because autocrlf is on by default, they cloned a repository, and now a seemingly random selection of files are dirty and checking them out doesn't help. The "safe autocrlf" patch fixes this by not trying to normalize any files that are not already normalized in the index. This is what you noticed: the files do not show up as dirty and will not have their line endings converted. The tradeoff is that setting "core.autocrlf" no longer normalizes all text files, only new ones and ones that are already normalized. You (rightly) expected line endings to be normalized to LF when core.eol=lf, and I do need to fix that in the documentation. Safe autocrlf _only_ works if you want CRLF line endings in your working directory. A similar feature that converts text files that have CRLF in the repository to LF would need more development. I don't know how many people would use such a feature, and I would solve that problem by setting "* text=auto" and normalizing the repository instead. > And once we normalized the file to LF, why don't we also checkout > that version, or at least mark it as dirty in the index, so a reset > --hard will fix it up? I dunno. Won't it be even more confusing that the file is still dirty after you add it? The problem with converting it in the working directory when you add is that it loses information: if you didnt' want that file to be converted, there's no way to revert (this is very bad if it's a file that contained a mix of CRLF and LF). Maybe rewording the "[CR]LF will be replaced by [CR]LF in <file>" warning to tell the user how to get the working directory version normalized would be the best solution. As I said, though, this is a one-time problem: once your repository is normalized and has text attributes set, it will stay normalized. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-25 6:41 ` Eyvind Bernhardsen @ 2010-05-25 8:27 ` Anthony Youngman 2010-06-07 19:55 ` Eyvind Bernhardsen 2010-05-25 8:33 ` Clemens Buchacher 1 sibling, 1 reply; 35+ messages in thread From: Anthony Youngman @ 2010-05-25 8:27 UTC (permalink / raw) To: Eyvind Bernhardsen Cc: Clemens Buchacher, Finn Arne Gangstad, Junio C Hamano, git On 05/25/10 07:41, Eyvind Bernhardsen wrote: > The "safe autocrlf" patch fixes this by not trying to normalize any files that are not already normalized in the index. This is what you noticed: the files do not show up as dirty and will not have their line endings converted. The tradeoff is that setting "core.autocrlf" no longer normalizes all text files, only new ones and ones that are already normalized. > > You (rightly) expected line endings to be normalized to LF when core.eol=lf, and I do need to fix that in the documentation. Safe autocrlf _only_ works if you want CRLF line endings in your working directory. > Just a suggestion ... For core.autocrlf (or somewhere else more appropriate) could we add to false and true the option "force"? Bearing in mind "force" is always considered "a bit dangerous", that merely means "I don't care if it has crlf in the repository, change all commits to lf" (and checkouts to crlf if appropriate). Yep, things are likely to break, but I'm thinking this is the sort of situation where a lead dev could say to themselves "I know what I'm doing, we need to clean up, and if I set that as my options, then I know I can fix any resulting mess". Cheers, Wol ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-25 8:27 ` Anthony Youngman @ 2010-06-07 19:55 ` Eyvind Bernhardsen 0 siblings, 0 replies; 35+ messages in thread From: Eyvind Bernhardsen @ 2010-06-07 19:55 UTC (permalink / raw) To: Anthony Youngman Cc: Clemens Buchacher, Finn Arne Gangstad, Junio C Hamano, git Hi, I'm sorry I've taken so long to respond. On 25. mai 2010, at 10.27, Anthony Youngman wrote: > Just a suggestion ... > For core.autocrlf (or somewhere else more appropriate) could we add to > false and true the option "force"? > > Bearing in mind "force" is always considered "a bit dangerous", that > merely means "I don't care if it has crlf in the repository, change all > commits to lf" (and checkouts to crlf if appropriate). > > Yep, things are likely to break, but I'm thinking this is the sort of > situation where a lead dev could say to themselves "I know what I'm > doing, we need to clean up, and if I set that as my options, then I know > I can fix any resulting mess". If I understand you correctly, putting "* text=auto" in .gitattributes would do what you want, with the added benefit that you document the decision to normalize inside the repository. If you don't think that's a benefit, you can put "* text=auto" in .git/info/attributes for the same normalizing effect. -- Eyvind ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-25 6:41 ` Eyvind Bernhardsen 2010-05-25 8:27 ` Anthony Youngman @ 2010-05-25 8:33 ` Clemens Buchacher 1 sibling, 0 replies; 35+ messages in thread From: Clemens Buchacher @ 2010-05-25 8:33 UTC (permalink / raw) To: Eyvind Bernhardsen; +Cc: Finn Arne Gangstad, Junio C Hamano, git On Tue, May 25, 2010 at 08:41:14AM +0200, Eyvind Bernhardsen wrote: > On 25. mai 2010, at 00.11, Clemens Buchacher wrote: > > > And once we normalized the file to LF, why don't we also checkout > > that version, or at least mark it as dirty in the index, so a reset > > --hard will fix it up? > > I dunno. Won't it be even more confusing that the file is still > dirty after you add it? The problem with converting it in the > working directory when you add is that it loses information: if > you didnt' want that file to be converted, there's no way to > revert (this is very bad if it's a file that contained a mix of > CRLF and LF). I was trying to illustrate this with a test script, but after playing with this some more, I noticed that this is not a problem with your changes, since I can reproduce with git v1.7.0.5: #!/bin/sh cd $(mktemp -d) git init git config core.autocrlf input echo -n 'hi\r\nbye\r\n' > file git add file git commit -m initial # no output git diff --name-status sleep 1 touch file # file dirty git diff --name-status # at this point, reset --hard would convert 'file' to LF in the # work tree git update-index file # no output git diff --name-status The same behavior can be observed with the 'eol=lf' attribute, before and after normalizing the file. Regards, Clemens --- $ sh -x testcrlf2.sh + set -e + mktemp -d + cd /tmp/tmp.Smnx0U7IT1 + git init Initialized empty Git repository in /tmp/tmp.Smnx0U7IT1/.git/ + git config core.autocrlf input + echo -n hi\r\nbye\r\n + git add file warning: CRLF will be replaced by LF in file. + git commit -m initial [master (root-commit) d3be96f] initial warning: CRLF will be replaced by LF in file. 1 files changed, 2 insertions(+), 0 deletions(-) create mode 100644 file + git diff --name-status + sleep 1 + touch file + git diff --name-status M file + git update-index file warning: CRLF will be replaced by LF in file. + git diff --name-status ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-23 11:51 ` Clemens Buchacher 2010-05-23 12:53 ` Eyvind Bernhardsen @ 2010-05-24 12:12 ` Dmitry Potapov 2010-05-24 12:22 ` Erik Faye-Lund 1 sibling, 1 reply; 35+ messages in thread From: Dmitry Potapov @ 2010-05-24 12:12 UTC (permalink / raw) To: Clemens Buchacher Cc: Eyvind Bernhardsen, Finn Arne Gangstad, Junio C Hamano, git On Sun, May 23, 2010 at 01:51:27PM +0200, Clemens Buchacher wrote: > On Sun, May 23, 2010 at 12:36:51PM +0200, Eyvind Bernhardsen wrote: > > > > Actually, since git normalizes to LF, "eol=lf" simply means > > "convert on input but not on output", which is what > > "core.autocrlf=input" currently does. The fact that you didn't > > know this reflects the poor usability of core.autocrlf, which is > > one of the things this series is trying to rectify :) > > No, I am aware of autocrlf=input, but apparently I did not > understand the meaning of eol=lf correctly. So if a file has CRLF > endings in the repository, and eol=lf, it will _not_ be converted > to LF in the work tree? Conversely, if it has LF endings in the > repository, and eol=crlf, it _will_ be converted to CRLF in the > work tree? All text files should LF in the repository, if some file does not, it means the repository is corrupted in respect of handling text files. So, the situation is not symmetric at all! I don't know how better to handle this "corruption". I guess, it should be a warning about some files having different ending that it should have had based on their attributes. > > I was expecting eol=lf and eol=crlf to be symmetric, which is also > the reason for my reply to Finn's safe crlf patch. Finn's patch about _automatic_ text detection when there is no explicit policy about what ending this file should have. Dmitry ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 12:12 ` Dmitry Potapov @ 2010-05-24 12:22 ` Erik Faye-Lund 2010-05-24 12:42 ` Dmitry Potapov 0 siblings, 1 reply; 35+ messages in thread From: Erik Faye-Lund @ 2010-05-24 12:22 UTC (permalink / raw) To: Dmitry Potapov Cc: Clemens Buchacher, Eyvind Bernhardsen, Finn Arne Gangstad, Junio C Hamano, git On Mon, May 24, 2010 at 2:12 PM, Dmitry Potapov <dpotapov@gmail.com> wrote: > On Sun, May 23, 2010 at 01:51:27PM +0200, Clemens Buchacher wrote: >> On Sun, May 23, 2010 at 12:36:51PM +0200, Eyvind Bernhardsen wrote: >> > >> > Actually, since git normalizes to LF, "eol=lf" simply means >> > "convert on input but not on output", which is what >> > "core.autocrlf=input" currently does. The fact that you didn't >> > know this reflects the poor usability of core.autocrlf, which is >> > one of the things this series is trying to rectify :) >> >> No, I am aware of autocrlf=input, but apparently I did not >> understand the meaning of eol=lf correctly. So if a file has CRLF >> endings in the repository, and eol=lf, it will _not_ be converted >> to LF in the work tree? Conversely, if it has LF endings in the >> repository, and eol=crlf, it _will_ be converted to CRLF in the >> work tree? > > All text files should LF in the repository, if some file does not, it > means the repository is corrupted in respect of handling text files. > So, the situation is not symmetric at all! I don't know how better to > handle this "corruption". I guess, it should be a warning about some > files having different ending that it should have had based on their > attributes. > I thought the original motivation behind this change was to make repos with CRLF-textfiles work without reporting diffs on all lines when autocrlf was enabled. Because checking in CRLF-files DOES happen, and for some of us the reality is that we have to deal with such repos. Perhaps I'm confusing this discussion with some other, though. -- Erik "kusma" Faye-Lund ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-24 12:22 ` Erik Faye-Lund @ 2010-05-24 12:42 ` Dmitry Potapov 0 siblings, 0 replies; 35+ messages in thread From: Dmitry Potapov @ 2010-05-24 12:42 UTC (permalink / raw) To: kusmabite Cc: Clemens Buchacher, Eyvind Bernhardsen, Finn Arne Gangstad, Junio C Hamano, git On Mon, May 24, 2010 at 02:22:13PM +0200, Erik Faye-Lund wrote: > > I thought the original motivation behind this change was to make repos > with CRLF-textfiles work without reporting diffs on all lines when > autocrlf was enabled. Because checking in CRLF-files DOES happen, and > for some of us the reality is that we have to deal with such repos. Sure, but then CRLF files are treated as "binary" as far as autocrlf is concerned. There is no conversion for such files even though automatic text detection would detect them as. Thus, you do not have to worry that enabling autocrlf may be incompatible with some repositories. The situation is different when a file explicitly marked by attributes to have some particular ending. It is a policy of that repository, and if it is not followed, it means it is "corrupted". It is similar to what you would have with SVN with eol-style=native for some file while it being stored with CRLF inside of the SVN repository (obviously, any standard SVN client should not allow this to happen). Dmitry ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-19 14:33 What's cooking extra Junio C Hamano 2010-05-19 15:12 ` A Large Angry SCM 2010-05-19 17:06 ` Finn Arne Gangstad @ 2010-05-21 16:16 ` Ævar Arnfjörð Bjarmason 2010-05-22 21:24 ` René Scharfe 3 siblings, 0 replies; 35+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-05-21 16:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, May 19, 2010 at 14:33, Junio C Hamano <gitster@pobox.com> wrote: > I am aware of the following topics, that are probably all worthy of > inclusion at some point, but am unclear in what status their discussions > are. I'd appreciate it if people can help me come up with a list of > topics that are fully discussed, and if patch submitters of these topics > can re-send the final "to apply" copy. > ... > * (Ævar Arnfjörð Bjarmason) cvsserver updates When I originally submitted this in 2008 you seemed to be willing to let it go in as-is. But arguably it's better if this were to be squashed into one patch. It's more readable, but some history will be destroyed along the way. I don't care either way, but I can submit it in squashed form on request. Thanks. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: What's cooking extra 2010-05-19 14:33 What's cooking extra Junio C Hamano ` (2 preceding siblings ...) 2010-05-21 16:16 ` Ævar Arnfjörð Bjarmason @ 2010-05-22 21:24 ` René Scharfe 2010-05-22 21:26 ` [PATCH 1/8] grep: add test script for binary file handling René Scharfe ` (7 more replies) 3 siblings, 8 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Phil Lawrence Am 19.05.2010 16:33, schrieb Junio C Hamano: > I am aware of the following topics, that are probably all worthy of > inclusion at some point, but am unclear in what status their discussions > are. I'd appreciate it if people can help me come up with a list of > topics that are fully discussed, and if patch submitters of these topics > can re-send the final "to apply" copy. > * (Rene) grep on binary files There was one helpful comment from Dmitry, which I addressed in a follow-up patch. No reply from Phil, the one who started the topic, though. I'll send an updated round as replies to this message: [PATCH 1/8] grep: add test script for binary file handling Adds a simple test script documenting what git grep can do with binary files. New: tests for -L and -q. [PATCH 2/8] grep: grep: refactor handling of binary mode options Cleanup patch; unchanged. [PATCH 3/8] grep: --count over binary [PATCH 4/8] grep: --name-only over binary Correctness patches for handling of the options --count and --name-only in connection with binary files. The first one was reimplemented and the second one is new. [PATCH 5/8] grep: use memmem() for fixed string search [PATCH 6/8] grep: continue case insensitive fixed string search after NUL chars These two patches make git grep -F work on binary files. They have been rebased against the preceding changed patches but are unchanged otherwise. [PATCH 7/8] grep: use REG_STARTEND for all matching if available This make git grep work on binary files if the platform's regexec() supports the flag REG_STARTEND. Our own version in compat/ doesn't, unfortunately. In the first round it consisted of two patches, which have been squashed and rebased. [PATCH 8/8] grep: support NUL chars in search strings for -F New patch, adds support for NUL in patterns, but only for git grep -F (not -Fi). It's main value is the addition of tests to show the current limitations regarding searching for NULs. builtin/grep.c | 8 +++- grep.c | 98 +++++++++++++++++++++++++++------------------ grep.h | 2 + t/t7008-grep-binary.sh | 102 ++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 169 insertions(+), 41 deletions(-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/8] grep: add test script for binary file handling 2010-05-22 21:24 ` René Scharfe @ 2010-05-22 21:26 ` René Scharfe 2010-05-22 21:28 ` [PATCH 2/8] grep: grep: refactor handling of binary mode options René Scharfe ` (6 subsequent siblings) 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- t/t7008-grep-binary.sh | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 42 insertions(+), 0 deletions(-) create mode 100755 t/t7008-grep-binary.sh diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh new file mode 100755 index 0000000..2320e74 --- /dev/null +++ b/t/t7008-grep-binary.sh @@ -0,0 +1,42 @@ +#!/bin/sh + +test_description='git grep in binary files' + +. ./test-lib.sh + +test_expect_success 'setup' " + printf 'binary\000file\n' >a && + git add a && + git commit -m. +" + +test_expect_success 'git grep ina a' ' + echo Binary file a matches >expect && + git grep ina a >actual && + test_cmp expect actual +' + +test_expect_success 'git grep -ah ina a' ' + git grep -ah ina a >actual && + test_cmp a actual +' + +test_expect_success 'git grep -I ina a' ' + : >expect && + test_must_fail git grep -I ina a >actual && + test_cmp expect actual +' + +test_expect_success 'git grep -L bar a' ' + echo a >expect && + git grep -L bar a >actual && + test_cmp expect actual +' + +test_expect_success 'git grep -q ina a' ' + : >expect && + git grep -q ina a >actual && + test_cmp expect actual +' + +test_done -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 2/8] grep: grep: refactor handling of binary mode options 2010-05-22 21:24 ` René Scharfe 2010-05-22 21:26 ` [PATCH 1/8] grep: add test script for binary file handling René Scharfe @ 2010-05-22 21:28 ` René Scharfe 2010-05-22 21:29 ` [PATCH 3/8] grep: --count over binary René Scharfe ` (5 subsequent siblings) 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:28 UTC (permalink / raw) Cc: Junio C Hamano, git Turn the switch inside-out and add labels for each possible value of ->binary. This makes the code easier to read and avoids calling buffer_is_binary() if the option -a was given. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- grep.c | 20 +++++++++++--------- 1 files changed, 11 insertions(+), 9 deletions(-) diff --git a/grep.c b/grep.c index 543b1d5..2a8e879 100644 --- a/grep.c +++ b/grep.c @@ -800,17 +800,19 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, opt->show_hunk_mark = 1; opt->last_shown = 0; - if (buffer_is_binary(buf, size)) { - switch (opt->binary) { - case GREP_BINARY_DEFAULT: + switch (opt->binary) { + case GREP_BINARY_DEFAULT: + if (buffer_is_binary(buf, size)) binary_match_only = 1; - break; - case GREP_BINARY_NOMATCH: + break; + case GREP_BINARY_NOMATCH: + if (buffer_is_binary(buf, size)) return 0; /* Assume unmatch */ - break; - default: - break; - } + break; + case GREP_BINARY_TEXT: + break; + default: + die("bug: unknown binary handling mode"); } memset(&xecfg, 0, sizeof(xecfg)); -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 3/8] grep: --count over binary 2010-05-22 21:24 ` René Scharfe 2010-05-22 21:26 ` [PATCH 1/8] grep: add test script for binary file handling René Scharfe 2010-05-22 21:28 ` [PATCH 2/8] grep: grep: refactor handling of binary mode options René Scharfe @ 2010-05-22 21:29 ` René Scharfe 2010-05-22 21:30 ` [PATCH 4/8] grep: --name-only " René Scharfe ` (4 subsequent siblings) 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: git The intent of showing the message "Binary file xyz matches" for binary files is to avoid annoying users by potentially messing up their terminals by printing control characters. In --count mode, this precaution isn't necessary. Display counts of matches if -c/--count was specified, even if -a was not given. GNU grep does the same. Moving the check for ->count before the code for handling binary file also avoids printing context lines if --count and -[ABC] were used together, so we can remove the part of the comment that mentions this behaviour. Again, GNU grep does the same. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- grep.c | 9 ++++----- t/t7008-grep-binary.sh | 6 ++++++ 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/grep.c b/grep.c index 2a8e879..35c18b7 100644 --- a/grep.c +++ b/grep.c @@ -873,6 +873,8 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, count++; if (opt->status_only) return 1; + if (opt->count) + goto next_line; if (binary_match_only) { opt->output(opt, "Binary file ", 12); output_color(opt, name, strlen(name), @@ -886,16 +888,12 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, } /* Hit at this line. If we haven't shown the * pre-context lines, we would need to show them. - * When asked to do "count", this still show - * the context which is nonsense, but the user - * deserves to get that ;-). */ if (opt->pre_context) show_pre_context(opt, name, buf, bol, lno); else if (opt->funcname) show_funcname_line(opt, name, buf, bol, lno); - if (!opt->count) - show_line(opt, bol, eol, name, lno, ':'); + show_line(opt, bol, eol, name, lno, ':'); last_hit = lno; } else if (last_hit && @@ -939,6 +937,7 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, output_sep(opt, ':'); snprintf(buf, sizeof(buf), "%u\n", count); opt->output(opt, buf, strlen(buf)); + return 1; } return !!last_hit; } diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 2320e74..91970ea 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -27,6 +27,12 @@ test_expect_success 'git grep -I ina a' ' test_cmp expect actual ' +test_expect_success 'git grep -c ina a' ' + echo a:1 >expect && + git grep -c ina a >actual && + test_cmp expect actual +' + test_expect_success 'git grep -L bar a' ' echo a >expect && git grep -L bar a >actual && -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 4/8] grep: --name-only over binary 2010-05-22 21:24 ` René Scharfe ` (2 preceding siblings ...) 2010-05-22 21:29 ` [PATCH 3/8] grep: --count over binary René Scharfe @ 2010-05-22 21:30 ` René Scharfe 2010-05-22 21:32 ` [PATCH 5/8] grep: use memmem() for fixed string search René Scharfe ` (3 subsequent siblings) 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: git As with the option -c/--count, git grep with the option -l/--name-only should work the same with binary files as with text files because there is no danger of messing up the terminal with control characters from the contents of matching files. GNU grep does the same. Move the check for ->name_only before the one for binary_match_only, thus making the latter irrelevant for git grep -l. Reported-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- grep.c | 8 ++++---- t/t7008-grep-binary.sh | 6 ++++++ 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/grep.c b/grep.c index 35c18b7..22639cd 100644 --- a/grep.c +++ b/grep.c @@ -873,6 +873,10 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, count++; if (opt->status_only) return 1; + if (opt->name_only) { + show_name(opt, name); + return 1; + } if (opt->count) goto next_line; if (binary_match_only) { @@ -882,10 +886,6 @@ static int grep_buffer_1(struct grep_opt *opt, const char *name, opt->output(opt, " matches\n", 9); return 1; } - if (opt->name_only) { - show_name(opt, name); - return 1; - } /* Hit at this line. If we haven't shown the * pre-context lines, we would need to show them. */ diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 91970ea..4a12d97 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -33,6 +33,12 @@ test_expect_success 'git grep -c ina a' ' test_cmp expect actual ' +test_expect_success 'git grep -l ina a' ' + echo a >expect && + git grep -l ina a >actual && + test_cmp expect actual +' + test_expect_success 'git grep -L bar a' ' echo a >expect && git grep -L bar a >actual && -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 5/8] grep: use memmem() for fixed string search 2010-05-22 21:24 ` René Scharfe ` (3 preceding siblings ...) 2010-05-22 21:30 ` [PATCH 4/8] grep: --name-only " René Scharfe @ 2010-05-22 21:32 ` René Scharfe 2010-05-22 21:34 ` [PATCH 6/8] grep: continue case insensitive fixed string search after NUL chars René Scharfe ` (2 subsequent siblings) 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:32 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Allow searching beyond NUL characters by using memmem() instead of strstr(). Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- grep.c | 16 +++++++++------- t/t7008-grep-binary.sh | 4 ++++ 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/grep.c b/grep.c index 22639cd..c3affb6 100644 --- a/grep.c +++ b/grep.c @@ -329,14 +329,15 @@ static void show_name(struct grep_opt *opt, const char *name) opt->output(opt, opt->null_following_name ? "\0" : "\n", 1); } - -static int fixmatch(const char *pattern, char *line, int ignore_case, regmatch_t *match) +static int fixmatch(const char *pattern, char *line, char *eol, + int ignore_case, regmatch_t *match) { char *hit; + if (ignore_case) hit = strcasestr(line, pattern); else - hit = strstr(line, pattern); + hit = memmem(line, eol - line, pattern, strlen(pattern)); if (!hit) { match->rm_so = match->rm_eo = -1; @@ -399,7 +400,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, again: if (p->fixed) - hit = !fixmatch(p->pattern, bol, p->ignore_case, pmatch); + hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch); else hit = !regexec(&p->regexp, bol, 1, pmatch, eflags); @@ -725,9 +726,10 @@ static int look_ahead(struct grep_opt *opt, int hit; regmatch_t m; - if (p->fixed) - hit = !fixmatch(p->pattern, bol, p->ignore_case, &m); - else { + if (p->fixed) { + hit = !fixmatch(p->pattern, bol, bol + *left_p, + p->ignore_case, &m); + } else { #ifdef REG_STARTEND m.rm_so = 0; m.rm_eo = *left_p; diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 4a12d97..9adc9ed 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -51,4 +51,8 @@ test_expect_success 'git grep -q ina a' ' test_cmp expect actual ' +test_expect_success 'git grep -F ile a' ' + git grep -F ile a +' + test_done -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 6/8] grep: continue case insensitive fixed string search after NUL chars 2010-05-22 21:24 ` René Scharfe ` (4 preceding siblings ...) 2010-05-22 21:32 ` [PATCH 5/8] grep: use memmem() for fixed string search René Scharfe @ 2010-05-22 21:34 ` René Scharfe 2010-05-22 21:35 ` [PATCH 7/8] grep: use REG_STARTEND for all matching if available René Scharfe 2010-05-22 21:43 ` [PATCH 8/8] grep: support NUL chars in search strings for -F René Scharfe 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Functions for C strings, like strcasestr(), can't see beyond NUL characters. Check if there is such an obstacle on the line and try again behind it. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- grep.c | 12 +++++++++--- t/t7008-grep-binary.sh | 4 ++++ 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/grep.c b/grep.c index c3affb6..b95803b 100644 --- a/grep.c +++ b/grep.c @@ -334,9 +334,15 @@ static int fixmatch(const char *pattern, char *line, char *eol, { char *hit; - if (ignore_case) - hit = strcasestr(line, pattern); - else + if (ignore_case) { + char *s = line; + do { + hit = strcasestr(s, pattern); + if (hit) + break; + s += strlen(s) + 1; + } while (s < eol); + } else hit = memmem(line, eol - line, pattern, strlen(pattern)); if (!hit) { diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 9adc9ed..9660842 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -55,4 +55,8 @@ test_expect_success 'git grep -F ile a' ' git grep -F ile a ' +test_expect_success 'git grep -Fi iLE a' ' + git grep -Fi iLE a +' + test_done -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 7/8] grep: use REG_STARTEND for all matching if available 2010-05-22 21:24 ` René Scharfe ` (5 preceding siblings ...) 2010-05-22 21:34 ` [PATCH 6/8] grep: continue case insensitive fixed string search after NUL chars René Scharfe @ 2010-05-22 21:35 ` René Scharfe 2010-05-22 21:43 ` [PATCH 8/8] grep: support NUL chars in search strings for -F René Scharfe 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Refactor REG_STARTEND handling inlook_ahead() into a new helper, regmatch(), and use it for line matching, too. This allows regex matching beyond NUL characters if regexec() supports the flag. NUL characters themselves are not matched in any way, though. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- grep.c | 24 ++++++++++++++---------- t/t7008-grep-binary.sh | 10 ++++++++++ 2 files changed, 24 insertions(+), 10 deletions(-) diff --git a/grep.c b/grep.c index b95803b..70a776f 100644 --- a/grep.c +++ b/grep.c @@ -356,6 +356,17 @@ static int fixmatch(const char *pattern, char *line, char *eol, } } +static int regmatch(const regex_t *preg, char *line, char *eol, + regmatch_t *match, int eflags) +{ +#ifdef REG_STARTEND + match->rm_so = 0; + match->rm_eo = eol - line; + eflags |= REG_STARTEND; +#endif + return regexec(preg, line, 1, match, eflags); +} + static int strip_timestamp(char *bol, char **eol_p) { char *eol = *eol_p; @@ -408,7 +419,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, if (p->fixed) hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch); else - hit = !regexec(&p->regexp, bol, 1, pmatch, eflags); + hit = !regmatch(&p->regexp, bol, eol, pmatch, eflags); if (hit && p->word_regexp) { if ((pmatch[0].rm_so < 0) || @@ -735,15 +746,8 @@ static int look_ahead(struct grep_opt *opt, if (p->fixed) { hit = !fixmatch(p->pattern, bol, bol + *left_p, p->ignore_case, &m); - } else { -#ifdef REG_STARTEND - m.rm_so = 0; - m.rm_eo = *left_p; - hit = !regexec(&p->regexp, bol, 1, &m, REG_STARTEND); -#else - hit = !regexec(&p->regexp, bol, 1, &m, 0); -#endif - } + } else + hit = !regmatch(&p->regexp, bol, bol + *left_p, &m, 0); if (!hit || m.rm_so < 0 || m.rm_eo < 0) continue; if (earliest < 0 || m.rm_so < earliest) diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 9660842..4f5e74f 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -59,4 +59,14 @@ test_expect_success 'git grep -Fi iLE a' ' git grep -Fi iLE a ' +# This test actually passes on platforms where regexec() supports the +# flag REG_STARTEND. +test_expect_failure 'git grep ile a' ' + git grep ile a +' + +test_expect_failure 'git grep .fi a' ' + git grep .fi a +' + test_done -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 8/8] grep: support NUL chars in search strings for -F 2010-05-22 21:24 ` René Scharfe ` (6 preceding siblings ...) 2010-05-22 21:35 ` [PATCH 7/8] grep: use REG_STARTEND for all matching if available René Scharfe @ 2010-05-22 21:43 ` René Scharfe 7 siblings, 0 replies; 35+ messages in thread From: René Scharfe @ 2010-05-22 21:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Search patterns in a file specified with -f can contain NUL characters. The current code ignores all characters on a line after a NUL. Pass the actual length of the line all the way from the pattern file to fixmatch() and use it for case-sensitive fixed string matching. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> --- Support for -F was easy, but in order to be able to search for NULs with -Fi, -G and -E, we'd need a different case-insensitive fixed string search function (memcasemem?) and a different regex library, or at least use a different (non-POSIX) entry point. How badly do we need this feature? If the new regex lib is faster or improves multi-platform support then NUL support would be a nice side effect, I think, but this feature alone doesn't justify a switch in my eyes. builtin/grep.c | 8 ++++++-- grep.c | 33 ++++++++++++++++++++------------- grep.h | 2 ++ t/t7008-grep-binary.sh | 30 ++++++++++++++++++++++++++++++ 4 files changed, 58 insertions(+), 15 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index b194ea3..d0a73da 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -724,11 +724,15 @@ static int file_callback(const struct option *opt, const char *arg, int unset) if (!patterns) die_errno("cannot open '%s'", arg); while (strbuf_getline(&sb, patterns, '\n') == 0) { + char *s; + size_t len; + /* ignore empty line like grep does */ if (sb.len == 0) continue; - append_grep_pattern(grep_opt, strbuf_detach(&sb, NULL), arg, - ++lno, GREP_PATTERN); + + s = strbuf_detach(&sb, &len); + append_grep_pat(grep_opt, s, len, arg, ++lno, GREP_PATTERN); } fclose(patterns); strbuf_release(&sb); diff --git a/grep.c b/grep.c index 70a776f..82fb349 100644 --- a/grep.c +++ b/grep.c @@ -7,6 +7,7 @@ void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field fie { struct grep_pat *p = xcalloc(1, sizeof(*p)); p->pattern = pat; + p->patternlen = strlen(pat); p->origin = "header"; p->no = 0; p->token = GREP_PATTERN_HEAD; @@ -19,8 +20,15 @@ void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field fie void append_grep_pattern(struct grep_opt *opt, const char *pat, const char *origin, int no, enum grep_pat_token t) { + append_grep_pat(opt, pat, strlen(pat), origin, no, t); +} + +void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, + const char *origin, int no, enum grep_pat_token t) +{ struct grep_pat *p = xcalloc(1, sizeof(*p)); p->pattern = pat; + p->patternlen = patlen; p->origin = origin; p->no = no; p->token = t; @@ -44,8 +52,8 @@ struct grep_opt *grep_opt_dup(const struct grep_opt *opt) append_header_grep_pattern(ret, pat->field, pat->pattern); else - append_grep_pattern(ret, pat->pattern, pat->origin, - pat->no, pat->token); + append_grep_pat(ret, pat->pattern, pat->patternlen, + pat->origin, pat->no, pat->token); } return ret; @@ -329,21 +337,21 @@ static void show_name(struct grep_opt *opt, const char *name) opt->output(opt, opt->null_following_name ? "\0" : "\n", 1); } -static int fixmatch(const char *pattern, char *line, char *eol, - int ignore_case, regmatch_t *match) +static int fixmatch(struct grep_pat *p, char *line, char *eol, + regmatch_t *match) { char *hit; - if (ignore_case) { + if (p->ignore_case) { char *s = line; do { - hit = strcasestr(s, pattern); + hit = strcasestr(s, p->pattern); if (hit) break; s += strlen(s) + 1; } while (s < eol); } else - hit = memmem(line, eol - line, pattern, strlen(pattern)); + hit = memmem(line, eol - line, p->pattern, p->patternlen); if (!hit) { match->rm_so = match->rm_eo = -1; @@ -351,7 +359,7 @@ static int fixmatch(const char *pattern, char *line, char *eol, } else { match->rm_so = hit - line; - match->rm_eo = match->rm_so + strlen(pattern); + match->rm_eo = match->rm_so + p->patternlen; return 0; } } @@ -417,7 +425,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, again: if (p->fixed) - hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch); + hit = !fixmatch(p, bol, eol, pmatch); else hit = !regmatch(&p->regexp, bol, eol, pmatch, eflags); @@ -743,10 +751,9 @@ static int look_ahead(struct grep_opt *opt, int hit; regmatch_t m; - if (p->fixed) { - hit = !fixmatch(p->pattern, bol, bol + *left_p, - p->ignore_case, &m); - } else + if (p->fixed) + hit = !fixmatch(p, bol, bol + *left_p, &m); + else hit = !regmatch(&p->regexp, bol, bol + *left_p, &m, 0); if (!hit || m.rm_so < 0 || m.rm_eo < 0) continue; diff --git a/grep.h b/grep.h index 89342e5..0aebebd 100644 --- a/grep.h +++ b/grep.h @@ -29,6 +29,7 @@ struct grep_pat { int no; enum grep_pat_token token; const char *pattern; + size_t patternlen; enum grep_header_field field; regex_t regexp; unsigned fixed:1; @@ -104,6 +105,7 @@ struct grep_opt { void *output_priv; }; +extern void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t); extern void append_grep_pattern(struct grep_opt *opt, const char *pat, const char *origin, int no, enum grep_pat_token t); extern void append_header_grep_pattern(struct grep_opt *, enum grep_header_field, const char *); extern void compile_grep_patterns(struct grep_opt *opt); diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 4f5e74f..eb8ca88 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -69,4 +69,34 @@ test_expect_failure 'git grep .fi a' ' git grep .fi a ' +test_expect_success 'git grep -F y<NUL>f a' " + printf 'y\000f' >f && + git grep -f f -F a +" + +test_expect_success 'git grep -F y<NUL>x a' " + printf 'y\000x' >f && + test_must_fail git grep -f f -F a +" + +test_expect_success 'git grep -Fi Y<NUL>f a' " + printf 'Y\000f' >f && + git grep -f f -Fi a +" + +test_expect_failure 'git grep -Fi Y<NUL>x a' " + printf 'Y\000x' >f && + test_must_fail git grep -f f -Fi a +" + +test_expect_success 'git grep y<NUL>f a' " + printf 'y\000f' >f && + git grep -f f a +" + +test_expect_failure 'git grep y<NUL>x a' " + printf 'y\000x' >f && + test_must_fail git grep -f f a +" + test_done -- 1.7.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
end of thread, other threads:[~2010-06-07 19:55 UTC | newest] Thread overview: 35+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-05-19 14:33 What's cooking extra Junio C Hamano 2010-05-19 15:12 ` A Large Angry SCM 2010-05-19 17:06 ` Finn Arne Gangstad 2010-05-19 20:09 ` Eyvind Bernhardsen 2010-05-22 13:09 ` Clemens Buchacher 2010-05-22 19:42 ` Eyvind Bernhardsen 2010-05-22 22:27 ` Clemens Buchacher 2010-05-23 10:36 ` Eyvind Bernhardsen 2010-05-23 11:51 ` Clemens Buchacher 2010-05-23 12:53 ` Eyvind Bernhardsen 2010-05-23 13:26 ` Ævar Arnfjörð Bjarmason 2010-05-24 9:49 ` Clemens Buchacher 2010-05-24 12:47 ` Dmitry Potapov 2010-05-24 20:45 ` Eyvind Bernhardsen 2010-05-24 20:56 ` Clemens Buchacher 2010-05-24 21:09 ` Eyvind Bernhardsen 2010-05-24 21:11 ` Eyvind Bernhardsen 2010-05-24 22:11 ` Clemens Buchacher 2010-05-25 6:41 ` Eyvind Bernhardsen 2010-05-25 8:27 ` Anthony Youngman 2010-06-07 19:55 ` Eyvind Bernhardsen 2010-05-25 8:33 ` Clemens Buchacher 2010-05-24 12:12 ` Dmitry Potapov 2010-05-24 12:22 ` Erik Faye-Lund 2010-05-24 12:42 ` Dmitry Potapov 2010-05-21 16:16 ` Ævar Arnfjörð Bjarmason 2010-05-22 21:24 ` René Scharfe 2010-05-22 21:26 ` [PATCH 1/8] grep: add test script for binary file handling René Scharfe 2010-05-22 21:28 ` [PATCH 2/8] grep: grep: refactor handling of binary mode options René Scharfe 2010-05-22 21:29 ` [PATCH 3/8] grep: --count over binary René Scharfe 2010-05-22 21:30 ` [PATCH 4/8] grep: --name-only " René Scharfe 2010-05-22 21:32 ` [PATCH 5/8] grep: use memmem() for fixed string search René Scharfe 2010-05-22 21:34 ` [PATCH 6/8] grep: continue case insensitive fixed string search after NUL chars René Scharfe 2010-05-22 21:35 ` [PATCH 7/8] grep: use REG_STARTEND for all matching if available René Scharfe 2010-05-22 21:43 ` [PATCH 8/8] grep: support NUL chars in search strings for -F René Scharfe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).