git.vger.kernel.org archive mirror
From: Mark Levedahl <mdl123@verizon.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Junio C Hamano <junkio@cox.net>,
	Alexander Litvinov <litvinov2004@gmail.com>,
	Mark Levedahl <mlevedahl@verizon.net>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: mingw, windows, crlf/lf, and git
Date: Wed, 14 Feb 2007 09:26:22 -0500
Message-ID: <45D31C0E.2040206@verizon.net>
In-Reply-To: <Pine.LNX.4.63.0702141208020.22628@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin wrote:
> Last time I checked, the text files never had lines longer than 200 
> characters (I chose this intentionally large). So, it might be a good 
> heuristic to check the maximal line length, and refuse to believe that 
> it's text once a certain (configurable) threshold is reached.
>
> Ciao,
> Dsch
Unfortunately, on my program we have folks using text files with single 
lines over 60,000 characters long; these are data files. Think, for 
example, of a comma- or tab-separated data file saved from a spreadsheet. 
In this case, the files are pure ASCII. So line length could be 
something else to take into account, but it is not decisive by itself.

To recap, we have the following various suggestions to determine textness:

1) ratio of ASCII to non-ASCII characters, possibly weighting some chars 
more than others
2) line length
3) existence of a NUL (\0)
4) file name globbing
5) roundtrip: lf(crlf(file)) == file
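
The five checks above can each be written as a small function over the raw
file bytes. The following is only a rough sketch in Python, not code from
git: every function name is hypothetical, the ratio in (1) is computed as the
fraction of non-ASCII bytes without any weighting, and the crlf() step in (5)
is assumed to leave an existing CRLF pair untouched. Under that assumption
the round trip fails for files that already contain CRLF.

```python
import fnmatch
import re


def non_ascii_fraction(data: bytes) -> float:
    """(1) Fraction of bytes outside 7-bit ASCII; 0.0 for empty input."""
    if not data:
        return 0.0
    return sum(byte > 0x7F for byte in data) / len(data)


def max_line_length(data: bytes) -> int:
    """(2) Longest line in bytes, not counting the LF or CRLF terminator."""
    return max(len(line.rstrip(b"\r")) for line in data.split(b"\n"))


def has_nul(data: bytes) -> bool:
    """(3) True if the content contains a NUL byte."""
    return b"\0" in data


def name_matches(filename: str, patterns: list) -> bool:
    """(4) True if the file name matches any shell-style glob pattern."""
    return any(fnmatch.fnmatchcase(filename, p) for p in patterns)


def to_crlf(data: bytes) -> bytes:
    """Assumed checkout conversion: only a bare LF becomes CRLF."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def to_lf(data: bytes) -> bytes:
    """Assumed checkin conversion: every CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def roundtrips(data: bytes) -> bool:
    """(5) True if lf(crlf(data)) reproduces the input exactly."""
    return to_lf(to_crlf(data)) == data
```

A spreadsheet export with one 60,000-character line passes checks (1), (3)
and (5) here while failing any modest limit applied to (2), which is why no
single check is decisive.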

I don't think any one suggestion is completely adequate for all uses; 
all need to be available and somehow configurable. This suggests to me a 
core.AutoCRLFstrategy variable that is a comma-separated list of methods 
to use (set to a reasonable default, of course, that does not cause 
runtime headaches on Unix): a file would be deemed binary unless all 
listed methods declare the file as text (with an empty list disabling 
AutoCRLF detection).
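
To make the combining rule concrete, here is a minimal sketch in Python of
how such a comma-separated value might be evaluated. None of this is existing
git code: the method names "nul" and "linelen" and the helper names are
invented for illustration, the 200-character limit is borrowed from the
quoted suggestion, and an empty list is assumed to mean that nothing is
treated as text (so no conversion happens).

```python
from typing import Callable, Dict, List

# Hypothetical method table; each entry returns True when the
# content looks like text according to that one method.
METHODS: Dict[str, Callable[[bytes], bool]] = {
    "nul": lambda data: b"\0" not in data,
    "linelen": lambda data: all(
        len(line) <= 200 for line in data.split(b"\n")
    ),
}


def parse_strategy(value: str) -> List[str]:
    """Split the comma-separated setting, ignoring blanks and spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


def is_text(data: bytes, strategy: str) -> bool:
    """Binary unless every listed method declares the content text."""
    names = parse_strategy(strategy)
    if not names:
        # Assumption: an empty list switches detection off entirely.
        return False
    # An unknown method name raises KeyError in this sketch.
    return all(METHODS[name](data) for name in names)
```

With these two methods, a 300-character single-line file counts as text
under the setting "nul" but as binary under "nul,linelen", because one
dissenting method is enough to veto conversion.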

Mark

