git.vger.kernel.org archive mirror
* [PATCH v2] ls-files: Add eol diagnostics
@ 2015-10-31 10:12 Torsten Bögershausen
  2015-10-31 10:25 ` Matthieu Moy
  0 siblings, 1 reply; 7+ messages in thread
From: Torsten Bögershausen @ 2015-10-31 10:12 UTC (permalink / raw)
  To: git; +Cc: tboegi

When working in a cross-platform environment, a user wants to
check if text files are stored normalized in the repository and if
.gitattributes are set appropriately.

Make it possible to let Git show the line endings in the cache and
in the working tree.

Files which Git treats as binary are shown as "binary"; for all other
files the line endings ("eolinfo") are shown:

"text-no-eol"  text file without any EOL (or empty files)
"text-lf"      text file with LF
"text-crlf"    text file with CRLF
"text-crlf-lf" text file with mixed line endings.

git ls-files --eol gives an output like this:

ca:text-no-eol   wt:text-no-eol   t/t5100/empty
ca:binary        wt:binary        t/test-binary-2.png
ca:text-lf       wt:text-lf       t/t5100/rfc2047-info-0007
ca:text-lf       wt:text-crlf     doit.bat
ca:text-crlf-lf  wt:text-crlf-lf  locale/XX.po

Note that the output is meant to be human-readable and may change.
When e.g. a file is deleted from the working tree and another file
is a soft link, the output may look like this:

ca:text-lf       wt:text-lf       Documentation/RelNotes/2.6.1.txt
ca:text-lf       Documentation/RelNotes/2.7.0.txt
RelNotes

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
The main motivation of this feature is human inspection, not shell processing.

For that reason I could not find the motivation to create a new command
like git check-eol or git get-eol.

Test cases are missing; they will be added in v3,
once we know that this concept makes sense at all.

Changes since v1:
- Don't analyze the contents of softlinks
- Have only one option, --eol.
 --eol works together with -s, -o or other options, but there may be
 combinations which don't make sense.
- Updated the documentation
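
For readers who want to try the eolinfo categories from the commit message
outside of Git, here is a minimal standalone sketch. It is not the code from
the patch: eolinfo_sketch() is a hypothetical name, it counts CR, LF and CRLF
itself instead of calling gather_stats(), and it treats any NUL byte as
"binary", which is an assumed stand-in for Git's is_binary() heuristic (not
shown in this patch). The remaining decisions follow the order used in
gather_convert_stats_ascii() in the diff.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: classify a buffer into
 * the eolinfo strings described in the commit message.
 */
const char *eolinfo_sketch(const char *buf, size_t len)
{
	size_t cr = 0, lf = 0, crlf = 0, i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];

		if (c == '\0')
			return "binary"; /* assumption: any NUL means binary */
		if (c == '\r') {
			cr++;
			if (i + 1 < len && buf[i + 1] == '\n')
				crlf++; /* this CR starts a CRLF pair */
		} else if (c == '\n') {
			lf++; /* counts every LF, including the one in CRLF */
		}
	}

	if (cr != crlf)
		return "binary"; /* a lone CR, as in the patch */
	if (crlf && crlf == lf)
		return "text-crlf"; /* every LF belongs to a CRLF */
	if (crlf && lf)
		return "text-crlf-lf"; /* mixed line endings */
	if (lf)
		return "text-lf";
	return "text-no-eol"; /* no line ending at all, or empty */
}
```

Calling eolinfo_sketch("a\r\nb\n", 5) gives "text-crlf-lf", because only one
of the two LFs is preceded by a CR.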


Documentation/git-ls-files.txt | 22 ++++++++++++++++++++++
builtin/ls-files.c             | 22 ++++++++++++++++++++++
convert.c                      | 40 ++++++++++++++++++++++++++++++++++++++++
convert.h                      |  2 ++
4 files changed, 86 insertions(+)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index e26f01f..4b02912 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -12,6 +12,7 @@ SYNOPSIS
'git ls-files' [-z] [-t] [-v]
		(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
		(-[c|d|o|i|s|u|k|m])*
+		[--eol]
		[-x <pattern>|--exclude=<pattern>]
		[-X <file>|--exclude-from=<file>]
		[--exclude-per-directory=<file>]
@@ -147,6 +148,15 @@ a space) at the start of each line:
	possible for manual inspection; the exact format may change at
	any time.

+--eol::
+	Show line endings (eolinfo) of files for manual inspection:
+	"text-no-eol", "text-lf", "text-crlf", "text-crlf-lf" or "binary".
+	Both the cached content and the content in the working tree are shown
+	for regular files, if available.
+	
+	Note:
+  `git ls-files --eol | grep ca:text-crlf` will show text files that are not normalized
+
\--::
	Do not interpret any more arguments as options.

@@ -161,6 +171,18 @@ which case it outputs:

        [<tag> ]<mode> <object> <stage> <file>

+'git ls-files --eol' will show
+        ca:<eolinfo> wt:<eolinfo> <file>
+
+'git ls-files --eol -d' will show
+        ca:<eolinfo>  <file>
+
+'git ls-files --eol -o' will show
+        wt:<eolinfo>  <file>
+
+'git ls-files --eol -s' will show
+[<tag> ]<mode> <object> <stage> ca:<eolinfo> wt:<eolinfo> <file>
+
'git ls-files --unmerged' and 'git ls-files --stage' can be used to examine
detailed information on unmerged paths.

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b6a7cb0..bdd0fd7 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -27,6 +27,7 @@ static int show_killed;
static int show_valid_bit;
static int line_terminator = '\n';
static int debug_mode;
+static int show_eol;

static const char *prefix;
static int max_prefix_len;
@@ -47,6 +48,21 @@ static const char *tag_modified = "";
static const char *tag_skip_worktree = "";
static const char *tag_resolve_undo = "";

+static void write_wt_convert_stats_ascii(const char *path)
+{
+	if (show_eol && !show_deleted) {
+		struct stat st;
+		if (!lstat(path, &st) && (S_ISREG(st.st_mode)))
+			printf("wt:%-13s ", get_wt_convert_stats_ascii(path));
+	}
+}
+
+static void write_ca_convert_stats_ascii(const struct cache_entry *ce)
+{
+	if (show_eol && S_ISREG(ce->ce_mode))
+		printf("ca:%-13s ", get_cached_convert_stats_ascii(ce->name));
+}
+
static void write_name(const char *name)
{
	/*
@@ -68,6 +84,7 @@ static void show_dir_entry(const char *tag, struct dir_entry *ent)
		return;

	fputs(tag, stdout);
+	write_wt_convert_stats_ascii(ent->name);
	write_name(ent->name);
}

@@ -170,6 +187,9 @@ static void show_ce_entry(const char *tag, const struct cache_entry *ce)
		       find_unique_abbrev(ce->sha1,abbrev),
		       ce_stage(ce));
	}
+	write_ca_convert_stats_ascii(ce);
+	write_wt_convert_stats_ascii(ce->name);
+	
	write_name(ce->name);
	if (debug_mode) {
		const struct stat_data *sd = &ce->ce_stat_data;
@@ -206,6 +226,7 @@ static void show_ru_info(void)
			printf("%s%06o %s %d\t", tag_resolve_undo, ui->mode[i],
			       find_unique_abbrev(ui->sha1[i], abbrev),
			       i + 1);
+			write_wt_convert_stats_ascii(path);
			write_name(path);
		}
	}
@@ -433,6 +454,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
		OPT_BIT(0, "directory", &dir.flags,
			N_("show 'other' directories' names only"),
			DIR_SHOW_OTHER_DIRECTORIES),
+		OPT_BOOL(0, "eol", &show_eol, N_("show line endings of files")),
		OPT_NEGBIT(0, "empty-directory", &dir.flags,
			N_("don't show empty directories"),
			DIR_HIDE_EMPTY_DIRECTORIES),
diff --git a/convert.c b/convert.c
index 814e814..291f869 100644
--- a/convert.c
+++ b/convert.c
@@ -95,6 +95,46 @@ static int is_binary(unsigned long size, struct text_stat *stats)
	return 0;
}

+
+const char *gather_convert_stats_ascii(const char *data, unsigned long size)
+{
+	struct text_stat stats;
+	if (!data)
+		return "";
+	gather_stats(data, size, &stats);
+	if (is_binary(size, &stats) || stats.cr != stats.crlf)
+		return "binary";
+	else if (stats.crlf && (stats.crlf == stats.lf))
+		return "text-crlf";
+	else if (stats.crlf && stats.lf)
+		return "text-crlf-lf";
+	else if (stats.lf)
+		return "text-lf";
+	else
+		return "text-no-eol";
+}
+
+const char *get_cached_convert_stats_ascii(const char *path)
+{
+	const char *ret;
+	unsigned long sz;
+	void *data = read_blob_data_from_cache(path, &sz);
+	ret = gather_convert_stats_ascii(data, sz);
+	free(data);
+	return ret;
+}
+
+const char *get_wt_convert_stats_ascii(const char *path)
+{
+	const char *ret = "";
+	struct strbuf sb = STRBUF_INIT;
+	if (strbuf_read_file(&sb, path, 0) < 0)
+		return "error";
+	ret = gather_convert_stats_ascii(sb.buf, sb.len);
+	strbuf_release(&sb);
+	return ret;
+}
+
static enum eol output_eol(enum crlf_action crlf_action)
{
	switch (crlf_action) {
diff --git a/convert.h b/convert.h
index d9d853c..a1671e9 100644
--- a/convert.h
+++ b/convert.h
@@ -32,6 +32,8 @@ enum eol {
};

extern enum eol core_eol;
+extern const char *get_cached_convert_stats_ascii(const char *path);
+extern const char *get_wt_convert_stats_ascii(const char *path);

/* returns 1 if *dst was used */
extern int convert_to_git(const char *path, const char *src, size_t len,
-- 
2.5.0


* Re: [PATCH v2] ls-files: Add eol diagnostics
  2015-10-31 10:12 [PATCH v2] ls-files: Add eol diagnostics Torsten Bögershausen
@ 2015-10-31 10:25 ` Matthieu Moy
  2015-11-01  8:43   ` Sebastian Schuberth
  2015-11-01 18:22   ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Matthieu Moy @ 2015-10-31 10:25 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Torsten Bögershausen <tboegi@web.de> writes:

> ca:text-no-eol   wt:text-no-eol   t/t5100/empty
> ca:binary        wt:binary        t/test-binary-2.png
> ca:text-lf       wt:text-lf       t/t5100/rfc2047-info-0007
> ca:text-lf       wt:text-crlf     doit.bat
> ca:text-crlf-lf  wt:text-crlf-lf  locale/XX.po

I would spell the first "in" or "idx" (for "index"), not "ca" (for
"cache"). I think we avoid talking about "the cache" these days even
though the doc sometimes says "cached in the index" (i.e. use "cache" as
a verb, not a noun).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: [PATCH v2] ls-files: Add eol diagnostics
  2015-10-31 10:25 ` Matthieu Moy
@ 2015-11-01  8:43   ` Sebastian Schuberth
  2015-11-01 18:40     ` Matthieu Moy
  2015-11-01 18:22   ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Sebastian Schuberth @ 2015-11-01  8:43 UTC (permalink / raw)
  To: Matthieu Moy, Torsten Bögershausen; +Cc: git

On 31.10.2015 11:25, Matthieu Moy wrote:

>> ca:text-no-eol   wt:text-no-eol   t/t5100/empty
>> ca:binary        wt:binary        t/test-binary-2.png
>> ca:text-lf       wt:text-lf       t/t5100/rfc2047-info-0007
>> ca:text-lf       wt:text-crlf     doit.bat
>> ca:text-crlf-lf  wt:text-crlf-lf  locale/XX.po
> 
> I would spell the first "in" or "idx" (for "index"), not "ca" (for
> "cache"). I think we avoid talking about "the cache" these days even
> though the doc sometimes says "cached in the index" (i.e. use "cache" as
> a verb, not a noun).

Good point, I'd prefer "idx" over "ca", too.

However, the commit message says "to check if text files are stored normalized in the *repository*", yet the output refers to the index / cache. Is there a (potential) difference between line endings in the index and repo? AFAIK there is not. And I find it a bit confusing to refer to the index when, e.g., for a freshly cloned repo the index should be empty, yet you do have specific line endings in the repo.

Long story short, how about consistently talking about line endings in the repo, and also using "repo" instead of "ca" here?

-- 
Sebastian Schuberth


* Re: [PATCH v2] ls-files: Add eol diagnostics
  2015-10-31 10:25 ` Matthieu Moy
  2015-11-01  8:43   ` Sebastian Schuberth
@ 2015-11-01 18:22   ` Junio C Hamano
  2015-11-01 18:41     ` Matthieu Moy
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2015-11-01 18:22 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Torsten Bögershausen, git

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Torsten Bögershausen <tboegi@web.de> writes:
>
>> ca:text-no-eol   wt:text-no-eol   t/t5100/empty
>> ca:binary        wt:binary        t/test-binary-2.png
>> ca:text-lf       wt:text-lf       t/t5100/rfc2047-info-0007
>> ca:text-lf       wt:text-crlf     doit.bat
>> ca:text-crlf-lf  wt:text-crlf-lf  locale/XX.po
>
> I would spell the first "in" or "idx" (for "index"), not "ca" (for
> "cache"). I think we avoid talking about "the cache" these days even
> though the doc sometimes says "cached in the index" (i.e. use "cache" as
> a verb, not a noun).

i/ and w/ have been used to denote the "i"ndex and "w"orktree
versions for the past 7 years with diff.mnemonicprefix option,
which you may want to match.


* Re: [PATCH v2] ls-files: Add eol diagnostics
  2015-11-01  8:43   ` Sebastian Schuberth
@ 2015-11-01 18:40     ` Matthieu Moy
  2015-11-01 20:17       ` Sebastian Schuberth
  0 siblings, 1 reply; 7+ messages in thread
From: Matthieu Moy @ 2015-11-01 18:40 UTC (permalink / raw)
  To: Sebastian Schuberth; +Cc: Torsten Bögershausen, git

Sebastian Schuberth <sschuberth@gmail.com> writes:

> However, the commit message says "to check if text files are stored
> normalized in the *repository*", yet the output refers to the index /
> cache. Is there a (potential) difference between line endings in the
> index and repo?

There is, when you have staged changes that are not committed yet.

> And I find it a bit confusing to refer to the index when, e.g., for
> a freshly cloned repo the index should be empty,

No, it is not. The index is a complete snapshot of your working tree.
When you have no uncommitted staged changes, the index contains all files
that are in HEAD. Most commands show you _changes_ in the index (wrt
HEAD or wrt the working tree), but the index itself contains all files.

> Long story short, how about consistently talking about line endings in
> the repo, and also using "repo" instead of "ca" here?

I don't think this is a good idea. One typical use-case for the feature
would probably be:

1) wtf, there's something wrong with my line endings, let's fix this.

2) tweak .gitattributes, try to get everything right

3) prepare a commit to apply the new settings to the repository, play
   with "git add", "dos2unix" and friends.

4) check that it's OK

5) "git commit"

At stage 4), you really want to see the content of the index, because
your HEAD is still broken.
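
The check in step 4) could be sketched in C as follows. This is only an
illustration under assumptions: index_has_crlf() is a hypothetical helper, it
relies on the v2 output format with the "ca:" prefix at the start of the line
(which the patch says may change, and which does not hold when -s is given),
and like the `grep ca:text-crlf` note in the proposed documentation it flags
both "text-crlf" and "text-crlf-lf".

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 when one output line of
 * `git ls-files --eol` (v2 format) reports CRLF in the index.
 * The line is expected to start with the "ca:" field.
 */
int index_has_crlf(const char *line)
{
	static const char prefix[] = "ca:text-crlf";

	/* A prefix match covers both "text-crlf" and "text-crlf-lf". */
	return strncmp(line, prefix, sizeof(prefix) - 1) == 0;
}
```

Feeding each line of the output through index_has_crlf() and printing the
lines for which it returns 1 lists the paths whose staged content still
contains CRLF, independent of what HEAD contains.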

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: [PATCH v2] ls-files: Add eol diagnostics
  2015-11-01 18:22   ` Junio C Hamano
@ 2015-11-01 18:41     ` Matthieu Moy
  0 siblings, 0 replies; 7+ messages in thread
From: Matthieu Moy @ 2015-11-01 18:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, git

Junio C Hamano <gitster@pobox.com> writes:

> i/ and w/ have been used to denote the "i"ndex and "w"orktree
> versions for the past 7 years with diff.mnemonicprefix option,
> which you may want to match.

Indeed.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


* Re: [PATCH v2] ls-files: Add eol diagnostics
  2015-11-01 18:40     ` Matthieu Moy
@ 2015-11-01 20:17       ` Sebastian Schuberth
  0 siblings, 0 replies; 7+ messages in thread
From: Sebastian Schuberth @ 2015-11-01 20:17 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Torsten Bögershausen, Git Mailing List

On Sun, Nov 1, 2015 at 7:40 PM, Matthieu Moy
<Matthieu.Moy@grenoble-inp.fr> wrote:

>> And I find it a bit confusing to refer to the index when, e.g., for
>> a freshly cloned repo the index should be empty,
>
> No, it is not. The index is a complete snapshot of your working tree.
> When you have no uncommitted staged changes, the index contains all files
> that are in HEAD. Most commands show you _changes_ in the index (wrt
> HEAD or wrt the working tree), but the index itself contains all files.

Thanks for the info.

> At stage 4), you really want to see the content of the index, because
> your HEAD is still broken.

Ok, I'm convinced. Thanks again!

-- 
Sebastian Schuberth

