git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] ws: add new tab-between-non-ws check
@ 2026-01-07  1:30 Adrian Ratiu
  2026-01-07  2:12 ` Junio C Hamano
  2026-01-07 17:33 ` Johannes Sixt
  0 siblings, 2 replies; 8+ messages in thread
From: Adrian Ratiu @ 2026-01-07  1:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt, Emily Shaffer, Adrian Ratiu

This adds a new check to detect HT in the middle of sentences that
should have been a SP, as suggested by Junio in
https://public-inbox.org/git/xmqqy0mwsedz.fsf@gitster.g/

The check is a bit complex because we want to detect places where
a SP was intended (HT can expand to more than one display column),
so we need to count both the display columns (col) and the string
character columns (i) to determine if a HT looks identical to a SP
or can cause confusion.

Highlighting support for tools like git diff/show/log is added, as
well as git apply --whitespace=fix capability.

The middle section of the line used to be assumed non-highlighted,
which is obviously not true anymore, so we split its logic into a
separate function named emit_middle_section().

The new check is enabled for Documentation/**/*.adoc, where these
kinds of mistakes were seen in practice. It can also be enabled in
other locations where it can be useful, by adding to the relevant
attributes file.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
---
Changes in v2:
* Highlight the new error in ws_check_emit_1() output generation so
  tools like color git diff can show the errors to the user (Junio)
* Expanded ws_fix_copy to detect and fix this new error (Junio)
* Added more test cases for the above new functionality (Junio)
* Improved commit message (Junio)

Based on the latest master branch.
Pushed to GitHub: https://github.com/10ne1/git/tree/dev/aratiu/whitespace-new-test-v2
Successful CI run: https://github.com/10ne1/git/actions/runs/20766920987
---
 .gitattributes             |   2 +-
 t/t4015-diff-whitespace.sh | 143 +++++++++++++++++++++++++++++
 ws.c                       | 179 ++++++++++++++++++++++++++++++++++---
 ws.h                       |   1 +
 4 files changed, 313 insertions(+), 12 deletions(-)

diff --git a/.gitattributes b/.gitattributes
index 700743c3f5..d3c40a038b 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -7,7 +7,7 @@
 *.py text eol=lf diff=python
 *.bat text eol=crlf
 CODE_OF_CONDUCT.md -whitespace
-/Documentation/**/*.adoc text eol=lf whitespace=trail,space,incomplete
+/Documentation/**/*.adoc text eol=lf whitespace=trail,space,incomplete,tab-between-non-ws
 /command-list.txt text eol=lf
 /GIT-VERSION-GEN text eol=lf
 /mergetools/* text eol=lf
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 3c8eb02e4f..f5b6ceeed9 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -2440,4 +2440,147 @@ test_expect_success 'combine --ignore-blank-lines with --function-context 2' '
 	test_cmp expect actual
 '
 
+test_expect_success 'check tab between non-whitespace (tab-between-non-ws: off)' '
+	git config core.whitespace "-tab-between-non-ws" &&
+
+	printf "1234567\tb" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	test_grep "<GREEN>1234567	b<RESET>" actual &&
+
+	# should apply without error because tab-between-non-ws is off
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab between non-whitespace at tab stop (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "1234567\tb" >x &&
+	git add x &&
+	test_must_fail git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	test_grep "<GREEN>1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	! test_grep "<GREEN>1234567	b<RESET>" actual &&
+
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	test_must_fail git apply --whitespace=error patch.diff &&
+	git apply --whitespace=fix patch.diff &&
+	printf "1234567 b" >expected &&
+	test_cmp expected x
+'
+
+test_expect_success 'check tab between non-whitespace not at tab stop (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "a\tb" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>a<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	test_grep "<GREEN>a	b<RESET>" actual &&
+
+	# should apply without error because the input is valid
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab between non-whitespace with tabwidth=4 (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=4" &&
+	printf "123\tb" >x &&
+	git add x &&
+	test_must_fail git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	test_grep "<GREEN>123<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	! test_grep "<GREEN>123	b<RESET>" actual &&
+
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	test_must_fail git apply --whitespace=error patch.diff &&
+	git apply --whitespace=fix patch.diff &&
+	printf "123 b" >expected &&
+	test_cmp expected x
+'
+
+test_expect_success 'check tab between non-whitespace with tabwidth=4 (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=4" &&
+	printf "1234\tb" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>1234<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	test_grep "<GREEN>1234	b<RESET>" actual &&
+
+	# should apply without error because tab is at tab stop
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check multiple tabs with one error (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "a\t1234567\tb" >x &&
+	git add x &&
+	test_must_fail git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	test_grep "<GREEN>a	1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	! test_grep "<GREEN>a	1234567	b<RESET>" actual &&
+
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	test_must_fail git apply --whitespace=error patch.diff &&
+	git apply --whitespace=fix patch.diff &&
+	printf "a\t1234567 b" >expected &&
+	test_cmp expected x
+'
+
+test_expect_success 'check tab at beginning of line (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "\ta" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<BLUE>	" actual &&
+	test_grep "<GREEN>+<RESET>	<GREEN>a<RESET>" actual &&
+
+	# should apply without error because tab is a valid indentation
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab at end of line(tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,-trailing-space,tabwidth=8" &&
+	printf "a\t" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>a<RESET><BLUE>	" actual &&
+	test_grep "<GREEN>a	<RESET>" actual &&
+
+	# should apply without error because tab is caught by another check (trailing-space)
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
 test_done
diff --git a/ws.c b/ws.c
index 6cc2466c0c..633bc69418 100644
--- a/ws.c
+++ b/ws.c
@@ -26,6 +26,7 @@ static struct whitespace_rule {
 	{ "blank-at-eol", WS_BLANK_AT_EOL, 0 },
 	{ "blank-at-eof", WS_BLANK_AT_EOF, 0 },
 	{ "tab-in-indent", WS_TAB_IN_INDENT, 0, 1 },
+	{ "tab-between-non-ws", WS_TAB_BETWEEN_NON_WS, 0 },
 	{ "incomplete-line", WS_INCOMPLETE_LINE, 0, 0 },
 };
 
@@ -140,6 +141,11 @@ char *whitespace_error_string(unsigned ws)
 			strbuf_addstr(&err, ", ");
 		strbuf_addstr(&err, "tab in indent");
 	}
+	if (ws & WS_TAB_BETWEEN_NON_WS) {
+		if (err.len)
+			strbuf_addstr(&err, ", ");
+		strbuf_addstr(&err, "tab between non-whitespace characters");
+	}
 	if (ws & WS_INCOMPLETE_LINE) {
 		if (err.len)
 			strbuf_addstr(&err, ", ");
@@ -148,6 +154,80 @@ char *whitespace_error_string(unsigned ws)
 	return strbuf_detach(&err, NULL);
 }
 
+static void emit_literal(const char *line, int len, FILE *stream)
+{
+	fwrite(line, len, 1, stream);
+}
+
+static void highlight_tabs(const char *line, int len,
+			   int written, int trailing_whitespace,
+			   unsigned ws_rule, FILE *stream,
+			   const char *set, const char *reset, const char *ws)
+{
+	int tabwidth = ws_tab_width(ws_rule);
+	int start = written, col = 0;
+
+	if (!tabwidth)
+		BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+	/*
+	 * Calculate the visual column position (col) up to 'written'.
+	 * Tabs expand based on 'tabwidth', so for example the 5th character in a
+	 * string might be at the 12th visual column, if the line contains tabs.
+	 */
+	for (int i = 0; i < written; i++) {
+		if (line[i] == '\t')
+			col += tabwidth - (col % tabwidth);
+		else
+			col++;
+	}
+
+	/* Iterate through the section of the line that needs potential highlighting. */
+	for (int i = written; i < trailing_whitespace; i++) {
+		if (line[i] == '\t') {
+			if (i > 0 && i < len - 1 &&
+			    !isspace(line[i - 1]) && !isspace(line[i + 1]) &&
+			    (col % tabwidth) == (tabwidth - 1)) {
+				/* Print unwritten content before the highlighted tab. */
+				if (start < i)
+					emit_literal(line + start, i - start, stream);
+
+				/* Apply highlight for the tab. */
+				fputs(reset, stream);
+				fputs(ws, stream);
+				fputc('\t', stream);
+				fputs(reset, stream);
+				fputs(set, stream);
+				start = i + 1;
+			}
+			col += tabwidth - (col % tabwidth);
+		} else {
+			col++; /* non-tab character. */
+		}
+	}
+
+	/* Print any remaining content in the current segment after the last highlight. */
+	if (start < trailing_whitespace)
+		emit_literal(line + start, trailing_whitespace - start, stream);
+}
+
+static void emit_middle_section(const char *line, int len,
+			       int written, int trailing_whitespace,
+			       unsigned ws_rule, unsigned result,
+			       FILE *stream, const char *set,
+			       const char *reset, const char *ws)
+{
+	fputs(set, stream);
+
+	if (result & WS_TAB_BETWEEN_NON_WS)
+		highlight_tabs(line, len, written, trailing_whitespace,
+			       ws_rule, stream, set, reset, ws);
+	else
+		emit_literal(line + written, trailing_whitespace - written, stream);
+
+	fputs(reset, stream);
+}
+
 /* If stream is non-NULL, emits the line after checking. */
 static unsigned ws_check_emit_1(const char *line, int len, unsigned ws_rule,
 				FILE *stream, const char *set,
@@ -228,19 +308,41 @@ static unsigned ws_check_emit_1(const char *line, int len, unsigned ws_rule,
 		written = i;
 	}
 
-	if (stream) {
+	if (ws_rule & WS_TAB_BETWEEN_NON_WS) {
 		/*
-		 * Now the rest of the line starts at "written".
-		 * The non-highlighted part ends at "trailing_whitespace".
+		 * A tab surrounded by non-whitespace characters is a typo candidate
+		 * (a space might have been intended). This checks for a tab that
+		 * would be expanded to a single space, which is when it appears at
+		 * a column that is one less than a multiple of the tabwidth.
 		 */
-
-		/* Emit non-highlighted (middle) segment. */
-		if (trailing_whitespace - written > 0) {
-			fputs(set, stream);
-			fwrite(line + written,
-			    trailing_whitespace - written, 1, stream);
-			fputs(reset, stream);
+		int col = 0;
+		int tabwidth = ws_tab_width(ws_rule);
+
+		if (!tabwidth)
+			BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+		for (i = 0; i < len; i++) {
+			if (line[i] == '\t') {
+				if (i > 0 && i < len - 1 &&
+				    !isspace(line[i - 1]) && !isspace(line[i + 1]) &&
+				    (col % tabwidth) == (tabwidth - 1))
+					result |= WS_TAB_BETWEEN_NON_WS;
+				col += tabwidth - (col % tabwidth);
+			} else {
+				col++;
+			}
 		}
+	}
+
+	if (stream) {
+		/*
+		 * The middle section of the line starts at "written" and ends at
+		 * "trailing_whitespace".
+		 */
+		if (trailing_whitespace - written > 0)
+			emit_middle_section(line, len, written, trailing_whitespace,
+					    ws_rule, result,
+					    stream, set, reset, ws);
 
 		/* Highlight errors in trailing whitespace. */
 		if (trailing_whitespace != len) {
@@ -299,6 +401,7 @@ void ws_fix_copy(struct strbuf *dst, const char *src, int len, unsigned ws_rule,
 	int last_tab_in_indent = -1;
 	int last_space_in_indent = -1;
 	int need_fix_leading_space = 0;
+	size_t pre_indent_len = dst->len;
 
 	/*
 	 * Remembering that we need to add '\n' at the end
@@ -401,7 +504,61 @@ void ws_fix_copy(struct strbuf *dst, const char *src, int len, unsigned ws_rule,
 		fixed = 1;
 	}
 
-	strbuf_add(dst, src, len);
+	if (!(ws_rule & WS_TAB_BETWEEN_NON_WS)) {
+		/*
+		 * Middle section does not need fixing, so just append src to
+		 * dst because it already contains the previous ws fixes.
+		 */
+		strbuf_add(dst, src, len);
+	} else {
+		/* Fix middle section HTs which expand to a single display space */
+		const char *indent_part = dst->buf + pre_indent_len;
+		int indent_len = dst->len - pre_indent_len;
+		int tabwidth = ws_tab_width(ws_rule);
+		int col = 0;
+
+		if (!tabwidth)
+			BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+		/*
+		 * Compute indentation display column length because it might not
+		 * end at a fixed tab stop position.
+		 */
+		for (i = 0; i < indent_len; i++) {
+			if (indent_part[i] == '\t')
+				col += tabwidth - (col % tabwidth);
+			else
+				col++;
+		}
+
+		/* Go through all chars in src, fix if necessary, and append to dst */
+		for (i = 0; i < len; i++) {
+			char prev_ch = i > 0 ? src[i - 1] :
+				(indent_len > 0 ? indent_part[indent_len - 1] : '\0');
+			char next_ch = i < len - 1 ? src[i + 1] : '\0';
+			bool needs_fixing = prev_ch && next_ch &&
+				!isspace(prev_ch) && !isspace(next_ch) &&
+				(col % tabwidth) == (tabwidth - 1);
+
+			/* non HT chars are added as is */
+			if (src[i] != '\t') {
+				strbuf_addch(dst, src[i]);
+				col++;
+				continue;
+			}
+
+			/* fix HT or add them as is */
+			if (needs_fixing) {
+				strbuf_addch(dst, ' ');
+				col++;
+				fixed = 1;
+			} else {
+				strbuf_addch(dst, '\t');
+				col += tabwidth - (col % tabwidth);
+			}
+		}
+	}
+
 	if (add_cr_to_tail)
 		strbuf_addch(dst, '\r');
 	if (add_nl_to_tail)
diff --git a/ws.h b/ws.h
index 06d5cb73f8..35475fd320 100644
--- a/ws.h
+++ b/ws.h
@@ -16,6 +16,7 @@ struct strbuf;
 #define WS_BLANK_AT_EOF         (1<<10)
 #define WS_TAB_IN_INDENT        (1<<11)
 #define WS_INCOMPLETE_LINE      (1<<12)
+#define WS_TAB_BETWEEN_NON_WS   (1<<13)
 
 #define WS_TRAILING_SPACE       (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
 #define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ws: add new tab-between-non-ws check
  2026-01-07  1:30 [PATCH v2] ws: add new tab-between-non-ws check Adrian Ratiu
@ 2026-01-07  2:12 ` Junio C Hamano
  2026-01-07 11:34   ` Adrian Ratiu
  2026-01-07 17:33 ` Johannes Sixt
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2026-01-07  2:12 UTC (permalink / raw)
  To: Adrian Ratiu; +Cc: git, Patrick Steinhardt, Emily Shaffer

Adrian Ratiu <adrian.ratiu@collabora.com> writes:

> The check is a bit complex because we want to detect places where
> a SP was intended (HT can expand to more than one display column),
> so we need to count both the display columns (col) and the string
> character columns (i) to determine if a HT looks identical to a SP
> or can cause confusion.
>
> +/....adoc text eol=lf whitespace=trail,space,incomplete,tab-between-non-ws

The name of the whitespace rule does not quite match what we want to
catch.  Can somebody find a phrasing than "between non-ws" that
conveys our intent better?  We want to catch a tab that is used by
mistsake when the writer would have used a space, and "between
non-ws" is one of the heuristics (another is "it is at the 7th
column to make it indistinguishable from a space") the code uses to
tell if a tab is such a mistaken tab.   "tab-instead-of-space"?
"tab-in-place-of-space"?  "tab-that-should-have-been-a-space"?

The last one is horrible and not a serious suggestion, of course.

> +test_expect_success 'check tab between non-whitespace (tab-between-non-ws: off)' '
> +	git config core.whitespace "-tab-between-non-ws" &&
> +
> +	printf "1234567\tb" >x &&

I notice all these printf create incomplete lines.  It is true that
the detection of a tab that is used when it should have been a space
should work even on an incomplete line, but using an incomplete
line, which is of course rather unusual, for these tests gives a
false impression that somehow this requires an incomplete line to
trigger, which is not what we want to give.

	printf "1234567\tb\n" > x &&

or something, perhaps?  I dunno.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ws: add new tab-between-non-ws check
  2026-01-07  2:12 ` Junio C Hamano
@ 2026-01-07 11:34   ` Adrian Ratiu
  0 siblings, 0 replies; 8+ messages in thread
From: Adrian Ratiu @ 2026-01-07 11:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Emily Shaffer

On Wed, 07 Jan 2026, Junio C Hamano <gitster@pobox.com> wrote:
> Adrian Ratiu <adrian.ratiu@collabora.com> writes:
>
>> The check is a bit complex because we want to detect places where
>> a SP was intended (HT can expand to more than one display column),
>> so we need to count both the display columns (col) and the string
>> character columns (i) to determine if a HT looks identical to a SP
>> or can cause confusion.
>>
>> +/....adoc text eol=lf whitespace=trail,space,incomplete,tab-between-non-ws
>
> The name of the whitespace rule does not quite match what we want to
> catch.  Can somebody find a phrasing than "between non-ws" that
> conveys our intent better?  We want to catch a tab that is used by
> mistsake when the writer would have used a space, and "between
> non-ws" is one of the heuristics (another is "it is at the 7th
> column to make it indistinguishable from a space") the code uses to
> tell if a tab is such a mistaken tab.   "tab-instead-of-space"?
> "tab-in-place-of-space"?  "tab-that-should-have-been-a-space"?
>
> The last one is horrible and not a serious suggestion, of course.

I like "tab-instead-of-space". :)

Will wait some time in case others have suggestions and if we can't come
up with something better, then I will use "tab-instead-of-space" in v3.

>> +test_expect_success 'check tab between non-whitespace (tab-between-non-ws: off)' '
>> +	git config core.whitespace "-tab-between-non-ws" &&
>> +
>> +	printf "1234567\tb" >x &&
>
> I notice all these printf create incomplete lines.  It is true that
> the detection of a tab that is used when it should have been a space
> should work even on an incomplete line, but using an incomplete
> line, which is of course rather unusual, for these tests gives a
> false impression that somehow this requires an incomplete line to
> trigger, which is not what we want to give.
>
> 	printf "1234567\tb\n" > x &&
>
> or something, perhaps?  I dunno.

That is a good idea. Will fix in v3. Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ws: add new tab-between-non-ws check
  2026-01-07  1:30 [PATCH v2] ws: add new tab-between-non-ws check Adrian Ratiu
  2026-01-07  2:12 ` Junio C Hamano
@ 2026-01-07 17:33 ` Johannes Sixt
  2026-01-07 18:11   ` Adrian Ratiu
  2026-01-08  9:01   ` Johannes Sixt
  1 sibling, 2 replies; 8+ messages in thread
From: Johannes Sixt @ 2026-01-07 17:33 UTC (permalink / raw)
  To: Adrian Ratiu; +Cc: Junio C Hamano, Patrick Steinhardt, Emily Shaffer, git

Am 07.01.26 um 02:30 schrieb Adrian Ratiu:
> This adds a new check to detect HT in the middle of sentences that
> should have been a SP, as suggested by Junio in
> https://public-inbox.org/git/xmqqy0mwsedz.fsf@gitster.g/

Generally, please review the commit message to follow the project's
style: Use imperative mood in sentences the describe the changes ("Add a
new check to...", "Supoort highlighting for tools like...", "Enable the
new chaeck for...", etc.)

> The check is a bit complex because we want to detect places where
> a SP was intended (HT can expand to more than one display column),
> so we need to count both the display columns (col) and the string
> character columns (i) to determine if a HT looks identical to a SP
> or can cause confusion.
> 
> Highlighting support for tools like git diff/show/log is added, as
> well as git apply --whitespace=fix capability.
> 
> The middle section of the line used to be assumed non-highlighted,
> which is obviously not true anymore, so we split its logic into a
> separate function named emit_middle_section().
> 
> The new check is enabled for Documentation/**/*.adoc, where these
> kinds of mistakes were seen in practice. It can also be enabled in
> other locations where it can be useful, by adding to the relevant
> attributes file.
> 
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> ---

> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 3c8eb02e4f..f5b6ceeed9 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -2440,4 +2440,147 @@ test_expect_success 'combine --ignore-blank-lines with --function-context 2' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'check tab between non-whitespace (tab-between-non-ws: off)' '
> +	git config core.whitespace "-tab-between-non-ws" &&

It might be worthwhile using test_config here, because this setting does
not need to persist for the remaining tests.

> +
> +	printf "1234567\tb" >x &&

What if you made the test cases into

	# only the TAB in the middle must be diagnosed
	printf "\t1234567\t12\t90\n" >x &&

to test that only the second of the three TABs is diagnosed?

> +	git add x &&
> +	git diff --cached --check &&
> +
> +	git diff --cached --color >raw &&
> +	test_decode_color <raw >actual &&
> +	! test_grep "<GREEN>1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&

This must be

	test_grep ! "...

Furthermore, a negative test with a very tight pattern is often not
desired: The test could fail if any single character does not occur
(which could easily happen if the test text is changed, but not this
pattern). In this case, it would be sufficient to test only that "BLUE"
does not occur.

> +	test_grep "<GREEN>1234567	b<RESET>" actual &&
> +
> +	# should apply without error because tab-between-non-ws is off
> +	git diff --cached >patch.diff &&
> +	git checkout HEAD -- x &&
> +	git apply --whitespace=error patch.diff
> +'

There is t/t4124-apply-ws-rule.sh. Wouldn't the `git apply` tests be
better located there?

Please consider all comments on this test case repeated (and suitably
adusted) for all other test cases added by this patch.

> +
> +test_expect_success 'check tab between non-whitespace at tab stop (tab-between-non-ws: on)' '
> +	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&

I am curious why you set tabwidth=8 here even though 8 is the default.

> +test_expect_success 'check tab between non-whitespace not at tab stop (tab-between-non-ws: on)' '

With my suggested text above, this case does not need a separate test, I
think.

> diff --git a/ws.c b/ws.c
> index 6cc2466c0c..633bc69418 100644
> --- a/ws.c
> +++ b/ws.c
> @@ -26,6 +26,7 @@ static struct whitespace_rule {
>  	{ "blank-at-eol", WS_BLANK_AT_EOL, 0 },
>  	{ "blank-at-eof", WS_BLANK_AT_EOF, 0 },
>  	{ "tab-in-indent", WS_TAB_IN_INDENT, 0, 1 },
> +	{ "tab-between-non-ws", WS_TAB_BETWEEN_NON_WS, 0 },

How about "tab-is-1-space"? The documentation can clarify that not any
TAB expanding to width 1 is diagnosed, but only those that are between
non-space characters.

>  	{ "incomplete-line", WS_INCOMPLETE_LINE, 0, 0 },
>  };

I didn't look at the remaining code changes.

-- Hannes


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ws: add new tab-between-non-ws check
  2026-01-07 17:33 ` Johannes Sixt
@ 2026-01-07 18:11   ` Adrian Ratiu
  2026-01-08  8:36     ` Johannes Sixt
  2026-01-08  9:01   ` Johannes Sixt
  1 sibling, 1 reply; 8+ messages in thread
From: Adrian Ratiu @ 2026-01-07 18:11 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Patrick Steinhardt, Emily Shaffer, git

On Wed, 07 Jan 2026, Johannes Sixt <j6t@kdbg.org> wrote:
> Am 07.01.26 um 02:30 schrieb Adrian Ratiu:
>> This adds a new check to detect HT in the middle of sentences that
>> should have been a SP, as suggested by Junio in
>> https://public-inbox.org/git/xmqqy0mwsedz.fsf@gitster.g/
>
> Generally, please review the commit message to follow the project's
> style: Use imperative mood in sentences the describe the changes ("Add a
> new check to...", "Supoort highlighting for tools like...", "Enable the
> new chaeck for...", etc.)

Ack, will fix.

>> The check is a bit complex because we want to detect places where
>> a SP was intended (HT can expand to more than one display column),
>> so we need to count both the display columns (col) and the string
>> character columns (i) to determine if a HT looks identical to a SP
>> or can cause confusion.
>> 
>> Highlighting support for tools like git diff/show/log is added, as
>> well as git apply --whitespace=fix capability.
>> 
>> The middle section of the line used to be assumed non-highlighted,
>> which is obviously not true anymore, so we split its logic into a
>> separate function named emit_middle_section().
>> 
>> The new check is enabled for Documentation/**/*.adoc, where these
>> kinds of mistakes were seen in practice. It can also be enabled in
>> other locations where it can be useful, by adding to the relevant
>> attributes file.
>> 
>> Suggested-by: Junio C Hamano <gitster@pobox.com>
>> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
>> ---
>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 3c8eb02e4f..f5b6ceeed9 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -2440,4 +2440,147 @@ test_expect_success 'combine --ignore-blank-lines with --function-context 2' '
>>  	test_cmp expect actual
>>  '
>>  
>> +test_expect_success 'check tab between non-whitespace (tab-between-non-ws: off)' '
>> +	git config core.whitespace "-tab-between-non-ws" &&
>
> It might be worthwhile using test_config here, because this setting does
> not need to persist for the remaining tests.

Ack, will do.

>> +
>> +	printf "1234567\tb" >x &&
>
> What if you made the test cases into
>
> 	# only the TAB in the middle must be diagnosed
> 	printf "\t1234567\t12\t90\n" >x &&
>
> to test that only the second of the three TABs is diagnosed?

Yes we can do it this way.

>> +	git add x &&
>> +	git diff --cached --check &&
>> +
>> +	git diff --cached --color >raw &&
>> +	test_decode_color <raw >actual &&
>> +	! test_grep "<GREEN>1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
>
> This must be
>
> 	test_grep ! "...
>
> Furthermore, a negative test with a very tight pattern is often not
> desired: The test could fail if any single character does not occur
> (which could easily happen if the test text is changed, but not this
> pattern). In this case, it would be sufficient to test only that "BLUE"
> does not occur.

Thanks, I'm still a bit of a noob wrt the git codebase. Will do.

>> +	test_grep "<GREEN>1234567	b<RESET>" actual &&
>> +
>> +	# should apply without error because tab-between-non-ws is off
>> +	git diff --cached >patch.diff &&
>> +	git checkout HEAD -- x &&
>> +	git apply --whitespace=error patch.diff
>> +'
>
> There is t/t4124-apply-ws-rule.sh. Wouldn't the `git apply` tests be
> better located there?

I think so, yes, thanks for pointing it out.

>
> Please consider all comments on this test case repeated (and suitably
> adusted) for all other test cases added by this patch.

Will do.

>> +
>> +test_expect_success 'check tab between non-whitespace at tab stop (tab-between-non-ws: on)' '
>> +	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
>
> I am curious why you set tabwidth=8 here even though 8 is the default.

Just to make it explicit because I also set tabwidth=4 in the following
tests, however I can drop it.

>> +test_expect_success 'check tab between non-whitespace not at tab stop (tab-between-non-ws: on)' '
>
> With my suggested text above, this case does not need a separate test, I
> think.

Yes, it likely is redundant. I'll double check the tabwidth=8 test as
well because that might also be redundant and can be dropped entirely.

>
>> diff --git a/ws.c b/ws.c
>> index 6cc2466c0c..633bc69418 100644
>> --- a/ws.c
>> +++ b/ws.c
>> @@ -26,6 +26,7 @@ static struct whitespace_rule {
>>  	{ "blank-at-eol", WS_BLANK_AT_EOL, 0 },
>>  	{ "blank-at-eof", WS_BLANK_AT_EOF, 0 },
>>  	{ "tab-in-indent", WS_TAB_IN_INDENT, 0, 1 },
>> +	{ "tab-between-non-ws", WS_TAB_BETWEEN_NON_WS, 0 },
>
> How about "tab-is-1-space"? The documentation can clarify that not any
> TAB expanding to width 1 is diagnosed, but only those that are between
> non-space characters.

I like this suggestion as well.

Many thanks for the review,
Adrian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ws: add new tab-between-non-ws check
  2026-01-07 18:11   ` Adrian Ratiu
@ 2026-01-08  8:36     ` Johannes Sixt
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Sixt @ 2026-01-08  8:36 UTC (permalink / raw)
  To: Adrian Ratiu; +Cc: Junio C Hamano, Patrick Steinhardt, Emily Shaffer, git

Am 07.01.26 um 19:11 schrieb Adrian Ratiu:
> On Wed, 07 Jan 2026, Johannes Sixt <j6t@kdbg.org> wrote:
>> Am 07.01.26 um 02:30 schrieb Adrian Ratiu:
>>> +	git add x &&
>>> +	git diff --cached --check &&
>>> +
>>> +	git diff --cached --color >raw &&
>>> +	test_decode_color <raw >actual &&
>>> +	! test_grep "<GREEN>1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
>>
>> This must be
>>
>> 	test_grep ! "...
>>
>> Furthermore, a negative test with a very tight pattern is often not
>> desired: The test could fail if any single character does not occur
>> (which could easily happen if the test text is changed, but not this
>> pattern). In this case, it would be sufficient to test only that "BLUE"
>> does not occur.
> 
> Thanks, I'm still a bit of a noob wrt the git codebase. Will do.
> 
>>> +	test_grep "<GREEN>1234567	b<RESET>" actual &&

Reconsidering this, we have a positive test for the desired result here.
Then the negative test is redundant, I would think.

-- Hannes


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ws: add new tab-between-non-ws check
  2026-01-07 17:33 ` Johannes Sixt
  2026-01-07 18:11   ` Adrian Ratiu
@ 2026-01-08  9:01   ` Johannes Sixt
  2026-01-09 13:33     ` Adrian Ratiu
  1 sibling, 1 reply; 8+ messages in thread
From: Johannes Sixt @ 2026-01-08  9:01 UTC (permalink / raw)
  To: Adrian Ratiu; +Cc: Junio C Hamano, Patrick Steinhardt, Emily Shaffer, git

Am 07.01.26 um 18:33 schrieb Johannes Sixt:
> Am 07.01.26 um 02:30 schrieb Adrian Ratiu:
>> The check is a bit complex because we want to detect places where
>> a SP was intended (HT can expand to more than one display column),
>> so we need to count both the display columns (col) and the string
>> character columns (i) to determine if a HT looks identical to a SP
>> or can cause confusion.
>>
>> Highlighting support for tools like git diff/show/log is added, as
>> well as git apply --whitespace=fix capability.
>>
>> The middle section of the line used to be assumed non-highlighted,
>> which is obviously not true anymore, so we split its logic into a
>> separate function named emit_middle_section().
>>
>> The new check is enabled for Documentation/**/*.adoc, where these
>> kinds of mistakes were seen in practice. It can also be enabled in
>> other locations where it can be useful, by adding to the relevant
>> attributes file.

This makes me wonder how useful this check is. Yes, I has happened that
I didn't spot at TAB that should have been a SP, but perhaps a handful
of times in my career. Compare this to the many times that the other
kinds of whitespace errors happened.

Applying the rule to all documentation files is questionable: I can't
format a table with TAB characters between columns reliably, because if
a column happens to be 7 characters wide, the TAB at the 8th position
would be diagnosed, but I certainly do *not* want it to be replaced by a
SP. Yet, I might want legitimate cases outside tables to be diagnosed, so...

Maybe I'm too much of a devil's advocate here...

-- Hannes


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ws: add new tab-between-non-ws check
  2026-01-08  9:01   ` Johannes Sixt
@ 2026-01-09 13:33     ` Adrian Ratiu
  0 siblings, 0 replies; 8+ messages in thread
From: Adrian Ratiu @ 2026-01-09 13:33 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Patrick Steinhardt, Emily Shaffer, git

On Thu, 08 Jan 2026, Johannes Sixt <j6t@kdbg.org> wrote:
> Am 07.01.26 um 18:33 schrieb Johannes Sixt:
>> Am 07.01.26 um 02:30 schrieb Adrian Ratiu:
>>> The check is a bit complex because we want to detect places where
>>> a SP was intended (HT can expand to more than one display column),
>>> so we need to count both the display columns (col) and the string
>>> character columns (i) to determine if a HT looks identical to a SP
>>> or can cause confusion.
>>>
>>> Highlighting support for tools like git diff/show/log is added, as
>>> well as git apply --whitespace=fix capability.
>>>
>>> The middle section of the line used to be assumed non-highlighted,
>>> which is obviously not true anymore, so we split its logic into a
>>> separate function named emit_middle_section().
>>>
>>> The new check is enabled for Documentation/**/*.adoc, where these
>>> kinds of mistakes were seen in practice. It can also be enabled in
>>> other locations where it can be useful, by adding to the relevant
>>> attributes file.
>
> This makes me wonder how useful this check is. Yes, I has happened that
> I didn't spot at TAB that should have been a SP, but perhaps a handful
> of times in my career. Compare this to the many times that the other
> kinds of whitespace errors happened.
>
> Applying the rule to all documentation files is questionable: I can't
> format a table with TAB characters between columns reliably, because if
> a column happens to be 7 characters wide, the TAB at the 8th position
> would be diagnosed, but I certainly do *not* want it to be replaced by a
> SP. Yet, I might want legitimate cases outside tables to be diagnosed, so...
>
> Maybe I'm too much of a devil's advocate here...

I'll let Junio decide on the usefulness of this check since he's the one
who asked for it. :)

Maybe we could improve the heuristic to detect tables, for example
patterns like a\tb\tc.

I'm ok either way, just let me know if I should pursue this further.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-01-09 13:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-07  1:30 [PATCH v2] ws: add new tab-between-non-ws check Adrian Ratiu
2026-01-07  2:12 ` Junio C Hamano
2026-01-07 11:34   ` Adrian Ratiu
2026-01-07 17:33 ` Johannes Sixt
2026-01-07 18:11   ` Adrian Ratiu
2026-01-08  8:36     ` Johannes Sixt
2026-01-08  9:01   ` Johannes Sixt
2026-01-09 13:33     ` Adrian Ratiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).