git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Redact unsafe URLs in the Trace2 output
@ 2023-11-22 19:18 Johannes Schindelin via GitGitGadget
  2023-11-22 19:18 ` [PATCH 1/4] trace2: fix signature of trace2_def_param() macro Jeff Hostetler via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-11-22 19:18 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

The Trace2 output can contain secrets when a user issues a Git command with
sensitive information in the command-line. A typical (if highly discouraged)
example is: git clone https://user:password@host.com/.

With this PR, the Trace2 output redacts passwords in such URLs by default.

This series also includes a commit to temporarily disable leak checking on
t0210,t0211 because the tests uncover other unrelated bugs in Git.

These patches were integrated into Microsoft's fork of Git, as
https://github.com/microsoft/git/pull/616, and have been cooking there ever
since.

Jeff Hostetler (3):
  trace2: fix signature of trace2_def_param() macro
  t0211: test URL redacting in PERF format
  t0212: test URL redacting in EVENT format

Johannes Schindelin (1):
  trace2: redact passwords from https:// URLs by default

 t/helper/test-trace2.c   |  55 ++++++++++++++++++
 t/t0210-trace2-normal.sh |  20 ++++++-
 t/t0211-trace2-perf.sh   |  21 ++++++-
 t/t0212-trace2-event.sh  |  40 +++++++++++++
 trace2.c                 | 120 ++++++++++++++++++++++++++++++++++++++-
 trace2.h                 |   4 +-
 6 files changed, 253 insertions(+), 7 deletions(-)


base-commit: 564d0252ca632e0264ed670534a51d18a689ef5d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1616%2Fdscho%2Ftrace2-redact-credentials-in-https-urls-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1616/dscho/trace2-redact-credentials-in-https-urls-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1616
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] trace2: fix signature of trace2_def_param() macro
  2023-11-22 19:18 [PATCH 0/4] Redact unsafe URLs in the Trace2 output Johannes Schindelin via GitGitGadget
@ 2023-11-22 19:18 ` Jeff Hostetler via GitGitGadget
  2023-11-23  6:10   ` Junio C Hamano
  2023-11-22 19:18 ` [PATCH 2/4] trace2: redact passwords from https:// URLs by default Johannes Schindelin via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2023-11-22 19:18 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

Add `struct key_value_info` argument to `trace2_def_param()`.

In dc90208497 (trace2: plumb config kvi, 2023-06-28) a `kvi`
argument was added to `trace2_def_param_fl()` but the macro
was not up updated. Let's fix that.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 trace2.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/trace2.h b/trace2.h
index 40d8c2e02a5..1f0669bbd2d 100644
--- a/trace2.h
+++ b/trace2.h
@@ -337,8 +337,8 @@ struct key_value_info;
 void trace2_def_param_fl(const char *file, int line, const char *param,
 			 const char *value, const struct key_value_info *kvi);
 
-#define trace2_def_param(param, value) \
-	trace2_def_param_fl(__FILE__, __LINE__, (param), (value))
+#define trace2_def_param(param, value, kvi) \
+	trace2_def_param_fl(__FILE__, __LINE__, (param), (value), (kvi))
 
 /*
  * Tell trace2 about a newly instantiated repo object and assign
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] trace2: redact passwords from https:// URLs by default
  2023-11-22 19:18 [PATCH 0/4] Redact unsafe URLs in the Trace2 output Johannes Schindelin via GitGitGadget
  2023-11-22 19:18 ` [PATCH 1/4] trace2: fix signature of trace2_def_param() macro Jeff Hostetler via GitGitGadget
@ 2023-11-22 19:18 ` Johannes Schindelin via GitGitGadget
  2023-11-23 18:59   ` Elijah Newren
  2023-11-22 19:18 ` [PATCH 3/4] t0211: test URL redacting in PERF format Jeff Hostetler via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-11-22 19:18 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

It is an unsafe practice to call something like

	git clone https://user:password@example.com/

This not only risks leaking the password "over the shoulder" or into the
readline history of the current Unix shell, it also gets logged via
Trace2 if enabled.

Let's at least avoid logging such secrets via Trace2, much like we avoid
logging secrets in `http.c`. Much like the code in `http.c` is guarded
via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
`GIT_TRACE2_REDACT` (also defaulting to `true`).

The new tests added in this commit uncover leaks in `builtin/clone.c`
and `remote.c`. Therefore we need to turn off
`TEST_PASSES_SANITIZE_LEAK`. The reasons:

- We observed that `the_repository->remote_status` is not released
  properly.

- We are using `url...insteadOf` and that runs into a code path where an
  allocated URL is replaced with another URL, and the original URL is
  never released.

- `remote_states` contains plenty of `struct remote`s whose refspecs
  seem to be usually allocated by never released.

More investigation is needed here to identify the exact cause and
proper fixes for these leaks/bugs.

Co-authored-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t0210-trace2-normal.sh |  20 ++++++-
 trace2.c                 | 120 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index 80e76a4695e..c312657a12c 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -2,7 +2,7 @@
 
 test_description='test trace2 facility (normal target)'
 
-TEST_PASSES_SANITIZE_LEAK=true
+TEST_PASSES_SANITIZE_LEAK=false
 . ./test-lib.sh
 
 # Turn off any inherited trace2 settings for this test.
@@ -283,4 +283,22 @@ test_expect_success 'using global config with include' '
 	test_cmp expect actual
 '
 
+test_expect_success 'unsafe URLs are redacted by default' '
+	test_when_finished \
+		"rm -r trace.normal unredacted.normal clone clone2" &&
+
+	test_config_global \
+		"url.$(pwd).insteadOf" https://user:pwd@example.com/ &&
+	test_config_global trace2.configParams "core.*,remote.*.url" &&
+
+	GIT_TRACE2="$(pwd)/trace.normal" \
+		git clone https://user:pwd@example.com/ clone &&
+	! grep user:pwd trace.normal &&
+
+	GIT_TRACE2_REDACT=0 GIT_TRACE2="$(pwd)/unredacted.normal" \
+		git clone https://user:pwd@example.com/ clone2 &&
+	grep "start .* clone https://user:pwd@example.com" unredacted.normal &&
+	grep "remote.origin.url=https://user:pwd@example.com" unredacted.normal
+'
+
 test_done
diff --git a/trace2.c b/trace2.c
index 6dc74dff4c7..87d9a3a0361 100644
--- a/trace2.c
+++ b/trace2.c
@@ -20,6 +20,7 @@
 #include "trace2/tr2_tmr.h"
 
 static int trace2_enabled;
+static int trace2_redact = 1;
 
 static int tr2_next_child_id; /* modify under lock */
 static int tr2_next_exec_id; /* modify under lock */
@@ -227,6 +228,8 @@ void trace2_initialize_fl(const char *file, int line)
 	if (!tr2_tgt_want_builtins())
 		return;
 	trace2_enabled = 1;
+	if (!git_env_bool("GIT_TRACE2_REDACT", 1))
+		trace2_redact = 0;
 
 	tr2_sid_get();
 
@@ -247,12 +250,93 @@ int trace2_is_enabled(void)
 	return trace2_enabled;
 }
 
+/*
+ * Redacts an argument, i.e. ensures that no password in
+ * https://user:password@host/-style URLs is logged.
+ *
+ * Returns the original if nothing needed to be redacted.
+ * Returns a pointer that needs to be `free()`d otherwise.
+ */
+static const char *redact_arg(const char *arg)
+{
+	const char *p, *colon;
+	size_t at;
+
+	if (!trace2_redact ||
+	    (!skip_prefix(arg, "https://", &p) &&
+	     !skip_prefix(arg, "http://", &p)))
+		return arg;
+
+	at = strcspn(p, "@/");
+	if (p[at] != '@')
+		return arg;
+
+	colon = memchr(p, ':', at);
+	if (!colon)
+		return arg;
+
+	return xstrfmt("%.*s:<REDACTED>%s", (int)(colon - arg), arg, p + at);
+}
+
+/*
+ * Redacts arguments in an argument list.
+ *
+ * Returns the original if nothing needed to be redacted.
+ * Otherwise, returns a new array that needs to be released
+ * via `free_redacted_argv()`.
+ */
+static const char **redact_argv(const char **argv)
+{
+	int i, j;
+	const char *redacted = NULL;
+	const char **ret;
+
+	if (!trace2_redact)
+		return argv;
+
+	for (i = 0; argv[i]; i++)
+		if ((redacted = redact_arg(argv[i])) != argv[i])
+			break;
+
+	if (!argv[i])
+		return argv;
+
+	for (j = 0; argv[j]; j++)
+		; /* keep counting */
+
+	ALLOC_ARRAY(ret, j + 1);
+	ret[j] = NULL;
+
+	for (j = 0; j < i; j++)
+		ret[j] = argv[j];
+	ret[i] = redacted;
+	for (++i; argv[i]; i++) {
+		redacted = redact_arg(argv[i]);
+		ret[i] = redacted ? redacted : argv[i];
+	}
+
+	return ret;
+}
+
+static void free_redacted_argv(const char **redacted, const char **argv)
+{
+	int i;
+
+	if (redacted != argv) {
+		for (i = 0; argv[i]; i++)
+			if (redacted[i] != argv[i])
+				free((void *)redacted[i]);
+		free((void *)redacted);
+	}
+}
+
 void trace2_cmd_start_fl(const char *file, int line, const char **argv)
 {
 	struct tr2_tgt *tgt_j;
 	int j;
 	uint64_t us_now;
 	uint64_t us_elapsed_absolute;
+	const char **redacted;
 
 	if (!trace2_enabled)
 		return;
@@ -260,10 +344,14 @@ void trace2_cmd_start_fl(const char *file, int line, const char **argv)
 	us_now = getnanotime() / 1000;
 	us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
 
+	redacted = redact_argv(argv);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_start_fl)
 			tgt_j->pfn_start_fl(file, line, us_elapsed_absolute,
-					    argv);
+					    redacted);
+
+	free_redacted_argv(redacted, argv);
 }
 
 void trace2_cmd_exit_fl(const char *file, int line, int code)
@@ -409,6 +497,7 @@ void trace2_child_start_fl(const char *file, int line,
 	int j;
 	uint64_t us_now;
 	uint64_t us_elapsed_absolute;
+	const char **orig_argv = cmd->args.v;
 
 	if (!trace2_enabled)
 		return;
@@ -419,10 +508,24 @@ void trace2_child_start_fl(const char *file, int line,
 	cmd->trace2_child_id = tr2tls_locked_increment(&tr2_next_child_id);
 	cmd->trace2_child_us_start = us_now;
 
+	/*
+	 * The `pfn_child_start_fl` API takes a `struct child_process`
+	 * rather than a simple `argv` for the child because some
+	 * targets make use of the additional context bits/values. So
+	 * temporarily replace the original argv (inside the `strvec`)
+	 * with a possibly redacted version.
+	 */
+	cmd->args.v = redact_argv(orig_argv);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_child_start_fl)
 			tgt_j->pfn_child_start_fl(file, line,
 						  us_elapsed_absolute, cmd);
+
+	if (cmd->args.v != orig_argv) {
+		free_redacted_argv(cmd->args.v, orig_argv);
+		cmd->args.v = orig_argv;
+	}
 }
 
 void trace2_child_exit_fl(const char *file, int line, struct child_process *cmd,
@@ -493,6 +596,7 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
 	int exec_id;
 	uint64_t us_now;
 	uint64_t us_elapsed_absolute;
+	const char **redacted;
 
 	if (!trace2_enabled)
 		return -1;
@@ -502,10 +606,14 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
 
 	exec_id = tr2tls_locked_increment(&tr2_next_exec_id);
 
+	redacted = redact_argv(argv);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_exec_fl)
 			tgt_j->pfn_exec_fl(file, line, us_elapsed_absolute,
-					   exec_id, exe, argv);
+					   exec_id, exe, redacted);
+
+	free_redacted_argv(redacted, argv);
 
 	return exec_id;
 }
@@ -637,13 +745,19 @@ void trace2_def_param_fl(const char *file, int line, const char *param,
 {
 	struct tr2_tgt *tgt_j;
 	int j;
+	const char *redacted;
 
 	if (!trace2_enabled)
 		return;
 
+	redacted = redact_arg(value);
+
 	for_each_wanted_builtin (j, tgt_j)
 		if (tgt_j->pfn_param_fl)
-			tgt_j->pfn_param_fl(file, line, param, value, kvi);
+			tgt_j->pfn_param_fl(file, line, param, redacted, kvi);
+
+	if (redacted != value)
+		free((void *)redacted);
 }
 
 void trace2_def_repo_fl(const char *file, int line, struct repository *repo)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] t0211: test URL redacting in PERF format
  2023-11-22 19:18 [PATCH 0/4] Redact unsafe URLs in the Trace2 output Johannes Schindelin via GitGitGadget
  2023-11-22 19:18 ` [PATCH 1/4] trace2: fix signature of trace2_def_param() macro Jeff Hostetler via GitGitGadget
  2023-11-22 19:18 ` [PATCH 2/4] trace2: redact passwords from https:// URLs by default Johannes Schindelin via GitGitGadget
@ 2023-11-22 19:18 ` Jeff Hostetler via GitGitGadget
  2023-11-22 19:18 ` [PATCH 4/4] t0212: test URL redacting in EVENT format Jeff Hostetler via GitGitGadget
  2023-11-23 19:08 ` [PATCH 0/4] Redact unsafe URLs in the Trace2 output Elijah Newren
  4 siblings, 0 replies; 9+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2023-11-22 19:18 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

This transmogrifies the test case that was just added to t0210, to also
cover the `GIT_TRACE2_PERF` backend.

Just like t0211, we now have to toggle the `TEST_PASSES_SANITIZE_LEAK`
annotation.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/t0211-trace2-perf.sh | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/t/t0211-trace2-perf.sh b/t/t0211-trace2-perf.sh
index cfba6861322..290b6eaaab1 100755
--- a/t/t0211-trace2-perf.sh
+++ b/t/t0211-trace2-perf.sh
@@ -2,7 +2,7 @@
 
 test_description='test trace2 facility (perf target)'
 
-TEST_PASSES_SANITIZE_LEAK=true
+TEST_PASSES_SANITIZE_LEAK=false
 . ./test-lib.sh
 
 # Turn off any inherited trace2 settings for this test.
@@ -268,4 +268,23 @@ test_expect_success PTHREADS 'global counter test/test2' '
 	have_counter_event "main" "counter" "test" "test2" 60 actual
 '
 
+test_expect_success 'unsafe URLs are redacted by default' '
+	test_when_finished \
+		"rm -r actual trace.perf unredacted.perf clone clone2" &&
+
+	test_config_global \
+		"url.$(pwd).insteadOf" https://user:pwd@example.com/ &&
+	test_config_global trace2.configParams "core.*,remote.*.url" &&
+
+	GIT_TRACE2_PERF="$(pwd)/trace.perf" \
+		git clone https://user:pwd@example.com/ clone &&
+	! grep user:pwd trace.perf &&
+
+	GIT_TRACE2_REDACT=0 GIT_TRACE2_PERF="$(pwd)/unredacted.perf" \
+		git clone https://user:pwd@example.com/ clone2 &&
+	perl "$TEST_DIRECTORY/t0211/scrub_perf.perl" <unredacted.perf >actual &&
+	grep "d0|main|start|.* clone https://user:pwd@example.com" actual &&
+	grep "d0|main|def_param|.*|remote.origin.url:https://user:pwd@example.com" actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] t0212: test URL redacting in EVENT format
  2023-11-22 19:18 [PATCH 0/4] Redact unsafe URLs in the Trace2 output Johannes Schindelin via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-11-22 19:18 ` [PATCH 3/4] t0211: test URL redacting in PERF format Jeff Hostetler via GitGitGadget
@ 2023-11-22 19:18 ` Jeff Hostetler via GitGitGadget
  2023-11-23 19:08 ` [PATCH 0/4] Redact unsafe URLs in the Trace2 output Elijah Newren
  4 siblings, 0 replies; 9+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2023-11-22 19:18 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff Hostetler

From: Jeff Hostetler <jeffhostetler@github.com>

In the added tests cases, skip testing the `GIT_TRACE2_REDACT=0` case
because we would need to exactly model the full JSON event stream like
we did in the preceding basic tests and I do not think it is worth it.

Furthermore, the Trace2 routines print the same content in normal, perf,
or event format, and in t0210 and t0211 we already tested the basic
functionality, so no need to repeat it here.

In this test, we use the test-helper to unit test each of the event
messages where URLs can appear and confirm that they are redacted in
each event.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 t/helper/test-trace2.c  | 55 +++++++++++++++++++++++++++++++++++++++++
 t/t0212-trace2-event.sh | 40 ++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index d5ca0046c89..1adac29a575 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -412,6 +412,56 @@ static int ut_201counter(int argc, const char **argv)
 	return 0;
 }
 
+static int ut_300redact_start(int argc, const char **argv)
+{
+	if (!argc)
+		die("expect <argv...>");
+
+	trace2_cmd_start(argv);
+
+	return 0;
+}
+
+static int ut_301redact_child_start(int argc, const char **argv)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	int k;
+
+	if (!argc)
+		die("expect <argv...>");
+
+	for (k = 0; argv[k]; k++)
+		strvec_push(&cmd.args, argv[k]);
+
+	trace2_child_start(&cmd);
+
+	strvec_clear(&cmd.args);
+
+	return 0;
+}
+
+static int ut_302redact_exec(int argc, const char **argv)
+{
+	if (!argc)
+		die("expect <exe> <argv...>");
+
+	trace2_exec(argv[0], &argv[1]);
+
+	return 0;
+}
+
+static int ut_303redact_def_param(int argc, const char **argv)
+{
+	struct key_value_info kvi = KVI_INIT;
+
+	if (argc < 2)
+		die("expect <key> <value>");
+
+	trace2_def_param(argv[0], argv[1], &kvi);
+
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -438,6 +488,11 @@ static struct unit_test ut_table[] = {
 
 	{ ut_200counter,  "200counter", "<v1> [<v2> [<v3> [...]]]" },
 	{ ut_201counter,  "201counter", "<v1> <v2> <threads>" },
+
+	{ ut_300redact_start,       "300redact_start",       "<argv...>" },
+	{ ut_301redact_child_start, "301redact_child_start", "<argv...>" },
+	{ ut_302redact_exec,        "302redact_exec",        "<exe> <argv...>" },
+	{ ut_303redact_def_param,   "303redact_def_param",   "<key> <value>" },
 };
 /* clang-format on */
 
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index 6d3374ff773..147643d5826 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -323,4 +323,44 @@ test_expect_success 'discard traces when there are too many files' '
 	head -n2 trace_target_dir/git-trace2-discard | tail -n1 | grep \"event\":\"too_many_files\"
 '
 
+# In the following "...redact..." tests, skip testing the GIT_TRACE2_REDACT=0
+# case because we would need to exactly model the full JSON event stream like
+# we did in the basic tests above and I do not think it is worth it.
+
+test_expect_success 'unsafe URLs are redacted by default in cmd_start events' '
+	test_when_finished \
+		"rm -r trace.event" &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace.event" \
+		test-tool trace2 300redact_start git clone https://user:pwd@example.com/ clone2 &&
+	! grep user:pwd trace.event
+'
+
+test_expect_success 'unsafe URLs are redacted by default in child_start events' '
+	test_when_finished \
+		"rm -r trace.event" &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace.event" \
+		test-tool trace2 301redact_child_start git clone https://user:pwd@example.com/ clone2 &&
+	! grep user:pwd trace.event
+'
+
+test_expect_success 'unsafe URLs are redacted by default in exec events' '
+	test_when_finished \
+		"rm -r trace.event" &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace.event" \
+		test-tool trace2 302redact_exec git clone https://user:pwd@example.com/ clone2 &&
+	! grep user:pwd trace.event
+'
+
+test_expect_success 'unsafe URLs are redacted by default in def_param events' '
+	test_when_finished \
+		"rm -r trace.event" &&
+
+	GIT_TRACE2_EVENT="$(pwd)/trace.event" \
+		test-tool trace2 303redact_def_param url https://user:pwd@example.com/ &&
+	! grep user:pwd trace.event
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] trace2: fix signature of trace2_def_param() macro
  2023-11-22 19:18 ` [PATCH 1/4] trace2: fix signature of trace2_def_param() macro Jeff Hostetler via GitGitGadget
@ 2023-11-23  6:10   ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2023-11-23  6:10 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Johannes Schindelin, Jeff Hostetler

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Add `struct key_value_info` argument to `trace2_def_param()`.
>
> In dc90208497 (trace2: plumb config kvi, 2023-06-28) a `kvi`
> argument was added to `trace2_def_param_fl()` but the macro
> was not up updated. Let's fix that.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  trace2.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/trace2.h b/trace2.h
> index 40d8c2e02a5..1f0669bbd2d 100644
> --- a/trace2.h
> +++ b/trace2.h
> @@ -337,8 +337,8 @@ struct key_value_info;
>  void trace2_def_param_fl(const char *file, int line, const char *param,
>  			 const char *value, const struct key_value_info *kvi);
>  
> -#define trace2_def_param(param, value) \
> -	trace2_def_param_fl(__FILE__, __LINE__, (param), (value))
> +#define trace2_def_param(param, value, kvi) \
> +	trace2_def_param_fl(__FILE__, __LINE__, (param), (value), (kvi))

IOW, this macro was not used back when it was updated, and nobody
used it since then?  

I briefly wondered if we are better off removing it but that does
not make sense because you are adding a new (and only) user to it.

Will queue.  Thanks.

>  
>  /*
>   * Tell trace2 about a newly instantiated repo object and assign

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] trace2: redact passwords from https:// URLs by default
  2023-11-22 19:18 ` [PATCH 2/4] trace2: redact passwords from https:// URLs by default Johannes Schindelin via GitGitGadget
@ 2023-11-23 18:59   ` Elijah Newren
  2023-11-27 21:50     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Elijah Newren @ 2023-11-23 18:59 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Wed, Nov 22, 2023 at 11:19 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> It is an unsafe practice to call something like
>
>         git clone https://user:password@example.com/
>
> This not only risks leaking the password "over the shoulder" or into the
> readline history of the current Unix shell, it also gets logged via
> Trace2 if enabled.

Indeed.  Clone urls _also_ seem to be slurped up by other tools, such
as IDEs, and possibly sent off to various third-party cloud services
when users have various AI-assist plugins installed in their IDEs,
resulting in some infosec incidents and fire drills.  (Not a
theoretical scenario, and not fun.)

> Let's at least avoid logging such secrets via Trace2, much like we avoid
> logging secrets in `http.c`. Much like the code in `http.c` is guarded
> via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
> `GIT_TRACE2_REDACT` (also defaulting to `true`).

Training users is hard.  I appreciate the changes here to make trace2
not be a leak vector, but is it time to perhaps consider bigger safety
measures: At the clone/fetch level, automatically warn loudly whenever
such a URL is provided, accompanied with a note that in the future it
will be turned into a hard error?

Either way, I agree with your "at least" comment here and the changes
you are making.

> The new tests added in this commit uncover leaks in `builtin/clone.c`
> and `remote.c`. Therefore we need to turn off
> `TEST_PASSES_SANITIZE_LEAK`. The reasons:
>
> - We observed that `the_repository->remote_status` is not released
>   properly.
>
> - We are using `url...insteadOf` and that runs into a code path where an
>   allocated URL is replaced with another URL, and the original URL is
>   never released.
>
> - `remote_states` contains plenty of `struct remote`s whose refspecs
>   seem to be usually allocated by never released.
>
> More investigation is needed here to identify the exact cause and
> proper fixes for these leaks/bugs.

Thanks for carefully documenting and explaining.

> Co-authored-by: Jeff Hostetler <jeffhostetler@github.com>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t0210-trace2-normal.sh |  20 ++++++-
>  trace2.c                 | 120 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 136 insertions(+), 4 deletions(-)
>
> diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> index 80e76a4695e..c312657a12c 100755
> --- a/t/t0210-trace2-normal.sh
> +++ b/t/t0210-trace2-normal.sh
> @@ -2,7 +2,7 @@
>
>  test_description='test trace2 facility (normal target)'
>
> -TEST_PASSES_SANITIZE_LEAK=true
> +TEST_PASSES_SANITIZE_LEAK=false
>  . ./test-lib.sh
>
>  # Turn off any inherited trace2 settings for this test.
> @@ -283,4 +283,22 @@ test_expect_success 'using global config with include' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'unsafe URLs are redacted by default' '
> +       test_when_finished \
> +               "rm -r trace.normal unredacted.normal clone clone2" &&
> +
> +       test_config_global \
> +               "url.$(pwd).insteadOf" https://user:pwd@example.com/ &&
> +       test_config_global trace2.configParams "core.*,remote.*.url" &&
> +
> +       GIT_TRACE2="$(pwd)/trace.normal" \
> +               git clone https://user:pwd@example.com/ clone &&
> +       ! grep user:pwd trace.normal &&
> +
> +       GIT_TRACE2_REDACT=0 GIT_TRACE2="$(pwd)/unredacted.normal" \
> +               git clone https://user:pwd@example.com/ clone2 &&
> +       grep "start .* clone https://user:pwd@example.com" unredacted.normal &&
> +       grep "remote.origin.url=https://user:pwd@example.com" unredacted.normal
> +'
> +
>  test_done
> diff --git a/trace2.c b/trace2.c
> index 6dc74dff4c7..87d9a3a0361 100644
> --- a/trace2.c
> +++ b/trace2.c
> @@ -20,6 +20,7 @@
>  #include "trace2/tr2_tmr.h"
>
>  static int trace2_enabled;
> +static int trace2_redact = 1;
>
>  static int tr2_next_child_id; /* modify under lock */
>  static int tr2_next_exec_id; /* modify under lock */
> @@ -227,6 +228,8 @@ void trace2_initialize_fl(const char *file, int line)
>         if (!tr2_tgt_want_builtins())
>                 return;
>         trace2_enabled = 1;
> +       if (!git_env_bool("GIT_TRACE2_REDACT", 1))
> +               trace2_redact = 0;
>
>         tr2_sid_get();
>
> @@ -247,12 +250,93 @@ int trace2_is_enabled(void)
>         return trace2_enabled;
>  }
>
> +/*
> + * Redacts an argument, i.e. ensures that no password in
> + * https://user:password@host/-style URLs is logged.
> + *
> + * Returns the original if nothing needed to be redacted.
> + * Returns a pointer that needs to be `free()`d otherwise.
> + */
> +static const char *redact_arg(const char *arg)
> +{
> +       const char *p, *colon;
> +       size_t at;
> +
> +       if (!trace2_redact ||
> +           (!skip_prefix(arg, "https://", &p) &&
> +            !skip_prefix(arg, "http://", &p)))
> +               return arg;
> +
> +       at = strcspn(p, "@/");
> +       if (p[at] != '@')
> +               return arg;
> +
> +       colon = memchr(p, ':', at);
> +       if (!colon)
> +               return arg;
> +
> +       return xstrfmt("%.*s:<REDACTED>%s", (int)(colon - arg), arg, p + at);
> +}
> +
> +/*
> + * Redacts arguments in an argument list.
> + *
> + * Returns the original if nothing needed to be redacted.
> + * Otherwise, returns a new array that needs to be released
> + * via `free_redacted_argv()`.
> + */
> +static const char **redact_argv(const char **argv)
> +{
> +       int i, j;
> +       const char *redacted = NULL;
> +       const char **ret;
> +
> +       if (!trace2_redact)
> +               return argv;
> +
> +       for (i = 0; argv[i]; i++)
> +               if ((redacted = redact_arg(argv[i])) != argv[i])
> +                       break;
> +
> +       if (!argv[i])
> +               return argv;
> +
> +       for (j = 0; argv[j]; j++)
> +               ; /* keep counting */
> +
> +       ALLOC_ARRAY(ret, j + 1);
> +       ret[j] = NULL;
> +
> +       for (j = 0; j < i; j++)
> +               ret[j] = argv[j];
> +       ret[i] = redacted;
> +       for (++i; argv[i]; i++) {
> +               redacted = redact_arg(argv[i]);
> +               ret[i] = redacted ? redacted : argv[i];
> +       }
> +
> +       return ret;
> +}
> +
> +static void free_redacted_argv(const char **redacted, const char **argv)
> +{
> +       int i;
> +
> +       if (redacted != argv) {
> +               for (i = 0; argv[i]; i++)
> +                       if (redacted[i] != argv[i])
> +                               free((void *)redacted[i]);
> +               free((void *)redacted);
> +       }
> +}
> +
>  void trace2_cmd_start_fl(const char *file, int line, const char **argv)
>  {
>         struct tr2_tgt *tgt_j;
>         int j;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **redacted;
>
>         if (!trace2_enabled)
>                 return;
> @@ -260,10 +344,14 @@ void trace2_cmd_start_fl(const char *file, int line, const char **argv)
>         us_now = getnanotime() / 1000;
>         us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
>
> +       redacted = redact_argv(argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_start_fl)
>                         tgt_j->pfn_start_fl(file, line, us_elapsed_absolute,
> -                                           argv);
> +                                           redacted);
> +
> +       free_redacted_argv(redacted, argv);
>  }
>
>  void trace2_cmd_exit_fl(const char *file, int line, int code)
> @@ -409,6 +497,7 @@ void trace2_child_start_fl(const char *file, int line,
>         int j;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **orig_argv = cmd->args.v;
>
>         if (!trace2_enabled)
>                 return;
> @@ -419,10 +508,24 @@ void trace2_child_start_fl(const char *file, int line,
>         cmd->trace2_child_id = tr2tls_locked_increment(&tr2_next_child_id);
>         cmd->trace2_child_us_start = us_now;
>
> +       /*
> +        * The `pfn_child_start_fl` API takes a `struct child_process`
> +        * rather than a simple `argv` for the child because some
> +        * targets make use of the additional context bits/values. So
> +        * temporarily replace the original argv (inside the `strvec`)
> +        * with a possibly redacted version.
> +        */
> +       cmd->args.v = redact_argv(orig_argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_child_start_fl)
>                         tgt_j->pfn_child_start_fl(file, line,
>                                                   us_elapsed_absolute, cmd);
> +
> +       if (cmd->args.v != orig_argv) {
> +               free_redacted_argv(cmd->args.v, orig_argv);
> +               cmd->args.v = orig_argv;
> +       }
>  }
>
>  void trace2_child_exit_fl(const char *file, int line, struct child_process *cmd,
> @@ -493,6 +596,7 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
>         int exec_id;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **redacted;
>
>         if (!trace2_enabled)
>                 return -1;
> @@ -502,10 +606,14 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
>
>         exec_id = tr2tls_locked_increment(&tr2_next_exec_id);
>
> +       redacted = redact_argv(argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_exec_fl)
>                         tgt_j->pfn_exec_fl(file, line, us_elapsed_absolute,
> -                                          exec_id, exe, argv);
> +                                          exec_id, exe, redacted);
> +
> +       free_redacted_argv(redacted, argv);
>
>         return exec_id;
>  }
> @@ -637,13 +745,19 @@ void trace2_def_param_fl(const char *file, int line, const char *param,
>  {
>         struct tr2_tgt *tgt_j;
>         int j;
> +       const char *redacted;
>
>         if (!trace2_enabled)
>                 return;
>
> +       redacted = redact_arg(value);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_param_fl)
> -                       tgt_j->pfn_param_fl(file, line, param, value, kvi);
> +                       tgt_j->pfn_param_fl(file, line, param, redacted, kvi);
> +
> +       if (redacted != value)
> +               free((void *)redacted);
>  }
>
>  void trace2_def_repo_fl(const char *file, int line, struct repository *repo)
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/4] Redact unsafe URLs in the Trace2 output
  2023-11-22 19:18 [PATCH 0/4] Redact unsafe URLs in the Trace2 output Johannes Schindelin via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-11-22 19:18 ` [PATCH 4/4] t0212: test URL redacting in EVENT format Jeff Hostetler via GitGitGadget
@ 2023-11-23 19:08 ` Elijah Newren
  4 siblings, 0 replies; 9+ messages in thread
From: Elijah Newren @ 2023-11-23 19:08 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Wed, Nov 22, 2023 at 11:18 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> The Trace2 output can contain secrets when a user issues a Git command with
> sensitive information in the command-line. A typical (if highly discouraged)
> example is: git clone https://user:password@host.com/.
>
> With this PR, the Trace2 output redacts passwords in such URLs by default.
>
> This series also includes a commit to temporarily disable leak checking on
> t0210,t0211 because the tests uncover other unrelated bugs in Git.
>
> These patches were integrated into Microsoft's fork of Git, as
> https://github.com/microsoft/git/pull/616, and have been cooking there ever
> since.

Thanks for making these changes.  Makes me wonder, back when we were
logging trace2 data, if we had some of these leaks.  Eek.

As I commented in patch 2, I think this is a good start, but I'm
curious if others would be willing to turn clone/fetch of such bad
URLs into warnings for now and errors later.  The prevalence of
AI-assist add-ons for various IDEs and the number of developers opting
to use those IDEs and add-ons, and the fact that these tools sometimes
include repository URLs in what they send off to third parties, makes
me wonder if our recent infosec fire drill is soon going to be a
widely shared experience by lots of other companies and individuals.
Training users to not do bad things is hard, and it might be worth
saving them from themselves.  Thoughts?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] trace2: redact passwords from https:// URLs by default
  2023-11-23 18:59   ` Elijah Newren
@ 2023-11-27 21:50     ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2023-11-27 21:50 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Thu, Nov 23, 2023 at 10:59:20AM -0800, Elijah Newren wrote:

> > Let's at least avoid logging such secrets via Trace2, much like we avoid
> > logging secrets in `http.c`. Much like the code in `http.c` is guarded
> > via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
> > `GIT_TRACE2_REDACT` (also defaulting to `true`).
> 
> Training users is hard.  I appreciate the changes here to make trace2
> not be a leak vector, but is it time to perhaps consider bigger safety
> measures: At the clone/fetch level, automatically warn loudly whenever
> such a URL is provided, accompanied with a note that in the future it
> will be turned into a hard error?

Yes, the password in such a case ends up in the plaintext .git/config
file, which is not great.

There's some discussion and patches here:

  https://lore.kernel.org/git/nycvar.QRO.7.76.6.1905172121130.46@tvgsbejvaqbjf.bet/

I meant to follow up on them, but never did.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-11-27 21:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-22 19:18 [PATCH 0/4] Redact unsafe URLs in the Trace2 output Johannes Schindelin via GitGitGadget
2023-11-22 19:18 ` [PATCH 1/4] trace2: fix signature of trace2_def_param() macro Jeff Hostetler via GitGitGadget
2023-11-23  6:10   ` Junio C Hamano
2023-11-22 19:18 ` [PATCH 2/4] trace2: redact passwords from https:// URLs by default Johannes Schindelin via GitGitGadget
2023-11-23 18:59   ` Elijah Newren
2023-11-27 21:50     ` Jeff King
2023-11-22 19:18 ` [PATCH 3/4] t0211: test URL redacting in PERF format Jeff Hostetler via GitGitGadget
2023-11-22 19:18 ` [PATCH 4/4] t0212: test URL redacting in EVENT format Jeff Hostetler via GitGitGadget
2023-11-23 19:08 ` [PATCH 0/4] Redact unsafe URLs in the Trace2 output Elijah Newren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).