git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] plugging some list-objects-filter leaks
@ 2022-09-08  4:52 Jeff King
  2022-09-08  4:54 ` [PATCH 1/5] list_objects_filter_copy(): deep-copy sparse_oid_name field Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08  4:52 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

The test I sent earlier in [1] fails the linux-leaks CI job, not because
it introduces new leaks, but just because it runs existing leaks in a
test marked as passing-leaks.

Of course we can drop the passing flag, but I figured it would probably
be an easy fix. Famous last words. It turned into quite a rabbit hole of
actual leaks (albeit small and bounded per process) and some
questionable memory ownership semantics.

Here's the series I came up with. I'm cc-ing Stolee as the last person
unfortunate enough to have touched this area. :)

  [1/5]: list_objects_filter_copy(): deep-copy sparse_oid_name field
  [2/5]: transport: deep-copy object-filter struct for fetch-pack
  [3/5]: transport: free filter options in disconnect_git()
  [4/5]: list_objects_filter_options: plug leak of filter_spec strings
  [5/5]: prepare_repo_settings(): plug leak of config values

 list-objects-filter-options.c | 20 ++++++++++++++------
 repo-settings.c               |  7 +++----
 transport.c                   |  5 ++++-
 3 files changed, 21 insertions(+), 11 deletions(-)

-Peff

[1] https://lore.kernel.org/git/YxkG3A30vNfu55Sx@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] list_objects_filter_copy(): deep-copy sparse_oid_name field
  2022-09-08  4:52 [PATCH 0/5] plugging some list-objects-filter leaks Jeff King
@ 2022-09-08  4:54 ` Jeff King
  2022-09-08  4:57 ` [PATCH 2/5] transport: deep-copy object-filter struct for fetch-pack Jeff King
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08  4:54 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

The purpose of our copy function is to do a deep copy of each field so
that the source and destination structs become independent. We correctly
copy the filter_spec string list, but we forgot the sparse_oid_name
field. By doing a shallow copy of the pointer, that puts us at risk for
a use-after-free if one or both of the structs is cleaned up.

I don't think this can be triggered in practice, because we tend to leak
the structs rather than actually clean them up. But this should
future-proof us for plugging those leaks.

Signed-off-by: Jeff King <peff@peff.net>
---
I think this is the only thing missing. You can correlate with the
_release() function to see what other things might need a deep copy, and
everything else seems covered.

 list-objects-filter-options.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 4b25287886..41c41c9d45 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -418,6 +418,7 @@ void list_objects_filter_copy(
 	string_list_init_dup(&dest->filter_spec);
 	for_each_string_list_item(item, &src->filter_spec)
 		string_list_append(&dest->filter_spec, item->string);
+	dest->sparse_oid_name = xstrdup_or_null(src->sparse_oid_name);
 
 	ALLOC_ARRAY(dest->sub, dest->sub_alloc);
 	for (i = 0; i < src->sub_nr; i++)
-- 
2.37.3.1139.g47294c03c7


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/5] transport: deep-copy object-filter struct for fetch-pack
  2022-09-08  4:52 [PATCH 0/5] plugging some list-objects-filter leaks Jeff King
  2022-09-08  4:54 ` [PATCH 1/5] list_objects_filter_copy(): deep-copy sparse_oid_name field Jeff King
@ 2022-09-08  4:57 ` Jeff King
  2022-09-08  4:58 ` [PATCH 3/5] transport: free filter options in disconnect_git() Jeff King
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08  4:57 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

When the transport code for the git protocol calls into fetch_pack(), it
has to fill out a fetch_pack_args struct that is mostly taken from the
transport options. We pass along any object-filter data by doing a
struct assignment of the list_objects_filter_options struct. But doing
so isn't safe; it contains allocated pointers in its filter_spec
string_list, which could lead to a double-free if one side mutates or
frees the string_list.

And indeed, the fetch-pack code does clear and rewrite the list via
expand_list_objects_filter_spec(), leaving the transport code with
dangling pointers.

This hasn't been a problem so far, though, because the transport code
doesn't look further at the filter struct. But it should, because in
some cases (when fetch-pack doesn't rewrite the list), it ends up
leaking the string_list.

So let's start by turning this shallow copy into a deep one, which
should let us fix the transport leak in a subsequent patch. Likewise,
we'll free the deep copy we made here when we're done with it (to avoid
leaking).

Note that it would also work to pass fetch-pack a pointer to our filter
struct, rather than a copy. But it's awkward for fetch-pack to take a
pointer in its arg struct; the actual git-fetch-pack command allocates a
fetch_pack_args struct on the stack and expects it to contain the filter
options. It could be rewritten to avoid this, but a deep copy serves our
purposes just as well.

Signed-off-by: Jeff King <peff@peff.net>
---
I wrote up the "pass a pointer" version and it's really not too bad. But
on the other hand, the deep copy here did turn up the bug in the
previous patch (once we start freeing both copies, it caused a
double-free of the sparse_oid_name field).

 transport.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/transport.c b/transport.c
index b51e991e44..24540f642a 100644
--- a/transport.c
+++ b/transport.c
@@ -386,7 +386,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.cloning = transport->cloning;
 	args.update_shallow = data->options.update_shallow;
 	args.from_promisor = data->options.from_promisor;
-	args.filter_options = data->options.filter_options;
+	list_objects_filter_copy(&args.filter_options,
+				 &data->options.filter_options);
 	args.refetch = data->options.refetch;
 	args.stateless_rpc = transport->stateless_rpc;
 	args.server_options = transport->server_options;
@@ -453,6 +454,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 
 	free_refs(refs_tmp);
 	free_refs(refs);
+	list_objects_filter_release(&args.filter_options);
 	return ret;
 }
 
-- 
2.37.3.1139.g47294c03c7


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/5] transport: free filter options in disconnect_git()
  2022-09-08  4:52 [PATCH 0/5] plugging some list-objects-filter leaks Jeff King
  2022-09-08  4:54 ` [PATCH 1/5] list_objects_filter_copy(): deep-copy sparse_oid_name field Jeff King
  2022-09-08  4:57 ` [PATCH 2/5] transport: deep-copy object-filter struct for fetch-pack Jeff King
@ 2022-09-08  4:58 ` Jeff King
  2022-09-08  5:01 ` [PATCH 4/5] list_objects_filter_options: plug leak of filter_spec strings Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08  4:58 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

If a user of the transport API calls transport_set_option() with
TRANS_OPT_LIST_OBJECTS_FILTER, it doesn't pass a struct, but rather a
string with the filter-spec, which the transport code then stores in its
own list_objects_filter_options struct.

When the caller is done and we call transport_disconnect(), the contents
of that filter struct are then leaked. We should release it before
freeing the transport struct.

Another way to solve this would be for transport_set_option() to pass a
pointer to the struct. But that's awkward, because there's a generic
transport-option interface that always takes a string. Plus it opens up
questions of memory lifetimes; by storing its own filter-options struct,
the transport code remains self-contained.

Signed-off-by: Jeff King <peff@peff.net>
---
 transport.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/transport.c b/transport.c
index 24540f642a..6ec6130852 100644
--- a/transport.c
+++ b/transport.c
@@ -895,6 +895,7 @@ static int disconnect_git(struct transport *transport)
 		finish_connect(data->conn);
 	}
 
+	list_objects_filter_release(&data->options.filter_options);
 	free(data);
 	return 0;
 }
-- 
2.37.3.1139.g47294c03c7


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/5] list_objects_filter_options: plug leak of filter_spec strings
  2022-09-08  4:52 [PATCH 0/5] plugging some list-objects-filter leaks Jeff King
                   ` (2 preceding siblings ...)
  2022-09-08  4:58 ` [PATCH 3/5] transport: free filter options in disconnect_git() Jeff King
@ 2022-09-08  5:01 ` Jeff King
  2022-09-08  5:02 ` [PATCH 5/5] prepare_repo_settings(): plug leak of config values Jeff King
  2022-09-09 14:20 ` [PATCH 0/5] plugging some list-objects-filter leaks Derrick Stolee
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08  5:01 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

The list_objects_filter_options struct contains a string_list to store
the filter_spec. Because we allow the options struct to be
zero-initialized by callers, the string_list's strdup_strings field is
generally not set.

Because we don't want to depend on the memory lifetimes of any strings
passed in to the list-objects API, everything we add to the string_list
is duplicated (either via xstrdup(), or via strbuf_detach()). So far so
good, but now we have a problem at cleanup time: when we clear the
list, the string_list API doesn't realize that it needs to free all of
those strings, and we leak them.

One option would be to set strdup_strings right before clearing the
list. But this is tricky for two reasons:

  1. There's one spot, in partial_clone_get_default_filter_spec(),
     that fails to duplicate its argument. We could fix that, but...

  2. We clear the list in a surprising number of places. As you might
     expect, we do so in list_objects_filter_release(). But we also
     clear and rewrite it in expand_list_objects_filter_spec(),
     list_objects_filter_spec(), and transform_to_combine_type().
     We'd have to put the same hack in all of those spots.

Instead, let's just set strdup_strings before adding anything. That lets
us drop the extra manual xstrdup() calls, fixes the spot mentioned
in (1) above that _should_ be duplicating, and future-proofs further
calls. We do have to switch the strbuf_detach() calls to use the nodup
form, but that's an easy change, and the resulting code more clearly
shows the expected ownership transfer.

This also resolves a weird inconsistency: when we make a deep copy with
list_objects_filter_copy(), it initializes the copy's filter_spec with
string_list_init_dup(). So the copy frees its string_list memory
correctly, but accidentally leaks the extra manual-xstrdup()'d strings!

There is one hiccup, though. In an ideal world, everyone would allocate
the list_objects_filter_options struct with an initializer which used
STRING_LIST_INIT_DUP under the hood. But there are a bunch of existing
callers which think that zero-initializing is good enough. We can leave
them as-is by noting that the list is always initially populated via
parse_list_objects_filter(). So we can just initialize the
strdup_strings flag there.

This is arguably a band-aid, but it works reliably. And it doesn't make
anything harder if we want to switch all the callers later to a new
LIST_OBJECTS_FILTER_INIT.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects-filter-options.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 41c41c9d45..6cc4eb8e1c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -207,7 +207,7 @@ static void filter_spec_append_urlencode(
 	struct strbuf buf = STRBUF_INIT;
 	strbuf_addstr_urlencode(&buf, raw, allow_unencoded);
 	trace_printf("Add to combine filter-spec: %s\n", buf.buf);
-	string_list_append(&filter->filter_spec, strbuf_detach(&buf, NULL));
+	string_list_append_nodup(&filter->filter_spec, strbuf_detach(&buf, NULL));
 }
 
 /*
@@ -226,12 +226,13 @@ static void transform_to_combine_type(
 			xcalloc(initial_sub_alloc, sizeof(*sub_array));
 		sub_array[0] = *filter_options;
 		memset(filter_options, 0, sizeof(*filter_options));
+		string_list_init_dup(&filter_options->filter_spec);
 		filter_options->sub = sub_array;
 		filter_options->sub_alloc = initial_sub_alloc;
 	}
 	filter_options->sub_nr = 1;
 	filter_options->choice = LOFC_COMBINE;
-	string_list_append(&filter_options->filter_spec, xstrdup("combine:"));
+	string_list_append(&filter_options->filter_spec, "combine:");
 	filter_spec_append_urlencode(
 		filter_options,
 		list_objects_filter_spec(&filter_options->sub[0]));
@@ -256,8 +257,14 @@ void parse_list_objects_filter(
 	struct strbuf errbuf = STRBUF_INIT;
 	int parse_error;
 
+	if (!filter_options->filter_spec.strdup_strings) {
+		if (filter_options->filter_spec.nr)
+			BUG("unexpected non-allocated string in filter_spec");
+		filter_options->filter_spec.strdup_strings = 1;
+	}
+
 	if (!filter_options->choice) {
-		string_list_append(&filter_options->filter_spec, xstrdup(arg));
+		string_list_append(&filter_options->filter_spec, arg);
 
 		parse_error = gently_parse_list_objects_filter(
 			filter_options, arg, &errbuf);
@@ -268,7 +275,7 @@ void parse_list_objects_filter(
 		 */
 		transform_to_combine_type(filter_options);
 
-		string_list_append(&filter_options->filter_spec, xstrdup("+"));
+		string_list_append(&filter_options->filter_spec, "+");
 		filter_spec_append_urlencode(filter_options, arg);
 		ALLOC_GROW_BY(filter_options->sub, filter_options->sub_nr, 1,
 			      filter_options->sub_alloc);
@@ -306,7 +313,7 @@ const char *list_objects_filter_spec(struct list_objects_filter_options *filter)
 		strbuf_add_separated_string_list(
 			&concatted, "", &filter->filter_spec);
 		string_list_clear(&filter->filter_spec, /*free_util=*/0);
-		string_list_append(
+		string_list_append_nodup(
 			&filter->filter_spec, strbuf_detach(&concatted, NULL));
 	}
 
@@ -321,7 +328,7 @@ const char *expand_list_objects_filter_spec(
 		strbuf_addf(&expanded_spec, "blob:limit=%lu",
 			    filter->blob_limit_value);
 		string_list_clear(&filter->filter_spec, /*free_util=*/0);
-		string_list_append(
+		string_list_append_nodup(
 			&filter->filter_spec,
 			strbuf_detach(&expanded_spec, NULL));
 	}
-- 
2.37.3.1139.g47294c03c7


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/5] prepare_repo_settings(): plug leak of config values
  2022-09-08  4:52 [PATCH 0/5] plugging some list-objects-filter leaks Jeff King
                   ` (3 preceding siblings ...)
  2022-09-08  5:01 ` [PATCH 4/5] list_objects_filter_options: plug leak of filter_spec strings Jeff King
@ 2022-09-08  5:02 ` Jeff King
  2022-09-09 14:20 ` [PATCH 0/5] plugging some list-objects-filter leaks Derrick Stolee
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08  5:02 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee

We call repo_config_get_string() for fetch.negotiationAlgorithm, which
allocates a copy of the string, but we never free it.

We could add a call to free(), but there's an even simpler solution: we
don't need the string to persist beyond a few strcasecmp() calls, so we
can instead use the "_tmp" variant which gives us a const pointer to the
cached value.

We need to switch the type of "strval" to "const char *" for this to
work, which affects a similar call that checks core.untrackedCache. But
it's in the same boat! It doesn't actually need the value to persist
beyond a maybe_bool() check (though it does remember to correctly free
the string afterwards). So we can simplify it at the same time.

Note that this core.untrackedCache check arguably should be using
repo_config_get_maybe_bool(), but there are some subtle behavior
changes. E.g., it doesn't currently allow a value-less "true". Arguably
it should, but let's avoid lumping further changes in what should be a
simple leak cleanup.

Signed-off-by: Jeff King <peff@peff.net>
---
 repo-settings.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/repo-settings.c b/repo-settings.c
index 43bc881bfc..e8b58151bc 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -22,7 +22,7 @@ void prepare_repo_settings(struct repository *r)
 {
 	int experimental;
 	int value;
-	char *strval;
+	const char *strval;
 	int manyfiles;
 
 	if (!r->gitdir)
@@ -77,7 +77,7 @@ void prepare_repo_settings(struct repository *r)
 	if (!repo_config_get_int(r, "index.version", &value))
 		r->settings.index_version = value;
 
-	if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
+	if (!repo_config_get_string_tmp(r, "core.untrackedcache", &strval)) {
 		int v = git_parse_maybe_bool(strval);
 
 		/*
@@ -88,10 +88,9 @@ void prepare_repo_settings(struct repository *r)
 		if (v >= 0)
 			r->settings.core_untracked_cache = v ?
 				UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
-		free(strval);
 	}
 
-	if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
+	if (!repo_config_get_string_tmp(r, "fetch.negotiationalgorithm", &strval)) {
 		int fetch_default = r->settings.fetch_negotiation_algorithm;
 		if (!strcasecmp(strval, "skipping"))
 			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
-- 
2.37.3.1139.g47294c03c7

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5] plugging some list-objects-filter leaks
  2022-09-08  4:52 [PATCH 0/5] plugging some list-objects-filter leaks Jeff King
                   ` (4 preceding siblings ...)
  2022-09-08  5:02 ` [PATCH 5/5] prepare_repo_settings(): plug leak of config values Jeff King
@ 2022-09-09 14:20 ` Derrick Stolee
  2022-09-11  4:51   ` Jeff King
  5 siblings, 1 reply; 8+ messages in thread
From: Derrick Stolee @ 2022-09-09 14:20 UTC (permalink / raw)
  To: Jeff King, git

On 9/8/2022 12:52 AM, Jeff King wrote:
> The test I sent earlier in [1] fails the linux-leaks CI job, not because
> it introduces new leaks, but just because it runs existing leaks in a
> test marked as passing-leaks.
> 
> Of course we can drop the passing flag, but I figured it would probably
> be an easy fix. Famous last words. It turned into quite a rabbit hole of
> actual leaks (albeit small and bounded per process) and some
> questionable memory ownership semantics.

Reading the patches, you make good arguments about the various trade-
offs in these sticky places. I agree with you in all cases, mostly
because the alternatives would not be any better unless we did a _lot_
of work to rewrite a lot more code than these patches. Even then, the
benefit is unclear.

> Here's the series I came up with. I'm cc-ing Stolee as the last person
> unfortunate enough to have touched this area. :)

Lucky me!

These patches look good. Thanks!

-Stolee

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5] plugging some list-objects-filter leaks
  2022-09-09 14:20 ` [PATCH 0/5] plugging some list-objects-filter leaks Derrick Stolee
@ 2022-09-11  4:51   ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2022-09-11  4:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Fri, Sep 09, 2022 at 10:20:37AM -0400, Derrick Stolee wrote:

> > Of course we can drop the passing flag, but I figured it would probably
> > be an easy fix. Famous last words. It turned into quite a rabbit hole of
> > actual leaks (albeit small and bounded per process) and some
> > questionable memory ownership semantics.
> 
> Reading the patches, you make good arguments about the various trade-
> offs in these sticky places. I agree with you in all cases, mostly
> because the alternatives would not be any better unless we did a _lot_
> of work to rewrite a lot more code than these patches. Even then, the
> benefit is unclear.

Thanks for looking them over. I wondered if I was just being lazy and/or
cowardly in not teaching callers to initialize the filter struct
specifically (rather than memset to zero).

So of course I started looking at it, and of course it was way more
complicated than you'd imagine. So I have no regrets in drawing the line
where I did in this leak-plugging series.

But now that I did do it, it turned out to reveal other oddities in the
filter code! So I'll post that as a separate series on top. Even if we
don't go all the way with the cleanup, we should at least fix the
oddities.

> > Here's the series I came up with. I'm cc-ing Stolee as the last person
> > unfortunate enough to have touched this area. :)
> 
> Lucky me!

It's the gift that keeps on giving. Guess what you're about to get cc'd
on. ;)

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-11  4:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-08  4:52 [PATCH 0/5] plugging some list-objects-filter leaks Jeff King
2022-09-08  4:54 ` [PATCH 1/5] list_objects_filter_copy(): deep-copy sparse_oid_name field Jeff King
2022-09-08  4:57 ` [PATCH 2/5] transport: deep-copy object-filter struct for fetch-pack Jeff King
2022-09-08  4:58 ` [PATCH 3/5] transport: free filter options in disconnect_git() Jeff King
2022-09-08  5:01 ` [PATCH 4/5] list_objects_filter_options: plug leak of filter_spec strings Jeff King
2022-09-08  5:02 ` [PATCH 5/5] prepare_repo_settings(): plug leak of config values Jeff King
2022-09-09 14:20 ` [PATCH 0/5] plugging some list-objects-filter leaks Derrick Stolee
2022-09-11  4:51   ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).