All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jacob Keller <jacob.e.keller@intel.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: <git@vger.kernel.org>, Jacob Keller <jacob.keller@gmail.com>,
	Lidong Yan <yldhome2d2@gmail.com>, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote
Date: Tue, 17 Jun 2025 16:25:16 -0700	[thread overview]
Message-ID: <d72fb411-2e05-441e-aee4-d8a26d652fea@intel.com> (raw)
In-Reply-To: <xmqqcyb2uhth.fsf@gitster.g>



On 6/17/2025 3:25 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
> 
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> The branch_release() and remote_clear() functions fail to completely
>> release all of the memory they point to. This results in leaked memory
>> if you read the configuration for an arbitrary repository and then
>> release that repository.
>>
>> This should be caught by the leak sanitizer. However, for callers which
>> use ``the_repository``, the values never go out of scope and the
>> sanitizer won't complain.
>>
>> A future change is going to add a caller of read_config() for a
>> submodule repository structure. Doing so reveals one immediate issue due
>> to a bad NULL pointer access, as well as the mentioned leaks.
>>
>>  * The branch->merge array is accessed without checking if its NULL.
>>    Since this array is only setup by calling set_merge, it may in fact
>>    not be initialized even though merge_nr is non-zero.
>>
>>  * The remote->push and remote->fetch refspecs are never cleared.
>>
>>  * The branch->merge_name array is never cleared.
>>
>>  * The individual elements of branch->merge are not released.
>>
>> Add a check against branch->merge before accessing it and calling
>> refspec_item_clear. Update remote_clear() with calls to refspec_clear()
>> for both the push and fetch refspecs. Add a release of the merge_name
>> items as well as a final release of the merge_name array.
>>
>> Freeing merge_name elements results in a warning because we discard the
>> const qualifier on the parameter name. These values come from a call to
>> add_merge() in handle_config(), which always copies the names with
>> xstrdup. This makes ownership of the memory difficult to track in the
>> code.
>>
>> Move the call to xstrdup inside add_merge() so that its clear that the
>> memory is duplicated here and must be released when the merge_name array
>> is released. Drop the const qualifier on the branch structure to allow
>> calling free without an explicit cast.
>>
>> These changes make it safe to call read_config() on a submodule
>> repository without crashing or leaking any of the memory when the
>> submodule repository is released.
>>
>> There is still some confusion with the difference between branch->merge
>> and branch->merge_name, and the confusion of using branch->merge_nr for
>> both. That could probably use some future cleanup.
> 
> 
> Nicely described.  One thing that puzzles me is this part:
> 
>> @@ -253,9 +256,16 @@ static void branch_release(struct branch *branch)
>>  	free((char *)branch->refname);
>>  	free(branch->remote_name);
>>  	free(branch->pushremote_name);
>> -	for (int i = 0; i < branch->merge_nr; i++)
>> -		refspec_item_clear(branch->merge[i]);
>> +	for (int i = 0; i < branch->merge_nr; i++) {
>> +		if (branch->merge) {
>> +			refspec_item_clear(branch->merge[i]);
>> +			free(branch->merge[i]);
>> +		}
>> +		if (branch->merge_name)
>> +			free(branch->merge_name[i]);
>> +	}
>>  	free(branch->merge);
>> +	free(branch->merge_name);
>>  }
> 
> where we iterate over branch->merge[] and branch->merge_name[]
> for branch->nr times and each time we check the NULL-ness of these
> two pointers.
> 
> merge_nr is only incremented inside add_merge() when merge_name[]
> array is grown by the same amount.  Do we need to check for the
> NULL-ness of branch->merge_name?
> 

Probably not.

> Near the beginning of set_merge() we allocate branch->merge[] 
> only when branch->merge_nr is non-zero (the assumption seems to be
> that we accumulate in the merge_names[] while reading the config,
> so as long as branch->merge is initialized properly to NULL, it will
> never be NULL if merge_nr is not 0, no?
> 

We initialize branch->merge with set_merge() which is called by
branch_get() and which is the only way for callers external to remote.c
of getting a branch structure.

The issue is that merge_nr can be non-zero because if no caller has done
a branch_get() on the given branch, we still have merge_nr is non-zero
and merge is NULL. I suspect you're right that merge_name will never be
NULL while merge_nr is non-zero.

Really, I think we should find a better way to handle this than having
two separate arrays which interact with merge_nr. I especially just
noticed that set_merge will assign merge_nr to 0 if remote_name is bad,
but it doesn't release the memory, so we'd leak on teardown in that case...

I wonder if it is possible to just get rid of merge_names entirely and
only have the merge array?

> Thanks.


  reply	other threads:[~2025-06-17 23:25 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-17 21:30 [PATCH v2 0/6] submodule: improve remote lookup logic Jacob Keller
2025-06-17 21:30 ` [PATCH v2 1/6] remote: fix tear down of struct branch and struct remote Jacob Keller
2025-06-17 22:25   ` Junio C Hamano
2025-06-17 23:25     ` Jacob Keller [this message]
2025-06-18  1:30       ` Junio C Hamano
2025-06-18 11:20         ` Junio C Hamano
2025-06-18 17:41           ` Jacob Keller
2025-06-18  2:44       ` Lidong Yan
2025-06-17 23:45     ` Jacob Keller
2025-06-17 21:30 ` [PATCH v2 2/6] dir: move starts_with_dot(_dot)_slash to dir.h Jacob Keller
2025-06-17 21:30 ` [PATCH v2 3/6] remote: remove the_repository from some functions Jacob Keller
2025-06-17 22:46   ` Junio C Hamano
2025-06-17 23:19     ` Jacob Keller
2025-06-17 21:30 ` [PATCH v2 4/6] submodule--helper: improve logic for fallback remote name Jacob Keller
2025-06-17 23:12   ` Junio C Hamano
2025-06-17 23:21     ` Jacob Keller
2025-06-17 21:30 ` [PATCH v2 5/6] submodule: move get_default_remote_submodule() Jacob Keller
2025-06-17 21:30 ` [PATCH v2 6/6] submodule: look up remotes by URL first Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d72fb411-2e05-441e-aee4-d8a26d652fea@intel.com \
    --to=jacob.e.keller@intel.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jacob.keller@gmail.com \
    --cc=ps@pks.im \
    --cc=yldhome2d2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.