All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Haritha via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Jeff King <peff@peff.net>,
	 Haritha <harithamma.d@ibm.com>
Subject: Re: [PATCH v2] Fix to avoid high memory footprint
Date: Wed, 24 Jul 2024 14:41:57 -0700	[thread overview]
Message-ID: <xmqqmsm6sc0q.fsf@gitster.g> (raw)
In-Reply-To: <pull.1744.v2.git.git.1721821503173.gitgitgadget@gmail.com> (Haritha via GitGitGadget's message of "Wed, 24 Jul 2024 11:45:03 +0000")

"Haritha  via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: D Harithamma <harithamma.d@ibm.com>
>
> This fix avoids high memory footprint when adding files that require
> conversion.  Git has a trace_encoding routine that prints trace output
> when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
> variable is used to debug the encoding contents.  When a 40MB file is
> added, it requests close to 1.8GB of storage from xrealloc which can
> lead to out of memory errors.  However, the check for
> GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
> This resolves high memory footprints even when
> GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
> exit to avoid the unnecessary memory allocation.

The sentences jump around and the logic flow is hard to follow.  The
first sentence makes a claim of what it does (but the readers have
not bee told where that problem comes from).  The second sentence
makes a statement of a fact, but the readers do not yet know at that
point what relevance the fact has to the issue at hand, etc.

The usual way to compose a log message of this project is to

 - Give an observation on how the current system work in the present
   tense (so no need to say "Currently X is Y", just "X is Y"), and
   discuss what you perceive as a problem in it.

 - Propose a solution (optional---often, problem description
   trivially leads to an obvious solution in reader's minds).

 - Give commands to the codebase to "become like so".

in this order.


    When Git needs to add a file that require encoding conversion,
    but tracing of encoding conversion is *not* requested via
    setting GIT_TRACE_WORKING_TREE_ENCODING environment variable,
    the trace_encoding() function still allocated and prepared
    "human readable" copies of the file contents before and after
    conversion to show in the trace.  This wasted a lot of memory
    footprint and runtime cycles without giving any user-visible
    benefit.

    Exit early from the function when we we are not tracing before
    we spend all the effort, not after.

or something, perhaps?

I am wondering if we should be able to test this, but "git grep
GIT_TRACE_WORKING_TREE_ENCODING t/" is not finding any existing test
in the area.

> Signed-off-by: Harithamma D <harithamma.d@ibm.com>

This does not match the "From: " line above.  Please pick one way to
spell your name and identify yourself to this project, and use it
consistently.

Thanks.

> diff --git a/convert.c b/convert.c
> index d8737fe0f2d..c4ddc4de81b 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
>  	struct strbuf trace = STRBUF_INIT;
>  	int i;
>  
> +	if (!trace_want(&coe))
> +		return;
> +

The actual fix is so simple and nice ;-)

  reply	other threads:[~2024-07-24 21:42 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-16  8:03 [PATCH] Fix to avoid high memory footprint Haritha  via GitGitGadget
2024-07-17  6:16 ` Jeff King
2024-07-24 11:45 ` [PATCH v2] " Haritha  via GitGitGadget
2024-07-24 21:41   ` Junio C Hamano [this message]
2024-07-24 22:16   ` Jeff King
2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
2024-07-26  9:55     ` Torsten Bögershausen
2024-07-26 14:00     ` [PATCH v4] convert: " Haritha  via GitGitGadget
2024-07-30  3:42       ` [PATCH v5] convert: return early when not tracing Haritha  via GitGitGadget
2024-07-31  2:42         ` Junio C Hamano
2024-07-31  9:32           ` Haritha D
2024-07-31 13:33         ` [PATCH v6] " Haritha  via GitGitGadget
2024-07-26 15:06     ` [PATCH v3] Fix to avoid high memory footprint Junio C Hamano
2024-07-26 15:12     ` Junio C Hamano
2024-07-30  3:41       ` Haritha D

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqmsm6sc0q.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=harithamma.d@ibm.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.