[PATCH] Fix to avoid high memory footprint

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] Fix to avoid high memory footprint
@ 2024-07-16  8:03 Haritha  via GitGitGadget
  2024-07-17  6:16 ` Jeff King
  2024-07-24 11:45 ` [PATCH v2] " Haritha  via GitGitGadget
  0 siblings, 2 replies; 15+ messages in thread
From: Haritha  via GitGitGadget @ 2024-07-16  8:03 UTC (permalink / raw)
  To: git; +Cc: Haritha, D Harithamma

From: D Harithamma <harithamma.d@ibm.com>

This fix avoids high memory footprint when
adding files that require conversion.
Git has a trace_encoding routine that prints trace
output when GIT_TRACE_WORKING_TREE_ENCODING=1 is
set. This environment variable is used to debug
the encoding contents.
When a 40MB file is added, it requests close to
1.8GB of storage from xrealloc which can lead
to out of memory errors.
However, the check for
GIT_TRACE_WORKING_TREE_ENCODING is done after
the string is allocated. This resolves high
memory footprints even when
GIT_TRACE_WORKING_TREE_ENCODING is not active.
This fix adds an early exit to avoid the
unnecessary memory allocation.

Signed-off-by: Haritha D <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v1
Pull-Request: https://github.com/git/git/pull/1744

 convert.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/convert.c b/convert.c
index d8737fe0f2d..e765bcd53d6 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,11 @@ static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	// If tracing is not on, exit early to avoid high memory footprint
+	if (!trace_pass_fl(&coe)) {
+		return;
+	}
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(

base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] Fix to avoid high memory footprint
  2024-07-16  8:03 [PATCH] Fix to avoid high memory footprint Haritha  via GitGitGadget
@ 2024-07-17  6:16 ` Jeff King
  2024-07-24 11:45 ` [PATCH v2] " Haritha  via GitGitGadget
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2024-07-17  6:16 UTC (permalink / raw)
  To: Haritha via GitGitGadget; +Cc: Lars Schneider, git, Haritha

On Tue, Jul 16, 2024 at 08:03:59AM +0000, Haritha  via GitGitGadget wrote:

> From: D Harithamma <harithamma.d@ibm.com>
> 
> This fix avoids high memory footprint when
> adding files that require conversion.
> Git has a trace_encoding routine that prints trace
> output when GIT_TRACE_WORKING_TREE_ENCODING=1 is
> set. This environment variable is used to debug
> the encoding contents.
> When a 40MB file is added, it requests close to
> 1.8GB of storage from xrealloc which can lead
> to out of memory errors.
> However, the check for
> GIT_TRACE_WORKING_TREE_ENCODING is done after
> the string is allocated. This resolves high
> memory footprints even when
> GIT_TRACE_WORKING_TREE_ENCODING is not active.
> This fix adds an early exit to avoid the
> unnecessary memory allocation.
> 
> Signed-off-by: Haritha D <harithamma.d@ibm.com>

Good find. Any trace function should verify that tracing is enabled
before doing any substantial work.

Let's take a look at your patch. First, your line wrapping is unusual,
making the commit message a bit hard to read. We'd usually shoot for ~72
characters per line. So more like:

> This fix avoids high memory footprint when adding files that require
> conversion.  Git has a trace_encoding routine that prints trace output
> when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
> variable is used to debug the encoding contents.  When a 40MB file is
> added, it requests close to 1.8GB of storage from xrealloc which can
> lead to out of memory errors.  However, the check for
> GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
> This resolves high memory footprints even when
> GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
> exit to avoid the unnecessary memory allocation.

Second, we'd like a full real name in the Signed-off-by line, as you're
agreeing to the DCO. See:

  https://git-scm.com/docs/SubmittingPatches#sign-off

Likewise, the author name should match the signoff name (you can use
"git commit --amend --author=..." to fix it).

For the patch itself:

> --- a/convert.c
> +++ b/convert.c
> @@ -324,6 +324,11 @@ static void trace_encoding(const char *context, const char *path,
>  	struct strbuf trace = STRBUF_INIT;
>  	int i;
>  
> +	// If tracing is not on, exit early to avoid high memory footprint
> +	if (!trace_pass_fl(&coe)) {
> +		return;
> +	}

I don't think trace_pass_fl() is what you want. It will return true if
the trace fd is non-zero (so tracing was requested), but also if the key
has not yet been initialized (i.e., nobody has used this key to try
printing anything yet).

I think you'd just use trace_want(&coe) instead.

Also, two style nits:

 - our usual style (see Documentation/CodingGuidelines) is to avoid
   braces for one-liners.

 - we only use the /* */ comment form, not //. Though IMHO you could
   skip the comment completely here, as an early-return check in a
   tracing function is pretty obvious.

It would be nice if we could test this, but besides the wasted work, I
don't think there's any user-visible behavior (the problem is that we
are computing things when we're _not_ tracing, so there's nothing for
the user to see). And there's no provision in our test suite for
measuring memory usage of a program. So I think we can live without it,
and just manually verifying that it works (but it would be good to show
the measurements you did manually in the commit message).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] Fix to avoid high memory footprint
  2024-07-16  8:03 [PATCH] Fix to avoid high memory footprint Haritha  via GitGitGadget
  2024-07-17  6:16 ` Jeff King
@ 2024-07-24 11:45 ` Haritha  via GitGitGadget
  2024-07-24 21:41   ` Junio C Hamano
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Haritha  via GitGitGadget @ 2024-07-24 11:45 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Haritha, D Harithamma

From: D Harithamma <harithamma.d@ibm.com>

This fix avoids high memory footprint when adding files that require
conversion.  Git has a trace_encoding routine that prints trace output
when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
variable is used to debug the encoding contents.  When a 40MB file is
added, it requests close to 1.8GB of storage from xrealloc which can
lead to out of memory errors.  However, the check for
GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
This resolves high memory footprints even when
GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
exit to avoid the unnecessary memory allocation.

Signed-off-by: Harithamma D <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v2
Pull-Request: https://github.com/git/git/pull/1744

Range-diff vs v1:

 1:  51c02f58fd6 ! 1:  500b7eacf2a Fix to avoid high memory footprint
     @@ Metadata
       ## Commit message ##
          Fix to avoid high memory footprint
      
     -    This fix avoids high memory footprint when
     -    adding files that require conversion.
     -    Git has a trace_encoding routine that prints trace
     -    output when GIT_TRACE_WORKING_TREE_ENCODING=1 is
     -    set. This environment variable is used to debug
     -    the encoding contents.
     -    When a 40MB file is added, it requests close to
     -    1.8GB of storage from xrealloc which can lead
     -    to out of memory errors.
     -    However, the check for
     -    GIT_TRACE_WORKING_TREE_ENCODING is done after
     -    the string is allocated. This resolves high
     -    memory footprints even when
     -    GIT_TRACE_WORKING_TREE_ENCODING is not active.
     -    This fix adds an early exit to avoid the
     -    unnecessary memory allocation.
     +    This fix avoids high memory footprint when adding files that require
     +    conversion.  Git has a trace_encoding routine that prints trace output
     +    when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
     +    variable is used to debug the encoding contents.  When a 40MB file is
     +    added, it requests close to 1.8GB of storage from xrealloc which can
     +    lead to out of memory errors.  However, the check for
     +    GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
     +    This resolves high memory footprints even when
     +    GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
     +    exit to avoid the unnecessary memory allocation.
      
     -    Signed-off-by: Haritha D <harithamma.d@ibm.com>
     +    Signed-off-by: Harithamma D <harithamma.d@ibm.com>
      
       ## convert.c ##
      @@ convert.c: static void trace_encoding(const char *context, const char *path,
       	struct strbuf trace = STRBUF_INIT;
       	int i;
       
     -+	// If tracing is not on, exit early to avoid high memory footprint
     -+	if (!trace_pass_fl(&coe)) {
     ++	if (!trace_want(&coe))
      +		return;
     -+	}
      +
       	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
       	for (i = 0; i < len && buf; ++i) {


 convert.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert.c b/convert.c
index d8737fe0f2d..c4ddc4de81b 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	if (!trace_want(&coe))
+		return;
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(

base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] Fix to avoid high memory footprint
  2024-07-24 11:45 ` [PATCH v2] " Haritha  via GitGitGadget
@ 2024-07-24 21:41   ` Junio C Hamano
  2024-07-24 22:16   ` Jeff King
  2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2024-07-24 21:41 UTC (permalink / raw)
  To: Haritha via GitGitGadget; +Cc: git, Jeff King, Haritha

"Haritha  via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: D Harithamma <harithamma.d@ibm.com>
>
> This fix avoids high memory footprint when adding files that require
> conversion.  Git has a trace_encoding routine that prints trace output
> when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
> variable is used to debug the encoding contents.  When a 40MB file is
> added, it requests close to 1.8GB of storage from xrealloc which can
> lead to out of memory errors.  However, the check for
> GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
> This resolves high memory footprints even when
> GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
> exit to avoid the unnecessary memory allocation.

The sentences jump around and the logic flow is hard to follow.  The
first sentence makes a claim of what it does (but the readers have
not bee told where that problem comes from).  The second sentence
makes a statement of a fact, but the readers do not yet know at that
point what relevance the fact has to the issue at hand, etc.

The usual way to compose a log message of this project is to

 - Give an observation on how the current system work in the present
   tense (so no need to say "Currently X is Y", just "X is Y"), and
   discuss what you perceive as a problem in it.

 - Propose a solution (optional---often, problem description
   trivially leads to an obvious solution in reader's minds).

 - Give commands to the codebase to "become like so".

in this order.


    When Git needs to add a file that require encoding conversion,
    but tracing of encoding conversion is *not* requested via
    setting GIT_TRACE_WORKING_TREE_ENCODING environment variable,
    the trace_encoding() function still allocated and prepared
    "human readable" copies of the file contents before and after
    conversion to show in the trace.  This wasted a lot of memory
    footprint and runtime cycles without giving any user-visible
    benefit.

    Exit early from the function when we we are not tracing before
    we spend all the effort, not after.

or something, perhaps?

I am wondering if we should be able to test this, but "git grep
GIT_TRACE_WORKING_TREE_ENCODING t/" is not finding any existing test
in the area.

> Signed-off-by: Harithamma D <harithamma.d@ibm.com>

This does not match the "From: " line above.  Please pick one way to
spell your name and identify yourself to this project, and use it
consistently.

Thanks.

> diff --git a/convert.c b/convert.c
> index d8737fe0f2d..c4ddc4de81b 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
>  	struct strbuf trace = STRBUF_INIT;
>  	int i;
>  
> +	if (!trace_want(&coe))
> +		return;
> +

The actual fix is so simple and nice ;-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] Fix to avoid high memory footprint
  2024-07-24 11:45 ` [PATCH v2] " Haritha  via GitGitGadget
  2024-07-24 21:41   ` Junio C Hamano
@ 2024-07-24 22:16   ` Jeff King
  2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
  2 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2024-07-24 22:16 UTC (permalink / raw)
  To: Haritha via GitGitGadget; +Cc: git, Haritha

On Wed, Jul 24, 2024 at 11:45:03AM +0000, Haritha  via GitGitGadget wrote:

> diff --git a/convert.c b/convert.c
> index d8737fe0f2d..c4ddc4de81b 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
>  	struct strbuf trace = STRBUF_INIT;
>  	int i;
>  
> +	if (!trace_want(&coe))
> +		return;
> +
>  	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
>  	for (i = 0; i < len && buf; ++i) {
>  		strbuf_addf(

The patch itself looks good. I confirmed that running:

  git init
  dd if=/dev/zero of=foo.bin bs=1M count=50
  echo '*.bin working-tree-encoding=UTF-16LE' >.gitattributes
  valgrind --tool=massif git add .

goes from a max heap of 1.7G down to 51MB with your patch (whereas I
think with the previous iteration it would not have, since the old check
did the wrong thing on the first call to trace_encoding()).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] Fix to avoid high memory footprint
  2024-07-24 11:45 ` [PATCH v2] " Haritha  via GitGitGadget
  2024-07-24 21:41   ` Junio C Hamano
  2024-07-24 22:16   ` Jeff King
@ 2024-07-26  6:27   ` Haritha  via GitGitGadget
  2024-07-26  9:55     ` Torsten Bögershausen
                       ` (3 more replies)
  2 siblings, 4 replies; 15+ messages in thread
From: Haritha  via GitGitGadget @ 2024-07-26  6:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Haritha, D Harithamma

From: D Harithamma <harithamma.d@ibm.com>

When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.

This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.

Signed-off-by: Harithamma D <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v3
Pull-Request: https://github.com/git/git/pull/1744

Range-diff vs v2:

 1:  500b7eacf2a ! 1:  d864de64380 Fix to avoid high memory footprint
     @@ Metadata
       ## Commit message ##
          Fix to avoid high memory footprint
      
     -    This fix avoids high memory footprint when adding files that require
     -    conversion.  Git has a trace_encoding routine that prints trace output
     -    when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
     -    variable is used to debug the encoding contents.  When a 40MB file is
     -    added, it requests close to 1.8GB of storage from xrealloc which can
     -    lead to out of memory errors.  However, the check for
     -    GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
     -    This resolves high memory footprints even when
     -    GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
     -    exit to avoid the unnecessary memory allocation.
     +    When Git adds a file requiring encoding conversion and tracing of encoding
     +    conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
     +    environment variable, the `trace_encoding()` function still allocates &
     +    prepares "human readable" copies of the file contents before and after
     +    conversion to show in the trace. This results in a high memory footprint
     +    and increased runtime without providing any user-visible benefit.
     +
     +    This fix introduces an early exit from the `trace_encoding()` function
     +    when tracing is not requested, preventing unnecessary memory allocation
     +    and processing.
      
          Signed-off-by: Harithamma D <harithamma.d@ibm.com>
      


 convert.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert.c b/convert.c
index d8737fe0f2d..c4ddc4de81b 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	if (!trace_want(&coe))
+		return;
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(

base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] Fix to avoid high memory footprint
  2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
@ 2024-07-26  9:55     ` Torsten Bögershausen
  2024-07-26 14:00     ` [PATCH v4] convert: " Haritha  via GitGitGadget
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Torsten Bögershausen @ 2024-07-26  9:55 UTC (permalink / raw)
  To: Haritha via GitGitGadget; +Cc: git, Jeff King, Haritha

On Fri, Jul 26, 2024 at 06:27:14AM +0000, Haritha  via GitGitGadget wrote:
> From: D Harithamma <harithamma.d@ibm.com>
>
> When Git adds a file requiring encoding conversion and tracing of encoding
> conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
> environment variable, the `trace_encoding()` function still allocates &
> prepares "human readable" copies of the file contents before and after
> conversion to show in the trace. This results in a high memory footprint
> and increased runtime without providing any user-visible benefit.
>
> This fix introduces an early exit from the `trace_encoding()` function
> when tracing is not requested, preventing unnecessary memory allocation
> and processing.
>
> Signed-off-by: Harithamma D <harithamma.d@ibm.com>
> ---
>     Fix to avoid high memory footprint
>

This head line
> Fix to avoid high memory footprint
does not tell to much when and how it happens.
The word "fix" is not realy needed (in this project).

Something like
 "convert: avoid high memory footprint"

will tell the reader, that only the convert functionality is affected
by this patch.

Thinking about it, another suggestion may be:

convert: Reduce memory allocation when trace_encoding() is not used

If someone browses through the whole history of Git, this is easier to
follow.

The exact wording may be improved, important would be to have "convert:"

as the first keyword, and then "memory allocation" and "trace_encoding()"
give hints, what this is all about in one line.

And the rest looks good.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4] convert: avoid high memory footprint
  2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
  2024-07-26  9:55     ` Torsten Bögershausen
@ 2024-07-26 14:00     ` Haritha  via GitGitGadget
  2024-07-30  3:42       ` [PATCH v5] convert: return early when not tracing Haritha  via GitGitGadget
  2024-07-26 15:06     ` [PATCH v3] Fix to avoid high memory footprint Junio C Hamano
  2024-07-26 15:12     ` Junio C Hamano
  3 siblings, 1 reply; 15+ messages in thread
From: Haritha  via GitGitGadget @ 2024-07-26 14:00 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Torsten Bögershausen, Haritha, D Harithamma

From: D Harithamma <harithamma.d@ibm.com>

When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.

This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.

Signed-off-by: Harithamma D <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v4
Pull-Request: https://github.com/git/git/pull/1744

Range-diff vs v3:

 1:  d864de64380 ! 1:  50758a4fb94 Fix to avoid high memory footprint
     @@ Metadata
      Author: D Harithamma <harithamma.d@ibm.com>
      
       ## Commit message ##
     -    Fix to avoid high memory footprint
     +    convert: avoid high memory footprint
      
          When Git adds a file requiring encoding conversion and tracing of encoding
          conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING


 convert.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert.c b/convert.c
index d8737fe0f2d..c4ddc4de81b 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	if (!trace_want(&coe))
+		return;
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(

base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] Fix to avoid high memory footprint
  2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
  2024-07-26  9:55     ` Torsten Bögershausen
  2024-07-26 14:00     ` [PATCH v4] convert: " Haritha  via GitGitGadget
@ 2024-07-26 15:06     ` Junio C Hamano
  2024-07-26 15:12     ` Junio C Hamano
  3 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2024-07-26 15:06 UTC (permalink / raw)
  To: Haritha via GitGitGadget; +Cc: git, Jeff King, Haritha

"Haritha  via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: D Harithamma <harithamma.d@ibm.com>
> ...
> Signed-off-by: Harithamma D <harithamma.d@ibm.com>

I am assuming that you and the person who did d254e650 (build:
support z/OS (OS/390)., 2024-03-06) are the same person?  That
commit was signed off like so:

    commit d254e65092daba8667d6b4d5b4f59c099c1edd1f
    Author: Haritha D <harithamma.d@ibm.com>
    Date:   Wed Mar 6 05:44:17 2024 +0000

        build: support z/OS (OS/390).

        Introduced z/OS (OS/390) as a platform in config.mak.uname

        Signed-off-by: Haritha D <harithamma.d@ibm.com>
        Signed-off-by: Junio C Hamano <gitster@pobox.com>

It is OK if you really want to use a longer name this time, but then
please be consistent within a single commit.  The author name of the
proposed commit is "D Harithamma" and a different name "Harithamma D"
is used to sign off the commit, which is not what we want to see.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] Fix to avoid high memory footprint
  2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
                       ` (2 preceding siblings ...)
  2024-07-26 15:06     ` [PATCH v3] Fix to avoid high memory footprint Junio C Hamano
@ 2024-07-26 15:12     ` Junio C Hamano
  2024-07-30  3:41       ` Haritha D
  3 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2024-07-26 15:12 UTC (permalink / raw)
  To: Haritha via GitGitGadget; +Cc: git, Jeff King, Haritha

"Haritha  via GitGitGadget" <gitgitgadget@gmail.com> writes:

Another thing.

> Subject: Re: [PATCH v3] Fix to avoid high memory footprint

This does not tell us much about what area had problem under what
condition.  "git shortlog --no-merges -200 master" may give us good
examples of how we typically write the title of our commits.

In the case of this change, ideally we should be able to tell that
this is about tracing the conversion codepath.

    Subject: [PATCH vN] encoding: return early when not tracing conversion

or something, perhaps?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v3] Fix to avoid high memory footprint
  2024-07-26 15:12     ` Junio C Hamano
@ 2024-07-30  3:41       ` Haritha D
  0 siblings, 0 replies; 15+ messages in thread
From: Haritha D @ 2024-07-30  3:41 UTC (permalink / raw)
  To: Junio C Hamano, Haritha via GitGitGadget; +Cc: git@vger.kernel.org, Jeff King

Hello Team,

I have retained convert as per Torsten's comment. Rest i have changed as per Junio's suggestion. 

Thank you, everyone.

On 26/07/24, 8:42 PM, "Junio C Hamano" <gitster@pobox.com <mailto:gitster@pobox.com>> wrote:

"Haritha via GitGitGadget" <gitgitgadget@gmail.com <mailto:gitgitgadget@gmail.com>> writes:

Another thing.

> Subject: Re: [PATCH v3] Fix to avoid high memory footprint

This does not tell us much about what area had problem under what
condition. "git shortlog --no-merges -200 master" may give us good
examples of how we typically write the title of our commits.

In the case of this change, ideally we should be able to tell that
this is about tracing the conversion codepath.

Subject: [PATCH vN] encoding: return early when not tracing conversion

or something, perhaps?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5] convert: return early when not tracing
  2024-07-26 14:00     ` [PATCH v4] convert: " Haritha  via GitGitGadget
@ 2024-07-30  3:42       ` Haritha  via GitGitGadget
  2024-07-31  2:42         ` Junio C Hamano
  2024-07-31 13:33         ` [PATCH v6] " Haritha  via GitGitGadget
  0 siblings, 2 replies; 15+ messages in thread
From: Haritha  via GitGitGadget @ 2024-07-30  3:42 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Torsten Bögershausen, Haritha, D Harithamma

From: D Harithamma <harithamma.d@ibm.com>

When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.

This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.

Signed-off-by: Harithamma D <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v5
Pull-Request: https://github.com/git/git/pull/1744

Range-diff vs v4:

 1:  50758a4fb94 ! 1:  e2518b28f1c convert: avoid high memory footprint
     @@ Metadata
      Author: D Harithamma <harithamma.d@ibm.com>
      
       ## Commit message ##
     -    convert: avoid high memory footprint
     +    convert: return early when not tracing
      
          When Git adds a file requiring encoding conversion and tracing of encoding
          conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING


 convert.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert.c b/convert.c
index d8737fe0f2d..c4ddc4de81b 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	if (!trace_want(&coe))
+		return;
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(

base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v5] convert: return early when not tracing
  2024-07-30  3:42       ` [PATCH v5] convert: return early when not tracing Haritha  via GitGitGadget
@ 2024-07-31  2:42         ` Junio C Hamano
  2024-07-31  9:32           ` Haritha D
  2024-07-31 13:33         ` [PATCH v6] " Haritha  via GitGitGadget
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2024-07-31  2:42 UTC (permalink / raw)
  To: Haritha via GitGitGadget
  Cc: git, Jeff King, Torsten Bögershausen, Haritha

"Haritha  via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: D Harithamma <harithamma.d@ibm.com>
>
> When Git adds a file requiring encoding conversion and tracing of encoding
> conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
> environment variable, the `trace_encoding()` function still allocates &
> prepares "human readable" copies of the file contents before and after
> conversion to show in the trace. This results in a high memory footprint
> and increased runtime without providing any user-visible benefit.
>
> This fix introduces an early exit from the `trace_encoding()` function
> when tracing is not requested, preventing unnecessary memory allocation
> and processing.
>
> Signed-off-by: Harithamma D <harithamma.d@ibm.com>
> ---

It seems that you forgot to adjust to

 https://lore.kernel.org/git/xmqqed7gyyyd.fsf@gitster.g/

where I asked you to be consistent in the authorship name and sign
off.

For now, as I like to allow "git shortlog --author=..." to group
contributions by a single author to a single bucket, I'll rewrite
both to the same name as used in d254e650 (build: support z/OS
(OS/390)., 2024-03-06), but will not merge it down to 'next' before
I hear what your response is.

Thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v5] convert: return early when not tracing
  2024-07-31  2:42         ` Junio C Hamano
@ 2024-07-31  9:32           ` Haritha D
  0 siblings, 0 replies; 15+ messages in thread
From: Haritha D @ 2024-07-31  9:32 UTC (permalink / raw)
  To: Junio C Hamano, Haritha via GitGitGadget
  Cc: git@vger.kernel.org, Jeff King, Torsten Bögershausen

Hey Junio,

You're right. I forgot to respond to that question.
I used full name based on Jeff's suggestion. 
I will ensure it remains consistent.
Thank you reminding me.

Best Regards
Haritha

On 31/07/24, 8:12 AM, "Junio C Hamano" <gitster@pobox.com <mailto:gitster@pobox.com>> wrote:


"Haritha via GitGitGadget" <gitgitgadget@gmail.com <mailto:gitgitgadget@gmail.com>> writes:


> From: D Harithamma <harithamma.d@ibm.com <mailto:harithamma.d@ibm.com>>
>
> When Git adds a file requiring encoding conversion and tracing of encoding
> conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
> environment variable, the `trace_encoding()` function still allocates &
> prepares "human readable" copies of the file contents before and after
> conversion to show in the trace. This results in a high memory footprint
> and increased runtime without providing any user-visible benefit.
>
> This fix introduces an early exit from the `trace_encoding()` function
> when tracing is not requested, preventing unnecessary memory allocation
> and processing.
>
> Signed-off-by: Harithamma D <harithamma.d@ibm.com <mailto:harithamma.d@ibm.com>>
> ---


It seems that you forgot to adjust to


https://lore.kernel.org/git/xmqqed7gyyyd.fsf@gitster.g/ <https://lore.kernel.org/git/xmqqed7gyyyd.fsf@gitster.g/> 


where I asked you to be consistent in the authorship name and sign
off.


For now, as I like to allow "git shortlog --author=..." to group
contributions by a single author to a single bucket, I'll rewrite
both to the same name as used in d254e650 (build: support z/OS
(OS/390)., 2024-03-06), but will not merge it down to 'next' before
I hear what your response is.


Thanks.






^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v6] convert: return early when not tracing
  2024-07-30  3:42       ` [PATCH v5] convert: return early when not tracing Haritha  via GitGitGadget
  2024-07-31  2:42         ` Junio C Hamano
@ 2024-07-31 13:33         ` Haritha  via GitGitGadget
  1 sibling, 0 replies; 15+ messages in thread
From: Haritha  via GitGitGadget @ 2024-07-31 13:33 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Torsten Bögershausen, Haritha D, Haritha,
	D Harithamma

From: D Harithamma <harithamma.d@ibm.com>

When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.

This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.

Signed-off-by: D Harithamma <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v6
Pull-Request: https://github.com/git/git/pull/1744

Range-diff vs v5:

 1:  e2518b28f1c ! 1:  7b68aa9de1d convert: return early when not tracing
     @@ Commit message
          when tracing is not requested, preventing unnecessary memory allocation
          and processing.
      
     -    Signed-off-by: Harithamma D <harithamma.d@ibm.com>
     +    Signed-off-by: D Harithamma <harithamma.d@ibm.com>
      
       ## convert.c ##
      @@ convert.c: static void trace_encoding(const char *context, const char *path,


 convert.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert.c b/convert.c
index d8737fe0f2d..c4ddc4de81b 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,9 @@ static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	if (!trace_want(&coe))
+		return;
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(

base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-07-31 13:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-16  8:03 [PATCH] Fix to avoid high memory footprint Haritha  via GitGitGadget
2024-07-17  6:16 ` Jeff King
2024-07-24 11:45 ` [PATCH v2] " Haritha  via GitGitGadget
2024-07-24 21:41   ` Junio C Hamano
2024-07-24 22:16   ` Jeff King
2024-07-26  6:27   ` [PATCH v3] " Haritha  via GitGitGadget
2024-07-26  9:55     ` Torsten Bögershausen
2024-07-26 14:00     ` [PATCH v4] convert: " Haritha  via GitGitGadget
2024-07-30  3:42       ` [PATCH v5] convert: return early when not tracing Haritha  via GitGitGadget
2024-07-31  2:42         ` Junio C Hamano
2024-07-31  9:32           ` Haritha D
2024-07-31 13:33         ` [PATCH v6] " Haritha  via GitGitGadget
2024-07-26 15:06     ` [PATCH v3] Fix to avoid high memory footprint Junio C Hamano
2024-07-26 15:12     ` Junio C Hamano
2024-07-30  3:41       ` Haritha D

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).