git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] dir.c: skip .gitignore, etc larger than INT_MAX
Date: Wed, 05 Jun 2024 09:23:21 -0700	[thread overview]
Message-ID: <xmqqplsvuznq.fsf@gitster.g> (raw)
In-Reply-To: <20240605080308.GA2345232@coredump.intra.peff.net> (Jeff King's message of "Wed, 5 Jun 2024 04:03:08 -0400")

Jeff King <peff@peff.net> writes:

> I do not mind if consolidation waits for later, but I guess the
> immediate question is whether we'd prefer to reduce the limit set by my
> patch to a more security-conservative value. It would be easy to swap
> out INT_MAX for a 100MB #define on top.
>
> Maybe this?

Sounds sensible.  Thanks.

> -- >8 --
> Subject: [PATCH] dir.c: reduce max pattern file size to 100MB
>
> In a2bc523e1e (dir.c: skip .gitignore, etc larger than INT_MAX,
> 2024-05-31) we put capped the size of some files whose parsing code and
> data structures used ints. Setting the limit to INT_MAX was a natural
> spot, since we know the parsing code would misbehave above that.
>
> But it also leaves the possibility of overflow errors when we multiply
> that limit to allocate memory. For instance, a file consisting only of
> "a\na\n..." could have INT_MAX/2 entries. Allocating an array of
> pointers for each would need INT_MAX*4 bytes on a 64-bit system, enough
> to overflow a 32-bit int.
>
> So let's give ourselves a bit more safety margin by giving a much
> smaller limit. The size 100MB is somewhat arbitrary, but is based on the
> similar value for attribute files added by 3c50032ff5 (attr: ignore
> overly large gitattributes files, 2022-12-01).
>
> There's no particular reason these have to be the same, but the idea is
> that they are in the ballpark of "so huge that nobody would care, but
> small enough to avoid malicious overflow". So lacking a better guess, it
> makes sense to use the same value. The implementation here doesn't share
> the same constant, but we could change that later (or even give it a
> runtime config knob, though nobody has complained yet about the
> attribute limit).
>
> And likewise, let's add a few tests that exercise the limits, based on
> the attr ones. In this case, though, we never read .gitignore from the
> index; the blob code is exercised only for sparse filters. So we'll
> trigger it that way.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  dir.c                               | 10 ++++++++--
>  t/t0008-ignores.sh                  |  8 ++++++++
>  t/t6112-rev-list-filters-objects.sh | 12 ++++++++++++
>  3 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 914060edfd..ad2b7ebe2d 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -32,6 +32,12 @@
>  #include "tree.h"
>  #include "hex.h"
>  
> + /*
> +  * The maximum size of a pattern/exclude file. If the file exceeds this size
> +  * we will ignore it.
> +  */
> +#define PATTERN_MAX_FILE_SIZE (100 * 1024 * 1024)
> +
>  /*
>   * Tells read_directory_recursive how a file or directory should be treated.
>   * Values are ordered by significance, e.g. if a directory contains both
> @@ -1149,7 +1155,7 @@ static int add_patterns(const char *fname, const char *base, int baselen,
>  		}
>  	}
>  
> -	if (size > INT_MAX) {
> +	if (size > PATTERN_MAX_FILE_SIZE) {
>  		warning("ignoring excessively large pattern file: %s", fname);
>  		free(buf);
>  		return -1;
> @@ -1211,7 +1217,7 @@ int add_patterns_from_blob_to_list(
>  	if (r != 1)
>  		return r;
>  
> -	if (size > INT_MAX) {
> +	if (size > PATTERN_MAX_FILE_SIZE) {
>  		warning("ignoring excessively large pattern blob: %s",
>  			oid_to_hex(oid));
>  		free(buf);
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index 361446b2f4..02a18d4fdb 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -945,4 +945,12 @@ test_expect_success SYMLINKS 'symlinks not respected in-tree' '
>  	test_grep "unable to access.*gitignore" err
>  '
>  
> +test_expect_success EXPENSIVE 'large exclude file ignored in tree' '
> +	test_when_finished "rm .gitignore" &&
> +	dd if=/dev/zero of=.gitignore bs=101M count=1 &&
> +	git ls-files -o --exclude-standard 2>err &&
> +	echo "warning: ignoring excessively large pattern file: .gitignore" >expect &&
> +	test_cmp expect err
> +'
> +
>  test_done
> diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> index 43e1afd44c..0387f35a32 100755
> --- a/t/t6112-rev-list-filters-objects.sh
> +++ b/t/t6112-rev-list-filters-objects.sh
> @@ -701,4 +701,16 @@ test_expect_success 'expand blob limit in protocol' '
>  	grep "blob:limit=1024" trace
>  '
>  
> +test_expect_success EXPENSIVE 'large sparse filter file ignored' '
> +	blob=$(dd if=/dev/zero bs=101M count=1 |
> +	       git hash-object -w --stdin) &&
> +	test_must_fail \
> +		git rev-list --all --objects --filter=sparse:oid=$blob 2>err &&
> +	cat >expect <<-EOF &&
> +	warning: ignoring excessively large pattern blob: $blob
> +	fatal: unable to parse sparse filter data in $blob
> +	EOF
> +	test_cmp expect err
> +'
> +
>  test_done

  reply	other threads:[~2024-06-05 16:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-31 12:00 [PATCH] dir.c: skip .gitignore, etc larger than INT_MAX Jeff King
2024-05-31 15:05 ` Junio C Hamano
2024-06-04 10:39   ` Jeff King
2024-06-04 17:45     ` Junio C Hamano
2024-06-05  8:03       ` Jeff King
2024-06-05 16:23         ` Junio C Hamano [this message]
2024-05-31 15:10 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqplsvuznq.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).