git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Ben Stav <benstav@miggo.io>
Subject: Re: [PATCH] attr: avoid recursion when expanding attribute macros
Date: Wed, 12 Nov 2025 07:57:14 +0100	[thread overview]
Message-ID: <aRQvyvMq61syGT7_@pks.im> (raw)
In-Reply-To: <20251111223647.GA4055973@coredump.intra.peff.net>

On Tue, Nov 11, 2025 at 05:36:47PM -0500, Jeff King wrote:
> Given a set of attribute macros like:
> 
>    [attr]a1 a2
>    [attr]a2 a3
>    ...
>    [attr]a300000 -text
>    file a1
> 
> expanding the attributes for "file" requires expanding "a1" to "a2",
> "a2" to "a3", and so on until hitting a non-macro expansion ("-text", in
> this case). We implement this via recursion: fill_one() calls
> macroexpand_one(), which then recurses back to fill_one(). As a result,
> very deep macro chains like the one above can run out of stack space and
> cause us to segfault.
> 
> The required stack space is fairly small; I needed on the order of
> 200,000 entries to get a segfault on Linux. So it's unlikely anybody
> would hit this accidentally, leaving only malicious inputs. There you
> can easily construct a repo which will segfault on clone (we look at
> attributes during the checkout step, but you'd see the same trying to do
> other operations, like diff in a bare repo). It's mostly harmless, since
> anybody constructing such a repo is only preventing victims from cloning
> their evil garbage, but it could be a nuisance for hosting sites.
> 
> One option to prevent this is to limit the depth of recursion we'll
> allow. This is conceptually easy to implement, but it raises other
> questions: what should the limit be, and do we need a configuration knob
> for it?

That's fair, and as you demonstrate it's easy enough to turn recursion
into iteration. But it doesn't really solve the main problem: given
malicious input we'd now still crash eventually, even though we
ourselves control how exactly we crash. The main difference is that with
iteration it'll both:

  - take longer for us to crash

  - require way more memory along the way

So the evil garbage would continue to be a nuisance for users who want
to clone such a repository, but now it's going to be more of a nuisance
for hosting sites given that it could lead to out-of-memory situations.

I guess the reasoning here is that for this to become a real problem the
".gitattributes" file would need to be excessively huge. We're probably
talking about many millions or even billions of attributes before this
could cause an OOM situation. And such a file would be large enough to
bust the typical limits that the likes of GitHub and GitLab have in
place, so Git hosters already protect themselves against this crafted
input, even if only indirectly so.

The other angle is of course the wasted compute that an adversary can
cause. But I don't really mind that too much: there's enough benign
operations that require a bunch of compute, so I don't really see a
reason why one would need to craft a "compute waster" with malicious
input.

So personally I would've probably leaned into the direction of enforcing
a hard limit. I don't see a reason why anybody would need more than a
couple of recursions, it culls both compute and memory growth, and it
allows us to have a proper error message in case the limit is busted.
Furthermore, we can demonstrate right now that it wasn't possible to
have unlimited recursion anyway, which makes it easier to put a new
limit into place.

But following my above reasoning I think it's okay to turn this into
iteration, as well, though, but I'd like to hear whether my train of
thought matches yours.

Thanks!

Patrick

  parent reply	other threads:[~2025-11-12  6:57 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-11 22:36 [PATCH] attr: avoid recursion when expanding attribute macros Jeff King
2025-11-12  1:30 ` Ben Knoble
2025-11-12  7:09   ` Jeff King
2025-11-12  7:17     ` Jeff King
2025-11-12  6:57 ` Patrick Steinhardt [this message]
2025-11-12  7:16   ` Jeff King
2025-11-12 10:21     ` Patrick Steinhardt
2025-11-12 17:40   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aRQvyvMq61syGT7_@pks.im \
    --to=ps@pks.im \
    --cc=benstav@miggo.io \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).