[PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
@ 2023-12-12 22:12 Jeff King
  2023-12-13  7:07 ` Patrick Steinhardt
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2023-12-12 22:12 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Carlos Andrés Ramírez Cataño

When processing a header like a "From" line, mailinfo uses
unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
comments. It takes a NUL-terminated string on input, and loops over the
"in" pointer until it sees the NUL. When it finds the start of an
interesting block, it delegates to helper functions which also increment
"in", and return the updated pointer.

But there's a bug here: the helpers find the NUL with a post-increment
in the loop condition, like:

   while ((c = *in++) != 0)

So when they do see a NUL (rather than the correct termination of the
quote or comment section), they return "in" as one _past_ the NUL
terminator. And thus the outer loop in unquote_quoted_pair() does not
realize we hit the NUL, and keeps reading past the end of the buffer.

We should instead make sure to return "in" positioned at the NUL, so
that the caller knows to stop their loop, too. A hacky way to do this is
to return "in - 1" after leaving the inner loop. But a slightly cleaner
solution is to avoid incrementing "in" until we are sure it contained a
non-NUL byte (i.e., doing it inside the loop body).

The two tests here show off the problem. Since we check the output,
they'll _usually_ report a failure in a normal build, but it depends on
what garbage bytes are found after the heap buffer. Building with
SANITIZE=address reliably notices the problem. The outcome (both the
exit code and the exact bytes) are just what Git happens to produce for
these cases today, and shouldn't be taken as an endorsement. It might be
reasonable to abort on an unterminated string, for example. The priority
for this patch is fixing the out-of-bounds memory access.

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
This was reported to the security list, but because it's just an
out-of-bounds read, we won't do a coordinated disclosure.

 mailinfo.c          |  8 ++++----
 t/t5100-mailinfo.sh | 22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index a07d2da16d..737b9e5e13 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -58,12 +58,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)

 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
-	int c;
 	int take_next_literally = 0;

 	strbuf_addch(outbuf, '(');

-	while ((c = *in++) != 0) {
+	while (*in) {
+		int c = *in++;
 		if (take_next_literally == 1) {
 			take_next_literally = 0;
 		} else {
@@ -88,10 +88,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)

 static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
 {
-	int c;
 	int take_next_literally = 0;

-	while ((c = *in++) != 0) {
+	while (*in) {
+		int c = *in++;
 		if (take_next_literally == 1) {
 			take_next_literally = 0;
 		} else {
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index db11cababd..654d8cf3ee 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -268,4 +268,26 @@ test_expect_success 'mailinfo warn CR in base64 encoded email' '
 	test_must_be_empty quoted-cr/0002.err
 '

+test_expect_success 'from line with unterminated quoted string' '
+	echo "From: bob \"unterminated string smith <bob@example.com>" >in &&
+	git mailinfo /dev/null /dev/null <in >actual &&
+	cat >expect <<-\EOF &&
+	Author: bob unterminated string smith
+	Email: bob@example.com
+
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'from line with unterminated comment' '
+	echo "From: bob (unterminated comment smith <bob@example.com>" >in &&
+	git mailinfo /dev/null /dev/null <in >actual &&
+	cat >expect <<-\EOF &&
+	Author: bob (unterminated comment smith
+	Email: bob@example.com
+
+	EOF
+	test_cmp expect actual
+'
+
 test_done
-- 
2.43.0.310.g85bf1f633d

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
  2023-12-12 22:12 [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair() Jeff King
@ 2023-12-13  7:07 ` Patrick Steinhardt
  2023-12-13  8:20   ` Jeff King
  2023-12-13 14:54   ` [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair() Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Patrick Steinhardt @ 2023-12-13  7:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño

[-- Attachment #1: Type: text/plain, Size: 3736 bytes --]

On Tue, Dec 12, 2023 at 05:12:43PM -0500, Jeff King wrote:
> When processing a header like a "From" line, mailinfo uses
> unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
> comments. It takes a NUL-terminated string on input, and loops over the
> "in" pointer until it sees the NUL. When it finds the start of an
> interesting block, it delegates to helper functions which also increment
> "in", and return the updated pointer.
> 
> But there's a bug here: the helpers find the NUL with a post-increment
> in the loop condition, like:
> 
>    while ((c = *in++) != 0)
> 
> So when they do see a NUL (rather than the correct termination of the
> quote or comment section), they return "in" as one _past_ the NUL
> terminator. And thus the outer loop in unquote_quoted_pair() does not
> realize we hit the NUL, and keeps reading past the end of the buffer.
> 
> We should instead make sure to return "in" positioned at the NUL, so
> that the caller knows to stop their loop, too. A hacky way to do this is
> to return "in - 1" after leaving the inner loop. But a slightly cleaner
> solution is to avoid incrementing "in" until we are sure it contained a
> non-NUL byte (i.e., doing it inside the loop body).
> 
> The two tests here show off the problem. Since we check the output,
> they'll _usually_ report a failure in a normal build, but it depends on
> what garbage bytes are found after the heap buffer. Building with
> SANITIZE=address reliably notices the problem. The outcome (both the
> exit code and the exact bytes) are just what Git happens to produce for
> these cases today, and shouldn't be taken as an endorsement. It might be
> reasonable to abort on an unterminated string, for example. The priority
> for this patch is fixing the out-of-bounds memory access.

Makes sense.

> diff --git a/mailinfo.c b/mailinfo.c
> index a07d2da16d..737b9e5e13 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -58,12 +58,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
>  
>  static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  {
> -	int c;
>  	int take_next_literally = 0;
>  
>  	strbuf_addch(outbuf, '(');
>  
> -	while ((c = *in++) != 0) {
> +	while (*in) {
> +		int c = *in++;
>  		if (take_next_literally == 1) {
>  			take_next_literally = 0;
>  		} else {
> @@ -88,10 +88,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  
>  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
>  {
> -	int c;
>  	int take_next_literally = 0;
>  
> -	while ((c = *in++) != 0) {
> +	while (*in) {
> +		int c = *in++;
>  		if (take_next_literally == 1) {
>  			take_next_literally = 0;
>  		} else {

I was wondering whether we want to convert `unquote_quoted_pair()` in
the same way. It's not subject to the same issue because it doesn't
return `in` to its callers. But it would lower the cognitive overhead a
bit by keeping the coding style consistent. But please feel free to
ignore this remark.

Another thing I was wondering about is the recursive nature of these
functions, and whether it can lead to similar issues like we recently
had when parsing revisions with stack exhaustion. I _think_ this should
not be much of a problem in this case though, as we're talking about
mail headers here. While the length of header values isn't restricted
per se (the line length is restricted to 1000, but with Comment Folding
Whitespace values can span multiple lines), but mail provdiers will sure
clamp down on mails with a "From:" header that is megabytes in size.

So taken together, this looks good to me.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
  2023-12-13  7:07 ` Patrick Steinhardt
@ 2023-12-13  8:20   ` Jeff King
  2023-12-14  7:41     ` Patrick Steinhardt
  2023-12-13 14:54   ` [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair() Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff King @ 2023-12-13  8:20 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño

On Wed, Dec 13, 2023 at 08:07:21AM +0100, Patrick Steinhardt wrote:

> >  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
> >  {
> > -	int c;
> >  	int take_next_literally = 0;
> >  
> > -	while ((c = *in++) != 0) {
> > +	while (*in) {
> > +		int c = *in++;
> >  		if (take_next_literally == 1) {
> >  			take_next_literally = 0;
> >  		} else {
> 
> I was wondering whether we want to convert `unquote_quoted_pair()` in
> the same way. It's not subject to the same issue because it doesn't
> return `in` to its callers. But it would lower the cognitive overhead a
> bit by keeping the coding style consistent. But please feel free to
> ignore this remark.

I dunno. I don't think the consistency matters that much between
functions, and on its own, the "while ((c = *in++))" pattern is not
harmful. It's used elsewhere in the same file for other functions that
are also fine (for decoding q/b headers).

> Another thing I was wondering about is the recursive nature of these
> functions, and whether it can lead to similar issues like we recently
> had when parsing revisions with stack exhaustion. I _think_ this should
> not be much of a problem in this case though, as we're talking about
> mail headers here. While the length of header values isn't restricted
> per se (the line length is restricted to 1000, but with Comment Folding
> Whitespace values can span multiple lines), but mail provdiers will sure
> clamp down on mails with a "From:" header that is megabytes in size.

It's just unquote_comment() that is recursive, but yeah, there is
nothing to stop it from recursing forever on "((((((...". The stack
requirements are pretty small, though. I needed between 2^17 and 2^18
bytes to segfault on my machine using:

  perl -e 'print "From: ", "(" x 2**18;' |
  git mailinfo /dev/null /dev/null

Absurdly big for an email, but maybe within the realm of possibility? I
think it might be possible to drop the recursion and just use a depth
counter, like this:

diff --git a/mailinfo.c b/mailinfo.c
index 737b9e5e13..db236f9f9f 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -59,6 +59,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
 	int take_next_literally = 0;
+	int depth = 1;
 
 	strbuf_addch(outbuf, '(');
 
@@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 				take_next_literally = 1;
 				continue;
 			case '(':
-				in = unquote_comment(outbuf, in);
+				strbuf_addch(outbuf, '(');
+				depth++;
 				continue;
 			case ')':
 				strbuf_addch(outbuf, ')');
-				return in;
+				if (!--depth)
+					return in;
+				continue;
 			}
 		}
 
I doubt it's a big deal either way, but if it's that easy to do it might
be worth it.

-Peff

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
  2023-12-13  7:07 ` Patrick Steinhardt
  2023-12-13  8:20   ` Jeff King
@ 2023-12-13 14:54   ` Junio C Hamano
  2023-12-14 21:40     ` Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2023-12-13 14:54 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Jeff King, git, Taylor Blau,
	Carlos Andrés Ramírez Cataño

Patrick Steinhardt <ps@pks.im> writes:

>>  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
>>  {
>> -	int c;
>>  	int take_next_literally = 0;
>>  
>> -	while ((c = *in++) != 0) {
>> +	while (*in) {
>> +		int c = *in++;
>>  		if (take_next_literally == 1) {
>>  			take_next_literally = 0;
>>  		} else {
>
> I was wondering whether we want to convert `unquote_quoted_pair()` in
> the same way. It's not subject to the same issue because it doesn't
> return `in` to its callers. But it would lower the cognitive overhead a
> bit by keeping the coding style consistent. But please feel free to
> ignore this remark.

Yeah, I was wondering about the value of establishing a pattern that
can be followed safely and with clarity, too.  I also briefly
wondered how bad if we picked the "compensate for the increment done
by the last iteration before leaving the loop by returning (in-1)"
idiom (which Peff called "a hacky way") to be that universal
pattern, but this bug was a clear enough evidence that it does not
work very well in developers' minds.

I actually had trouble with the proposed update, and wondered if

-	while ((c = *in++) != 0) {
+	while ((c = *in)) {
+		in++;

is easier to follow, but then was hit by the possibility that the
same "we have incremented 'in' a bit too early" may exist if such
a loop wants to use 'in' in its body.  Wouldn't it mean that

-	while ((c = *in++) != 0) {
+	for (; c = *in; in++) {

would be even a better rewrite?

Enough bikeshedding...

> Another thing I was wondering about is the recursive nature of these
> functions, and whether it can lead to similar issues like we recently
> had when parsing revisions with stack exhaustion. I _think_ this should
> not be much of a problem in this case though, as we're talking about
> mail headers here. While the length of header values isn't restricted
> per se (the line length is restricted to 1000, but with Comment Folding
> Whitespace values can span multiple lines), but mail provdiers will sure
> clamp down on mails with a "From:" header that is megabytes in size.

Good thinking, and I think Peff's iterative rewrite would be a good
way to deal with it if it ever becomes an issue.

> So taken together, this looks good to me.

Thanks, both, for writing and reviewing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
  2023-12-13  8:20   ` Jeff King
@ 2023-12-14  7:41     ` Patrick Steinhardt
  2023-12-14 21:44       ` [PATCH 0/2] avoiding recursion in mailinfo Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Patrick Steinhardt @ 2023-12-14  7:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño

[-- Attachment #1: Type: text/plain, Size: 2643 bytes --]

On Wed, Dec 13, 2023 at 03:20:27AM -0500, Jeff King wrote:
> On Wed, Dec 13, 2023 at 08:07:21AM +0100, Patrick Steinhardt wrote:
[snip]
> > Another thing I was wondering about is the recursive nature of these
> > functions, and whether it can lead to similar issues like we recently
> > had when parsing revisions with stack exhaustion. I _think_ this should
> > not be much of a problem in this case though, as we're talking about
> > mail headers here. While the length of header values isn't restricted
> > per se (the line length is restricted to 1000, but with Comment Folding
> > Whitespace values can span multiple lines), but mail provdiers will sure
> > clamp down on mails with a "From:" header that is megabytes in size.
> 
> It's just unquote_comment() that is recursive, but yeah, there is
> nothing to stop it from recursing forever on "((((((...". The stack
> requirements are pretty small, though. I needed between 2^17 and 2^18
> bytes to segfault on my machine using:
> 
>   perl -e 'print "From: ", "(" x 2**18;' |
>   git mailinfo /dev/null /dev/null
> 
> Absurdly big for an email, but maybe within the realm of possibility? I
> think it might be possible to drop the recursion and just use a depth
> counter, like this:

It's definitely not as large as I'd have expected it to be, we're only
talking about kilobytes of data. Feels like it might be in the realm of
possibility to get transferred by a mail provider.

> diff --git a/mailinfo.c b/mailinfo.c
> index 737b9e5e13..db236f9f9f 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -59,6 +59,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
>  static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  {
>  	int take_next_literally = 0;
> +	int depth = 1;
>  
>  	strbuf_addch(outbuf, '(');
>  
> @@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  				take_next_literally = 1;
>  				continue;
>  			case '(':
> -				in = unquote_comment(outbuf, in);
> +				strbuf_addch(outbuf, '(');
> +				depth++;
>  				continue;
>  			case ')':
>  				strbuf_addch(outbuf, ')');
> -				return in;
> +				if (!--depth)
> +					return in;
> +				continue;
>  			}
>  		}
>  
> I doubt it's a big deal either way, but if it's that easy to do it might
> be worth it.

Isn't this only protecting against unbalanced braces? That might be a
sensible check to do regardless, but does it protect against recursion
blowing the stack if you just happen to have many opening braces?

Might be I'm missing something.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
  2023-12-13 14:54   ` [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair() Junio C Hamano
@ 2023-12-14 21:40     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2023-12-14 21:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, git, Taylor Blau,
	Carlos Andrés Ramírez Cataño

On Wed, Dec 13, 2023 at 06:54:03AM -0800, Junio C Hamano wrote:

> I actually had trouble with the proposed update, and wondered if
> 
> -	while ((c = *in++) != 0) {
> +	while ((c = *in)) {
> +		in++;
> 
> is easier to follow, but then was hit by the possibility that the
> same "we have incremented 'in' a bit too early" may exist if such
> a loop wants to use 'in' in its body.  Wouldn't it mean that
> 
> -	while ((c = *in++) != 0) {
> +	for (; c = *in; in++) {
> 
> would be even a better rewrite?

No, the "for" loop wouldn't work, because the loop body actually depends
on "in" having already been incremented. If we find the end of the
comment or quoted string, we return "in", and the caller is expecting it
to have moved past the closing quote. So that would have to become
"return in+1".

IOW, the issue is that the normal end-of-quote parsing and hitting
end-of-string are fundamentally different. So we either need to
differentiate the returns (either with "+1" on one, or "-1" on the
other). Or we need to choose to increment "in" based on which we found
(which is what my patch does).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 0/2] avoiding recursion in mailinfo
  2023-12-14  7:41     ` Patrick Steinhardt
@ 2023-12-14 21:44       ` Jeff King
  2023-12-14 21:47         ` [PATCH 1/2] t5100: make rfc822 comment test more careful Jeff King
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jeff King @ 2023-12-14 21:44 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño

On Thu, Dec 14, 2023 at 08:41:20AM +0100, Patrick Steinhardt wrote:

> > @@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
> >  				take_next_literally = 1;
> >  				continue;
> >  			case '(':
> > -				in = unquote_comment(outbuf, in);
> > +				strbuf_addch(outbuf, '(');
> > +				depth++;
> >  				continue;
> >  			case ')':
> >  				strbuf_addch(outbuf, ')');
> > -				return in;
> > +				if (!--depth)
> > +					return in;
> > +				continue;
> >  			}
> >  		}
> >  
> > I doubt it's a big deal either way, but if it's that easy to do it might
> > be worth it.
> 
> Isn't this only protecting against unbalanced braces? That might be a
> sensible check to do regardless, but does it protect against recursion
> blowing the stack if you just happen to have many opening braces?
> 
> Might be I'm missing something.

It protects against recrusion blowing the stack because it removes the
recursive call to unquote_comment(). ;)

The "depth" stuff is there because without recursion, we have to know
when we've hit the correct closing paren (as opposed to an embedded
one).

Here it is as a patch (actually two patches). I don't think it's high
priority, but I'd sunk enough brain cells into thinking about it that I
wanted to capture the work. ;)

I did it on top of the earlier mailinfo out-of-bounds read fix, but it
could be applied separately.

  [1/2]: t5100: make rfc822 comment test more careful
  [2/2]: mailinfo: avoid recursion when unquoting From headers

 mailinfo.c             | 8 ++++++--
 t/t5100/comment.expect | 2 +-
 t/t5100/comment.in     | 2 +-
 3 files changed, 8 insertions(+), 4 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] t5100: make rfc822 comment test more careful
  2023-12-14 21:44       ` [PATCH 0/2] avoiding recursion in mailinfo Jeff King
@ 2023-12-14 21:47         ` Jeff King
  2023-12-14 21:48         ` [PATCH 2/2] mailinfo: avoid recursion when unquoting From headers Jeff King
  2023-12-15  7:58         ` [PATCH 0/2] avoiding recursion in mailinfo Patrick Steinhardt
  2 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2023-12-14 21:47 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño

When processing "From" headers in an email, mailinfo "unquotes" quoted
strings and rfc822 parenthesized comments. For quoted strings, we
actually remove the double-quotes, so:

  From: "A U Thor" <someone@example.com>

become:

  Author: A U Thor
  Email: someone@example.com

But for comments, we leave the outer parentheses in place, so:

  From: A U (this is a comment) Thor <someone@example.com>

becomes:

  Author: A U (this is a comment) Thor
  Email: someone@example.com

So what is the comment "unquoting" actually doing? In our code, being in
a comment section has exactly two effects:

  1. We'll unquote backslash-escaped characters inside a comment
     section.

  2. We _won't_ unquote double-quoted strings inside a comment section.

Our test for comments in t5100 checks this:

  From: "A U Thor" <somebody@example.com> (this is \(really\) a comment (honestly))

So it is covering (1), but not (2). Let's add in a quoted string to
cover this.

Moreover, because the comment appears at the end of the From header,
there's nothing to confirm that we correctly found the end of the
comment section (and not just the end-of-string). Let's instead move it
to the beginning of the header, which means we can confirm that the
existing quoted string is detected (which will only happen if we know
we've left the comment block).

As expected, the test continues to pass, but this will give us more
confidence as we refactor the code in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
---
I did this as a separate patch so it is easy to see that the existing
code already passes the test (i.e., our refactor is a true noop in terms
of behavior, and not fixing or breaking anything).

 t/t5100/comment.expect | 2 +-
 t/t5100/comment.in     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5100/comment.expect b/t/t5100/comment.expect
index 7228177984..bd71956a47 100644
--- a/t/t5100/comment.expect
+++ b/t/t5100/comment.expect
@@ -1,4 +1,4 @@
-Author: A U Thor (this is (really) a comment (honestly))
+Author: (this is (really) a "comment" (honestly)) A U Thor
 Email: somebody@example.com
 Subject: testing comments
 Date: Sun, 25 May 2008 00:38:18 -0700
diff --git a/t/t5100/comment.in b/t/t5100/comment.in
index c53a192dfe..0b7e903b06 100644
--- a/t/t5100/comment.in
+++ b/t/t5100/comment.in
@@ -1,5 +1,5 @@
 From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
-From: "A U Thor" <somebody@example.com> (this is \(really\) a comment (honestly))
+From: (this is \(really\) a "comment" (honestly)) "A U Thor" <somebody@example.com>
 Date: Sun, 25 May 2008 00:38:18 -0700
 Subject: [PATCH] testing comments

-- 
2.43.0.363.g1d22a8f302

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] mailinfo: avoid recursion when unquoting From headers
  2023-12-14 21:44       ` [PATCH 0/2] avoiding recursion in mailinfo Jeff King
  2023-12-14 21:47         ` [PATCH 1/2] t5100: make rfc822 comment test more careful Jeff King
@ 2023-12-14 21:48         ` Jeff King
  2023-12-15  7:58         ` [PATCH 0/2] avoiding recursion in mailinfo Patrick Steinhardt
  2 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2023-12-14 21:48 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño

Our unquote_comment() function is recursive; when it sees a comment
within a comment, like:

  (this is an (embedded) comment)

it recurses to handle the inner comment. This is fine for practical use,
but it does mean that you can easily run out of stack space with a
malicious header. For example:

  perl -e 'print "From: ", "(" x 2**18;' |
  git mailinfo /dev/null /dev/null

segfaults on my system. And since mailinfo is likely to be fed untrusted
input from the Internet (if not by human users, who might recognize a
garbage header, but certainly there are automated systems that apply
patches from a list) it may be possible for an attacker to trigger the
problem.

That said, I don't think there's an interesting security vulnerability
here. All an attacker can do is make it impossible to parse their email
and apply their patch, and there are lots of ways to generate bogus
emails. So it's more of an annoyance than anything.

But it's pretty easy to fix it. The recursion is not helping us preserve
any particular state from each level. The only flag in our parsing is
take_next_literally, and we can never recurse when it is set (since the
start of a new comment implies it was not backslash-escaped). So it is
really only useful for finding the end of the matched pair of
parentheses. We can do that easily with a simple depth counter.

Signed-off-by: Jeff King <peff@peff.net>
---
 mailinfo.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index 737b9e5e13..db236f9f9f 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -59,6 +59,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
 	int take_next_literally = 0;
+	int depth = 1;

 	strbuf_addch(outbuf, '(');

@@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 				take_next_literally = 1;
 				continue;
 			case '(':
-				in = unquote_comment(outbuf, in);
+				strbuf_addch(outbuf, '(');
+				depth++;
 				continue;
 			case ')':
 				strbuf_addch(outbuf, ')');
-				return in;
+				if (!--depth)
+					return in;
+				continue;
 			}
 		}

-- 
2.43.0.363.g1d22a8f302

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] avoiding recursion in mailinfo
  2023-12-14 21:44       ` [PATCH 0/2] avoiding recursion in mailinfo Jeff King
  2023-12-14 21:47         ` [PATCH 1/2] t5100: make rfc822 comment test more careful Jeff King
  2023-12-14 21:48         ` [PATCH 2/2] mailinfo: avoid recursion when unquoting From headers Jeff King
@ 2023-12-15  7:58         ` Patrick Steinhardt
  2 siblings, 0 replies; 10+ messages in thread
From: Patrick Steinhardt @ 2023-12-15  7:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño

[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]

On Thu, Dec 14, 2023 at 04:44:44PM -0500, Jeff King wrote:
> On Thu, Dec 14, 2023 at 08:41:20AM +0100, Patrick Steinhardt wrote:
> 
> > > @@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
> > >  				take_next_literally = 1;
> > >  				continue;
> > >  			case '(':
> > > -				in = unquote_comment(outbuf, in);
> > > +				strbuf_addch(outbuf, '(');
> > > +				depth++;
> > >  				continue;
> > >  			case ')':
> > >  				strbuf_addch(outbuf, ')');
> > > -				return in;
> > > +				if (!--depth)
> > > +					return in;
> > > +				continue;
> > >  			}
> > >  		}
> > >  
> > > I doubt it's a big deal either way, but if it's that easy to do it might
> > > be worth it.
> > 
> > Isn't this only protecting against unbalanced braces? That might be a
> > sensible check to do regardless, but does it protect against recursion
> > blowing the stack if you just happen to have many opening braces?
> > 
> > Might be I'm missing something.
> 
> It protects against recrusion blowing the stack because it removes the
> recursive call to unquote_comment(). ;)

Doh. Of course it does, I completely missed the removals.

> The "depth" stuff is there because without recursion, we have to know
> when we've hit the correct closing paren (as opposed to an embedded
> one).

Yes, makes sense now. The patches look good to me, thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-12-15  7:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-12 22:12 [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair() Jeff King
2023-12-13  7:07 ` Patrick Steinhardt
2023-12-13  8:20   ` Jeff King
2023-12-14  7:41     ` Patrick Steinhardt
2023-12-14 21:44       ` [PATCH 0/2] avoiding recursion in mailinfo Jeff King
2023-12-14 21:47         ` [PATCH 1/2] t5100: make rfc822 comment test more careful Jeff King
2023-12-14 21:48         ` [PATCH 2/2] mailinfo: avoid recursion when unquoting From headers Jeff King
2023-12-15  7:58         ` [PATCH 0/2] avoiding recursion in mailinfo Patrick Steinhardt
2023-12-13 14:54   ` [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair() Junio C Hamano
2023-12-14 21:40     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).