[PATCH] strlcpy(): safer and faster version

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] strlcpy(): safer and faster version
@ 2021-12-16 17:31 Andriy Makukha via GitGitGadget
  2021-12-16 18:14 ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Andriy Makukha via GitGitGadget @ 2021-12-16 17:31 UTC (permalink / raw)
  To: git; +Cc: Andriy Makukha, Andrii Makukha

From: Andrii Makukha <andriy.makukha@gmail.com>

Original strlcpy() has a significant disadvantage of being both unsafe
and inefficient. It unnecessarily calculates length of `src` which may
result in a segmentation fault if `src` is not terminated with a
NUL-character.

In this fix, if `src` is too long, strlcpy() returns `size`. This
allows to still detect an error while fixing the mentioned
vulnerabilities. It deviates from original strlcpy(), but for a good
reason.

Signed-off-by: Andrii Makukha <andriy.makukha@gmail.com>
---
    strlcpy(): safer and faster version

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1097%2Famakukha%2Fmaint-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1097/amakukha/maint-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1097

 compat/strlcpy.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/compat/strlcpy.c b/compat/strlcpy.c
index 4024c360301..e7fd11015c7 100644
--- a/compat/strlcpy.c
+++ b/compat/strlcpy.c
@@ -2,7 +2,12 @@
 
 size_t gitstrlcpy(char *dest, const char *src, size_t size)
 {
-	size_t ret = strlen(src);
+	/*
+	 * NOTE: original strlcpy returns full length of src, but this is
+	 * unsafe. This implementation returns `size` if src is too long.
+	 * This behaviour is faster and still allows to detect an issue.
+	 */
+	size_t ret = strnlen(src, size);
 
 	if (size) {
 		size_t len = (ret >= size) ? size - 1 : ret;

base-commit: e9d7761bb94f20acc98824275e317fa82436c25d
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] strlcpy(): safer and faster version
  2021-12-16 17:31 [PATCH] strlcpy(): safer and faster version Andriy Makukha via GitGitGadget
@ 2021-12-16 18:14 ` Jeff King
  2021-12-16 22:32   ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2021-12-16 18:14 UTC (permalink / raw)
  To: Andriy Makukha via GitGitGadget; +Cc: git, Andriy Makukha

On Thu, Dec 16, 2021 at 05:31:20PM +0000, Andriy Makukha via GitGitGadget wrote:

> Original strlcpy() has a significant disadvantage of being both unsafe
> and inefficient. It unnecessarily calculates length of `src` which may
> result in a segmentation fault if `src` is not terminated with a
> NUL-character.

I think any code that passes such a "src" is still broken after your
code. If the length of "src" is less than "size", then the result in
"dest" will contain garbage we read from the memory after "src".

Likewise in that case using strnlen() isn't any faster, since it has to
look at the same number of bytes either way (it may even be slower since
its loop has two conditions to check).

> In this fix, if `src` is too long, strlcpy() returns `size`. This
> allows to still detect an error while fixing the mentioned
> vulnerabilities. It deviates from original strlcpy(), but for a good
> reason.

This could potentially break callers of strlcpy(), though, because it's
changing the semantics of the return value. For example, if they use the
return value to expand a buffer to hold the result.

I do think the proposed semantics are better (I have actually fixed a
real overflow bug where somebody assumed strlcpy() returned the number
of bytes written). But we probably should not call it strlcpy(), because
that's has well-known behavior that we're not meeting.

I don't think any of the current code would be broken by this (most does
not even look at the return value at all). It just seems like an
accident waiting to happen.

Personally, I don't love strlcpy() in the first place. Avoiding heap
overflows is good, but unexpected truncation can also be buggy. That's
why try to either size buffers automatically (strbuf, xstrfmt,
FLEX_ALLOC, etc) or assert that we didn't truncate (xsnprintf).

Some cases could probably be converted away from strlcpy(). For
instance, the color stuff in add-interactive.c should be using
xsnprintf(), since the point of COLOR_MAXLEN is to hold the
longest-possible color. The ones in difftool.c probably ought to be
strbufs. There are definitely some that want the truncation semantics
(e.g., usernames in archive-tar.c). We might be better off providing a
function whose name makes it clear that truncation is OK.

>  size_t gitstrlcpy(char *dest, const char *src, size_t size)
>  {
> -	size_t ret = strlen(src);
> +	/*
> +	 * NOTE: original strlcpy returns full length of src, but this is
> +	 * unsafe. This implementation returns `size` if src is too long.
> +	 * This behaviour is faster and still allows to detect an issue.
> +	 */
> +	size_t ret = strnlen(src, size);

Also, strnlen() isn't portable, so we'd need a solution there (open
coding or yet another compat wrapper).

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] strlcpy(): safer and faster version
  2021-12-16 18:14 ` Jeff King
@ 2021-12-16 22:32   ` Junio C Hamano
  2021-12-17  5:22     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2021-12-16 22:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Andriy Makukha via GitGitGadget, git, Andriy Makukha

Jeff King <peff@peff.net> writes:

> On Thu, Dec 16, 2021 at 05:31:20PM +0000, Andriy Makukha via GitGitGadget wrote:
>
>> Original strlcpy() has a significant disadvantage of being both unsafe
>> and inefficient. It unnecessarily calculates length of `src` which may
>> result in a segmentation fault if `src` is not terminated with a
>> NUL-character.
>
> I think any code that passes such a "src" is still broken after your
> code. If the length of "src" is less than "size", then the result in
> "dest" will contain garbage we read from the memory after "src".
>
> Likewise in that case using strnlen() isn't any faster, since it has to
> look at the same number of bytes either way (it may even be slower since
> its loop has two conditions to check).
>
>> In this fix, if `src` is too long, strlcpy() returns `size`. This
>> allows to still detect an error while fixing the mentioned
>> vulnerabilities. It deviates from original strlcpy(), but for a good
>> reason.
>
> This could potentially break callers of strlcpy(), though, because it's
> changing the semantics of the return value. For example, if they use the
> return value to expand a buffer to hold the result.
>
> I do think the proposed semantics are better (I have actually fixed a
> real overflow bug where somebody assumed strlcpy() returned the number
> of bytes written). But we probably should not call it strlcpy(), because
> that's has well-known behavior that we're not meeting.
>
> I don't think any of the current code would be broken by this (most does
> not even look at the return value at all). It just seems like an
> accident waiting to happen.
>
> Personally, I don't love strlcpy() in the first place. Avoiding heap
> overflows is good, but unexpected truncation can also be buggy. That's
> why try to either size buffers automatically (strbuf, xstrfmt,
> FLEX_ALLOC, etc) or assert that we didn't truncate (xsnprintf).
>
> Some cases could probably be converted away from strlcpy(). For
> instance, the color stuff in add-interactive.c should be using
> xsnprintf(), since the point of COLOR_MAXLEN is to hold the
> longest-possible color. The ones in difftool.c probably ought to be
> strbufs. There are definitely some that want the truncation semantics
> (e.g., usernames in archive-tar.c). We might be better off providing a
> function whose name makes it clear that truncation is OK.
>
>>  size_t gitstrlcpy(char *dest, const char *src, size_t size)
>>  {
>> -	size_t ret = strlen(src);
>> +	/*
>> +	 * NOTE: original strlcpy returns full length of src, but this is
>> +	 * unsafe. This implementation returns `size` if src is too long.
>> +	 * This behaviour is faster and still allows to detect an issue.
>> +	 */
>> +	size_t ret = strnlen(src, size);
>
> Also, strnlen() isn't portable, so we'd need a solution there (open
> coding or yet another compat wrapper).

Thanks for saying everything I wanted to say ;-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] strlcpy(): safer and faster version
  2021-12-16 22:32   ` Junio C Hamano
@ 2021-12-17  5:22     ` Ævar Arnfjörð Bjarmason
  2021-12-17 22:42       ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-17  5:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Andriy Makukha via GitGitGadget, git, Andriy Makukha


On Thu, Dec 16 2021, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
>> On Thu, Dec 16, 2021 at 05:31:20PM +0000, Andriy Makukha via GitGitGadget wrote:
>>
>>> Original strlcpy() has a significant disadvantage of being both unsafe
>>> and inefficient. It unnecessarily calculates length of `src` which may
>>> result in a segmentation fault if `src` is not terminated with a
>>> NUL-character.
>>
>> I think any code that passes such a "src" is still broken after your
>> code. If the length of "src" is less than "size", then the result in
>> "dest" will contain garbage we read from the memory after "src".
>>
>> Likewise in that case using strnlen() isn't any faster, since it has to
>> look at the same number of bytes either way (it may even be slower since
>> its loop has two conditions to check).
>>
>>> In this fix, if `src` is too long, strlcpy() returns `size`. This
>>> allows to still detect an error while fixing the mentioned
>>> vulnerabilities. It deviates from original strlcpy(), but for a good
>>> reason.
>>
>> This could potentially break callers of strlcpy(), though, because it's
>> changing the semantics of the return value. For example, if they use the
>> return value to expand a buffer to hold the result.
>>
>> I do think the proposed semantics are better (I have actually fixed a
>> real overflow bug where somebody assumed strlcpy() returned the number
>> of bytes written). But we probably should not call it strlcpy(), because
>> that's has well-known behavior that we're not meeting.
>>
>> I don't think any of the current code would be broken by this (most does
>> not even look at the return value at all). It just seems like an
>> accident waiting to happen.
>>
>> Personally, I don't love strlcpy() in the first place. Avoiding heap
>> overflows is good, but unexpected truncation can also be buggy. That's
>> why try to either size buffers automatically (strbuf, xstrfmt,
>> FLEX_ALLOC, etc) or assert that we didn't truncate (xsnprintf).
>>
>> Some cases could probably be converted away from strlcpy(). For
>> instance, the color stuff in add-interactive.c should be using
>> xsnprintf(), since the point of COLOR_MAXLEN is to hold the
>> longest-possible color. The ones in difftool.c probably ought to be
>> strbufs. There are definitely some that want the truncation semantics
>> (e.g., usernames in archive-tar.c). We might be better off providing a
>> function whose name makes it clear that truncation is OK.
>>
>>>  size_t gitstrlcpy(char *dest, const char *src, size_t size)
>>>  {
>>> -	size_t ret = strlen(src);
>>> +	/*
>>> +	 * NOTE: original strlcpy returns full length of src, but this is
>>> +	 * unsafe. This implementation returns `size` if src is too long.
>>> +	 * This behaviour is faster and still allows to detect an issue.
>>> +	 */
>>> +	size_t ret = strnlen(src, size);
>>
>> Also, strnlen() isn't portable, so we'd need a solution there (open
>> coding or yet another compat wrapper).
>
> Thanks for saying everything I wanted to say ;-)

Isn't strlcpy() an OpenBSD-initiated effort? So if we're going to update
this at all shouldn't be be aiming for picking an "upstream" here?
E.g. [1]?

But yeah, just getting rid of it in one form or another is probably
better.

1. https://github.com/libressl-portable/openbsd/blob/master/src/lib/libc/string/strlcpy.c

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] strlcpy(): safer and faster version
  2021-12-17  5:22     ` Ævar Arnfjörð Bjarmason
@ 2021-12-17 22:42       ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-12-17 22:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Andriy Makukha via GitGitGadget, git, Andriy Makukha

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Thanks for saying everything I wanted to say ;-)
>
> Isn't strlcpy() an OpenBSD-initiated effort? So if we're going to update

Yes.

> this at all shouldn't be be aiming for picking an "upstream" here?
> E.g. [1]?

If this were an improvement, yes.  But if I am reading the patch
correctly, it changes what the value returned from the function
means.  I do not think that would fly even in the upstream, without
a very good justification.  Adding a new function that has semantics
different from strlcpy() might be a possibility at upstream, but as
far as this project is concerned, if we were to change the use of
strlcpy() in the codebase, we often have tools that are much better
suited in our arsenal, as Peff already mentioned, so...

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-17 22:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-16 17:31 [PATCH] strlcpy(): safer and faster version Andriy Makukha via GitGitGadget
2021-12-16 18:14 ` Jeff King
2021-12-16 22:32   ` Junio C Hamano
2021-12-17  5:22     ` Ævar Arnfjörð Bjarmason
2021-12-17 22:42       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).