* [PATCH 1/3] unpack: replace xwrite() loop with write_in_full()
2024-03-02 19:03 [PATCH 0/3] Auditing use of xwrite() Junio C Hamano
@ 2024-03-02 19:03 ` Junio C Hamano
2024-03-04 6:58 ` Patrick Steinhardt
2024-03-02 19:03 ` [PATCH 2/3] sideband: avoid short write(2) Junio C Hamano
2024-03-02 19:03 ` [PATCH 3/3] repack: check error writing to pack-objects subprocess Junio C Hamano
2 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2024-03-02 19:03 UTC (permalink / raw)
To: git
We have two packfile stream consumers, index-pack and
unpack-objects, that allow excess payload after the packfile stream
data. Their code to relay excess data hasn't changed significantly
since their original implementation that appeared in 67e5a5ec
(git-unpack-objects: re-write to read from stdin, 2005-06-28) and
9bee2478 (mimic unpack-objects when --stdin is used with index-pack,
2006-10-25).
These code blocks contain hand-rolled loops using xwrite(), written
before our write_in_full() helper existed. This helper now provides
the same functionality.
Replace these loops with write_in_full() for shorter, clearer
code. Update related variables accordingly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/index-pack.c | 17 +++--------------
builtin/unpack-objects.c | 8 +-------
2 files changed, 4 insertions(+), 21 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index a3a37bd215..856428fef9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1524,14 +1524,12 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
struct strbuf pack_name = STRBUF_INIT;
struct strbuf index_name = STRBUF_INIT;
struct strbuf rev_index_name = STRBUF_INIT;
- int err;
if (!from_stdin) {
close(input_fd);
} else {
fsync_component_or_die(FSYNC_COMPONENT_PACK, output_fd, curr_pack_name);
- err = close(output_fd);
- if (err)
+ if (close(output_fd))
die_errno(_("error while closing pack file"));
}
@@ -1566,17 +1564,8 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
write_or_die(1, buf.buf, buf.len);
strbuf_release(&buf);
- /*
- * Let's just mimic git-unpack-objects here and write
- * the last part of the input buffer to stdout.
- */
- while (input_len) {
- err = xwrite(1, input_buffer + input_offset, input_len);
- if (err <= 0)
- break;
- input_len -= err;
- input_offset += err;
- }
+ /* Write the last part of the buffer to stdout */
+ write_in_full(1, input_buffer + input_offset, input_len);
}
strbuf_release(&rev_index_name);
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index e0a701f2b3..f1c85a00ae 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -679,13 +679,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
use(the_hash_algo->rawsz);
/* Write the last part of the buffer to stdout */
- while (len) {
- int ret = xwrite(1, buffer + offset, len);
- if (ret <= 0)
- break;
- len -= ret;
- offset += ret;
- }
+ write_in_full(1, buffer + offset, len);
/* All done */
return has_errors;
--
2.44.0-84-gb387623c12
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/3] unpack: replace xwrite() loop with write_in_full()
2024-03-02 19:03 ` [PATCH 1/3] unpack: replace xwrite() loop with write_in_full() Junio C Hamano
@ 2024-03-04 6:58 ` Patrick Steinhardt
2024-03-04 7:29 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Patrick Steinhardt @ 2024-03-04 6:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 3345 bytes --]
On Sat, Mar 02, 2024 at 11:03:46AM -0800, Junio C Hamano wrote:
> We have two packfile stream consumers, index-pack and
> unpack-objects, that allow excess payload after the packfile stream
> data. Their code to relay excess data hasn't changed significantly
> since their original implementation that appeared in 67e5a5ec
> (git-unpack-objects: re-write to read from stdin, 2005-06-28) and
> 9bee2478 (mimic unpack-objects when --stdin is used with index-pack,
> 2006-10-25).
>
> These code blocks contain hand-rolled loops using xwrite(), written
> before our write_in_full() helper existed. This helper now provides
> the same functionality.
>
> Replace these loops with write_in_full() for shorter, clearer
> code. Update related variables accordingly.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> builtin/index-pack.c | 17 +++--------------
> builtin/unpack-objects.c | 8 +-------
> 2 files changed, 4 insertions(+), 21 deletions(-)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index a3a37bd215..856428fef9 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1524,14 +1524,12 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
> struct strbuf pack_name = STRBUF_INIT;
> struct strbuf index_name = STRBUF_INIT;
> struct strbuf rev_index_name = STRBUF_INIT;
> - int err;
>
> if (!from_stdin) {
> close(input_fd);
> } else {
> fsync_component_or_die(FSYNC_COMPONENT_PACK, output_fd, curr_pack_name);
> - err = close(output_fd);
> - if (err)
> + if (close(output_fd))
> die_errno(_("error while closing pack file"));
> }
>
> @@ -1566,17 +1564,8 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
> write_or_die(1, buf.buf, buf.len);
> strbuf_release(&buf);
>
> - /*
> - * Let's just mimic git-unpack-objects here and write
> - * the last part of the input buffer to stdout.
> - */
> - while (input_len) {
> - err = xwrite(1, input_buffer + input_offset, input_len);
> - if (err <= 0)
> - break;
> - input_len -= err;
> - input_offset += err;
> - }
> + /* Write the last part of the buffer to stdout */
> + write_in_full(1, input_buffer + input_offset, input_len);
With this change we stop updating `input_len` and `input_offset`, both
of which are global variables. Assuming that tests pass this must be
okay right now given that this is the final part of what we are writing.
But I wonder whether we shouldn't update those regardless just so that
these remain consistent?
> }
>
> strbuf_release(&rev_index_name);
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index e0a701f2b3..f1c85a00ae 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -679,13 +679,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
> use(the_hash_algo->rawsz);
>
> /* Write the last part of the buffer to stdout */
> - while (len) {
> - int ret = xwrite(1, buffer + offset, len);
> - if (ret <= 0)
> - break;
> - len -= ret;
> - offset += ret;
> - }
> + write_in_full(1, buffer + offset, len);
Same here.
Patrick
> /* All done */
> return has_errors;
> --
> 2.44.0-84-gb387623c12
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/3] unpack: replace xwrite() loop with write_in_full()
2024-03-04 6:58 ` Patrick Steinhardt
@ 2024-03-04 7:29 ` Junio C Hamano
2024-03-04 16:43 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2024-03-04 7:29 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
>> - while (input_len) {
>> - err = xwrite(1, input_buffer + input_offset, input_len);
>> - if (err <= 0)
>> - break;
>> - input_len -= err;
>> - input_offset += err;
>> - }
>> + /* Write the last part of the buffer to stdout */
>> + write_in_full(1, input_buffer + input_offset, input_len);
>
> With this change we stop updating `input_len` and `input_offset`, both
> of which are global variables. Assuming that tests pass this must be
> okay right now given that this is the final part of what we are writing.
> But I wonder whether we shouldn't update those regardless just so that
> these remain consistent?
It is probably a good hygiene, even though it may not matter at all
for the correctness in the current code.
Thanks for your sharp eyes.
>> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
>> index e0a701f2b3..f1c85a00ae 100644
>> --- a/builtin/unpack-objects.c
>> +++ b/builtin/unpack-objects.c
>> @@ -679,13 +679,7 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
>> use(the_hash_algo->rawsz);
>>
>> /* Write the last part of the buffer to stdout */
>> - while (len) {
>> - int ret = xwrite(1, buffer + offset, len);
>> - if (ret <= 0)
>> - break;
>> - len -= ret;
>> - offset += ret;
>> - }
>> + write_in_full(1, buffer + offset, len);
>
> Same here.
>
> Patrick
>
>> /* All done */
>> return has_errors;
>> --
>> 2.44.0-84-gb387623c12
>>
>>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/3] unpack: replace xwrite() loop with write_in_full()
2024-03-04 7:29 ` Junio C Hamano
@ 2024-03-04 16:43 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2024-03-04 16:43 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> Patrick Steinhardt <ps@pks.im> writes:
>
>>> - while (input_len) {
>>> - err = xwrite(1, input_buffer + input_offset, input_len);
>>> - if (err <= 0)
>>> - break;
>>> - input_len -= err;
>>> - input_offset += err;
>>> - }
>>> + /* Write the last part of the buffer to stdout */
>>> + write_in_full(1, input_buffer + input_offset, input_len);
>>
>> With this change we stop updating `input_len` and `input_offset`, both
>> of which are global variables. Assuming that tests pass this must be
>> okay right now given that this is the final part of what we are writing.
>> But I wonder whether we shouldn't update those regardless just so that
>> these remain consistent?
>
> It is probably a good hygiene, even though it may not matter at all
> for the correctness in the current code.
>
> Thanks for your sharp eyes.
Actually, I changed my mind. As you said, this is flushing the very
end of the data in the input_buffer[] and nobody will fill() the
input_buffer[] after the call to this function happens.
>>> - while (len) {
>>> ...
>>> - len -= ret;
>>> - offset += ret;
>>> - }
>>> + write_in_full(1, buffer + offset, len);
>>
>> Same here.
Ditto. We are about to pass the control back to the caller that
will exit using the "has_errors" we return from here.
>>
>> Patrick
>>
>>> /* All done */
>>> return has_errors;
>>> --
>>> 2.44.0-84-gb387623c12
>>>
>>>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/3] sideband: avoid short write(2)
2024-03-02 19:03 [PATCH 0/3] Auditing use of xwrite() Junio C Hamano
2024-03-02 19:03 ` [PATCH 1/3] unpack: replace xwrite() loop with write_in_full() Junio C Hamano
@ 2024-03-02 19:03 ` Junio C Hamano
2024-03-02 19:03 ` [PATCH 3/3] repack: check error writing to pack-objects subprocess Junio C Hamano
2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2024-03-02 19:03 UTC (permalink / raw)
To: git
The sideband demultiplexor writes the data it receives on sideband
with xwrite(). We can lose data if the underlying write(2) results
in a short write.
If they are limited to unimportant bytes like eye-candy progress
meter, it may be OK to lose them, but lets be careful and ensure
that we use write_in_full() instead. Note that the original does
not check for errors, and this rewrite does not check for one. At
least not yet.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
sideband.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sideband.c b/sideband.c
index 266a67342b..5d8907151f 100644
--- a/sideband.c
+++ b/sideband.c
@@ -220,7 +220,7 @@ int demultiplex_sideband(const char *me, int status,
}
strbuf_addch(scratch, *brk);
- xwrite(2, scratch->buf, scratch->len);
+ write_in_full(2, scratch->buf, scratch->len);
strbuf_reset(scratch);
b = brk + 1;
@@ -247,7 +247,7 @@ int demultiplex_sideband(const char *me, int status,
die("%s", scratch->buf);
if (scratch->len) {
strbuf_addch(scratch, '\n');
- xwrite(2, scratch->buf, scratch->len);
+ write_in_full(2, scratch->buf, scratch->len);
}
strbuf_release(scratch);
return 1;
--
2.44.0-84-gb387623c12
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/3] repack: check error writing to pack-objects subprocess
2024-03-02 19:03 [PATCH 0/3] Auditing use of xwrite() Junio C Hamano
2024-03-02 19:03 ` [PATCH 1/3] unpack: replace xwrite() loop with write_in_full() Junio C Hamano
2024-03-02 19:03 ` [PATCH 2/3] sideband: avoid short write(2) Junio C Hamano
@ 2024-03-02 19:03 ` Junio C Hamano
2 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2024-03-02 19:03 UTC (permalink / raw)
To: git
When "git repack" repacks promisor objects, it starts a pack-objects
subprocess and uses xwrite() to send object names over the pipe to
it, but without any error checking. An I/O error or short write
(even though a short write is unlikely for such a small amount of
data) can result in a packfile that lacks certain objects we wanted
to put in there, leading to a silent repository corruption.
Use write_in_full(), instead of xwrite(), to mitigate short write
risks, check errors from it, and abort if we see a failure.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/repack.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/builtin/repack.c b/builtin/repack.c
index ede36328a3..15e4cccc45 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -314,8 +314,9 @@ static int write_oid(const struct object_id *oid,
die(_("could not start pack-objects to repack promisor objects"));
}
- xwrite(cmd->in, oid_to_hex(oid), the_hash_algo->hexsz);
- xwrite(cmd->in, "\n", 1);
+ if (write_in_full(cmd->in, oid_to_hex(oid), the_hash_algo->hexsz) < 0 ||
+ write_in_full(cmd->in, "\n", 1) < 0)
+ die(_("failed to feed promisor objects to pack-objects"));
return 0;
}
--
2.44.0-84-gb387623c12
^ permalink raw reply related [flat|nested] 7+ messages in thread