[PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in

Git development
 help / color / mirror / Atom feed

* [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
@ 2026-04-09 12:47 Patrick Steinhardt
  2026-04-09 15:46 ` Ben Knoble
  2026-04-09 16:42 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Patrick Steinhardt @ 2026-04-09 12:47 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, brian m. carlson, Randall S. Becker,
	Phillip Wood, Randall Becker

Some systems like NonStop set a comparatively small `MAX_IO_SIZE`, which
limits the maximum number of bytes we're allowed to write in a single
call. We already handle this limit properly in `xwrite()`, but we have
recently introduced wrappers for writev(3p) where we don't. This will
cause the syscall to return EINVAL in case somebody passes an iovec
entry to writev(3p) that is larger than `MAX_IO_SIZE`.

Introduce a new function `xwritev()` that is similar to `xwrite()` in
that it handles such platform-specific nuances. The logic is rather
simple: we simply coalesce all iovecs that don't exceed `MAX_IO_SIZE`
and pass those to writev(3p). If the first iovec already exceeds the
limit, we'll instead pass it to `xwrite()`, which handles the limit for
us.

Adapt `writev_in_full()` to use this new wrapper. As this wrapper
already knows to to call writev(3p) in a loop already it doesn't need
any further adjustment.

Reported-by: Randall Becker <randall.becker@nexbridge.ca>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi,

this fixes the issue reported by Randall in [1].

I mostly wanted to get this patch out there so that we can discuss a
proposed fix, but as said in the thread I'm also happy to revise course
and instead set NO_WRITEV on NonStop for now. I think we'll want to
eventually land a fix like the one proposed here though, and at that
point the workaround would not be required anymore.

Thanks!

Patrick

[1]: <00f401dcc6e6$7183c0f0$548b42d0$@nexbridge.com>
---
 wrapper.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
 wrapper.h |  1 +
 2 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index be8fa575e6..d989c78b4b 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -323,21 +323,60 @@ ssize_t write_in_full(int fd, const void *buf, size_t count)
 	return total;
 }
 
+ssize_t xwritev(int fd, struct iovec *iov, int iovcnt)
+{
+	ssize_t bytes_written;
+	size_t total_length;
+	int i;
+
+	/*
+	 * We need to make sure that writev(3p) call does not write more than
+	 * `MAX_IO_SIZE` many bytes. If we do exceed that limit, we only pass
+	 * those iovecs to writev(3p) that sum up to less than the limit.
+	 *
+	 * If on the other hand the first iovec entry already exceeds this
+	 * limit we'll instead use xwrite() to write it, which knows to handle
+	 * `MAX_IO_SIZE` for us.
+	 */
+	for (i = 0, total_length = 0; i < iovcnt; i++) {
+		if (unsigned_add_overflows(total_length, iov[i].iov_len))
+			break;
+
+		total_length += iov[i].iov_len;
+		if (total_length > MAX_IO_SIZE)
+			break;
+	}
+
+	if (i < iovcnt) {
+		/*
+		 * The first entry exceeds MAX_IO_SIZE, so we pass it to
+		 * xwrite, which knows to handle this case.
+		 */
+		if (!i)
+			return xwrite(fd, iov->iov_base, iov->iov_len);
+		iovcnt = i;
+	}
+
+	bytes_written = writev(fd, iov, iovcnt);
+	if (!bytes_written) {
+		errno = ENOSPC;
+		return -1;
+	}
+
+	return bytes_written;
+}
+
 ssize_t writev_in_full(int fd, struct iovec *iov, int iovcnt)
 {
 	ssize_t total_written = 0;
 
 	while (iovcnt) {
-		ssize_t bytes_written = writev(fd, iov, iovcnt);
-		if (bytes_written < 0) {
+		ssize_t bytes_written = xwritev(fd, iov, iovcnt);
+		if (bytes_written <= 0) {
 			if (errno == EINTR || errno == EAGAIN)
 				continue;
 			return -1;
 		}
-		if (!bytes_written) {
-			errno = ENOSPC;
-			return -1;
-		}
 
 		total_written += bytes_written;
 
diff --git a/wrapper.h b/wrapper.h
index 27519b32d1..a6287d7f4d 100644
--- a/wrapper.h
+++ b/wrapper.h
@@ -16,6 +16,7 @@ void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_
 int xopen(const char *path, int flags, ...);
 ssize_t xread(int fd, void *buf, size_t len);
 ssize_t xwrite(int fd, const void *buf, size_t len);
+ssize_t xwritev(int fd, struct iovec *iov, int iovcnt);
 ssize_t xpread(int fd, void *buf, size_t len, off_t offset);
 int xdup(int fd);
 FILE *xfopen(const char *path, const char *mode);

---
base-commit: b15384c06f77bc2d34d0d3623a8a58218313a561
change-id: 20260409-b4-pks-writev-max-io-size-e9b803439ae8


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
  2026-04-09 12:47 [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` Patrick Steinhardt
@ 2026-04-09 15:46 ` Ben Knoble
  2026-04-09 16:42 ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Ben Knoble @ 2026-04-09 15:46 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Junio C Hamano, Jeff King, brian m. carlson,
	Randall S. Becker, Phillip Wood, Randall Becker

Admitting I am out of my depth here…

> Le 9 avr. 2026 à 08:52, Patrick Steinhardt <ps@pks.im> a écrit :
> 
> Some systems like NonStop set a comparatively small `MAX_IO_SIZE`, which
> limits the maximum number of bytes we're allowed to write in a single
> call. We already handle this limit properly in `xwrite()`, but we have
> recently introduced wrappers for writev(3p) where we don't. This will
> cause the syscall to return EINVAL in case somebody passes an iovec
> entry to writev(3p) that is larger than `MAX_IO_SIZE`.
> 
> Introduce a new function `xwritev()` that is similar to `xwrite()` in
> that it handles such platform-specific nuances. The logic is rather
> simple: we simply coalesce all iovecs that don't exceed `MAX_IO_SIZE`
> and pass those to writev(3p). If the first iovec already exceeds the
> limit, we'll instead pass it to `xwrite()`, which handles the limit for
> us.
> 
> Adapt `writev_in_full()` to use this new wrapper. As this wrapper
> already knows to to call writev(3p) in a loop already it doesn't need
> any further adjustment.
> 
> Reported-by: Randall Becker <randall.becker@nexbridge.ca>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> Hi,
> 
> this fixes the issue reported by Randall in [1].
> 
> I mostly wanted to get this patch out there so that we can discuss a
> proposed fix, but as said in the thread I'm also happy to revise course
> and instead set NO_WRITEV on NonStop for now. I think we'll want to
> eventually land a fix like the one proposed here though, and at that
> point the workaround would not be required anymore.
> 
> Thanks!
> 
> Patrick
> 
> [1]: <00f401dcc6e6$7183c0f0$548b42d0$@nexbridge.com>
> ---
> wrapper.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------
> wrapper.h |  1 +
> 2 files changed, 46 insertions(+), 6 deletions(-)
> 
> diff --git a/wrapper.c b/wrapper.c
> index be8fa575e6..d989c78b4b 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -323,21 +323,60 @@ ssize_t write_in_full(int fd, const void *buf, size_t count)
>    return total;
> }
> 
> +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt)
> +{
> +    ssize_t bytes_written;
> +    size_t total_length;
> +    int i;
> +
> +    /*
> +     * We need to make sure that writev(3p) call does not write more than
> +     * `MAX_IO_SIZE` many bytes. If we do exceed that limit, we only pass
> +     * those iovecs to writev(3p) that sum up to less than the limit.
> +     *
> +     * If on the other hand the first iovec entry already exceeds this
> +     * limit we'll instead use xwrite() to write it, which knows to handle
> +     * `MAX_IO_SIZE` for us.
> +     */
> +    for (i = 0, total_length = 0; i < iovcnt; i++) {
> +        if (unsigned_add_overflows(total_length, iov[i].iov_len))
> +            break;
> +
> +        total_length += iov[i].iov_len;
> +        if (total_length > MAX_IO_SIZE)
> +            break;
> +    }
> +
> +    if (i < iovcnt) {
> +        /*
> +         * The first entry exceeds MAX_IO_SIZE, so we pass it to
> +         * xwrite, which knows to handle this case.
> +         */
> +        if (!i)
> +            return xwrite(fd, iov->iov_base, iov->iov_len);

It took me starting to write this email wondering “but i could be >= 1?” to realize that this comment applies to the !i case below. Darn.

Still, I find the declaration (“the first entry exceeds”) before the check a bit confusing. Is that typical of our style (in which case leave it be)?

> +        iovcnt = i;
> +    }
> +
> +    bytes_written = writev(fd, iov, iovcnt);
> +    if (!bytes_written) {
> +        errno = ENOSPC;
> +        return -1;
> +    }
> +
> +    return bytes_written;
> +}
> +
> ssize_t writev_in_full(int fd, struct iovec *iov, int iovcnt)
> {
>    ssize_t total_written = 0;
> 
>    while (iovcnt) {
> -        ssize_t bytes_written = writev(fd, iov, iovcnt);
> -        if (bytes_written < 0) {
> +        ssize_t bytes_written = xwritev(fd, iov, iovcnt);
> +        if (bytes_written <= 0) {
>            if (errno == EINTR || errno == EAGAIN)
>                continue;
>            return -1;
>        }
> -        if (!bytes_written) {
> -            errno = ENOSPC;
> -            return -1;
> -        }
> 
>        total_written += bytes_written;
> 
> diff --git a/wrapper.h b/wrapper.h
> index 27519b32d1..a6287d7f4d 100644
> --- a/wrapper.h
> +++ b/wrapper.h
> @@ -16,6 +16,7 @@ void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_
> int xopen(const char *path, int flags, ...);
> ssize_t xread(int fd, void *buf, size_t len);
> ssize_t xwrite(int fd, const void *buf, size_t len);
> +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt);
> ssize_t xpread(int fd, void *buf, size_t len, off_t offset);
> int xdup(int fd);
> FILE *xfopen(const char *path, const char *mode);
> 
> ---
> base-commit: b15384c06f77bc2d34d0d3623a8a58218313a561
> change-id: 20260409-b4-pks-writev-max-io-size-e9b803439ae8
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
  2026-04-09 12:47 [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` Patrick Steinhardt
  2026-04-09 15:46 ` Ben Knoble
@ 2026-04-09 16:42 ` Junio C Hamano
  2026-04-09 20:23   ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2026-04-09 16:42 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Jeff King, brian m. carlson, Randall S. Becker, Phillip Wood

Patrick Steinhardt <ps@pks.im> writes:

> Some systems like NonStop set a comparatively small `MAX_IO_SIZE`, which
> limits the maximum number of bytes we're allowed to write in a single
> call. We already handle this limit properly in `xwrite()`, but we have
> recently introduced wrappers for writev(3p) where we don't. This will
> cause the syscall to return EINVAL in case somebody passes an iovec
> entry to writev(3p) that is larger than `MAX_IO_SIZE`.
>
> Introduce a new function `xwritev()` that is similar to `xwrite()` in
> that it handles such platform-specific nuances. The logic is rather
> simple: we simply coalesce all iovecs that don't exceed `MAX_IO_SIZE`
> and pass those to writev(3p). If the first iovec already exceeds the
> limit, we'll instead pass it to `xwrite()`, which handles the limit for
> us.

OK, so the idea is just like xwrite(), whose original purpose was to
hide the details of having to restart write() system call, pretends
a short write on IO limited platforms, xwritev() pretends that the
underlying writev() gave a short write, instead of a failure with
EINVAL when ssize_t is unusually small.  That mirrors an established
pattern that is proven to work with write_in_full() code path, which
is a good thing.

By the way, I see that EINTR and EWOULDBLOCK are handled somewhat
differently from how xwrite() handles it.  As the original change
that introduced use of writev() were to replace write_in_full() that
eventually called into xwrite(), shouldn't we be closer to what
xwrite() used to do?

> this fixes the issue reported by Randall in [1].
>
> I mostly wanted to get this patch out there so that we can discuss a
> proposed fix, but as said in the thread I'm also happy to revise course
> and instead set NO_WRITEV on NonStop for now. I think we'll want to
> eventually land a fix like the one proposed here though, and at that
> point the workaround would not be required anymore.

It is too late to make this kind of "let's wrap writev()" effort
before the final, and Git 2.54 should ship without any such change.
If your platform does not have a writev() that works with write size
up to half of maximum of size_t (or at least 64kB), use NO_WRITEV to
build your Git.

But let's discuss to prepare for an update post release.

It would be nice to see minority platforms including NonStop test
and notice problems that appear only on their system much earlier in
the cycle next time.  A report at -rc0, while better than not seeing
any, is a bit too late.

> diff --git a/wrapper.c b/wrapper.c
> index be8fa575e6..d989c78b4b 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -323,21 +323,60 @@ ssize_t write_in_full(int fd, const void *buf, size_t count)
>  	return total;
>  }
>  
> +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt)
> +{
> +	ssize_t bytes_written;
> +	size_t total_length;
> +	int i;
> +
> +	/*
> +	 * We need to make sure that writev(3p) call does not write more than
> +	 * `MAX_IO_SIZE` many bytes. If we do exceed that limit, we only pass
> +	 * those iovecs to writev(3p) that sum up to less than the limit.
> +	 *
> +	 * If on the other hand the first iovec entry already exceeds this
> +	 * limit we'll instead use xwrite() to write it, which knows to handle
> +	 * `MAX_IO_SIZE` for us.
> +	 */
> +	for (i = 0, total_length = 0; i < iovcnt; i++) {
> +		if (unsigned_add_overflows(total_length, iov[i].iov_len))
> +			break;

We add .iov_len up in this first loop, because we do not want to
bust writev(3p)'s limit.

	EINVAL The sum of the iov_len values in the iov array would
              overflow an ssize_t.

cf. https://pubs.opengroup.org/onlinepubs/9799919799/functions/writev.html

As the width of ssize_t in bits can be a lot smaller than size_t,
the above "unsigned_add_overflows() triggers way too late for the
check to matter, no?

> +		total_length += iov[i].iov_len;

Then we have total_length computed.

> +		if (total_length > MAX_IO_SIZE)
> +			break;

And it is capped to MAX_IO_SIZE, which is set way lower than
SSIZE_MAX even on mainstream platforms (8MB, IIRC).  This does not
matter only because we are currently using writev() only for
sideband communication and its payload is limited to 64kB, but if a
caller gave us a list of buffers whose total size ranges in a few
hundred megabytes (e.g., packfiles to clone a small project like
git.git itself), on a not-so-I/O-limited platform, do we still want
to chomp the original request into multiple writev(3p) system calls?

I personally think it is an OK thing to do, simply because we also
chomp a large xwrite() request into chunks and make multiple
write(2) system calls, but we should explain the reason behind "We
need to make sure" in the beginning of the above comment well--the
current text has no explanation on the reason.

> +	}
> +
> +	if (i < iovcnt) {
> +		/*
> +		 * The first entry exceeds MAX_IO_SIZE, so we pass it to
> +		 * xwrite, which knows to handle this case.
> +		 */
> +		if (!i)
> +			return xwrite(fd, iov->iov_base, iov->iov_len);

Ben has a similar comment, but it would be easier to see the
correspondence if you rephrase the comment perhaps like

		/*
                 * If the first buffer is larger than MAX_IO_SIZE,
                 * let xwrite() deal with it.
                 */

xwrite() can return a short write, but the caller is counting how
many bytes among what it passed to xwritev() are consumed by each
call to this function, so the next call we may be seeing may have
iov->iov_base pointing into the buffer we threw at xwrite() with
reduced iov->iov_len, and that is expected and everything will even
out.  Quite nice.

> +		iovcnt = i;

And if our iov[0].iov_len is smaller than MAX_IO_SIZE, then i would
be at least 1 here and shows the index in iov[] array that busts the
limit.  IOW, we know iov[0]..iov[i-1] (inclusive) can be written
without busting MAX_IO_SIZE in one go.  So se reduce iovcnt down to
that number here, in preparation for the next call.

> +	}

By the way, if the total of iov[] is reaonably small, then the
initial loop runs to the end of it, the above if() statement will be
skipped, and iovcnt is not reduced.

Either way, we now pass the initial part (which may be the entirety)
of iov[] up to iovcnt to writev().

> +	bytes_written = writev(fd, iov, iovcnt);
> +	if (!bytes_written) {
> +		errno = ENOSPC;
> +		return -1;
> +	}
> +
> +	return bytes_written;
> +}

OK.  So except for "is unsigned_add_overflows() doing what we want?"
question, I think the above is reasonable.

If we used "let's make sure sum of iov[].iov_len does not overflow
an ssize_t" at the beginning of the loop, it does change the
contract between the callers and this function, as they can no
longer get EINVAL due to such overflow (instead this function will
chomp the request into smaller pieces, just like MAX_IO_SIZE is used
here).  And I think that is a reasonable semantics.

>  ssize_t writev_in_full(int fd, struct iovec *iov, int iovcnt)
>  {
>  	ssize_t total_written = 0;
>  
>  	while (iovcnt) {
> -		ssize_t bytes_written = writev(fd, iov, iovcnt);
> -		if (bytes_written < 0) {
> +		ssize_t bytes_written = xwritev(fd, iov, iovcnt);
> +		if (bytes_written <= 0) {
>  			if (errno == EINTR || errno == EAGAIN)
>  				continue;

I am not sure if this is how we want to handle EINTR, given
especially that xwritev() may have called xwrite() under the hood in
some case but not in others.  If we are doing anything to these
signals, I think it should be done where we call underlying writev(),
and we should be close to whatever is done in xwrite() where it
calls write().

>  			return -1;
>  		}
> -		if (!bytes_written) {
> -			errno = ENOSPC;
> -			return -1;
> -		}

On the other hand, I think moving this into xwritev() is a mistake.
We shoudl try to be as close to what these functions are replacing
(i.e. write_in_full and xwrite combo) in the code paths that are
rewritten to use them.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
  2026-04-09 16:42 ` Junio C Hamano
@ 2026-04-09 20:23   ` Jeff King
  2026-04-09 20:40     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2026-04-09 20:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker,
	Phillip Wood

On Thu, Apr 09, 2026 at 09:42:42AM -0700, Junio C Hamano wrote:

> > +	for (i = 0, total_length = 0; i < iovcnt; i++) {
> > +		if (unsigned_add_overflows(total_length, iov[i].iov_len))
> > +			break;
> 
> We add .iov_len up in this first loop, because we do not want to
> bust writev(3p)'s limit.
> 
> 	EINVAL The sum of the iov_len values in the iov array would
>               overflow an ssize_t.
> 
> cf. https://pubs.opengroup.org/onlinepubs/9799919799/functions/writev.html
> 
> As the width of ssize_t in bits can be a lot smaller than size_t,
> the above "unsigned_add_overflows() triggers way too late for the
> check to matter, no?

I think it is correct as-is.

The real check against ssize_t is later, when we compare total_length to
MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure
we do not overflow size_t when counting up the total (and if we do, we
_know_ we are going to overflow ssize_t, which must be smaller).

I think this can be made more clear by counting down allowable bytes
instead of up. I'll show an example in a second.

> > +	}
> > +
> > +	if (i < iovcnt) {
> > +		/*
> > +		 * The first entry exceeds MAX_IO_SIZE, so we pass it to
> > +		 * xwrite, which knows to handle this case.
> > +		 */
> > +		if (!i)
> > +			return xwrite(fd, iov->iov_base, iov->iov_len);
> 
> Ben has a similar comment, but it would be easier to see the
> correspondence if you rephrase the comment perhaps like
> [...]

Me three. I think this can be made more clear if we bail to xwrite()
immediately in the loop. So together with the count-down, something
like:

   ssize_t allowed = MAX_IO_SIZE;
   int i;

   for (i = 0; i < iovcnt; i++) {
	if (iov[i].iov_len > allowed) {
		if (!i)
			return xwrite(fd, iov->iov_base, iov_len);
		break;
	}
	allowed -= iov[i].iov_len;
  }
  return writev(fd, iov, i);

You can also directly return writev() instead of breaking out of the
loop. That makes it even more clear that we are doing a partial write,
but means duplicating the "return writev()" line.

And I think the whole thing would still deserve comments, but I omitted
them here since the point was to show the rearranged structure.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
  2026-04-09 20:23   ` Jeff King
@ 2026-04-09 20:40     ` Junio C Hamano
  2026-04-09 20:59       ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2026-04-09 20:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker,
	Phillip Wood

Jeff King <peff@peff.net> writes:

> On Thu, Apr 09, 2026 at 09:42:42AM -0700, Junio C Hamano wrote:
>
>> > +	for (i = 0, total_length = 0; i < iovcnt; i++) {
>> > +		if (unsigned_add_overflows(total_length, iov[i].iov_len))
>> > +			break;
>> 
>> We add .iov_len up in this first loop, because we do not want to
>> bust writev(3p)'s limit.
>> 
>> 	EINVAL The sum of the iov_len values in the iov array would
>>               overflow an ssize_t.
>> 
>> cf. https://pubs.opengroup.org/onlinepubs/9799919799/functions/writev.html
>> 
>> As the width of ssize_t in bits can be a lot smaller than size_t,
>> the above "unsigned_add_overflows() triggers way too late for the
>> check to matter, no?
>
> I think it is correct as-is.
>
> The real check against ssize_t is later, when we compare total_length to
> MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure
> we do not overflow size_t when counting up the total (and if we do, we
> _know_ we are going to overflow ssize_t, which must be smaller).

But then what happens after it breaks out of the loop?  We cannot be
at i==0, so let's say we have a reasonably small iov[0] and iov[1]
that is so large and makes size_t wraparound.  We break out here,
and then send the iov[0] with writev().  But have we checked if
iov[0] is under MAX_IO_SIZE in that case before calling writev()?

> I think this can be made more clear by counting down allowable bytes
> instead of up. I'll show an example in a second.
>
>> > +	}
>> > +
>> > +	if (i < iovcnt) {
>> > +		/*
>> > +		 * The first entry exceeds MAX_IO_SIZE, so we pass it to
>> > +		 * xwrite, which knows to handle this case.
>> > +		 */
>> > +		if (!i)
>> > +			return xwrite(fd, iov->iov_base, iov->iov_len);
>> 
>> Ben has a similar comment, but it would be easier to see the
>> correspondence if you rephrase the comment perhaps like
>> [...]
>
> Me three. I think this can be made more clear if we bail to xwrite()
> immediately in the loop. So together with the count-down, something
> like:
>
>    ssize_t allowed = MAX_IO_SIZE;
>    int i;
>
>    for (i = 0; i < iovcnt; i++) {
> 	if (iov[i].iov_len > allowed) {
> 		if (!i)
> 			return xwrite(fd, iov->iov_base, iov_len);
> 		break;
> 	}
> 	allowed -= iov[i].iov_len;
>   }
>   return writev(fd, iov, i);

I agree that this is much simpler to follow.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
  2026-04-09 20:40     ` Junio C Hamano
@ 2026-04-09 20:59       ` Jeff King
  2026-04-09 21:09         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2026-04-09 20:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker,
	Phillip Wood

On Thu, Apr 09, 2026 at 01:40:36PM -0700, Junio C Hamano wrote:

> >> As the width of ssize_t in bits can be a lot smaller than size_t,
> >> the above "unsigned_add_overflows() triggers way too late for the
> >> check to matter, no?
> >
> > I think it is correct as-is.
> >
> > The real check against ssize_t is later, when we compare total_length to
> > MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure
> > we do not overflow size_t when counting up the total (and if we do, we
> > _know_ we are going to overflow ssize_t, which must be smaller).
> 
> But then what happens after it breaks out of the loop?  We cannot be
> at i==0, so let's say we have a reasonably small iov[0] and iov[1]
> that is so large and makes size_t wraparound.  We break out here,
> and then send the iov[0] with writev().  But have we checked if
> iov[0] is under MAX_IO_SIZE in that case before calling writev()?

I think so. Either:

  - We completed the first iteration of the loop successfully (and i >=
    1), in which case we added iov[0].iov_len to total_length, and then
    compared total_length against MAX_IO_SIZE, but did not break out of
    the loop. So we know iov[0] is within the limits.

  - We bailed at i==0 either because of addition overflow, or because of
    the MAX_IO_SIZE check. Either way, we will bail to xwrite() because
    i is 0.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
  2026-04-09 20:59       ` Jeff King
@ 2026-04-09 21:09         ` Junio C Hamano
  2026-04-10  5:19           ` Patrick Steinhardt
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2026-04-09 21:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker,
	Phillip Wood

Jeff King <peff@peff.net> writes:

> On Thu, Apr 09, 2026 at 01:40:36PM -0700, Junio C Hamano wrote:
>
>> >> As the width of ssize_t in bits can be a lot smaller than size_t,
>> >> the above "unsigned_add_overflows() triggers way too late for the
>> >> check to matter, no?
>> >
>> > I think it is correct as-is.
>> >
>> > The real check against ssize_t is later, when we compare total_length to
>> > MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure
>> > we do not overflow size_t when counting up the total (and if we do, we
>> > _know_ we are going to overflow ssize_t, which must be smaller).
>> 
>> But then what happens after it breaks out of the loop?  We cannot be
>> at i==0, so let's say we have a reasonably small iov[0] and iov[1]
>> that is so large and makes size_t wraparound.  We break out here,
>> and then send the iov[0] with writev().  But have we checked if
>> iov[0] is under MAX_IO_SIZE in that case before calling writev()?
>
> I think so. Either:
>
>   - We completed the first iteration of the loop successfully (and i >=
>     1), in which case we added iov[0].iov_len to total_length, and then
>     compared total_length against MAX_IO_SIZE, but did not break out of
>     the loop. So we know iov[0] is within the limits.
>
>   - We bailed at i==0 either because of addition overflow, or because of
>     the MAX_IO_SIZE check. Either way, we will bail to xwrite() because
>     i is 0.

Yup, you're right.

There is no addition overflow at i==0, but I do not think we can
construct a case where the sum is not checked against MAX_IO_SIZE
before the vector is passed to underlying writev().

iov[0].iov_len that is slightly smaller than MAX_IO_SIZE would allow
us to keep looping to i==1 at which time iov[1].iov_len is so big
that we may trigger unsigned_add_overflows() check, but then what we
send to writev() is the first segment, which is smaller than
MAX_IO_SIZE, so we are OK.

iov[0].iov_len that is slightly larger than MAX_IO_SIZE would stop
us moving to i==1 at the end of the loop, and directly punt to
xwrite(), so we are OK, too.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()`
  2026-04-09 21:09         ` Junio C Hamano
@ 2026-04-10  5:19           ` Patrick Steinhardt
  0 siblings, 0 replies; 8+ messages in thread
From: Patrick Steinhardt @ 2026-04-10  5:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, git, brian m. carlson, Randall S. Becker, Phillip Wood

On Thu, Apr 09, 2026 at 02:09:42PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > On Thu, Apr 09, 2026 at 01:40:36PM -0700, Junio C Hamano wrote:
> >
> >> >> As the width of ssize_t in bits can be a lot smaller than size_t,
> >> >> the above "unsigned_add_overflows() triggers way too late for the
> >> >> check to matter, no?
> >> >
> >> > I think it is correct as-is.
> >> >
> >> > The real check against ssize_t is later, when we compare total_length to
> >> > MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure
> >> > we do not overflow size_t when counting up the total (and if we do, we
> >> > _know_ we are going to overflow ssize_t, which must be smaller).
> >> 
> >> But then what happens after it breaks out of the loop?  We cannot be
> >> at i==0, so let's say we have a reasonably small iov[0] and iov[1]
> >> that is so large and makes size_t wraparound.  We break out here,
> >> and then send the iov[0] with writev().  But have we checked if
> >> iov[0] is under MAX_IO_SIZE in that case before calling writev()?
> >
> > I think so. Either:
> >
> >   - We completed the first iteration of the loop successfully (and i >=
> >     1), in which case we added iov[0].iov_len to total_length, and then
> >     compared total_length against MAX_IO_SIZE, but did not break out of
> >     the loop. So we know iov[0] is within the limits.
> >
> >   - We bailed at i==0 either because of addition overflow, or because of
> >     the MAX_IO_SIZE check. Either way, we will bail to xwrite() because
> >     i is 0.
> 
> Yup, you're right.
> 
> There is no addition overflow at i==0, but I do not think we can
> construct a case where the sum is not checked against MAX_IO_SIZE
> before the vector is passed to underlying writev().
> 
> iov[0].iov_len that is slightly smaller than MAX_IO_SIZE would allow
> us to keep looping to i==1 at which time iov[1].iov_len is so big
> that we may trigger unsigned_add_overflows() check, but then what we
> send to writev() is the first segment, which is smaller than
> MAX_IO_SIZE, so we are OK.
> 
> iov[0].iov_len that is slightly larger than MAX_IO_SIZE would stop
> us moving to i==1 at the end of the loop, and directly punt to
> xwrite(), so we are OK, too.

Let's drop this patch for now. I'll pick it up again in the next release
cycle when reintroducing writev(3p). Thanks, all!

Patrick

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-04-10  5:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-09 12:47 [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` Patrick Steinhardt
2026-04-09 15:46 ` Ben Knoble
2026-04-09 16:42 ` Junio C Hamano
2026-04-09 20:23   ` Jeff King
2026-04-09 20:40     ` Junio C Hamano
2026-04-09 20:59       ` Jeff King
2026-04-09 21:09         ` Junio C Hamano
2026-04-10  5:19           ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox