* [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` @ 2026-04-09 12:47 Patrick Steinhardt 2026-04-09 15:46 ` Ben Knoble 2026-04-09 16:42 ` Junio C Hamano 0 siblings, 2 replies; 8+ messages in thread From: Patrick Steinhardt @ 2026-04-09 12:47 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, brian m. carlson, Randall S. Becker, Phillip Wood, Randall Becker Some systems like NonStop set a comparatively small `MAX_IO_SIZE`, which limits the maximum number of bytes we're allowed to write in a single call. We already handle this limit properly in `xwrite()`, but we have recently introduced wrappers for writev(3p) where we don't. This will cause the syscall to return EINVAL in case somebody passes an iovec entry to writev(3p) that is larger than `MAX_IO_SIZE`. Introduce a new function `xwritev()` that is similar to `xwrite()` in that it handles such platform-specific nuances. The logic is rather simple: we simply coalesce all iovecs that don't exceed `MAX_IO_SIZE` and pass those to writev(3p). If the first iovec already exceeds the limit, we'll instead pass it to `xwrite()`, which handles the limit for us. Adapt `writev_in_full()` to use this new wrapper. As this wrapper already knows to to call writev(3p) in a loop already it doesn't need any further adjustment. Reported-by: Randall Becker <randall.becker@nexbridge.ca> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Hi, this fixes the issue reported by Randall in [1]. I mostly wanted to get this patch out there so that we can discuss a proposed fix, but as said in the thread I'm also happy to revise course and instead set NO_WRITEV on NonStop for now. I think we'll want to eventually land a fix like the one proposed here though, and at that point the workaround would not be required anymore. Thanks! Patrick [1]: <00f401dcc6e6$7183c0f0$548b42d0$@nexbridge.com> --- wrapper.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------ wrapper.h | 1 + 2 files changed, 46 insertions(+), 6 deletions(-) diff --git a/wrapper.c b/wrapper.c index be8fa575e6..d989c78b4b 100644 --- a/wrapper.c +++ b/wrapper.c @@ -323,21 +323,60 @@ ssize_t write_in_full(int fd, const void *buf, size_t count) return total; } +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt) +{ + ssize_t bytes_written; + size_t total_length; + int i; + + /* + * We need to make sure that writev(3p) call does not write more than + * `MAX_IO_SIZE` many bytes. If we do exceed that limit, we only pass + * those iovecs to writev(3p) that sum up to less than the limit. + * + * If on the other hand the first iovec entry already exceeds this + * limit we'll instead use xwrite() to write it, which knows to handle + * `MAX_IO_SIZE` for us. + */ + for (i = 0, total_length = 0; i < iovcnt; i++) { + if (unsigned_add_overflows(total_length, iov[i].iov_len)) + break; + + total_length += iov[i].iov_len; + if (total_length > MAX_IO_SIZE) + break; + } + + if (i < iovcnt) { + /* + * The first entry exceeds MAX_IO_SIZE, so we pass it to + * xwrite, which knows to handle this case. + */ + if (!i) + return xwrite(fd, iov->iov_base, iov->iov_len); + iovcnt = i; + } + + bytes_written = writev(fd, iov, iovcnt); + if (!bytes_written) { + errno = ENOSPC; + return -1; + } + + return bytes_written; +} + ssize_t writev_in_full(int fd, struct iovec *iov, int iovcnt) { ssize_t total_written = 0; while (iovcnt) { - ssize_t bytes_written = writev(fd, iov, iovcnt); - if (bytes_written < 0) { + ssize_t bytes_written = xwritev(fd, iov, iovcnt); + if (bytes_written <= 0) { if (errno == EINTR || errno == EAGAIN) continue; return -1; } - if (!bytes_written) { - errno = ENOSPC; - return -1; - } total_written += bytes_written; diff --git a/wrapper.h b/wrapper.h index 27519b32d1..a6287d7f4d 100644 --- a/wrapper.h +++ b/wrapper.h @@ -16,6 +16,7 @@ void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_ int xopen(const char *path, int flags, ...); ssize_t xread(int fd, void *buf, size_t len); ssize_t xwrite(int fd, const void *buf, size_t len); +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt); ssize_t xpread(int fd, void *buf, size_t len, off_t offset); int xdup(int fd); FILE *xfopen(const char *path, const char *mode); --- base-commit: b15384c06f77bc2d34d0d3623a8a58218313a561 change-id: 20260409-b4-pks-writev-max-io-size-e9b803439ae8 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` 2026-04-09 12:47 [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` Patrick Steinhardt @ 2026-04-09 15:46 ` Ben Knoble 2026-04-09 16:42 ` Junio C Hamano 1 sibling, 0 replies; 8+ messages in thread From: Ben Knoble @ 2026-04-09 15:46 UTC (permalink / raw) To: Patrick Steinhardt Cc: git, Junio C Hamano, Jeff King, brian m. carlson, Randall S. Becker, Phillip Wood, Randall Becker Admitting I am out of my depth here… > Le 9 avr. 2026 à 08:52, Patrick Steinhardt <ps@pks.im> a écrit : > > Some systems like NonStop set a comparatively small `MAX_IO_SIZE`, which > limits the maximum number of bytes we're allowed to write in a single > call. We already handle this limit properly in `xwrite()`, but we have > recently introduced wrappers for writev(3p) where we don't. This will > cause the syscall to return EINVAL in case somebody passes an iovec > entry to writev(3p) that is larger than `MAX_IO_SIZE`. > > Introduce a new function `xwritev()` that is similar to `xwrite()` in > that it handles such platform-specific nuances. The logic is rather > simple: we simply coalesce all iovecs that don't exceed `MAX_IO_SIZE` > and pass those to writev(3p). If the first iovec already exceeds the > limit, we'll instead pass it to `xwrite()`, which handles the limit for > us. > > Adapt `writev_in_full()` to use this new wrapper. As this wrapper > already knows to to call writev(3p) in a loop already it doesn't need > any further adjustment. > > Reported-by: Randall Becker <randall.becker@nexbridge.ca> > Helped-by: Jeff King <peff@peff.net> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > Hi, > > this fixes the issue reported by Randall in [1]. > > I mostly wanted to get this patch out there so that we can discuss a > proposed fix, but as said in the thread I'm also happy to revise course > and instead set NO_WRITEV on NonStop for now. I think we'll want to > eventually land a fix like the one proposed here though, and at that > point the workaround would not be required anymore. > > Thanks! > > Patrick > > [1]: <00f401dcc6e6$7183c0f0$548b42d0$@nexbridge.com> > --- > wrapper.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------ > wrapper.h | 1 + > 2 files changed, 46 insertions(+), 6 deletions(-) > > diff --git a/wrapper.c b/wrapper.c > index be8fa575e6..d989c78b4b 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -323,21 +323,60 @@ ssize_t write_in_full(int fd, const void *buf, size_t count) > return total; > } > > +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt) > +{ > + ssize_t bytes_written; > + size_t total_length; > + int i; > + > + /* > + * We need to make sure that writev(3p) call does not write more than > + * `MAX_IO_SIZE` many bytes. If we do exceed that limit, we only pass > + * those iovecs to writev(3p) that sum up to less than the limit. > + * > + * If on the other hand the first iovec entry already exceeds this > + * limit we'll instead use xwrite() to write it, which knows to handle > + * `MAX_IO_SIZE` for us. > + */ > + for (i = 0, total_length = 0; i < iovcnt; i++) { > + if (unsigned_add_overflows(total_length, iov[i].iov_len)) > + break; > + > + total_length += iov[i].iov_len; > + if (total_length > MAX_IO_SIZE) > + break; > + } > + > + if (i < iovcnt) { > + /* > + * The first entry exceeds MAX_IO_SIZE, so we pass it to > + * xwrite, which knows to handle this case. > + */ > + if (!i) > + return xwrite(fd, iov->iov_base, iov->iov_len); It took me starting to write this email wondering “but i could be >= 1?” to realize that this comment applies to the !i case below. Darn. Still, I find the declaration (“the first entry exceeds”) before the check a bit confusing. Is that typical of our style (in which case leave it be)? > + iovcnt = i; > + } > + > + bytes_written = writev(fd, iov, iovcnt); > + if (!bytes_written) { > + errno = ENOSPC; > + return -1; > + } > + > + return bytes_written; > +} > + > ssize_t writev_in_full(int fd, struct iovec *iov, int iovcnt) > { > ssize_t total_written = 0; > > while (iovcnt) { > - ssize_t bytes_written = writev(fd, iov, iovcnt); > - if (bytes_written < 0) { > + ssize_t bytes_written = xwritev(fd, iov, iovcnt); > + if (bytes_written <= 0) { > if (errno == EINTR || errno == EAGAIN) > continue; > return -1; > } > - if (!bytes_written) { > - errno = ENOSPC; > - return -1; > - } > > total_written += bytes_written; > > diff --git a/wrapper.h b/wrapper.h > index 27519b32d1..a6287d7f4d 100644 > --- a/wrapper.h > +++ b/wrapper.h > @@ -16,6 +16,7 @@ void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_ > int xopen(const char *path, int flags, ...); > ssize_t xread(int fd, void *buf, size_t len); > ssize_t xwrite(int fd, const void *buf, size_t len); > +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt); > ssize_t xpread(int fd, void *buf, size_t len, off_t offset); > int xdup(int fd); > FILE *xfopen(const char *path, const char *mode); > > --- > base-commit: b15384c06f77bc2d34d0d3623a8a58218313a561 > change-id: 20260409-b4-pks-writev-max-io-size-e9b803439ae8 > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` 2026-04-09 12:47 [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` Patrick Steinhardt 2026-04-09 15:46 ` Ben Knoble @ 2026-04-09 16:42 ` Junio C Hamano 2026-04-09 20:23 ` Jeff King 1 sibling, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2026-04-09 16:42 UTC (permalink / raw) To: Patrick Steinhardt Cc: git, Jeff King, brian m. carlson, Randall S. Becker, Phillip Wood Patrick Steinhardt <ps@pks.im> writes: > Some systems like NonStop set a comparatively small `MAX_IO_SIZE`, which > limits the maximum number of bytes we're allowed to write in a single > call. We already handle this limit properly in `xwrite()`, but we have > recently introduced wrappers for writev(3p) where we don't. This will > cause the syscall to return EINVAL in case somebody passes an iovec > entry to writev(3p) that is larger than `MAX_IO_SIZE`. > > Introduce a new function `xwritev()` that is similar to `xwrite()` in > that it handles such platform-specific nuances. The logic is rather > simple: we simply coalesce all iovecs that don't exceed `MAX_IO_SIZE` > and pass those to writev(3p). If the first iovec already exceeds the > limit, we'll instead pass it to `xwrite()`, which handles the limit for > us. OK, so the idea is just like xwrite(), whose original purpose was to hide the details of having to restart write() system call, pretends a short write on IO limited platforms, xwritev() pretends that the underlying writev() gave a short write, instead of a failure with EINVAL when ssize_t is unusually small. That mirrors an established pattern that is proven to work with write_in_full() code path, which is a good thing. By the way, I see that EINTR and EWOULDBLOCK are handled somewhat differently from how xwrite() handles it. As the original change that introduced use of writev() were to replace write_in_full() that eventually called into xwrite(), shouldn't we be closer to what xwrite() used to do? > this fixes the issue reported by Randall in [1]. > > I mostly wanted to get this patch out there so that we can discuss a > proposed fix, but as said in the thread I'm also happy to revise course > and instead set NO_WRITEV on NonStop for now. I think we'll want to > eventually land a fix like the one proposed here though, and at that > point the workaround would not be required anymore. It is too late to make this kind of "let's wrap writev()" effort before the final, and Git 2.54 should ship without any such change. If your platform does not have a writev() that works with write size up to half of maximum of size_t (or at least 64kB), use NO_WRITEV to build your Git. But let's discuss to prepare for an update post release. It would be nice to see minority platforms including NonStop test and notice problems that appear only on their system much earlier in the cycle next time. A report at -rc0, while better than not seeing any, is a bit too late. > diff --git a/wrapper.c b/wrapper.c > index be8fa575e6..d989c78b4b 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -323,21 +323,60 @@ ssize_t write_in_full(int fd, const void *buf, size_t count) > return total; > } > > +ssize_t xwritev(int fd, struct iovec *iov, int iovcnt) > +{ > + ssize_t bytes_written; > + size_t total_length; > + int i; > + > + /* > + * We need to make sure that writev(3p) call does not write more than > + * `MAX_IO_SIZE` many bytes. If we do exceed that limit, we only pass > + * those iovecs to writev(3p) that sum up to less than the limit. > + * > + * If on the other hand the first iovec entry already exceeds this > + * limit we'll instead use xwrite() to write it, which knows to handle > + * `MAX_IO_SIZE` for us. > + */ > + for (i = 0, total_length = 0; i < iovcnt; i++) { > + if (unsigned_add_overflows(total_length, iov[i].iov_len)) > + break; We add .iov_len up in this first loop, because we do not want to bust writev(3p)'s limit. EINVAL The sum of the iov_len values in the iov array would overflow an ssize_t. cf. https://pubs.opengroup.org/onlinepubs/9799919799/functions/writev.html As the width of ssize_t in bits can be a lot smaller than size_t, the above "unsigned_add_overflows() triggers way too late for the check to matter, no? > + total_length += iov[i].iov_len; Then we have total_length computed. > + if (total_length > MAX_IO_SIZE) > + break; And it is capped to MAX_IO_SIZE, which is set way lower than SSIZE_MAX even on mainstream platforms (8MB, IIRC). This does not matter only because we are currently using writev() only for sideband communication and its payload is limited to 64kB, but if a caller gave us a list of buffers whose total size ranges in a few hundred megabytes (e.g., packfiles to clone a small project like git.git itself), on a not-so-I/O-limited platform, do we still want to chomp the original request into multiple writev(3p) system calls? I personally think it is an OK thing to do, simply because we also chomp a large xwrite() request into chunks and make multiple write(2) system calls, but we should explain the reason behind "We need to make sure" in the beginning of the above comment well--the current text has no explanation on the reason. > + } > + > + if (i < iovcnt) { > + /* > + * The first entry exceeds MAX_IO_SIZE, so we pass it to > + * xwrite, which knows to handle this case. > + */ > + if (!i) > + return xwrite(fd, iov->iov_base, iov->iov_len); Ben has a similar comment, but it would be easier to see the correspondence if you rephrase the comment perhaps like /* * If the first buffer is larger than MAX_IO_SIZE, * let xwrite() deal with it. */ xwrite() can return a short write, but the caller is counting how many bytes among what it passed to xwritev() are consumed by each call to this function, so the next call we may be seeing may have iov->iov_base pointing into the buffer we threw at xwrite() with reduced iov->iov_len, and that is expected and everything will even out. Quite nice. > + iovcnt = i; And if our iov[0].iov_len is smaller than MAX_IO_SIZE, then i would be at least 1 here and shows the index in iov[] array that busts the limit. IOW, we know iov[0]..iov[i-1] (inclusive) can be written without busting MAX_IO_SIZE in one go. So se reduce iovcnt down to that number here, in preparation for the next call. > + } By the way, if the total of iov[] is reaonably small, then the initial loop runs to the end of it, the above if() statement will be skipped, and iovcnt is not reduced. Either way, we now pass the initial part (which may be the entirety) of iov[] up to iovcnt to writev(). > + bytes_written = writev(fd, iov, iovcnt); > + if (!bytes_written) { > + errno = ENOSPC; > + return -1; > + } > + > + return bytes_written; > +} OK. So except for "is unsigned_add_overflows() doing what we want?" question, I think the above is reasonable. If we used "let's make sure sum of iov[].iov_len does not overflow an ssize_t" at the beginning of the loop, it does change the contract between the callers and this function, as they can no longer get EINVAL due to such overflow (instead this function will chomp the request into smaller pieces, just like MAX_IO_SIZE is used here). And I think that is a reasonable semantics. > ssize_t writev_in_full(int fd, struct iovec *iov, int iovcnt) > { > ssize_t total_written = 0; > > while (iovcnt) { > - ssize_t bytes_written = writev(fd, iov, iovcnt); > - if (bytes_written < 0) { > + ssize_t bytes_written = xwritev(fd, iov, iovcnt); > + if (bytes_written <= 0) { > if (errno == EINTR || errno == EAGAIN) > continue; I am not sure if this is how we want to handle EINTR, given especially that xwritev() may have called xwrite() under the hood in some case but not in others. If we are doing anything to these signals, I think it should be done where we call underlying writev(), and we should be close to whatever is done in xwrite() where it calls write(). > return -1; > } > - if (!bytes_written) { > - errno = ENOSPC; > - return -1; > - } On the other hand, I think moving this into xwritev() is a mistake. We shoudl try to be as close to what these functions are replacing (i.e. write_in_full and xwrite combo) in the code paths that are rewritten to use them. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` 2026-04-09 16:42 ` Junio C Hamano @ 2026-04-09 20:23 ` Jeff King 2026-04-09 20:40 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Jeff King @ 2026-04-09 20:23 UTC (permalink / raw) To: Junio C Hamano Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker, Phillip Wood On Thu, Apr 09, 2026 at 09:42:42AM -0700, Junio C Hamano wrote: > > + for (i = 0, total_length = 0; i < iovcnt; i++) { > > + if (unsigned_add_overflows(total_length, iov[i].iov_len)) > > + break; > > We add .iov_len up in this first loop, because we do not want to > bust writev(3p)'s limit. > > EINVAL The sum of the iov_len values in the iov array would > overflow an ssize_t. > > cf. https://pubs.opengroup.org/onlinepubs/9799919799/functions/writev.html > > As the width of ssize_t in bits can be a lot smaller than size_t, > the above "unsigned_add_overflows() triggers way too late for the > check to matter, no? I think it is correct as-is. The real check against ssize_t is later, when we compare total_length to MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure we do not overflow size_t when counting up the total (and if we do, we _know_ we are going to overflow ssize_t, which must be smaller). I think this can be made more clear by counting down allowable bytes instead of up. I'll show an example in a second. > > + } > > + > > + if (i < iovcnt) { > > + /* > > + * The first entry exceeds MAX_IO_SIZE, so we pass it to > > + * xwrite, which knows to handle this case. > > + */ > > + if (!i) > > + return xwrite(fd, iov->iov_base, iov->iov_len); > > Ben has a similar comment, but it would be easier to see the > correspondence if you rephrase the comment perhaps like > [...] Me three. I think this can be made more clear if we bail to xwrite() immediately in the loop. So together with the count-down, something like: ssize_t allowed = MAX_IO_SIZE; int i; for (i = 0; i < iovcnt; i++) { if (iov[i].iov_len > allowed) { if (!i) return xwrite(fd, iov->iov_base, iov_len); break; } allowed -= iov[i].iov_len; } return writev(fd, iov, i); You can also directly return writev() instead of breaking out of the loop. That makes it even more clear that we are doing a partial write, but means duplicating the "return writev()" line. And I think the whole thing would still deserve comments, but I omitted them here since the point was to show the rearranged structure. -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` 2026-04-09 20:23 ` Jeff King @ 2026-04-09 20:40 ` Junio C Hamano 2026-04-09 20:59 ` Jeff King 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2026-04-09 20:40 UTC (permalink / raw) To: Jeff King Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker, Phillip Wood Jeff King <peff@peff.net> writes: > On Thu, Apr 09, 2026 at 09:42:42AM -0700, Junio C Hamano wrote: > >> > + for (i = 0, total_length = 0; i < iovcnt; i++) { >> > + if (unsigned_add_overflows(total_length, iov[i].iov_len)) >> > + break; >> >> We add .iov_len up in this first loop, because we do not want to >> bust writev(3p)'s limit. >> >> EINVAL The sum of the iov_len values in the iov array would >> overflow an ssize_t. >> >> cf. https://pubs.opengroup.org/onlinepubs/9799919799/functions/writev.html >> >> As the width of ssize_t in bits can be a lot smaller than size_t, >> the above "unsigned_add_overflows() triggers way too late for the >> check to matter, no? > > I think it is correct as-is. > > The real check against ssize_t is later, when we compare total_length to > MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure > we do not overflow size_t when counting up the total (and if we do, we > _know_ we are going to overflow ssize_t, which must be smaller). But then what happens after it breaks out of the loop? We cannot be at i==0, so let's say we have a reasonably small iov[0] and iov[1] that is so large and makes size_t wraparound. We break out here, and then send the iov[0] with writev(). But have we checked if iov[0] is under MAX_IO_SIZE in that case before calling writev()? > I think this can be made more clear by counting down allowable bytes > instead of up. I'll show an example in a second. > >> > + } >> > + >> > + if (i < iovcnt) { >> > + /* >> > + * The first entry exceeds MAX_IO_SIZE, so we pass it to >> > + * xwrite, which knows to handle this case. >> > + */ >> > + if (!i) >> > + return xwrite(fd, iov->iov_base, iov->iov_len); >> >> Ben has a similar comment, but it would be easier to see the >> correspondence if you rephrase the comment perhaps like >> [...] > > Me three. I think this can be made more clear if we bail to xwrite() > immediately in the loop. So together with the count-down, something > like: > > ssize_t allowed = MAX_IO_SIZE; > int i; > > for (i = 0; i < iovcnt; i++) { > if (iov[i].iov_len > allowed) { > if (!i) > return xwrite(fd, iov->iov_base, iov_len); > break; > } > allowed -= iov[i].iov_len; > } > return writev(fd, iov, i); I agree that this is much simpler to follow. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` 2026-04-09 20:40 ` Junio C Hamano @ 2026-04-09 20:59 ` Jeff King 2026-04-09 21:09 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Jeff King @ 2026-04-09 20:59 UTC (permalink / raw) To: Junio C Hamano Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker, Phillip Wood On Thu, Apr 09, 2026 at 01:40:36PM -0700, Junio C Hamano wrote: > >> As the width of ssize_t in bits can be a lot smaller than size_t, > >> the above "unsigned_add_overflows() triggers way too late for the > >> check to matter, no? > > > > I think it is correct as-is. > > > > The real check against ssize_t is later, when we compare total_length to > > MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure > > we do not overflow size_t when counting up the total (and if we do, we > > _know_ we are going to overflow ssize_t, which must be smaller). > > But then what happens after it breaks out of the loop? We cannot be > at i==0, so let's say we have a reasonably small iov[0] and iov[1] > that is so large and makes size_t wraparound. We break out here, > and then send the iov[0] with writev(). But have we checked if > iov[0] is under MAX_IO_SIZE in that case before calling writev()? I think so. Either: - We completed the first iteration of the loop successfully (and i >= 1), in which case we added iov[0].iov_len to total_length, and then compared total_length against MAX_IO_SIZE, but did not break out of the loop. So we know iov[0] is within the limits. - We bailed at i==0 either because of addition overflow, or because of the MAX_IO_SIZE check. Either way, we will bail to xwrite() because i is 0. -Peff ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` 2026-04-09 20:59 ` Jeff King @ 2026-04-09 21:09 ` Junio C Hamano 2026-04-10 5:19 ` Patrick Steinhardt 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2026-04-09 21:09 UTC (permalink / raw) To: Jeff King Cc: Patrick Steinhardt, git, brian m. carlson, Randall S. Becker, Phillip Wood Jeff King <peff@peff.net> writes: > On Thu, Apr 09, 2026 at 01:40:36PM -0700, Junio C Hamano wrote: > >> >> As the width of ssize_t in bits can be a lot smaller than size_t, >> >> the above "unsigned_add_overflows() triggers way too late for the >> >> check to matter, no? >> > >> > I think it is correct as-is. >> > >> > The real check against ssize_t is later, when we compare total_length to >> > MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure >> > we do not overflow size_t when counting up the total (and if we do, we >> > _know_ we are going to overflow ssize_t, which must be smaller). >> >> But then what happens after it breaks out of the loop? We cannot be >> at i==0, so let's say we have a reasonably small iov[0] and iov[1] >> that is so large and makes size_t wraparound. We break out here, >> and then send the iov[0] with writev(). But have we checked if >> iov[0] is under MAX_IO_SIZE in that case before calling writev()? > > I think so. Either: > > - We completed the first iteration of the loop successfully (and i >= > 1), in which case we added iov[0].iov_len to total_length, and then > compared total_length against MAX_IO_SIZE, but did not break out of > the loop. So we know iov[0] is within the limits. > > - We bailed at i==0 either because of addition overflow, or because of > the MAX_IO_SIZE check. Either way, we will bail to xwrite() because > i is 0. Yup, you're right. There is no addition overflow at i==0, but I do not think we can construct a case where the sum is not checked against MAX_IO_SIZE before the vector is passed to underlying writev(). iov[0].iov_len that is slightly smaller than MAX_IO_SIZE would allow us to keep looping to i==1 at which time iov[1].iov_len is so big that we may trigger unsigned_add_overflows() check, but then what we send to writev() is the first segment, which is smaller than MAX_IO_SIZE, so we are OK. iov[0].iov_len that is slightly larger than MAX_IO_SIZE would stop us moving to i==1 at the end of the loop, and directly punt to xwrite(), so we are OK, too. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` 2026-04-09 21:09 ` Junio C Hamano @ 2026-04-10 5:19 ` Patrick Steinhardt 0 siblings, 0 replies; 8+ messages in thread From: Patrick Steinhardt @ 2026-04-10 5:19 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, git, brian m. carlson, Randall S. Becker, Phillip Wood On Thu, Apr 09, 2026 at 02:09:42PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > On Thu, Apr 09, 2026 at 01:40:36PM -0700, Junio C Hamano wrote: > > > >> >> As the width of ssize_t in bits can be a lot smaller than size_t, > >> >> the above "unsigned_add_overflows() triggers way too late for the > >> >> check to matter, no? > >> > > >> > I think it is correct as-is. > >> > > >> > The real check against ssize_t is later, when we compare total_length to > >> > MAX_IO_SIZE (which is clamped to SSIZE_MAX). So this is just making sure > >> > we do not overflow size_t when counting up the total (and if we do, we > >> > _know_ we are going to overflow ssize_t, which must be smaller). > >> > >> But then what happens after it breaks out of the loop? We cannot be > >> at i==0, so let's say we have a reasonably small iov[0] and iov[1] > >> that is so large and makes size_t wraparound. We break out here, > >> and then send the iov[0] with writev(). But have we checked if > >> iov[0] is under MAX_IO_SIZE in that case before calling writev()? > > > > I think so. Either: > > > > - We completed the first iteration of the loop successfully (and i >= > > 1), in which case we added iov[0].iov_len to total_length, and then > > compared total_length against MAX_IO_SIZE, but did not break out of > > the loop. So we know iov[0] is within the limits. > > > > - We bailed at i==0 either because of addition overflow, or because of > > the MAX_IO_SIZE check. Either way, we will bail to xwrite() because > > i is 0. > > Yup, you're right. > > There is no addition overflow at i==0, but I do not think we can > construct a case where the sum is not checked against MAX_IO_SIZE > before the vector is passed to underlying writev(). > > iov[0].iov_len that is slightly smaller than MAX_IO_SIZE would allow > us to keep looping to i==1 at which time iov[1].iov_len is so big > that we may trigger unsigned_add_overflows() check, but then what we > send to writev() is the first segment, which is smaller than > MAX_IO_SIZE, so we are OK. > > iov[0].iov_len that is slightly larger than MAX_IO_SIZE would stop > us moving to i==1 at the end of the loop, and directly punt to > xwrite(), so we are OK, too. Let's drop this patch for now. I'll pick it up again in the next release cycle when reintroducing writev(3p). Thanks, all! Patrick ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-04-10 5:20 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-09 12:47 [PATCH] wrapper: properly handle MAX_IO_SIZE in `write_in_full()` Patrick Steinhardt 2026-04-09 15:46 ` Ben Knoble 2026-04-09 16:42 ` Junio C Hamano 2026-04-09 20:23 ` Jeff King 2026-04-09 20:40 ` Junio C Hamano 2026-04-09 20:59 ` Jeff King 2026-04-09 21:09 ` Junio C Hamano 2026-04-10 5:19 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox