* [PATCH] remove unnecessary test and dead diagnostic @ 2011-05-26 13:59 Jim Meyering 2011-05-26 14:11 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Jim Meyering @ 2011-05-26 13:59 UTC (permalink / raw) To: git list * sha1_file.c (index_stream): Don't check for size_t < 0. read_in_full does not return an indication of failure. Signed-off-by: Jim Meyering <meyering@redhat.com> --- sha1_file.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 5fc877f..ea4549c 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2736,8 +2736,6 @@ static int index_stream(unsigned char *sha1, int fd, size_t size, size_t actual; actual = read_in_full(fd, buf, sz); - if (actual < 0) - die_errno("index-stream: reading input"); if (write_in_full(fast_import.in, buf, actual) != actual) die_errno("index-stream: feeding fast-import"); size -= actual; -- 1.7.5.2.660.g9f46c ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] remove unnecessary test and dead diagnostic 2011-05-26 13:59 [PATCH] remove unnecessary test and dead diagnostic Jim Meyering @ 2011-05-26 14:11 ` Jeff King 2011-05-26 14:34 ` Jim Meyering 2011-05-26 14:37 ` Jim Meyering 0 siblings, 2 replies; 10+ messages in thread From: Jeff King @ 2011-05-26 14:11 UTC (permalink / raw) To: Jim Meyering; +Cc: git list On Thu, May 26, 2011 at 03:59:14PM +0200, Jim Meyering wrote: > * sha1_file.c (index_stream): Don't check for size_t < 0. > read_in_full does not return an indication of failure. Are you sure about that? $ sed -n '/read_in_full/,/^}/p' wrapper.c ssize_t read_in_full(int fd, void *buf, size_t count) { char *p = buf; ssize_t total = 0; while (count > 0) { ssize_t loaded = xread(fd, p, count); if (loaded <= 0) return total ? total : loaded; count -= loaded; p += loaded; total += loaded; } return total; } It looks like if we get -1 on the _first_ read, we will then return -1. Subsequent errors are then ignored, and we return the (possibly truncated) result. Which, to be honest, seems kind of insane to me. I'd think: while (count > 0) { ssize_t loaded = xread(fd, p, count); if (loaded < 0) return loaded; if (loaded == 0) return total; ... } would be much more sensible semantics. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] remove unnecessary test and dead diagnostic 2011-05-26 14:11 ` Jeff King @ 2011-05-26 14:34 ` Jim Meyering 2011-05-26 16:28 ` Jeff King 2011-05-26 14:37 ` Jim Meyering 1 sibling, 1 reply; 10+ messages in thread From: Jim Meyering @ 2011-05-26 14:34 UTC (permalink / raw) To: Jeff King; +Cc: git list Jeff King wrote: > On Thu, May 26, 2011 at 03:59:14PM +0200, Jim Meyering wrote: > >> * sha1_file.c (index_stream): Don't check for size_t < 0. >> read_in_full does not return an indication of failure. > > Are you sure about that? > > $ sed -n '/read_in_full/,/^}/p' wrapper.c > ssize_t read_in_full(int fd, void *buf, size_t count) > { > char *p = buf; > ssize_t total = 0; > > while (count > 0) { > ssize_t loaded = xread(fd, p, count); Argh. I went in with blinders on, thinking that the caller was right in using a type of size_t, and then read this "xread" name and assumed that it would exit upon failure. Thanks for catching that. Here's a better patch: -- >8 -- Subject: [PATCH] use the correct type (ssize_t, not size_t) for read-style function * sha1_file.c (index_stream): Using an unsigned type, we would fail to detect a read error and then proceed to try to write (size_t)-1 bytes. Signed-off-by: Jim Meyering <meyering@redhat.com> --- sha1_file.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 5fc877f..8a85217 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2731,11 +2731,11 @@ static int index_stream(unsigned char *sha1, int fd, size_t size, write_or_whine(fast_import.in, fast_import_cmd, len, "index-stream: feeding fast-import"); while (size) { char buf[10240]; size_t sz = size < sizeof(buf) ? size : sizeof(buf); - size_t actual; + ssize_t actual; actual = read_in_full(fd, buf, sz); if (actual < 0) die_errno("index-stream: reading input"); if (write_in_full(fast_import.in, buf, actual) != actual) -- 1.7.5.2.660.g9f46c ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] remove unnecessary test and dead diagnostic 2011-05-26 14:34 ` Jim Meyering @ 2011-05-26 16:28 ` Jeff King 0 siblings, 0 replies; 10+ messages in thread From: Jeff King @ 2011-05-26 16:28 UTC (permalink / raw) To: Jim Meyering; +Cc: git list On Thu, May 26, 2011 at 04:34:20PM +0200, Jim Meyering wrote: > Argh. I went in with blinders on, thinking that the caller was > right in using a type of size_t, and then read this "xread" name and > assumed that it would exit upon failure. I've made the same mistake, as many of our x* functions are designed to die on error. > Subject: [PATCH] use the correct type (ssize_t, not size_t) for read-style function > > * sha1_file.c (index_stream): Using an unsigned type, > we would fail to detect a read error and then proceed to > try to write (size_t)-1 bytes. This version looks right to me. There's another one, too: -- >8 -- Subject: [PATCH] read_gitfile_gently: use ssize_t to hold read result Otherwise, a negative error return becomes a very large read value. We catch this in practice because we compare the expected and actual numbers of bytes (and you are not likely to be reading (size_t)-1 bytes), but this makes the correctness a little more obvious. Signed-off-by: Jeff King <peff@peff.net> --- setup.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/setup.c b/setup.c index 013ad11..ce87900 100644 --- a/setup.c +++ b/setup.c @@ -382,7 +382,7 @@ const char *read_gitfile_gently(const char *path) const char *slash; struct stat st; int fd; - size_t len; + ssize_t len; if (stat(path, &st)) return NULL; -- 1.7.4.5.13.gd3ff5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] remove unnecessary test and dead diagnostic 2011-05-26 14:11 ` Jeff King 2011-05-26 14:34 ` Jim Meyering @ 2011-05-26 14:37 ` Jim Meyering 2011-05-26 16:30 ` [PATCH] read_in_full: always report errors Jeff King 1 sibling, 1 reply; 10+ messages in thread From: Jim Meyering @ 2011-05-26 14:37 UTC (permalink / raw) To: Jeff King; +Cc: git list Jeff King wrote: ... > $ sed -n '/read_in_full/,/^}/p' wrapper.c > ssize_t read_in_full(int fd, void *buf, size_t count) > { > char *p = buf; > ssize_t total = 0; > > while (count > 0) { > ssize_t loaded = xread(fd, p, count); > if (loaded <= 0) > return total ? total : loaded; > count -= loaded; > p += loaded; > total += loaded; > } > > return total; > } > > It looks like if we get -1 on the _first_ read, we will then return -1. > Subsequent errors are then ignored, and we return the (possibly > truncated) result. > > Which, to be honest, seems kind of insane to me. I'd think: > > while (count > 0) { > ssize_t loaded = xread(fd, p, count); > if (loaded < 0) > return loaded; > if (loaded == 0) > return total; > ... > } > > would be much more sensible semantics. That looks better to me, too. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] read_in_full: always report errors 2011-05-26 14:37 ` Jim Meyering @ 2011-05-26 16:30 ` Jeff King 2011-05-26 18:35 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2011-05-26 16:30 UTC (permalink / raw) To: Jim Meyering; +Cc: git list On Thu, May 26, 2011 at 04:37:10PM +0200, Jim Meyering wrote: > > It looks like if we get -1 on the _first_ read, we will then return -1. > > Subsequent errors are then ignored, and we return the (possibly > > truncated) result. > > > > Which, to be honest, seems kind of insane to me. I'd think: > > > > while (count > 0) { > > ssize_t loaded = xread(fd, p, count); > > if (loaded < 0) > > return loaded; > > if (loaded == 0) > > return total; > > ... > > } > > > > would be much more sensible semantics. > > That looks better to me, too. I was worried that some caller might care about the truncated output, but after looking through the code, that is not the case. So I think we should do this. -- >8 -- Subject: [PATCH] read_in_full: always report errors The read_in_full function repeatedly calls read() to fill a buffer. If the first read() returns an error, we notify the caller by returning the error. However, if we read some data and then get an error on a subsequent read, we simply return the amount of data that we did read, and the caller is unaware of the error. This makes the tradeoff that seeing the partial data is more important than the fact that an error occurred. In practice, this is generally not the case; we care more if an error occurred, and should throw away any partial data. I audited the current callers. In most cases, this will make no difference at all, as they do: if (read_in_full(fd, buf, size) != size) error("short read"); However, it will help in a few cases: 1. In sha1_file.c:index_stream, we would fail to notice errors in the incoming stream. 2. When reading symbolic refs in resolve_ref, we would fail to notice errors and potentially use a truncated ref name. 3. In various places, we will get much better error messages. For example, callers of safe_read would erroneously print "the remote end hung up unexpectedly" instead of showing the read error. Signed-off-by: Jeff King <peff@peff.net> --- wrapper.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/wrapper.c b/wrapper.c index 2829000..85f09df 100644 --- a/wrapper.c +++ b/wrapper.c @@ -148,8 +148,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count) while (count > 0) { ssize_t loaded = xread(fd, p, count); - if (loaded <= 0) - return total ? total : loaded; + if (loaded < 0) + return -1; + if (loaded == 0) + return total; count -= loaded; p += loaded; total += loaded; -- 1.7.4.5.13.gd3ff5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] read_in_full: always report errors 2011-05-26 16:30 ` [PATCH] read_in_full: always report errors Jeff King @ 2011-05-26 18:35 ` Junio C Hamano 2011-05-26 18:48 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2011-05-26 18:35 UTC (permalink / raw) To: Jeff King; +Cc: Jim Meyering, git list Jeff King <peff@peff.net> writes: > Subject: [PATCH] read_in_full: always report errors > > The read_in_full function repeatedly calls read() to fill a > buffer. If the first read() returns an error, we notify the > caller by returning the error. However, if we read some data > and then get an error on a subsequent read, we simply return > the amount of data that we did read, and the caller is > unaware of the error. Is the caller unaware? While it won't hurt the callers who do: if (expect != read_in_full(fd, buf, expect)) die(...) I think this change hurts the one that you mentioned in your analysis. The caller in index_stream() reads what it could, writes what it read, and comes back and makes another call to read_in_full(), at which point either it gets an error and the whole thing would error out (i.e. no difference from before), or if it was an transient error that interrupted the previous read_in_full(), it can keep reading (with this patch it will not have a chance to do so). > This makes the tradeoff that seeing the partial data is more > important than the fact that an error occurred. In practice, > this is generally not the case; we care more if an error > occurred, and should throw away any partial data. Not really. I think we care about both, and I think that is what the current code tries to do. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] read_in_full: always report errors 2011-05-26 18:35 ` Junio C Hamano @ 2011-05-26 18:48 ` Jeff King 2011-05-26 20:53 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Jeff King @ 2011-05-26 18:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jim Meyering, git list On Thu, May 26, 2011 at 11:35:51AM -0700, Junio C Hamano wrote: > The caller in index_stream() reads what it could, writes what it read, and > comes back and makes another call to read_in_full(), at which point either > it gets an error and the whole thing would error out (i.e. no difference > from before), or if it was an transient error that interrupted the > previous read_in_full(), it can keep reading (with this patch it will not > have a chance to do so). The problem is that most callers are not careful enough to repeatedly call read_in_full and find out that there might have been an error in the previous result. They see a read shorter than what they asked, and assume it was EOF. But even if we assume all callers are careful and want to handle these transient errors, then: 1. What sort of transient errors are we talking about? We already handle retrying after EAGAIN and EINTR via xread. 2. If we get a non-transient error, are we guaranteed to get the same error if we make some other syscalls and then call read() again? Otherwise we are masking it. But really, it just seems like a non-intuitive interface to me (as evidenced by the number of callers who _didn't_ get it right). If a caller like index_stream is really interested in reading and processing some data up to a certain size, shouldn't it just be using xread directly? -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] read_in_full: always report errors 2011-05-26 18:48 ` Jeff King @ 2011-05-26 20:53 ` Junio C Hamano 2011-05-26 21:04 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2011-05-26 20:53 UTC (permalink / raw) To: Jeff King; +Cc: Jim Meyering, git list Jeff King <peff@peff.net> writes: > The problem is that most callers are not careful enough to repeatedly > call read_in_full and find out that there might have been an error in > the previous result. They see a read shorter than what they asked, and > assume it was EOF. I can buy that argument, but then shouldn't we change the "careful" callers to treat any short-read from read_in_full() as an error? After this patch, which you convinced me is a good thing to do overall, they are no longer careful but are merely misguided that they can catch and tell two kinds of errors apart. Perhaps like this. I notice that overall they are good changes, but the one in pkt-line.c does not look very good. combine-diff.c | 5 +---- csum-file.c | 2 -- pkt-line.c | 6 ++---- sha1_file.c | 2 +- 4 files changed, 4 insertions(+), 11 deletions(-) diff --git a/combine-diff.c b/combine-diff.c index be67cfc..176231e 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -845,11 +845,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, result = xmalloc(len + 1); done = read_in_full(fd, result, len); - if (done < 0) + if (done != len) die_errno("read error '%s'", elem->path); - else if (done < len) - die("early EOF '%s'", elem->path); - result[len] = 0; /* If not a fake symlink, apply filters, e.g. autocrlf */ diff --git a/csum-file.c b/csum-file.c index fc97d6e..f5ac31f 100644 --- a/csum-file.c +++ b/csum-file.c @@ -19,8 +19,6 @@ static void flush(struct sha1file *f, void *buf, unsigned int count) if (ret < 0) die_errno("%s: sha1 file read error", f->name); - if (ret < count) - die("%s: sha1 file truncated", f->name); if (memcmp(buf, check_buffer, count)) die("sha1 file '%s' validation error", f->name); } diff --git a/pkt-line.c b/pkt-line.c index 5a04984..5628801 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -138,10 +138,8 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...) static void safe_read(int fd, void *buffer, unsigned size) { ssize_t ret = read_in_full(fd, buffer, size); - if (ret < 0) - die_errno("read error"); - else if (ret < size) - die("The remote end hung up unexpectedly"); + if (ret != size) + die_errno("The remote end hung up unexpectedly"); } static int packet_length(const char *linelen) diff --git a/sha1_file.c b/sha1_file.c index 8a85217..d1332c4 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2736,7 +2736,7 @@ static int index_stream(unsigned char *sha1, int fd, size_t size, ssize_t actual; actual = read_in_full(fd, buf, sz); - if (actual < 0) + if (actual != sz) die_errno("index-stream: reading input"); if (write_in_full(fast_import.in, buf, actual) != actual) die_errno("index-stream: feeding fast-import"); ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] read_in_full: always report errors 2011-05-26 20:53 ` Junio C Hamano @ 2011-05-26 21:04 ` Jeff King 0 siblings, 0 replies; 10+ messages in thread From: Jeff King @ 2011-05-26 21:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jim Meyering, git list On Thu, May 26, 2011 at 01:53:09PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The problem is that most callers are not careful enough to repeatedly > > call read_in_full and find out that there might have been an error in > > the previous result. They see a read shorter than what they asked, and > > assume it was EOF. > > I can buy that argument, but then shouldn't we change the "careful" > callers to treat any short-read from read_in_full() as an error? I don't think so. A short-read could still be EOF, and you can distinguish between the two. Before, if you asked for n bytes, you would get back an 'r' that was one of: r < 0: error on first read r < n: short read via EOF, or error on subsequent read r == n: OK, got n bytes With my patch, you get: r < 0: error on any read r < n: short read via EOF r == n: OK, got n bytes So any negative return is an error, and less than n now _always_ means a short read. So your "careful" callers will now get the error automatically. If you want to update any callers, it would be ones like: if (read_in_full(fd, buf, len) != len)) die("unable to read %d bytes", len); which are not _wrong_, but could be more specific in doing: ssize_t r = read_in_full(fd, buf, len); if (r < 0) die_errno("unable to read"); else if (r < len) die("short read"); But that is just a quality-of-error-message issue, not a correctness issue. > diff --git a/combine-diff.c b/combine-diff.c > index be67cfc..176231e 100644 > --- a/combine-diff.c > +++ b/combine-diff.c > @@ -845,11 +845,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, > result = xmalloc(len + 1); > > done = read_in_full(fd, result, len); > - if (done < 0) > + if (done != len) > die_errno("read error '%s'", elem->path); > - else if (done < len) > - die("early EOF '%s'", elem->path); > - This is backwards. We now _can_ tell the two apart, so more callers could be like this. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2011-05-26 21:04 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-05-26 13:59 [PATCH] remove unnecessary test and dead diagnostic Jim Meyering 2011-05-26 14:11 ` Jeff King 2011-05-26 14:34 ` Jim Meyering 2011-05-26 16:28 ` Jeff King 2011-05-26 14:37 ` Jim Meyering 2011-05-26 16:30 ` [PATCH] read_in_full: always report errors Jeff King 2011-05-26 18:35 ` Junio C Hamano 2011-05-26 18:48 ` Jeff King 2011-05-26 20:53 ` Junio C Hamano 2011-05-26 21:04 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).