[PATCH] remove unnecessary test and dead diagnostic

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] remove unnecessary test and dead diagnostic
@ 2011-05-26 13:59 Jim Meyering
  2011-05-26 14:11 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Jim Meyering @ 2011-05-26 13:59 UTC (permalink / raw)
  To: git list


* sha1_file.c (index_stream): Don't check for size_t < 0.
read_in_full does not return an indication of failure.

Signed-off-by: Jim Meyering <meyering@redhat.com>
---
 sha1_file.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5fc877f..ea4549c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2736,8 +2736,6 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 		size_t actual;

 		actual = read_in_full(fd, buf, sz);
-		if (actual < 0)
-			die_errno("index-stream: reading input");
 		if (write_in_full(fast_import.in, buf, actual) != actual)
 			die_errno("index-stream: feeding fast-import");
 		size -= actual;
--
1.7.5.2.660.g9f46c

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] remove unnecessary test and dead diagnostic
  2011-05-26 13:59 [PATCH] remove unnecessary test and dead diagnostic Jim Meyering
@ 2011-05-26 14:11 ` Jeff King
  2011-05-26 14:34   ` Jim Meyering
  2011-05-26 14:37   ` Jim Meyering
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2011-05-26 14:11 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git list

On Thu, May 26, 2011 at 03:59:14PM +0200, Jim Meyering wrote:

> * sha1_file.c (index_stream): Don't check for size_t < 0.
> read_in_full does not return an indication of failure.

Are you sure about that?

  $ sed -n '/read_in_full/,/^}/p' wrapper.c
  ssize_t read_in_full(int fd, void *buf, size_t count)
  {
          char *p = buf;
          ssize_t total = 0;

          while (count > 0) {
                  ssize_t loaded = xread(fd, p, count);
                  if (loaded <= 0)
                          return total ? total : loaded;
                  count -= loaded;
                  p += loaded;
                  total += loaded;
          }

          return total;
  }

It looks like if we get -1 on the _first_ read, we will then return -1.
Subsequent errors are then ignored, and we return the (possibly
truncated) result.

Which, to be honest, seems kind of insane to me. I'd think:

  while (count > 0) {
          ssize_t loaded = xread(fd, p, count);
          if (loaded < 0)
                  return loaded;
          if (loaded == 0)
                  return total;
          ...
  }

would be much more sensible semantics.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] remove unnecessary test and dead diagnostic
  2011-05-26 14:11 ` Jeff King
@ 2011-05-26 14:34   ` Jim Meyering
  2011-05-26 16:28     ` Jeff King
  2011-05-26 14:37   ` Jim Meyering
  1 sibling, 1 reply; 10+ messages in thread
From: Jim Meyering @ 2011-05-26 14:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git list

Jeff King wrote:
> On Thu, May 26, 2011 at 03:59:14PM +0200, Jim Meyering wrote:
>
>> * sha1_file.c (index_stream): Don't check for size_t < 0.
>> read_in_full does not return an indication of failure.
>
> Are you sure about that?
>
>   $ sed -n '/read_in_full/,/^}/p' wrapper.c
>   ssize_t read_in_full(int fd, void *buf, size_t count)
>   {
>           char *p = buf;
>           ssize_t total = 0;
>
>           while (count > 0) {
>                   ssize_t loaded = xread(fd, p, count);

Argh.  I went in with blinders on, thinking that the caller was
right in using a type of size_t, and then read this "xread" name and
assumed that it would exit upon failure.

Thanks for catching that.
Here's a better patch:

-- >8 --
Subject: [PATCH] use the correct type (ssize_t, not size_t) for read-style function

* sha1_file.c (index_stream): Using an unsigned type,
we would fail to detect a read error and then proceed to
try to write (size_t)-1 bytes.

Signed-off-by: Jim Meyering <meyering@redhat.com>
---
 sha1_file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5fc877f..8a85217 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2731,11 +2731,11 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 	write_or_whine(fast_import.in, fast_import_cmd, len,
 		       "index-stream: feeding fast-import");
 	while (size) {
 		char buf[10240];
 		size_t sz = size < sizeof(buf) ? size : sizeof(buf);
-		size_t actual;
+		ssize_t actual;

 		actual = read_in_full(fd, buf, sz);
 		if (actual < 0)
 			die_errno("index-stream: reading input");
 		if (write_in_full(fast_import.in, buf, actual) != actual)
--
1.7.5.2.660.g9f46c

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] remove unnecessary test and dead diagnostic
  2011-05-26 14:34   ` Jim Meyering
@ 2011-05-26 16:28     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2011-05-26 16:28 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git list

On Thu, May 26, 2011 at 04:34:20PM +0200, Jim Meyering wrote:

> Argh.  I went in with blinders on, thinking that the caller was
> right in using a type of size_t, and then read this "xread" name and
> assumed that it would exit upon failure.

I've made the same mistake, as many of our x* functions are designed to
die on error.

> Subject: [PATCH] use the correct type (ssize_t, not size_t) for read-style function
> 
> * sha1_file.c (index_stream): Using an unsigned type,
> we would fail to detect a read error and then proceed to
> try to write (size_t)-1 bytes.

This version looks right to me.

There's another one, too:

-- >8 --
Subject: [PATCH] read_gitfile_gently: use ssize_t to hold read result

Otherwise, a negative error return becomes a very large read
value. We catch this in practice because we compare the
expected and actual numbers of bytes (and you are not likely
to be reading (size_t)-1 bytes), but this makes the
correctness a little more obvious.

Signed-off-by: Jeff King <peff@peff.net>
---
 setup.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/setup.c b/setup.c
index 013ad11..ce87900 100644
--- a/setup.c
+++ b/setup.c
@@ -382,7 +382,7 @@ const char *read_gitfile_gently(const char *path)
 	const char *slash;
 	struct stat st;
 	int fd;
-	size_t len;
+	ssize_t len;
 
 	if (stat(path, &st))
 		return NULL;
-- 
1.7.4.5.13.gd3ff5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] remove unnecessary test and dead diagnostic
  2011-05-26 14:11 ` Jeff King
  2011-05-26 14:34   ` Jim Meyering
@ 2011-05-26 14:37   ` Jim Meyering
  2011-05-26 16:30     ` [PATCH] read_in_full: always report errors Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Jim Meyering @ 2011-05-26 14:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git list

Jeff King wrote:
...
>   $ sed -n '/read_in_full/,/^}/p' wrapper.c
>   ssize_t read_in_full(int fd, void *buf, size_t count)
>   {
>           char *p = buf;
>           ssize_t total = 0;
>
>           while (count > 0) {
>                   ssize_t loaded = xread(fd, p, count);
>                   if (loaded <= 0)
>                           return total ? total : loaded;
>                   count -= loaded;
>                   p += loaded;
>                   total += loaded;
>           }
>
>           return total;
>   }
>
> It looks like if we get -1 on the _first_ read, we will then return -1.
> Subsequent errors are then ignored, and we return the (possibly
> truncated) result.
>
> Which, to be honest, seems kind of insane to me. I'd think:
>
>   while (count > 0) {
>           ssize_t loaded = xread(fd, p, count);
>           if (loaded < 0)
>                   return loaded;
>           if (loaded == 0)
>                   return total;
>           ...
>   }
>
> would be much more sensible semantics.

That looks better to me, too.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] read_in_full: always report errors
  2011-05-26 14:37   ` Jim Meyering
@ 2011-05-26 16:30     ` Jeff King
  2011-05-26 18:35       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2011-05-26 16:30 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git list

On Thu, May 26, 2011 at 04:37:10PM +0200, Jim Meyering wrote:

> > It looks like if we get -1 on the _first_ read, we will then return -1.
> > Subsequent errors are then ignored, and we return the (possibly
> > truncated) result.
> >
> > Which, to be honest, seems kind of insane to me. I'd think:
> >
> >   while (count > 0) {
> >           ssize_t loaded = xread(fd, p, count);
> >           if (loaded < 0)
> >                   return loaded;
> >           if (loaded == 0)
> >                   return total;
> >           ...
> >   }
> >
> > would be much more sensible semantics.
> 
> That looks better to me, too.

I was worried that some caller might care about the truncated output,
but after looking through the code, that is not the case. So I think we
should do this.

-- >8 --
Subject: [PATCH] read_in_full: always report errors

The read_in_full function repeatedly calls read() to fill a
buffer. If the first read() returns an error, we notify the
caller by returning the error. However, if we read some data
and then get an error on a subsequent read, we simply return
the amount of data that we did read, and the caller is
unaware of the error.

This makes the tradeoff that seeing the partial data is more
important than the fact that an error occurred. In practice,
this is generally not the case; we care more if an error
occurred, and should throw away any partial data.

I audited the current callers. In most cases, this will make
no difference at all, as they do:

  if (read_in_full(fd, buf, size) != size)
	  error("short read");

However, it will help in a few cases:

  1. In sha1_file.c:index_stream, we would fail to notice
     errors in the incoming stream.

  2. When reading symbolic refs in resolve_ref, we would
     fail to notice errors and potentially use a truncated
     ref name.

  3. In various places, we will get much better error
     messages. For example, callers of safe_read would
     erroneously print "the remote end hung up unexpectedly"
     instead of showing the read error.

Signed-off-by: Jeff King <peff@peff.net>
---
 wrapper.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index 2829000..85f09df 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -148,8 +148,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count)
 
 	while (count > 0) {
 		ssize_t loaded = xread(fd, p, count);
-		if (loaded <= 0)
-			return total ? total : loaded;
+		if (loaded < 0)
+			return -1;
+		if (loaded == 0)
+			return total;
 		count -= loaded;
 		p += loaded;
 		total += loaded;
-- 
1.7.4.5.13.gd3ff5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] read_in_full: always report errors
  2011-05-26 16:30     ` [PATCH] read_in_full: always report errors Jeff King
@ 2011-05-26 18:35       ` Junio C Hamano
  2011-05-26 18:48         ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2011-05-26 18:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Jim Meyering, git list

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] read_in_full: always report errors
>
> The read_in_full function repeatedly calls read() to fill a
> buffer. If the first read() returns an error, we notify the
> caller by returning the error. However, if we read some data
> and then get an error on a subsequent read, we simply return
> the amount of data that we did read, and the caller is
> unaware of the error.

Is the caller unaware?  While it won't hurt the callers who do:

   if (expect != read_in_full(fd, buf, expect))
      die(...)

I think this change hurts the one that you mentioned in your analysis.

The caller in index_stream() reads what it could, writes what it read, and
comes back and makes another call to read_in_full(), at which point either
it gets an error and the whole thing would error out (i.e. no difference
from before), or if it was an transient error that interrupted the
previous read_in_full(), it can keep reading (with this patch it will not
have a chance to do so).

> This makes the tradeoff that seeing the partial data is more
> important than the fact that an error occurred. In practice,
> this is generally not the case; we care more if an error
> occurred, and should throw away any partial data.

Not really. I think we care about both, and I think that is what the
current code tries to do.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] read_in_full: always report errors
  2011-05-26 18:35       ` Junio C Hamano
@ 2011-05-26 18:48         ` Jeff King
  2011-05-26 20:53           ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2011-05-26 18:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jim Meyering, git list

On Thu, May 26, 2011 at 11:35:51AM -0700, Junio C Hamano wrote:

> The caller in index_stream() reads what it could, writes what it read, and
> comes back and makes another call to read_in_full(), at which point either
> it gets an error and the whole thing would error out (i.e. no difference
> from before), or if it was an transient error that interrupted the
> previous read_in_full(), it can keep reading (with this patch it will not
> have a chance to do so).

The problem is that most callers are not careful enough to repeatedly
call read_in_full and find out that there might have been an error in
the previous result. They see a read shorter than what they asked, and
assume it was EOF.

But even if we assume all callers are careful and want to handle these
transient errors, then:

  1. What sort of transient errors are we talking about? We already
     handle retrying after EAGAIN and EINTR via xread.

  2. If we get a non-transient error, are we guaranteed to get the same
     error if we make some other syscalls and then call read() again?
     Otherwise we are masking it.

But really, it just seems like a non-intuitive interface to me (as
evidenced by the number of callers who _didn't_ get it right). If a
caller like index_stream is really interested in reading and processing
some data up to a certain size, shouldn't it just be using xread
directly?

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] read_in_full: always report errors
  2011-05-26 18:48         ` Jeff King
@ 2011-05-26 20:53           ` Junio C Hamano
  2011-05-26 21:04             ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2011-05-26 20:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Jim Meyering, git list

Jeff King <peff@peff.net> writes:

> The problem is that most callers are not careful enough to repeatedly
> call read_in_full and find out that there might have been an error in
> the previous result. They see a read shorter than what they asked, and
> assume it was EOF.

I can buy that argument, but then shouldn't we change the "careful"
callers to treat any short-read from read_in_full() as an error?

After this patch, which you convinced me is a good thing to do overall,
they are no longer careful but are merely misguided that they can catch
and tell two kinds of errors apart.

Perhaps like this.  I notice that overall they are good changes, but the
one in pkt-line.c does not look very good.

 combine-diff.c |    5 +----
 csum-file.c    |    2 --
 pkt-line.c     |    6 ++----
 sha1_file.c    |    2 +-
 4 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index be67cfc..176231e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -845,11 +845,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			result = xmalloc(len + 1);
 
 			done = read_in_full(fd, result, len);
-			if (done < 0)
+			if (done != len)
 				die_errno("read error '%s'", elem->path);
-			else if (done < len)
-				die("early EOF '%s'", elem->path);
-
 			result[len] = 0;
 
 			/* If not a fake symlink, apply filters, e.g. autocrlf */
diff --git a/csum-file.c b/csum-file.c
index fc97d6e..f5ac31f 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -19,8 +19,6 @@ static void flush(struct sha1file *f, void *buf, unsigned int count)
 
 		if (ret < 0)
 			die_errno("%s: sha1 file read error", f->name);
-		if (ret < count)
-			die("%s: sha1 file truncated", f->name);
 		if (memcmp(buf, check_buffer, count))
 			die("sha1 file '%s' validation error", f->name);
 	}
diff --git a/pkt-line.c b/pkt-line.c
index 5a04984..5628801 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -138,10 +138,8 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 static void safe_read(int fd, void *buffer, unsigned size)
 {
 	ssize_t ret = read_in_full(fd, buffer, size);
-	if (ret < 0)
-		die_errno("read error");
-	else if (ret < size)
-		die("The remote end hung up unexpectedly");
+	if (ret != size)
+		die_errno("The remote end hung up unexpectedly");
 }
 
 static int packet_length(const char *linelen)
diff --git a/sha1_file.c b/sha1_file.c
index 8a85217..d1332c4 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2736,7 +2736,7 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 		ssize_t actual;
 
 		actual = read_in_full(fd, buf, sz);
-		if (actual < 0)
+		if (actual != sz)
 			die_errno("index-stream: reading input");
 		if (write_in_full(fast_import.in, buf, actual) != actual)
 			die_errno("index-stream: feeding fast-import");

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] read_in_full: always report errors
  2011-05-26 20:53           ` Junio C Hamano
@ 2011-05-26 21:04             ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2011-05-26 21:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jim Meyering, git list

On Thu, May 26, 2011 at 01:53:09PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The problem is that most callers are not careful enough to repeatedly
> > call read_in_full and find out that there might have been an error in
> > the previous result. They see a read shorter than what they asked, and
> > assume it was EOF.
> 
> I can buy that argument, but then shouldn't we change the "careful"
> callers to treat any short-read from read_in_full() as an error?

I don't think so. A short-read could still be EOF, and you can
distinguish between the two. Before, if you asked for n bytes, you would
get back an 'r' that was one of:

  r < 0: error on first read
  r < n: short read via EOF, or error on subsequent read
  r == n: OK, got n bytes

With my patch, you get:

  r < 0: error on any read
  r < n: short read via EOF
  r == n: OK, got n bytes

So any negative return is an error, and less than n now _always_ means a
short read. So your "careful" callers will now get the error
automatically. If you want to update any callers, it would be ones like:

  if (read_in_full(fd, buf, len) != len))
          die("unable to read %d bytes", len);

which are not _wrong_, but could be more specific in doing:

  ssize_t r = read_in_full(fd, buf, len);
  if (r < 0)
          die_errno("unable to read");
  else if (r < len)
          die("short read");

But that is just a quality-of-error-message issue, not a correctness
issue.

> diff --git a/combine-diff.c b/combine-diff.c
> index be67cfc..176231e 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -845,11 +845,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>  			result = xmalloc(len + 1);
>  
>  			done = read_in_full(fd, result, len);
> -			if (done < 0)
> +			if (done != len)
>  				die_errno("read error '%s'", elem->path);
> -			else if (done < len)
> -				die("early EOF '%s'", elem->path);
> -

This is backwards. We now _can_ tell the two apart, so more callers
could be like this.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-05-26 21:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-05-26 13:59 [PATCH] remove unnecessary test and dead diagnostic Jim Meyering
2011-05-26 14:11 ` Jeff King
2011-05-26 14:34   ` Jim Meyering
2011-05-26 16:28     ` Jeff King
2011-05-26 14:37   ` Jim Meyering
2011-05-26 16:30     ` [PATCH] read_in_full: always report errors Jeff King
2011-05-26 18:35       ` Junio C Hamano
2011-05-26 18:48         ` Jeff King
2011-05-26 20:53           ` Junio C Hamano
2011-05-26 21:04             ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).