pread() over NFS (again) [1.5.5.4]

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* pread() over NFS (again) [1.5.5.4]
@ 2008-06-26 16:40 Christian Holtje
  2008-06-26 20:46 ` Shawn O. Pearce
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Holtje @ 2008-06-26 16:40 UTC (permalink / raw)
  To: git

I have read all the threads on git having trouble with pread() and I  
didn't see anything to help.

Situation:
   git commands run on system "dev2" which is Linux 2.6.9-42.0.8.ELsmp  
on an NFS mounted directory.
   The NFS server is "dev1" which is Linux 2.4.28

I get the following output from git when doing a fetch:
   remote: Counting objects: 406, done.
   remote: Compressing objects: 100% (198/198), done.
   remote: Total 253 (delta 127), reused 150 (delta 55)
   Receiving objects: 100% (253/253), 5.27 MiB | 9136 KiB/s, done.
   fatal: cannot pread pack file: No such file or directory
   fatal: index-pack failed

The end of the strace looks like so:
pread(3, "", 205, 1373)                 = 0
write(2, "fatal: cannot pread pack file: N"..., 57) = 57

I have ran strace -o /somedir/q -ff git fetch and have the straces  
available at:
http://docwhat.gerf.org/files/tmp/git-strace-20080626.tgz

I have worked around the problem by running the fetch from the nfs  
server.

It looks like I can recreate this at will and I'm willing to help  
figure this out.

Thanks!

Ciao!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 16:40 pread() over NFS (again) [1.5.5.4] Christian Holtje
@ 2008-06-26 20:46 ` Shawn O. Pearce
  2008-06-26 20:56   ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Shawn O. Pearce @ 2008-06-26 20:46 UTC (permalink / raw)
  To: Christian Holtje; +Cc: git

Christian Holtje <docwhat@gmail.com> wrote:
> I have read all the threads on git having trouble with pread() and I  
> didn't see anything to help.
...
>   Receiving objects: 100% (253/253), 5.27 MiB | 9136 KiB/s, done.
>   fatal: cannot pread pack file: No such file or directory
>   fatal: index-pack failed
> 
> The end of the strace looks like so:
> pread(3, "", 205, 1373)                 = 0
> write(2, "fatal: cannot pread pack file: N"..., 57) = 57

Hmmph.  So pread for a length of 205 can return 0 on NFS?  Is this
a transient error?  If so, perhaps a patch like this might help:

diff --git a/index-pack.c b/index-pack.c
index 5ac91ba..737f757 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -309,14 +309,19 @@ static void *get_data_from_pack(struct object_entry *obj)
 	unsigned char *src, *data;
 	z_stream stream;
 	int st;
+	int attempts = 0;
 
 	src = xmalloc(len);
 	data = src;
 	do {
 		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
-		if (n <= 0)
+		if (n <= 0) {
+			if (n == 0 && ++attempts < 10)
+				continue;
 			die("cannot pread pack file: %s", strerror(errno));
+		}
 		rdy += n;
+		attempts = 0;
 	} while (rdy < len);
 	data = xmalloc(obj->size);
 	memset(&stream, 0, sizeof(stream));


The file shouldn't be short unless someone truncated it, or there
is a bug in index-pack.  Neither is very likely, but I don't think
we would want to retry pread'ing the same block forever.

-- 
Shawn.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 20:46 ` Shawn O. Pearce
@ 2008-06-26 20:56   ` Junio C Hamano
  2008-06-26 21:05     ` Shawn O. Pearce
  2008-06-26 23:36     ` logank
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2008-06-26 20:56 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Christian Holtje, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Christian Holtje <docwhat@gmail.com> wrote:
>> I have read all the threads on git having trouble with pread() and I  
>> didn't see anything to help.
> ...
>>   Receiving objects: 100% (253/253), 5.27 MiB | 9136 KiB/s, done.
>>   fatal: cannot pread pack file: No such file or directory
>>   fatal: index-pack failed
>> 
>> The end of the strace looks like so:
>> pread(3, "", 205, 1373)                 = 0
>> write(2, "fatal: cannot pread pack file: N"..., 57) = 57
>
> Hmmph.  So pread for a length of 205 can return 0 on NFS?  Is this
> a transient error?  If so, perhaps a patch like this might help:
>
> diff --git a/index-pack.c b/index-pack.c
> index 5ac91ba..737f757 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -309,14 +309,19 @@ static void *get_data_from_pack(struct object_entry *obj)
>  	unsigned char *src, *data;
>  	z_stream stream;
>  	int st;
> +	int attempts = 0;
>  
>  	src = xmalloc(len);
>  	data = src;
>  	do {
>  		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> -		if (n <= 0)
> +		if (n <= 0) {
> +			if (n == 0 && ++attempts < 10)
> +				continue;
>  			die("cannot pread pack file: %s", strerror(errno));
> +		}
>  		rdy += n;
> +		attempts = 0;
>  	} while (rdy < len);
>  	data = xmalloc(obj->size);
>  	memset(&stream, 0, sizeof(stream));
>
>
> The file shouldn't be short unless someone truncated it, or there
> is a bug in index-pack.  Neither is very likely, but I don't think
> we would want to retry pread'ing the same block forever.

I don't think we would want to retry even once.  Return value of 0 from
pread is defined to be an EOF, isn't it?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 20:56   ` Junio C Hamano
@ 2008-06-26 21:05     ` Shawn O. Pearce
  2008-06-26 21:36       ` Christian Holtje
  2008-06-26 22:04       ` Junio C Hamano
  2008-06-26 23:36     ` logank
  1 sibling, 2 replies; 16+ messages in thread
From: Shawn O. Pearce @ 2008-06-26 21:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Holtje, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > Christian Holtje <docwhat@gmail.com> wrote:
> >> I have read all the threads on git having trouble with pread() and I  
> >> didn't see anything to help.
> > ...
> >>   Receiving objects: 100% (253/253), 5.27 MiB | 9136 KiB/s, done.
> >>   fatal: cannot pread pack file: No such file or directory
> >>   fatal: index-pack failed
> >> 
> >> The end of the strace looks like so:
> >> pread(3, "", 205, 1373)                 = 0
> >> write(2, "fatal: cannot pread pack file: N"..., 57) = 57
> >
> > Hmmph.  So pread for a length of 205 can return 0 on NFS?  Is this
> > a transient error?  If so, perhaps a patch like this might help:
...
> > The file shouldn't be short unless someone truncated it, or there
> > is a bug in index-pack.  Neither is very likely, but I don't think
> > we would want to retry pread'ing the same block forever.
> 
> I don't think we would want to retry even once.  Return value of 0 from
> pread is defined to be an EOF, isn't it?

Indeed, it is defined to be EOF, but EOF here makes no sense.

We have a file position we saw once before as the start of a delta.
We wrote it down to disk.  We want to go back and open it up, as
we have the base decompressed and in memory and need to compute
the SHA-1 of the object that resides at this offset.

And *wham* we get an EOF.  Where there should be data.  Where we
know there is data.

I'm open to the idea that index-pack has a bug, but I doubt it.
We shovel hundreds of megabytes through that on a daily basis
across all of the git users, and nobody ever sees it crash out
with an EOF in the middle of an object it knows to be present.
Except poor Christian on NFS.

Actually, I think the last time someone reported something like this
in Git it turned out to be an NFS kernel bug.  I didn't quote it
in my reply to him, but I think he did say this was a linux client,
linux server.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 21:05     ` Shawn O. Pearce
@ 2008-06-26 21:36       ` Christian Holtje
  2008-06-26 22:04       ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Holtje @ 2008-06-26 21:36 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Junio C Hamano, git

On Jun 26, 2008, at 5:05 PM, Shawn O. Pearce wrote:
> Junio C Hamano <gitster@pobox.com> wrote:
>> "Shawn O. Pearce" <spearce@spearce.org> writes:
>>> Christian Holtje <docwhat@gmail.com> wrote:
>>>> I have read all the threads on git having trouble with pread()  
>>>> and I
>>>> didn't see anything to help.
>>> ...
>>>>  Receiving objects: 100% (253/253), 5.27 MiB | 9136 KiB/s, done.
>>>>  fatal: cannot pread pack file: No such file or directory
>>>>  fatal: index-pack failed
>>>>
>>>> The end of the strace looks like so:
>>>> pread(3, "", 205, 1373)                 = 0
>>>> write(2, "fatal: cannot pread pack file: N"..., 57) = 57
>>>
>>> Hmmph.  So pread for a length of 205 can return 0 on NFS?  Is this
>>> a transient error?  If so, perhaps a patch like this might help:
> ...
>>> The file shouldn't be short unless someone truncated it, or there
>>> is a bug in index-pack.  Neither is very likely, but I don't think
>>> we would want to retry pread'ing the same block forever.
>>
>> I don't think we would want to retry even once.  Return value of 0  
>> from
>> pread is defined to be an EOF, isn't it?
>
> Indeed, it is defined to be EOF, but EOF here makes no sense.
>
> We have a file position we saw once before as the start of a delta.
> We wrote it down to disk.  We want to go back and open it up, as
> we have the base decompressed and in memory and need to compute
> the SHA-1 of the object that resides at this offset.
>
> And *wham* we get an EOF.  Where there should be data.  Where we
> know there is data.
>
> I'm open to the idea that index-pack has a bug, but I doubt it.
> We shovel hundreds of megabytes through that on a daily basis
> across all of the git users, and nobody ever sees it crash out
> with an EOF in the middle of an object it knows to be present.
> Except poor Christian on NFS.
>
> Actually, I think the last time someone reported something like this
> in Git it turned out to be an NFS kernel bug.  I didn't quote it
> in my reply to him, but I think he did say this was a linux client,
> linux server.

I did see the threads that talked about NFS, but I couldn't find any  
matching messages in the linux kernel mailing list.  If someone can  
give me a pointer to where this picked up on other mailing lists (say,  
via a URL) then I'll take that back to IT as a reason to upgrade.

Would it be a bug in the client or server?  I'd assume client...

The other NFS/pread() email thread was also a client of linux 2.6.9....

Ciao!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 21:05     ` Shawn O. Pearce
  2008-06-26 21:36       ` Christian Holtje
@ 2008-06-26 22:04       ` Junio C Hamano
  2008-06-26 22:07         ` Shawn O. Pearce
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2008-06-26 22:04 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Christian Holtje, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> We have a file position we saw once before as the start of a delta.
> We wrote it down to disk.  We want to go back and open it up, as
> we have the base decompressed and in memory and need to compute
> the SHA-1 of the object that resides at this offset.
>
> And *wham* we get an EOF.  Where there should be data.  Where we
> know there is data.

We have written that earlier in the same process?  Are we playing games
with mixed mmap() and pread()?  Is fsync() or msync() or unmap/remap
needed?

> Actually, I think the last time someone reported something like this
> in Git it turned out to be an NFS kernel bug.  I didn't quote it
> in my reply to him, but I think he did say this was a linux client,
> linux server.

This is getting into the area Linus would immediately know the answer to,
but he is away for this week.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 22:04       ` Junio C Hamano
@ 2008-06-26 22:07         ` Shawn O. Pearce
  0 siblings, 0 replies; 16+ messages in thread
From: Shawn O. Pearce @ 2008-06-26 22:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Holtje, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > We have a file position we saw once before as the start of a delta.
> > We wrote it down to disk.  We want to go back and open it up, as
> > we have the base decompressed and in memory and need to compute
> > the SHA-1 of the object that resides at this offset.
> >
> > And *wham* we get an EOF.  Where there should be data.  Where we
> > know there is data.
> 
> We have written that earlier in the same process?  Are we playing games
> with mixed mmap() and pread()?  Is fsync() or msync() or unmap/remap
> needed?

Yes, we wrote the very same file earlier in the process.

index-pack _used_ to use a mixed write/mmap game.  Today it uses
write, followed by later pread.  No mmap.  We had major performance
issues on Mac OS X with mmap, not to mention the coherency issues
that can arise from using it.  So index-pack is mmap free[*1*].
 
> > Actually, I think the last time someone reported something like this
> > in Git it turned out to be an NFS kernel bug.  I didn't quote it
> > in my reply to him, but I think he did say this was a linux client,
> > linux server.
> 
> This is getting into the area Linus would immediately know the answer to,
> but he is away for this week.

Yea, I was expecting a reply from him, but that explains it.

*1* There is usage of mmap in index-pack, but only to access
    _existing_ objects and packs from the object store.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 20:56   ` Junio C Hamano
  2008-06-26 21:05     ` Shawn O. Pearce
@ 2008-06-26 23:36     ` logank
  2008-06-26 23:38       ` Junio C Hamano
                         ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: logank @ 2008-06-26 23:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn O. Pearce, Christian Holtje, git

On Jun 26, 2008, at 1:56 PM, Junio C Hamano wrote:

>> "The file shouldn't be short unless someone truncated it, or there
>> is a bug in index-pack.  Neither is very likely, but I don't think
>> we would want to retry pread'ing the same block forever.
>
> I don't think we would want to retry even once.  Return value of 0  
> from
> pread is defined to be an EOF, isn't it?

No, it seems to be a simple error-out in this case. We have 2.4.20  
systems with nfs-utils 0.3.3 and used to frequently get the same error  
while pushing. I made a similar change back in February and haven't  
had a problem since:

diff --git a/index-pack.c b/index-pack.c
index 5ac91ba..85c8bdb 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -313,7 +313,14 @@ static void *get_data_from_pack(struct  
object_entry *obj)
	src = xmalloc(len);
	data = src;
	do {
+		// It appears that if multiple threads read across NFS, the
+		// second read will fail. I know this is awful, but we wait for
+		// a little bit and try again.
		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
+		if (n <= 0) {
+			sleep(1);
+			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
+		}
		if (n <= 0)
			die("cannot pread pack file: %s", strerror(errno));
		rdy += n;

I use a sleep request since it seems less likely that the other thread  
will have an outstanding request after a second of waiting.

-- 
                                                         Logan Kennelly
       ,,,
      (. .)
--ooO-(_)-Ooo--

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 23:36     ` logank
@ 2008-06-26 23:38       ` Junio C Hamano
  2008-06-27  2:57         ` J. Bruce Fields
  2008-06-27  2:54       ` J. Bruce Fields
  2008-06-27 13:54       ` Christian Holtje
  2 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2008-06-26 23:38 UTC (permalink / raw)
  To: logank; +Cc: J. Bruce Fields, Shawn O. Pearce, Christian Holtje, git

logank@sent.com writes:

> On Jun 26, 2008, at 1:56 PM, Junio C Hamano wrote:
>
>>> "The file shouldn't be short unless someone truncated it, or there
>>> is a bug in index-pack.  Neither is very likely, but I don't think
>>> we would want to retry pread'ing the same block forever.
>>
>> I don't think we would want to retry even once.  Return value of 0
>> from
>> pread is defined to be an EOF, isn't it?
>
> No, it seems to be a simple error-out in this case. We have 2.4.20
> systems with nfs-utils 0.3.3 and used to frequently get the same error
> while pushing. I made a similar change back in February and haven't
> had a problem since:
>
> diff --git a/index-pack.c b/index-pack.c
> index 5ac91ba..85c8bdb 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -313,7 +313,14 @@ static void *get_data_from_pack(struct
> object_entry *obj)
> 	src = xmalloc(len);
> 	data = src;
> 	do {
> +		// It appears that if multiple threads read across NFS, the
> +		// second read will fail. I know this is awful, but we wait for
> +		// a little bit and try again.
> 		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> +		if (n <= 0) {
> +			sleep(1);
> +			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> +		}
> 		if (n <= 0)
> 			die("cannot pread pack file: %s", strerror(errno));
> 		rdy += n;
>
> I use a sleep request since it seems less likely that the other thread
> will have an outstanding request after a second of waiting.

Gaah.  Don't we have NFS experts in house?  Bruce, perhaps?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 23:38       ` Junio C Hamano
@ 2008-06-27  2:57         ` J. Bruce Fields
  2008-06-27 14:50           ` Trond Myklebust
  0 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2008-06-27  2:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: logank, Shawn O. Pearce, Christian Holtje, git, Trond Myklebust

On Thu, Jun 26, 2008 at 04:38:40PM -0700, Junio C Hamano wrote:
> logank@sent.com writes:
> 
> > On Jun 26, 2008, at 1:56 PM, Junio C Hamano wrote:
> >
> >>> "The file shouldn't be short unless someone truncated it, or there
> >>> is a bug in index-pack.  Neither is very likely, but I don't think
> >>> we would want to retry pread'ing the same block forever.
> >>
> >> I don't think we would want to retry even once.  Return value of 0
> >> from
> >> pread is defined to be an EOF, isn't it?
> >
> > No, it seems to be a simple error-out in this case. We have 2.4.20
> > systems with nfs-utils 0.3.3 and used to frequently get the same error
> > while pushing. I made a similar change back in February and haven't
> > had a problem since:
> >
> > diff --git a/index-pack.c b/index-pack.c
> > index 5ac91ba..85c8bdb 100644
> > --- a/index-pack.c
> > +++ b/index-pack.c
> > @@ -313,7 +313,14 @@ static void *get_data_from_pack(struct
> > object_entry *obj)
> > 	src = xmalloc(len);
> > 	data = src;
> > 	do {
> > +		// It appears that if multiple threads read across NFS, the
> > +		// second read will fail. I know this is awful, but we wait for
> > +		// a little bit and try again.
> > 		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > +		if (n <= 0) {
> > +			sleep(1);
> > +			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > +		}
> > 		if (n <= 0)
> > 			die("cannot pread pack file: %s", strerror(errno));
> > 		rdy += n;
> >
> > I use a sleep request since it seems less likely that the other thread
> > will have an outstanding request after a second of waiting.
> 
> Gaah.  Don't we have NFS experts in house?  Bruce, perhaps?

Trond, you don't have any idea why a 2.6.9-42.0.8.ELsmp client (2.4.28
server) might be returning spurious 0's from pread()?

Seems like everything is happening from that one client--the file isn't
being simultaneously accessed from the server or from another client.

--b.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-27  2:57         ` J. Bruce Fields
@ 2008-06-27 14:50           ` Trond Myklebust
  2008-06-30  0:32             ` Shawn O. Pearce
  0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2008-06-27 14:50 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Junio C Hamano, logank, Shawn O. Pearce, Christian Holtje, git,
	Trond Myklebust

On Thu, 2008-06-26 at 22:57 -0400, J. Bruce Fields wrote:
> On Thu, Jun 26, 2008 at 04:38:40PM -0700, Junio C Hamano wrote:
> > logank@sent.com writes:
> > 
> > > On Jun 26, 2008, at 1:56 PM, Junio C Hamano wrote:
> > >
> > >>> "The file shouldn't be short unless someone truncated it, or there
> > >>> is a bug in index-pack.  Neither is very likely, but I don't think
> > >>> we would want to retry pread'ing the same block forever.
> > >>
> > >> I don't think we would want to retry even once.  Return value of 0
> > >> from
> > >> pread is defined to be an EOF, isn't it?
> > >
> > > No, it seems to be a simple error-out in this case. We have 2.4.20
> > > systems with nfs-utils 0.3.3 and used to frequently get the same error
> > > while pushing. I made a similar change back in February and haven't
> > > had a problem since:
> > >
> > > diff --git a/index-pack.c b/index-pack.c
> > > index 5ac91ba..85c8bdb 100644
> > > --- a/index-pack.c
> > > +++ b/index-pack.c
> > > @@ -313,7 +313,14 @@ static void *get_data_from_pack(struct
> > > object_entry *obj)
> > > 	src = xmalloc(len);
> > > 	data = src;
> > > 	do {
> > > +		// It appears that if multiple threads read across NFS, the
> > > +		// second read will fail. I know this is awful, but we wait for
> > > +		// a little bit and try again.
> > > 		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > > +		if (n <= 0) {
> > > +			sleep(1);
> > > +			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > > +		}
> > > 		if (n <= 0)
> > > 			die("cannot pread pack file: %s", strerror(errno));
> > > 		rdy += n;
> > >
> > > I use a sleep request since it seems less likely that the other thread
> > > will have an outstanding request after a second of waiting.
> > 
> > Gaah.  Don't we have NFS experts in house?  Bruce, perhaps?
> 
> Trond, you don't have any idea why a 2.6.9-42.0.8.ELsmp client (2.4.28
> server) might be returning spurious 0's from pread()?
> 
> Seems like everything is happening from that one client--the file isn't
> being simultaneously accessed from the server or from another client.

Is the file only being read, or could there be a simultaneous write to
the same file? I'm surmising this could be an effect resulting from
simultaneous cache invalidations: prior to Linux 2.6.20 or so, we
weren't rigorously following the VFS/VM rules for page locking, and so
page cache invalidation in particular could have some curious
side-effects.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-27 14:50           ` Trond Myklebust
@ 2008-06-30  0:32             ` Shawn O. Pearce
  2008-06-30 19:09               ` Nicolas Pitre
  0 siblings, 1 reply; 16+ messages in thread
From: Shawn O. Pearce @ 2008-06-30  0:32 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: J. Bruce Fields, Junio C Hamano, logank, Christian Holtje, git,
	Trond Myklebust

Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> On Thu, 2008-06-26 at 22:57 -0400, J. Bruce Fields wrote:
> > On Thu, Jun 26, 2008 at 04:38:40PM -0700, Junio C Hamano wrote:
> > > logank@sent.com writes:
> > > 
> > > > On Jun 26, 2008, at 1:56 PM, Junio C Hamano wrote:
> > > >
> > > >>> "The file shouldn't be short unless someone truncated it, or there
> > > >>> is a bug in index-pack.  Neither is very likely, but I don't think
> > > >>> we would want to retry pread'ing the same block forever.
> > > >>
> > > >> I don't think we would want to retry even once.  Return value of 0
> > > >> from
> > > >> pread is defined to be an EOF, isn't it?
> > > >
> > > > No, it seems to be a simple error-out in this case. We have 2.4.20
> > > > systems with nfs-utils 0.3.3 and used to frequently get the same error
> > > > while pushing. I made a similar change back in February and haven't
> > > > had a problem since:
> > > >
> > > > diff --git a/index-pack.c b/index-pack.c
> > > > index 5ac91ba..85c8bdb 100644
> > > > --- a/index-pack.c
> > > > +++ b/index-pack.c
> > > > @@ -313,7 +313,14 @@ static void *get_data_from_pack(struct
> > > > object_entry *obj)
> > > > 	src = xmalloc(len);
> > > > 	data = src;
> > > > 	do {
> > > > +		// It appears that if multiple threads read across NFS, the
> > > > +		// second read will fail. I know this is awful, but we wait for
> > > > +		// a little bit and try again.
> > > > 		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > > > +		if (n <= 0) {
> > > > +			sleep(1);
> > > > +			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > > > +		}
> > > > 		if (n <= 0)
> > > > 			die("cannot pread pack file: %s", strerror(errno));
> > > > 		rdy += n;
> > > >
> > > > I use a sleep request since it seems less likely that the other thread
> > > > will have an outstanding request after a second of waiting.
> > > 
> > > Gaah.  Don't we have NFS experts in house?  Bruce, perhaps?
> > 
> > Trond, you don't have any idea why a 2.6.9-42.0.8.ELsmp client (2.4.28
> > server) might be returning spurious 0's from pread()?
> > 
> > Seems like everything is happening from that one client--the file isn't
> > being simultaneously accessed from the server or from another client.
> 
> Is the file only being read, or could there be a simultaneous write to
> the same file? I'm surmising this could be an effect resulting from
> simultaneous cache invalidations: prior to Linux 2.6.20 or so, we
> weren't rigorously following the VFS/VM rules for page locking, and so
> page cache invalidation in particular could have some curious
> side-effects.

The file was created and opened O_CREAT|O_EXCL|O_RDWR, by this
process, written linearly using write(2), without any lseeks.
We kept the file descriptor open and starting issuing pread(2)
calls for earlier offsets we had alread written.  One of those
kicks back EOF far too early (and results in this bug report).

Note the only accesses we are using is write(2) and pread(2), and
once we start reading we don't ever go back to writing.  The pread(2)
calls are typically issued in ascending offsets, and we read each
position only once.  This is to try and take advantage of any
read-ahead the kernel may be able to do.  The pread(2) calls are
rarely (if ever) on a block/page boundary.

Nobody else should know about this file. Its written to a temporary
name and no other well behaved processes would attempt to read
the file until it gets closed and renamed to its final destination.
We haven't reached that far in the processing when we get this error,
so there should be only one file descriptor open on the inode, and
its the same one that wrote the data.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-30  0:32             ` Shawn O. Pearce
@ 2008-06-30 19:09               ` Nicolas Pitre
  0 siblings, 0 replies; 16+ messages in thread
From: Nicolas Pitre @ 2008-06-30 19:09 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Trond Myklebust, J. Bruce Fields, Junio C Hamano, logank,
	Christian Holtje, git, Trond Myklebust

On Sun, 29 Jun 2008, Shawn O. Pearce wrote:

> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > Is the file only being read, or could there be a simultaneous write to
> > the same file? I'm surmising this could be an effect resulting from
> > simultaneous cache invalidations: prior to Linux 2.6.20 or so, we
> > weren't rigorously following the VFS/VM rules for page locking, and so
> > page cache invalidation in particular could have some curious
> > side-effects.
> 
> The file was created and opened O_CREAT|O_EXCL|O_RDWR, by this
> process, written linearly using write(2), without any lseeks.
> We kept the file descriptor open and starting issuing pread(2)
> calls for earlier offsets we had alread written.  One of those
> kicks back EOF far too early (and results in this bug report).
> 
> Note the only accesses we are using is write(2) and pread(2), and
> once we start reading we don't ever go back to writing.

That's not exact.  With a thin pack, we continue appending data to the 
file after a bunch of pread() have occurred.  And only after those 
pread()'s do we know that we actually have a thin pack.

> The pread(2)
> calls are typically issued in ascending offsets, and we read each
> position only once.  This is to try and take advantage of any
> read-ahead the kernel may be able to do.

That's not exact.  The pread() calls are done when resolving deltas, 
hence a base object is read and every deltas based on it are recursively 
resolved to find their SHA1 signature.  Then another base is picked up 
and the same process repeated.  And in practice all those delta chains 
are all interleaced in the pack file due to the fact that objects are 
stored so to optimize access to recent commits.  Therefore they're more 
or less random.


Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 23:36     ` logank
  2008-06-26 23:38       ` Junio C Hamano
@ 2008-06-27  2:54       ` J. Bruce Fields
  2008-06-27 13:44         ` Christian Holtje
  2008-06-27 13:54       ` Christian Holtje
  2 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2008-06-27  2:54 UTC (permalink / raw)
  To: logank; +Cc: Junio C Hamano, Shawn O. Pearce, Christian Holtje, git

On Thu, Jun 26, 2008 at 04:36:27PM -0700, logank@sent.com wrote:
> On Jun 26, 2008, at 1:56 PM, Junio C Hamano wrote:
>
>>> "The file shouldn't be short unless someone truncated it, or there
>>> is a bug in index-pack.  Neither is very likely, but I don't think
>>> we would want to retry pread'ing the same block forever.
>>
>> I don't think we would want to retry even once.  Return value of 0  
>> from
>> pread is defined to be an EOF, isn't it?
>
> No, it seems to be a simple error-out in this case. We have 2.4.20  
> systems

That version's for the client or the server?

--b.

> with nfs-utils 0.3.3 and used to frequently get the same error  
> while pushing. I made a similar change back in February and haven't had a 
> problem since:
>
> diff --git a/index-pack.c b/index-pack.c
> index 5ac91ba..85c8bdb 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -313,7 +313,14 @@ static void *get_data_from_pack(struct object_entry 
> *obj)
> 	src = xmalloc(len);
> 	data = src;
> 	do {
> +		// It appears that if multiple threads read across NFS, the
> +		// second read will fail. I know this is awful, but we wait for
> +		// a little bit and try again.
> 		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> +		if (n <= 0) {
> +			sleep(1);
> +			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> +		}
> 		if (n <= 0)
> 			die("cannot pread pack file: %s", strerror(errno));
> 		rdy += n;
>
> I use a sleep request since it seems less likely that the other thread  
> will have an outstanding request after a second of waiting.
>
> -- 
>                                                         Logan Kennelly
>       ,,,
>      (. .)
> --ooO-(_)-Ooo--
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-27  2:54       ` J. Bruce Fields
@ 2008-06-27 13:44         ` Christian Holtje
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Holtje @ 2008-06-27 13:44 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: logank, Junio C Hamano, Shawn O. Pearce, git

On Jun 26, 2008, at 10:54 PM, J. Bruce Fields wrote:
> On Thu, Jun 26, 2008 at 04:36:27PM -0700, logank@sent.com wrote:
>> No, it seems to be a simple error-out in this case. We have 2.4.20
>> systems
>
> That version's for the client or the server?

For the bug I sent the information is:
Server:
Linux dev1 2.4.28 #3 SMP Tue Jan 18 14:59:40 EST 2005 i686 i686 i386  
GNU/Linux
Red Hat Linux release 9 (Shrike)

Client:
Linux dev2 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 30 12:18:01 EST 2007  
x86_64 x86_64 x86_64 GNU/Linux
CentOS release 4.4 (Final)

Ciao!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: pread() over NFS (again) [1.5.5.4]
  2008-06-26 23:36     ` logank
  2008-06-26 23:38       ` Junio C Hamano
  2008-06-27  2:54       ` J. Bruce Fields
@ 2008-06-27 13:54       ` Christian Holtje
  2 siblings, 0 replies; 16+ messages in thread
From: Christian Holtje @ 2008-06-27 13:54 UTC (permalink / raw)
  To: logank; +Cc: Junio C Hamano, Shawn O. Pearce, git

On Jun 26, 2008, at 7:36 PM, logank@sent.com wrote:
> diff --git a/index-pack.c b/index-pack.c
> index 5ac91ba..85c8bdb 100644
> --- a/index-pack.c
> +++ b/index-pack.c
> @@ -313,7 +313,14 @@ static void *get_data_from_pack(struct  
> object_entry *obj)
> 	src = xmalloc(len);
> 	data = src;
> 	do {
> +		// It appears that if multiple threads read across NFS, the
> +		// second read will fail. I know this is awful, but we wait for
> +		// a little bit and try again.
> 		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> +		if (n <= 0) {
> +			sleep(1);
> +			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> +		}
> 		if (n <= 0)
> 			die("cannot pread pack file: %s", strerror(errno));
> 		rdy += n;

This does work.  But the "unpacking objects" phase becomes very slow.   
I had 86 objects and I could see every time it did a sleep because the  
counter would wait a second (or more) before the next object was  
unpacked.

Ciao!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2008-06-30 19:10 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-26 16:40 pread() over NFS (again) [1.5.5.4] Christian Holtje
2008-06-26 20:46 ` Shawn O. Pearce
2008-06-26 20:56   ` Junio C Hamano
2008-06-26 21:05     ` Shawn O. Pearce
2008-06-26 21:36       ` Christian Holtje
2008-06-26 22:04       ` Junio C Hamano
2008-06-26 22:07         ` Shawn O. Pearce
2008-06-26 23:36     ` logank
2008-06-26 23:38       ` Junio C Hamano
2008-06-27  2:57         ` J. Bruce Fields
2008-06-27 14:50           ` Trond Myklebust
2008-06-30  0:32             ` Shawn O. Pearce
2008-06-30 19:09               ` Nicolas Pitre
2008-06-27  2:54       ` J. Bruce Fields
2008-06-27 13:44         ` Christian Holtje
2008-06-27 13:54       ` Christian Holtje

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).