hang on existing systems when exporting NFS share to new systems

linux-nfs.vger.kernel.org archive mirror

* hang on existing systems when exporting NFS share to new systems
@ 2010-07-28 13:44 Jason Keltz
  2010-07-28 17:42 ` J. Bruce Fields
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Keltz @ 2010-07-28 13:44 UTC (permalink / raw)
  To: linux-nfs

Hi.

My list of NFS exports has been gradually growing over the years.  Right
now, for example, my home directories are exported to around 800 hosts
(although only a relatively small subset of those will mount at the same
time).  I used to just add hosts to /etc/exports on the file server and
run "exportfs -r", and everything would be fine.  New systems would be
able to mount everything perfectly, and existing systems would not be
affected at all.  As the list has grown, I've been noticing a problem.
Now, when I run exportfs -r, there is a hang of approximately 7-10
seconds on the systems that have already mounted the share, and then
everything returns to normal.  This doesn't happen *while* exportfs -r is
running, but just after it exits.  I figured that maybe exportfs was
"unexporting"/re-exporting to hosts that already had the share in use,
which might have caused the problem, so I tried to manually add/remove
hosts, thinking that this would only affect those hosts, but it did not.
Exporting to one new host still causes the hang on all existing hosts.

Since I have multiple exports to all of the hosts, adding one new host
can hang things for a while.  I can see that reducing the list of
exports or hosts would reduce the delay.  What I am wondering is whether
there is a better way to add hosts without affecting connectivity to
existing hosts.

The NFS server itself is pretty powerful -- dual quad core box, lots of
memory, many NFS threads, exclusive NFS server, etc...  I am running an
older RHEL4 release though, so it would have an older kernel/NFS system.
Maybe this issue has been solved in newer releases.

Any feedback would be helpful.

Thanks!

Jason.

* Re: hang on existing systems when exporting NFS share to new systems
  2010-07-28 13:44 hang on existing systems when exporting NFS share to new systems Jason Keltz
@ 2010-07-28 17:42 ` J. Bruce Fields
  2010-07-31  2:07   ` Jason Keltz
  0 siblings, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2010-07-28 17:42 UTC (permalink / raw)
  To: Jason Keltz; +Cc: linux-nfs

On Wed, Jul 28, 2010 at 09:44:48AM -0400, Jason Keltz wrote:
> The NFS server itself is pretty powerful -- dual quad core box, lots
> of memory, many NFS threads, exclusive NFS server, etc...  I am
> running an older RHEL4 release though, so it would have an older
> kernel/NFS system.  Maybe this issue has been solved in newer
> releases.

There have been fixes in this area, though I don't see any that I'm sure
would address your problem.  If you could test with the latest nfs-utils
(ideally, with the latest nfs-utils and kernel) and let us know the
result, that would be helpful.

The -t option to rpc.mountd (may need a newer nfs-utils?) may also help.

Also worth filing an RHEL bug.

--b.

* Re: hang on existing systems when exporting NFS share to new systems
  2010-07-28 17:42 ` J. Bruce Fields
@ 2010-07-31  2:07   ` Jason Keltz
       [not found]     ` <4C53855F.5060201-Au2wK/vbkGRbjDw2KniTvg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Keltz @ 2010-07-31  2:07 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On 28/07/2010 1:42 PM, J. Bruce Fields wrote:
> There have been fixes in this area, though I don't see any that I'm sure
> would address your problem.  If you could test with the latest nfs-utils
> (ideally, with the latest nfs-utils and kernel) and let us know the
> result, that would be helpful.
>
> The -t option to rpc.mountd (may need a newer nfs-utils?) may also help.
>
> Also worth filing an RHEL bug.
>    

Hi Bruce,

I backported the -t option to RHEL4 by looking at the latest nfs-utils,
but it didn't fix the problem.
I'm having trouble compiling the latest nfs-utils for RHEL4 because of a
couple of changed libraries...

What I have learned:

1) Whether I run exportfs -r, manually add a single host with exportfs,
or even remove a host with exportfs -u, the delay to all the clients is
the same.  The delay doesn't change depending on the share.
2) The delay doesn't happen while exportfs is running.  It happens
immediately afterwards, and when it does, an strace of rpc.mountd shows
that rpc.mountd is busy resolving every single hostname in etab.  On one
of our NFS servers, this means a total of 13,000 DNS requests; on
another system, it's over 30,000 DNS requests (and around a 30 second
delay to all shares).  Activity on all the shares returns exactly when
rpc.mountd stops burdening the DNS.
3) I've tried changing /etc/exports to use just IPs, but exportfs
happily switches etab back to using hostnames, and then mountd does all
the lookups again.
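
To get a rough sense of how many lookups a reload might trigger, the
entries in etab can be counted.  The following is a minimal Python
sketch, not something from the original thread.  It assumes each etab
line has the form "<path> <client>(<options>)"; the function name and
the sample data are hypothetical, and whether rpc.mountd performs
exactly one lookup per entry is also an assumption.

```python
from collections import Counter


def count_etab_clients(etab_text):
    """Return (total_entries, per_client_counts) for etab-style text.

    Assumes each non-empty line looks like: <path> <client>(<options>)
    """
    per_client = Counter()
    for line in etab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split into the export path and the "client(options)" remainder.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        # Drop the option list to keep only the client name or address.
        client = parts[1].split("(", 1)[0]
        per_client[client] += 1
    return sum(per_client.values()), per_client


# Hypothetical sample data standing in for the real etab file.
sample = (
    "/home\thost1.example.com(rw,sync)\n"
    "/home\thost2.example.com(rw,sync)\n"
    "/data\thost1.example.com(ro,sync)\n"
)
total, per_client = count_etab_clients(sample)
print(total)                      # number of (export, client) entries
print(per_client.most_common(1))  # client that appears most often
```

With around 800 hosts and many exports per host, a count like this
reaches the thousands quickly, which would be consistent with the
request totals seen in the strace.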

I suppose that the reason why exportfs doesn't convert etab to just use 
IPs in the first place is because a name can resolve to multiple IPs... 
but if I start with a list of IPs in /etc/exports, it would be nice if 
they just stayed like that in etab, and if mountd could use them as 
is... what's the point of all the DNS requests? (first to generate etab, 
then from mountd a second time!)

The only thing I can think to try at this point is to populate
/etc/hosts locally on the file server and see whether the timing is
better than with the DNS requests.
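
One way to compare the two setups would be to time the lookups directly,
once with DNS and once with /etc/hosts populated.  This is a minimal
Python sketch under stated assumptions: the function name is
hypothetical, it resolves names one at a time through the system
resolver (socket.gethostbyname), and it does not reproduce what
rpc.mountd actually does internally.  The demo uses a stub resolver so
that it runs without any network access.

```python
import socket
import time


def time_lookups(hostnames, resolver=socket.gethostbyname):
    """Resolve each name in turn; return (elapsed_seconds, failures)."""
    failures = 0
    start = time.perf_counter()
    for name in hostnames:
        try:
            resolver(name)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            failures += 1
    return time.perf_counter() - start, failures


def fake_resolver(name):
    """Stub used only for this demo; not a real lookup."""
    if name.startswith("bad"):
        raise OSError("no such host")
    return "192.0.2.1"


elapsed, failures = time_lookups(
    ["a.example.com", "bad.example.com"], resolver=fake_resolver
)
print(f"{failures} failure(s) in {elapsed:.6f}s")
```

Calling time_lookups with the real host list and the default resolver,
before and after populating /etc/hosts, would give a direct comparison.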

If someone has any suggestions, I'd love to hear them.

Thanks!

Jason.

* Re: hang on existing systems when exporting NFS share to new systems
       [not found]     ` <4C53855F.5060201-Au2wK/vbkGRbjDw2KniTvg@public.gmane.org>
@ 2010-10-03 21:13       ` J. Bruce Fields
  2010-10-04 13:21         ` Jason Keltz
  0 siblings, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2010-10-03 21:13 UTC (permalink / raw)
  To: Jason Keltz; +Cc: linux-nfs

On Fri, Jul 30, 2010 at 10:07:27PM -0400, Jason Keltz wrote:
> If someone has any suggestions, I'd love to hear them.

Did you ever figure out anything more about the problem?

--b.

* Re: hang on existing systems when exporting NFS share to new systems
  2010-10-03 21:13       ` J. Bruce Fields
@ 2010-10-04 13:21         ` Jason Keltz
       [not found]           ` <4CA9D4BC.1060201-Au2wK/vbkGRbjDw2KniTvg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Keltz @ 2010-10-04 13:21 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

J. Bruce Fields wrote:
> Did you ever figure out anything more about the problem?

Hi Bruce,

Actually, since I was sharing out over a private network, I did not need 
to include every address in the list, and was able to use an IP range. 
Now, I share out to pretty much the same host list, but export is 
instantaneous.  I still believe that I *should* have been able to export 
to the large number of hosts without the tremendous delay, but I was not 
able to solve the problem with my current (RHEL4) installation.  We will 
upgrade to RHEL6 when it's available, so I might try again with that 
just for fun.
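
As an illustration of replacing a long per-host list with ranges (not
the configuration actually used here), a list of client addresses can be
collapsed into the smallest set of networks that covers exactly those
addresses.  This is a minimal Python sketch: the function name, the
export path, the options, and the 10.0.0.x addresses are all
hypothetical.

```python
import ipaddress


def export_lines(path, addresses, options="rw,sync"):
    """Build exports-style lines from a list of client IP addresses.

    Adjacent addresses are merged into CIDR networks, so 256 consecutive
    hosts become a single /24 entry.
    """
    networks = ipaddress.collapse_addresses(
        ipaddress.ip_network(addr) for addr in addresses
    )
    return [f"{path} {net}({options})" for net in networks]


# Hypothetical client list: every address in 10.0.0.0 - 10.0.0.255.
clients = [f"10.0.0.{i}" for i in range(256)]
for line in export_lines("/home", clients):
    print(line)
```

Note that this only merges addresses that are actually listed.  On a
private network, exporting to the whole range, as described above, may
be simpler still.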

Jason.

* Re: hang on existing systems when exporting NFS share to new systems
       [not found]           ` <4CA9D4BC.1060201-Au2wK/vbkGRbjDw2KniTvg@public.gmane.org>
@ 2010-10-04 15:52             ` J. Bruce Fields
  0 siblings, 0 replies; 6+ messages in thread
From: J. Bruce Fields @ 2010-10-04 15:52 UTC (permalink / raw)
  To: Jason Keltz; +Cc: linux-nfs

On Mon, Oct 04, 2010 at 09:21:00AM -0400, Jason Keltz wrote:
> Actually, since I was sharing out over a private network, I did not
> need to include every address in the list, and was able to use an IP
> range. Now, I share out to pretty much the same host list, but
> export is instantaneous.  I still believe that I *should* have been
> able to export to the large number of hosts without the tremendous
> delay, but I was not able to solve the problem with my current
> (RHEL4) installation.  We will upgrade to RHEL6 when it's available,
> so I might try again with that just for fun.

OK, thanks--let us know of any results you get, positive or negative.

--b.
