* NFS cache problem
@ 2005-05-25 23:01 Anton Starikov
  2005-05-25 23:34 ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Anton Starikov @ 2005-05-25 23:01 UTC (permalink / raw)
  To: nfs

I have a file server exporting /home directories via NFSv3 to a few desktops
and a small cluster. The file server has two NICs, one for the cluster and one
for the desktops. The kernel version is 2.6.5.


export options are (exportfs -v):
/home           192.168.211.0/24(rw,async,no_root_squash)

mount options are (cat /proc/mounts):
192.168.211.240:/home /home nfs rw,sync,v3,rsize=32768,wsize=32768,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,intr,tcp,noac,lock,addr=192.168.211.240 0 0

Time on all clients (desktops and cluster) is synchronized via NTP.

But from time to time I get a strange situation. After a file is rewritten
on one host, for a long time (up to a few hours!) some hosts read the new
file while other hosts still read the old one.

I tried playing with all the options without any result. Exporting with
"sync" seems to be a solution, but performance is really very low
(because of this, "sync" wasn't really tested). And I don't need
"sync" because I'm not interested in saving to "real" media;
the file cache is good enough for me. And, in principle, "sync/async" on
the server side should be irrelevant in this case (on the client side I
have "sync", which should be enough).

To avoid future discussion: there is nothing "cluster specific" here. The
cluster is very small and there is nothing like concurrent read/write.
Basically, the only specific thing is that the same data can be accessed from
different hosts, and usually not even at the same time.
You write a file on one host, let's say, and a couple of minutes later you
read it from a different host (you prepare input data on a client host or the
master node and then submit a job to the queue). That should be OK, but in
my case, for up to a few hours, clients can see chaos, reading different
versions of the file.
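
One way to pin down which hosts are serving stale data is to print what each client currently sees for the same path and compare the lines. The following is only a minimal sketch, assuming GNU coreutils (`stat -c`, `sha256sum`) on every client; the function name `fingerprint` and the example path are hypothetical and not taken from this thread.

```shell
#!/bin/sh
# Print "size mtime sha256" for one file, as seen by this host.
# Assumes GNU stat and sha256sum; "fingerprint" is a hypothetical helper name.
fingerprint() {
    file=$1
    size=$(stat -c %s "$file") || return 1     # size in bytes
    mtime=$(stat -c %Y "$file") || return 1    # mtime, seconds since epoch
    sum=$(sha256sum "$file" | cut -d' ' -f1) || return 1
    printf '%s %s %s\n' "$size" "$mtime" "$sum"
}

# Usage (hypothetical path), run on each client after rewriting the file:
#   fingerprint /home/user/input.dat
```

If two clients print different lines for the same path well after the write has finished, that records which hosts were stale, and the mtime column shows whether the stale host has at least seen the new attributes.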

Does anybody have any ideas how to solve the problem?

BTW, hardware configuration.
server:
3ware SATA RAID, 2x Xeon CPUs (nfsd started with 8 threads). Intel and
Broadcom GbE NICs.

Clients - mostly dual Opteron machines.

Actually, I have a strong feeling that the problem became much more
"visible" when I added a second CPU to the server. It existed before, but
usually for no longer than 10 minutes. Now my users report delays of
hours. This is incredible. Basically, work in my group is partly paralysed
now :(

Of course there are things like Lustre, PVFS and so on. But I
believe that my case isn't a proper case for starting to use such filesystems.
NFS should be more than enough.

Thanks,
	Anton Starikov.



-------------------------------------------------------
SF.Net email is sponsored by: GoToMeeting - the easiest way to collaborate
online with coworkers and clients while avoiding the high cost of travel and
communications. There is no equipment to buy and you can meet as often as
you want. Try it free.http://ads.osdn.com/?ad_id=7402&alloc_id=16135&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* RE: NFS cache problem
@ 2005-05-26  1:21 Lever, Charles
  2005-05-26  1:27 ` Anton Starikov
  0 siblings, 1 reply; 8+ messages in thread
From: Lever, Charles @ 2005-05-26  1:21 UTC (permalink / raw)
  To: Anton Starikov; +Cc: nfs

hi anton-

> I use ReiserFS.
> And here we are not talking about seconds... we are talking about hours
> sometimes. That seems too strange to me. For seconds I can find plenty of
> explanations :)
> In principle, even if it were one minute, I'd be much happier than
> now.

> > If you seriously need uncached reads and writes, then you should rather
> > consider using O_DIRECT.
> Unfortunately this is not a trivial problem. A lot of software is
> involved here. And I believe that NFS should be able to work properly
> under these conditions. At least it did with Solaris in a similar
> environment.

if you have a way of reproducing this condition, maybe you could capture
a network trace on the server while running your test case...  "tcpdump
-s0 -w dumpfile" and post it on the web so we can take a look at what's
going on.
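
As a rough sketch of the capture suggested above, narrowed to one client so the dump stays small: the interface name, client address and helper name `capture_cmd` here are placeholders, not values from this thread. The function only prints the command so it can be reviewed before being run as root on the server.

```shell
#!/bin/sh
# Build the tcpdump command line for a full-packet capture of one client's
# traffic. All arguments are placeholders chosen by the person running it.
capture_cmd() {
    iface=$1               # server NIC facing the client, e.g. eth1 (assumed)
    client=$2              # address of the client that reads stale data
    outfile=${3:-dumpfile} # output file, defaults to "dumpfile"
    printf 'tcpdump -i %s -s0 -w %s host %s\n' "$iface" "$outfile" "$client"
}

# Print the command for review (example values are hypothetical):
capture_cmd eth1 192.168.211.10
```

Start the capture just before rewriting the file, read it from the stale client, then stop tcpdump with Ctrl-C and post the resulting file.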




end of thread, other threads:[~2005-05-26  2:12 UTC | newest]

Thread overview: 8+ messages
2005-05-25 23:01 NFS cache problem Anton Starikov
2005-05-25 23:34 ` Trond Myklebust
2005-05-25 23:44   ` Anton Starikov
2005-05-26  0:12     ` Trond Myklebust
2005-05-26  0:29       ` Anton Starikov
2005-05-26  2:12         ` Trond Myklebust
  -- strict thread matches above, loose matches on Subject: below --
2005-05-26  1:21 Lever, Charles
2005-05-26  1:27 ` Anton Starikov
