linux-nfs.vger.kernel.org archive mirror
* NFSv4 server on ARM
@ 2011-05-29 13:25 Nigel Roberts
  2011-06-05  1:39 ` Nigel Roberts
  2011-10-23  3:55 ` Nigel Roberts
  0 siblings, 2 replies; 4+ messages in thread
From: Nigel Roberts @ 2011-05-29 13:25 UTC (permalink / raw)
  To: linux-nfs

I've run into a problem with running an nfsv4 server on a Marvell
Kirkwood (armv5tel) platform. When copying a large file (>1GB) to 
the NFS server, the write speed will suddenly slow down to ~750kB/s
and the CPU wait will jump to 100% for the remainder of the transfer.

Other important points:

* It doesn't happen with nfsv3, any other protocol, or any other
I/O that I have tried.
* Other I/O on the server works fine while this is happening.
* The next operation starts out fine and works normally, unless it
runs into the same problem again.
* The server is running Debian testing (ARM) with kernel 2.6.39;
2.6.38 and 2.6.32 show the same problem.
* I'm using gss/krb5i for the mounts (example mount below).
* It doesn't seem to affect x86_64 servers running the same kernel
version and distribution.
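
For reference, the client mount is along these lines (the server
name and paths here are placeholders, not the real ones):

  mount -t nfs4 -o proto=tcp,sec=krb5i server.example.com:/export /mnt/nfs4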

Here is what vmstat 1 shows just as the problem occurs:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0   2828  44044 169100    0    0     0 35924 2931 2661  0 86 12  2
 2  1      0   3084  44044 168656    0    0     0  5156 2744 4684  0 47 53  0
 2  0      0   2876  44048 169144    0    0     0    34 2804 5185  0 37 55  8
 1  0      0   3072  44048 168740    0    0     0     0 3068 5870  0 39 61  0

All normal up to this point, and it may stay this way for hundreds
of MB, but then things go pear-shaped:

 0  1      0   3732  44092 167768    0    0     0 16486 1162 3489  0 31 4 65
 0  1      0   3252  44188 168184    0    0     0   728  423  481  0  1 0 99
 0  1      0   2728  44264 168632    0    0     0   784  426  509  0  0 0 100
 0  1      0   2968  44316 168540    0    0     0   728  394  489  0  0 0 100
 0  1      0   3300  44396 168220    0    0     0   784  422  517  0  0 0 100

Does anyone have any ideas about what might cause this, and how I
can help get it fixed?

Regards,
Nigel


* Re: NFSv4 server on ARM
  2011-05-29 13:25 NFSv4 server on ARM Nigel Roberts
@ 2011-06-05  1:39 ` Nigel Roberts
  2011-10-23  3:55 ` Nigel Roberts
  1 sibling, 0 replies; 4+ messages in thread
From: Nigel Roberts @ 2011-06-05  1:39 UTC (permalink / raw)
  To: linux-nfs

On Sun, 29 May 2011 13:25:15 +0000, Nigel Roberts wrote:

> I've run into a problem with running an nfsv4 server on a Marvell
> Kirkwood (armv5tel) platform. When copying a large file (>1GB) to 
> the NFS server, the write speed will suddenly slow down to ~750kB/s
> and the CPU wait will jump to 100% for the remainder of the transfer.

I was going to try to spend some time figuring this out this
weekend, but after updating to 2.6.39.1 for other reasons the
problem seems to have been resolved and I can no longer reproduce
it. I can't see anything in the changelog that might explain this,
but I'm wondering if it was the client (Ubuntu Natty) that was
triggering the issue. If it does start happening again I will be
sure to get some packet captures.

This is just a message for people with similar problems going through
the archives.

Regards,
Nigel


* Re: NFSv4 server on ARM
  2011-05-29 13:25 NFSv4 server on ARM Nigel Roberts
  2011-06-05  1:39 ` Nigel Roberts
@ 2011-10-23  3:55 ` Nigel Roberts
  2011-11-22 19:04   ` J. Bruce Fields
  1 sibling, 1 reply; 4+ messages in thread
From: Nigel Roberts @ 2011-10-23  3:55 UTC (permalink / raw)
  To: linux-nfs

On 05/29/2011 11:25 PM, Nigel Roberts wrote:
> I've run into a problem with running an nfsv4 server on a Marvell
> Kirkwood (armv5tel) platform. When copying a large file (>1GB) to
> the NFS server, the write speed will suddenly slow down to ~750kB/s
> and the CPU wait will jump to 100% for the remainder of the transfer.

I've been doing some large file transfers recently and I've run into 
another similar problem, but this time it's system CPU instead of I/O 
wait. I've done some more testing and I've found the following:

* It seems to affect only nfsv4; I can't reproduce it with nfsv3.
* It appears to be triggered when free memory is low, i.e. the file
size is large enough for the page cache to reach its maximum (see
the cache-reset sketch below this list).
* It happens with both SLAB and SLUB.
* It happens with sec=krb5, krb5i and krb5p.
* If I transfer a file that's small enough to fit into free memory,
the problem doesn't occur.
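
For anyone trying to reproduce this, the cache state can be reset
between runs with the usual knob (a minimal sketch, run as root on
the server):

  # flush dirty data, then drop page cache, dentries and inodes
  sync
  echo 3 > /proc/sys/vm/drop_caches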

Here are the exports on the server (Debian Squeeze with a 3.0.7
kernel and the latest nfs packages from testing, on an Iomega
ix2-200 Marvell Kirkwood system):

/var/store/exports/backups *(sec=krb5:krb5i:krb5p,rw,nohide,sync,no_subtree_check)
/var/store/exports/backups oobie.home.nobiscuit.com(rw,async,no_subtree_check)
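
Nothing exotic on the export side; after editing /etc/exports the
table is reloaded in the standard way:

  # re-export everything in /etc/exports
  exportfs -ra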

Here are the fstab entries on the client (Ubuntu Oneiric, x86_64):

blue.home.nobiscuit.com:/var/store/exports/backups	/test/backup-nfs3	nfs	rw,proto=tcp,user,noauto	0	0
blue.home.nobiscuit.com:/backups	/test/backup-nfs4	nfs4	rw,proto=tcp,user,noauto,sec=krb5	0	0
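
With user,noauto set, each test is just a mount by path followed by
a copy of a large file, something like (the file name is only an
example):

  mount /test/backup-nfs4
  cp /tmp/bigfile.bin /test/backup-nfs4/
  umount /test/backup-nfs4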


Here's what an nfsv3 transfer looks like in vmstat (vmstat 2
output, with the swap columns removed):

procs -----------memory---------- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache    bi    bo   in   cs us sy id wa
  0  0   1940   4068   8516 214252    0     0   32   10  0  0 100  0
  0  0   1940   4068   8516 214252    0     0   32   11  0  0 100  0
  0  0   1940   4068   8516 214252    0     0   32    9  0  0 100  0
<transfer starts here>
  2  0   1940 204348   8516  16588    0   194 2139 2951  0 34 66  0
  0  0   1940 190824   8516  29988    0  7522 3835 5188  0 42 59  0
  1  0   1940 176148   8528  44644    0  5957 4266 5394  0 50 49  2
  2  0   1940 163248   8532  57588    0  4095 3832 5006  0 34 66  0
  1  0   1940 149456   8536  71308    0  5388 3972 5117  0 39 62  0
  2  0   1940 137104   8544  83404    0 10068 3642 4624  0 42 56  3
  4  0   1940 127336   8544  92364    0  5761 2737 3796  0 32 68  0
  1  0   1940 112804   8552 107308    0  2393 4314 5613  0 42 58  1
  0  0   1940  98284   8552 121612    0  8698 4131 5324  0 40 60  0
  2  0   1940  84604   8552 135116    0  5316 4017 5192  0 48 52  0
  0  0   1940  72244   8560 147308    0  8186 3523 4625  0 42 57  2
  0  0   1940  59944   8560 159340    0  8860 3564 4602  0 41 59  0
  1  0   1940  46684   8568 172396    0  2064 3824 5040  0 37 62  1
  1  0   1940  33784   8568 185132    0  7026 3506 4612  0 37 63  0
  1  0   1940  20824   8572 197876    0  9442 3780 4862  0 39 62  0
  1  0   1940   7264   8580 211220    0  5844 3882 5133  0 42 57  2
  0  0   1940   2704   8448 215824    0  4098 4271 5392  0 47 54  0
  0  0   1940   2688   8452 215680    0  8450 4109 5123  1 47 50  3
  0  0   1940   3064   8444 215276    0  7300 3787 4909  0 34 66  0
  0  0   1940   3116   8444 215212    0  8850 3488 4524  0 37 63  0
  1  0   1940   2688   8456 215564    0  8815 3982 4924  0 41 58  2
  0  0   1940   2980   8448 215488    0     1 4101 5408  0 44 57  0
  3  0   1940   3688   8456 214108    0  9047 3783 4880  0 44 55  2
  1  0   1940   2816   8448 215636    0  8867 3285 4452  0 35 66  0
...
The transfer continues to the end with neither a sudden jump in the sy 
number nor any drop in bo.

Here is a transfer with the same file to the same location, using nfsv4 
with sec=krb5:

procs -----------memory---------- -----io---- -system-- ----cpu----
  r  b   swpd   free   buff  cache    bi    bo   in   cs us sy id wa
  0  0   1940   3072   8432 215320    0     0   32   11  0  0 100  0
  0  0   1940   3072   8432 215320    0     0   31   10  0  0 100  0
  0  0   1940   3072   8432 215320    0     0   31   10  0  0 100  0
<transfer starts here>
  0  0   1940   3072   8440 215344    6    20   75  101  0  1 97  3
  0  0   1940   3072   8456 215344    0    20  125  172  1  1 93  5
  0  0   1940 203404   8476  17556    0    50 2394 3514  0 44 52  5
  1  0   1940 191316   8480  29216    0  9299 3333 4694  0 39 61  0
  2  0   1940 179876   8496  40952    0    26 3382 4837  0 42 56  1
  1  0   1940 167472   8496  53364    0  6605 3514 5091  0 37 63  0
  0  0   1940 155892   8496  64948    0  5128 3249 4691  0 37 63  0
  1  0   1940 144312   8504  76372    0  4664 3182 4652  0 38 61  2
  1  0   1940 130856   8504  89588    0  6760 3694 5102  0 44 57  0
  2  0   1940 115272   8516 104988    0  6519 4308 5937  0 50 49  2
  2  0   1940 100752   8516 119220    0  5052 4062 5498  0 48 53  0
  1  0   1940  85512   8516 134236    0  6425 4144 5791  0 49 51  0
  0  0   1940  72192   8524 147388    0  4962 3629 5177  0 40 57  4
  2  0   1940  59532   8524 159708    0  4928 3439 5080  0 33 68  0
  2  0   1940  47652   8532 171452    0 15865 3290 4709  0 47 51  3
  2  0   1940  33372   8532 185500    0  6142 3874 5541  0 47 54  0
  3  0   1940  19632   8532 199036    0  6709 3793 5270  0 45 56  0
  2  0   1940   6956   8540 211520    0  6598 3556 4900  0 40 58  2
  2  0   1940   3132   8408 215328    0  4092 3494 5157  0 39 61  0
<sudden drop in performance here>
  2  1   1940   2796   8540 215364    0  9439 1021 1191  0 92  9  0
  1  1   1940   3096   8724 214612    0   740  429  486  0 100  0  0
  1  1   1940   3096   8900 214656    0   728  424  471  0 100  0  0
  1  1   1940   2856   9076 214808    0   760  449  477  0 100  0  0
  1  1   1940   2616   9252 214820    0   712  420  466  0 100  0  0
  1  1   1940   3096   9428 214108    0   792  467  490  0 100  0  0
  1  1   1940   2976   9620 214120    0   784  456  498  0 100  0  0
  1  1   1940   2556   9804 214292    0   804  461  495  0 100  0  0
...

The transfer will eventually complete, but obviously it takes much longer.

At the point where free memory reaches its lowest point, note the
sudden increase in sy and the big drop-off in bo. Is it a memory
allocation problem? I've tried increasing the logging for nfsd,
but there's nothing obvious that I can see.
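
For reference, the sort of nfsd logging increase I tried was along
these lines:

  # enable all nfsd and sunrpc debug flags (very noisy, goes to
  # the kernel log)
  rpcdebug -m nfsd -s all
  rpcdebug -m rpc -s all
  # turn them off again afterwards
  rpcdebug -m nfsd -c all
  rpcdebug -m rpc -c all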

Are there some other statistics I should be looking at? I've tried
to get ftrace working, but I haven't had any luck yet (the ftrace
self-tests fail on boot).
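
For what it's worth, this is the sort of thing I was attempting
(assuming debugfs and the function tracer are built in), before the
boot-time self-tests got in the way:

  mount -t debugfs nodev /sys/kernel/debug
  cd /sys/kernel/debug/tracing
  echo function > current_tracer
  head -n 50 trace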

Regards,
Nigel


* Re: NFSv4 server on ARM
  2011-10-23  3:55 ` Nigel Roberts
@ 2011-11-22 19:04   ` J. Bruce Fields
  0 siblings, 0 replies; 4+ messages in thread
From: J. Bruce Fields @ 2011-11-22 19:04 UTC (permalink / raw)
  To: Nigel Roberts; +Cc: linux-nfs

On Sun, Oct 23, 2011 at 02:55:50PM +1100, Nigel Roberts wrote:
> On 05/29/2011 11:25 PM, Nigel Roberts wrote:
> >I've run into a problem with running an nfsv4 server on a Marvell
> >Kirkwood (armv5tel) platform. When copying a large file (>1GB) to
> >the NFS server, the write speed will suddenly slow down to ~750kB/s
> >and the CPU wait will jump to 100% for the remainder of the transfer.
> 
> I've been doing some large file transfers recently and I've run into
> another similar problem, but this time it's system CPU instead of
> I/O wait. I've done some more testing and I've found the following:
> 
> * It seems to affect only nfsv4; I can't reproduce it with nfsv3.
> * It appears to be triggered when free memory is low, i.e. the file
> size is large enough for the page cache to reach its maximum.
> * It happens with both SLAB and SLUB.
> * It happens with sec=krb5, krb5i and krb5p.
> * If I transfer a file that's small enough to fit into free memory,
> the problem doesn't occur.

That's interesting!

> Here's what an nfsv3 transfer looks like in vmstat (vmstat 2
> output, with the swap columns removed):
...
> Here is a transfer with the same file to the same location, using
> nfsv4 with sec=krb5:

You're changing two things at once there (NFS version and security
flavor).  How about trying nfsv4 with sec=sys?
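
Untested, but reusing your existing fstab entry it would be
something like this (assuming the export also allows sec=sys from
that client):

blue.home.nobiscuit.com:/backups	/test/backup-nfs4	nfs4	rw,proto=tcp,user,noauto,sec=sys	0	0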

...
> The transfer will eventually complete, but obviously it takes much longer.
> 
> At the point where free memory reaches its lowest point, note the
> sudden increase in sy and the big drop-off in bo. Is it a memory
> allocation problem? I've tried increasing the logging for nfsd,
> but there's nothing obvious that I can see.
> 
> Are there some other statistics I should be looking at? I've tried
> to get ftrace working, but I haven't had any luck yet (the ftrace
> self-tests fail on boot).

Yes, some kind of profiling would be useful.  (I'm not sure what to
recommend.)
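
If perf runs on that box, a whole-system profile taken during the
slow phase might at least show where the cycles are going; roughly:

  # sample all CPUs with call graphs for 30 seconds, then view
  perf record -a -g sleep 30
  perf report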

--b.

