NFS performance degradation of local loopback FS.

* NFS performance degradation of local loopback FS.
@ 2008-06-19  6:46 Krishna Kumar2
  2008-06-19  9:58 ` Krishna Kumar2
  0 siblings, 1 reply; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-19  6:46 UTC (permalink / raw)
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 873 bytes --]

Hi,

I am running 2.6.25 kernel on a [4 way, 3.2 x86_64, 4GB] system. The test
is doing I/O on a local ext3 filesystem, and measuring the bandwidth, and
then NFS mounting the filesystem loopback on the same system. I have
configured 64 nfsd's to run. The test script is attached at the bottom.

My configuration is:
      /dev/some-local-disk  :            /local
      NFS mount /local       :            /nfs

The result is:
      200 processes:
            /local: 108000 KB/s
            /nfs:     66000 KB/s: Drop of 40%

      300 processes (KB/s):
            /local: 112000 KB/s
            /nfs:    57000 KB/s: Drop of 50%
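
For reference, the percentages above follow from (local - nfs) / local. A minimal sketch of that arithmetic in shell; the helper name pct_drop is illustrative and is not part of the attached script:

```shell
# pct_drop LOCAL NFS: print the percentage by which NFS falls short of LOCAL.
# (pct_drop is a hypothetical helper, not part of the attached script.)
pct_drop() {
	awk -v l="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (l - n) * 100 / l }'
}

pct_drop 108000 66000	# 200 processes
pct_drop 112000 57000	# 300 processes
```

The two calls print 38.9 and 49.1, which the figures above round to 40% and 50%.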

I am not using any tuning, though I have tested with both
sunrpc.tcp_slot_table_entries=16 and 128.

Is a drop this big expected for a loopback NFS mount? Any
feedback/suggestions would be much appreciated.

Thanks,

- KK

(See attached file: nfs)

[-- Attachment #2: nfs --]
[-- Type: application/octet-stream, Size: 865 bytes --]

typeset -i i

# Arguments: I/O size, Processes, Time to run, Filesystem prefix, eg:
#	4096 200 10 /local/ddp; AND:
#	4096 200 10 /nfs/ddp
# For bufsize:4K, procs:200, time:10secs, Filesystem prefix: remainder.
#       where /local is an ext3 filesystem NFSv3 mounted on /nfs

bufsize=$1
max=$2
time=$3
prefix=$4

dir=`dirname $prefix`
localprefix=`echo $prefix | sed 's/nfs/local/'`

i=0
while [ $i -lt $max ]
do
	dd if=/dev/zero of=$prefix.$i bs=$bufsize count=100000000000 &
	i=$i+1
done

sleep $time

kill -9 `ps | grep dd | grep -v grep | awk '{print $1}'` > /dev/null 2>&1

# kill takes too long to finish, so preserve the state using the local
# filesystem to get immediate results.
ls -l $localprefix* > /tmp/$$

# Sum the size column (field 5 of ls -l). "add" is not a standard utility,
# so the total is computed with awk instead.
total=`awk '{ sum += $5 } END { printf "%.0f\n", sum }' /tmp/$$`
bw=`echo "$total / $time / 1024" | bc`
echo "Total: $total bytes in $time sec = $bw KB/sec"

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-19  6:46 NFS performance degradation of local loopback FS Krishna Kumar2
@ 2008-06-19  9:58 ` Krishna Kumar2
  2008-06-19 12:04   ` Peter Staubach
  0 siblings, 1 reply; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-19  9:58 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: linux-nfs

>       200 processes:

By "200 processes", I meant 200 dd's, each reading from /dev/zero and
writing to a file on the filesystem. The script "nfs" was run twice, first
with a local filesystem and the second time with the same filesystem NFS
mounted.

Thanks,

- KK



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-19  9:58 ` Krishna Kumar2
@ 2008-06-19 12:04   ` Peter Staubach
  2008-06-19 12:52     ` Benny Halevy
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Staubach @ 2008-06-19 12:04 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: linux-nfs

Krishna Kumar2 wrote:
>>       200 processes:
>>     
>
> By "200 processes", I meant 200 dd's, each reading from /dev/zero and
> writing to a file on the filesystem. The script "nfs" was run twice, first
> with
> a local filesystem and the second time with the same filesystem NFS
> mounted.
>
>   

Well, you aren't exactly comparing apples to apples.  The NFS
client does close-to-open semantics, meaning that it writes
all modified data to the server on close.  The dd commands run
on the local file system do not.  You might try using
something which does an fsync before closing so that you are
making a closer comparison.

All that said, yes, one would expect a slow down.  How much is
debatable and varies from platform to platform and load to load.

I would also advise care when running NFS like that.  It is
subject to deadlock and is not recommended.

       ps



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-19 12:04   ` Peter Staubach
@ 2008-06-19 12:52     ` Benny Halevy
  2008-06-20  6:39       ` Krishna Kumar2
  2008-06-20  9:21       ` Krishna Kumar2
  0 siblings, 2 replies; 32+ messages in thread
From: Benny Halevy @ 2008-06-19 12:52 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: Peter Staubach, linux-nfs

On Jun. 19, 2008, 15:04 +0300, Peter Staubach <staubach@redhat.com> wrote:
> Krishna Kumar2 wrote:
>>>       200 processes:
>>>     
>> By "200 processes", I meant 200 dd's, each reading from /dev/zero and
>> writing to a file on the filesystem. The script "nfs" was run twice, first
>> with
>> a local filesystem and the second time with the same filesystem NFS
>> mounted.
>>
>>   
> 
> Well, you aren't exactly comparing apples to apples.  The NFS
> client does close-to-open semantics, meaning that it writes
> all modified data to the server on close.  The dd commands run
> on the local file system do not.  You might try using
> something which does an fsync before closing so that you are
> making a closer comparison.

try dd conv=fsync ...
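
A minimal sketch of what that could look like, assuming GNU dd. The function name and the example paths are illustrative only; the block size and count in the example are placeholders:

```shell
# write_synced FILE BS COUNT: write COUNT blocks of BS bytes from /dev/zero
# to FILE. conv=fsync makes dd fsync the output file before it exits, so
# the time dd reports includes flushing the data out of the page cache.
# (write_synced is a hypothetical helper; conv=fsync is a GNU dd option.)
write_synced() {
	dd if=/dev/zero of="$1" bs="$2" count="$3" conv=fsync
}

# Example (not run here): same amount of data on each mount point.
#   write_synced /local/ddp.0 4096 100000
#   write_synced /nfs/ddp.0   4096 100000
```

With the fsync included on both sides, the local run no longer gets credit for data that is still sitting in memory when dd exits.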

Benny



-- 
Benny Halevy
Software Architect
Tel/Fax: +972-3-647-8340
Mobile: +972-54-802-8340
bhalevy@panasas.com
 
Panasas, Inc.
The Leader in Parallel Storage
www.panasas.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-19 12:52     ` Benny Halevy
@ 2008-06-20  6:39       ` Krishna Kumar2
  2008-06-20  9:21       ` Krishna Kumar2
  1 sibling, 0 replies; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-20  6:39 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, Peter Staubach

Thanks Peter for your explanation, and Benny for this option I was not
aware of. Let me run some tests with this option.

Regards,

- KK



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-19 12:52     ` Benny Halevy
  2008-06-20  6:39       ` Krishna Kumar2
@ 2008-06-20  9:21       ` Krishna Kumar2
  2008-06-22  8:35         ` Benny Halevy
  1 sibling, 1 reply; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-20  9:21 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, Peter Staubach

Benny Halevy <bhalevy@panasas.com> wrote on 06/19/2008 06:22:42 PM:

> > Well, you aren't exactly comparing apples to apples.  The NFS
> > client does close-to-open semantics, meaning that it writes
> > all modified data to the server on close.  The dd commands run
> > on the local file system do not.  You might try using
> > something which does an fsync before closing so that you are
> > making a closer comparison.
>
> try dd conv=fsync ...

I ran a single 'dd' with this option on /local and later on /nfs (same
filesystem nfs mounted on the same system). The script is umounting and
mounting local and nfs partitions between each 'dd'. Following are the
file sizes for 20 and 60 second runs respectively:
      -rw-r--r-- 1 root root 1558056960 Jun 20 14:41 local.1
      -rw-r--r-- 1 root root  671834112 Jun 20 14:41 nfs.1     (56% drop)
                        &
      -rw-r--r-- 1 root root 3845812224 Jun 20 14:42 local.1
      -rw-r--r-- 1 root root 2420342784 Jun 20 14:43 nfs.1     (37% drop)

Since I am new to NFS, I am not sure if this much degradation is expected,
or whether I need to tune something. Is there some code I can look at or
hack into to find possible locations for the performance fall? At this time
I cannot even tell whether the *possible* bug is in server or client code.

Thanks,

- KK


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-20  9:21       ` Krishna Kumar2
@ 2008-06-22  8:35         ` Benny Halevy
  2008-06-23  8:11           ` Krishna Kumar2
  0 siblings, 1 reply; 32+ messages in thread
From: Benny Halevy @ 2008-06-22  8:35 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: linux-nfs, Peter Staubach

On Jun. 20, 2008, 12:21 +0300, Krishna Kumar2 <krkumar2@in.ibm.com> wrote:
> Benny Halevy <bhalevy@panasas.com> wrote on 06/19/2008 06:22:42 PM:
> 
>>> Well, you aren't exactly comparing apples to apples.  The NFS
>>> client does close-to-open semantics, meaning that it writes
>>> all modified data to the server on close.  The dd commands run
>>> on the local file system do not.  You might try using
>>> something which does an fsync before closing so that you are
>>> making a closer comparison.
>> try dd conv=fsync ...
> 
> I ran a single 'dd' with this option on /local and later on /nfs (same
> filesystem nfs mounted on the same system). The script is umounting and
> mounting local and nfs partitions between each 'dd'. Following are the
> file sizes for 20 and 60 second runs respectively:

According to dd's man page, the f{,data}sync options tell it to
"physically write output file data before finishing".
If you kill it before that, you end up with dirty data in the cache.
What exactly are you trying to measure, and what is the expected
application workload?
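
One way to sidestep the dirty-cache problem with several writers is to give each dd a fixed amount to write and wait for all of them, rather than killing them after a timeout. A rough sketch, assuming GNU dd and GNU date (for %N); the function name and its arguments are hypothetical:

```shell
# parallel_write PREFIX PROCS BS COUNT
# Start PROCS dd writers, each writing COUNT blocks of BS bytes with
# conv=fsync, wait for all of them, then print the aggregate KB/s.
# (parallel_write is a hypothetical helper; BS must be a plain byte count.)
parallel_write() {
	prefix=$1 procs=$2 bs=$3 count=$4
	start=$(date +%s.%N)
	i=0
	while [ "$i" -lt "$procs" ]
	do
		dd if=/dev/zero of="$prefix.$i" bs="$bs" count="$count" \
			conv=fsync 2>/dev/null &
		i=$((i + 1))
	done
	wait	# every dd has fsync'ed its file and exited at this point
	end=$(date +%s.%N)
	awk -v b="$((procs * bs * count))" -v s="$start" -v e="$end" 'BEGIN {
		d = e - s
		if (d <= 0) d = 0.000000001	# guard against a zero interval
		printf "%.0f KB/s\n", b / 1024 / d
	}'
}

# Example (not run here):
#   parallel_write /nfs/ddp 200 4096 100000
```

Because every writer finishes its fsync before the clock stops, the reported figure counts only data that has actually been written out.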

>       -rw-r--r-- 1 root root 1558056960 Jun 20 14:41 local.1
>       -rw-r--r-- 1 root root  671834112 Jun 20 14:41 nfs.1     (56% drop)
>                         &
>       -rw-r--r-- 1 root root 3845812224 Jun 20 14:42 local.1
>       -rw-r--r-- 1 root root 2420342784 Jun 20 14:43 nfs.1     (37% drop)
> 
> Since I am new to NFS, I am not sure if this much degradation is expected,
> or whether I need to tune something. Is there some code I can look at or
> hack into to find possible locations for the performance fall? At this time
> I cannot even tell whether the *possible* bug is in server or client code.

I'm not sure there's any bug per se, although there seems to be
some room for improvement.

As another data point, I'm seeing about 20% worse write throughput on my
system with a single dd writing to the same fs over a loopback-mounted nfs
than writing to the local file system directly, with a 2.6.26-rc6 based
kernel (nfs 3 and 4 gave similar results).
Disk:
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: HDT722516DLA380, V43OA96A, max UDMA/133
ata3.00: 321672960 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133

ext3 mount options: noatime
nfs mount options: rsize=65536,wsize=65536
dd options: bs=64k count=10k conv=fsync

(write results average of 3 runs)
write local disk:     47.6 MB/s
write loopback nfsv3: 30.2 MB/s
write remote nfsv3:   29.0 MB/s
write loopback nfsv4: 37.5 MB/s
write remote nfsv4:   29.1 MB/s

read local disk:      50.8 MB/s
read loopback nfsv3:  27.2 MB/s
read remote nfsv3:    21.8 MB/s
read loopback nfsv4:  25.4 MB/s
read remote nfsv4:    21.4 MB/s

Benny

> 
> Thanks,
> 
> - KK
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-22  8:35         ` Benny Halevy
@ 2008-06-23  8:11           ` Krishna Kumar2
  2008-06-23 12:40             ` Benny Halevy
  0 siblings, 1 reply; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-23  8:11 UTC (permalink / raw)
  To: Benny Halevy; +Cc: Benny Halevy, linux-nfs, Peter Staubach

Hi Benny,

> According to dd's man page, the f{,data}sync options tell it to
> "physically write output file data before finishing"
> If you kill it before that you end up with dirty data in the cache.
> What exactly are you trying to measure, what is the expected application
> workload?

I changed my test to do what you were doing instead of killing
dd's, etc. The end application is DB2 and it is using multiple
processes and I wanted to simulate that with micro-benchmarks.
The only reliable way to benchmark bandwidth for multiple
processes is to kill the tests after running them for some time
instead of letting them run till conclusion.

> ext3 mount options: noatime
> nfs mount options: rsize=65536,wsize=65536
> dd options: bs=64k count=10k conv=fsync
>
> (write results average of 3 runs)
> write local disk:     47.6 MB/s
> write loopback nfsv3: 30.2 MB/s
> write remote nfsv3:   29.0 MB/s
> write loopback nfsv4: 37.5 MB/s
> write remote nfsv4:   29.1 MB/s
>
> read local disk:      50.8 MB/s
> read loopback nfsv3:  27.2 MB/s
> read remote nfsv3:    21.8 MB/s
> read loopback nfsv4:  25.4 MB/s
> read remote nfsv4:    21.4 MB/s

I used the exact same options you are using, and here are the results
averaged across 3 runs:

Write local disk:     58.5 MB/s
Write loopback nfsv3: 29.42 MB/s (50% drop)

Reading (file created from /dev/urandom; somehow I am getting GB/s
            while your read results were comparable to your writes):
      Read local disk:      2.77 GB/s
      Read loopback nfsv3:  2.86 GB/s (higher for some reason)

Thanks,

- KK


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-23  8:11           ` Krishna Kumar2
@ 2008-06-23 12:40             ` Benny Halevy
  2008-06-26  7:19               ` Krishna Kumar2
  0 siblings, 1 reply; 32+ messages in thread
From: Benny Halevy @ 2008-06-23 12:40 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: linux-nfs, Peter Staubach

On Jun. 23, 2008, 11:11 +0300, Krishna Kumar2 <krkumar2@in.ibm.com> wrote:
> Hi Benny,
> 
>> According to dd's man page, the f{,data}sync options tell it to
>> "physically write output file data before finishing"
>> If you kill it before that you end up with dirty data in the cache.
>> What exactly are you trying to measure, what is the expected application
>> workload?
> 
> I changed my test to do what you were doing instead of killing
> dd's, etc. The end application is DB2 and it is using multiple
> processes and I wanted to simulate that with micro-benchmarks.
> The only reliable way to benchmark bandwidth for multiple
> processes is to kill the tests after running them for some time
> instead of letting them run till conclusion.

BTW, iozone (http://www.iozone.org/) might be your friend if you're
looking for a reliable I/O benchmark (w/ -e and -c options to include
fsync and close).

> 
>> ext3 mount options: noatime
>> nfs mount options: rsize=65536,wsize=65536
>> dd options: bs=64k count=10k conv=fsync
>>
>> (write results average of 3 runs)
>> write local disk:     47.6 MB/s
>> write loopback nfsv3: 30.2 MB/s
>> write remote nfsv3:   29.0 MB/s
>> write loopback nfsv4: 37.5 MB/s
>> write remote nfsv4:   29.1 MB/s
>>
>> read local disk:      50.8 MB/s
>> read loopback nfsv3:  27.2 MB/s
>> read remote nfsv3:    21.8 MB/s
>> read loopback nfsv4:  25.4 MB/s
>> read remote nfsv4:    21.4 MB/s
> 
> I used the exact same options you are using, and here are the results
> averaged across 3 runs:
> 
> Write local disk:     58.5 MB/s
> Write loopback nfsv3: 29.42 MB/s (50% drop)
> 
> Reading (file created from /dev/urandom; somehow I am getting GB/s
>             while your read results were comparable to your writes):

Apparently the file is cached.  You need to restart nfs and remount
the file system before reading it to make sure it isn't.
Or, you can create a file larger than your host's cache size so that
when you write (or read) it sequentially, its tail evicts its head
from the cache.  This is a less reliable method, but creating a
file about 25% larger than the host's memory size should work for you.
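
As a rough sketch of the sizing, the block count for such a file can be derived from MemTotal in /proc/meminfo. The helper name is hypothetical, and the 64 kB block size in the example simply mirrors the dd options used earlier in this thread:

```shell
# blocks_to_exceed_cache BS_KB [MEMINFO]
# Print how many blocks of BS_KB kilobytes make a file about 25% larger
# than total RAM, reading MemTotal (in kB) from MEMINFO.
# (hypothetical helper; MEMINFO defaults to /proc/meminfo)
blocks_to_exceed_cache() {
	awk -v bs="$1" '$1 == "MemTotal:" {
		kb = $2 * 1.25
		blocks = int(kb / bs)
		if (blocks * bs < kb) blocks++	# round up to a whole block
		printf "%.0f\n", blocks
	}' "${2:-/proc/meminfo}"
}

# Example (not run here): 64k blocks on the local file system.
#   dd if=/dev/zero of=/local/bigfile bs=64k \
#      count=$(blocks_to_exceed_cache 64) conv=fsync
```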

Benny

>       Read local disk:      2.77 GB/s
>       Read loopback nfsv3:  2.86 GB/s (higher for some reason)
> 
> Thanks,
> 
> - KK
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-23 12:40             ` Benny Halevy
@ 2008-06-26  7:19               ` Krishna Kumar2
  2008-06-26 17:42                 ` Chuck Lever
  0 siblings, 1 reply; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-26  7:19 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, Peter Staubach

Benny Halevy <bhalevy@panasas.com> wrote on 06/23/2008 06:10:40 PM:

> Apparently the file is cached.  You needed to restart nfs
> and remount the file system to make sure it isn't before reading it.
> Or, you can create a file larger than your host's cache size so
> when you write (or read) it sequentially, its tail evicts its head
> out of the cache.  This is a less reliable method, yet creating a
> file about 25% larger than the host's memory size should work for you.

I did a umount of all filesystems and restarted NFS before testing. Here
are the results:

Local:
      Read:  69.5 MB/s
      Write: 70.0 MB/s
NFS of same FS mounted loopback on same system:
      Read:  29.5 MB/s  (57% drop)
      Write: 27.5 MB/s  (60% drop)

The drops seem exceedingly high. How can I figure out the source of the
problem? Even if it is as general as to be able to state: "Problem is in
the NFS client code" or "Problem is in the NFS server code", or "Problem
can be mitigated by tuning" :-)

Thanks,

- KK


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-26  7:19               ` Krishna Kumar2
@ 2008-06-26 17:42                 ` Chuck Lever
  2008-06-26 17:55                   ` J. Bruce Fields
  2008-06-27  9:04                   ` NFS performance degradation of local loopback FS Krishna Kumar2
  0 siblings, 2 replies; 32+ messages in thread
From: Chuck Lever @ 2008-06-26 17:42 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: Benny Halevy, linux-nfs, Peter Staubach

On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
> Benny Halevy <bhalevy@panasas.com> wrote on 06/23/2008 06:10:40 PM:
>
>> Apparently the file is cached.  You needed to restart nfs
>> and remount the file system to make sure it isn't before reading it.
>> Or, you can create a file larger than your host's cache size so
>> when you write (or read) it sequentially, its tail evicts its head
>> out of the cache.  This is a less reliable method, yet creating a
>> file about 25% larger than the host's memory size should work for you.
>
> I did a umount of all filesystems and restarted NFS before testing. Here
> are the results:
>
> Local:
>      Read:  69.5 MB/s
>      Write: 70.0 MB/s
> NFS of same FS mounted loopback on same system:
>      Read:  29.5 MB/s  (57% drop)
>      Write: 27.5 MB/s  (60% drop)
>
> The drops seem exceedingly high. How can I figure out the source of the
> problem? Even if it is as general as to be able to state: "Problem is in
> the NFS client code" or "Problem is in the NFS server code", or "Problem
> can be mitigated by tuning" :-)

It's hard to say what might be the problem just by looking at  
performance results.

You can look at client-side NFS and RPC performance metrics using some  
prototype Python tools that were just added to nfs-utils.  The scripts  
themselves can be downloaded from:

    http://oss.oracle.com/~cel/Linux-2.6/2.6.25

but unfortunately they are not fully documented yet so you will have  
to approach them with an open mind and a sense of experimentation.

You can also capture network traces on your loopback interface to see  
if there is, for example, unexpected congestion or latency, or if  
there are other problems.
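
A capture on the loopback interface might be set up along these lines. This is only a sketch: it assumes the interface is named "lo" and that NFS is on its default port 2049, and the helper (a hypothetical name) prints the command rather than running it, since the capture itself needs root:

```shell
# loopback_nfs_capture_cmd OUTFILE
# Print a tcpdump command line that would capture NFS traffic (port 2049)
# on the loopback interface into OUTFILE for later inspection.
# (hypothetical helper; assumes interface "lo" and the default NFS port)
loopback_nfs_capture_cmd() {
	echo "tcpdump -i lo -s 0 -w $1 port 2049"
}

loopback_nfs_capture_cmd /tmp/nfs-loopback.pcap
```

Running the printed command as root while the test is in progress, and stopping it afterwards, leaves a trace file that can be examined offline.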

But for loopback, the problem is often that the client and server are  
sharing the same physical memory for caching data.  Analyzing your  
test system's physical memory utilization might be revealing.
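
One simple way to watch memory utilization during a run is to sample a few fields of /proc/meminfo. A minimal sketch; the helper name and the choice of fields are assumptions, not something prescribed in this thread:

```shell
# mem_snapshot [MEMINFO]
# Print the meminfo fields most relevant to page-cache pressure on one line.
# (hypothetical helper; MEMINFO defaults to /proc/meminfo)
mem_snapshot() {
	awk '$1 ~ /^(MemFree|Cached|Dirty|Writeback):$/ {
		printf "%s%s %s kB", sep, $1, $2
		sep = "  "
	}
	END { print "" }' "${1:-/proc/meminfo}"
}

# Example (not run here): sample once a second while the test runs.
#   while sleep 1; do mem_snapshot; done
```

A steadily shrinking MemFree together with large Dirty or Writeback values during the NFS run, but not during the local run, would point toward the memory contention described above.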

Otherwise, you should always expect some performance degradation when  
comparing NFS and local disk.  50% is not completely unheard of.  It's  
the price paid for being able to share your file data concurrently  
among multiple clients.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-26 17:42                 ` Chuck Lever
@ 2008-06-26 17:55                   ` J. Bruce Fields
  2008-06-26 21:05                     ` Chuck Lever
  2008-06-27  9:04                   ` NFS performance degradation of local loopback FS Krishna Kumar2
  1 sibling, 1 reply; 32+ messages in thread
From: J. Bruce Fields @ 2008-06-26 17:55 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Krishna Kumar2, Benny Halevy, linux-nfs, Peter Staubach

On Thu, Jun 26, 2008 at 01:42:58PM -0400, Chuck Lever wrote:
> On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
>> Benny Halevy <bhalevy@panasas.com> wrote on 06/23/2008 06:10:40 PM:
>>
>>> Apparently the file is cached.  You needed to restart nfs
>>> and remount the file system to make sure it isn't before reading it.
>>> Or, you can create a file larger than your host's cache size so
>>> when you write (or read) it sequentially, its tail evicts its head
>>> out of the cache.  This is a less reliable method, yet creating a
>>> file about 25% larger than the host's memory size should work for you.
>>
>> I did a umount of all filesystems and restarted NFS before testing. Here
>> are the results:
>>
>> Local:
>>      Read:  69.5 MB/s
>>      Write: 70.0 MB/s
>> NFS of same FS mounted loopback on same system:
>>      Read:  29.5 MB/s  (57% drop)
>>      Write: 27.5 MB/s  (60% drop)
>>
>> The drops seem exceedingly high. How can I figure out the source of the
>> problem? Even if it is as general as to be able to state: "Problem is in
>> the NFS client code" or "Problem is in the NFS server code", or "Problem
>> can be mitigated by tuning" :-)
>
> It's hard to say what might be the problem just by looking at  
> performance results.
>
> You can look at client-side NFS and RPC performance metrics using some  
> prototype Python tools that were just added to nfs-utils.  The scripts  
> themselves can be downloaded from:
>
>    http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>
> but unfortunately they are not fully documented yet so you will have to 
> approach them with an open mind and a sense of experimentation.
>
> You can also capture network traces on your loopback interface to see if 
> there is, for example, unexpected congestion or latency, or if there are 
> other problems.
>
> But for loopback, the problem is often that the client and server are  
> sharing the same physical memory for caching data.  Analyzing your test 
> system's physical memory utilization might be revealing.

If he's just doing a single large read or write with cold caches (sounds
like that's probably the case), then memory probably doesn't matter
much, does it?

--b.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-26 17:55                   ` J. Bruce Fields
@ 2008-06-26 21:05                     ` Chuck Lever
       [not found]                       ` <76bd70e30806261405g9357c6fg51b973ff076ee78b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Chuck Lever @ 2008-06-26 21:05 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Krishna Kumar2, Benny Halevy, linux-nfs, Peter Staubach

On Thu, Jun 26, 2008 at 1:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Thu, Jun 26, 2008 at 01:42:58PM -0400, Chuck Lever wrote:
>> On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
>>> Benny Halevy <bhalevy@panasas.com> wrote on 06/23/2008 06:10:40 PM:
>>>
>>>> Apparently the file is cached.  You needed to restart nfs
>>>> and remount the file system to make sure it isn't before reading it.
>>>> Or, you can create a file larger than your host's cache size so
>>>> when you write (or read) it sequentially, its tail evicts its head
>>>> out of the cache.  This is a less reliable method, yet creating a
>>>> file about 25% larger than the host's memory size should work for
>>>> you.
>>>
>>> I did a umount of all filesystems and restart NFS before testing. Here
>>> is the result:
>>>
>>> Local:
>>>      Read:  69.5 MB/s
>>>      Write: 70.0 MB/s
>>> NFS of same FS mounted loopback on same system:
>>>      Read:  29.5 MB/s  (57% drop)
>>>      Write: 27.5 MB/s  (60% drop)
>>>
>>> The drop seems exceedingly high. How can I figure out the source of the
>>> problem? Even if it is as general as to be able to state: "Problem is in
>>> the NFS client code" or "Problem is in the NFS server code", or "Problem
>>> can be mitigated by tuning" :-)
>>
>> It's hard to say what might be the problem just by looking at
>> performance results.
>>
>> You can look at client-side NFS and RPC performance metrics using some
>> prototype Python tools that were just added to nfs-utils.  The scripts
>> themselves can be downloaded from:
>>
>>    http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>>
>> but unfortunately they are not fully documented yet so you will have to
>> approach them with an open mind and a sense of experimentation.
>>
>> You can also capture network traces on your loopback interface to see if
>> there is, for example, unexpected congestion or latency, or if there are
>> other problems.
>>
>> But for loopback, the problem is often that the client and server are
>> sharing the same physical memory for caching data.  Analyzing your test
>> system's physical memory utilization might be revealing.
>
> If he's just doing a single large read or write with cold caches (sounds
> like that's probably the case), then memory probably doesn't matter
> much, does it?

I expect it might.

The client and server would contend for available physical memory as
the file was first read in from the physical file system by the
server, and then a second copy was cached by the client.

A file as small as half the available physical memory on his system
could trigger this behavior.

On older 2.6 kernels (.18 or so), both the server's physical file
system and the client would trigger bdi congestion throttling.

--
Chuck Lever
chu ckl eve rat ora cle dot com

^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <76bd70e30806261405g9357c6fg51b973ff076ee78b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: kernel hacker's pub night
       [not found]                       ` <76bd70e30806261405g9357c6fg51b973ff076ee78b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-06-26 21:22                         ` J. Bruce Fields
  2008-06-26 21:24                           ` J. Bruce Fields
  0 siblings, 1 reply; 32+ messages in thread
From: J. Bruce Fields @ 2008-06-26 21:22 UTC (permalink / raw)
  To: chucklever; +Cc: Krishna Kumar2, Benny Halevy, linux-nfs, Peter Staubach

On Thu, Jun 26, 2008 at 05:05:44PM -0400, Chuck Lever wrote:
> On Thu, Jun 26, 2008 at 1:55 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Thu, Jun 26, 2008 at 01:42:58PM -0400, Chuck Lever wrote:
> >> On Jun 26, 2008, at 3:19 AM, Krishna Kumar2 wrote:
> >>> Benny Halevy <bhalevy@panasas.com> wrote on 06/23/2008 06:10:40 PM:
> >>>
> >>>> Apparently the file is cached.  You needed to restart nfs
> >>>> and remount the file system to make sure it isn't before reading it.
> >>>> Or, you can create a file larger than your host's cache size so
> >>>> when you write (or read) it sequentially, its tail evicts its head
> >>>> out of the cache.  This is a less reliable method, yet creating a
> >>>> file about 25% larger than the host's memory size should work for
> >>>> you.
> >>>
> >>> I did a umount of all filesystems and restart NFS before testing. Here
> >>> is the result:
> >>>
> >>> Local:
> >>>      Read:  69.5 MB/s
> >>>      Write: 70.0 MB/s
> >>> NFS of same FS mounted loopback on same system:
> >>>      Read:  29.5 MB/s  (57% drop)
> >>>      Write: 27.5 MB/s  (60% drop)
> >>>
> >>> The drop seems exceedingly high. How can I figure out the source of the
> >>> problem? Even if it is as general as to be able to state: "Problem is in
> >>> the NFS client code" or "Problem is in the NFS server code", or "Problem
> >>> can be mitigated by tuning" :-)
> >>
> >> It's hard to say what might be the problem just by looking at
> >> performance results.
> >>
> >> You can look at client-side NFS and RPC performance metrics using some
> >> prototype Python tools that were just added to nfs-utils.  The scripts
> >> themselves can be downloaded from:
> >>
> >>    http://oss.oracle.com/~cel/Linux-2.6/2.6.25
> >>
> >> but unfortunately they are not fully documented yet so you will have to
> >> approach them with an open mind and a sense of experimentation.
> >>
> >> You can also capture network traces on your loopback interface to see if
> >> there is, for example, unexpected congestion or latency, or if there are
> >> other problems.
> >>
> >> But for loopback, the problem is often that the client and server are
> >> sharing the same physical memory for caching data.  Analyzing your test
> >> system's physical memory utilization might be revealing.
> >
> > If he's just doing a single large read or write with cold caches (sounds
> > like that's probably the case), then memory probably doesn't matter
> > much, does it?
> 
> I expect it might.
> 
> The client and server would contend for available physical memory as
> the file was first read in from the physical file system by the
> server, and then a second copy was cached by the client.
> 
> A file as small as half the available physical memory on his system
> could trigger this behavior.

So, forgive me for being naive about this stuff, but I would've thought
that the cached pages (which have been read once and then never touched
again) would just be discarded, and life would continue.  Otherwise how
would the kernel be able to get acceptable streaming read performance in
any situation? 

This doesn't sound fundamentally different, e.g., from doing streaming
reads of two files on different filesystems at once.

> On older 2.6 kernels (.18 or so), both the server's physical file
> system and the client would trigger bdi congestion throttling.

How does that work?

--b.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kernel hacker's pub night
  2008-06-26 21:22                         ` kernel hacker's pub night J. Bruce Fields
@ 2008-06-26 21:24                           ` J. Bruce Fields
  2008-06-27  7:14                             ` Benny Halevy
  0 siblings, 1 reply; 32+ messages in thread
From: J. Bruce Fields @ 2008-06-26 21:24 UTC (permalink / raw)
  To: chucklever; +Cc: Krishna Kumar2, Benny Halevy, linux-nfs, Peter Staubach

Err, wow, I have no idea how I managed to screw up that subject line.
(But, hey, if you happen to be on this list and live in Ann Arbor, MI,
feel free to join us at Grizzly Peak in a half hour.)

--b.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: kernel hacker's pub night
  2008-06-26 21:24                           ` J. Bruce Fields
@ 2008-06-27  7:14                             ` Benny Halevy
  0 siblings, 0 replies; 32+ messages in thread
From: Benny Halevy @ 2008-06-27  7:14 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: chucklever, Krishna Kumar2, linux-nfs, Peter Staubach

On Jun. 27, 2008, 0:24 +0300, "J. Bruce Fields" <bfields@fieldses.org> wrote:
> Err, wow, I have no idea how I managed to screw up that subject line.
> (But, hey, if you happen to be on this list and live in Ann Arbor, MI,
> feel free to join us at Grizzly Peak in a half hour.)
> 
> --b.

Geez, missed that ;-(
that's too short of a notice for me :)

Benny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-26 17:42                 ` Chuck Lever
  2008-06-26 17:55                   ` J. Bruce Fields
@ 2008-06-27  9:04                   ` Krishna Kumar2
  2008-06-27 14:06                     ` Chuck Lever
                                       ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-27  9:04 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Benny Halevy, linux-nfs, Peter Staubach, J. Bruce Fields

Chuck Lever <chuck.lever@oracle.com> wrote on 06/26/2008 11:12:58 PM:

> > Local:
> >      Read:  69.5 MB/s
> >      Write: 70.0 MB/s
> > NFS of same FS mounted loopback on same system:
> >      Read:  29.5 MB/s  (57% drop)
> >      Write: 27.5 MB/s  (60% drop)
>
> You can look at client-side NFS and RPC performance metrics using some
> prototype Python tools that were just added to nfs-utils.  The scripts
> themselves can be downloaded from:
>     http://oss.oracle.com/~cel/Linux-2.6/2.6.25
> but unfortunately they are not fully documented yet so you will have
> to approach them with an open mind and a sense of experimentation.
>
> You can also capture network traces on your loopback interface to see
> if there is, for example, unexpected congestion or latency, or if
> there are other problems.
>
> But for loopback, the problem is often that the client and server are
> sharing the same physical memory for caching data.  Analyzing your
> test system's physical memory utilization might be revealing.

But loopback is better than actual network traffic. If my file size is
less than half the available physical memory, then this should not be
a problem, right? The server caches the file data (64K at a time), and
sends to the client (on the same system) and the client has a local
copy. I am testing today with that assumption.

My system has 4GB memory, of which 3.4GB is free before running the test.
I created a 1.46GB file (so that double that size, for the server and
client copies, will not be more than 3GB) by running:
      dd if=/dev/zero of=smaller_file bs=65536 count=24000
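
The sizing argument above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the worst case is two full cached copies of the file (one in the server's page cache, one in the client's), and the function name is illustrative. The dd parameters are the ones from the command above.

```python
GIB = 1024 ** 3

def double_cache_footprint(block_size: int, count: int) -> tuple[int, int]:
    """Return (file size, worst-case cached bytes) for a dd-created file.

    Assumption: the worst case is one full copy cached by the NFS server
    plus one full copy cached by the NFS client on the same machine.
    """
    file_size = block_size * count
    return file_size, 2 * file_size

# dd if=/dev/zero of=smaller_file bs=65536 count=24000
file_size, cached = double_cache_footprint(65536, 24000)
print(f"file size:         {file_size / GIB:.2f} GiB")
print(f"two cached copies: {cached / GIB:.2f} GiB")
```

If the second figure stays below the memory that is free before the run, the two copies should fit without forcing eviction, which is the assumption being tested here.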

To measure the time exactly for just the I/O part, I have a small program
that reads data in chunks of 64K and discards it,
"while (read(fd, buf, 64K) > 0)", with a gettimeofday before and after it
to measure bandwidth. For each run, the script does (pseudo): "umount /nfs,
stop nfs server, umount /local, mount /local, start nfs server, and
mount /nfs". The result is:

Testing on /local
      Time: 38.4553     BW:39.01 MB/s
      Time: 38.3073     BW:39.16 MB/s
      Time: 38.3807     BW:39.08 MB/s
      Time: 38.3724     BW:39.09 MB/s
      Time: 38.3463     BW:39.12 MB/s
Testing on /nfs
      Time: 52.4386     BW:28.60 MB/s
      Time: 50.7531     BW:29.55 MB/s
      Time: 50.8296     BW:29.51 MB/s
      Time: 48.2363     BW:31.10 MB/s
      Time: 51.1992     BW:29.30 MB/s

Average bandwidth drop across 5 runs is 24.24%.
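
The measurement described above could look roughly like the sketch below. It is not the program used for these numbers: it is a Python stand-in for the C-style read loop, it uses time.perf_counter instead of gettimeofday, and the function names are made up for illustration. The last lines recompute the average drop from the ten BW figures listed above.

```python
import time

CHUNK = 64 * 1024  # 64 KiB per read, as in the message

def timed_read(path: str, chunk: int = CHUNK) -> tuple[int, float]:
    """Read a file sequentially, discard the data, return (bytes, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 gives a raw file object, so each read() is one system call
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total, time.perf_counter() - start

def mib_per_s(nbytes: int, seconds: float) -> float:
    """Bandwidth in MiB/s (the unit the BW column above appears to use)."""
    return nbytes / (1024 * 1024) / seconds

def percent_drop(baseline: list[float], other: list[float]) -> float:
    """Drop of mean(other) relative to mean(baseline), in percent."""
    base = sum(baseline) / len(baseline)
    return 100.0 * (1.0 - (sum(other) / len(other)) / base)

local = [39.01, 39.16, 39.08, 39.09, 39.12]  # BW on /local, from the runs above
nfs = [28.60, 29.55, 29.51, 31.10, 29.30]    # BW on /nfs, from the runs above
print(f"average drop: {percent_drop(local, nfs):.2f}%")
```

The caches still have to be cold for each run (the umount/restart/mount sequence above); the sketch only covers the timing part.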

Memory stats *before* and *after* one run for /local and /nfs is:

********** local.start ******
MemFree:       3500700 kB
Cached:         317076 kB
Inactive:       249356 kB

********** local.end ********
MemFree:       1961872 kB
Cached:        1853100 kB
Inactive:      1785028 kB

********** nfs.start ********
MemFree:       3480456 kB
Cached:         317072 kB
Inactive:       252740 kB

********** nfs.end **********
MemFree:        400892 kB
Cached:        3389164 kB
Inactive:      3324800 kB

I don't know if this is useful, but looking at ratios:
The MemFree ratio (MemFree before / MemFree after) is almost 5 times
higher for /nfs (8.68) than for /local (1.78). The Inactive ratio
(Inactive after / Inactive before) almost doubles, from 7.15 for /local to
13.15 for /nfs, and the Cached ratio (Cached after / Cached before) also
almost doubles, from 5.84 to 10.69.
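
These ratios can be reproduced from the four snapshots. The sketch below is illustrative and not part of the original test scripts: parse_meminfo, SNAPSHOTS and summarize are hypothetical names, and the values are copied from the snapshots above. As an extra, it expresses the growth in Cached in units of the test file size (65536 * 24000 bytes = 1536000 kB).

```python
FILE_KB = 65536 * 24000 // 1024  # size of the dd-created test file, in kB

def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style lines ('Name:   12345 kB') into {name: kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

# Values copied from the four snapshots in the message.
SNAPSHOTS = {
    "local.start": "MemFree: 3500700 kB\nCached: 317076 kB\nInactive: 249356 kB",
    "local.end": "MemFree: 1961872 kB\nCached: 1853100 kB\nInactive: 1785028 kB",
    "nfs.start": "MemFree: 3480456 kB\nCached: 317072 kB\nInactive: 252740 kB",
    "nfs.end": "MemFree: 400892 kB\nCached: 3389164 kB\nInactive: 3324800 kB",
}

def summarize(fs: str) -> dict[str, float]:
    """Compute the ratios discussed above for 'local' or 'nfs'."""
    start = parse_meminfo(SNAPSHOTS[f"{fs}.start"])
    end = parse_meminfo(SNAPSHOTS[f"{fs}.end"])
    return {
        "memfree_before_over_after": start["MemFree"] / end["MemFree"],
        "inactive_after_over_before": end["Inactive"] / start["Inactive"],
        "cached_after_over_before": end["Cached"] / start["Cached"],
        "cached_growth_in_file_sizes": (end["Cached"] - start["Cached"]) / FILE_KB,
    }

for fs in ("local", "nfs"):
    print(fs, {k: round(v, 2) for k, v in summarize(fs).items()})
```

On these numbers, Cached grows by roughly one file size for /local and roughly two file sizes for /nfs, which would be consistent with the data being cached once by the server and once more by the client.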

> Otherwise, you should always expect some performance degradation when
> comparing NFS and local disk.  50% is not completely unheard of.  It's
> the price paid for being able to share your file data concurrently
> among multiple clients.

But if the file is being shared only with one client (and that too
locally), isn't 25% too high?

Will I get better results on NFSv4, and should I try delegation (that
sounds automatic and not something that the user has to start)?

Thanks,

- KK

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-27  9:04                   ` NFS performance degradation of local loopback FS Krishna Kumar2
@ 2008-06-27 14:06                     ` Chuck Lever
       [not found]                       ` <76bd70e30806270706x7cbfd291l6cb6d0cc5e81771-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2008-06-27 17:44                     ` J. Bruce Fields
  2008-06-27 18:06                     ` Dean Hildebrand
  2 siblings, 1 reply; 32+ messages in thread
From: Chuck Lever @ 2008-06-27 14:06 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: Benny Halevy, linux-nfs, Peter Staubach, J. Bruce Fields

On Fri, Jun 27, 2008 at 5:04 AM, Krishna Kumar2 <krkumar2@in.ibm.com> wrote:
> Chuck Lever <chuck.lever@oracle.com> wrote on 06/26/2008 11:12:58 PM:
>> > Local:
>> >      Read:  69.5 MB/s
>> >      Write: 70.0 MB/s
>> > NFS of same FS mounted loopback on same system:
>> >      Read:  29.5 MB/s  (57% drop)
>> >      Write: 27.5 MB/s  (60% drop)
>>
>> You can look at client-side NFS and RPC performance metrics using some
>> prototype Python tools that were just added to nfs-utils.  The scripts
>> themselves can be downloaded from:
>>     http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>> but unfortunately they are not fully documented yet so you will have
>> to approach them with an open mind and a sense of experimentation.
>>
>> You can also capture network traces on your loopback interface to see
>> if there is, for example, unexpected congestion or latency, or if
>> there are other problems.
>>
>> But for loopback, the problem is often that the client and server are
>> sharing the same physical memory for caching data.  Analyzing your
>> test system's physical memory utilization might be revealing.
>
> But loopback is better than actual network traffic.

What precisely do you mean by that?

You are testing with the client and server on the same machine.  Is
the loopback mount over the lo interface, but you mount the machine's
actual IP address for the "network" test?

I would expect that in that case, loopback would perform better
because a memory copy is always faster than going through the network
stack and the NIC.

It would be interesting to compare a network-only performance test
(like iPerf) for loopback and for going through the NIC.

> If my file size is
> less than half the available physical memory, then this should not be
> a problem, right?

It is likely not a problem in that case, but you never know until you
have analyzed the network traffic carefully to see what's going on.

>> Otherwise, you should always expect some performance degradation when
>> comparing NFS and local disk.  50% is not completely unheard of.  It's
>> the price paid for being able to share your file data concurrently
>> among multiple clients.
>
> But if the file is being shared only with one client (and that too
> locally), isn't 25% too high?

NFS always allows the possibility of sharing, so it doesn't matter how
many clients have mounted the server.

The distinction I'm drawing here is between something like iSCSI,
where only a single client ever mounts a LUN, and thus can cache
aggressively, versus NFS in the same environment, where the client has
to assume that any other client can access a file at any time, and
therefore must cache more conservatively.

You are doing cold cache tests, so this may not be at issue here either.

A 25% performance drop between a 'dd' directly on the server, and one
from an NFS client, is probably typical.

> Will I get better results on NFSv4, and should I try delegation (that
> sounds automatic and not something that the user has to start)?

It's hard to predict if NFSv4 will help because we don't understand
what is causing your performance drop yet.

Delegation is usually automatic if the client's mount command has
generated a plausible callback IP address, and the server is
successfully able to connect to it.  However, I didn't think the
server hands out a delegation until the second OPEN... with a single
dd, the client opens the file only once.

--
Chuck Lever

^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <76bd70e30806270706x7cbfd291l6cb6d0cc5e81771-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: NFS performance degradation of local loopback FS.
       [not found]                       ` <76bd70e30806270706x7cbfd291l6cb6d0cc5e81771-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-06-30  9:57                         ` Krishna Kumar2
  2008-06-30 15:25                           ` Chuck Lever
  0 siblings, 1 reply; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-30  9:57 UTC (permalink / raw)
  To: chucklever
  Cc: J. Bruce Fields, Benny Halevy, chucklever, linux-nfs,
	Peter Staubach

chucklever@gmail.com wrote on 06/27/2008 07:36:44 PM:

> > But loopback is better than actual network traffic.
>
> What precisely do you mean by that?

Sorry I was not clear. I meant that the loopback will be better than
actual traffic between different server/client.

> You are testing with the client and server on the same machine.  Is
> the loopback mount over the lo interface, but you mount the machine's
> actual IP address for the "network" test?

Actually isn't that the same? I am using localhost in any case.

> It would be interesting to compare a network-only performance test
> (like iPerf) for loopback and for going through the NIC.

iperf (one thread, 64K I/O size, 30 secs):
      NIC: 445 MB/s
      Loopback: 735 MB/s

In retrospect, for disk I/O:
      /local: 39 MB/s
      /nfs (loopback): 29 MB/s            (25.5% drop)
      /nfs (from a real server): 27 MB/s  (30.5% drop, only point is that
            this is a different disk on a different system and it doesn't
            make much sense to compare this to /local).

Thanks,

- KK


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-30  9:57                         ` Krishna Kumar2
@ 2008-06-30 15:25                           ` Chuck Lever
       [not found]                             ` <76bd70e30806300825t6490477dpb8ce3ee48a0a6777-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Chuck Lever @ 2008-06-30 15:25 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: J. Bruce Fields, Benny Halevy, linux-nfs, Peter Staubach

On Mon, Jun 30, 2008 at 5:57 AM, Krishna Kumar2 <krkumar2@in.ibm.com> wrote:
> chucklever@gmail.com wrote on 06/27/2008 07:36:44 PM:
>
>> > But loopback is better than actual network traffic.
>>
>> What precisely do you mean by that?
>
> Sorry I was not clear. I meant that the loopback will be better than
> actual traffic between different server/client.
>
>> You are testing with the client and server on the same machine.  Is
>> the loopback mount over the lo interface, but you mount the machine's
>> actual IP address for the "network" test?
>
> Actually isn't that the same? I am using localhost in any case.

As I understand it, "lo" is effectively a virtualized network device
with point-to-point routing.  Looping back through a real NIC can, in
many cases, go all the way down to the network hardware and back, and
is likely subject to routing decisions in your system's network layer.
So I would expect them to be different in most cases.

>> It would be interesting to compare a network-only performance test
>> (like iPerf) for loopback and for going through the NIC.
>
> iperf (one thread, 64K I/O size, 30 secs):
>      NIC: 445 MB/s
>      Loopback: 735 MB/s

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <76bd70e30806300825t6490477dpb8ce3ee48a0a6777-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: NFS performance degradation of local loopback FS.
       [not found]                             ` <76bd70e30806300825t6490477dpb8ce3ee48a0a6777-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-07-01  3:43                               ` Krishna Kumar2
  0 siblings, 0 replies; 32+ messages in thread
From: Krishna Kumar2 @ 2008-07-01  3:43 UTC (permalink / raw)
  To: chucklever
  Cc: J. Bruce Fields, Benny Halevy, chucklever, linux-nfs,
	Peter Staubach

Hi Chuck,

> As I understand it, "lo" is effectively a virtualized network device
> with point-to-point routing.  Looping back through a real NIC can, in
> many cases, go all the way down to the network hardware and back, and
> is likely subject to routing decisions in your system's network layer.
>  So I would expect them to be different in most cases.

At least in the Linux stack, if you address a local network device, the
kernel does a route lookup to figure out which interface to send the
packet out on, and this results in using lo.

Thanks,

- KK


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-27  9:04                   ` NFS performance degradation of local loopback FS Krishna Kumar2
  2008-06-27 14:06                     ` Chuck Lever
@ 2008-06-27 17:44                     ` J. Bruce Fields
  2008-06-27 18:06                     ` Dean Hildebrand
  2 siblings, 0 replies; 32+ messages in thread
From: J. Bruce Fields @ 2008-06-27 17:44 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: Chuck Lever, Benny Halevy, linux-nfs, Peter Staubach

On Fri, Jun 27, 2008 at 02:34:24PM +0530, Krishna Kumar2 wrote:
> But if the file is being shared only with one client (and that too
> locally), isn't 25% too high?
>
> Will I get better results on NFSv4, and should I try delegation (that
> sounds automatic and not something that the user has to start)?

No, delegation couldn't possibly help in this case--more caching can't
help if you're only reading the file once.

--b.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-27  9:04                   ` NFS performance degradation of local loopback FS Krishna Kumar2
  2008-06-27 14:06                     ` Chuck Lever
  2008-06-27 17:44                     ` J. Bruce Fields
@ 2008-06-27 18:06                     ` Dean Hildebrand
  2008-06-30 10:10                       ` Krishna Kumar2
  2 siblings, 1 reply; 32+ messages in thread
From: Dean Hildebrand @ 2008-06-27 18:06 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: Chuck Lever, Benny Halevy, linux-nfs, Peter Staubach,
	J. Bruce Fields

One option might be to try using O_DIRECT if you are worried about 
memory (although I would read/write in at least 1 MB at a time).  I 
would expect this to help at least a bit especially on reads.
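
A rough sketch of what an O_DIRECT read test with 1 MB requests might look like follows. It is an illustration only, not a tool from this thread: read_direct is a made-up name, os.O_DIRECT is Linux-specific, and the open or read can fail with EINVAL on filesystems that do not support direct I/O. An anonymous mmap is used for the buffer because O_DIRECT needs an aligned buffer and such a mapping is page-aligned.

```python
import mmap
import os

def read_direct(path: str, chunk: int = 1 << 20) -> int:
    """Sequentially read `path` with O_DIRECT in `chunk`-byte requests.

    Returns the number of bytes read. `chunk` should be a multiple of the
    device's logical block size; 1 MiB satisfies that on typical disks.
    """
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)  # Linux-only flag
    try:
        size = os.fstat(fd).st_size
        total = 0
        # Anonymous mapping: page-aligned memory, as O_DIRECT requires.
        with mmap.mmap(-1, chunk) as buf:
            # Stop at the known file size so no read is issued from an
            # unaligned offset after a short final read.
            while total < size:
                n = os.readv(fd, [buf])  # read straight into the buffer
                if n == 0:
                    break
                total += n
        return total
    finally:
        os.close(fd)
```

Timing a call to this function on /local and on /nfs would give a comparison that bypasses the client-side page cache; the server still caches the data as usual.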

Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
Since with a loopback you effectively have no latency, you would want to
ensure that neither the #nfsds nor the #rpc slots is a bottleneck (if either
one is too low, you will have a problem).  One way to reduce the # of
requests, and therefore require fewer nfsds/rpc_slots, is to use a larger
wsize/rsize: 'cat /proc/mounts' to see the current values, and ensure your
wsize/rsize is a decent size (~ 1MB).
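
The /proc/mounts check can also be scripted. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options); nfs_transfer_sizes is a hypothetical helper name:

```python
def nfs_transfer_sizes(mounts_text: str) -> dict[str, dict[str, int]]:
    """Map each NFS mount point to its rsize/wsize, from /proc/mounts text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = dict(o.split("=", 1) for o in fields[3].split(",") if "=" in o)
        result[fields[1]] = {
            key: int(opts[key]) for key in ("rsize", "wsize") if key in opts
        }
    return result
```

On the test machine it would be called as nfs_transfer_sizes(open("/proc/mounts").read()). The values reported there are the ones actually negotiated with the server, which can be smaller than what was requested on the mount command line.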

Dean

Krishna Kumar2 wrote:
> Chuck Lever <chuck.lever@oracle.com> wrote on 06/26/2008 11:12:58 PM:
>
>   
>>> Local:
>>>      Read:  69.5 MB/s
>>>      Write: 70.0 MB/s
>>> NFS of same FS mounted loopback on same system:
>>>      Read:  29.5 MB/s  (57% drop)
>>>      Write: 27.5 MB/s  (60% drop)
>>>       
>> You can look at client-side NFS and RPC performance metrics using some
>> prototype Python tools that were just added to nfs-utils.  The scripts
>> themselves can be downloaded from:
>>     http://oss.oracle.com/~cel/Linux-2.6/2.6.25
>> but unfortunately they are not fully documented yet so you will have
>> to approach them with an open mind and a sense of experimentation.
>>
>> You can also capture network traces on your loopback interface to see
>> if there is, for example, unexpected congestion or latency, or if
>> there are other problems.
>>
>> But for loopback, the problem is often that the client and server are
>> sharing the same physical memory for caching data.  Analyzing your
>> test system's physical memory utilization might be revealing.
>>     
>
> But loopback is better than actual network traffic. If my file size is
> less than half the available physical memory, then this should not be
> a problem, right? The server caches the file data (64K at a time), and
> sends to the client (on the same system) and the client has a local
> copy. I am testing today with that assumption.
>
> My system has 4GB memory, of which 3.4GB is free before running the test.
> I created a 1.46GB (so that double that size for server/client copies will
> not be more than 3GB) file by running:
>       dd if=/dev/zero of=smaller_file bs=65536 count=24000
>
> To measure the time exactly for just the I/O part, I have a small program
> that reads data in chunks of 64K and discards it,
> "while (read(fd, buf, 64K) > 0)", with a gettimeofday before and after it
> to measure bandwidth. For each run, the script does (pseudo): "umount /nfs,
> stop nfs server, umount /local, mount /local, start nfs server, and
> mount /nfs". The result is:
>
> Testing on /local
>       Time: 38.4553     BW:39.01 MB/s
>       Time: 38.3073     BW:39.16 MB/s
>       Time: 38.3807     BW:39.08 MB/s
>       Time: 38.3724     BW:39.09 MB/s
>       Time: 38.3463     BW:39.12 MB/s
> Testing on /nfs
>       Time: 52.4386     BW:28.60 MB/s
>       Time: 50.7531     BW:29.55 MB/s
>       Time: 50.8296     BW:29.51 MB/s
>       Time: 48.2363     BW:31.10 MB/s
>       Time: 51.1992     BW:29.30 MB/s
>
> Average bandwidth drop across 5 runs is 24.24%.
>
> Memory stats *before* and *after* one run for /local and /nfs is:
>
> ********** local.start ******
> MemFree:       3500700 kB
> Cached:         317076 kB
> Inactive:       249356 kB
>
> ********** local.end ********
> MemFree:       1961872 kB
> Cached:        1853100 kB
> Inactive:      1785028 kB
>
> ********** nfs.start ********
> MemFree:       3480456 kB
> Cached:         317072 kB
> Inactive:       252740 kB
>
> ********** nfs.end **********
> MemFree:        400892 kB
> Cached:        3389164 kB
> Inactive:      3324800 kB
>
> I don't know if this is useful but looking at ratios:
> Memfree increased almost 5 times from 1.78 (Memfree before / Memfree after)
> to 8.68 for /local and /nfs respectively. Inactive almost doubled from 7.15
> times to 13.15 times for /local and /nfs (Inactive after / Inactive before),
> and Cached also almost doubled from 5.84 times to 10.69 times (same for
> Cached).
>
>   
>> Otherwise, you should always expect some performance degradation when
>> comparing NFS and local disk.  50% is not completely unheard of.  It's
>> the price paid for being able to share your file data concurrently
>> among multiple clients.
>>     
>
> But if the file is being shared only with one client (and that too
> locally), isn't 25% too high?
>
> Will I get better results on NFSv4, and should I try delegation (that
> sounds automatic and not something that the user has to start)?
>
> Thanks,
>
> - KK
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-27 18:06                     ` Dean Hildebrand
@ 2008-06-30 10:10                       ` Krishna Kumar2
  2008-06-30 15:26                         ` Jeff Layton
  2008-06-30 15:30                         ` Chuck Lever
  0 siblings, 2 replies; 32+ messages in thread
From: Krishna Kumar2 @ 2008-06-30 10:10 UTC (permalink / raw)
  To: Dean Hildebrand
  Cc: J. Bruce Fields, Benny Halevy, Chuck Lever, linux-nfs,
	Peter Staubach

Dean Hildebrand <seattleplus@gmail.com> wrote on 06/27/2008 11:36:28 PM:

> One option might be to try using O_DIRECT if you are worried about
> memory (although I would read/write in at least 1 MB at a time).  I
> would expect this to help at least a bit especially on reads.
>
> Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
> Since with a loopback you effectively have no latency, you would want to
> ensure that neither the #nfsds or #rpc slots is a bottleneck (if either
> one is too low, you will have a problem).  One way to reduce the # of
> requests and therefore require fewer nfsds/rpc_slots is to 'cat
> /proc/mounts' to see your wsize/rsize.  Ensure your wsize/rsize is a
> decent size (~ 1MB).

Number of nfsd: 64, and
      sunrpc.transports = sunrpc.udp_slot_table_entries = 128
      sunrpc.tcp_slot_table_entries = 128

I am using:

      mount -o rw,bg,hard,nointr,proto=tcp,vers=3,rsize=65536,wsize=65536,timeo=600,noatime \
            localhost:/local /nfs

I have also tried with 1MB for both rsize/wsize and it didn't change the BW
(other than minor variations).

thanks,

- KK


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: NFS performance degradation of local loopback FS.
  2008-06-30 10:10                       ` Krishna Kumar2
@ 2008-06-30 15:26                         ` Jeff Layton
       [not found]                           ` <20080630112654.012ce3e4-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
  2008-06-30 15:30                         ` Chuck Lever
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Layton @ 2008-06-30 15:26 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: Dean Hildebrand, J. Bruce Fields, Benny Halevy, Chuck Lever,
	linux-nfs, Peter Staubach

On Mon, 30 Jun 2008 15:40:30 +0530
Krishna Kumar2 <krkumar2@in.ibm.com> wrote:

> Dean Hildebrand <seattleplus@gmail.com> wrote on 06/27/2008 11:36:28 PM:
> 
> > One option might be to try using O_DIRECT if you are worried about
> > memory (although I would read/write in at least 1 MB at a time).  I
> > would expect this to help at least a bit especially on reads.
> >
> > Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
> > Since with a loopback you effectively have no latency, you would want to
> > ensure that neither the #nfsds or #rpc slots is a bottleneck (if either
> > one is too low, you will have a problem).  One way to reduce the # of
> > requests and therefore require fewer nfsds/rpc_slots is to 'cat
> > /proc/mounts' to see your wsize/rsize.  Ensure your wsize/rsize is a
> > decent size (~ 1MB).
> 
> Number of nfsd: 64, and
>       sunrpc.transports = sunrpc.udp_slot_table_entries = 128
>       sunrpc.tcp_slot_table_entries = 128
> 
> I am using:
> 
>       mount -o rw,bg,hard,nointr,proto=tcp,vers=3,rsize=65536,wsize=65536,timeo=600,noatime \
>             localhost:/local /nfs
> 
> I have also tried with 1MB for both rsize/wsize and it didn't change the BW
> (other than minor variations).
> 
> thanks,
> 
> - KK
> 

Recently I spent some time with others here at Red Hat looking
at problems with nfs server performance. One thing we found was that
there are some problems with multiple nfsd's. It seems like the I/O
scheduling or something is fooled by the fact that sequential write
calls are often handled by different nfsd's. This can negatively
impact performance (I don't think we've tracked this down completely
yet, however).

Since you're just doing some single-threaded testing on the client
side, it might be interesting to try running a single nfsd and testing
performance with that. It might provide an interesting data point.

Some other thoughts of things to try:

1) run the tests against an exported tmpfs filesystem to eliminate
underlying disk performance as a factor.

2) test nfsv4 -- nfsd opens and closes the file for each read/write.
nfsv4 is stateful, however, so I don't believe it does that there.

As others have pointed out though, testing with client and server on
the same machine is not necessarily eliminating performance
bottlenecks. You may want to test with dedicated clients and servers
(maybe on a nice fast network or with a gigE crossover cable or
something).

-- 
Jeff Layton <jlayton@redhat.com>

* Re: NFS performance degradation of local loopback FS.
       [not found]                           ` <20080630112654.012ce3e4-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
@ 2008-06-30 15:35                             ` J. Bruce Fields
  2008-06-30 16:00                               ` Chuck Lever
  2008-07-01 10:19                               ` Krishna Kumar2
  2008-06-30 15:35                             ` Chuck Lever
  2008-07-01  5:07                             ` Krishna Kumar2
  2 siblings, 2 replies; 32+ messages in thread
From: J. Bruce Fields @ 2008-06-30 15:35 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Krishna Kumar2, Dean Hildebrand, Benny Halevy, Chuck Lever,
	linux-nfs, Peter Staubach, aglo

On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
> Recently I spent some time with others here at Red Hat looking
> at problems with nfs server performance. One thing we found was that
> there are some problems with multiple nfsd's. It seems like the I/O
> scheduling or something is fooled by the fact that sequential write
> calls are often handled by different nfsd's. This can negatively
> impact performance (I don't think we've tracked this down completely
> yet, however).

Yes, we've been trying to see how close to full network speed we can get
over a 10 gig network and have run into situations where increasing the
number of threads (without changing anything else) seems to decrease
performance of a simple sequential write.

And the hypothesis that the problem was randomized IO scheduling was the
first thing that came to mind.  But I'm not sure what the easiest way
would be to really prove that that was the problem.

And then once we really are sure that's the problem, I'm not sure what
to do about it.  I suppose it may depend partly on exactly where the
reordering is happening.

--b.

> 
> Since you're just doing some single-threaded testing on the client
> side, it might be interesting to try running a single nfsd and testing
> performance with that. It might provide an interesting data point.
> 
> Some other thoughts of things to try:
> 
> 1) run the tests against an exported tmpfs filesystem to eliminate
> underlying disk performance as a factor.
> 
> 2) test nfsv4 -- nfsd opens and closes the file for each read/write.
> nfsv4 is stateful, however, so I don't believe it does that there.
> 
> As others have pointed out though, testing with client and server on
> the same machine is not necessarily eliminating performance
> bottlenecks. You may want to test with dedicated clients and servers
> (maybe on a nice fast network or with a gigE crossover cable or
> something).


* Re: NFS performance degradation of local loopback FS.
  2008-06-30 15:35                             ` J. Bruce Fields
@ 2008-06-30 16:00                               ` Chuck Lever
  2008-07-01 10:19                               ` Krishna Kumar2
  1 sibling, 0 replies; 32+ messages in thread
From: Chuck Lever @ 2008-06-30 16:00 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, Krishna Kumar2, Dean Hildebrand, Benny Halevy,
	linux-nfs, Peter Staubach, aglo

J. Bruce Fields wrote:
> On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
>> Recently I spent some time with others here at Red Hat looking
>> at problems with nfs server performance. One thing we found was that
>> there are some problems with multiple nfsd's. It seems like the I/O
>> scheduling or something is fooled by the fact that sequential write
>> calls are often handled by different nfsd's. This can negatively
>> impact performance (I don't think we've tracked this down completely
>> yet, however).
> 
> Yes, we've been trying to see how close to full network speed we can get
> over a 10 gig network and have run into situations where increasing the
> number of threads (without changing anything else) seems to decrease
> performance of a simple sequential write.
> 
> And the hypothesis that the problem was randomized IO scheduling was the
> first thing that came to mind.  But I'm not sure what the easiest way
> would be to really prove that that was the problem.

Here's an easy way for reads:  instrument the VFS code that manages 
read-ahead contexts.  Probably not an issue for krkumar2, since the file 
from one of the read tests is small enough to fit in the server's cache, 
and the other read test involves only /dev/null.

I had always thought wdelay would mitigate write request re-ordering, 
but I've never looked at how it's implemented in Linux's nfsd.  Of 
course, if the client is sending too many COMMIT requests, this will 
negate the benefit of wdelay.


* Re: NFS performance degradation of local loopback FS.
  2008-06-30 15:35                             ` J. Bruce Fields
  2008-06-30 16:00                               ` Chuck Lever
@ 2008-07-01 10:19                               ` Krishna Kumar2
  2008-07-01 12:47                                 ` Jeff Layton
  1 sibling, 1 reply; 32+ messages in thread
From: Krishna Kumar2 @ 2008-07-01 10:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: aglo, Benny Halevy, Chuck Lever, Jeff Layton, linux-nfs,
	Dean Hildebrand, Peter Staubach

"J. Bruce Fields" <bfields@fieldses.org> wrote on 06/30/2008 09:05:41 PM:

> On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
> > Recently I spent some time with others here at Red Hat looking
> > at problems with nfs server performance. One thing we found was that
> > there are some problems with multiple nfsd's. It seems like the I/O
> > scheduling or something is fooled by the fact that sequential write
> > calls are often handled by different nfsd's. This can negatively
> > impact performance (I don't think we've tracked this down completely
> > yet, however).
>
> Yes, we've been trying to see how close to full network speed we can get
> over a 10 gig network and have run into situations where increasing the
> number of threads (without changing anything else) seems to decrease
> performance of a simple sequential write.
>
> And the hypothesis that the problem was randomized IO scheduling was the
> first thing that came to mind.  But I'm not sure what the easiest way
> would be to really prove that that was the problem.
>
> And then once we really are sure that's the problem, I'm not sure what
> to do about it.  I suppose it may depend partly on exactly where the
> reordering is happening.

For 1 process, this theory seems to hold:
1 testing process: /local:            39.11 MB/s
        64 nfsd's:                    29.63 MB/s
        1 nfsd:                       38.99 MB/s


However, for 6 processes reading 6 different files:
6 parallel testing processes: /local: 70 MB/s
        1 nfsd:                       36 MB/s (49% drop)
        2 nfsd's:                     37.7 MB/s (46% drop)
        4 nfsd's:                     38.6 MB/s (44.9% drop)
        4 nfsd's on different cpu's:  37.5 MB/s (46% drop)
        32 nfsd's:                    38.3 MB/s (45% drop)
        64 nfsd's:                    38.3 MB/s (45% drop)
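
The drop percentages above can be rechecked with a small helper; this is
only a sketch, and pct_drop is a name made up for illustration. It takes
the /local bandwidth as the baseline and prints the relative drop to one
decimal place.

```shell
#!/bin/sh
# pct_drop BASE VALUE: percentage by which VALUE falls below BASE.
pct_drop() {
    awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%.1f\n", (base - val) * 100 / base }'
}

# 6-process case, /local baseline of 70 MB/s.
for bw in 36 37.7 38.6 37.5 38.3; do
    printf '%s MB/s -> %s%% drop\n' "$bw" "$(pct_drop 70 "$bw")"
done

# Single-process case: 64 nfsd's vs /local.
printf '29.63 MB/s -> %s%% drop\n' "$(pct_drop 39.11 29.63)"
```

For the 6-process rows this gives 48.6, 46.1, 44.9, 46.4 and 45.3, which
matches the rounded figures in the table; the single-process case with
64 nfsd's works out to a 24.2% drop.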

Thanks,

- KK



* Re: NFS performance degradation of local loopback FS.
  2008-07-01 10:19                               ` Krishna Kumar2
@ 2008-07-01 12:47                                 ` Jeff Layton
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Layton @ 2008-07-01 12:47 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: J. Bruce Fields, aglo, Benny Halevy, Chuck Lever, linux-nfs,
	Dean Hildebrand, Peter Staubach

On Tue, 1 Jul 2008 15:49:44 +0530
Krishna Kumar2 <krkumar2@in.ibm.com> wrote:

> "J. Bruce Fields" <bfields@fieldses.org> wrote on 06/30/2008 09:05:41 PM:
> 
> > On Mon, Jun 30, 2008 at 11:26:54AM -0400, Jeff Layton wrote:
> > > Recently I spent some time with others here at Red Hat looking
> > > at problems with nfs server performance. One thing we found was that
> > > there are some problems with multiple nfsd's. It seems like the I/O
> > > scheduling or something is fooled by the fact that sequential write
> > > calls are often handled by different nfsd's. This can negatively
> > > impact performance (I don't think we've tracked this down completely
> > > yet, however).
> >
> > Yes, we've been trying to see how close to full network speed we can get
> > over a 10 gig network and have run into situations where increasing the
> > number of threads (without changing anything else) seems to decrease
> > performance of a simple sequential write.
> >
> > And the hypothesis that the problem was randomized IO scheduling was the
> > first thing that came to mind.  But I'm not sure what the easiest way
> > would be to really prove that that was the problem.
> >
> > And then once we really are sure that's the problem, I'm not sure what
> > to do about it.  I suppose it may depend partly on exactly where the
> > reordering is happening.
> 
> For 1 process, this theory seems to hold:
> 1 testing process: /local:            39.11 MB/s
>         64 nfsd's:                    29.63 MB/s
>         1 nfsd:                       38.99 MB/s
> 
> 
> However, for 6 processes reading 6 different files:
> 6 parallel testing processes: /local: 70 MB/s
>         1 nfsd:                       36 MB/s (49% drop)
>         2 nfsd's:                     37.7 MB/s (46% drop)
>         4 nfsd's:                     38.6 MB/s (44.9% drop)
>         4 nfsd's on different cpu's:  37.5 MB/s (46% drop)
>         32 nfsd's:                    38.3 MB/s (45% drop)
>         64 nfsd's:                    38.3 MB/s (45% drop)
> 

That makes some sense, I think...

What's happening is that the processes on the client doing the I/O are
being "masqueraded" behind the nfsd's. This is throwing off readahead
(and maybe other predictive I/O optimizations?). These optimizations
help when a single thread is doing I/O, but when a single process is
feeding multiple nfsd's or multiple processes are spewing I/O to a
single nfsd, it falls back to random I/O behavior.

In the single nfsd case, you're also being bottlenecked by the fact
that all of the I/O is serialized. That's not a problem with a single
client-side process, but it may be a significant slowdown when there
are multiple writers on the client.

We have a RHBZ open on this for RHEL5:

https://bugzilla.redhat.com/show_bug.cgi?id=448130

...there is a partial workaround described there as well.

-- 
Jeff Layton <jlayton@redhat.com>


* Re: NFS performance degradation of local loopback FS.
       [not found]                           ` <20080630112654.012ce3e4-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
  2008-06-30 15:35                             ` J. Bruce Fields
@ 2008-06-30 15:35                             ` Chuck Lever
  2008-07-01  5:07                             ` Krishna Kumar2
  2 siblings, 0 replies; 32+ messages in thread
From: Chuck Lever @ 2008-06-30 15:35 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Krishna Kumar2, Dean Hildebrand, J. Bruce Fields, Benny Halevy,
	linux-nfs, Peter Staubach

On Mon, Jun 30, 2008 at 11:26 AM, Jeff Layton <jlayton@redhat.com> wrote:
> On Mon, 30 Jun 2008 15:40:30 +0530
> Krishna Kumar2 <krkumar2@in.ibm.com> wrote:
>
>> Dean Hildebrand <seattleplus@gmail.com> wrote on 06/27/2008 11:36:28 PM:
>>
>> > One option might be to try using O_DIRECT if you are worried about
>> > memory (although I would read/write in at least 1 MB at a time).  I
>> > would expect this to help at least a bit especially on reads.
>> >
>> > Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
>> > Since with a loopback you effectively have no latency, you would want to
>> > ensure that neither the #nfsds or #rpc slots is a bottleneck (if either
>> > one is too low, you will have a problem).  One way to reduce the # of
>> > requests and therefore require fewer nfsds/rpc_slots is to 'cat
>> > /proc/mounts' to see your wsize/rsize.  Ensure your wsize/rsize is a
>> > decent size (~ 1MB).
>>
>> Number of nfsd: 64, and
>>       sunrpc.transports = sunrpc.udp_slot_table_entries = 128
>>       sunrpc.tcp_slot_table_entries = 128
>>
>> I am using:
>>
>>       mount -o rw,bg,hard,nointr,proto=tcp,vers=3,rsize=65536,wsize=65536,timeo=600,noatime \
>>             localhost:/local /nfs
>>
>> I have also tried with 1MB for both rsize/wsize and it didn't change the BW
>> (other than minor variations).
>>
>> thanks,
>>
>> - KK
>>
>
> Recently I spent some time with others here at Red Hat looking
> at problems with nfs server performance. One thing we found was that
> there are some problems with multiple nfsd's. It seems like the I/O
> scheduling or something is fooled by the fact that sequential write
> calls are often handled by different nfsd's. This can negatively
> impact performance (I don't think we've tracked this down completely
> yet, however).

Yeah, I think that's what Dean is alluding to above.  There was a
FreeNix paper a few years back that discusses this same readahead
problem with the FreeBSD NFS server.

-- 
Chuck Lever


* Re: NFS performance degradation of local loopback FS.
       [not found]                           ` <20080630112654.012ce3e4-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
  2008-06-30 15:35                             ` J. Bruce Fields
  2008-06-30 15:35                             ` Chuck Lever
@ 2008-07-01  5:07                             ` Krishna Kumar2
  2 siblings, 0 replies; 32+ messages in thread
From: Krishna Kumar2 @ 2008-07-01  5:07 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Benny Halevy, Chuck Lever, linux-nfs,
	Dean Hildebrand, Peter Staubach

Jeff Layton <jlayton@redhat.com> wrote on 06/30/2008 08:56:54 PM:

> Recently I spent some time with others here at Red Hat looking
> at problems with nfs server performance. One thing we found was that
> there are some problems with multiple nfsd's. It seems like the I/O
> scheduling or something is fooled by the fact that sequential write
> calls are often handled by different nfsd's. This can negatively
> impact performance (I don't think we've tracked this down completely
> yet, however).
>
> Since you're just doing some single-threaded testing on the client
> side, it might be interesting to try running a single nfsd and testing
> performance with that. It might provide an interesting data point.

Works perfectly now!

With 64 nfsd's:
[root@localhost nfs]# ./perf
      ********** Testing on /nfs *************
      Read Time: 50.6236      BW:29.63 MB/s
      ********** Testing on /local *************
      Read Time: 38.3506      BW:39.11 MB/s

With 1 nfs'd:
[root@localhost nfs]# ./perf
      ********** Testing on /nfs *************
      Read Time: 38.4760      BW:38.99 MB/s
      ********** Testing on /local *************
      Read Time: 38.4874      BW:38.97 MB/s
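
As a sanity check on these numbers, multiplying each read time by its
reported bandwidth should give the amount of data read per run. The
sketch below does that; mb_moved is a made-up helper name, and the
conclusion that each run reads about 1500 MB is inferred from the
figures, not something the perf script output states.

```shell
#!/bin/sh
# mb_moved SECONDS MB_PER_SEC: data volume implied by a time and a rate,
# rounded to the nearest MB.
mb_moved() {
    awk -v secs="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", secs * rate }'
}

mb_moved 50.6236 29.63   # /nfs,   64 nfsd's
mb_moved 38.3506 39.11   # /local, 64 nfsd's
mb_moved 38.4760 38.99   # /nfs,   1 nfsd
mb_moved 38.4874 38.97   # /local, 1 nfsd
```

All four lines print 1500, so the four runs appear to read the same
amount of data and the bandwidth figures are directly comparable.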

I will try your other suggestions too.

I have to see what happens if I increase my processes. The real test is
DB2 using 300 connections. I will update when I run some more tests. But
thanks to everyone's help so far.

Regards,

- KK



* Re: NFS performance degradation of local loopback FS.
  2008-06-30 10:10                       ` Krishna Kumar2
  2008-06-30 15:26                         ` Jeff Layton
@ 2008-06-30 15:30                         ` Chuck Lever
  1 sibling, 0 replies; 32+ messages in thread
From: Chuck Lever @ 2008-06-30 15:30 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: Dean Hildebrand, J. Bruce Fields, Benny Halevy, linux-nfs,
	Peter Staubach

On Mon, Jun 30, 2008 at 6:10 AM, Krishna Kumar2 <krkumar2@in.ibm.com> wrote:
> Dean Hildebrand <seattleplus@gmail.com> wrote on 06/27/2008 11:36:28 PM:
>
>> One option might be to try using O_DIRECT if you are worried about
>> memory (although I would read/write in at least 1 MB at a time).  I
>> would expect this to help at least a bit especially on reads.
>>
>> Also, check all the standard nfs tuning stuff, #nfsds, #rpc slots.
>> Since with a loopback you effectively have no latency, you would want to
>> ensure that neither the #nfsds or #rpc slots is a bottleneck (if either
>> one is too low, you will have a problem).  One way to reduce the # of
>> requests and therefore require fewer nfsds/rpc_slots is to 'cat
>> /proc/mounts' to see your wsize/rsize.  Ensure your wsize/rsize is a
>> decent size (~ 1MB).
>
> Number of nfsd: 64, and
>      sunrpc.transports = sunrpc.udp_slot_table_entries = 128
>      sunrpc.tcp_slot_table_entries = 128

Interestingly, sometimes using a large number of slots can be
detrimental to performance over loopback.  Have you tried 32 and 64 as
well as 128?  Also, I seem to recall that you should have the same as
or fewer slots on your clients than you have threads on your server.
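
A rough sketch of such a sweep, assuming 64 nfsd threads as reported
above. It only prints the sysctl commands and flags the values that
break the "slots <= server threads" rule of thumb; slots_ok is a made-up
helper name, and the rule itself is a recollection rather than a
documented limit.

```shell
#!/bin/sh
# slots_ok SLOTS THREADS: true when the client slot count does not
# exceed the number of server threads.
slots_ok() {
    [ "$1" -le "$2" ]
}

nfsd_threads=64   # value reported earlier in the thread
for slots in 32 64 128; do
    if slots_ok "$slots" "$nfsd_threads"; then
        verdict="ok"
    else
        verdict="exceeds nfsd threads"
    fi
    # Printed only; run as root and remount before re-testing.
    echo "sysctl -w sunrpc.tcp_slot_table_entries=$slots  # $verdict"
done
```

The slot table size is picked up when the transport is created, so the
filesystem would need to be unmounted and mounted again between runs for
a new value to take effect.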

-- 
Chuck Lever


end of thread, other threads:[~2008-07-01 12:49 UTC | newest]

Thread overview: 32+ messages
2008-06-19  6:46 NFS performance degradation of local loopback FS Krishna Kumar2
2008-06-19  9:58 ` Krishna Kumar2
2008-06-19 12:04   ` Peter Staubach
2008-06-19 12:52     ` Benny Halevy
2008-06-20  6:39       ` Krishna Kumar2
2008-06-20  9:21       ` Krishna Kumar2
2008-06-22  8:35         ` Benny Halevy
2008-06-23  8:11           ` Krishna Kumar2
2008-06-23 12:40             ` Benny Halevy
2008-06-26  7:19               ` Krishna Kumar2
2008-06-26 17:42                 ` Chuck Lever
2008-06-26 17:55                   ` J. Bruce Fields
2008-06-26 21:05                     ` Chuck Lever
     [not found]                       ` <76bd70e30806261405g9357c6fg51b973ff076ee78b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-26 21:22                         ` kernel hacker's pub night J. Bruce Fields
2008-06-26 21:24                           ` J. Bruce Fields
2008-06-27  7:14                             ` Benny Halevy
2008-06-27  9:04                   ` NFS performance degradation of local loopback FS Krishna Kumar2
2008-06-27 14:06                     ` Chuck Lever
     [not found]                       ` <76bd70e30806270706x7cbfd291l6cb6d0cc5e81771-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-30  9:57                         ` Krishna Kumar2
2008-06-30 15:25                           ` Chuck Lever
     [not found]                             ` <76bd70e30806300825t6490477dpb8ce3ee48a0a6777-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-07-01  3:43                               ` Krishna Kumar2
2008-06-27 17:44                     ` J. Bruce Fields
2008-06-27 18:06                     ` Dean Hildebrand
2008-06-30 10:10                       ` Krishna Kumar2
2008-06-30 15:26                         ` Jeff Layton
     [not found]                           ` <20080630112654.012ce3e4-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
2008-06-30 15:35                             ` J. Bruce Fields
2008-06-30 16:00                               ` Chuck Lever
2008-07-01 10:19                               ` Krishna Kumar2
2008-07-01 12:47                                 ` Jeff Layton
2008-06-30 15:35                             ` Chuck Lever
2008-07-01  5:07                             ` Krishna Kumar2
2008-06-30 15:30                         ` Chuck Lever
