From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Ben Greear <greearb@candelatech.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: Very slow O_DIRECT writes on NFS in .36
Date: Thu, 18 Nov 2010 17:42:04 -0500
Message-ID: <1290120124.14909.12.camel@heimdal.trondhjem.org>
In-Reply-To: <4CE58B54.4090203@candelatech.com>

On Thu, 2010-11-18 at 12:23 -0800, Ben Greear wrote:
> On 11/18/2010 12:17 PM, Chuck Lever wrote:
> >
> > On Nov 18, 2010, at 3:07 PM, Ben Greear wrote:
> >
> >> I applied the NFS O_DIRECT patch (and all others) from the pending 2.6.36 stable
> >> queue, and now I can at least use O_DIRECT w/out immediate failure.
> >>
> >> However, I notice that when writing 2k chunks with O_DIRECT on
> >> NFS, it runs extremely slowly (about 300Kbps throughput). The
> >> server is a Fedora 13 64-bit system running 2.6.34.7-56.fc13.x86_64
> >>
> >> Here's some strace -ttT output for the writer:
> >>
> >> 07:03:42.898058 write(9, "\370'\37\345v\230\315\253\3\0\0\0\354\7\0\0\16\1\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 2048) = 2048<0.059402>
> >> 07:03:42.957649 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000266>
> >> 07:03:42.958148 write(9, "\212$s\327v\230\315\253\3\0\0\0\354\7\0\0\17\1\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 2048) = 2048<0.069295>
> >> 07:03:43.027524 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000011>
> >>
> >>
> >> Writing 64k chunks takes basically the same amount of time per system call:
> >>
> >> 07:06:13.537488 write(9, "\5\340\202\262v\230\315\253\3\0\0\0\354\377\0\0\6\0\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 65536) = 65536<0.049462>
> >> 07:06:13.587083 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000035>
> >> 07:06:13.587410 write(9, "\250\231\377cv\230\315\253\3\0\0\0\354\377\0\0\7\0\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 65536) = 65536<0.058612>
> >> 07:06:13.646233 poll([{fd=4, events=POLLIN}, {fd=8, events=POLLIN}, {fd=6, events=POLLIN}], 3, 0) = 0 (Timeout)<0.000095>
> >> 07:06:13.646616 write(9, "\5-@\5v\230\315\253\3\0\0\0\354\377\0\0\10\0\0\0\0\1\2\3\4\5\6\7\10\t\n\v"..., 65536) = 65536<0.050282>
> >>
> >>
> >> Reading is a good deal faster..about 34Mbps with O_DIRECT, NFS and 2k reads.
> >>
> >> Any ideas about why the write performance is so bad?
> >
> > A network trace will probably show you that the per-write latency is due to the server.
> >
>
> Looks like you are right. I don't remember it being this slow before, but
> maybe it was. We'll run some tests with older kernels and/or different
> servers.
>
> 6.700193 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:96256 Len:1024 FILE_SYNC
> 6.740547 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12321 Ack=101729 Win=501 Len=0 TSV=218471603 TSER=1385525
> 6.769380 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 262) Len:1024 FILE_SYNC
> 6.769609 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:97280 Len:1024 FILE_SYNC
> 6.809777 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12461 Ack=102885 Win=501 Len=0 TSV=218471673 TSER=1385594
> 6.850373 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 265) Len:1024 FILE_SYNC
> 6.850631 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:98304 Len:1024 FILE_SYNC
> 6.890845 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12601 Ack=104041 Win=501 Len=0 TSV=218471754 TSER=1385675
> 6.930344 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 268) Len:1024 FILE_SYNC
> 6.930703 192.168.100.173 -> 192.168.100.3 NFS V3 WRITE Call, FH:0x6bc05782 Offset:99328 Len:1024 FILE_SYNC
> 6.971753 192.168.100.3 -> 192.168.100.173 TCP 2049 > 800 [ACK] Seq=12741 Ack=105197 Win=501 Len=0 TSV=218471834 TSER=1385755
> 6.980341 192.168.100.3 -> 192.168.100.173 NFS V3 WRITE Reply (Call In 271) Len:1024 FILE_SYNC

This is all expected. Doing direct i/o with small chunks is always going
to be slow, since the kernel is required to ensure all the data has been
synchronised to disk before the syscall can complete.

You can significantly speed things up by grouping several writes into a
single syscall using the vectored write interface (i.e. writev()). That
allows the kernel to use unstable writes to the server and then issue a
single COMMIT call to sync all the data from the vector at once.
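
As a rough illustration of the writev() suggestion, here is a minimal
sketch in C. The names write_chunks_vectored(), alloc_chunk(), CHUNK_SIZE
and MAX_CHUNKS are hypothetical and do not come from the reporter's
application; the 2k chunk size is taken from the report, and the 4096-byte
buffer alignment is an assumption that suits many local filesystems under
O_DIRECT and may not be needed on NFS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK_SIZE 2048   /* hypothetical: the 2k record size from the report */
#define MAX_CHUNKS 32     /* hypothetical batch limit, well below IOV_MAX */

/* Submit `count` chunks of CHUNK_SIZE bytes with a single writev() call.
 * Returns the number of bytes written, or -1 with errno set. */
ssize_t write_chunks_vectored(int fd, void *const chunks[], int count)
{
    struct iovec iov[MAX_CHUNKS];

    assert(count > 0 && count <= MAX_CHUNKS);
    for (int i = 0; i < count; i++) {
        iov[i].iov_base = chunks[i];
        iov[i].iov_len = CHUNK_SIZE;
    }
    /* One syscall covers the whole batch instead of one write() per chunk. */
    return writev(fd, iov, count);
}

/* Allocate one chunk, aligned to 4096 bytes (an assumption, see above),
 * and fill it with `fill`. Returns NULL on allocation failure. */
void *alloc_chunk(unsigned char fill)
{
    void *p = NULL;

    if (posix_memalign(&p, 4096, CHUNK_SIZE) != 0)
        return NULL;
    memset(p, fill, CHUNK_SIZE);
    return p;
}
```

A caller would open the file with O_DIRECT as before, collect up to
MAX_CHUNKS buffers, and pass them to write_chunks_vectored() in place of
that many separate write() calls. A short return value is possible with
writev(), so real code should check it and resubmit the remainder.
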
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com