From: Wu Fengguang <fengguang.wu@intel.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: John Stoffel <john@stoffel.org>,
	Dave Chinner <david@fromorbit.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] nfs: use 4*rsize readahead size
Date: Wed, 3 Mar 2010 11:27:24 +0800
Message-ID: <20100303032724.GA9979@localhost>
In-Reply-To: <1267555339.3099.127.camel@localhost.localdomain>

On Wed, Mar 03, 2010 at 02:42:19AM +0800, Trond Myklebust wrote:
> On Tue, 2010-03-02 at 12:33 -0500, John Stoffel wrote:
> > >>>>> "Trond" == Trond Myklebust <Trond.Myklebust@netapp.com> writes:
> >
> > Trond> On Tue, 2010-03-02 at 11:10 +0800, Wu Fengguang wrote:
> > >> Dave,
> > >>
> > >> Here is one more test on a big ext4 disk file:
> > >>
> > >>               16k   39.7 MB/s
> > >>               32k   54.3 MB/s
> > >>               64k   63.6 MB/s
> > >>              128k   72.6 MB/s
> > >>              256k   71.7 MB/s
> > >> rsize ==>    512k   71.7 MB/s
> > >>             1024k   72.2 MB/s
> > >>             2048k   71.0 MB/s
> > >>             4096k   73.0 MB/s
> > >>             8192k   74.3 MB/s
> > >>            16384k   74.5 MB/s
> > >>
> > >> It shows that >=128k client side readahead is enough for single disk
> > >> case :) As for RAID configurations, I guess big server side readahead
> > >> should be enough.
> >
> > Trond> There are lots of people who would like to use NFS on their
> > Trond> company WAN, where you typically have high bandwidths (up to
> > Trond> 10GigE), but often a high latency too (due to geographical
> > Trond> dispersion). My ping latency from here to a typical server in
> > Trond> NetApp's Bangalore office is ~ 312ms. I read your test results
> > Trond> with 10ms delays, but have you tested with higher than that?
> >
> > If you have that high a latency, the low level TCP protocol is going
> > to kill your performance before you get to the NFS level. You really
> > need to open up the TCP window size at that point. And it only gets
> > worse as the bandwidth goes up too.
>
> Yes. You need to open the TCP window in addition to reading ahead
> aggressively.

I only get ~10MB/s throughput with the following settings.

	# huge NFS ra size
	echo 89512 > /sys/devices/virtual/bdi/0:15/read_ahead_kb

	# on both sides
	/sbin/tc qdisc add dev eth0 root netem delay 200ms
	net.core.rmem_max = 873800000
	net.core.wmem_max = 655360000
	net.ipv4.tcp_rmem = 8192 87380000 873800000
	net.ipv4.tcp_wmem = 4096 65536000 655360000

Did I miss something?

Thanks,
Fengguang
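
The figures in this message can be checked against the usual bandwidth-delay relation, throughput <= bytes in flight / RTT. The following is only a rough sketch under stated assumptions: it takes the round-trip time as about 400 ms (netem adding 200 ms on each side), reads "~10MB/s" as decimal megabytes, and treats the readahead size and the default tcp_rmem value as if each could be kept fully in flight, which real TCP and NFS stacks may not achieve. The function and constant names are illustrative, not taken from the thread.

```python
# Back-of-the-envelope bandwidth-delay arithmetic for the settings above.
# Assumption: netem adds 200 ms in each direction, so RTT is about 400 ms.

RTT_S = 0.4                     # assumed round-trip time in seconds
READAHEAD_BYTES = 89512 * 1024  # read_ahead_kb value from the message
TCP_RMEM_DEFAULT = 87380000     # middle field of net.ipv4.tcp_rmem
OBSERVED_BPS = 10e6             # "~10MB/s", read as decimal megabytes


def max_throughput(bytes_in_flight: float, rtt_s: float) -> float:
    """Upper bound on throughput (bytes/s) when at most
    bytes_in_flight can be outstanding per round trip."""
    return bytes_in_flight / rtt_s


def implied_in_flight(throughput_bps: float, rtt_s: float) -> float:
    """Bytes that must be outstanding to sustain throughput_bps."""
    return throughput_bps * rtt_s


if __name__ == "__main__":
    in_flight = implied_in_flight(OBSERVED_BPS, RTT_S)
    print(f"implied data in flight: {in_flight / 1e6:.1f} MB")
    print(f"readahead ceiling:      "
          f"{max_throughput(READAHEAD_BYTES, RTT_S) / 1e6:.1f} MB/s")
    print(f"tcp_rmem ceiling:       "
          f"{max_throughput(TCP_RMEM_DEFAULT, RTT_S) / 1e6:.1f} MB/s")
```

Under these assumptions, ~10MB/s corresponds to only about 4 MB outstanding per round trip, while the configured readahead and socket buffer sizes would each allow on the order of 200 MB/s. That suggests the limit lies somewhere other than these two settings, though the arithmetic alone does not say where.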