From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [RFC] nfs: use 2*rsize readahead size
Date: Thu, 25 Feb 2010 20:37:55 +0800
Message-ID: <20100225123755.GB9077@localhost>
References: <20100224024100.GA17048@localhost> <20100224032934.GF16175@discord.disaster> <20100224041822.GB27459@localhost> <20100224052215.GH16175@discord.disaster> <e48344781002240318u6e6545bdt97712dca4efceb9f@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: Dave Chinner <david@fromorbit.com>,
	Trond Myklebust <Trond.Myklebust@netapp.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
To: Akshat Aranya <aaranya+fsdevel@gmail.com>
Return-path: <owner-linux-mm@kvack.org>
Content-Disposition: inline
In-Reply-To: <e48344781002240318u6e6545bdt97712dca4efceb9f@mail.gmail.com>
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

On Wed, Feb 24, 2010 at 07:18:26PM +0800, Akshat Aranya wrote:
> On Wed, Feb 24, 2010 at 12:22 AM, Dave Chinner <david@fromorbit.com> wrote:
>
> >
> >> It sounds silly to have
> >>
> >>         client_readahead_size > server_readahead_size
> >
> > I don't think it is - the client readahead has to take into account
> > the network latency as well as the server latency. e.g. a network
> > with a high bandwidth but high latency is going to need much more
> > client side readahead than a high bandwidth, low latency network to
> > get the same throughput. Hence it is not uncommon to see larger
> > readahead windows on network clients than for local disk access.
> >
> > Also, the NFS server may not even be able to detect sequential IO
> > patterns because of the combined access patterns from the clients,
> > and so the only effective readahead might be what the clients
> > issue....
> >
>
> In my experiments, I have observed that the server-side readahead
> shuts off rather quickly even with a single client because the client
> readahead causes multiple pending read RPCs on the server which are
> then serviced in random order and the pattern observed by the
> underlying file system is non-sequential.  In our file system, we had
> to override what the VFS thought was a random workload and continue to
> do readahead anyway.

What's the server side kernel version, plus client/server side
readahead size? I'd expect the context readahead to handle it well.

With the patchset in <http://lkml.org/lkml/2010/2/23/376>, you can
actually see the readahead details:

        # echo 1 > /debug/tracing/events/readahead/enable
        # cp test-file /dev/null
        # cat /debug/tracing/trace  # trimmed output
        readahead-initial(dev=0:15, ino=100177, req=0+2, ra=0+4-2, async=0) = 4
        readahead-subsequent(dev=0:15, ino=100177, req=2+2, ra=4+8-8, async=1) = 8
        readahead-subsequent(dev=0:15, ino=100177, req=4+2, ra=12+16-16, async=1) = 16
        readahead-subsequent(dev=0:15, ino=100177, req=12+2, ra=28+32-32, async=1) = 32
        readahead-subsequent(dev=0:15, ino=100177, req=28+2, ra=60+60-60, async=1) = 24
        readahead-subsequent(dev=0:15, ino=100177, req=60+2, ra=120+60-60, async=1) = 0
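
If you want to check such a trace mechanically, here is a minimal
Python sketch. It is not part of the patchset. The field layout
(req=offset+size, ra=start+size-async_size, trailing number = pages
actually read) is my reading of the output above and should be treated
as an assumption, and parse_trace() and windows_are_contiguous() are
hypothetical helper names. It parses the lines and tests whether each
readahead window starts exactly where the previous one ended:

```python
import re

# Assumed layout, inferred from the trimmed trace output:
#   EVENT(dev=MAJ:MIN, ino=N, req=OFF+SIZE, ra=START+SIZE-ASYNC, async=0|1) = ACTUAL
TRACE_RE = re.compile(
    r"(?P<event>readahead-\w+)\("
    r"dev=(?P<dev>\d+:\d+), ino=(?P<ino>\d+), "
    r"req=(?P<req_off>\d+)\+(?P<req_size>\d+), "
    r"ra=(?P<ra_start>\d+)\+(?P<ra_size>\d+)-(?P<ra_async>\d+), "
    r"async=(?P<is_async>[01])\) = (?P<actual>\d+)"
)

def parse_trace(text):
    """Return one dict per readahead trace line found in text."""
    records = []
    for match in TRACE_RE.finditer(text):
        record = {}
        for key, value in match.groupdict().items():
            # event and dev stay strings; everything else is a page count or id
            record[key] = value if key in ("event", "dev") else int(value)
        records.append(record)
    return records

def windows_are_contiguous(records):
    """True if every window begins where the previous window ended."""
    return all(
        prev["ra_start"] + prev["ra_size"] == cur["ra_start"]
        for prev, cur in zip(records, records[1:])
    )

SAMPLE = """\
readahead-initial(dev=0:15, ino=100177, req=0+2, ra=0+4-2, async=0) = 4
readahead-subsequent(dev=0:15, ino=100177, req=2+2, ra=4+8-8, async=1) = 8
readahead-subsequent(dev=0:15, ino=100177, req=4+2, ra=12+16-16, async=1) = 16
readahead-subsequent(dev=0:15, ino=100177, req=12+2, ra=28+32-32, async=1) = 32
readahead-subsequent(dev=0:15, ino=100177, req=28+2, ra=60+60-60, async=1) = 24
readahead-subsequent(dev=0:15, ino=100177, req=60+2, ra=120+60-60, async=1) = 0
"""

if __name__ == "__main__":
    recs = parse_trace(SAMPLE)
    print([r["ra_size"] for r in recs], windows_are_contiguous(recs))
```

On a server where the read RPCs arrive out of order, one would expect
the contiguity check to fail for the first few windows of a file.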

And I've actually verified the NFS case with the help of such traces
long ago.  When client_readahead_size <= server_readahead_size, the
readahead requests may look a bit random at first, and then will
quickly turn into a perfect series of sequential context readaheads.
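
As for the latency argument quoted at the top, it can be put as
back-of-the-envelope arithmetic: to keep a link busy, the client needs
roughly bandwidth * latency bytes in flight. The sketch below is only
that estimate. The function name, the 4096-byte page size and the
example link numbers are assumptions made for illustration, and it
ignores rsize, RPC slot limits and everything else a real client has
to deal with:

```python
def readahead_pages(bandwidth_bytes_per_s, latency_us, page_size=4096):
    """Pages needed in flight to cover one request latency.

    latency_us is the full request latency in microseconds (network
    round trip plus server service time). Hypothetical helper, not a
    kernel interface.
    """
    in_flight_bytes = bandwidth_bytes_per_s * latency_us // 1_000_000
    # Round up to whole pages using integer arithmetic.
    return -(-in_flight_bytes // page_size)

GIGABIT = 125_000_000  # 1 Gbit/s expressed in bytes per second

if __name__ == "__main__":
    # Made-up latencies: 200 us for a LAN, 20 ms for a long-haul link.
    print("LAN:", readahead_pages(GIGABIT, 200), "pages")
    print("WAN:", readahead_pages(GIGABIT, 20_000), "pages")
```

With the same bandwidth, the estimate grows linearly with latency,
which is the reason a network client may want a larger window than a
local disk does.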

Thanks,
Fengguang
