Re: [RFC] nfs: use 2*rsize readahead size

From: Wu Fengguang <fengguang.wu@intel.com>
To: Akshat Aranya <aaranya+fsdevel@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Trond Myklebust <Trond.Myklebust@netapp.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] nfs: use 2*rsize readahead size
Date: Thu, 25 Feb 2010 20:37:55 +0800
Message-ID: <20100225123755.GB9077@localhost>
In-Reply-To: <e48344781002240318u6e6545bdt97712dca4efceb9f@mail.gmail.com>

On Wed, Feb 24, 2010 at 07:18:26PM +0800, Akshat Aranya wrote:
> On Wed, Feb 24, 2010 at 12:22 AM, Dave Chinner <david@fromorbit.com> wrote:
> 
> >
> >> It sounds silly to have
> >>
> >>         client_readahead_size > server_readahead_size
> >
> > I don't think it is  - the client readahead has to take into account
> > the network latency as well as the server latency. e.g. a network
> > with a high bandwidth but high latency is going to need much more
> > client side readahead than a high bandwidth, low latency network to
> > get the same throughput. Hence it is not uncommon to see larger
> > readahead windows on network clients than for local disk access.
> >
> > Also, the NFS server may not even be able to detect sequential IO
> > patterns because of the combined access patterns from the clients,
> > and so the only effective readahead might be what the clients
> > issue....
> >
> 
> In my experiments, I have observed that the server-side readahead
> shuts off rather quickly even with a single client because the client
> readahead causes multiple pending read RPCs on the server which are
> then serviced in random order and the pattern observed by the
> underlying file system is non-sequential.  In our file system, we had
> to override what the VFS thought was a random workload and continue to
> do readahead anyway.

What's the server-side kernel version, and what are the client-side
and server-side readahead sizes? I'd expect the context readahead to
handle it well.

With the patchset in <http://lkml.org/lkml/2010/2/23/376>, you can
actually see the readahead details:

        # echo 1 > /debug/tracing/events/readahead/enable
        # cp test-file /dev/null
        # cat /debug/tracing/trace  # trimmed output
        readahead-initial(dev=0:15, ino=100177, req=0+2, ra=0+4-2, async=0) = 4
        readahead-subsequent(dev=0:15, ino=100177, req=2+2, ra=4+8-8, async=1) = 8
        readahead-subsequent(dev=0:15, ino=100177, req=4+2, ra=12+16-16, async=1) = 16
        readahead-subsequent(dev=0:15, ino=100177, req=12+2, ra=28+32-32, async=1) = 32
        readahead-subsequent(dev=0:15, ino=100177, req=28+2, ra=60+60-60, async=1) = 24
        readahead-subsequent(dev=0:15, ino=100177, req=60+2, ra=120+60-60, async=1) = 0
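
As a rough sketch (not part of the patchset), the trace lines above can
be checked mechanically for sequentiality. The field layout is an
assumption read off the sample output: req=offset+size,
ra=start+size-async_size, and the value after "=" is the number of pages
actually read. The names parse_trace() and windows_are_sequential() are
made up for this illustration.

```python
import re

# Assumed layout (inferred from the sample trace, not from kernel docs):
#   req=<offset>+<size>, ra=<start>+<size>-<async_size>, "= N" pages read.
TRACE_RE = re.compile(
    r"readahead-(?P<kind>\w+)\(dev=(?P<dev>[\d:]+), ino=(?P<ino>\d+), "
    r"req=(?P<req_off>\d+)\+(?P<req_size>\d+), "
    r"ra=(?P<ra_start>\d+)\+(?P<ra_size>\d+)-(?P<ra_async>\d+), "
    r"async=(?P<is_async>[01])\) = (?P<actual>\d+)"
)

SAMPLE = """\
readahead-initial(dev=0:15, ino=100177, req=0+2, ra=0+4-2, async=0) = 4
readahead-subsequent(dev=0:15, ino=100177, req=2+2, ra=4+8-8, async=1) = 8
readahead-subsequent(dev=0:15, ino=100177, req=4+2, ra=12+16-16, async=1) = 16
readahead-subsequent(dev=0:15, ino=100177, req=12+2, ra=28+32-32, async=1) = 32
readahead-subsequent(dev=0:15, ino=100177, req=28+2, ra=60+60-60, async=1) = 24
readahead-subsequent(dev=0:15, ino=100177, req=60+2, ra=120+60-60, async=1) = 0
"""


def parse_trace(text):
    """Return one dict per readahead event found in the trace text."""
    events = []
    for match in TRACE_RE.finditer(text):
        fields = match.groupdict()
        # Keep the two textual fields as strings, convert the rest to int.
        events.append(
            {k: (v if k in ("kind", "dev") else int(v)) for k, v in fields.items()}
        )
    return events


def windows_are_sequential(events):
    """True if every window starts exactly where the previous one ended."""
    return all(
        cur["ra_start"] == prev["ra_start"] + prev["ra_size"]
        for prev, cur in zip(events, events[1:])
    )


if __name__ == "__main__":
    evs = parse_trace(SAMPLE)
    print(len(evs), "events, sequential:", windows_are_sequential(evs))
```

Running the same check on a server-side trace would show whether the
windows stay back-to-back or break up when the RPCs arrive out of order.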

I verified the NFS case with the help of such traces long ago.
When client_readahead_size <= server_readahead_size, the readahead
requests may look a bit random at first, but they quickly turn into
a perfect series of sequential context readaheads.
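
Dave's point quoted above, that a high-latency link needs more client
readahead for the same throughput, can be put into rough numbers with
a bandwidth-delay estimate. This is only a back-of-the-envelope sketch:
the link speed, latencies and rsize below are made-up illustrative
values, not measurements from this thread, and the helper names are
hypothetical.

```python
def bytes_in_flight(bandwidth_bytes_per_s, latency_us):
    """Bandwidth-delay product: bytes that must be outstanding to keep
    the pipe full. latency_us is meant as network round trip plus
    server service time, in microseconds (integers avoid float noise)."""
    return bandwidth_bytes_per_s * latency_us // 1_000_000


def read_rpcs_needed(inflight_bytes, rsize_bytes):
    """Number of rsize-sized READ RPCs covering that many bytes
    (ceiling division, at least one)."""
    return max(1, -(-inflight_bytes // rsize_bytes))


if __name__ == "__main__":
    GBIT = 125_000_000        # 1 Gbit/s expressed in bytes per second
    RSIZE = 512 * 1024        # hypothetical rsize of 512 KiB

    # Hypothetical low-latency and high-latency cases on the same link.
    for name, latency_us in (("low latency", 200), ("high latency", 20_000)):
        inflight = bytes_in_flight(GBIT, latency_us)
        print(name, inflight, "bytes,", read_rpcs_needed(inflight, RSIZE), "RPCs")
```

With the same bandwidth, the amount of data that has to be in flight
scales linearly with latency, so the client window that saturates a
high-latency link can be far larger than the one a local disk needs.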

Thanks,
Fengguang
