linux-fsdevel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Mike Marshall <hubcapsc@gmail.com>
Cc: Mike Marshall <hubcap@omnibond.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: problem with orangefs readpage...
Date: Sat, 2 Jan 2021 02:07:33 +0000
Message-ID: <20210102020733.GA431927@casper.infradead.org>
In-Reply-To: <CAAoczXbw9A+kqMemEsJax+CaPkQsJzZNw6Y7XFhTsBqDnGD6hw@mail.gmail.com>

On Fri, Jan 01, 2021 at 05:15:32PM -0500, Mike Marshall wrote:
> Hi Matthew... Thanks so much for the suggestions!
> > This is some new version of orangefs_readpage(), right?
> No, that code has been upstream for a while... that readahead_control
> thing looks very interesting :-) ...

Oh, my, I was looking at a tree from before 2018 that still had
orangefs_readpages.  So, yes, I think what's happening is that
orangefs_readpage() is being called from the readahead code.
You'll hit this path:

                        next_page = find_get_page(inode->i_mapping, index);
                        if (next_page) {
                                gossip_debug(GOSSIP_FILE_DEBUG,
                                        "%s: found next page, quitting\n",
                                        __func__);
                                put_page(next_page);
                                goto out;
                        }

because readahead already allocated those pages for you and is trying
to fill them one-at-a-time.

Implementing ->readahead, even without dhowells' patch to expand
the ractl, will definitely help you!

> -Mike
> 
> On Thu, Dec 31, 2020 at 11:08 PM Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Thu, Dec 31, 2020 at 04:51:53PM -0500, Mike Marshall wrote:
> > > Greetings...
> > >
> > > I hope some of you will suffer through reading this long message :-) ...
> >
> > Hi Mike!  Happy New Year!
> >
> > > Orangefs isn't built to do small IO. Reading a
> > > big file in page cache sized chunks is slow and painful.
> > > I tried to write orangefs_readpage so that it would do a reasonable
> > > sized hard IO, fill the page that was being called for, and then
> > > go ahead and fill a whole bunch of the following pages into the
> > > page cache with the extra data in the IO buffer.
> >
> > This is some new version of orangefs_readpage(), right?  I don't see
> > anything resembling this in the current codebase.  Did you disable
> > orangefs_readpages() as part of this work?  Because the behaviour you're
> > describing sounds very much like what the readahead code might do to a
> > filesystem which implements readpage and neither readahead nor readpages.
> >
> > > orangefs_readpage gets called for the first four pages and then my
> > > prefill kicks in and fills the next pages and the right data ends
> > > up in /tmp/nine. I, of course, wished and planned for orangefs_readpage
> > > to only get called once. I don't understand why it gets called four
> > > times, which results in three extraneous expensive hard IOs.
> >
> > I might suggest some judicious calling of dump_stack() to understand
> > exactly what's calling you.  My suspicion is that it's this loop in
> > read_pages():
> >
> >                 while ((page = readahead_page(rac))) {
> >                         aops->readpage(rac->file, page);
> >                         put_page(page);
> >                 }
> >
> > which doesn't test for PageUptodate before calling you.
> >
> > It'd probably be best if you implemented ->readahead, which has its own
> > ideas about which pages would be the right ones to read.  It's not always
> > correct, but generally better to have that logic in the VFS than in each
> > filesystem.
> >
> > You probably want to have a look at Dave Howells' work to allow
> > the filesystem to expand the ractl:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> >
> > specifically this patch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=fscache-iter&id=f582790b32d5d1d8b937df95a8b2b5fdb8380e46
> >
