From: Kenton Varda <kenton@cloudflare.com>
To: Ivan Babrou <ivan@cloudflare.com>
Cc: Dave Chinner <david@fromorbit.com>,
linux-xfs@vger.kernel.org, Shawn Bohrer <sbohrer@cloudflare.com>
Subject: Re: Non-blocking socket stuck for multiple seconds on xfs_reclaim_inodes_ag()
Date: Thu, 20 Dec 2018 20:00:21 -0800
Message-ID: <CAJouXQn2mSyyacnf_CnrhX-JQ1x2QOUoB3=bzsSfbHFfAdRc9Q@mail.gmail.com>
In-Reply-To: <CABWYdi28ifToh-yWRAv4MSdJ9g6t-Rxyz2GAFXGFraCwf9BBDg@mail.gmail.com>

Hi all,

I'm a coworker of Ivan's and wanted to add some background here -- in
particular to answer Dave's question about our workload.

For the purpose of this discussion, we can describe our workload as a
giant, glorified HTTP caching proxy. (We receive HTTP requests in. We
check if we have a cached response. If so, we return it to the client,
otherwise we forward the request on to its "origin" server. When the
origin responds, if the response is cacheable, we save it, and either
way we return it to the client.)
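
The request flow described above can be sketched roughly as follows. This
is only an illustration of the hit/miss logic, not the actual service: the
names handle_request and fetch_from_origin are hypothetical, and an
in-memory dict stands in for the on-disk cache.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type for illustration only; not taken from the real system.
Response = Tuple[bytes, bool]  # (body, cacheable)


def handle_request(
    url: str,
    cache: Dict[str, bytes],
    fetch_from_origin: Callable[[str], Response],
) -> bytes:
    """Return a cached body if present, otherwise fetch from the origin."""
    cached: Optional[bytes] = cache.get(url)
    if cached is not None:
        return cached  # cache hit: answer without contacting the origin

    body, cacheable = fetch_from_origin(url)  # cache miss: forward to origin
    if cacheable:
        cache[url] = body  # save only responses the origin marks cacheable
    return body  # either way, the client gets the origin's response
```

In this sketch a second request for a cacheable URL never reaches the
origin, while a non-cacheable URL is forwarded every time.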

Roughly speaking, each HTTP cache entry is stored as a file on disk.
Hence, we have a very large number of files, with files being added and
removed frequently. We also rely heavily on the page cache for
performance, rather than some more complicated database scheme.

The HTTP requests we serve almost always come from live end users
interacting with a web site. So, any kind of delay means someone is
sitting and waiting. When delays get up over 15 seconds, we start
hitting timeouts, meaning someone's web site doesn't load at all or
loads "broken". Also note that any particular machine may serve
thousands of requests per second, so blocking one machine may affect
thousands of users.

When XFS blocks direct reclaim, our service pretty much grinds to a
halt on that machine, because everything is trying to allocate memory
all the time. For example, as alluded to by the subject of this thread,
writing to a socket allocates memory, and thus will block waiting for
XFS to write back inodes. What we find really frustrating is that we
almost always have over 100GB of clean page cache that could be
reclaimed immediately, without blocking, yet we end up waiting for the
much-smaller inode cache to be written back to disk.
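
One way to observe this kind of stall from userspace is to time a send on
a non-blocking socket. The following is a minimal sketch under stated
assumptions: the helper name timed_nonblocking_send and the 4096-byte
payload are hypothetical, and a local socketpair stands in for a real
client connection. It does not reproduce the stall; it only shows where
the time would be measured. Normally the call returns almost immediately,
and the problem described here would show up as an elapsed time of
seconds.

```python
import socket
import time
from typing import Tuple


def timed_nonblocking_send(sock: socket.socket, payload: bytes) -> Tuple[int, float]:
    """Send on a non-blocking socket; return (bytes_sent, elapsed_seconds)."""
    sock.setblocking(False)
    start = time.monotonic()
    try:
        sent = sock.send(payload)
    except BlockingIOError:
        sent = 0  # send buffer full: the expected, fast "would block" outcome
    elapsed = time.monotonic() - start
    return sent, elapsed


if __name__ == "__main__":
    a, b = socket.socketpair()
    sent, elapsed = timed_nonblocking_send(a, b"x" * 4096)
    print(f"sent={sent} bytes in {elapsed * 1000:.3f} ms")
    a.close()
    b.close()
```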

We really can't accept random multi-second pauses. Our current plan is
to roll out the patch Ivan linked to. But, if you have any other
suggestions, we'd love to hear them. It would be great if we could
agree on an upstream solution, and maybe solve Facebook's problem too.

Hope that helps elucidate things.

Thanks,
-Kenton

On Wed, Dec 19, 2018 at 2:15 PM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> We're sticking with the following patch that allows runtime switching
> between XFS memory reclaim strategies:
>
> * https://github.com/bobrik/linux/pull/2
>
> There are some tests and graphs describing the issue and how it can be solved.
>
> Let me know if you think this can be incorporated upstream, I'm fine if not.
>
> On Thu, Nov 29, 2018 at 11:45 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Nov 30, 2018 at 05:49:08PM +1100, Dave Chinner wrote:
> > > Seriously: describe your workload in detail for me so I can write a
> > > reproducer for it. Without that I cannot help you any further and I
> > > am just wasting my time asking you to describe the workload over
> > > and over again.
> >
> > FWIW, here's the discussion about the FB issue. Go read it,
> > the first few emails are pretty much the same as this thread so far.
> >
> > https://www.spinics.net/lists/linux-xfs/msg01541.html
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com