From: Jan Kara <jack@suse.cz>
To: Andres Freund <andres@2ndquadrant.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
lsf@lists.linux-foundation.org,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
rhaas@anarazel.de, Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [Lsf] Postgresql performance problems with IO latency, especially during fsync()
Date: Thu, 27 Mar 2014 16:50:02 +0100
Message-ID: <20140327155002.GF18118@quack.suse.cz>
In-Reply-To: <20140326215518.GH9066@alap3.anarazel.de>
On Wed 26-03-14 22:55:18, Andres Freund wrote:
> On 2014-03-26 14:41:31 -0700, Andy Lutomirski wrote:
> > On Wed, Mar 26, 2014 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
> > > Hi,
> > >
> > > At LSF/MM there was a slot about postgres' problems with the kernel. Our
> > > top#1 concern is frequent slow read()s that happen while another process
> > > calls fsync(), even though we'd be perfectly fine if that fsync() took
> > > ages.
> > > The "conclusion" of that part was that it'd be very useful to have a
> > > demonstration of the problem without needing a full blown postgres
> > > setup. I've quickly hacked something together, that seems to show the
> > > problem nicely.
> > >
> > > For a bit of context: lwn.net/SubscriberLink/591723/940134eb57fcc0b8/
> > > and the "IO Scheduling" bit in
> > > http://archives.postgresql.org/message-id/20140310101537.GC10663%40suse.de
> > >
> >
> > For your amusement: running this program in KVM on a 2GB disk image
> > failed, but it caused the *host* to go out to lunch for several
> > seconds while failing. In fact, it seems to have caused the host to
> > fall over so badly that the guest decided that the disk controller was
> > timing out. The host is btrfs, and I think that btrfs is *really* bad
> > at this kind of workload.
>
> Also, unless you changed the parameters, it's a) using a 48GB disk file,
> and writes really rather fast ;)
>
> > Even using ext4 is no good. I think that dm-crypt is dying under the
> > load. So I won't test your program for real :/
>
> Try to reduce data_size to RAM * 2, NUM_RANDOM_READERS to something
> smaller. If it still doesn't work consider increasing the two nsleep()s...
>
> I didn't have a good idea how to scale those to the current machine in a
> halfway automatic fashion.
That's not necessary. If we have guidance like the above, we can figure it
out ourselves (I hope ;).
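
For anyone who wants to script the guidance quoted above, a minimal sketch
of the "data_size = RAM * 2" rule follows. It is not part of the test
program; the function name suggested_data_size is made up for illustration,
and it assumes a Linux system where os.sysconf() exposes SC_PHYS_PAGES and
SC_PAGE_SIZE. It deliberately does not pick a value for
NUM_RANDOM_READERS, since no concrete number was given.

```python
import os


def suggested_data_size(ram_bytes=None, factor=2):
    """Return a data_size in bytes following the 'RAM * 2' rule of thumb.

    ram_bytes -- physical memory in bytes; detected via sysconf if omitted
                 (assumption: Linux, where these sysconf names exist)
    factor    -- multiple of RAM to use; 2 is the value suggested above
    """
    if ram_bytes is None:
        ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return factor * ram_bytes
```

Calling suggested_data_size() with no arguments uses the RAM of the
machine it runs on; passing ram_bytes explicitly is handy when sizing the
test for a VM with less memory than the host.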
> > > Possible solutions:
> > > * Add a fadvise(UNDIRTY), that doesn't stall on a full IO queue like
> > > sync_file_range() does.
> > > * Make IO triggered by writeback regard IO priorities and add it to
> > > schedulers other than CFQ
> > > * Add a tunable that allows limiting the amount of dirty memory before
> > > writeback on a per process basis.
> > > * ...?
> >
> > I thought the problem wasn't so much that priorities weren't respected
> > but that the fsync call fills up the queue, so everything starts
> > contending for the right to enqueue a new request.
>
> I think it's both actually. If I understand correctly there's not even a
> correct association to the originator anymore during a fsync triggered
> flush?
There is. The association is lost for background writeback (and sync(2)
for that matter) but IO from fsync(2) is submitted in the context of the
process doing fsync.

What I think is happening is the problem of 'dependent sync IO' vs
'independent sync IO'. Reads are an example of dependent sync IO, where you
submit a read, need it to complete, and only then submit another read. OTOH
fsync is an example of independent sync IO, where you fire off tons of IO to
the drive and then wait for everything. Since we treat both these types of
IO in the same way, it can easily happen that independent sync IO starves
out the dependent one (you execute say 100 IO requests for fsync and 1 IO
request for read). We've seen problems like this in the past.
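
To make the starvation concrete, here is a toy model of the situation
described above. It is only a sketch, not how the block layer or any real
IO scheduler works: it assumes a single FIFO queue, a fixed service time
per request, and that the fsync caller queues its whole batch before the
reader submits anything. The function name simulate_fifo and all of its
parameters are hypothetical.

```python
from collections import deque


def simulate_fifo(fsync_batch, chained_reads, service_time=1):
    """Toy single-queue device that serves requests in arrival order.

    fsync_batch   -- writes submitted all at once at t=0
                     (independent sync IO)
    chained_reads -- reads issued one at a time, each submitted only
                     after the previous one has completed
                     (dependent sync IO)
    Returns (time the last write finished, list of read finish times).
    """
    queue = deque(("write", i) for i in range(fsync_batch))
    if chained_reads > 0:
        # The first read lands behind the entire fsync batch.
        queue.append(("read", 0))

    now = 0
    fsync_done = 0
    read_done = []
    while queue:
        kind, idx = queue.popleft()
        now += service_time
        if kind == "write":
            fsync_done = now
        else:
            read_done.append(now)
            if idx + 1 < chained_reads:
                # The next read can only be submitted now.
                queue.append(("read", idx + 1))
    return fsync_done, read_done
```

In this model simulate_fifo(100, 1) has the single read finishing at time
101, right behind the 100 fsync requests, whereas simulate_fifo(0, 1)
finishes it at time 1. The fsync caller does not care about the latency
of any individual write, but the reader pays for the whole batch.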
I'll have a look at your test program and, if my feeling is indeed
correct, I'll look into what we could do in the block layer to fix
this (and poke the block layer guys - they had some preliminary patches that
tried to address this but they didn't go anywhere).
> > Since fsync blocks until all of its IO finishes anyway, what if it
> > could just limit itself to a much smaller number of outstanding
> > requests?
>
> Yea, that could already help. If you remove the fsync()s, the problem
> will periodically appear anyway, because writeback is triggered with
> vengeance. That'd need to be fixed in a similar way.
Actually, that might be triggered by a different problem, because in the
case of background writeback the block layer knows the IO is asynchronous
and treats it differently.
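
As for the suggestion quoted above that fsync could limit itself to a
smaller number of outstanding requests, a toy model shows why it could
help. This is a sketch under strong assumptions (one FIFO queue, fixed
service time, the fsync caller refilling its window one request at a time
as writes complete), not a description of any existing kernel behaviour.
The name simulate_capped_fsync and its parameters are made up for
illustration.

```python
from collections import deque


def simulate_capped_fsync(total_writes, max_inflight, chained_reads,
                          service_time=1):
    """Toy FIFO device where fsync keeps at most max_inflight writes queued.

    total_writes  -- writes the fsync caller needs to get to disk
    max_inflight  -- cap on how many of them may be queued at once
    chained_reads -- dependent reads, each submitted after the previous
                     one completes
    Returns (time the last write finished, list of read finish times).
    """
    queue = deque()
    submitted = 0
    # The fsync caller fills its window first ...
    while submitted < min(total_writes, max_inflight):
        queue.append(("write", submitted))
        submitted += 1
    # ... and the first read queues up behind that window only.
    if chained_reads > 0:
        queue.append(("read", 0))

    now = 0
    fsync_done = 0
    read_done = []
    while queue:
        kind, idx = queue.popleft()
        now += service_time
        if kind == "write":
            fsync_done = now
            if submitted < total_writes:
                # A slot freed up, so submit the next write.
                queue.append(("write", submitted))
                submitted += 1
        else:
            read_done.append(now)
            if idx + 1 < chained_reads:
                queue.append(("read", idx + 1))
    return fsync_done, read_done
```

With 100 writes and 3 chained reads, a cap of 4 gives read completions at
times 5, 10 and 15 with the fsync finishing at 103, while a cap of 100
(effectively no cap) gives reads at 101, 102 and 103 with the fsync
finishing at 100. So in this model the fsync gets marginally slower and
each read waits behind at most one window of writes. It says nothing
about background writeback, which the block layer treats differently.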
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR