From: Jan Kara <jack@suse.cz>
To: Viktor Nagy <viktor.nagy@thx4games.com>
Cc: Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Darrick J. Wong" <djwong@us.ibm.com>,
chris.mason@fusionio.com
Subject: Re: Linux 3.0+ Disk performance problem - wrong pdflush behaviour
Date: Thu, 11 Oct 2012 17:47:07 +0200
Message-ID: <20121011154707.GB8001@quack.suse.cz>
In-Reply-To: <5076C258.1070405@thx4games.com>

On Thu 11-10-12 13:58:00, Viktor Nagy wrote:
> >>>>>The regression you observe is caused by commit 3d08bcc8 "mm: Wait for
> >>>>>writeback when grabbing pages to begin a write". At first sight I was
> >>>>>somewhat surprised when I saw that code path in the traces but later when I
> >>>>>did some math it became clear. What the commit does is that when a page is
> >>>>>being written out to disk, we don't allow its contents to be changed and
> >>>>>wait for IO to finish before letting the next write proceed. Now if you have
> >>>>>1 GB file, that's 256000 pages. By the observation from my test machine,
> >>>>>writeback code keeps around 10000 pages in flight to disk at any moment
> >>>>>(this number fluctuates a lot but average is around that number). Your
> >>>>>program dirties about 25600 pages per second. So the probability one of
> >>>>>dirtied pages is a page under writeback is equal to 1 for all practical
> >>>>>purposes (precisely it is 1-(1-10000/256000)^25600). Actually, on average
> >>>>>you are going to hit about 1000 pages under writeback per second which
> >>>>>clearly has a noticeable impact (even a single page can have one). Pity I didn't
> >>>>>do the math when we were considering those patches.
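
The arithmetic in the paragraph above can be checked with a short sketch. It uses the three figures quoted in the message and assumes, as the formula in the message implicitly does, that each dirtied page is picked independently and uniformly from the file. The function names are made up for illustration.

```python
import math

# Figures quoted in the message above.
FILE_PAGES = 256_000        # ~1 GB file expressed in pages
IN_FLIGHT = 10_000          # average number of pages under writeback
DIRTIED_PER_SEC = 25_600    # pages the test program dirties per second


def hit_probability(file_pages, in_flight, dirtied):
    """Probability that at least one dirtied page is under writeback.

    Evaluates 1 - (1 - in_flight/file_pages)**dirtied, assuming independent,
    uniformly distributed page choices (an assumption, not a measurement).
    log1p/expm1 keep the result accurate when the probability is tiny.
    """
    p_single = in_flight / file_pages
    return -math.expm1(dirtied * math.log1p(-p_single))


def expected_hits(file_pages, in_flight, dirtied):
    """Expected number of dirtied pages that are under writeback."""
    return dirtied * in_flight / file_pages


print(hit_probability(FILE_PAGES, IN_FLIGHT, DIRTIED_PER_SEC))
print(expected_hits(FILE_PAGES, IN_FLIGHT, DIRTIED_PER_SEC))
```

With these inputs the expected count is 25600 * 10000 / 256000 = 1000 pages per second, matching the estimate in the message, and the probability is indistinguishable from 1 in floating point.
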
> >>>>>
> >>>>>There were plans to avoid waiting if underlying storage doesn't need it but
> >>>>>I'm not sure how far those plans got (added a couple of relevant CCs).
> >>>>>Anyway you are about the second or third real workload that sees a regression due
> >>>>>to "stable pages" so we have to fix that sooner rather than later... Thanks
> >>>>>for your detailed report!
> >>>>We develop a game server which gets very high load in some
> >>>>countries. We are trying to serve as many players as possible with
> >>>>one server.
> >>>>Currently the CPU usage is below 50% at peak times. And with
> >>>>the old kernel it runs smoothly. The pdflush runs non-stop on the
> >>>>database disk with ~3 MByte/s write (minimal read).
> >>>>This is at 43000 active sockets, 18000 rq/s, ~40000 packets/s.
> >>>>I think we are still below the theoretical limits of this server...
> >>>>but only if the disk writes are never done in sync.
> >>>>
> >>>>I will try the 3.2.31 kernel without the problematic commit
> >>>>(3d08bcc8 "mm: Wait for writeback when grabbing pages to begin a
> >>>>write").
> >>>>Is it a good idea? Will it be worse than 2.6.32?
> >>> Running without that commit should work just fine unless you use
> >>>something exotic like DIF/DIX or similar. Whether things will be worse than
> >>>in 2.6.32 I cannot say. For me, your test program behaves fine without that
> >>>commit but whether your real workload won't hit some other problem is
> >>>always a question. But if you hit another regression I'm interested in
> >>>hearing about it :).
> >>I've just tested it. After I set dirty_bytes above the file
> >>size, the writes are never blocked.
> >>So it works nicely without the mentioned commit.
> >>
> >>The problem is that if you read the kernel's documentation about
> >>dirty page handling, it does not work that way (with the commit). It
> >>works unpredictably.
> > Which documentation do you mean exactly? The process won't be throttled
> >because of dirtying too much memory but we can still block it for other
> >reasons - e.g. because we decide to evict its code from memory and have to
> >reload it again when the process gets scheduled. Or we can block during
> >memory allocation (which may be needed to allocate a page you write to) if
> >we find it necessary. There are no promises really...
> >
> Ok, it is very hard to get an overview about this whole thing.
> I thought I understood the behaviour checking the file
> Documentation/sysctl/vm.txt:
>
> "
> dirty_bytes
>
> Contains the amount of dirty memory at which a process generating
> disk writes
> will itself start writeback.
> ...
> "
>
> Ok, it does not say explicitly that other things can have an influence too.
>
> Several people are trying to work around the problem caused by the
> commit by setting the value of /sys/block/sda/queue/nr_requests to
> 4 (from 128).
> This helped a lot but was not enough for us.

Yes, that reduces the amount of IO in flight at any moment so it reduces
the chances you will wait in grab_cache_page_write_begin(). But it also
reduces throughput...
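
For anyone wanting to try the nr_requests workaround mentioned above, here is a minimal sketch of reading and changing the queue depth. The sysfs path is the one quoted in this thread; the helper names are made up for illustration, writing the file needs root, and the setting does not survive a reboot.

```python
from pathlib import Path


def read_tunable(path):
    """Return the integer currently stored in a sysfs/procfs tunable."""
    return int(Path(path).read_text().strip())


def write_tunable(path, value):
    """Write an integer to a tunable file (needs root for real sysfs files)."""
    Path(path).write_text(f"{value}\n")


def swap_tunable(path, value):
    """Set a new value and return the previous one so it can be restored."""
    previous = read_tunable(path)
    write_tunable(path, value)
    return previous


# Example for the device discussed in this thread (run as root):
#   old = swap_tunable("/sys/block/sda/queue/nr_requests", 4)
#   ... run the workload and measure ...
#   write_tunable("/sys/block/sda/queue/nr_requests", old)
```

Keeping the old value around makes it easy to compare throughput before and after, since a smaller queue trades throughput for fewer waits on pages under writeback.
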
> I attach two performance graphs which show our own CPU usage
> measurement (red). These are one-minute averages; the blue line is the
> SQL time %.
>
> And a nice question: Without reverting the patch is it possible to
> get a smooth performance (in our case)?

I don't know how to fix the issue without reverting the patch. Sorry.

Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR