* I/O blocked while dirty pages are being flushed
@ 2013-03-24  5:12 Fredrik Tolf
  2013-03-24  6:56 ` Eric Wong
  2013-03-24 10:27 ` Bart Van Assche
  0 siblings, 2 replies; 6+ messages in thread
From: Fredrik Tolf @ 2013-03-24  5:12 UTC
  To: linux-kernel

Dear list,

I've got an mmapped file (a Berkeley DB region file) with an access 
pattern such that it gets some 10-40 MB of dirtied pages a couple of 
times per minute. When the VM comes around to flush these pages to disk, 
that causes loads of problems. Since the dirty pages are rather 
interspersed in the file, the flusher posts batches of some 3000-5000 
write requests to the disk queue, and since I'm using normal hard drives, 
this might sometimes take 10-30 seconds to complete.
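
A minimal sketch of this access pattern, for anyone who wants to reproduce
it without Berkeley DB. It assumes that writing one byte per page through a
shared file mapping is a fair stand-in for the real workload; the function
name, path and sizes are made up for illustration.

```python
import mmap
import random

PAGE_SIZE = mmap.PAGESIZE


def dirty_scattered_pages(path, file_size, n_pages, seed=0):
    """Dirty n_pages randomly chosen pages of a file via a shared mapping.

    Returns the sorted list of page indices that were touched.
    """
    total_pages = file_size // PAGE_SIZE
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(total_pages), n_pages))

    # Create (or reset) the file as a sparse file of the requested size.
    with open(path, "wb") as f:
        f.truncate(file_size)

    with open(path, "r+b") as f:
        # On Unix the default is a shared, read-write mapping.
        with mmap.mmap(f.fileno(), file_size) as m:
            for page in chosen:
                # One byte is enough to mark the whole page dirty.
                m[page * PAGE_SIZE] = 0xFF
    return chosen
```

Calling this on a large file with a few thousand pages, a couple of times
per minute, should leave the flusher with similarly scattered writes,
though the real Berkeley DB pattern may well differ.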

While this flush is running, I find that many a process goes into disk 
sleep waiting for the flush to complete. This includes the process 
manipulating the mmapped file whenever it tries to redirty a page 
currently waiting to be flushed, but also, for instance, programs that 
write() to log files (since, I guess, the buffer page backing the last 
written portion of the log file is being flushed). The common culprits, 
then, are sleep_on_page and sleep_on_buffer. All these processes commonly 
block for up to several tens of seconds, which gets me all kinds of 
trouble, as I'm sure you can see.
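
One way to see which processes are in disk sleep, and where, is to read the
state field of /proc/<pid>/stat and the symbol in /proc/<pid>/wchan. The
following is a rough sketch; the helper names are hypothetical, and on some
systems wchan is unavailable or needs root to read.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so the split is done on the last ')'.
    """
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def blocked_processes(proc="/proc"):
    """List (pid, comm, wchan) for processes in uninterruptible sleep ('D')."""
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            # The process exited, or we are not allowed to read it.
            continue
        result.append((pid, comm, wchan))
    return result
```

Run in a loop during a flush, this would be expected to show entries such
as sleep_on_page or sleep_on_buffer in the wchan column.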

I'd like to hear your opinion on this case. Is Berkeley DB at fault for 
causing these kinds of access patterns? Is the kernel at fault for 
blocking all these processes needlessly? Is the hardware at fault for 
being so hopelessly slow and I should get with the times and get me some 
SSDs? Or am I at fault for not finding the obvious configuration settings 
to avoid the problem? :)

I'm inclined to think that the kernel is at fault for blocking the 
processes needlessly. If the contents of the pages being flushed need to 
be preserved until the write is completed, shouldn't they be copied when 
written to, rather than blocking the writer for who-knows-how-long? It 
seems that if the kernel doesn't do this, then I'm always put at the mercy 
of the hardware, and as long as I have free memory, I shouldn't have to 
be.
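
To make the two behaviours concrete, here is a toy model in Python. It is
only an illustration of the idea (block the writer versus copy the page on
write); the class and its methods are invented for this sketch and do not
correspond to how the kernel's page cache is implemented.

```python
class ToyPage:
    """Toy model of a single page under writeback (not kernel code)."""

    def __init__(self, data, copy_on_write=False):
        self.data = bytearray(data)      # what the application sees
        self.copy_on_write = copy_on_write
        self.in_flight = None            # buffer the "device" is reading

    def start_writeback(self):
        # The device starts reading the live page.
        self.in_flight = self.data

    def finish_writeback(self):
        # Return what actually reached the "disk".
        written = bytes(self.in_flight)
        self.in_flight = None
        return written

    def write(self, offset, value):
        """Return True if the write went through, False if it would block."""
        if self.in_flight is not None:
            if not self.copy_on_write:
                return False  # writer has to wait for the I/O to finish
            if self.in_flight is self.data:
                # Leave the old buffer to the device, continue on a copy.
                self.data = bytearray(self.data)
        self.data[offset] = value
        return True
```

In the copying variant the writer never waits, at the price of one extra
page of memory for as long as the I/O is outstanding.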

However, I could also see that Berkeley DB is somehow at fault for this 
kind of access, causing such massive disk writes, and that perhaps it 
should be using SysV SHM regions or such instead of disk-backed files? 
Would it be possible, perhaps, to get these files treated more like 
anonymous memory, their contents not being flushed back to disk unless 
necessary?

It is worth noting, also, that this seems to be a situation introduced 
somewhere between 2.6.26 and 2.6.32, because I started noticing it when I 
upgraded from Debian 5.0 to 6.0. I've since tried it on 3.2.0, 3.5.4 and 
3.7.1, and it appears in every version. However, I can't easily go back 
and bisect, because the new init scripts don't support kernels older than 
2.6.32, unfortunately.

I'm sorry, also, if this is the completely wrong list for such 
discussions, but I couldn't find another one to match better.

Thanks for reading my wall of text!

--

Fredrik Tolf

* Re: I/O blocked while dirty pages are being flushed
  2013-03-24  5:12 I/O blocked while dirty pages are being flushed Fredrik Tolf
@ 2013-03-24  6:56 ` Eric Wong
  2013-03-25  7:34   ` Fredrik Tolf
  2013-03-24 10:27 ` Bart Van Assche
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Wong @ 2013-03-24  6:56 UTC
  To: Fredrik Tolf; +Cc: linux-kernel

Fredrik Tolf <fredrik@dolda2000.com> wrote:
> It is worth noting, also, that this seems to be a situation
> introduced somewhere between 2.6.26 and 2.6.32, because I started
> noticing it when I upgraded from Debian 5.0 to 6.0. I've since tried
> it on 3.2.0, 3.5.4 and 3.7.1, and it appears in every version.
> However, I can't easily go back and bisect, because the new init
> scripts don't support kernels older than 2.6.32, unfortunately.

I'm not sure about Debian-specific changes to the kernel, but
in the stock kernel, the dirty*ratios changes could affect you:

kernel version   dirty_ratio   dirty_background_ratio
before 2.6.22    40            10
2.6.22-2.6.29    10            5
2.6.30-...       20            10

So try lowering these sysctls to 2.6.26 levels (or lower) and see if
that helps.

Fwiw, I usually use dirty_ratio=2 dirty_background_ratio=1 on servers
with a few gigs of RAM (or appropriately low dirty*bytes values).
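
For a feel of what these percentages mean in bytes, a small sketch. It
assumes the caller supplies the amount of "dirtyable" memory; the kernel
applies the ratios to roughly free plus reclaimable memory rather than to
total RAM, so plugging in total RAM only gives an upper bound. The function
name is made up for this example.

```python
def dirty_thresholds(dirtyable_bytes, dirty_ratio, dirty_background_ratio):
    """Return (background_threshold, hard_threshold) in bytes.

    Background writeback starts at the first value; writers get
    throttled once dirty memory reaches the second.
    """
    background = dirtyable_bytes * dirty_background_ratio // 100
    hard = dirtyable_bytes * dirty_ratio // 100
    return background, hard


if __name__ == "__main__":
    four_gib = 4 * 2**30
    for ratio, bg_ratio in ((40, 10), (10, 5), (20, 10), (2, 1)):
        bg, hard = dirty_thresholds(four_gib, ratio, bg_ratio)
        print(f"dirty_ratio={ratio:2d} dirty_background_ratio={bg_ratio:2d}: "
              f"background {bg / 2**20:.0f} MiB, hard {hard / 2**20:.0f} MiB")
```

With 4 GiB of dirtyable memory, dirty_ratio=20 allows roughly 819 MiB of
dirty pages before writers are throttled, while dirty_ratio=2 caps that at
about 82 MiB.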

Lowering dirty*ratio helps servers get more consistent performance under
constant I/O pressure and aggressively throttles processes before a
large amount of dirty pages becomes a problem (as you've noticed).

High dirty*ratio is good for some bursty desktop workloads and some
benchmarks, though...

ref: commit 07db59bd6b0f279c31044cba6787344f63be87ea
ref: commit 1b5e62b42b55c509eea04c3c0f25e42c8b35b564


Heck, on a particularly bad server (2.6.18, pre-dirty_*bytes sysctl)
with lots of RAM and horrible disk throughput (~10M/s), I set both
dirty_writeback_centisecs and dirty_expire_centisecs to 100 to get
acceptable performance for writing HTTP access logs.
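
To check what a given machine is currently using, the knobs can be read
from /proc/sys/vm. A minimal sketch; the helper name is hypothetical, and
files that do not exist (such as the dirty_*bytes ones on older kernels)
are simply skipped.

```python
import os

VM_KNOBS = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_writeback_centisecs",
    "dirty_expire_centisecs",
)


def read_vm_knobs(base="/proc/sys/vm"):
    """Return {name: int} for each writeback knob found under base."""
    values = {}
    for name in VM_KNOBS:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Missing on this kernel, or not readable.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_vm_knobs().items():
        print(f"vm.{name} = {value}")
```

The centisecs values are in hundredths of a second, so 100 means one
second. Changing any of them requires root.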

* Re: I/O blocked while dirty pages are being flushed
  2013-03-24  5:12 I/O blocked while dirty pages are being flushed Fredrik Tolf
  2013-03-24  6:56 ` Eric Wong
@ 2013-03-24 10:27 ` Bart Van Assche
  2013-03-25  7:31   ` Fredrik Tolf
  1 sibling, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2013-03-24 10:27 UTC
  To: Fredrik Tolf; +Cc: linux-kernel

On 03/24/13 06:12, Fredrik Tolf wrote:
> While this flush is running, I find that many a process goes into disk
> sleep waiting for the flush to complete. This includes the process
> manipulating the mmapped file whenever it tries to redirty a page
> currently waiting to be flushed, but also, for instance, programs that
> write() to log files (since, I guess, the buffer page backing the last
> written portion of the log file is being flushed).

Had you already encountered this article: Jonathan Corbet, The trouble 
with stable pages, March 13, 2012 (http://lwn.net/Articles/486311/) ?

Bart.

* Re: I/O blocked while dirty pages are being flushed
  2013-03-24 10:27 ` Bart Van Assche
@ 2013-03-25  7:31   ` Fredrik Tolf
  2013-04-09 22:11     ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Fredrik Tolf @ 2013-03-25  7:31 UTC
  To: Bart Van Assche; +Cc: linux-kernel

On Sun, 24 Mar 2013, Bart Van Assche wrote:
> On 03/24/13 06:12, Fredrik Tolf wrote:
>> While this flush is running, I find that many a process goes into disk
>> sleep waiting for the flush to complete. This includes the process
>> manipulating the mmapped file whenever it tries to redirty a page
>> currently waiting to be flushed, but also, for instance, programs that
>> write() to log files (since, I guess, the buffer page backing the last
>> written portion of the log file is being flushed).
>
> Had you already encountered this article: Jonathan Corbet, The trouble with 
> stable pages, March 13, 2012 (http://lwn.net/Articles/486311/) ?

I had not, but that certainly does seem to be the exact problem I'm 
having. Thanks for the link; it was a very interesting read! Does anyone 
know if any progress has been made since about any resolution of the 
situation?

I notice linked mail threads with people saying that they have simply 
removed the calls to wait_on_page_writeback to resolve their problems. Can 
this still be considered safe, or have other systems than this 
block-device checksumming started depending on stable pages since? Like 
software RAID, for instance?

--

Fredrik Tolf

* Re: I/O blocked while dirty pages are being flushed
  2013-03-24  6:56 ` Eric Wong
@ 2013-03-25  7:34   ` Fredrik Tolf
  0 siblings, 0 replies; 6+ messages in thread
From: Fredrik Tolf @ 2013-03-25  7:34 UTC
  To: Eric Wong; +Cc: linux-kernel

On Sun, 24 Mar 2013, Eric Wong wrote:
> Fredrik Tolf <fredrik@dolda2000.com> wrote:
>> It is worth noting, also, that this seems to be a situation
>> introduced somewhere between 2.6.26 and 2.6.32, because I started
>> noticing it when I upgraded from Debian 5.0 to 6.0. I've since tried
>> it on 3.2.0, 3.5.4 and 3.7.1, and it appears in every version.
>> However, I can't easily go back and bisect, because the new init
>> scripts don't support kernels older than 2.6.32, unfortunately.
>
> So try lowering these sysctls to 2.6.26 levels (or lower) and see if
> that helps.

Thanks for the tip, but since the page dirtying happens in fast bursts for 
me, rather than gradually over time, that just caused the same sizes of 
writes to happen more often instead, which only made it worse. :)
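
Bursts like these can be watched by sampling the Dirty and Writeback
fields of /proc/meminfo. A small sketch, with made-up helper names, that
only parses the fields reported in kB:

```python
def parse_meminfo(text):
    """Return {field: kilobytes} for the 'kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_and_writeback(path="/proc/meminfo"):
    """Return (Dirty, Writeback) in kB, or 0 for a field that is absent."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty", 0), info.get("Writeback", 0)


if __name__ == "__main__":
    import time

    # Print one sample per second until interrupted.
    while True:
        dirty, writeback = dirty_and_writeback()
        print(f"Dirty {dirty} kB, Writeback {writeback} kB")
        time.sleep(1)
```

If the dirtying really happens in fast bursts, Dirty should jump by tens
of MB at once and then drain as Writeback rises, whatever the thresholds
are set to.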

I'll continue investigating the stable-page route, instead, since that 
seems to be my exact problem.

--

Fredrik Tolf

* Re: I/O blocked while dirty pages are being flushed
  2013-03-25  7:31   ` Fredrik Tolf
@ 2013-04-09 22:11     ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2013-04-09 22:11 UTC
  To: Fredrik Tolf; +Cc: Bart Van Assche, linux-kernel

On Mon 25-03-13 08:31:43, Fredrik Tolf wrote:
> On Sun, 24 Mar 2013, Bart Van Assche wrote:
> >On 03/24/13 06:12, Fredrik Tolf wrote:
> >>While this flush is running, I find that many a process goes into disk
> >>sleep waiting for the flush to complete. This includes the process
> >>manipulating the mmapped file whenever it tries to redirty a page
> >>currently waiting to be flushed, but also, for instance, programs that
> >>write() to log files (since, I guess, the buffer page backing the last
> >>written portion of the log file is being flushed).
> >
> >Had you already encountered this article: Jonathan Corbet, The
> >trouble with stable pages, March 13, 2012
> >(http://lwn.net/Articles/486311/) ?
> 
> I had not, but that certainly does seem to be the exact problem I'm
> having. Thanks for the link; it was a very interesting read! Does
> anyone know if any progress has been made since about any resolution
> of the situation?
  Patches to fix the situation are already merged in Linus' tree and will
be in 3.9. So stay tuned...

> I notice linked mail threads with people saying that they have
> simply removed the calls to wait_on_page_writeback to resolve their
> problems. Can this still be considered safe, or have other systems
> than this block-device checksumming started depending on stable
> pages since? Like software RAID, for instance?
  It should be safe. No one except HW really relies on stable pages (or
better, they do the copying if they do need it).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
