2.6.36 io bring the system to its knees

linux-mm.kvack.org archive mirror

* 2.6.36 io bring the system to its knees
       [not found]   ` <AANLkTinzJ9a+9w7G5X0uZpX2o-L8E6XW98VFKoF1R_-S@mail.gmail.com>
@ 2010-10-28  6:09     ` Aidar Kultayev
  2010-10-28  6:32       ` Pekka Enberg
  2010-10-31  1:22       ` Wu Fengguang
  0 siblings, 2 replies; 65+ messages in thread
From: Aidar Kultayev @ 2010-10-28  6:09 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: mingo

QUOTE:***
And yes, we'd very much like to fix such slowdowns via heuristics as
well (detecting large sequential IO and not letting it poison the
existing cache), so good bugreports and reproducing testcases sent to
linux-kernel@vger.kernel.org and people willing to try out
experimental kernel patches would definitely be welcome.

Thanks,

Ingo

*** http://ask.slashdot.org/story/10/10/23/1828251/The-State-of-Linux-IO-Scheduling-For-the-Desktop#commentlisting

I'll be rather quick & to the point here.

I get & run stable kernels the same day they appear on kernel.org, in
the hope of getting away from these annoying, ignored, neglected slowdowns.

.config attached. I have a Lenovo ThinkPad T400: Core2Duo T9400, 4GB
DDR2, integrated GM45 graphics (xf86-video-intel), iwlagn for the Intel
5300 wifi, CFS, ext2 for swap partition - 4GB, ext3 for boot, ext4
(400GB) for everything else.
All the hardware I have runs Linux natively.
No kernel from the days of 2.6.28.x up to 2.6.36 has helped me. The
so-called slowdown fixes never worked for me.
My kernel config choices are rather typical: NO_HZ, 100 or 250 Hz (I
don't go crazy for 1000 Hz), and voluntary preemption.
Regarding userland:
I love choice, hence nothing but Gentoo + KDE4, multilib. Some relevant
info here:

==============================================================================================
emerge --info
Portage 2.1.8.3 (default/linux/amd64/10.0/desktop, gcc-4.5.1,
glibc-2.11.2-r0, 2.6.36 x86_64)
=================================================================
System uname: Linux-2.6.36-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T9400_@_2.53GHz-with-gentoo-1.12.13
Timestamp of tree: Tue, 26 Oct 2010 10:30:01 +0000
app-shells/bash:     4.1_p7
dev-java/java-config: 2.1.11
dev-lang/python:     2.5.4-r4, 2.6.5-r3, 3.1.2-r4
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 1.12.13
sys-apps/sandbox:    2.3-r1
sys-devel/autoconf:  2.13, 2.65-r1
sys-devel/automake:  1.7.9-r1, 1.8.5-r4, 1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.5.1
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.10
sys-devel/make:      3.81-r2
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=native"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d
/etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf
/etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/
/etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d
/etc/terminfo"
CXXFLAGS="-O2 -pipe -march=native"
==============================================================================================

Now, I know Ingo said he wants "good bugreports and reproducing
testcases", and my testcase is very real-life: it replicates my typical
use of the computer these days:

- VirtualBox running XP, only to look at some 2007 PPTs (OOo3
doesn't cut it)
- JuK (KDE's music player) or VLC: some music in the background
- Chromium, with a bunch of tabs of J2EE/J2SE javadocs, eating
a significant amount of swap space
- bash terminals
- ktorrent
- PDFs opened in Okular and Adobe Reader
- syncing the portage tree & emerging new ebuilds (as usual with Gentoo)
- NetBeans, Eclipse, Apache, vsftpd, sshd, Tomcat and the whole nine yards.

How do I notice the slowdowns? JuK lags so badly that it can't play
any music, the mouse pointer freezes, and kwin effects freeze for a few
seconds.
How can I make it much worse? I can run disk cleanup with folder
compression in the XP guest under VBox. If on top of that I start
copying big files in Linux (700MB AVIs, etc.), GUI effects freeze and
the mouse pointer freezes for a few seconds.
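Freezes like these can be put into numbers instead of impressions. A minimal sketch, assuming only POSIX `clock_gettime()` and `nanosleep()`: a tiny probe asks for a short sleep over and over and records the worst overrun. The probe does no IO of its own, so it mainly catches scheduling and paging stalls that hit it too; it will not show an application that is merely blocked on the disk. `max_oversleep_ms` is a hypothetical name, not an existing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) * 1e3 +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Request a sleep of period_ms, `iterations` times, and return the largest
 * amount (in ms) by which a single sleep overran the requested length.
 * A probe that is frozen along with the desktop shows up as a large overrun.
 */
double max_oversleep_ms(int iterations, long period_ms)
{
	struct timespec req = {
		.tv_sec = period_ms / 1000,
		.tv_nsec = (period_ms % 1000) * 1000000L,
	};
	double worst = 0.0;

	assert(iterations > 0 && period_ms > 0);
	for (int i = 0; i < iterations; i++) {
		struct timespec before, after;
		double over;

		clock_gettime(CLOCK_MONOTONIC, &before);
		nanosleep(&req, NULL);	/* an interrupted sleep just looks short */
		clock_gettime(CLOCK_MONOTONIC, &after);

		over = elapsed_ms(&before, &after) - (double)period_ms;
		if (over > worst)
			worst = over;
	}
	return worst;
}
```

Calling `max_oversleep_ms(6000, 10)` (roughly a minute of 10 ms sleeps) and printing the result while the copy workload runs, then again on an idle system, would give two comparable numbers for a bug report.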

And this is on 2.6.36, which is supposed to cure these "features". From
this perspective, 2.6.36 is no better than any previous stable kernel
I've tried, and probably just as bad with regard to IO issues.

Find attached a screenshot (latencytop_n_powertop.png) showing the
artifacts: the window manager froze while I was trying to view a
Konsole tab where powertop was running.

At the time, the following was running in the other Konsole tabs:
- dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
- cp /home/ak/1.distr/Linux/openSUSE-11.2-DVD-x86_64.iso /home/lameruser/;rm /home/lameruser/openSUSE-11.2-DVD-x86_64.iso;
- dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
- cp /home/ak/funeral.avi /home/ak/0.junk/;rm /home/ak/0.junk/funeral.avi
- the XP guest under VBox was compacting its old files.

The ISO is about 4GB, the AVI about 700MB.
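Ingo's quote at the top talks about keeping large sequential IO from poisoning the existing cache. Until the kernel does that on its own, a copy tool can approximate it from userspace. The sketch below is an illustration under assumptions, not a kernel heuristic and not what `cp` does: it copies in 1 MiB chunks and, every 16 MiB, forces the written data out with `fdatasync()` and then uses `posix_fadvise(..., POSIX_FADV_DONTNEED)` to tell the kernel that the cached pages of both files are no longer needed. `cache_friendly_copy` and the chunk sizes are hypothetical choices; whether it actually helps on 2.6.36 would have to be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1 << 20)			/* bytes per read()/write(): 1 MiB */
#define FLUSH_EVERY (16 * CHUNK)	/* write back and drop cache every 16 MiB */

/* Write all of buf, retrying after short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Drop the cached pages of both files up to `len` bytes (0 means to EOF). */
static void drop_cache(int in, int out, off_t len)
{
	posix_fadvise(out, 0, len, POSIX_FADV_DONTNEED);
	posix_fadvise(in, 0, len, POSIX_FADV_DONTNEED);
}

/*
 * Copy src to dst while trying not to leave the copied data in the page
 * cache. Returns 0 on success, -1 on any error (dst may be left partial).
 */
int cache_friendly_copy(const char *src, const char *dst)
{
	int in, out, ret = -1;
	char *buf;
	off_t done = 0, since_flush = 0;

	assert(src != NULL && dst != NULL);
	in = open(src, O_RDONLY);
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	buf = malloc(CHUNK);
	if (in < 0 || out < 0 || buf == NULL)
		goto cleanup;

	for (;;) {
		ssize_t n = read(in, buf, CHUNK);

		if (n < 0)
			goto cleanup;
		if (n == 0)
			break;
		if (write_all(out, buf, (size_t)n) < 0)
			goto cleanup;
		done += n;
		since_flush += n;
		if (since_flush >= FLUSH_EVERY) {
			/* DONTNEED only drops clean pages, so write them back first. */
			if (fdatasync(out) < 0)
				goto cleanup;
			drop_cache(in, out, done);
			since_flush = 0;
		}
	}
	if (fdatasync(out) == 0) {
		drop_cache(in, out, 0);
		ret = 0;
	}
cleanup:
	free(buf);
	if (in >= 0)
		close(in);
	if (out >= 0)
		close(out);
	return ret;
}
```

The `fdatasync()` comes first because `POSIX_FADV_DONTNEED` only drops clean pages; without it, most of the freshly written data would still be dirty and would stay cached. The price is that the copy itself runs in smaller synchronous bursts.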

I am following the problem here:
https://bugzilla.kernel.org/show_bug.cgi?id=12309

This is a monumental failure for the kernel development project and for
FLOSS in general.
Poor management, no leadership/championship, no responsibility, neglect.

* Re: 2.6.36 io bring the system to its knees
  2010-10-28  6:09     ` 2.6.36 io bring the system to its knees Aidar Kultayev
@ 2010-10-28  6:32       ` Pekka Enberg
  2010-10-28  9:00         ` Ingo Molnar
  2010-10-31  1:22       ` Wu Fengguang
  1 sibling, 1 reply; 65+ messages in thread
From: Pekka Enberg @ 2010-10-28  6:32 UTC (permalink / raw)
  To: Aidar Kultayev; +Cc: linux-kernel, linux-mm, mingo

On Thu, Oct 28, 2010 at 9:09 AM, Aidar Kultayev <the.aidar@gmail.com> wrote:
> Find attached screenshot ( latencytop_n_powertop.png ) which depicts
> artifacts where the window manager froze at the time I was trying to
> see a tab in Konsole where the powertop was running.

You seem to have forgotten to include the attachment.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: 2.6.36 io bring the system to its knees
  2010-10-28  6:32       ` Pekka Enberg
@ 2010-10-28  9:00         ` Ingo Molnar
  2010-10-28  9:34           ` Pekka Enberg
  2010-10-28 11:16           ` Pekka Enberg
  0 siblings, 2 replies; 65+ messages in thread
From: Ingo Molnar @ 2010-10-28  9:00 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Aidar Kultayev, linux-kernel, linux-mm


* Pekka Enberg <penberg@kernel.org> wrote:

> On Thu, Oct 28, 2010 at 9:09 AM, Aidar Kultayev <the.aidar@gmail.com> wrote:
> > Find attached screenshot ( latencytop_n_powertop.png ) which depicts
> > artifacts where the window manager froze at the time I was trying to
> > see a tab in Konsole where the powertop was running.
> 
> You seem to have forgotten to include the attachment.

I got it - it appears it was too large for lkml's ~500K mail size limit.

Aidar, mind sending a smaller image?

Thanks,

	Ingo

* Re: 2.6.36 io bring the system to its knees
  2010-10-28  9:00         ` Ingo Molnar
@ 2010-10-28  9:34           ` Pekka Enberg
  2010-10-28 11:16           ` Pekka Enberg
  1 sibling, 0 replies; 65+ messages in thread
From: Pekka Enberg @ 2010-10-28  9:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Aidar Kultayev, linux-kernel, linux-mm

On 10/28/10 12:00 PM, Ingo Molnar wrote:
> * Pekka Enberg<penberg@kernel.org>  wrote:
>
>> On Thu, Oct 28, 2010 at 9:09 AM, Aidar Kultayev<the.aidar@gmail.com>  wrote:
>>> Find attached screenshot ( latencytop_n_powertop.png ) which depicts
>>> artifacts where the window manager froze at the time I was trying to
>>> see a tab in Konsole where the powertop was running.
>> You seem to have forgotten to include the attachment.
> I got it - it appears it was too large for lkml's ~500K mail size limit.
>
> Aidar, mind sending a smaller image?

Ingo, didn't you have some nice script to capture system state? Maybe
that could shed some light on what's going on in Aidar's system?

             Pekka

* Re: 2.6.36 io bring the system to its knees
  2010-10-28  9:00         ` Ingo Molnar
  2010-10-28  9:34           ` Pekka Enberg
@ 2010-10-28 11:16           ` Pekka Enberg
  2010-10-28 11:33             ` Aidar Kultayev
  2010-10-28 13:30             ` Ingo Molnar
  1 sibling, 2 replies; 65+ messages in thread
From: Pekka Enberg @ 2010-10-28 11:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Aidar Kultayev, linux-kernel, linux-mm

* Pekka Enberg <penberg@kernel.org> wrote:
>> On Thu, Oct 28, 2010 at 9:09 AM, Aidar Kultayev <the.aidar@gmail.com> wrote:
>> > Find attached screenshot ( latencytop_n_powertop.png ) which depicts
>> > artifacts where the window manager froze at the time I was trying to
>> > see a tab in Konsole where the powertop was running.
>>
>> You seem to have forgotten to include the attachment.

On Thu, Oct 28, 2010 at 12:00 PM, Ingo Molnar <mingo@elte.hu> wrote:
> I got it - it appears it was too large for lkml's ~500K mail size limit.
>
> Aidar, mind sending a smaller image?

Looks mostly VFS to me. Aidar, does killing Picasa make things
smoother for you? If so, maybe the VFS scalability patches will help.

                        Pekka

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 11:16           ` Pekka Enberg
@ 2010-10-28 11:33             ` Aidar Kultayev
  2010-10-28 11:48               ` Pekka Enberg
  2010-10-28 13:30             ` Ingo Molnar
  1 sibling, 1 reply; 65+ messages in thread
From: Aidar Kultayev @ 2010-10-28 11:33 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Ingo Molnar, linux-kernel, linux-mm

If it wasn't Picasa, it would have been something else. I mean, if I
kill Picasa (it finished indexing new pictures later on anyway),
VirtualBox would be the one thrashing the IO. So, nope, getting rid of
Picasa doesn't help either. In general, the system's responsiveness or
sluggishness is dominated by the IO operations going on: the dd and cp,
and probably VBox issuing a whole bunch of IO load.

Another way I see these delays is when I leave the system overnight with
ktorrent and JuK (stopped) in the background. It takes a while for the
WM (kwin) to handle Alt+Tab the next morning. But this might be
because kwin and its code have been swapped out after a long period of
not being used.

But in general, I have trouble with responsiveness when I restore my
VirtualBox image from a saved state. If a dd is doing its thing while
VirtualBox is restoring its image, I see those nasty delays: kwin, the
mouse pointer, etc.

Thanks, Aidar

PS: the good thing, and I am getting used to it, is that I don't lose
data. The system doesn't hang, it just freezes for a while :)

On Thu, Oct 28, 2010 at 5:16 PM, Pekka Enberg <penberg@kernel.org> wrote:
> * Pekka Enberg <penberg@kernel.org> wrote:
>>> On Thu, Oct 28, 2010 at 9:09 AM, Aidar Kultayev <the.aidar@gmail.com> wrote:
>>> > Find attached screenshot ( latencytop_n_powertop.png ) which depicts
>>> > artifacts where the window manager froze at the time I was trying to
>>> > see a tab in Konsole where the powertop was running.
>>>
>>> You seem to have forgotten to include the attachment.
>
> On Thu, Oct 28, 2010 at 12:00 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> I got it - it appears it was too large for lkml's ~500K mail size limit.
>>
>> Aidar, mind sending a smaller image?
>
> Looks mostly VFS to me. Aidar, does killing Picasa make things
> smoother for you? If so, maybe the VFS scalability patches will help.
>
>                        Pekka
>

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 11:33             ` Aidar Kultayev
@ 2010-10-28 11:48               ` Pekka Enberg
  2010-10-28 12:18                 ` Aidar Kultayev
  2010-10-28 13:46                 ` Christoph Hellwig
  0 siblings, 2 replies; 65+ messages in thread
From: Pekka Enberg @ 2010-10-28 11:48 UTC (permalink / raw)
  To: Aidar Kultayev
  Cc: Ingo Molnar, linux-kernel, linux-mm, npiggin, Dave Chinner,
	Andrew Morton

On Thu, Oct 28, 2010 at 2:33 PM, Aidar Kultayev <the.aidar@gmail.com> wrote:
> if it wasn't picasa, it would have been something else. I mean if I
> kill picasa ( later on it was done indexing new pics anyway ), it
> would have been for virtualbox to thrash the io. So, nope, getting rid
> of picasa doesn't help either. In general the systems responsiveness
> or sluggishness is dominated by those io operations going on - the DD
> & CP & probably VBOX issuing whole bunch of its load for IO.

Do you still see high latencies in vfs_llseek() and vfs_fsync()? I'm
not a VFS expert, but looking at your latencytop output, it seems that
fsync grabs ->i_mutex, which blocks vfs_llseek(), for example. I'm not
sure why that causes high latencies, though; it's a mutex we're holding.

On Thu, Oct 28, 2010 at 2:33 PM, Aidar Kultayev <the.aidar@gmail.com> wrote:
> Another way I see these delays, is when I leave system overnight, with
> ktorrent & juk(stopped) in the background. It takes some time for
> WM(kwin) to work out ALT+TAB the very next morning. But this might be
> because the WM(kwin & its code) has been swapped out, because of long
> period of not using it.

Yeah, that's probably paging overhead.
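One way to check the paging explanation is to count major page faults, i.e. faults that had to read a page back from disk or swap. A minimal sketch, assuming Linux's `getrusage()` and the `majflt` field (field 12) of `/proc/<pid>/stat`; `major_faults_self` and `major_faults_of` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Major page faults taken so far by the calling process, or -1 on error. */
long major_faults_self(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_majflt;
}

/*
 * Major page faults of any process, read from field 12 (majflt) of
 * /proc/<pid>/stat; `pid` is a decimal PID string or "self".
 * Returns -1 if the file cannot be read or parsed.
 */
long major_faults_of(const char *pid)
{
	char path[64], line[1024];
	const char *p;
	long majflt = -1;
	FILE *f;

	snprintf(path, sizeof path, "/proc/%s/stat", pid);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(line, sizeof line, f) != NULL) {
		/* comm (field 2) may contain spaces and ')', so skip to the last ')'. */
		p = strrchr(line, ')');
		/* Then: state ppid pgrp session tty_nr tpgid flags minflt cminflt majflt */
		if (p == NULL ||
		    sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %ld",
			   &majflt) != 1)
			majflt = -1;
	}
	fclose(f);
	return majflt;
}
```

Sampling `major_faults_of()` with kwin's PID just before and just after the first Alt+Tab of the morning, and taking the difference, would show whether that Alt+Tab had to page kwin back in; a difference near zero would point somewhere else.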

P.S. Can you please upload the latencytop output somewhere and post a URL
to it so other people can also see it?

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 11:48               ` Pekka Enberg
@ 2010-10-28 12:18                 ` Aidar Kultayev
  2010-10-28 13:46                 ` Christoph Hellwig
  1 sibling, 0 replies; 65+ messages in thread
From: Aidar Kultayev @ 2010-10-28 12:18 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, linux-kernel, linux-mm, npiggin, Dave Chinner,
	Andrew Morton

http://picasaweb.google.com/aidar.eiei/LinuxIo#5533068249408411698

I will look into the latencytop output and figure out the usage pattern
that is most annoying with regard to IO.
I will try to see what leads to it and, if possible, take a screenshot
of what is going on.
The thing is, I don't think the screenshot program captures things in a
meaningful way: at the moment the system is brought to its knees, this
particular program (KSnapshot) can't escape being affected either, so it
might take a snapshot that is not representative.
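A plain-text log avoids the problem with screenshots: the sampler may be delayed by the freeze too, but gaps in its timestamps then show exactly when and for how long. A minimal sketch, assuming the `Dirty:` and `Writeback:` lines of `/proc/meminfo` (data waiting to be written, and data being written right now) are the interesting ones here; the parser works on a string so it can be checked without a live system. `meminfo_kb` and `read_meminfo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value (in kB) of the line starting with "key:" in
 * /proc/meminfo-style text, or -1 if there is no such line. Requiring the
 * ':' right after the key keeps "Writeback" from matching "WritebackTmp".
 */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text != NULL && key != NULL);
	klen = strlen(key);
	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line != NULL)
			line++;		/* start of the next line */
	}
	return -1;
}

/* Read /proc/meminfo into buf as a NUL-terminated string; 0 on success. */
int read_meminfo(char *buf, size_t size)
{
	FILE *f;
	size_t n;

	assert(buf != NULL && size > 0);
	f = fopen("/proc/meminfo", "r");
	if (f == NULL)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

Calling `read_meminfo()` once a second and appending a timestamp plus the two values to a file while the dd/cp workload runs would show whether the freezes line up with a large pile of dirty data being written back.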


Thanks, Aidar

On Thu, Oct 28, 2010 at 5:48 PM, Pekka Enberg <penberg@kernel.org> wrote:
> On Thu, Oct 28, 2010 at 2:33 PM, Aidar Kultayev <the.aidar@gmail.com> wrote:
>> if it wasn't picasa, it would have been something else. I mean if I
>> kill picasa ( later on it was done indexing new pics anyway ), it
>> would have been for virtualbox to thrash the io. So, nope, getting rid
>> of picasa doesn't help either. In general the systems responsiveness
>> or sluggishness is dominated by those io operations going on - the DD
>> & CP & probably VBOX issuing whole bunch of its load for IO.
>
> Do you still see high latencies in vfs_lseek() and vfs_fsync()? I'm
> not a VFS expert but looking at your latencytop output, it seems that
> fsync grabs ->i_mutex which blocks vfs_llseek(), for example. I'm not
> sure why that causes high latencies though it's a mutex we're holding.
>
> On Thu, Oct 28, 2010 at 2:33 PM, Aidar Kultayev <the.aidar@gmail.com> wrote:
>> Another way I see these delays, is when I leave system overnight, with
>> ktorrent & juk(stopped) in the background. It takes some time for
>> WM(kwin) to work out ALT+TAB the very next morning. But this might be
>> because the WM(kwin & its code) has been swapped out, because of long
>> period of not using it.
>
> Yeah, that's probably paging overhead.
>
> P.S. Can you please upload latencytop output somewhere and post an URL
> to it so other people can also see it?
>

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 11:16           ` Pekka Enberg
  2010-10-28 11:33             ` Aidar Kultayev
@ 2010-10-28 13:30             ` Ingo Molnar
  2010-10-28 13:47               ` Christoph Hellwig
  2010-10-28 17:01               ` Chris Mason
  1 sibling, 2 replies; 65+ messages in thread
From: Ingo Molnar @ 2010-10-28 13:30 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Aidar Kultayev, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner


* Pekka Enberg <penberg@kernel.org> wrote:

> * Pekka Enberg <penberg@kernel.org> wrote:
> >> On Thu, Oct 28, 2010 at 9:09 AM, Aidar Kultayev <the.aidar@gmail.com> wrote:
> >> > Find attached screenshot ( latencytop_n_powertop.png ) which depicts
> >> > artifacts where the window manager froze at the time I was trying to
> >> > see a tab in Konsole where the powertop was running.
> >>
> >> You seem to have forgotten to include the attachment.
> 
> On Thu, Oct 28, 2010 at 12:00 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > I got it - it appears it was too large for lkml's ~500K mail size limit.
> >
> > Aidar, mind sending a smaller image?
> 
> Looks mostly VFS to me. Aidar, does killing Picasa make things smoother for you? 
> If so, maybe the VFS scalability patches will help.

Hm, but the VFS scalability patches mostly decrease CPU usage, and do that mostly 
on many-core systems.

While the bugreport here is rather plain:

> How do I notice slowdowns ? The JuK lags so badly that it can't play any music, 
> the mouse pointer freezes, kwin effects freeze for few seconds.
>
> How can I make it much worse ? I can try & run disk clean up under XP, that is 
> running in VBox, with folder compression. On top of it if I start copying big 
> files in linux ( 700MB avis, etc ), GUI effects freeze, mouse pointer freezes for 
> few seconds.
>
> And this is on 2.6.36 that is supposed to cure these "features". From this 
> perspective, 2.6.36 is no better than any previous stable kernel I've tried. 
> Probably as bad with regards to IO issues.

Freezes of many seconds and slowdowns won't be fixed by the VFS scalability 
patches, I'm afraid.

This has the appearance of some really bad IO or VM latency problem, unfixed and 
present in stable kernel versions from years ago all the way to v2.6.36.

Thanks,

	Ingo

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 11:48               ` Pekka Enberg
  2010-10-28 12:18                 ` Aidar Kultayev
@ 2010-10-28 13:46                 ` Christoph Hellwig
  2010-10-28 13:54                   ` Ingo Molnar
  1 sibling, 1 reply; 65+ messages in thread
From: Christoph Hellwig @ 2010-10-28 13:46 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Aidar Kultayev, Ingo Molnar, linux-kernel, linux-mm, npiggin,
	Dave Chinner, Andrew Morton

On Thu, Oct 28, 2010 at 02:48:20PM +0300, Pekka Enberg wrote:
> On Thu, Oct 28, 2010 at 2:33 PM, Aidar Kultayev <the.aidar@gmail.com> wrote:
> > if it wasn't picasa, it would have been something else. I mean if I
> > kill picasa ( later on it was done indexing new pics anyway ), it
> > would have been for virtualbox to thrash the io. So, nope, getting rid
> > of picasa doesn't help either. In general the systems responsiveness
> > or sluggishness is dominated by those io operations going on - the DD
> > & CP & probably VBOX issuing whole bunch of its load for IO.
> 
> Do you still see high latencies in vfs_lseek() and vfs_fsync()? I'm
> not a VFS expert but looking at your latencytop output, it seems that
> fsync grabs ->i_mutex which blocks vfs_llseek(), for example. I'm not
> sure why that causes high latencies though it's a mutex we're holding.

It does.  But what workload does a lot of llseeks while fsyncing the
same file?  I'd bet some application is doing really stupid things here.

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 13:30             ` Ingo Molnar
@ 2010-10-28 13:47               ` Christoph Hellwig
  2010-10-28 13:50                 ` Ingo Molnar
  2010-10-28 17:01               ` Chris Mason
  1 sibling, 1 reply; 65+ messages in thread
From: Christoph Hellwig @ 2010-10-28 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
> > Looks mostly VFS to me. Aidar, does killing Picasa make things smoother for you? 
> > If so, maybe the VFS scalability patches will help.
> 
> Hm, but the VFS scalability patches mostly decrease CPU usage, and does that mostly 
> on many-core systems.

If you have i_mutex contention they are not going to change anything.

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 13:47               ` Christoph Hellwig
@ 2010-10-28 13:50                 ` Ingo Molnar
  0 siblings, 0 replies; 65+ messages in thread
From: Ingo Molnar @ 2010-10-28 13:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner


* Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
> > > Looks mostly VFS to me. Aidar, does killing Picasa make things smoother for you? 
> > > If so, maybe the VFS scalability patches will help.
> > 
> > Hm, but the VFS scalability patches mostly decrease CPU usage, and does that 
> > mostly on many-core systems.
> 
> If you have i_mutex contention they are not going to change anything.

Yes, that was my point.

Thanks,

	Ingo

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 13:46                 ` Christoph Hellwig
@ 2010-10-28 13:54                   ` Ingo Molnar
  0 siblings, 0 replies; 65+ messages in thread
From: Ingo Molnar @ 2010-10-28 13:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm, npiggin,
	Dave Chinner, Andrew Morton


* Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Oct 28, 2010 at 02:48:20PM +0300, Pekka Enberg wrote:
> > On Thu, Oct 28, 2010 at 2:33 PM, Aidar Kultayev <the.aidar@gmail.com> wrote:
> > >
> > > if it wasn't picasa, it would have been something else. I mean if I kill 
> > > picasa ( later on it was done indexing new pics anyway ), it would have been 
> > > for virtualbox to thrash the io. So, nope, getting rid of picasa doesn't help 
> > > either. In general the systems responsiveness or sluggishness is dominated by 
> > > those io operations going on - the DD & CP & probably VBOX issuing whole bunch 
> > > of its load for IO.
> > 
> > Do you still see high latencies in vfs_lseek() and vfs_fsync()? I'm not a VFS 
> > expert but looking at your latencytop output, it seems that fsync grabs 
> > ->i_mutex which blocks vfs_llseek(), for example. I'm not sure why that causes 
> > high latencies though it's a mutex we're holding.
> 
> It does.  But what workload does a lot of llseeks while fsyncing the same file?  
> I'd bet some application is doing really stupid things here.

Seeking in a file and fsync-ing it does not seem like an inherently bad or even 
stupid thing to do - why do you claim that it is stupid?

If mixed seek()+fsync() is the reason for these latencies (which is just an 
assumption right now) then it needs to be fixed in the kernel, not in apps.
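Whether mixed seeks and fsyncs really cause these stalls can be tested directly, which would also answer this question with data. The sketch below is one possible reproducer, written under assumptions: a child process keeps rewriting and `fsync()`ing a file while the parent times `lseek()` calls on the same file through a descriptor of its own. If `lseek()` does wait for `i_mutex` while fsync holds it, as Pekka read the latencytop output, the worst `lseek()` time should track the fsync time; otherwise it should stay in the microseconds. The helper name `worst_lseek_us`, the 1 MiB write size and the file name are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static double now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/*
 * A child process keeps rewriting the first 1 MiB of `path` and fsync()ing
 * it, while the parent issues `seeks` lseek() calls on its own descriptor
 * for the same file, about 1 ms apart. Returns the slowest lseek() in
 * microseconds, or -1.0 if the test could not be set up.
 */
double worst_lseek_us(const char *path, int seeks)
{
	static char block[1 << 20];
	const struct timespec gap = { 0, 1000000L };	/* 1 ms */
	double worst = 0.0;
	pid_t child;
	int fd;

	assert(path != NULL && seeks > 0);
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1.0;

	child = fork();
	if (child < 0) {
		close(fd);
		return -1.0;
	}
	if (child == 0) {	/* writer: runs until the parent kills it */
		int wfd = open(path, O_WRONLY);

		memset(block, 'x', sizeof block);
		while (wfd >= 0 && pwrite(wfd, block, sizeof block, 0) >= 0 &&
		       fsync(wfd) == 0)
			;	/* keep fsync() busy */
		_exit(1);
	}

	for (int i = 0; i < seeks; i++) {
		double t0 = now_us(), dt;

		lseek(fd, (off_t)(i % 4096) * 512, SEEK_SET);
		dt = now_us() - t0;
		if (dt > worst)
			worst = dt;
		nanosleep(&gap, NULL);
	}

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	close(fd);
	return worst;
}
```

Running it with a few thousand seeks on a file on the ext4 partition, once on plain 2.6.36 and once with a patch like the one proposed later in this thread, would give a before/after number instead of an assumption.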

Thanks,

	Ingo

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 13:30             ` Ingo Molnar
  2010-10-28 13:47               ` Christoph Hellwig
@ 2010-10-28 17:01               ` Chris Mason
  2010-10-28 17:57                 ` Pekka Enberg
                                   ` (2 more replies)
  1 sibling, 3 replies; 65+ messages in thread
From: Chris Mason @ 2010-10-28 17:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
> 
> "Many seconds freezes" and slowdowns wont be fixed via the VFS scalability patches 
> i'm afraid.
> 
> This has the appearance of some really bad IO or VM latency problem. Unfixed and 
> present in stable kernel versions going from years ago all the way to v2.6.36.

Hmmm, the workload you're describing here has two special parts.  First,
it dramatically overloads the disk, and then it has GUIs doing things
that wait for the disk.

The virtualbox part of the workload is probably filling the queue with
huge amounts of synchronous random IO (I'm assuming it is going in via
O_DIRECT), and this will defeat any attempts from the filesystem to tell
the elevator "hey look, my IO is synchronous, please do hurry"
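To reproduce the VirtualBox side without a guest, a load generator can issue the kind of IO assumed here: small synchronous writes at random offsets. This is a hypothetical sketch; the sizes, the file layout and the function name `random_sync_writes` are assumptions, and VirtualBox's real IO pattern is not known from this thread. It asks for `O_DIRECT`, which needs suitably aligned buffers (4 KiB is assumed to be enough) and is refused with `EINVAL` by some filesystems, and falls back to `O_DSYNC` alone in that case.

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096	/* write size and alignment (assumed sufficient for O_DIRECT) */

/*
 * Issue `count` synchronous BLK-byte writes at random block offsets within
 * the first `blocks` blocks of `path`. Returns the number of writes that
 * completed (fewer than `count` if one failed), or -1 if setup failed.
 */
long random_sync_writes(const char *path, long blocks, long count)
{
	void *buf;
	long done;
	int fd;

	assert(path != NULL && blocks > 0 && count >= 0);
	fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_DSYNC, 0644);
	if (fd < 0 && errno == EINVAL)	/* no O_DIRECT support here */
		fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0644);
	if (fd < 0)
		return -1;
	if (posix_memalign(&buf, BLK, BLK) != 0) {	/* O_DIRECT wants aligned memory */
		close(fd);
		return -1;
	}
	memset(buf, 0xab, BLK);

	srand(12345);	/* fixed seed so that runs are comparable */
	for (done = 0; done < count; done++) {
		off_t off = (off_t)(rand() % blocks) * BLK;

		if (pwrite(fd, buf, BLK, off) != BLK)
			break;
	}
	free(buf);
	close(fd);
	return done;
}
```

Running it against a large file alongside the dd and cp from the original report should approximate the mix of synchronous random IO and big streaming writes being described.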

So, I'd try mounting ext4 in data=writeback mode.  I can't make ext4
stall fsyncs on non-fsync IO locally and it looks like they have solved
the ext3 data=ordered problem.  But I still like to rule out old and
known issues before we dig into new things.

I'd also suggest something like the below patch which is entirely
untested and must be blessed by an actual ext4 developer.  I think we
can make fsync faster if we put the mutex locking down in the FS, but
until then it should be ok to drop the mutex while we are doing the
expensive log commits:

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 592adf2..1b7a637 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -114,6 +114,7 @@ int ext4_sync_file(struct file *file, int datasync)
 	if (ext4_should_journal_data(inode))
 		return ext4_force_commit(inode->i_sb);
 
+	mutex_unlock(&inode->i_mutex);
 	commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
 	if (jbd2_log_start_commit(journal, commit_tid)) {
 		/*
@@ -133,5 +134,7 @@ int ext4_sync_file(struct file *file, int datasync)
 	} else if (journal->j_flags & JBD2_BARRIER)
 		blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL,
 			BLKDEV_IFL_WAIT);
+
+	mutex_lock(&inode->i_mutex);
 	return ret;
 }
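The effect the patch is after can be illustrated with a userspace model. This is only an analogy under stated assumptions, not ext4 or VFS code: a `pthread_mutex_t` stands in for `i_mutex`, taken by the caller before `->fsync()` (which is what the patch's unlock/relock pair implies), a sleep stands in for the journal commit and cache flush, and a second thread stands in for a concurrent `lseek()` that needs the same lock. `seek_wait_ms` and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Stands in for inode->i_mutex. */
static pthread_mutex_t i_mutex = PTHREAD_MUTEX_INITIALIZER;

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* The "lseek" side: record how long it took to get the lock. */
static void *seeker(void *arg)
{
	double *waited = arg;
	double t0 = now_ms();

	pthread_mutex_lock(&i_mutex);
	*waited = now_ms() - t0;
	pthread_mutex_unlock(&i_mutex);
	return NULL;
}

/*
 * The "fsync" side. drop_lock = 0 keeps the lock across the simulated
 * commit; drop_lock = 1 releases it around the commit, as the patch does.
 * Returns how long the concurrent seeker waited, in ms (-1.0 on error).
 */
double seek_wait_ms(int drop_lock, long commit_ms)
{
	pthread_t t;
	double waited = -1.0;

	assert(commit_ms > 0);
	pthread_mutex_lock(&i_mutex);		/* held by the caller of ->fsync() */
	if (pthread_create(&t, NULL, seeker, &waited) != 0) {
		pthread_mutex_unlock(&i_mutex);
		return -1.0;
	}
	if (drop_lock)
		pthread_mutex_unlock(&i_mutex);	/* the patch's mutex_unlock() */
	sleep_ms(commit_ms);			/* journal commit + cache flush */
	if (drop_lock)
		pthread_mutex_lock(&i_mutex);	/* the patch's mutex_lock() */
	pthread_mutex_unlock(&i_mutex);		/* caller drops it after ->fsync() */
	pthread_join(t, NULL);
	return waited;
}
```

With `drop_lock` set to 0 the seeker waits for roughly the whole simulated commit; with 1 it gets the lock almost at once. The model deliberately ignores the question of which inode fields may still be read once the lock is dropped. Build with `-pthread`.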


* Re: 2.6.36 io bring the system to its knees
  2010-10-28 17:01               ` Chris Mason
@ 2010-10-28 17:57                 ` Pekka Enberg
  2010-10-29 14:52                   ` Ted Ts'o
  2010-11-02 11:47                 ` Sanjoy Mahajan
  2010-11-04 23:44                 ` Jesper Juhl
  2 siblings, 1 reply; 65+ messages in thread
From: Pekka Enberg @ 2010-10-28 17:57 UTC (permalink / raw)
  To: Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Linus Torvalds, Andrew Morton, Jens Axboe,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner

On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
>> "Many seconds freezes" and slowdowns wont be fixed via the VFS scalability patches
>> i'm afraid.
>>
>> This has the appearance of some really bad IO or VM latency problem. Unfixed and
>> present in stable kernel versions going from years ago all the way to v2.6.36.

On Thu, Oct 28, 2010 at 8:01 PM, Chris Mason <chris.mason@oracle.com> wrote:
> Hmmm, the workload you're describing here has two special parts.  First
> it dramatically overloads the disk, and then it has guis doing things
> waiting for the disk.
>
> The virtualbox part of the workload is probably filling the queue with
> huge amounts of synchronous random IO (I'm assuming it is going in via
> O_DIRECT), and this will defeat any attempts from the filesystem to tell
> the elevator "hey look, my IO is synchronous, please do hurry"
>
> So, I'd try mounting ext4 in data=writeback mode.  I can't make ext4
> stall fsyncs on non-fsync IO locally and it looks like they have solved
> the ext3 data=ordered problem.  But I still like to rule out old and
> known issues before we dig into new things.
>
> I'd also suggest something like the below patch which is entirely
> untested and must be blessed by an actual ext4 developer.  I think we
> can make fsync faster if we put the mutex locking down in the FS, but
> until then it should be ok to drop the mutex while we are doing the
> expensive log commits:
>
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 592adf2..1b7a637 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -114,6 +114,7 @@ int ext4_sync_file(struct file *file, int datasync)
>        if (ext4_should_journal_data(inode))
>                return ext4_force_commit(inode->i_sb);
>
> +       mutex_unlock(&inode->i_mutex);
>        commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
>        if (jbd2_log_start_commit(journal, commit_tid)) {
>                /*
> @@ -133,5 +134,7 @@ int ext4_sync_file(struct file *file, int datasync)
>        } else if (journal->j_flags & JBD2_BARRIER)
>                blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL,
>                        BLKDEV_IFL_WAIT);
> +
> +       mutex_lock(&inode->i_mutex);
>        return ret;
>  }

Don't we need to call ext4_should_writeback_data() before we drop the
lock? It pokes at ->i_mode which needs ->i_mutex AFAICT.
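The pattern in Chris's patch, together with Pekka's point about reading lock-protected state first, can be sketched in userspace C, with a pthread mutex standing in for i_mutex. This is only an illustration under assumptions: `struct fake_inode`, `sync_file_relaxed()` and `demo_commit()` are hypothetical names, not ext4 code, and the callback stands in for the journal commit and cache flush.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode: the mutex plays the role of
 * i_mutex and journal_data the role of state that must be read under it. */
struct fake_inode {
    pthread_mutex_t lock;
    bool journal_data;
};

/* The expensive step (log commit + cache flush in the real patch). */
typedef int (*commit_fn)(struct fake_inode *ino, bool journal_data);

/* Must be called with ino->lock held; returns with it held again. */
int sync_file_relaxed(struct fake_inode *ino, commit_fn commit)
{
    /* Snapshot everything that needs the lock before dropping it. */
    bool journal_data = ino->journal_data;

    /* Drop the lock across the slow part so other lockers can proceed. */
    pthread_mutex_unlock(&ino->lock);
    int ret = commit(ino, journal_data);
    pthread_mutex_lock(&ino->lock);

    return ret;
}

/* Demo commit: records whether the lock was free while it ran. */
static bool lock_was_free;

static int demo_commit(struct fake_inode *ino, bool journal_data)
{
    if (pthread_mutex_trylock(&ino->lock) == 0) {
        lock_was_free = true;
        pthread_mutex_unlock(&ino->lock);
    }
    return journal_data ? 1 : 0;
}
```

The trade-off is the one Pekka raises: anything the slow step needs from the protected state has to be copied out before the unlock, because another thread may change it while the lock is dropped.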

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 17:57                 ` Pekka Enberg
@ 2010-10-29 14:52                   ` Ted Ts'o
  2010-10-29 15:33                     ` Aidar Kultayev
  0 siblings, 1 reply; 65+ messages in thread
From: Ted Ts'o @ 2010-10-29 14:52 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Chris Mason, Ingo Molnar, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

On Thu, Oct 28, 2010 at 08:57:49PM +0300, Pekka Enberg wrote:
> Don't we need to call ext4_should_writeback_data() before we drop the
> lock? It pokes at ->i_mode which needs ->i_mutex AFAICT.

No, it should be fine.  It's not like a file is going to change from
being a regular file to a directory or vice versa.  :-)

From a quick inspection it looks OK, but I haven't had the time to
look more closely to be 100% sure, and of course I haven't run it
through a battery of regression tests.  For normal usage it should be
fine though.

Aidar, if you'd be willing to try it with this patch applied, and with
the file system mounted data=writeback, and then let me know what the
latencytop reports, that would be useful.  I'm fairly sure that fixing
llseek() probably won't make that much difference, since it will
probably spread things out to other places, but it would be good to
make the experiment.
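Before comparing latencytop reports, it is worth confirming that the remount actually took effect. A minimal sketch in C using glibc's `<mntent.h>` API; `ext4_is_writeback()` is a hypothetical helper, and it assumes the data mode appears in the options field of `/proc/mounts`.

```c
#define _DEFAULT_SOURCE
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if `dir` is mounted as ext4 with
 * data=writeback according to a mount table such as /proc/mounts.
 * The last matching entry wins, since later mounts stack on top. */
bool ext4_is_writeback(const char *table, const char *dir)
{
    FILE *f = setmntent(table, "r");
    if (f == NULL)
        return false;

    bool writeback = false;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0)
            writeback = strcmp(m->mnt_type, "ext4") == 0 &&
                        hasmntopt(m, "data=writeback") != NULL;
    }
    endmntent(f);
    return writeback;
}
```

Typical use would be `ext4_is_writeback("/proc/mounts", "/")` right before starting the workload.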

We will probably also need to use the uninitialized bit for protecting
data from showing up after a crash for extent-based files, and turning
on data=writeback is a good way to simulate that.  (Sorry, no way
we're going to make a change like that this merge cycle, but that
might be something we could do for 2.6.38.)  But I am curious to see
what are the next things that come up as being problematic after that.

Thanks,

					- Ted

* Re: 2.6.36 io bring the system to its knees
  2010-10-29 14:52                   ` Ted Ts'o
@ 2010-10-29 15:33                     ` Aidar Kultayev
  2010-10-30  9:14                       ` Ingo Molnar
  0 siblings, 1 reply; 65+ messages in thread
From: Aidar Kultayev @ 2010-10-29 15:33 UTC (permalink / raw)
  To: Ted Ts'o, Pekka Enberg, Chris Mason, Ingo Molnar,
	Aidar Kultayev, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner

pulling the git tree now - I will try whatever you throw at me.

On Fri, Oct 29, 2010 at 8:52 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Thu, Oct 28, 2010 at 08:57:49PM +0300, Pekka Enberg wrote:
>> Don't we need to call ext4_should_writeback_data() before we drop the
>> lock? It pokes at ->i_mode which needs ->i_mutex AFAICT.
>
> No, it should be fine.  It's not like a file is going to change from
> being a regular file to a directory or vice versa.  :-)
>
> From a quick inspection it looks OK, but I haven't had the time to
> look more closely to be 100% sure, and of course I haven't run it
> through a battery of regression tests.  For normal usage it should be
> fine though.
>
> Aidar, if you'd be willing to try it with this patch applied, and with
> the file system mounted data=writeback, and then let me know what the
> latencytop reports, that would be useful.  I'm fairly sure that fixing
> llseek() probably won't make that much difference, since it will
> probably spread things out to other places, but it would be good to
> make the experiment.
>
> We will probably also need to use the uninitialized bit for protecting
> data from showing up after a crash for extent-based files, and turning
> on data=writeback is a good way to simulate that.  (Sorry, no way
> we're going to make a change like that this merge cycle, but that
> might be something we could do for 2.6.38.)  But I am curious to see
> what are the next things that come up as being problematic after that.
>
> Thanks,
>
>                                        - Ted
>

* Re: 2.6.36 io bring the system to its knees
  2010-10-29 15:33                     ` Aidar Kultayev
@ 2010-10-30  9:14                       ` Ingo Molnar
  2010-10-30 13:02                         ` Aidar Kultayev
  0 siblings, 1 reply; 65+ messages in thread
From: Ingo Molnar @ 2010-10-30  9:14 UTC (permalink / raw)
  To: Aidar Kultayev
  Cc: Ted Ts'o, Pekka Enberg, Chris Mason, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner


* Aidar Kultayev <the.aidar@gmail.com> wrote:

> pulling the git tree now - I will try whatever you throw at me.

Ted, i stuck that patch into tip:out-of-tree as:

  22fd555f6c5f: <not for upstream> ext4: Relax i_mutex hold times

So that Aidar can test things more easily via:

  http://people.redhat.com/mingo/tip.git/README

Thanks,

	Ingo

* Re: 2.6.36 io bring the system to its knees
  2010-10-30  9:14                       ` Ingo Molnar
@ 2010-10-30 13:02                         ` Aidar Kultayev
  2010-10-30 19:06                           ` Chris Mason
                                             ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Aidar Kultayev @ 2010-10-30 13:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ted Ts'o, Pekka Enberg, Chris Mason, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

Hi,

here is what I have :

.ext4 mounted with data=ordered
.-tip tree ( uname -a gives : Linux pussy 2.6.36-tip+ )

here is the latencytop & powertop & top screenshot:

http://picasaweb.google.com/lh/photo/bMTgbVDoojwUeXtVdyvIKw?feat=directlink

the system is/was doing :
.dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
.netbeans
.compiling gcc-4.5.1
.running VBox, which wasn't doing any IO. The guest os was idle in other words
.vlc
.chromium
.firefox
and a bunch of other small stuff.

Even without DD running, the mouse cursor would occasionally
lag. The alt+tab effect in KWin would take 5+ seconds to work out.
When I ran DD on top of the workload it consistently made the system much
more laggy. The cursor would freeze much more frequently. It is as if
you move your mouse physically, but the cursor on the screen
jumps discretely; in other words there is no continuity.
Music would stop.

I am free to try out anything here.

thanks, Aidar

On Sat, Oct 30, 2010 at 3:14 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Aidar Kultayev <the.aidar@gmail.com> wrote:
>
>> pulling the git tree now - I will try whatever you throw at me.
>
> Ted, i stuck that patch into tip:out-of-tree as:
>
>  22fd555f6c5f: <not for upstream> ext4: Relax i_mutex hold times
>
> So that Aidar can test things more easily via:
>
>  http://people.redhat.com/mingo/tip.git/README
>
> Thanks,
>
>        Ingo
>

* Re: 2.6.36 io bring the system to its knees
  2010-10-30 13:02                         ` Aidar Kultayev
@ 2010-10-30 19:06                           ` Chris Mason
  2010-10-31  2:31                           ` Ted Ts'o
  2010-11-02  3:10                           ` Shaohua Li
  2 siblings, 0 replies; 65+ messages in thread
From: Chris Mason @ 2010-10-30 19:06 UTC (permalink / raw)
  To: Aidar Kultayev
  Cc: Ingo Molnar, Ted Ts'o, Pekka Enberg, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

On Sat, Oct 30, 2010 at 07:02:35PM +0600, Aidar Kultayev wrote:
> Hi,
> 
> here is what I have :
> 
> .ext4 mounted with data=ordered
> .-tip tree ( uname -a gives : Linux pussy 2.6.36-tip+ )
> 
> here is the latencytop & powertop & top screenshot:
> 
> http://picasaweb.google.com/lh/photo/bMTgbVDoojwUeXtVdyvIKw?feat=directlink

It's actually better; fsync is missing anyway.  Please try ext4
data=writeback.

-chris

* Re: 2.6.36 io bring the system to its knees
  2010-10-28  6:09     ` 2.6.36 io bring the system to its knees Aidar Kultayev
  2010-10-28  6:32       ` Pekka Enberg
@ 2010-10-31  1:22       ` Wu Fengguang
  2010-10-31  1:51         ` Wu Fengguang
  1 sibling, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2010-10-31  1:22 UTC (permalink / raw)
  To: Aidar Kultayev
  Cc: Ingo Molnar, Ted Ts'o, Pekka Enberg, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

Hi Aidar,

On Thu, Oct 28, 2010 at 12:09:36PM +0600, Aidar Kultayev wrote:
> QUOTE:***
> And yes, we'd very much like to fix such slowdowns via heuristics as
> well (detecting large sequential IO and not letting it poison the
> existing cache), so good bugreports and reproducing testcases sent to
> linux-kernel@vger.kernel.org and people willing to try out
> experimental kernel patches would definitely be welcome.
> 
> Thanks,
> 
> Ingo
> 
> *** http://ask.slashdot.org/story/10/10/23/1828251/The-State-of-Linux-IO-Scheduling-For-the-Desktop#commentlisting
> 
> I'll be rather quick & to the point here.
> 
> I get & run stable kernels the same day they appear on kernel.org in
> hope to get away from these annoying, ignored, neglected slowdowns.
> 
> .config attached - I have Lenovo ThinkPad T400, Core2Duo T9400, 4Gb
> DDR2, w/integrated GM45 - xf86-video-intel, iwlagn for the intel 5300
> wifi, CFS, ext2 for
> swap partition - 4Gb, ext3 for boot, ext4 - 400Gb for everything else.

If possible, I'd suggest turning off swap and checking whether it helps.
Some people report(*) desktop responsiveness problems that can be
poor-man-fixed by disabling swap.

(*) https://bugzilla.kernel.org/show_bug.cgi?id=12309

> All the hardware I have runs linux natively.
> No kernel helped me from the days of 2.6.28.x upto 2.6.36. The dubbed
> slowdown fixes never worked for me.

There are multiple causes of slowdown. 2.6.36 includes some easy fixes.
The swap problem has been (maybe partly) root-caused(**), but fixing it
will need a rather complex and intrusive patch.

(**) http://www.spinics.net/lists/linux-fsdevel/msg35397.html

Thanks,
Fengguang

> The kernel config choices are rather typical : NO_HZ, I don't go crazy for
> 1000Hz and use 100 or 250Hz and voluntary preemption.
> Regarding the userland:
> Love choices, hence nothing but Gentoo + KDE4. Multilib. Some relevant
> info here:
> 
> ==============================================================================================
> emerge --info
> Portage 2.1.8.3 (default/linux/amd64/10.0/desktop, gcc-4.5.1,
> glibc-2.11.2-r0, 2.6.36 x86_64)
> =================================================================
> System uname: Linux-2.6.36-x86_64-Intel-R-_Core-TM-2_Duo_CPU_T9400_@_2.53GHz-with-gentoo-1.12.13
> Timestamp of tree: Tue, 26 Oct 2010 10:30:01 +0000
> app-shells/bash:     4.1_p7
> dev-java/java-config: 2.1.11
> dev-lang/python:     2.5.4-r4, 2.6.5-r3, 3.1.2-r4
> dev-util/cmake:      2.8.1-r2
> sys-apps/baselayout: 1.12.13
> sys-apps/sandbox:    2.3-r1
> sys-devel/autoconf:  2.13, 2.65-r1
> sys-devel/automake:  1.7.9-r1, 1.8.5-r4, 1.9.6-r3, 1.10.3, 1.11.1
> sys-devel/binutils:  2.20.1-r1
> sys-devel/gcc:       4.5.1
> sys-devel/gcc-config: 1.4.1
> sys-devel/libtool:   2.2.10
> sys-devel/make:      3.81-r2
> CBUILD="x86_64-pc-linux-gnu"
> CFLAGS="-O2 -pipe -march=native"
> CHOST="x86_64-pc-linux-gnu"
> CONFIG_PROTECT="/etc /usr/share/X11/xkb /usr/share/config /var/lib/hsqldb"
> CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d
> /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf
> /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/
> /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d
> /etc/terminfo"
> CXXFLAGS="-O2 -pipe -march=native"
> ==============================================================================================
> 
> Now, I know, Ingo said he wants : "good bugreports and reproducing
> testcases" and my testcase is very real life and rather replicates my
> typical use of computer these days:
> 
> - VirtualBox running XP only to look at some 2007 PPTs ( the OOo3
> doesn't cut it )
> - JuK ( or VLC ) KDE's music player - some music in the background
> - Chromium browser, with bunch of tabs with J2EE/J2SE javadocs, eats
> out some significant swap space
> - bash terminals
> - ktorrent
> - PDFs opened in okular, Adobe reader
> - sync'ing portage tree & emerging new ebuilds ( usually with gentoo )
> - Netbeans, Eclipse, apache, vsftpd, sshd, tomcat and the whole 9 yards.
> 
> How do I notice slowdowns ? The JuK lags so badly that it can't play
> any music, the mouse pointer freezes, kwin effects freeze for a few
> seconds.
> How can I make it much worse ? I can try & run disk clean up under XP,
> that is running in VBox, with folder compression. On top of it if I
> start copying big files in linux ( 700MB avis, etc ), GUI effects
> freeze, mouse pointer freezes for a few seconds.
> 
> And this is on 2.6.36 that is supposed to cure these "features". From
> this perspective, 2.6.36 is no better than any previous stable kernel
> I've tried. Probably as bad with regards to IO issues.
> 
> 
> Find attached screenshot ( latencytop_n_powertop.png ) which depicts
> artifacts where the window manager froze at the time I was trying to
> see a tab in Konsole where the powertop was running.
> 
> At the time, in the other tabs of the Konsole the following was running :
> .dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
> .cp /home/ak/1.distr/Linux/openSUSE-11.2-DVD-x86_64.iso
> /home/lameruser/;rm /home/lameruser/openSUSE-11.2-DVD-x86_64.iso;
> .dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
> .cp /home/ak/funeral.avi /home/ak/0.junk/;rm /home/ak/0.junk/funeral.avi
> .the XP under VBox was compacting its old files.
> 
> the iso is about 4Gb, the avi is about 700Mb
> 
> I do follow the problem here :
> https://bugzilla.kernel.org/show_bug.cgi?id=12309
> 
> This is a monumental failure for the kernel development project and FLOSS
> in general.
> Poor management, no leadership/championship, no responsibility, neglect

* Re: 2.6.36 io bring the system to its knees
  2010-10-31  1:22       ` Wu Fengguang
@ 2010-10-31  1:51         ` Wu Fengguang
  2010-11-01  1:09           ` Dimitrios Apostolou
  0 siblings, 1 reply; 65+ messages in thread
From: Wu Fengguang @ 2010-10-31  1:51 UTC (permalink / raw)
  To: Aidar Kultayev
  Cc: Ingo Molnar, Ted Ts'o, Pekka Enberg, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

> > How do I notice slowdowns ? The JuK lags so badly that it can't play
> > any music, the mouse pointer freezes, kwin effects freeze for a few
> > seconds.

> > How can I make it much worse ? I can try & run disk clean up under XP,
> > that is running in VBox, with folder compression. On top of it if I
> > start copying big files in linux ( 700MB avis, etc ), GUI effects
> > freeze, mouse pointer freezes for a few seconds.

It may also help to lower the dirty ratio.

echo 5 > /proc/sys/vm/dirty_ratio

Memory pressure + heavy write can easily hurt responsiveness.

- eats up to 20% (the default value for dirty_ratio) of memory with dirty
  pages and hence increases the memory pressure and the amount of swap IO

- the file copy makes the device write-congested and hence makes
  pageout() block easily in get_request_wait()

As a result every application may be slowed down by the heavy swap IO
when it page faults, as well as being blocked when allocating memory
(which may go into direct reclaim and then call pageout()).
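The `echo 5 > /proc/sys/vm/dirty_ratio` step can also be done from a program, and the arithmetic behind "up to 20%" is simple. A sketch in C; `write_sysctl_int()` and `dirty_limit_bytes()` are hypothetical helpers, writing under `/proc/sys` needs root, and the limit is only an approximation because the kernel applies the ratio to dirtyable (free plus reclaimable) memory rather than to total RAM.

```c
#include <stdio.h>

/* Hypothetical helper, equivalent to `echo VALUE > PATH`.
 * Returns 0 on success, -1 on failure. */
int write_sysctl_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)   /* a buffered write may only fail here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Approximate cap on dirty page cache for a given dirty_ratio. */
unsigned long long dirty_limit_bytes(unsigned long long dirtyable, int ratio)
{
    return dirtyable / 100 * (unsigned long long)ratio;
}
```

With roughly 4 GB of dirtyable memory, the default ratio of 20 allows about 800 MB of dirty pages, while 5 caps it at about 200 MB.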

Thanks,
Fengguang

* Re: 2.6.36 io bring the system to its knees
  2010-10-30 13:02                         ` Aidar Kultayev
  2010-10-30 19:06                           ` Chris Mason
@ 2010-10-31  2:31                           ` Ted Ts'o
  2010-10-31 17:49                             ` Corrado Zoccolo
  2010-11-02  3:10                           ` Shaohua Li
  2 siblings, 1 reply; 65+ messages in thread
From: Ted Ts'o @ 2010-10-31  2:31 UTC (permalink / raw)
  To: Aidar Kultayev
  Cc: Ingo Molnar, Pekka Enberg, Chris Mason, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner

On Sat, Oct 30, 2010 at 07:02:35PM +0600, Aidar Kultayev wrote:
> the system is/was doing :
> .dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
> .netbeans
> .compiling gcc-4.5.1
> .running VBox, which wasn't doing any IO. The guest os was idle in other words
> .vlc
> .chromium
> .firefox
> and a bunch of other small stuff.
> 
> Even without DD running, the mouse cursor would occasionally
> lag. The alt+tab effect in KWin would take 5+ seconds to work out.
> When I ran DD on top of the workload it consistently made the system much
> more laggy. The cursor would freeze much more frequently. It is as if
> you move your mouse physically, but the cursor on the screen
> jumps discretely; in other words there is no continuity.
> Music would stop.

If you start shutting down tasks, Vbox, netbeans, chromium, etc., at
what point does the cursor start tracking the system easily?  Is the
system swapping?  Do you know how to use tools like dstat or iostat to
see if the system is actively writing to the swap partition?  (And are
you using a swap partition or a swap file?)

The fact that the cursor isn't tracking well even when the dd isn't running,
and presumably the only source of I/O is the gcc and vlc, makes me
suspect that you may be swapping pretty heavily.  Have you tried
investigating that possibility, and made sure it has been ruled out?
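Whether the box is actively swapping can also be checked without dstat by sampling the `pswpin`/`pswpout` counters in `/proc/vmstat` (pages swapped in and out since boot) and watching whether they grow during a freeze. A minimal parsing sketch in C; `parse_swap_counters()` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read pswpin/pswpout (pages swapped in/out since
 * boot) from a stream in /proc/vmstat format ("name value" per line).
 * Returns 0 if both counters were found, -1 otherwise. */
int parse_swap_counters(FILE *f, unsigned long long *in,
                        unsigned long long *out)
{
    char name[64];
    unsigned long long val;
    int found = 0;

    while (fscanf(f, "%63s %llu", name, &val) == 2) {
        if (strcmp(name, "pswpin") == 0) {
            *in = val;
            found |= 1;
        } else if (strcmp(name, "pswpout") == 0) {
            *out = val;
            found |= 2;
        }
    }
    return found == 3 ? 0 : -1;
}
```

Reading `/proc/vmstat` once, then reopening and reading it again a second later, gives a rate; a steadily rising `pswpout` while the cursor stalls would point at swapping rather than filesystem IO.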

						- Ted

* Re: 2.6.36 io bring the system to its knees
  2010-10-31  2:31                           ` Ted Ts'o
@ 2010-10-31 17:49                             ` Corrado Zoccolo
  0 siblings, 0 replies; 65+ messages in thread
From: Corrado Zoccolo @ 2010-10-31 17:49 UTC (permalink / raw)
  To: Ted Ts'o, Aidar Kultayev, Ingo Molnar, Pekka Enberg,
	Chris Mason, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner

On Sun, Oct 31, 2010 at 3:31 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Sat, Oct 30, 2010 at 07:02:35PM +0600, Aidar Kultayev wrote:
>> the system is/was doing :
>> .dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
>> .netbeans
>> .compiling gcc-4.5.1
>> .running VBox, which wasn't doing any IO. The guest os was idle in other words
>> .vlc
>> .chromium
>> .firefox
>> and a bunch of other small stuff.
>>
>> Even without DD running, the mouse cursor would occasionally
>> lag. The alt+tab effect in KWin would take 5+ seconds to work out.
>> When I ran DD on top of the workload it consistently made the system much
>> more laggy. The cursor would freeze much more frequently. It is as if
>> you move your mouse physically, but the cursor on the screen
>> jumps discretely; in other words there is no continuity.
>> Music would stop.
>
> If you start shutting down tasks, Vbox, netbeans, chromium, etc., at
> what point does the cursor start tracking the system easily?  Is the
> system swapping?  Do you know how to use tools like dstat or iostat to
> see if the system is actively writing to the swap partition?  (And are
> you using a swap partition or a swap file?)
>
> The fact that the cursor isn't tracking well even when the dd isn't running,
> and presumably the only source of I/O is the gcc and vlc, makes me
> suspect that you may be swapping pretty heavily.  Have you tried
> investigating that possibility, and made sure it has been ruled out?

Another thing to try is raising the X server's CPU scheduling priority,
since I would be really surprised if we evicted from memory the routine
that draws the cursor.
BTW, I've seen the cursor jumping problem even when not swapping, and
with minimal *real* disk activity (but with heavy usage of a fuse
filesystem providing remote resources), and high cpu activity.
Raising X priority solved the problem with the mouse pointer, but the
gui programs still didn't respond quickly...
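Raising the X server's CPU priority amounts to lowering its nice value with setpriority(2). A sketch in C; `set_nice()` and `get_nice()` are hypothetical wrappers, and lowering a nice value (raising priority) normally needs root or CAP_SYS_NICE unless RLIMIT_NICE permits it, so in practice this would be run with privileges against the X server's pid.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Set the nice value of process `pid` (0 = calling process).
 * Returns 0 on success, -1 on error (errno set by setpriority). */
int set_nice(pid_t pid, int nice_value)
{
    return setpriority(PRIO_PROCESS, (id_t)pid, nice_value);
}

/* Read the nice value. getpriority() can legitimately return -1,
 * so errno is cleared first and checked to detect a real error. */
int get_nice(pid_t pid, int *out)
{
    errno = 0;
    int v = getpriority(PRIO_PROCESS, (id_t)pid);
    if (v == -1 && errno != 0)
        return -1;
    *out = v;
    return 0;
}
```

As the report above suggests, this may smooth the pointer without making GUI programs that are blocked on IO respond any faster.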

Thanks
Corrado

>
>                                                - Ted

* Re: 2.6.36 io bring the system to its knees
  2010-10-31  1:51         ` Wu Fengguang
@ 2010-11-01  1:09           ` Dimitrios Apostolou
  2010-11-02  1:20             ` Wu Fengguang
  0 siblings, 1 reply; 65+ messages in thread
From: Dimitrios Apostolou @ 2010-11-01  1:09 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

Hello, 

On Sun, 31 Oct 2010 09:51:32 +0800, Wu Fengguang wrote:
> It may also help to lower the dirty ratio.
> 
> echo 5 > /proc/sys/vm/dirty_ratio
> 
> Memory pressure + heavy write can easily hurt responsiveness.
> 
> - eats up to 20% (the default value for dirty_ratio) of memory with dirty
>   pages and hence increases the memory pressure and the amount of swap IO

My experience has been different. Wouldn't it make more sense to
_increase_ dirty_ratio (to 50, let's say) and at the same time decrease
dirty_background_ratio? That way writing to disk starts early, but the
related apps stall waiting for I/O only when dirty_ratio is reached.
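The two knobs contrasted here can be illustrated with a deliberately simplified model: background writeback starts once dirty memory passes dirty_background_ratio, and writers stall once it reaches dirty_ratio. This is a hypothetical sketch of that description only, not the kernel's actual balance_dirty_pages() logic, which also uses per-device limits.

```c
/* Hypothetical, simplified model; percentages are of dirtyable memory. */
enum dirty_state { DIRTY_IDLE, DIRTY_BACKGROUND, DIRTY_THROTTLED };

enum dirty_state classify_dirty(unsigned dirty_pct,
                                unsigned background_ratio,
                                unsigned dirty_ratio)
{
    if (dirty_pct >= dirty_ratio)
        return DIRTY_THROTTLED;   /* writer blocks waiting for IO */
    if (dirty_pct > background_ratio)
        return DIRTY_BACKGROUND;  /* flusher writes, writer continues */
    return DIRTY_IDLE;
}
```

With the 2.6.36 defaults (background 10, ratio 20), a writer at 30% dirty is throttled; with the proposed 5/50 it is not. Under sustained heavy writes, though, the dirty level eventually reaches any threshold, so the larger ratio mainly postpones the stall.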


Thanks, 
Dimitris

> 
> - the file copy makes the device write-congested and hence makes
>   pageout() block easily in get_request_wait()
> 
> As a result every application may be slowed down by the heavy swap IO
> when it page faults, as well as being blocked when allocating memory
> (which may go into direct reclaim and then call pageout()).
> 
> Thanks,
> Fengguang


* Re: 2.6.36 io bring the system to its knees
  2010-11-01  1:09           ` Dimitrios Apostolou
@ 2010-11-02  1:20             ` Wu Fengguang
  0 siblings, 0 replies; 65+ messages in thread
From: Wu Fengguang @ 2010-11-02  1:20 UTC (permalink / raw)
  To: Dimitrios Apostolou; +Cc: linux-mm, linux-kernel

On Mon, Nov 01, 2010 at 01:09:34AM +0000, Dimitrios Apostolou wrote:
> Hello, 
> 
> On Sun, 31 Oct 2010 09:51:32 +0800, Wu Fengguang wrote:
> > It may also help to lower the dirty ratio.
> > 
> > echo 5 > /proc/sys/vm/dirty_ratio
> > 
> > Memory pressure + heavy write can easily hurt responsiveness.
> > 
> > - eats up to 20% (the default value for dirty_ratio) of memory with dirty
> >   pages and hence increases the memory pressure and the amount of swap IO
> 
> My experience has been different. Wouldn't it make more sense to
> _increase_ dirty_ratio (to 50, let's say) and at the same time decrease
> dirty_background_ratio? That way writing to disk starts early, but the
> related apps stall waiting for I/O only when dirty_ratio is reached.

A 50% dirty ratio may help before the system starts thrashing (writing
processes will be throttled less/later). However, Aidar is seeing hours
of unresponsiveness under heavy IO; in that case a large dirty ratio
won't help reduce the throttling any more.

Thanks,
Fengguang

* Re: 2.6.36 io bring the system to its knees
  2010-10-30 13:02                         ` Aidar Kultayev
  2010-10-30 19:06                           ` Chris Mason
  2010-10-31  2:31                           ` Ted Ts'o
@ 2010-11-02  3:10                           ` Shaohua Li
  2 siblings, 0 replies; 65+ messages in thread
From: Shaohua Li @ 2010-11-02  3:10 UTC (permalink / raw)
  To: Aidar Kultayev
  Cc: Ingo Molnar, Ted Ts'o, Pekka Enberg, Chris Mason,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner

On Sat, 2010-10-30 at 21:02 +0800, Aidar Kultayev wrote:
> Hi,
> 
> here is what I have :
> 
> .ext4 mounted with data=ordered
> .-tip tree ( uname -a gives : Linux pussy 2.6.36-tip+ )
> 
> here is the latencytop & powertop & top screenshot:
> 
> http://picasaweb.google.com/lh/photo/bMTgbVDoojwUeXtVdyvIKw?feat=directlink
> 
> the system is/was doing :
> .dd if=/dev/zero of=test.10g bs=1M count=10000;rm test.10g
> .netbeans
> .compiling gcc-4.5.1
> .running VBox, which wasn't doing any IO. The guest os was idle in other words
> .vlc
> .chromium
> .firefox
> and a bunch of other small stuff.
> 
> Even without DD running, the mouse cursor would occasionally
> lag. The alt+tab effect in KWin would take 5+ seconds to work out.
> When I ran DD on top of the workload it consistently made the system much
> more laggy. The cursor would freeze much more frequently. It is as if
> you move your mouse physically, but the cursor on the screen
> jumps discretely; in other words there is no continuity.
> Music would stop.
> 
> I am free to try out anything here.

Would you please try the vm_exec protect patch here?
http://www.spinics.net/lists/linux-mm/msg09617.html

Thanks,
Shaohua

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 17:01               ` Chris Mason
  2010-10-28 17:57                 ` Pekka Enberg
@ 2010-11-02 11:47                 ` Sanjoy Mahajan
  2010-11-02 13:12                   ` Chris Mason
  2010-11-04 23:44                 ` Jesper Juhl
  2 siblings, 1 reply; 65+ messages in thread
From: Sanjoy Mahajan @ 2010-11-02 11:47 UTC (permalink / raw)
  To: Chris Mason
  Cc: Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter.Zijl

Chris Mason <chris.mason@oracle.com> wrote:

> > This has the appearance of some really bad IO or VM latency
> > problem. Unfixed and present in stable kernel versions going from
> > years ago all the way to v2.6.36.
> 
> Hmmm, the workload you're describing here has two special parts.
> First it dramatically overloads the disk, and then it has guis doing
> things waiting for the disk.

I think I see this same issue every few days when I back up my hard
drive to a USB hard drive using rsync.  While the backup is running, the
interactive response is bad.  A reproducible measurement of the badness
is starting an rxvt with F8 (bound to "rxvt &" in my .twmrc).  Often it
takes 8 seconds for the window to appear (as it just did about 2 minutes
ago)!  (Starting a subsequent rxvt is quick.)

The command for running the backup:

  rsync -av --delete /etc /home /media/usbdrive/bak > /tmp/homebackup.log

The hardware is a T60 w/ Intel graphics and wireless, 1.5GB RAM, 5400rpm
160GB harddrive w/ ext3 filesystems, and it's running vanilla 2.6.36.
There's not much memory pressure.  The swap is mostly empty, and there's
usually a Firefox eating 500MB of RAM.  Even Emacs at 50MB is in the
noise compared to the Firefox.

Here's the 'free' output:

             total       used       free     shared    buffers     cached
Mem:       1545292    1500288      45004          0      92848     713988
-/+ buffers/cache:     693452     851840
Swap:      2000088      22680    1977408
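In this version of `free`, the `-/+ buffers/cache` row is derived from the `Mem:` row by treating buffers and page cache as reclaimable: used minus buffers minus cached, and free plus buffers plus cached. A small check in C against the numbers above (all in KiB); `adjusted_usage()` is a hypothetical helper.

```c
/* Hypothetical helper reproducing the "-/+ buffers/cache" row of the
 * older procps `free` output; all values in KiB. */
struct usage {
    unsigned long used;
    unsigned long free;
};

struct usage adjusted_usage(unsigned long used, unsigned long free,
                            unsigned long buffers, unsigned long cached)
{
    struct usage u;
    u.used = used - buffers - cached;   /* memory held by applications */
    u.free = free + buffers + cached;   /* free plus reclaimable cache */
    return u;
}
```

For this machine that gives 693452 KiB used and 851840 KiB (roughly 830 MiB) counted as free or reclaimable, which matches the low memory pressure described.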

What tests or probes are worth running when the problem reappears in
order to find the root cause?

-Sanjoy

`Until lions have their historians, tales of the hunt shall always
 glorify the hunters.'  --African Proverb

* Re: 2.6.36 io bring the system to its knees
  2010-11-02 11:47                 ` Sanjoy Mahajan
@ 2010-11-02 13:12                   ` Chris Mason
  2010-11-04 16:05                     ` Sanjoy Mahajan
  0 siblings, 1 reply; 65+ messages in thread
From: Chris Mason @ 2010-11-02 13:12 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter.Zijl

On Tue, Nov 02, 2010 at 07:47:15AM -0400, Sanjoy Mahajan wrote:
> Chris Mason <chris.mason@oracle.com> wrote:
> 
> > > This has the appearance of some really bad IO or VM latency
> > > problem. Unfixed and present in stable kernel versions going from
> > > years ago all the way to v2.6.36.
> > 
> > Hmmm, the workload you're describing here has two special parts.
> > First it dramatically overloads the disk, and then it has guis doing
> > things waiting for the disk.
> 
> I think I see this same issue every few days when I back up my hard
> drive to a USB hard drive using rsync.  While the backup is running, the
> interactive response is bad.  A reproducible measurement of the badness
> is starting an rxvt with F8 (bound to "rxvt &" in my .twmrc).  Often it
> takes 8 seconds for the window to appear (as it just did about 2 minutes
> ago)!  (Starting a subsequent rxvt is quick.)

So this sounds like the backup is just thrashing your cache.  Latencies
starting an app are less surprising than latencies where a running app
doesn't respond at all.

Does rsync have the option to do an fadvise DONTNEED?
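
For reference, POSIX_FADV_DONTNEED is reachable from userspace through posix_fadvise(2). A minimal sketch of a copy that uses it, assuming the goal is to keep a one-off backup from evicting the cache that interactive programs rely on (copy_dropping_cache is a hypothetical helper, not an rsync option):

```python
import os

def copy_dropping_cache(src: str, dst: str, chunk: int = 1 << 20) -> int:
    """Copy src to dst, then tell the kernel neither file's pages will be reused."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
            copied += len(buf)
        fout.flush()
        # DONTNEED only discards clean pages, so force the written data out first.
        os.fsync(fout.fileno())
        # offset=0, len=0 covers the whole file.
        os.posix_fadvise(fin.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        os.posix_fadvise(fout.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    return copied
```

The fsync is there because dirty pages cannot be dropped; on ext3 in data=ordered mode that fsync can itself be expensive. A real tool would also advise per chunk rather than once at the end, so the backup's data never fills the cache in the first place.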

-chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-02 13:12                   ` Chris Mason
@ 2010-11-04 16:05                     ` Sanjoy Mahajan
  2010-11-04 23:35                       ` Steven Barrett
  0 siblings, 1 reply; 65+ messages in thread
From: Sanjoy Mahajan @ 2010-11-04 16:05 UTC (permalink / raw)
  To: Chris Mason
  Cc: Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter.Zijl

> So this sounds like the backup is just thrashing your cache.

I think it's more than that.  Starting an rxvt shouldn't take 8 seconds,
even with a cold cache.  Actually, it does take a while, so you do have
a point.  I just did

  echo 3 > /proc/sys/vm/drop_caches

and then started rxvt.  That takes about 3 seconds (which seems long,
but I don't know wherein that slowness lies), of which maybe 0.25
seconds is loading and running 'date':

$ time rxvt -e date
real	0m2.782s
user	0m0.148s
sys	0m0.032s

The 8-second delay during the rsync must have at least two causes: (1)
the cache is wiped out, and (2) the rxvt binary cannot be paged in
quickly because the disk is doing lots of other I/O.  

Can the system somehow know that paging in the rxvt binary and shared
libraries is interactive I/O, because it was started by an interactive
process, and therefore should take priority over the rsync?

> Does rsync have the option to do an fadvise DONTNEED?

I couldn't find one.  It would be good to have a solution that is
independent of the backup app.  (The 'locate' cron job does a similar
thrashing of the interactive response.)

-Sanjoy

`Until lions have their historians, tales of the hunt shall always
 glorify the hunters.'  --African Proverb

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-04 16:05                     ` Sanjoy Mahajan
@ 2010-11-04 23:35                       ` Steven Barrett
  0 siblings, 0 replies; 65+ messages in thread
From: Steven Barrett @ 2010-11-04 23:35 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Linus Torvalds, Andrew Morton, Jens Axboe,
	Peter.Zijl

On 11/04/2010 11:05 AM, Sanjoy Mahajan wrote:
>> So this sounds like the backup is just thrashing your cache.
> 
> I think it's more than that.  Starting an rxvt shouldn't take 8 seconds,
> even with a cold cache.  Actually, it does take a while, so you do have
> a point.  I just did
> 
>   echo 3 > /proc/sys/vm/drop_caches
> 
> and then started rxvt.  That takes about 3 seconds (which seems long,
> but I don't know wherein that slowness lies), of which maybe 0.25
> seconds is loading and running 'date':
> 
> $ time rxvt -e date
> real	0m2.782s
> user	0m0.148s
> sys	0m0.032s
> 
> The 8-second delay during the rsync must have at least two causes: (1)
> the cache is wiped out, and (2) the rxvt binary cannot be paged in
> quickly because the disk is doing lots of other I/O.  
> 
> Can the system somehow know that paging in the rxvt binary and shared
> libraries is interactive I/O, because it was started by an interactive
> process, and therefore should take priority over the rsync?
> 
>> Does rsync have the option to do an fadvise DONTNEED?
> 
> I couldn't find one.  It would be good to have a solution that is
> independent of the backup app.  (The 'locate' cron job does a similar
> thrashing of the interactive response.)

I'm definitely no expert in Linux's file cache management, but from what
I've experienced... isn't the real problem that the "interactive"
processes, like your web browser or file manager, lose their inode and
dentry cache when rsync runs?  Then while rsync is busy reading and
writing to the disk, whenever you click on your interactive application,
it tries to read what it lost to rsync from the disk while rsync is
still thrashing your inode/dentry cache.

This is a major problem even when my system has lots of RAM (4 GB on this
laptop).

What has helped me, however, is reducing vm.vfs_cache_pressure to a
smaller value (25 here) so that Linux prefers to retain the current
inode / dentry cache rather than suddenly give it up for a new greedy
I/O type of program.  The only side effect is that file copying is a
little slower than usual... totally worth it though.
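
vm.vfs_cache_pressure is an ordinary sysctl: the default is 100, and lower values make the kernel prefer to retain dentry and inode caches. Setting it is the same as 'sysctl -w vm.vfs_cache_pressure=25'; a minimal Python equivalent, with hypothetical helper names, is:

```python
def sysctl_path(name: str) -> str:
    """Map 'vm.vfs_cache_pressure' to '/proc/sys/vm/vfs_cache_pressure'."""
    return "/proc/sys/" + name.replace(".", "/")

def read_sysctl(name: str) -> str:
    with open(sysctl_path(name)) as f:
        return f.read().strip()

def write_sysctl(name: str, value: int) -> None:
    """Like 'sysctl -w name=value': needs root and does not survive a reboot."""
    with open(sysctl_path(name), "w") as f:
        f.write(f"{value}\n")
```

For example, write_sysctl("vm.vfs_cache_pressure", 25) reproduces the setting above; to make it persistent it would go in /etc/sysctl.conf instead.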

> 
> -Sanjoy
> 
> `Until lions have their historians, tales of the hunt shall always
>  glorify the hunters.'  --African Proverb

	Steven Barrett

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-10-28 17:01               ` Chris Mason
  2010-10-28 17:57                 ` Pekka Enberg
  2010-11-02 11:47                 ` Sanjoy Mahajan
@ 2010-11-04 23:44                 ` Jesper Juhl
  2010-11-04 23:48                   ` Jesper Juhl
  2 siblings, 1 reply; 65+ messages in thread
From: Jesper Juhl @ 2010-11-04 23:44 UTC (permalink / raw)
  To: Chris Mason
  Cc: Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Sanjoy Mahajan, Steven Barrett

On Thu, 28 Oct 2010, Chris Mason wrote:

> On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
> > 
> > "Many seconds freezes" and slowdowns won't be fixed via the VFS scalability patches
> > I'm afraid.
> > 
> > This has the appearance of some really bad IO or VM latency problem. Unfixed and 
> > present in stable kernel versions going from years ago all the way to v2.6.36.
> 
> Hmmm, the workload you're describing here has two special parts.  First
> it dramatically overloads the disk, and then it has guis doing things
> waiting for the disk.
> 

Just want to chime in with a 'me too'.

I see something similar on Arch Linux when running 'pacman -Syyuv' with
many (as in more than 5-10) updates to apply. While the update is
running (even if that's all the system is doing), responsiveness is
terrible: starting 'chromium', which is usually instant (under 2 seconds
at worst), can take upwards of 10 seconds, the mouse cursor in X starts
to jump a bit, and switching virtual desktops noticeably lags when
redrawing the new desktop if a full-screen app like gimp or OpenOffice
is open there. This is on a Lenovo ThinkPad R61i with an 'Intel(R)
Core(TM)2 Duo CPU T7250 @ 2.00GHz' CPU, 2GB of memory and 499996
kilobytes of swap.

-- 
Jesper Juhl <jj@chaosbits.net>             http://www.chaosbits.net/
Plain text mails only, please      http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-04 23:44                 ` Jesper Juhl
@ 2010-11-04 23:48                   ` Jesper Juhl
  2010-11-05  1:43                     ` Dave Chinner
  0 siblings, 1 reply; 65+ messages in thread
From: Jesper Juhl @ 2010-11-04 23:48 UTC (permalink / raw)
  To: Chris Mason
  Cc: Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Sanjoy Mahajan, Steven Barrett

On Fri, 5 Nov 2010, Jesper Juhl wrote:

> On Thu, 28 Oct 2010, Chris Mason wrote:
> 
> > On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
> > > 
> > > "Many seconds freezes" and slowdowns won't be fixed via the VFS scalability patches
> > > I'm afraid.
> > > 
> > > This has the appearance of some really bad IO or VM latency problem. Unfixed and 
> > > present in stable kernel versions going from years ago all the way to v2.6.36.
> > 
> > Hmmm, the workload you're describing here has two special parts.  First
> > it dramatically overloads the disk, and then it has guis doing things
> > waiting for the disk.
> > 
> 
> Just want to chime in with a 'me too'.
> 
> I see something similar on Arch Linux when doing 'pacman -Syyuv' and there 
> are many (as in more than 5-10) updates to apply. While the update is 
> running (even if that's all the system is doing) system responsiveness is 
> terrible - just starting 'chromium' which is usually instant (at least 
> less than 2 sec at worst) can take upwards of 10 seconds and the mouse 
> cursor in X starts to jump a bit as well and switching virtual desktops 
> noticeably lags when redrawing the new desktop if there's a full screen app
> like gimp or OpenOffice open there. This is on a Lenovo Thinkpad R61i
> which has an 'Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz' CPU, 2GB of
> memory and 499996 kilobytes of swap.
> 
Forgot to mention the kernel I currently see this with:

[jj@dragon ~]$ uname -a
Linux dragon 2.6.35-ARCH #1 SMP PREEMPT Sat Oct 30 21:22:26 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux

-- 
Jesper Juhl <jj@chaosbits.net>             http://www.chaosbits.net/
Plain text mails only, please      http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-04 23:48                   ` Jesper Juhl
@ 2010-11-05  1:43                     ` Dave Chinner
  2010-11-05 12:48                       ` Sanjoy Mahajan
                                         ` (2 more replies)
  0 siblings, 3 replies; 65+ messages in thread
From: Dave Chinner @ 2010-11-05  1:43 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Linus Torvalds, Andrew Morton, Jens Axboe,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Ted Ts'o, Corrado Zoccolo, Shaohua Li, Sanjoy Mahajan,
	Steven Barrett

On Fri, Nov 05, 2010 at 12:48:17AM +0100, Jesper Juhl wrote:
> On Fri, 5 Nov 2010, Jesper Juhl wrote:
> 
> > On Thu, 28 Oct 2010, Chris Mason wrote:
> > 
> > > On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
> > > > 
> > > > "Many seconds freezes" and slowdowns won't be fixed via the VFS scalability patches
> > > > I'm afraid.
> > > > 
> > > > This has the appearance of some really bad IO or VM latency problem. Unfixed and 
> > > > present in stable kernel versions going from years ago all the way to v2.6.36.
> > > 
> > > Hmmm, the workload you're describing here has two special parts.  First
> > > it dramatically overloads the disk, and then it has guis doing things
> > > waiting for the disk.
> > > 
> > 
> > Just want to chime in with a 'me too'.
> > 
> > I see something similar on Arch Linux when doing 'pacman -Syyuv' and there 
> > are many (as in more than 5-10) updates to apply. While the update is 
> > running (even if that's all the system is doing) system responsiveness is 
> > terrible - just starting 'chromium' which is usually instant (at least 
> > less than 2 sec at worst) can take upwards of 10 seconds and the mouse 
> > cursor in X starts to jump a bit as well and switching virtual desktops 
> > noticeably lags when redrawing the new desktop if there's a full screen app
> > like gimp or OpenOffice open there. This is on a Lenovo Thinkpad R61i
> > which has an 'Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz' CPU, 2GB of
> > memory and 499996 kilobytes of swap.
> > 
> Forgot to mention the kernel I currently experience this with : 
> 
> [jj@dragon ~]$ uname -a
> Linux dragon 2.6.35-ARCH #1 SMP PREEMPT Sat Oct 30 21:22:26 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux

I think anyone reporting an interactivity problem also needs to
indicate what their filesystem is, what mount parameters they are
using, what their storage config is, whether barriers are active or
not, what elevator they are using, whether one or more of the
applications are issuing fsync() or sync() calls, and so on.

Basically, what we need to know is whether these problems are
isolated to a particular filesystem or storage type because
they may simply be known problems (e.g. the ext3 fsync-the-world
problem).
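
Much of that checklist can be gathered mechanically. A minimal sketch, assuming the disk is sda and that /proc/mounts and /sys/block/<disk>/queue/scheduler are the sources to read (the helper names are hypothetical; barrier status and fsync usage still have to come from dmesg and strace, and paths containing spaces, which /proc/mounts escapes as \040, are not unescaped here):

```python
from __future__ import annotations

def active_scheduler(text: str) -> str:
    """Return the bracketed (active) entry of a scheduler file, e.g. 'noop deadline [cfq]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return text.strip()  # some devices list a single entry without brackets

def parse_mounts(text: str) -> list[tuple[str, str, str, list[str]]]:
    """Return (device, mountpoint, fstype, options) for each line of /proc/mounts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[0], fields[1], fields[2], fields[3].split(",")))
    return entries

def report(disk: str = "sda") -> None:
    """Print block-device mounts with their options, then the disk's active elevator."""
    with open("/proc/mounts") as f:
        for dev, mnt, fstype, opts in parse_mounts(f.read()):
            if dev.startswith("/dev/"):
                print(f"{dev} on {mnt} type {fstype} ({','.join(opts)})")
    with open(f"/sys/block/{disk}/queue/scheduler") as f:
        print(f"{disk} elevator: {active_scheduler(f.read())}")
```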

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-05  1:43                     ` Dave Chinner
@ 2010-11-05 12:48                       ` Sanjoy Mahajan
  2010-11-06 14:10                         ` dave b
  2010-11-06 19:10                         ` Arjan van de Ven
  2010-11-07 17:16                       ` Jesper Juhl
  2010-11-09 21:00                       ` Chris Mason
  2 siblings, 2 replies; 65+ messages in thread
From: Sanjoy Mahajan @ 2010-11-05 12:48 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jesper Juhl, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner, Ted Ts'o, Corrado Zoccolo,
	Shaohua Li, Steven Barrett

Dave Chinner <david@fromorbit.com> wrote:

> I think anyone reporting an interactivity problem also needs to
> indicate what their filesystem is, what mount parameters they are
> using, what their storage config is, whether barriers are active or
> not, what elevator they are using, whether one or more of the
> applications are issuing fsync() or sync() calls, and so on.

Good idea.  

The filesystems are all ext3 with default mount parameters.  The dmesgs
say that the filesystems are mounted in ordered data mode and that
barriers are not enabled.

mount says:

/dev/sda2 on / type ext3 (rw,errors=remount-ro,commit=0)
/dev/sda1 on /boot type ext3 (rw,commit=0)
/dev/sda3 on /home type ext3 (rw,commit=0)

> storage config

Do you mean the partition sizes?  Here's that:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              72G   52G   17G  77% /
tmpfs                 755M  4.0K  755M   1% /lib/init/rw
udev                  750M  212K  750M   1% /dev
tmpfs                 755M     0  755M   0% /dev/shm
/dev/sda1             274M  117M  143M  45% /boot
/dev/sda3              74G   37G   33G  53% /home

> elevator

CFQ

> sync-related calls

I don't have a trace from while rsync was running (I'll check that
tonight), but I traced the currently running emacs and iceweasel
(a.k.a. firefox) with "strace -p PID 2>&1 | grep sync".  That didn't
turn up any sync-related calls.

(I checked the firefox because I seem to remember that it used to do
fsync absurdly often, but I also seem to remember that the outcry made
them stop.)
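
A note on method: strace can filter by syscall itself (strace -f -e trace=fsync,fdatasync -p PID), which avoids grep matching unrelated lines that merely contain "sync". For a log that has already been captured, a small Python filter along these lines works; the list of syscall names and the handling of the PID prefix that strace -f adds are assumptions to adjust as needed:

```python
import re

# Sync-family syscall names to report (an assumed list; extend as needed).
SYNC_CALLS = {"fsync", "fdatasync", "sync", "sync_file_range", "msync"}

# Optional PID prefix ("[pid  1234] " or "1234 "), then the syscall name and "(".
_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")

def sync_calls(lines):
    """Yield the strace lines whose syscall is in SYNC_CALLS (exact name match)."""
    for line in lines:
        m = _CALL.match(line)
        if m and m.group(1) in SYNC_CALLS:
            yield line
```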

-Sanjoy

`Until lions have their historians, tales of the hunt shall always
 glorify the hunters.'  --African Proverb

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-05 12:48                       ` Sanjoy Mahajan
@ 2010-11-06 14:10                         ` dave b
  2010-11-06 15:12                           ` Dave Chinner
  2010-11-07 12:08                           ` Jens Axboe
  2010-11-06 19:10                         ` Arjan van de Ven
  1 sibling, 2 replies; 65+ messages in thread
From: dave b @ 2010-11-06 14:10 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: Dave Chinner, Jesper Juhl, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner, Ted Ts'o, Corrado Zoccolo,
	Shaohua Li, Steven Barrett

I have come to think that this problem is the kernel not keeping
track of reads vs. writes properly, or not giving reading processes
as much time as writing ones, which then look like they are blocking
the system....

If you want a simple test, run an unlimited dd (or two dd's of a
limited size, say 10 GB) together with a find /.
Tell me how it goes :) (the system will stall)
(Obviously stop the dd after some time :) ).
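
A rough, scaled-down version of that reproducer, sketched in Python as an assumption of what the dd + find test measures: writer threads stream zeroes into files (like dd if=/dev/zero bs=1M) while the main thread times a directory walk (a stand-in for find). All names and defaults here are hypothetical; compare the result against a run with writers=0, and delete the scratch files afterwards.

```python
import os
import threading
import time

def writer(path: str, total_mb: int, stop: threading.Event) -> None:
    """Stream zeroes into `path` 1 MiB at a time, roughly dd bs=1M count=total_mb."""
    block = b"\0" * (1 << 20)
    with open(path, "wb") as f:
        for _ in range(total_mb):
            if stop.is_set():
                break
            f.write(block)

def time_walk(root: str) -> float:
    """Seconds taken to walk every directory under `root` (a stand-in for find)."""
    start = time.monotonic()
    for _ in os.walk(root):
        pass
    return time.monotonic() - start

def reproduce(scratch: str, walk_root: str, writers: int = 2, mb_per_writer: int = 10240) -> float:
    """Time a walk of `walk_root` while `writers` threads fill files in `scratch`."""
    stop = threading.Event()
    threads = [
        threading.Thread(target=writer,
                         args=(os.path.join(scratch, f"ddtest{i}"), mb_per_writer, stop))
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    try:
        return time_walk(walk_root)
    finally:
        stop.set()  # stop the writers once the walk is done
        for t in threads:
            t.join()
```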

http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/4561
IIRC, you can reproduce this on plain ext3 as well.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-06 14:10                         ` dave b
@ 2010-11-06 15:12                           ` Dave Chinner
  2010-11-07  6:06                             ` dave b
  2010-11-07 12:08                           ` Jens Axboe
  1 sibling, 1 reply; 65+ messages in thread
From: Dave Chinner @ 2010-11-06 15:12 UTC (permalink / raw)
  To: dave b
  Cc: Sanjoy Mahajan, Jesper Juhl, Chris Mason, Ingo Molnar,
	Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On Sun, Nov 07, 2010 at 01:10:24AM +1100, dave b wrote:
> I now personally have thought that this problem is the kernel not
> keeping track of reads vs writers properly  or not providing enough
> time to reading processes as writing ones which look like they are
> blocking the system....

Could be anything from that description....

> If you want to do a simple test do an unlimited dd  (or two dd's of a
> limited size, say 10gb) and a find /
> Tell me how it goes :)

The find runs at IO latency speed while the dd processes run at disk
bandwidth:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
vdb               0.00     0.00   58.00 1251.00     0.45   556.54   871.45    26.69   20.39   0.72  94.32
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

That looks pretty normal to me for XFS and the noop IO scheduler,
and there are no signs of latency or interactive problems in
the system at all. Kill the dd's and:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
vdb               0.00     0.00  214.80    0.40     1.68     0.00    15.99     0.33    1.54   1.54  33.12
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

And the find runs 3-4x faster, but ~200 iops is about the limit
I'd expect from 7200rpm SATA drives given a single thread issuing IO
(i.e. 5ms average seek time).
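
The iostat columns can be cross-checked with a little arithmetic. A sketch with illustrative names, using the rows above: by Little's law the average queue size is the request rate times await, and %util is the rate times svctm (iostat derives svctm from %util, so that second check mostly confirms the figures are consistent). The read-rate ratio matches the "3-4x faster" remark, and the last line is the single-threaded limit of one request per 5 ms.

```python
def check_iostat(r_s: float, w_s: float, await_ms: float, svctm_ms: float):
    """Return (IOPS, queue depth by Little's law, %util implied by svctm)."""
    iops = r_s + w_s
    avgqu_sz = iops * await_ms / 1000.0        # requests in flight = rate x time per request
    util_pct = iops * svctm_ms / 1000.0 * 100  # busy fraction of each second, in percent
    return iops, avgqu_sz, util_pct

print(check_iostat(58.00, 1251.00, 20.39, 0.72))  # dd + find: reported avgqu-sz 26.69, %util 94.32
print(check_iostat(214.80, 0.40, 1.54, 1.54))     # find only: reported avgqu-sz 0.33, %util 33.12
print(214.80 / 58.00)  # find's read rate without vs. with the dd's, about 3.7x
print(1000.0 / 5.0)    # one request per 5 ms -> 200.0 IOPS
```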

> ( the system will stall)

No, the system doesn't stall at all. It runs just fine. Sure,
anything that requires IO on the loaded filesystem is _slower_, but
if you're writing huge files to it that's pretty much expected. The
root drive (on a different spindle) is still perfectly responsive on
a cold cache:

$ sudo time find / -xdev > /dev/null
0.10user 1.87system 0:03.39elapsed 58%CPU (0avgtext+0avgdata 7008maxresident)k
0inputs+0outputs (1major+844minor)pagefaults 0swap

So what you describe is not a systemic problem, but a problem that
your system configuration triggers. That's why we need to know
_exactly_ how your storage subsystem is configured....

> http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/4561
> iirc can reproduce this on plain ext3.

You're pointing to a "fsync-tester" program that exercises a
well-known problem with ext3 (sync-the-world-on-fsync). Other
filesystems do not have that design flaw so don't suffer from
interactivity problems under these workloads. As it is, your above
dd workload example is not related to this fsync problem, either.
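
For readers who want to observe the ext3 behaviour directly, here is a hypothetical stand-in for the linked fsync-tester (this is not that program's code). It rewrites a small file and times each fsync. Run it while a large writer such as dd is active. If the sync-the-world behaviour is in play, the latencies would be expected to climb with the amount of unrelated dirty data. On filesystems without that flaw they should stay roughly flat.

```python
from __future__ import annotations
import os
import time

def fsync_latencies(path: str, rounds: int = 10, size: int = 4096, pause: float = 1.0) -> list[float]:
    """Rewrite `size` bytes of `path` and fsync it `rounds` times; return latencies in ms."""
    data = b"x" * size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for i in range(rounds):
            os.pwrite(fd, data, 0)  # dirty one small file...
            start = time.monotonic()
            os.fsync(fd)            # ...and time how long making it durable takes
            latencies.append((time.monotonic() - start) * 1000.0)
            if i + 1 < rounds:
                time.sleep(pause)
    finally:
        os.close(fd)
    return latencies
```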

This is what I'm trying to point out - you need to describe in
significant detail your setup and what your applications are doing
so we can identify if you are seeing a known problem or not. If you
are seeing problems as a result of the above ext3 fsync problem,
then the simple answer is "don't use ext3".

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-05 12:48                       ` Sanjoy Mahajan
  2010-11-06 14:10                         ` dave b
@ 2010-11-06 19:10                         ` Arjan van de Ven
  1 sibling, 0 replies; 65+ messages in thread
From: Arjan van de Ven @ 2010-11-06 19:10 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: Dave Chinner, Jesper Juhl, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Thomas Gleixner, Ted Ts'o, Corrado Zoccolo, Shaohua Li,
	Steven Barrett

On Fri, 5 Nov 2010 08:48:13 -0400
Sanjoy Mahajan <sanjoy@olin.edu> wrote:

> Dave Chinner <david@fromorbit.com> wrote:
> 
> > I think anyone reporting an interactivity problem also needs to
> > indicate what their filesystem is, what mount parameters they are
> > using, what their storage config is, whether barriers are active or
> > not, what elevator they are using, whether one or more of the
> > applications are issuing fsync() or sync() calls, and so on.
> 
> Good idea.  
> 
> The filesystems are all ext3 with default mount parameters.  The
> dmesgs say that the filesystems are mounted in ordered data mode and
> that barriers are not enabled.

BTW, a few more things to try (from my standard rc.local script):

echo 4096 > /sys/block/sda/queue/nr_requests

for i in `pidof kjournald` ; do ionice -c1 -p $i ; done

echo 75 > /proc/sys/vm/dirty_ratio


(replace sda with whatever your disk is of course)
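
The same settings can be expressed as a Python sketch, in case they are applied from a script rather than rc.local. The function names are hypothetical. The script needs root and calls the real pidof and ionice tools. kjournald is the ext3 journal thread, while ext4's journal threads are named jbd2/<dev>, so the loop finds nothing on an ext4-only system. The values are Arjan's, not general recommendations.

```python
from __future__ import annotations
import subprocess

def tuning_plan(disk: str = "sda") -> list[tuple[str, str]]:
    """(path, value) pairs for the two echo lines in the snippet above."""
    return [
        (f"/sys/block/{disk}/queue/nr_requests", "4096"),
        ("/proc/sys/vm/dirty_ratio", "75"),
    ]

def apply_tuning(disk: str = "sda") -> None:
    """Needs root. Write the values, then put each kjournald thread in the realtime IO class."""
    for path, value in tuning_plan(disk):
        with open(path, "w") as f:
            f.write(value + "\n")
    # pidof prints nothing (and exits non-zero) when no kjournald threads exist.
    out = subprocess.run(["pidof", "kjournald"], capture_output=True, text=True).stdout
    for pid in out.split():
        subprocess.run(["ionice", "-c1", "-p", pid], check=True)
```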

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-06 15:12                           ` Dave Chinner
@ 2010-11-07  6:06                             ` dave b
  0 siblings, 0 replies; 65+ messages in thread
From: dave b @ 2010-11-07  6:06 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Sanjoy Mahajan, Jesper Juhl, Chris Mason, Ingo Molnar,
	Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Jens Axboe, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On 7 November 2010 02:12, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Nov 07, 2010 at 01:10:24AM +1100, dave b wrote:
>> I now personally have thought that this problem is the kernel not
>> keeping track of reads vs writers properly  or not providing enough
>> time to reading processes as writing ones which look like they are
>> blocking the system....
>
> Could be anything from that description....
>
>> If you want to do a simple test do an unlimited dd  (or two dd's of a
>> limited size, say 10gb) and a find /
>> Tell me how it goes :)
>
> The find runs at IO latency speed while the dd processes run at disk
> bandwidth:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> vdb               0.00     0.00   58.00 1251.00     0.45   556.54   871.45    26.69   20.39   0.72  94.32
> sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>
> That looks pretty normal to me for XFS and the noop IO scheduler,
> and there are no signs of latency or interactive problems in
> the system at all. Kill the dd's and:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> vda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> vdb               0.00     0.00  214.80    0.40     1.68     0.00    15.99     0.33    1.54   1.54  33.12
> sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>
> And the find runs 3-4x faster, but ~200 iops is about the limit
> I'd expect from 7200rpm SATA drives given a single thread issuing IO
> (i.e. 5ms average seek time).
>
>> ( the system will stall)
>
> No, the system doesn't stall at all. It runs just fine. Sure,
> anything that requires IO on the loaded filesystem is _slower_, but
> if you're writing huge files to it that's pretty much expected. The
> root drive (on a different spindle) is still perfectly responsive on
> a cold cache:
>
> $ sudo time find / -xdev > /dev/null
> 0.10user 1.87system 0:03.39elapsed 58%CPU (0avgtext+0avgdata 7008maxresident)k
> 0inputs+0outputs (1major+844minor)pagefaults 0swap
>
> So what you describe is not a systemic problem, but a problem that
> your system configuration triggers. That's why we need to know
> _exactly_ how your storage subsystem is configured....
>
>> http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/4561
>> iirc can reproduce this on plain ext3.
>
> You're pointing to a "fsync-tester" program that exercises a
> well-known problem with ext3 (sync-the-world-on-fsync). Other
> filesystems do not have that design flaw so don't suffer from
> interactivity problems under these workloads. As it is, your above
> dd workload example is not related to this fsync problem, either.
>
> This is what I'm trying to point out - you need to describe in
> significant detail your setup and what your applications are doing
> so we can identify if you are seeing a known problem or not. If you
> are seeing problems as a result of the above ext3 fsync problem,
> then the simple answer is "don't use ext3".

Thank you for your reply.
Well, I am not sure :)
Is the answer "don't use ext3"?
If so, what should I be using instead?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-06 14:10                         ` dave b
  2010-11-06 15:12                           ` Dave Chinner
@ 2010-11-07 12:08                           ` Jens Axboe
  2010-11-07 15:50                             ` Linus Torvalds
  1 sibling, 1 reply; 65+ messages in thread
From: Jens Axboe @ 2010-11-07 12:08 UTC (permalink / raw)
  To: dave b
  Cc: Sanjoy Mahajan, Dave Chinner, Jesper Juhl, Chris Mason,
	Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Linus Torvalds, Andrew Morton, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner, Ted Ts'o, Corrado Zoccolo,
	Shaohua Li, Steven Barrett

On 2010-11-06 15:10, dave b wrote:
> I now personally have thought that this problem is the kernel not
> keeping track of reads vs writers properly  or not providing enough
> time to reading processes as writing ones which look like they are
> blocking the system....
> 
> If you want a simple test, run an unlimited dd (or two dd's of a
> limited size, say 10GB) and a find / at the same time.
> Tell me how it goes :) (the system will stall)
> (obviously stop the dd after some time :) ).
> 
> http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/4561
> IIRC I can reproduce this on plain ext3.

As already mentioned, ext3 is just not a good choice for this sort of
thing. Did you have atimes enabled?

-- 
Jens Axboe
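
A rough way to put numbers on the stall dave b describes is a reader-side probe that runs while the dd writer is active. What follows is a minimal C sketch, not something posted in this thread: the function names max_stat_latency_ms() and elapsed_ms() are hypothetical, and it assumes a POSIX system with clock_gettime(CLOCK_MONOTONIC). It times repeated stat() calls on one path and reports the worst case in milliseconds. Repeated stat() of the same path is normally served from the dentry/inode cache, so this mostly catches scheduling and lock stalls; a cold tree walk such as find / is what actually competes with dd for the disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000.0 +
	       (b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Hypothetical probe: call stat() on @path @iterations times and return
 * the worst single-call latency in milliseconds, or -1.0 if stat() fails.
 */
static double max_stat_latency_ms(const char *path, int iterations)
{
	struct stat st;
	struct timespec t0, t1;
	double worst = 0.0;

	for (int i = 0; i < iterations; i++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (stat(path, &st) != 0)
			return -1.0;
		clock_gettime(CLOCK_MONOTONIC, &t1);
		double ms = elapsed_ms(&t0, &t1);
		if (ms > worst)
			worst = ms;
	}
	return worst;
}
```

For a comparison, call max_stat_latency_ms("/", 1000) once on an idle system and once while the dd from the test above is running.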


* Re: 2.6.36 io bring the system to its knees
  2010-11-07 12:08                           ` Jens Axboe
@ 2010-11-07 15:50                             ` Linus Torvalds
  2010-11-10  1:32                               ` Dave Chinner
  0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2010-11-07 15:50 UTC (permalink / raw)
  To: Jens Axboe
  Cc: dave b, Sanjoy Mahajan, Dave Chinner, Jesper Juhl, Chris Mason,
	Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Andrew Morton, Peter Zijlstra, Nick Piggin, Arjan van de Ven,
	Thomas Gleixner, Ted Ts'o, Corrado Zoccolo, Shaohua Li,
	Steven Barrett

On Sun, Nov 7, 2010 at 4:08 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> As already mentioned, ext3 is just not a good choice for this sort of
> thing. Did you have atimes enabled?

At least for ext3, more important than atimes is the "data=writeback"
setting. Especially since our atime default is sane these days (ie if
you don't specify anything, we end up using 'relatime').

If you compile your own kernel, answer "N" to the question

  Default to 'data=ordered' in ext3?

at config time (CONFIG_EXT3_DEFAULTS_TO_ORDERED), or you can make sure
"data=writeback" is in the fstab (but I don't think everything honors
it for the root filesystem).

                                   Linus
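
Since Linus notes that a data= option in fstab may not be honored for the root filesystem, it is worth checking what the kernel actually applied. The fourth field of each line in /proc/mounts holds the effective, comma-separated mount options. Below is a small C sketch (the helper name has_mount_option() is hypothetical) that tests for one exact option; it matches whole tokens, so looking for "atime" does not match "noatime". The kernel may leave out options that equal its built-in default, and ext3/ext4 also log the data mode at mount time (as in the EXT4-fs dmesg lines Jesper posts later in this thread), so dmesg is a useful cross-check.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the comma-separated mount option string @opts (the
 * fourth field of a /proc/mounts line) contains exactly the token @want.
 */
static bool has_mount_option(const char *opts, const char *want)
{
	size_t len = strlen(want);
	const char *p = opts;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t tok = end ? (size_t)(end - p) : strlen(p);

		/* Compare the whole token, not just a prefix. */
		if (tok == len && strncmp(p, want, len) == 0)
			return true;
		if (!end)
			break;
		p = end + 1;
	}
	return false;
}
```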


* Re: 2.6.36 io bring the system to its knees
  2010-11-05  1:43                     ` Dave Chinner
  2010-11-05 12:48                       ` Sanjoy Mahajan
@ 2010-11-07 17:16                       ` Jesper Juhl
  2010-11-09 19:47                         ` Evgeniy Ivanov
  2010-11-09 21:00                       ` Chris Mason
  2 siblings, 1 reply; 65+ messages in thread
From: Jesper Juhl @ 2010-11-07 17:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Linus Torvalds, Andrew Morton, Jens Axboe,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Ted Ts'o, Corrado Zoccolo, Shaohua Li, Sanjoy Mahajan,
	Steven Barrett

On Fri, 5 Nov 2010, Dave Chinner wrote:

> On Fri, Nov 05, 2010 at 12:48:17AM +0100, Jesper Juhl wrote:
> > On Fri, 5 Nov 2010, Jesper Juhl wrote:
> > 
> > > On Thu, 28 Oct 2010, Chris Mason wrote:
> > > 
> > > > On Thu, Oct 28, 2010 at 03:30:36PM +0200, Ingo Molnar wrote:
> > > > > 
> > > > > "Many seconds freezes" and slowdowns won't be fixed via the VFS scalability patches
> > > > > I'm afraid.
> > > > > 
> > > > > This has the appearance of some really bad IO or VM latency problem. Unfixed and 
> > > > > present in stable kernel versions going from years ago all the way to v2.6.36.
> > > > 
> > > > Hmmm, the workload you're describing here has two special parts.  First
> > > > it dramatically overloads the disk, and then it has guis doing things
> > > > waiting for the disk.
> > > > 
> > > 
> > > Just want to chime in with a 'me too'.
> > > 
> > > I see something similar on Arch Linux when doing 'pacman -Syyuv' and there 
> > > are many (as in more than 5-10) updates to apply. While the update is 
> > > running (even if that's all the system is doing) system responsiveness is 
> > > terrible - just starting 'chromium' which is usually instant (at least 
> > > less than 2 sec at worst) can take upwards of 10 seconds and the mouse 
> > > cursor in X starts to jump a bit as well and switching virtual desktops 
> > > noticeably lags when redrawing the new desktop if there's a full screen app
> > > like gimp or OpenOffice open there. This is on a Lenovo Thinkpad R61i
> > > which has an 'Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz' CPU, 2GB of
> > > memory and 499996 kilobytes of swap.
> > > 
> > Forgot to mention the kernel I currently experience this with : 
> > 
> > [jj@dragon ~]$ uname -a
> > Linux dragon 2.6.35-ARCH #1 SMP PREEMPT Sat Oct 30 21:22:26 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
> 
> I think anyone reporting an interactivity problem also needs to
> indicate what their filesystem is, what mount parameters they are
> using, what their storage config is, whether barriers are active or
> not, what elevator they are using, whether one or more of the
> applications are issuing fsync() or sync() calls, and so on.
>
Some details below.

[jj@dragon ~]$ mount
proc on /proc type proc (rw,relatime)
sys on /sys type sysfs (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=10240k,nr_inodes=255749,mode=755)
/dev/disk/by-uuid/61d104a5-4f7b-40ef-a9c8-44ad2765513e on / type ext4 (rw,commit=0)
devpts on /dev/pts type devpts (rw)
shm on /dev/shm type tmpfs (rw,nosuid,nodev)

[root@dragon ~]# hdparm -v /dev/disk/by-uuid/61d104a5-4f7b-40ef-a9c8-44ad2765513e

/dev/disk/by-uuid/61d104a5-4f7b-40ef-a9c8-44ad2765513e:
 multcount     = 16 (on)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 9729/255/63, sectors = 25220160, start = 119644560

[root@dragon ~]# dmesg | grep -i ext4
EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null)
EXT4-fs (sda4): re-mounted. Opts: (null)
EXT4-fs (sda4): re-mounted. Opts: (null)
EXT4-fs (sda4): re-mounted. Opts: commit=0

The elevator in use is CFQ.

The app that's causing the system to behave this way (the 'pacman' package 
manager in Arch Linux) makes a few calls (2-4) to fsync() during its run,
but that's all.


-- 
Jesper Juhl <jj@chaosbits.net>             http://www.chaosbits.net/
Plain text mails only, please      http://www.expita.com/nomime.html
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
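
Dave's checklist asks which elevator is in use. On Linux the per-device choice is exposed in /sys/block/<dev>/queue/scheduler, where the active scheduler is the bracketed entry, for example "noop deadline [cfq]". A minimal C sketch for pulling that out follows; active_elevator() and read_elevator() are hypothetical names, and the device name is an assumption (for Jesper's root on sda4 it would be the whole disk, "sda").

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the active scheduler from a line such as "noop deadline [cfq]".
 * Copies the bracketed name into @out and returns 0, or returns -1 if no
 * bracketed entry exists or it does not fit in @outlen bytes.
 */
static int active_elevator(const char *line, char *out, size_t outlen)
{
	const char *lbracket = strchr(line, '[');
	const char *rbracket = lbracket ? strchr(lbracket, ']') : NULL;
	size_t n;

	if (!lbracket || !rbracket)
		return -1;
	n = (size_t)(rbracket - lbracket - 1);
	if (n + 1 > outlen)
		return -1;
	memcpy(out, lbracket + 1, n);
	out[n] = '\0';
	return 0;
}

/* Read /sys/block/<dev>/queue/scheduler for @dev (e.g. "sda"). */
static int read_elevator(const char *dev, char *out, size_t outlen)
{
	char path[256], line[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return active_elevator(line, out, outlen);
}
```

On a system configured like Jesper's, read_elevator("sda", buf, sizeof(buf)) would be expected to leave "cfq" in buf.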


* Re: 2.6.36 io bring the system to its knees
  2010-11-07 17:16                       ` Jesper Juhl
@ 2010-11-09 19:47                         ` Evgeniy Ivanov
  2010-11-09 20:20                           ` Christoph Hellwig
  0 siblings, 1 reply; 65+ messages in thread
From: Evgeniy Ivanov @ 2010-11-09 19:47 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Dave Chinner, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner, Ted Ts'o, Corrado Zoccolo,
	Shaohua Li, Sanjoy Mahajan, Steven Barrett

I have almost the same problem (the system is less interactive, but no freeze happens).
Here are the tests I use (written by Alexander Nekrasov):
logrotate.sh (hard writer): http://pastebin.com/PPnSvP2f
writetest (small writer): http://pastebin.com/616JvWEK

If you run "writetest 15 realtime", the timings are OK, but if you also
run "logrotate.sh 300 3", you will see the RT processes start thrashing
(timings periodically increase from 50ms to 2000-4000ms).
I ran the tests on 2.6.31, but the same happens on 2.6.36. CFQ with default
settings is used. I've played with page-writeback.c and noticed that
writeback still works for RT processes (no write-through/disk wait). I
even tried increasing dirty_ratio for RT processes. I've also limited
the memory consumed by dd (logrotate.sh), since I had a situation where it
consumed too much and the kernel started to reclaim pages.

The problem still shows up on ext3 (compiled and mounted as Linus
suggested in this thread), but things work fine on ext4 with
"data=writeback" and on XFS. I'm not sure whether this means the problem is
in ext3 and in journaling (in the case of ext4 without data=writeback).
I'm not sure if "data=writeback" (makes ext4 journaling similar to
XFS) really fixes the problem; it probably increases FS bandwidth, so
we just don't see the problem, but it may still be present.


-- 
Evgeniy Ivanov
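
The writetest and logrotate.sh sources are only linked above, so their exact behavior is not reproduced here. As a hypothetical illustration of the same kind of measurement, this C sketch appends a small buffer to a file, calls fsync() after each write, and reports the worst per-round latency, similar in spirit to the per-iteration timings Evgeniy reports. It does not switch to a realtime policy (that would need sched_setscheduler() with SCHED_FIFO or SCHED_RR and root privileges); the function name worst_write_fsync_ms() and the 4096-byte buffer limit are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Append @bufsize zero bytes to @path @rounds times, fsync()ing after each
 * write, and return the worst per-round latency in milliseconds, or -1.0
 * on error. @bufsize must not exceed 4096 in this sketch.
 */
static double worst_write_fsync_ms(const char *path, size_t bufsize, int rounds)
{
	char buf[4096] = {0};
	struct timespec t0, t1;
	double worst = 0.0;
	int fd;

	if (bufsize > sizeof(buf))
		return -1.0;
	fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0)
		return -1.0;

	for (int i = 0; i < rounds; i++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (write(fd, buf, bufsize) != (ssize_t)bufsize || fsync(fd) != 0) {
			close(fd);
			return -1.0;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		double ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
			    (t1.tv_nsec - t0.tv_nsec) / 1e6;
		if (ms > worst)
			worst = ms;
	}
	close(fd);
	return worst;
}
```

Comparing the result from an idle run with one taken while a large sequential writer is active gives a crude analogue of the test described above.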


* Re: 2.6.36 io bring the system to its knees
  2010-11-09 19:47                         ` Evgeniy Ivanov
@ 2010-11-09 20:20                           ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2010-11-09 20:20 UTC (permalink / raw)
  To: Evgeniy Ivanov
  Cc: Jesper Juhl, Dave Chinner, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Linus Torvalds,
	Andrew Morton, Jens Axboe, Peter Zijlstra, Nick Piggin,
	Arjan van de Ven, Thomas Gleixner, Ted Ts'o, Corrado Zoccolo,
	Shaohua Li, Sanjoy Mahajan, Steven Barrett

> I'm not sure if "data=writeback" (makes ext4 journaling similar to
> XFS) really fixes the problem

It doesn't.  XFS does not expose stale data after a crash, while ext3/4
data=writeback does.


* Re: 2.6.36 io bring the system to its knees
  2010-11-05  1:43                     ` Dave Chinner
  2010-11-05 12:48                       ` Sanjoy Mahajan
  2010-11-07 17:16                       ` Jesper Juhl
@ 2010-11-09 21:00                       ` Chris Mason
  2 siblings, 0 replies; 65+ messages in thread
From: Chris Mason @ 2010-11-09 21:00 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jesper Juhl, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Linus Torvalds, Andrew Morton, Jens Axboe,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Ted Ts'o, Corrado Zoccolo, Shaohua Li, Sanjoy Mahajan,
	Steven Barrett

Excerpts from Dave Chinner's message of 2010-11-04 21:43:34 -0400:
> On Fri, Nov 05, 2010 at 12:48:17AM +0100, Jesper Juhl wrote:
>
> [ the disks are slow for me too!!!!!!!!!!!!!! ]
>
> > Forgot to mention the kernel I currently experience this with : 
> > 
> > [jj@dragon ~]$ uname -a
> > Linux dragon 2.6.35-ARCH #1 SMP PREEMPT Sat Oct 30 21:22:26 CEST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz GenuineIntel GNU/Linux
> 
> I think anyone reporting an interactivity problem also needs to
> indicate what their filesystem is, what mount parameters they are
> using, what their storage config is, whether barriers are active or
> not, what elevator they are using, whether one or more of the
> applications are issuing fsync() or sync() calls, and so on.
> 
> Basically, what we need to know is whether these problems are
> isolated to a particular filesystem or storage type because
> they may simply be known problems (e.g. the ext3 fsync-the-world
> problem).

latencytop does help quite a lot in nailing down why we're waiting on
the disk, but the interface doesn't lend itself very well to remote
debugging.  We end up asking for screenshots that may or may not really
nail down what is going on.

I've got a patch that adds latencytop -c, which you use like this:

latencytop -c >& out

It spits out latency info for all the procs every 10 seconds or so,
along with a short stack trace that often helps figure things out.

The patch is below and works properly with the current latencytop
git.  If some of the people hitting bad latencies could try it, it might
help narrow things down.

From: Chris Mason <chris.mason@oracle.com>
Subject: [PATCH] Add latencytop -c to dump process information to the console

This adds something similar to vmstat 1 to latencytop, where
it simply does a text dump of all the process latency information
to the console every 10 seconds.  Back traces are included in the
dump.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
---
 src/Makefile     |    2 +-
 src/latencytop.c |   38 +++++++---
 src/latencytop.h |    1 +
 src/text_dump.c  |  199 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 227 insertions(+), 13 deletions(-)
 create mode 100644 src/text_dump.c

diff --git a/src/Makefile b/src/Makefile
index de24551..1ff9740 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -6,7 +6,7 @@ SBINDIR = /usr/sbin
 XCFLAGS = -W  -g `pkg-config --cflags glib-2.0` -D_FORTIFY_SOURCE=2 -Wno-sign-compare
 LDF = -Wl,--as-needed `pkg-config --libs glib-2.0`   -lncursesw 
 
-OBJS= latencytop.o text_display.o translate.o fsync.o
+OBJS= latencytop.o text_display.o text_dump.o translate.o fsync.o
 
 ifdef HAS_GTK_GUI
   XCFLAGS += `pkg-config --cflags gtk+-2.0` -DHAS_GTK_GUI
diff --git a/src/latencytop.c b/src/latencytop.c
index f516f53..fe252d0 100644
--- a/src/latencytop.c
+++ b/src/latencytop.c
@@ -111,6 +111,10 @@ static void fixup_reason(struct latency_line *line, char *c)
 		*(c2++) = 0;
 	} else
 		strncpy(line->reason, c2, 1024);
+
+	c2 = strchr(line->reason, '\n');
+	if (c2)
+		*c2=0;
 }
 
 void parse_global_list(void)
@@ -538,19 +542,13 @@ static void cleanup_sysctl(void)
 int main(int argc, char **argv)
 {
 	int i, use_gtk = 0;
+	int console_dump = 0;
 
 	enable_sysctl();
 	enable_fsync_tracer();
 	atexit(cleanup_sysctl);
 
-#ifdef HAS_GTK_GUI
-	if (preinitialize_gtk_ui(&argc, &argv))
-		use_gtk = 1;
-#endif
-	if (!use_gtk)
-		preinitialize_text_ui(&argc, &argv);
-
-	for (i = 1; i < argc; i++)		
+	for (i = 1; i < argc; i++) {
 		if (strcmp(argv[i],"-d") == 0) {
 			init_translations("latencytop.trans");
 			parse_global_list();
@@ -558,6 +556,17 @@ int main(int argc, char **argv)
 			dump_global_to_console();
 			return EXIT_SUCCESS;
 		}
+		if (strcmp(argv[i],"-c") == 0)
+			console_dump = 1;
+	}
+
+#ifdef HAS_GTK_GUI
+	if (!console_dump && preinitialize_gtk_ui(&argc, &argv))
+		use_gtk = 1;
+#endif
+	if (!console_dump && !use_gtk)
+		preinitialize_text_ui(&argc, &argv);
+
 	for (i = 1; i < argc; i++)
 		if (strcmp(argv[i], "--unknown") == 0) {
 			noui = 1;
@@ -579,12 +588,17 @@ int main(int argc, char **argv)
 		sleep(5);
 		fprintf(stderr, ".");
 	}
+
+	if (console_dump) {
+		start_text_dump();
+	} else {
 #ifdef HAS_GTK_GUI
-	if (use_gtk)
-		start_gtk_ui();
-	else
+		if (use_gtk)
+			start_gtk_ui();
+		else
 #endif
-		start_text_ui();
+			start_text_ui();
+	}
 
 	prune_unused_procs();
 	delete_list();
diff --git a/src/latencytop.h b/src/latencytop.h
index 79775ac..f3e0934 100644
--- a/src/latencytop.h
+++ b/src/latencytop.h
@@ -50,6 +50,7 @@ extern void start_gtk_ui(void);
 
 extern void preinitialize_text_ui(int *argc, char ***argv);
 extern void start_text_ui(void);
+extern void start_text_dump(void);
 
 extern char *translate(char *line);
 extern void init_translations(char *filename);
diff --git a/src/text_dump.c b/src/text_dump.c
new file mode 100644
index 0000000..76fc7b1
--- /dev/null
+++ b/src/text_dump.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright 2008, Intel Corporation
+ *
+ * This file is part of LatencyTOP
+ *
+ * This program file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program in a file named COPYING; if not, write to the
+ * Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor,
+ * Boston, MA 02110-1301 USA
+ *
+ * Authors:
+ * 	Arjan van de Ven <arjan@linux.intel.com>
+ *	Chris Mason <chris.mason@oracle.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <dirent.h>
+#include <time.h>
+#include <wchar.h>
+#include <ctype.h>
+
+#include <glib.h>
+
+#include "latencytop.h"
+
+static GList *cursor_e = NULL;
+static volatile sig_atomic_t done = 0;
+
+static void print_global_list(void)
+{
+	GList *item;
+	struct latency_line *line;
+	int i = 1;
+
+	printf("Globals: Cause Maximum Percentage\n");
+	item = g_list_first(lines);
+	while (item && i < 10) {
+		line = item->data;
+		item = g_list_next(item);
+
+		if (line->max*0.001 < 0.1)
+			continue;
+		printf("%s", line->reason);
+		printf("\t%5.1f msec        %5.1f %%\n",
+				line->max * 0.001,
+				(line->time * 100 +0.0001) / total_time);
+		i++;
+	}
+}
+
+static void print_one_backtrace(char *trace)
+{
+	char *p;
+	int pos;
+	int after;
+	int tabs = 0;
+
+	if (!trace || !trace[0])
+		return;
+	pos = 16;
+	while(*trace && *trace == ' ')
+		trace++;
+
+	if (!trace[0])
+		return;
+
+	while(*trace) {
+		p = strchr(trace, ' ');
+		if (p) {
+			pos += p - trace + 1;
+			*p = '\0';
+		}
+		if (!tabs) {
+			/* we haven't printed anything yet */
+			printf("\t\t");
+			tabs = 1;
+		} else if (pos > 79) {
+			/*
+			 * we have printed something our line is going to be
+			 * long
+			 */
+			printf("\n\t\t");
+			pos = 16 + p - trace + 1;
+		}
+		printf("%s ", trace);
+		if (!p)
+			break;
+
+		trace = p + 1;
+		if (trace && pos > 70) {
+			printf("\n");
+			tabs = 0;
+			pos = 16;
+		}
+	}
+	printf("\n");
+}
+
+static void print_procs(void)
+{
+	struct process *proc;
+	GList *item;
+	double total;
+
+	printf("Process details:\n");
+	item = g_list_first(procs);
+	while (item) {
+		int printit = 0;
+		GList *item2;
+		struct latency_line *line;
+		proc = item->data;
+		item = g_list_next(item);
+
+		total = 0.0;
+
+		item2 = g_list_first(proc->latencies);
+		while (item2) {
+			line = item2->data;
+			item2 = g_list_next(item2);
+			total = total + line->time;
+		}
+		item2 = g_list_first(proc->latencies);
+		while (item2) {
+			char *p;
+			char *backtrace;
+			line = item2->data;
+			item2 = g_list_next(item2);
+			if (line->max*0.001 < 0.1)
+				continue;
+			if (!printit) {
+				printf("Process %s (%i) ", proc->name, proc->pid);
+				printf("Total: %5.1f msec\n", total*0.001);
+				printit = 1;
+			}
+			printf("\t%s", line->reason);
+			printf("\t%5.1f msec        %5.1f %%\n",
+				line->max * 0.001,
+				(line->time * 100 +0.0001) / total
+				);
+			print_one_backtrace(line->backtrace);
+		}
+
+	}
+}
+
+static int done_yet(int time, struct timeval *p1)
+{
+	int seconds;
+	int usecs;
+	struct timeval p2;
+	gettimeofday(&p2, NULL);
+	seconds = p2.tv_sec - p1->tv_sec;
+	usecs = p2.tv_usec - p1->tv_usec;
+
+	usecs += seconds * 1000000;
+	if (usecs > time * 1000000)
+		return 1;
+	return 0;
+}
+
+void signal_func(int foobie)
+{
+	done = 1;
+}
+
+void start_text_dump(void)
+{
+	struct timeval now;
+	struct tm *tm;
+	signal(SIGINT, signal_func);
+	signal(SIGTERM, signal_func);
+
+	while (!done) {
+		gettimeofday(&now, NULL);
+		printf("=============== %s", asctime(localtime(&now.tv_sec)));
+		update_list();
+		print_global_list();
+		print_procs();
+		if (done)
+			break;
+		sleep(10);
+	}
+}
+
-- 
1.6.5.2
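
The start_text_dump() loop in the patch stops when SIGINT or SIGTERM sets a flag. For comparison, here is a minimal, hypothetical C sketch of the same stop-flag pattern using sigaction() and a volatile sig_atomic_t flag, the type the C standard sanctions for a variable written from a signal handler. The names stop_requested, install_stop_handlers() and run_periodic() are illustrative only and are not part of latencytop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Written only from the signal handler, read by the main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void request_stop(int signo)
{
	(void)signo;
	stop_requested = 1;
}

/* Route SIGINT and SIGTERM to request_stop(); returns 0 or -1. */
static int install_stop_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = request_stop;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) != 0 ||
	    sigaction(SIGTERM, &sa, NULL) != 0)
		return -1;
	return 0;
}

/*
 * Call @report every @interval seconds until a stop is requested.
 * sleep() returns early when a handled signal arrives, so Ctrl-C ends
 * the loop without waiting out the full interval.
 */
static void run_periodic(void (*report)(void), unsigned int interval)
{
	while (!stop_requested) {
		report();
		if (stop_requested)
			break;
		sleep(interval);
	}
}
```

Calling install_stop_handlers() and then run_periodic(report_fn, 10) mirrors the patch's ten-second dump interval; report_fn is a placeholder for whatever prints the latency data.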


* Re: 2.6.36 io bring the system to its knees
  2010-11-07 15:50                             ` Linus Torvalds
@ 2010-11-10  1:32                               ` Dave Chinner
  2010-11-10  2:01                                 ` dave b
                                                   ` (4 more replies)
  0 siblings, 5 replies; 65+ messages in thread
From: Dave Chinner @ 2010-11-10  1:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl, Chris Mason,
	Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Andrew Morton, Peter Zijlstra, Nick Piggin, Arjan van de Ven,
	Thomas Gleixner, Ted Ts'o, Corrado Zoccolo, Shaohua Li,
	Steven Barrett

On Sun, Nov 07, 2010 at 07:50:13AM -0800, Linus Torvalds wrote:
> On Sun, Nov 7, 2010 at 4:08 AM, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > As already mentioned, ext3 is just not a good choice for this sort of
> > thing. Did you have atimes enabled?
> 
> At least for ext3, more important than atimes is the "data=writeback"
> setting. Especially since our atime default is sane these days (ie if
> you don't specify anything, we end up using 'relatime').
> 
> If you compile your own kernel, answer "N" to the question
> 
>   Default to 'data=ordered' in ext3?
> 
> at config time (CONFIG_EXT3_DEFAULTS_TO_ORDERED), or you can make sure
> "data=writeback" is in the fstab (but I don't think everything honors
> it for the root filesystem).

Don't forget to mention data=writeback is not the default because if
your system crashes or you lose power running in this mode it will
*CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*. Not to mention
the significant security issues (e.g. stale data exposure) that also
occur even if the filesystem is not corrupted by the crash. IOWs,
data=writeback is the "fast but I'll eat your data" option for ext3.

So I recommend that nobody follows this path because it only leads
to worse trouble down the road.  Your best bet is to migrate away
from ext3 to a filesystem that doesn't have such inherent ordering
problems like ext4 or XFS....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: 2.6.36 io bring the system to its knees
  2010-11-10  1:32                               ` Dave Chinner
@ 2010-11-10  2:01                                 ` dave b
  2010-11-10  8:08                                 ` Evgeniy Ivanov
                                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 65+ messages in thread
From: dave b @ 2010-11-10  2:01 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Jens Axboe, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

OK, so all of us on ext3 who want to avoid these problems should just
up and move to ext4?


* Re: 2.6.36 io bring the system to its knees
  2010-11-10  1:32                               ` Dave Chinner
  2010-11-10  2:01                                 ` dave b
@ 2010-11-10  8:08                                 ` Evgeniy Ivanov
  2010-11-10  8:24                                   ` Dave Chinner
  2010-11-10 14:20                                 ` Pavel Machek
                                                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 65+ messages in thread
From: Evgeniy Ivanov @ 2010-11-10  8:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 4:32 AM, Dave Chinner <david@fromorbit.com> wrote:
> Don't forget to mention data=writeback is not the default because if
> your system crashes or you lose power running in this mode it will
> *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*. Not to mention
> the significant security issues (e.g. stale data exposure) that also
> occur even if the filesystem is not corrupted by the crash. IOWs,
> data=writeback is the "fast but I'll eat your data" option for ext3.
>
> So I recommend that nobody follows this path because it only leads
> to worse trouble down the road.  Your best bet is to migrate away
> from ext3 to a filesystem that doesn't have such inherent ordering
> problems like ext4 or XFS....

Is it safe to use "data=writeback" with ext4? Or are there at least
security issues?
Why do you say that the FS can be corrupted? Metadata is still
journalled, so only data might be corrupted, but the FS should still be
consistent.


-- 
Evgeniy Ivanov


* Re: 2.6.36 io bring the system to its knees
  2010-11-10  8:08                                 ` Evgeniy Ivanov
@ 2010-11-10  8:24                                   ` Dave Chinner
  2010-11-10 14:22                                     ` Pavel Machek
  0 siblings, 1 reply; 65+ messages in thread
From: Dave Chinner @ 2010-11-10  8:24 UTC (permalink / raw)
  To: Evgeniy Ivanov
  Cc: Linus Torvalds, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 11:08:17AM +0300, Evgeniy Ivanov wrote:
> On Wed, Nov 10, 2010 at 4:32 AM, Dave Chinner <david@fromorbit.com> wrote:
> > Don't forget to mention data=writeback is not the default because if
> > your system crashes or you lose power running in this mode it will
> > *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*. Not to mention
> > the significant security issues (e.g. stale data exposure) that also
> > occur even if the filesystem is not corrupted by the crash. IOWs,
> > data=writeback is the "fast but I'll eat your data" option for ext3.
> >
> > So I recommend that nobody follows this path because it only leads
> > to worse trouble down the road.  Your best bet is to migrate away
> > from ext3 to a filesystem that doesn't have such inherent ordering
> > problems like ext4 or XFS....
> 
> Is it safe to use "data=writeback" with ext4?

I believe the same issues exist with data=writeback in ext4, but you
probably should have an ext4 developer answer that question for
certain.

> At least, are there security issues?
> Why do you say that the fs can be corrupted? Metadata is still
> journalled, so only data might be corrupted, but FS should still be
> consistent.

Data corruption is still a filesystem corruption.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10  1:32                               ` Dave Chinner
  2010-11-10  2:01                                 ` dave b
  2010-11-10  8:08                                 ` Evgeniy Ivanov
@ 2010-11-10 14:20                                 ` Pavel Machek
  2010-11-10 14:27                                   ` Ingo Molnar
  2010-11-10 14:33                                 ` Theodore Tso
  2010-11-10 15:59                                 ` Linus Torvalds
  4 siblings, 1 reply; 65+ messages in thread
From: Pavel Machek @ 2010-11-10 14:20 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

Hi!

> > > As already mentioned, ext3 is just not a good choice for this sort of
> > > thing. Did you have atimes enabled?
> > 
> > At least for ext3, more important than atimes is the "data=writeback"
> > setting. Especially since our atime default is sane these days (ie if
> > you don't specify anything, we end up using 'relatime').
> > 
> > If you compile your own kernel, answer "N" to the question
> > 
> >   Default to 'data=ordered' in ext3?
> > 
> > at config time (CONFIG_EXT3_DEFAULTS_TO_ORDERED), or you can make sure
> > "data=writeback" is in the fstab (but I don't think everything honors
> > it for the root filesystem).
> 
> Don't forget to mention data=writeback is not the default because if
> your system crashes or you lose power running in this mode it will
> *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*. Not to mention

You will lose your data, but the filesystem should still be
consistent, right? Metadata are still journaled.

> the significant security issues (e.g stale data exposure) that also
> occur even if the filesystem is not corrupted by the crash. IOWs,

I agree on security issues.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10  8:24                                   ` Dave Chinner
@ 2010-11-10 14:22                                     ` Pavel Machek
  0 siblings, 0 replies; 65+ messages in thread
From: Pavel Machek @ 2010-11-10 14:22 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Evgeniy Ivanov, Linus Torvalds, Jens Axboe, dave b,
	Sanjoy Mahajan, Jesper Juhl, Chris Mason, Ingo Molnar,
	Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Andrew Morton, Peter Zijlstra, Nick Piggin, Arjan van de Ven,
	Thomas Gleixner, Ted Ts'o, Corrado Zoccolo, Shaohua Li,
	Steven Barrett

Hi!

> > At least, are there security issues?
> > Why do you say that the fs can be corrupted? Metadata is still
> > journalled, so only data might be corrupted, but FS should still be
> > consistent.
> 
> Data corruption is still a filesystem corruption.

As far as I understand, apps should not expect anything unless they
use fsync(). And fsync() still works in ext3...
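Pavel's point is that, whatever the data= mode, only data an application has explicitly flushed is guaranteed to survive a crash. A minimal C sketch of that contract, assuming a POSIX system; the helper name write_durably() is hypothetical and not from any existing library:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf to path, then fsync() the
 * file so its contents reach stable storage before reporting success.
 * Returns 0 on success, -1 on error. */
static int write_durably(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        done += (size_t)n;          /* handle short writes */
    }

    /* Until fsync() succeeds the data may live only in the page cache,
     * so a crash can lose it in any data= journaling mode. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

This makes the file's contents durable; per the fsync(2) man page, the directory entry of a newly created file is only guaranteed after an explicit fsync() on the containing directory as well.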

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 14:20                                 ` Pavel Machek
@ 2010-11-10 14:27                                   ` Ingo Molnar
  2010-11-10 14:55                                     ` Christoph Hellwig
  0 siblings, 1 reply; 65+ messages in thread
From: Ingo Molnar @ 2010-11-10 14:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dave Chinner, Linus Torvalds, Jens Axboe, dave b, Sanjoy Mahajan,
	Jesper Juhl, Chris Mason, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett


* Pavel Machek <pavel@ucw.cz> wrote:

> Hi!
> 
> > > > As already mentioned, ext3 is just not a good choice for this sort of
> > > > thing. Did you have atimes enabled?
> > > 
> > > At least for ext3, more important than atimes is the "data=writeback"
> > > setting. Especially since our atime default is sane these days (ie if
> > > you don't specify anything, we end up using 'relatime').
> > > 
> > > If you compile your own kernel, answer "N" to the question
> > > 
> > >   Default to 'data=ordered' in ext3?
> > > 
> > > at config time (CONFIG_EXT3_DEFAULTS_TO_ORDERED), or you can make sure
> > > "data=writeback" is in the fstab (but I don't think everything honors
> > > it for the root filesystem).
> > 
> > Don't forget to mention data=writeback is not the default because if your system 
> > crashes or you lose power running in this mode it will *CORRUPT YOUR FILESYSTEM* 
> > and you *WILL LOSE DATA*. Not to mention
> 
> You will lose your data, but the filesystem should still be consistent, right? 
> Metadata are still journaled.

That is data that was freshly touched around the time the system went down, right?

I.e. data that was probably half-modified by user-space to begin with.

	Ingo


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10  1:32                               ` Dave Chinner
                                                   ` (2 preceding siblings ...)
  2010-11-10 14:20                                 ` Pavel Machek
@ 2010-11-10 14:33                                 ` Theodore Tso
  2010-11-10 14:57                                   ` Christoph Hellwig
  2010-11-10 23:36                                   ` Dave Chinner
  2010-11-10 15:59                                 ` Linus Torvalds
  4 siblings, 2 replies; 65+ messages in thread
From: Theodore Tso @ 2010-11-10 14:33 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Corrado Zoccolo,
	Shaohua Li, Steven Barrett

On Nov 9, 2010, at 8:32 PM, Dave Chinner wrote:

> Don't forget to mention data=writeback is not the default because if
> your system crashes or you lose power running in this mode it will
> *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*. Not to mention
> the significant security issues (e.g stale data exposure) that also
> occur even if the filesystem is not corrupted by the crash. IOWs,
> data=writeback is the "fast but I'll eat your data" option for ext3.

Strictly speaking, this is not true.  Using data=writeback will not cause you to lose any data --- at least, not any more than you would without the feature.  If you have applications that write files in an unsafe way, that data is going to be lost one way or another (e.g., with XFS in a similar situation you'll get a zero-length file).  The difference is that in the case of a system crash, stale data from previously freed blocks may be revealed if you use data=writeback.  This could be a security exposure, especially if you are using your system as a time-sharing system, where you could see the contents of deleted files belonging to another user.

So it is not an "eat your data" situation,  but rather, a "possibly expose old data".   Whether or not you care on a single-user workstation situation, is an individual judgement call.   There's been a lot of controversy about this.

The chance that this occurs using data=writeback in ext4 is much lower, BTW, because with delayed allocation we delay updating the inode until right before we write the block.  I have a plan for changing things so that we write the data blocks *first* and then update the metadata blocks second, which will mean that ext4 data=ordered will go away entirely, and we'll get the safety while also avoiding the forced data page writeouts during journal commits.
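The "unsafe way" of writing files Ted mentions is presumably truncating and rewriting a file in place without fsync(), which can leave a zero-length or partially written file after a crash. A commonly recommended alternative is to write a temporary file, fsync() it, and rename() it over the original. A minimal C sketch, assuming POSIX semantics and that the temporary file lives in the same directory as the target (rename() is only atomic within one filesystem); replace_file() and write_all() are hypothetical names:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace path with buf. tmp_path must be in the
 * same directory (dir_path) so rename() stays on one filesystem.
 * Returns 0 on success, -1 on error. */
static int replace_file(const char *path, const char *tmp_path,
                        const char *dir_path, const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 1. Make the new contents durable before they become visible. */
    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* 2. Swap the name in one step: readers should see either the
     *    complete old file or the complete new one, never a mix. */
    if (rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* 3. fsync the directory so the rename itself is persisted. */
    int dfd = open(dir_path, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

The cost is one extra fsync() per replaced file, which is exactly the kind of forced writeout whose latency the thread is arguing about.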

-- Ted


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 14:27                                   ` Ingo Molnar
@ 2010-11-10 14:55                                     ` Christoph Hellwig
  2010-11-10 19:09                                       ` Pavel Machek
  0 siblings, 1 reply; 65+ messages in thread
From: Christoph Hellwig @ 2010-11-10 14:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Dave Chinner, Linus Torvalds, Jens Axboe, dave b,
	Sanjoy Mahajan, Jesper Juhl, Chris Mason, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Andrew Morton,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Ted Ts'o, Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 03:27:21PM +0100, Ingo Molnar wrote:
> That is data that was freshly touched around the time the system went down, right?
> 
> I.e. data that was probably half-modified by user-space to begin with.

It's data that wasn't synced out yet, yes.  Which isn't the problem per
se.  With ext3/4 in ordered mode, or xfs, or btrfs the file size won't
be incremented until the data is written.  In ext3/4 in writeback mode
(or various non-journaling filesystems), however, the inode size is
updated and the metadata changes are logged even if the data has not
been written yet.  Besides exposing stale
data which is a security risk in multi-user systems it also means the
inode looks modified (by size and timestamps), but contains other data
than actually written.
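The difference Christoph describes can be illustrated with a toy model. The sketch below is hypothetical and heavily simplified (one data block, one size field, a crash after exactly one of the two updates reaches disk); it is not ext3 code. In the ordered case the data block is written before the size update commits, so the crash loses the append; in the writeback case the size commits first, so the file appears to contain whatever stale bytes the block held before:

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical toy model, not ext3 code: one data block plus the
 * inode's committed size field. */
struct disk {
    char block[BLOCK_SIZE];  /* bytes physically in the data block */
    size_t inode_size;       /* file size recorded in the metadata */
};

enum mode { ORDERED, WRITEBACK };

/* Append data to an empty file, then "crash" after only the first of
 * the two disk updates (data block, size) has reached the disk. */
static void append_then_crash(struct disk *d, enum mode m, const char *data)
{
    size_t len = strlen(data);
    if (len > BLOCK_SIZE)
        len = BLOCK_SIZE;

    if (m == ORDERED) {
        /* data=ordered: the data block is flushed before the journal
         * commit that records the new size; the commit is lost. */
        memcpy(d->block, data, len);
    } else {
        /* data=writeback: the size update may commit first; the data
         * block write is lost, so the old contents remain. */
        d->inode_size = len;
    }
}

/* What a reader sees after recovery: the first inode_size bytes. */
static void read_file(const struct disk *d, char out[BLOCK_SIZE + 1])
{
    memcpy(out, d->block, d->inode_size);
    out[d->inode_size] = '\0';
}
```

Starting from a block that still holds a deleted file's bytes (say "old-secret-data"), the ordered run reads back an empty file, while the writeback run reads back the old bytes: the stale-data exposure Ted and Christoph describe.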


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 14:33                                 ` Theodore Tso
@ 2010-11-10 14:57                                   ` Christoph Hellwig
  2010-11-10 15:00                                     ` Chris Mason
  2010-11-10 23:36                                   ` Dave Chinner
  1 sibling, 1 reply; 65+ messages in thread
From: Christoph Hellwig @ 2010-11-10 14:57 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Dave Chinner, Linus Torvalds, Jens Axboe, dave b, Sanjoy Mahajan,
	Jesper Juhl, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Andrew Morton,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 09:33:29AM -0500, Theodore Tso wrote:
> The chance that this occurs using data=writeback in ext4 is much lower, BTW, because with delayed allocation we delay updating the inode until right before we write the block.  I have a plan for changing things so that we write the data blocks *first* and then update the metadata blocks second, which will mean that ext4 data=ordered will go away entirely, and we'll get the safety while also avoiding the forced data page writeouts during journal commits.

That's the scheme used by XFS and btrfs in one form or another.  Chris
also had a patch to implement it for ext3, which unfortunately fell
under the floor.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 14:57                                   ` Christoph Hellwig
@ 2010-11-10 15:00                                     ` Chris Mason
  0 siblings, 0 replies; 65+ messages in thread
From: Chris Mason @ 2010-11-10 15:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Theodore Tso, Dave Chinner, Linus Torvalds, Jens Axboe, dave b,
	Sanjoy Mahajan, Jesper Juhl, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Andrew Morton,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

Excerpts from Christoph Hellwig's message of 2010-11-10 09:57:12 -0500:
> On Wed, Nov 10, 2010 at 09:33:29AM -0500, Theodore Tso wrote:
> > The chance that this occurs using data=writeback in ext4 is much lower, BTW, because with delayed allocation we delay updating the inode until right before we write the block.  I have a plan for changing things so that we write the data blocks *first* and then update the metadata blocks second, which will mean that ext4 data=ordered will go away entirely, and we'll get the safety while also avoiding the forced data page writeouts during journal commits.
> 
> That's the scheme used by XFS and btrfs in one form or another.  Chris
> also had a patch to implement it for ext3, which unfortunately fell
> under the floor.

It probably still applies, but by the time I had it stable I realized
that ext4 was really a better place to fix this stuff.  ext3 is what it
is (good and bad), and a big change like my data=guarded code probably
isn't the best way to help.

-chris


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10  1:32                               ` Dave Chinner
                                                   ` (3 preceding siblings ...)
  2010-11-10 14:33                                 ` Theodore Tso
@ 2010-11-10 15:59                                 ` Linus Torvalds
  2010-11-10 16:46                                   ` Alexey Dobriyan
  2010-11-10 23:43                                   ` Dave Chinner
  4 siblings, 2 replies; 65+ messages in thread
From: Linus Torvalds @ 2010-11-10 15:59 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl, Chris Mason,
	Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Andrew Morton, Peter Zijlstra, Nick Piggin, Arjan van de Ven,
	Thomas Gleixner, Ted Ts'o, Corrado Zoccolo, Shaohua Li,
	Steven Barrett

On Tue, Nov 9, 2010 at 5:32 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> Don't forget to mention data=writeback is not the default because if
> your system crashes or you lose power running in this mode it will
> *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*.

You will lose data even with data=ordered. All the data that didn't
get logged before the crash is lost anyway.

So your argument is kind of dishonest. The thing is, if you have a
crash or power outage or whatever, the only data you can really rely
on is always going to be the data that you fsync'ed before the crash.
Everything else is just gravy.

Are there downsides to "data=writeback"? Absolutely. But anybody who
tries to push those downsides without taking the performance and
latency issues into account is just not thinking straight.

Too many people think that "correct" is somehow black-and-white. It's
not. "The correct answer too late" is not worth anything. Sane people
understand that "good enough" is important.

And quite frankly, "data=writeback" is not wonderful, but it's "good
enough". And it helps enormously with at least one class of serious
performance problems. Dismissing it because it doesn't have quite the
guarantees of "data=ordered" is like saying that you should never use
"pi=3.14" for any calculations because it's not as exact as
"pi=3.14159265". The thing is, for many things, three significant
digits (or even _one_ significant digit) is plenty.

ext3 [f]sync sucks. We know. All filesystems suck. They just tend to
do it in different dimensions.

                         Linus


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 15:59                                 ` Linus Torvalds
@ 2010-11-10 16:46                                   ` Alexey Dobriyan
  2010-11-10 16:55                                     ` Linus Torvalds
  2010-11-10 18:27                                     ` Mike Galbraith
  2010-11-10 23:43                                   ` Dave Chinner
  1 sibling, 2 replies; 65+ messages in thread
From: Alexey Dobriyan @ 2010-11-10 16:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 5:59 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Nov 9, 2010 at 5:32 PM, Dave Chinner <david@fromorbit.com> wrote:
>>
>> Don't forget to mention data=writeback is not the default because if
>> your system crashes or you lose power running in this mode it will
>> *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*.
>
> You will lose data even with data=ordered. All the data that didn't
> get logged before the crash is lost anyway.

Linus, are you using data=writeback?

Those of us, who did (without UPS), will never do it again.

The probability of non-trivial FS corruption becomes much higher.
From my experience, I believe the average number of crashes before
one loses the FS drops to a single-digit number.

With data=ordered, it's quite hard.
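Alexey's estimate can be restated with a simple model. Assume (hypothetically) that each crash independently destroys the filesystem with probability p; the number of crashes N up to and including the first loss is then geometrically distributed:

```latex
\Pr[N = k] = (1-p)^{k-1}\,p, \qquad
\mathbb{E}[N] = \sum_{k \ge 1} k\,(1-p)^{k-1}\,p
             = \frac{p}{\bigl(1-(1-p)\bigr)^{2}} = \frac{1}{p}.
```

A single-digit mean, E[N] < 10, therefore corresponds to p > 0.1, i.e. more than one loss in ten crashes. Independence between crashes is an assumption of the model, not something measured in this thread.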


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 16:46                                   ` Alexey Dobriyan
@ 2010-11-10 16:55                                     ` Linus Torvalds
  2010-11-10 17:10                                       ` Alexey Dobriyan
  2010-11-10 18:27                                     ` Mike Galbraith
  1 sibling, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2010-11-10 16:55 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Dave Chinner, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 8:46 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>
>> You will lose data even with data=ordered. All the data that didn't
>> get logged before the crash is lost anyway.
>
> Linus, are you using data=writeback?

I used to, indeed. But since I upgrade computers fairly regularly, and
all the distros have moved towards ext4, I'm no longer using ext3 at
all.

But yes, to me ext3 was totally unusable with rotational media and
"data=ordered". Not just bad. Total crap. Whenever the mail client
wanted to write something out, the whole machine basically stopped.

Of course, part of that was that long ago I used reiserfs back when
SuSE had it as the default. So I didn't think that the hiccups were
"normal" like a lot of people probably do. I knew better. So it was
"bad latency, and I know it's the filesystem that is total crap".

> Those of us, who did (without UPS), will never do it again.

Before or after the change to make renaming on top of old files do the
IO flushing?

That made a big difference for some rather common cases.

                            Linus


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 16:55                                     ` Linus Torvalds
@ 2010-11-10 17:10                                       ` Alexey Dobriyan
  2010-11-10 18:55                                         ` Mark Lord
  0 siblings, 1 reply; 65+ messages in thread
From: Alexey Dobriyan @ 2010-11-10 17:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Ted Ts'o,
	Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 6:55 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Nov 10, 2010 at 8:46 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
>> Those of us, who did (without UPS), will never do it again.
>
> Before or after the change to make renaming on top of old files do the
> IO flushing?

It was long ago, so before the patch.

> That made a big difference for some rather common cases.

That's good.
Maybe it's now only one order of magnitude more likely to lose the FS, instead of several.
:-)


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 16:46                                   ` Alexey Dobriyan
  2010-11-10 16:55                                     ` Linus Torvalds
@ 2010-11-10 18:27                                     ` Mike Galbraith
  1 sibling, 0 replies; 65+ messages in thread
From: Mike Galbraith @ 2010-11-10 18:27 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Linus Torvalds, Dave Chinner, Jens Axboe, dave b, Sanjoy Mahajan,
	Jesper Juhl, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Andrew Morton,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Ted Ts'o, Corrado Zoccolo, Shaohua Li, Steven Barrett

On Wed, 2010-11-10 at 18:46 +0200, Alexey Dobriyan wrote:
> On Wed, Nov 10, 2010 at 5:59 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Tue, Nov 9, 2010 at 5:32 PM, Dave Chinner <david@fromorbit.com> wrote:
> >>
> >> Don't forget to mention data=writeback is not the default because if
> >> your system crashes or you lose power running in this mode it will
> >> *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*.
> >
> > You will lose data even with data=ordered. All the data that didn't
> > get logged before the crash is lost anyway.
> 
> Linus, are you using data=writeback?
> 
> Those of us, who did (without UPS), will never do it again.

I've been using it for a looong time on my desktop box.  Yeah, you can
be bitten easier than ordered, and I have been, but it's never been
anything major.  The risk for me is worth it, as data=ordered sucked
really bad.

If I didn't need to maintain compatibility with 30+ old kernels for
regression testing, I'd upgrade desktop to ext4, and likely be happy.

> The probability of non-trivial FS corruption becomes much higher.
> From my experience, I believe the average number of crashes before
> one loses the FS drops to a single-digit number.

That's not my experience.  I've yet to have to rebuild my ext3 fs since
upgrading the box to shiny new openSUSE 11.1 (however long ago, and
however many explosions ago, that was ;)

> With data=ordered, it's quite hard.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 17:10                                       ` Alexey Dobriyan
@ 2010-11-10 18:55                                         ` Mark Lord
  0 siblings, 0 replies; 65+ messages in thread
From: Mark Lord @ 2010-11-10 18:55 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Linus Torvalds, Dave Chinner, Jens Axboe, dave b, Sanjoy Mahajan,
	Jesper Juhl, Chris Mason, Ingo Molnar, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Andrew Morton,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Ted Ts'o, Corrado Zoccolo, Shaohua Li, Steven Barrett

On 10-11-10 12:10 PM, Alexey Dobriyan wrote:
> On Wed, Nov 10, 2010 at 6:55 PM, Linus Torvalds
> <torvalds@linux-foundation.org>  wrote:
>> On Wed, Nov 10, 2010 at 8:46 AM, Alexey Dobriyan<adobriyan@gmail.com>  wrote:
>>> Those of us, who did (without UPS), will never do it again.

I've used ext2 and ext3 extensively on all of the boxes here,
ever since each first became available.  I developed Linux IDE,
the first IDE DMA, lots of custom storage drivers, and more recently
worked on libata drivers.  This meant a LOT of sudden and catastrophic
system failures, as the bugs and other kinks were worked on.

Never lost a nibble.  Totally, utterly reliable stuff for everyday use.
*WITH* the write-caches all enabled on all of the drives, too.

Sure, sudden power failures could have a better chance of corrupting data,
but those are really rare, and the few that have happened were again
non-events here.

That's the difference between theory and practice.

Cheers
-ml


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 14:55                                     ` Christoph Hellwig
@ 2010-11-10 19:09                                       ` Pavel Machek
  0 siblings, 0 replies; 65+ messages in thread
From: Pavel Machek @ 2010-11-10 19:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, Dave Chinner, Linus Torvalds, Jens Axboe, dave b,
	Sanjoy Mahajan, Jesper Juhl, Chris Mason, Pekka Enberg,
	Aidar Kultayev, linux-kernel, linux-mm, Andrew Morton,
	Peter Zijlstra, Nick Piggin, Arjan van de Ven, Thomas Gleixner,
	Ted Ts'o, Corrado Zoccolo, Shaohua Li, Steven Barrett

Hi!

> > That is data that was freshly touched around the time the system went down, right?
> > 
> > I.e. data that was probably half-modified by user-space to begin with.
> 
> It's data that wasn't synced out yet, yes.  Which isn't the problem per
> se.  With ext3/4 in ordered mode, or xfs, or btrfs the file size won't
> be incremented until the data is written.  In ext3/4 in writeback mode
> (or various non-journaling filesystems), however, the inode size is
> updated and the metadata changes are logged even if the data has not
> been written yet.  Besides exposing stale
> data which is a security risk in multi-user systems it also means the
> inode looks modified (by size and timestamps), but contains other data
> than actually written.

Well, AFAICT that's traditional Unix behaviour... While it is not
user-friendly, I'd not call it a 'corrupted filesystem'.
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 14:33                                 ` Theodore Tso
  2010-11-10 14:57                                   ` Christoph Hellwig
@ 2010-11-10 23:36                                   ` Dave Chinner
  1 sibling, 0 replies; 65+ messages in thread
From: Dave Chinner @ 2010-11-10 23:36 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Linus Torvalds, Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl,
	Chris Mason, Ingo Molnar, Pekka Enberg, Aidar Kultayev,
	linux-kernel, linux-mm, Andrew Morton, Peter Zijlstra,
	Nick Piggin, Arjan van de Ven, Thomas Gleixner, Corrado Zoccolo,
	Shaohua Li, Steven Barrett

On Wed, Nov 10, 2010 at 09:33:29AM -0500, Theodore Tso wrote:
> 
> On Nov 9, 2010, at 8:32 PM, Dave Chinner wrote:
> 
> > Don't forget to mention data=writeback is not the default because if
> > your system crashes or you lose power running in this mode it will
> > *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*. Not to mention
> > the significant security issues (e.g stale data exposure) that also
> > occur even if the filesystem is not corrupted by the crash. IOWs,
> > data=writeback is the "fast but I'll eat your data" option for ext3.
> 
> This is strictly speaking not true.  Using data=writeback will not
> cause you to lose any data --- at least, not any more than you
> would without the feature.   If you have applications that write
> files in an unsafe way, that data is going to be lost, one way or
> another.  (E.g., with XFS in a similar situation you'll get a
> zero-length file.)  The difference is that in the case of a system
> crash, stale data may be revealed if you use data=writeback.  This
> could be a security exposure, especially if you are using your
> system as a time-sharing system, where users could see the contents
> of deleted files belonging to another user.
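The "unsafe way" of writing files mentioned here typically means truncating and rewriting a file in place without `fsync()`, which on XFS can leave a zero-length file after a crash. The sketch below shows the commonly recommended write-temp / fsync / rename / fsync-directory pattern, in Python for brevity. The function name `replace_file_durably` and the `.tmp` suffix are illustrative choices, not anything from this thread, and fsyncing a directory file descriptor assumes Linux/POSIX (it does not work on Windows).

```python
import os


def replace_file_durably(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data` and flush it to stable storage.

    Sketch of the write-temp / fsync / rename / fsync-directory pattern;
    the ".tmp" suffix is an arbitrary choice for this example.
    """
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # new contents reach the disk ...
    finally:
        os.close(fd)

    os.replace(tmp_path, path)           # ... before the name points at them

    # Flush the directory entry so the rename itself survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash a reader sees either the complete old contents or the complete new contents, never a truncated file, regardless of the journaling mode; the cost is two `fsync()` calls per update.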

In theory, that's all that is _supposed_ to happen. However, my
recent experience is that massive ext3 filesystem corruption occurs
in data=writeback mode when the system crashes and that does not
happen in ordered mode.

Why do you think I posted the patches to change the default back to
ordered mode a few months back? I basically trashed the root ext3
partitions on three test machines (to the point where >5000 files
across /sbin, /bin, /lib and /usr were corrupted or missing and I
had to reinstall from scratch) when I'd forgotten to set the
ordered-is-default config option in the kernel I was testing.  And
that is when the only thing being written to the root filesystems
was log files...

The worst part about this was that I also had ext3 filesystems
corrupted by crashes in such a way that e2fsck didn't detect the
damage, yet they would repeatedly trigger kernel crashes at runtime....

> So it is not an "eat your data" situation,

My experience says otherwise....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: 2.6.36 io bring the system to its knees
  2010-11-10 15:59                                 ` Linus Torvalds
  2010-11-10 16:46                                   ` Alexey Dobriyan
@ 2010-11-10 23:43                                   ` Dave Chinner
  1 sibling, 0 replies; 65+ messages in thread
From: Dave Chinner @ 2010-11-10 23:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, dave b, Sanjoy Mahajan, Jesper Juhl, Chris Mason,
	Ingo Molnar, Pekka Enberg, Aidar Kultayev, linux-kernel, linux-mm,
	Andrew Morton, Peter Zijlstra, Nick Piggin, Arjan van de Ven,
	Thomas Gleixner, Ted Ts'o, Corrado Zoccolo, Shaohua Li,
	Steven Barrett

On Wed, Nov 10, 2010 at 07:59:10AM -0800, Linus Torvalds wrote:
> On Tue, Nov 9, 2010 at 5:32 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > Don't forget to mention data=writeback is not the default because if
> > your system crashes or you lose power running in this mode it will
> > *CORRUPT YOUR FILESYSTEM* and you *WILL LOSE DATA*.
> 
> You will lose data even with data=ordered. All the data that didn't
> get logged before the crash is lost anyway.
> 
> So your argument is kind of dishonest. The thing is, if you have a
> crash or power outage or whatever, the only data you can really rely
> on is always going to be the data that you fsync'ed before the crash.
> Everything else is just gravy.

I crash kernels tens of times every day doing filesystem testing.
With data=ordered I have not seen a corrupted root filesystem as a
result of normal testing and crashing for as long as I can remember.
With data=writeback, I'll have corrupted root ext3 partitions in
under a day. Hardly what I'd call stable or something you'd want
to deploy in production.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

end of thread, other threads:[~2010-11-10 23:44 UTC | newest]

Thread overview: 65+ messages
     [not found] <AANLkTimt7wzR9RwGWbvhiOmot_zzayfCfSh_-v6yvuAP@mail.gmail.com>
     [not found] ` <AANLkTikRKVBzO=ruy=JDmBF28NiUdJmAqb4-1VhK0QBX@mail.gmail.com>
     [not found]   ` <AANLkTinzJ9a+9w7G5X0uZpX2o-L8E6XW98VFKoF1R_-S@mail.gmail.com>
2010-10-28  6:09     ` 2.6.36 io bring the system to its knees Aidar Kultayev
2010-10-28  6:32       ` Pekka Enberg
2010-10-28  9:00         ` Ingo Molnar
2010-10-28  9:34           ` Pekka Enberg
2010-10-28 11:16           ` Pekka Enberg
2010-10-28 11:33             ` Aidar Kultayev
2010-10-28 11:48               ` Pekka Enberg
2010-10-28 12:18                 ` Aidar Kultayev
2010-10-28 13:46                 ` Christoph Hellwig
2010-10-28 13:54                   ` Ingo Molnar
2010-10-28 13:30             ` Ingo Molnar
2010-10-28 13:47               ` Christoph Hellwig
2010-10-28 13:50                 ` Ingo Molnar
2010-10-28 17:01               ` Chris Mason
2010-10-28 17:57                 ` Pekka Enberg
2010-10-29 14:52                   ` Ted Ts'o
2010-10-29 15:33                     ` Aidar Kultayev
2010-10-30  9:14                       ` Ingo Molnar
2010-10-30 13:02                         ` Aidar Kultayev
2010-10-30 19:06                           ` Chris Mason
2010-10-31  2:31                           ` Ted Ts'o
2010-10-31 17:49                             ` Corrado Zoccolo
2010-11-02  3:10                           ` Shaohua Li
2010-11-02 11:47                 ` Sanjoy Mahajan
2010-11-02 13:12                   ` Chris Mason
2010-11-04 16:05                     ` Sanjoy Mahajan
2010-11-04 23:35                       ` Steven Barrett
2010-11-04 23:44                 ` Jesper Juhl
2010-11-04 23:48                   ` Jesper Juhl
2010-11-05  1:43                     ` Dave Chinner
2010-11-05 12:48                       ` Sanjoy Mahajan
2010-11-06 14:10                         ` dave b
2010-11-06 15:12                           ` Dave Chinner
2010-11-07  6:06                             ` dave b
2010-11-07 12:08                           ` Jens Axboe
2010-11-07 15:50                             ` Linus Torvalds
2010-11-10  1:32                               ` Dave Chinner
2010-11-10  2:01                                 ` dave b
2010-11-10  8:08                                 ` Evgeniy Ivanov
2010-11-10  8:24                                   ` Dave Chinner
2010-11-10 14:22                                     ` Pavel Machek
2010-11-10 14:20                                 ` Pavel Machek
2010-11-10 14:27                                   ` Ingo Molnar
2010-11-10 14:55                                     ` Christoph Hellwig
2010-11-10 19:09                                       ` Pavel Machek
2010-11-10 14:33                                 ` Theodore Tso
2010-11-10 14:57                                   ` Christoph Hellwig
2010-11-10 15:00                                     ` Chris Mason
2010-11-10 23:36                                   ` Dave Chinner
2010-11-10 15:59                                 ` Linus Torvalds
2010-11-10 16:46                                   ` Alexey Dobriyan
2010-11-10 16:55                                     ` Linus Torvalds
2010-11-10 17:10                                       ` Alexey Dobriyan
2010-11-10 18:55                                         ` Mark Lord
2010-11-10 18:27                                     ` Mike Galbraith
2010-11-10 23:43                                   ` Dave Chinner
2010-11-06 19:10                         ` Arjan van de Ven
2010-11-07 17:16                       ` Jesper Juhl
2010-11-09 19:47                         ` Evgeniy Ivanov
2010-11-09 20:20                           ` Christoph Hellwig
2010-11-09 21:00                       ` Chris Mason
2010-10-31  1:22       ` Wu Fengguang
2010-10-31  1:51         ` Wu Fengguang
2010-11-01  1:09           ` Dimitrios Apostolou
2010-11-02  1:20             ` Wu Fengguang