[Bug 14830] When other IO is running sync times go to 10 to 20 minutes

linux-ext4.vger.kernel.org archive mirror

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-13 19:15 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830


Alasdair G Kergon <agk@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |agk@redhat.com
          Component|LVM2/DM                     |ext4
         AssignedTo|agk@redhat.com              |fs_ext4@kernel-bugs.osdl.org
            Product|IO/Storage                  |File System




--- Comment #1 from Alasdair G Kergon <agk@redhat.com>  2010-01-13 19:15:12 ---
filesystem or nfs perhaps?

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-18 21:50 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830

Jan Kara <jack@suse.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jack@suse.cz

--- Comment #2 from Jan Kara <jack@suse.cz>  2010-01-18 21:50:38 ---
There are two issues here. The first is that sync takes a long time; this is
a "don't do it when it hurts" kind of thing ;). When you do heavy
writing and call sync, it simply takes a long time to flush all the caches to
disk. If you think the time is inappropriately long, we can have a look at it,
but for that we'd need many more details, such as the amount and nature of the
data written (many small files vs. a few large ones), the time it takes sync to
complete, the speed of the disks for sequential IO...

The second issue is that nfsd blocks as well. Partly this might be because sync
blocks writers (so that it can get its work done in a finite time); partly it
might be a limitation of ext4: all metadata writes go through a journal of
limited size, so once in a while we have to copy data from the journal to its
final location on disk. That usually leads to all writer processes blocking
while they wait for journal space to be freed (which can take longer when the
'sync' process is keeping the disk busy with data writes).

Note in the log file:
This problem prevents production use of systems using this kernel.

evokes a question: Do you have a kernel which behaved better for you? Which
one?


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-18 23:58 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #3 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-18 23:58:09 ---
>This problem prevents production use of systems using this kernel.

>evokes a question: Do you have a kernel which behaved better for you? Which
>one?

Yes.  RHEL5.4 does not show this problem.  It is the production
system that works in this environment.

The response above is disappointing.  Is a sync response time of 20 minutes,
including several task timeouts, to be considered "normal"?

>If you think the time is inappropriately long, we can have a look at it
>but for that we'd need much more details like amount and nature of data written
>(many small files vs a few large ones), time it takes sync to complete, speed
>of disks for sequential IO...

I am sorry to have to tell you that in this environment we do not
deal in exclusively small or large files, we actually have quite a
few of both. When an rsync which transfers about 50GB of files of
various sizes is running, the hung condition is continuous until the rsync
completes.  This is just a pretty typical load.  You could try it
yourself.  No special sizes of files are required.  I think I
mentioned that the ext4 LVM is a RAID 50 3ware 9650SE-8LPML,
with 8 2T drives.  Its throughput for reading and writing is good
when the system is not locked up.


* Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: Chris Lee @ 2010-01-19 17:24 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-ext4



bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14830
>
> --- Comment #3 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-18 23:58:09 ---
>> This problem prevents production use of systems using this kernel.
>
>> evokes a question: Do you have a kernel which behaved better for you? Which
>> one?
>
> Yes.  RHEL5.4 does not show this problem.  It is the production
> system that works in this environment.
>
> The response above is disappointing.  Is sync response of 20 minutes,
> including several task timeouts to be considered "normal?"
>
>> If you think the time is inappropriately long, we can have a look at it
>> but for that we'd need much more details like amount and nature of data written
>> (many small files vs a few large ones), time it takes sync to complete, speed
>> of disks for sequential IO...
>
> I am sorry to have to tell you that in this environment we do not
> deal in exclusively small or large files, we actually have quite a
> few of both. When an rsync which transfers about 50GB of files of
> various sizes is running, the hung condition is continuous until the rsync
> completes.  This is just a pretty typical load.  You could try it
> yourself.  No special sizes of files are required.  I think I
> mentioned that the ext4 LVM is a RAID 50 3ware 9650SE-8LPML,
> with 8 2T drives.  Its throughput for reading and writing is good
> when the system is not locked up.

Is it possible that this is something along the lines of what is
described at this link:
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

If so, a runtime adjustment might help you out.
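
The message does not name the adjustment. The linked article appears to
concern the kernel's dirty-page writeback thresholds (vm.dirty_ratio and
vm.dirty_background_ratio), so the sketch below assumes those are the
tunables meant. It only reads the current values; the helper name
show_dirty_settings and its optional directory argument are hypothetical.

```shell
# Print the writeback thresholds that bound how much dirty page cache
# may accumulate before background or forced writeback starts.
# The directory argument exists only so the helper can be pointed at a
# test fixture; by default it reads the live values from /proc/sys/vm.
show_dirty_settings() {
    vm_dir="${1:-/proc/sys/vm}"
    for name in dirty_background_ratio dirty_ratio; do
        printf 'vm.%s = %s\n' "$name" "$(cat "$vm_dir/$name")"
    done
}
```

Called with no argument, show_dirty_settings prints the live values. A
change would then be made as root with sysctl -w, using a value chosen
for the workload.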

Chris.


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-19 19:38 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830

Theodore Tso <tytso@mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #5 from Theodore Tso <tytso@mit.edu>  2010-01-19 19:38:34 ---
So the Linux write cache mystery is unlikely to solve the problem since that
was talking about backporting some tuning parameters from 2.6.22 to the
RHEL/CentOS 2.6.18 kernel, and here the problem seems to be that the sync is
taking much longer on a 2.6.31 FC 12 kernel. 

So the first thing I notice is the fact that you have the nodelalloc mount
option enabled.   Any particular reason why you did that?   Try removing that;
one of the reasons why ext4 is generally described as being much better than
ext3 with respect to this problem (of the machine becoming unresponsive during
a sync) is because of delayed allocation, and you've turned it off.   So try
removing nodelalloc and see if that makes the performance come back.

Another thing that might be worth testing is to see whether an ext3 filesystem
on a 2.6.31 FC 12 kernel behaves any differently.   This may be something that
is some kind of VM tuning issue between RHEL 5.4 and a modern kernel; I don't
know how many people try running Fedora 12 on a system with large amounts of
memory and an NFS load, and maybe there is some kind of tuning issue that has
been exposed.   So that's a quick experiment that's worth doing just so we can
figure out where we need to concentrate our diagnostic efforts.

Another thing to try is to do some instrumentation using iostat to see what the
system is doing, before, during, and after the sync command.


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-19 21:56 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #6 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-19 21:56:12 ---
>So the first thing I notice is the fact that you have the nodelalloc mount
>option enabled.   Any particular reason why you did that?  

This was required due to an error in 2.6.30.  It is possible that
it is no longer needed. I will check.

It is not really feasible to revert to ext3 other than going back to
RHEL5 which is what has been done.  One of the main purposes of using
FC11 was ext4.  fsck under ext3 is about 8 hours, under ext4 about
9 minutes.  And, ext4 has other well-advertised advantages.

>Another thing to try is to do some instrumentation using iostat to see what the
>system is doing, before, during, and after the sync command.

I have tried this to some extent.  It is not too easy when response
is extremely slow.


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-25  7:38 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #7 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-25 07:38:52 ---
I tried the advice above about nodelalloc using a newer kernel.
It had no observable effect, but did not cause problems as it
had before. So, I left it turned off.  And, I installed kernel
2.6.31.12-174.2.3.fc12.x86_64.

None of this helped.  sync times still at 20 minutes during rsync,
and task (sync or rsync) timeouts in /var/log/messages.

I guess that patience is called for here.


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-25  8:27 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #8 from Theodore Tso <tytso@mit.edu>  2010-01-25 08:27:26 ---
Hmm, can you run "iostat 1 | tee iostat.log &", and while that is running, wait
for 15 seconds or so, so that we can capture what things look like in steady
state, and then type "sync", and note when the sync command was initiated in the
iostat.log file?   It would be useful to see what this looks like on both your
ext3 production server and on the ext4 test server.  Do you know if there are
any other differences between the two systems, in terms of the workload seen by
your production server versus your test server?
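
The capture described above can be sketched as a small script. This is
only an illustration under stated assumptions: the helper names mark_log
and capture_sync are hypothetical, and iostat is assumed to be the one
from the sysstat package (its -t option timestamps each report).

```shell
# Append a timestamped marker line to a log file so the moment sync
# was issued can be located among the iostat samples later.
mark_log() {
    printf '=== %s at %s ===\n' "$1" "$(date '+%Y-%m-%d %H:%M:%S')" >> "$2"
}

# Record iostat once per second, let about 15 s of steady state
# accumulate, then run sync with markers written before and after it.
# -t timestamps every report, so markers can be matched by time even
# if iostat's output reaches the file with some buffering delay.
capture_sync() {
    log="${1:-iostat.log}"
    iostat -t 1 >> "$log" &
    iostat_pid=$!
    sleep 15
    mark_log "sync started" "$log"
    sync
    mark_log "sync finished" "$log"
    kill "$iostat_pid"
}
```

Running capture_sync during the rsync load would leave a single log
that shows device activity before, during, and after the sync.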

Another thing that would be very useful to do is to enable ftrace, and then cd
to /sys/kernel/debug/tracing.  (This assumes that you have debugfs mounted on
/sys/kernel/debug.)   Then "echo 1 > events/jbd2/jbd2_run_stats/enable" and
then in a similar fashion, do "cat trace_pipe | tee /tmp/trace.output", wait
for four or five data samples from your file system of interest, and then issue
the sync command, and let's see what is happening.
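
The tracing steps above, gathered into one sketch. It assumes debugfs is
mounted on /sys/kernel/debug and that it is run as root; the function
names and the optional tracing-directory argument are hypothetical
conveniences, not part of ftrace itself.

```shell
# Turn on the jbd2_run_stats tracepoint so that journal statistics
# are emitted into the trace buffer.
enable_jbd2_run_stats() {
    tracing_dir="${1:-/sys/kernel/debug/tracing}"
    echo 1 > "$tracing_dir/events/jbd2/jbd2_run_stats/enable"
}

# Stream trace events to the terminal and to a file until interrupted.
# Run this in one terminal and issue sync from another.
record_trace() {
    tracing_dir="${1:-/sys/kernel/debug/tracing}"
    output="${2:-/tmp/trace.output}"
    cat "$tracing_dir/trace_pipe" | tee "$output"
}
```

Writing 0 to the same enable file turns the tracepoint off again once
the samples have been collected.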


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-25 11:07 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #9 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-25 11:07:46 ---
Created an attachment (id=24705)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=24705)
Requested iostat.log file

As requested by: Theodore Tso <tytso@mit.edu>


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-25 11:16 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #10 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-25 11:16:16 ---
Created an attachment (id=24706)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=24706)
messages  output with sync running

Additional information for Theodore Tso <tytso@mit.edu>


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: bugzilla-daemon @ 2010-01-25 11:19 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #11 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-25 11:19:12 ---
I have attached the iostat.log and log/messages output.

sync was started after about 4 cycles of iostat.

This was run on the FC12 ext4 system.  No testing can
be done on the production system.

After kill -9 of the sync run it took about 20 minutes before 
it died.

Michael


* Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: Andre Noll @ 2010-01-27 13:06 UTC (permalink / raw)
  To: Michael Godfrey; +Cc: linux-ext4


On 11:19, bugzilla-daemon@bugzilla.kernel.org wrote:
> After kill -9 of the sync run it took about 20 minutes before 
> it died.

I was seeing similar behaviour on one of our servers, and changing
the io scheduler to noop fixed things for me. So it seems to be an
issue with cfq which is somehow triggered by ext4 but not by ext3.

To change the IO scheduler, just execute

	echo noop > /sys/block/sda/queue/scheduler

(replace sda if necessary).
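
Before switching, it can help to see which scheduler is currently
active. A minimal sketch, assuming the usual sysfs format in which the
available schedulers are listed on one line with the active one in
square brackets; the function name active_scheduler is hypothetical.

```shell
# Print the active I/O scheduler from a queue/scheduler file, where the
# kernel lists the available schedulers and brackets the one in use,
# e.g. "noop deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}
```

For example, active_scheduler /sys/block/sda/queue/scheduler. The set
of names offered in that file depends on the kernel, and a value written
with echo lasts only until the next reboot.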

Just my 2 cents
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe


* Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: Andreas Dilger @ 2010-01-27 19:43 UTC (permalink / raw)
  To: Andre Noll; +Cc: Michael Godfrey, linux-ext4

On 2010-01-27, at 06:06, Andre Noll wrote:
> On 11:19, bugzilla-daemon@bugzilla.kernel.org wrote:
>> After kill -9 of the sync run it took about 20 minutes before
>> it died.
>
> I was seeing similar behaviour on one of our servers, and changing
> the io scheduler to noop fixed things for me. So it seems to be an
> issue with cfq which is somehow triggered by ext4 but not by ext3.
>
> To change the IO scheduler, just execute
>
> 	echo noop > /sys/block/sda/queue/scheduler

Andre, could you please also test deadline instead of noop?  In  
general, deadline has nearly the same IO behaviour as noop, but still  
allows simple request merging and is generally a better option than  
noop.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: tytso @ 2010-01-28  7:53 UTC (permalink / raw)
  To: Andre Noll; +Cc: Michael Godfrey, linux-ext4, bugzilla-daemon

On Wed, Jan 27, 2010 at 02:06:25PM +0100, Andre Noll wrote:
> On 11:19, bugzilla-daemon@bugzilla.kernel.org wrote:
> > After kill -9 of the sync run it took about 20 minutes before 
> > it died.
> 
> I was seeing similar behaviour on one of our servers, and changing
> the io scheduler to noop fixed things for me. So it seems to be an
> issue with cfq which is somehow triggered by ext4 but not by ext3.
> 
> To change the IO scheduler, just execute
> 
> 	echo noop > /sys/block/sda/queue/scheduler
> 
> (replace sda if necessary).

Andre or Michael: if switching away from cfq helps, that's
definitely... interesting.  Given that cfq is the default scheduler, I
definitely want to understand what might be going on here.  Are either
of you able to run blktrace so we can get a sense of what is going on
under the cfq and deadline/noop I/O schedulers?

And in both of your cases, were you using a new file system freshly
created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
was converted for use under ext4?

Thanks,

					- Ted


* Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: Andre Noll @ 2010-01-28 10:24 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Michael Godfrey, linux-ext4


On 12:43, Andreas Dilger wrote:
> On 2010-01-27, at 06:06, Andre Noll wrote:
> >On 11:19, bugzilla-daemon@bugzilla.kernel.org wrote:
> >>After kill -9 of the sync run it took about 20 minutes before
> >>it died.
> >
> >I was seeing similar behaviour on one of our servers, and changing
> >the io scheduler to noop fixed things for me. So it seems to be an
> >issue with cfq which is somehow triggered by ext4 but not by ext3.
> >
> >To change the IO scheduler, just execute
> >
> >	echo noop > /sys/block/sda/queue/scheduler
> 
> Andre, could you please also test deadline instead of noop?

Sure. I just switched to deadline and the system still feels responsive
while rsync is running. With cfq a simple "ls" command took ages
to complete. I'll let you know if the system becomes sluggish again
after a while.

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe


* Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
From: Andre Noll @ 2010-01-28 10:25 UTC (permalink / raw)
  To: tytso; +Cc: Michael Godfrey, linux-ext4, bugzilla-daemon


On 02:53, tytso@mit.edu wrote:
> On Wed, Jan 27, 2010 at 02:06:25PM +0100, Andre Noll wrote:
> > On 11:19, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > After kill -9 of the sync run it took about 20 minutes before 
> > > it died.
> > 
> > I was seeing similar behaviour on one of our servers, and changing
> > the io scheduler to noop fixed things for me. So it seems to be an
> > issue with cfq which is somehow triggered by ext4 but not by ext3.
> > 
> > To change the IO scheduler, just execute
> > 
> > 	echo noop > /sys/block/sda/queue/scheduler
> > 
> > (replace sda if necessary).
> 
> Andre or Michael: if switching away from cfq helps, that's
> definitely... interesting.  Given that cfq is the default scheduler, I
> definitely want to understand what might be going on here.  Are either
> of you able to run blktrace so we can get a sense of what is going on
> under the cfq and deadline/noop I/O schedulers?

Yes, I can use that machine freely for testing purposes, including
reboots. It is just our fallback server which creates hardlink-based
snapshots using rsync.

However, I have to recompile the kernel to include debugfs which is
needed by blktrace and I'd like to wait until the currently running
rsync completes before rebooting. Would you like to see the output of

	btrace /dev/mapper/...

or should I use more sophisticated command line options?

> And in both of your cases, were you using a new file system freshly
> created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
> was converted for use under ext4?

The ext4 file system was created from scratch using -O
dir_index,uninit_bg,extent, a block size of 4096 and 32768 bytes
per inode.
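
For anyone trying to reproduce this setup, the options described above
would correspond to an invocation roughly like the one printed by the
sketch below. This is a reconstruction, not the command that was
actually run: the use of -t ext4 is an assumption, the device argument
is a placeholder, and the helper name ext4_mkfs_command is hypothetical.
The helper only prints the command so that nothing gets formatted by
accident.

```shell
# Print (do not run) a mke2fs command matching the described options:
# -O feature list, -b block size in bytes, -i bytes per inode.
ext4_mkfs_command() {
    printf 'mke2fs -t ext4 -O dir_index,uninit_bg,extent -b 4096 -i 32768 %s\n' "$1"
}
```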

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (11 preceding siblings ...)
  2010-01-28  9:18 ` bugzilla-daemon
@ 2010-01-28 11:23 ` bugzilla-daemon
  2010-01-28 16:22 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-01-28 11:23 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #13 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org>  2010-01-28 11:23:48 ---
Reply-To: maan@systemlinux.org

On 02:53, tytso@mit.edu wrote:
> On Wed, Jan 27, 2010 at 02:06:25PM +0100, Andre Noll wrote:
> > On 11:19, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > After kill -9 of the sync run it took about 20 minutes before 
> > > it died.
> > 
> > I was seeing similar behaviour on one of our servers, and changing
> > the io scheduler to noop fixed things for me. So it seems to be an
> > issue with cfq which is somehow triggered by ext4 but not by ext3.
> > 
> > To change the IO scheduler, just execute
> > 
> > 	echo noop > /sys/block/sda/queue/scheduler
> > 
> > (replace sda if necessary).
> 
> Andre or Michael.  If switching away from cfq helps, that's
> definitely... interesting.  Given that cfq is the default scheduler, I
> definitely want to understand what might be going on here.  Are either
> if you able to run blktrace so we can get a sense of what is going on
> under the cfq and deadline/noop I/O schedulers?

Yes, I can use that machine freely for testing purposes, including
reboots. It is just our fallback server which creates hardlink-based
snapshots using rsync.

However, I have to recompile the kernel to include debugfs which is
needed by blktrace and I'd like to wait until the currently running
rsync completes before rebooting. Would you like to see the output of

    btrace /dev/mapper/...

or should I use more sophisticated command line options?

> And in both of your cases, were you using a new file system freshly
> created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
> was converted for use under ext4?

The ext4 file system was created from scratch using -O
dir_index,uninit_bg,extent, a block size of 4096 and 32768 bytes
per inode.

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (12 preceding siblings ...)
  2010-01-28 11:23 ` bugzilla-daemon
@ 2010-01-28 16:22 ` bugzilla-daemon
  2010-01-29  0:52 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-01-28 16:22 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #14 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-28 16:21:57 ---
On 01/27/2010 11:53 PM, tytso@mit.edu wrote:
> And in both of your cases, were you using a new file system freshly
> created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
> was converted for use under ext4?
>
> Thanks,
>
> 					- Ted
>    
In my case it is new ext4.  I will try what tests I can in a day or
two.  The systems are busy right now.

Michael

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (13 preceding siblings ...)
  2010-01-28 16:22 ` bugzilla-daemon
@ 2010-01-29  0:52 ` bugzilla-daemon
  2010-01-29  6:37 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-01-29  0:52 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #15 from Theodore Tso <tytso@mit.edu>  2010-01-29 00:52:01 ---
> In my case it is new ext4.  I will try what tests I can in a day or
> two.  The systems are busy right now.

Well, I'm flying back from New Zealand so I'll be off the grid mostly
until Sunday or Monday.

Thanks for your efforts!

                        - Ted

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (14 preceding siblings ...)
  2010-01-29  0:52 ` bugzilla-daemon
@ 2010-01-29  6:37 ` bugzilla-daemon
  2010-02-04  4:16 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-01-29  6:37 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830

--- Comment #16 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-01-29 06:37:55 ---
I had a chance to try noop and deadline:

With the amount of testing that I could do they seemed to behave
about the same, and:

1. the task timeouts are gone.

2. sync, which used to take over 20 minutes, is consistently
   about 10 minutes.  I did retest by setting the scheduler back
   to cfq and the sync time went back to 24 minutes.

3. I also tried 2 variations:
   1. rsync from source to target:  rsync machinea:/aaa bbb
   2. rsync of an NFS-mounted filesystem to local filesystem.
      (i.e. the rsync thought it was local:  rsync aaa/  bbb,
      but aaa was NFS-mounted (NFS3)).
   These two variations behaved just about identically.

I have left the system set to deadline, as recommended.  With
this setting (and with noop) things like du seem quicker too.
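
To confirm which scheduler is actually active before and after a test
run, the bracketed entry in the sysfs scheduler file can be read back.
A minimal sketch, assuming the usual /sys/block/<dev>/queue/scheduler
layout; the function name and the optional second argument (an
alternate sysfs root, handy for trying it on a copy of the file) are
illustrative and not taken from this thread:

```shell
# Print the active I/O scheduler for a block device, i.e. the name that
# the kernel shows in [brackets] in the scheduler file.
# $1 = device name (e.g. sda)
# $2 = sysfs block root (hypothetical parameter, defaults to /sys/block)
active_scheduler() {
    dev="$1"
    root="${2:-/sys/block}"
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$root/$dev/queue/scheduler"
}
```

For example, after "echo deadline > /sys/block/sda/queue/scheduler",
running "active_scheduler sda" should print deadline.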

So, this is good, but 10 minutes for sync is still way too long.
With these systems, I cannot compile debug kernels or stuff like
that.  They are in use for backup.

Any other suggestions?

Michael

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (15 preceding siblings ...)
  2010-01-29  6:37 ` bugzilla-daemon
@ 2010-02-04  4:16 ` bugzilla-daemon
  2010-02-15 14:28 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-02-04  4:16 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #17 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-02-04 04:16:40 ---
I have now set deadline for all tests, and I had
a chance to try a system just like the one I have been
using, but with an 8.1T ext4 partition instead of the
11T partition before.

I used the same load:  rsync  /aaa /bbb, where /aaa was
NFS-mounted.
The behavior was quite similar, but the longest sync
time was about 4 minutes.

This suggests that the size of the filesystem is an
important factor.  Increasing the filesystem size by 36%
more than doubled the sync time.
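
The arithmetic behind that statement can be checked quickly. This is
only a rough sketch: it assumes the roughly 10 minute sync time reported
earlier for the 11T system under deadline and the roughly 4 minute
figure above, both of which are approximate. The helper name is
illustrative.

```shell
# Print how much larger $1 is than $2, as a factor and as a percentage.
ratio() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.2fx (+%.0f%%)\n", a / b, (a / b - 1) * 100 }'
}

ratio 11 8.1   # filesystem size in T:        1.36x (+36%)
ratio 10 4     # longest sync time, minutes:  2.50x (+150%)
```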

The size of the database was somewhat larger on the 11T system than
on the 8.1T system, but part of that was unused in the transfer.

Michael

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (16 preceding siblings ...)
  2010-02-04  4:16 ` bugzilla-daemon
@ 2010-02-15 14:28 ` bugzilla-daemon
  2010-02-15 18:29 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-02-15 14:28 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #18 from Jan Kara <jack@suse.cz>  2010-02-15 14:28:38 ---
Created an attachment (id=25050)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=25050)
Fix long sync times during heavy writing

Last week I noticed a bug in the writeback code causing sync to take longer
than necessary. This patch should fix it. Is it possible for you to try it?
Thanks.

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (17 preceding siblings ...)
  2010-02-15 14:28 ` bugzilla-daemon
@ 2010-02-15 18:29 ` bugzilla-daemon
  2010-03-10  6:56 ` bugzilla-daemon
  2010-03-10 15:36 ` bugzilla-daemon
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-02-15 18:29 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #19 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-02-15 18:29:18 ---
On 02/15/2010 06:28 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> Last week I noticed a bug in the writeback code causing sync to take longer
> than necessary. This patch should fix it. Is it possible for you to try it?
> Thanks.
>    
I have been following this.  But right now I am leaving on a trip.  And,
I cannot right now build kernels for the system where we get the
problem.  These systems are in production use for backup.

Sorry that I cannot help more.

Michael

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (18 preceding siblings ...)
  2010-02-15 18:29 ` bugzilla-daemon
@ 2010-03-10  6:56 ` bugzilla-daemon
  2010-03-10 15:36 ` bugzilla-daemon
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-03-10  6:56 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #20 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-03-10 06:56:33 ---
Created an attachment (id=25447)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=25447)
additional data from kernel 2.6.32.9-67.fc12.x86_64

Again, I have had very limited time, but I ran an 
rsync after update to kernel 2.6.32.9-67.fc12.x86_64.

The time for sync to complete was about as before,
(about 10 minutes) and timeouts occurred as shown in the
attached log entries.

I assume that the patches are not included in this
kernel, but I just thought that I should report this test.

Michael

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@http.bugzilla.kernel.org/>
                   ` (19 preceding siblings ...)
  2010-03-10  6:56 ` bugzilla-daemon
@ 2010-03-10 15:36 ` bugzilla-daemon
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-03-10 15:36 UTC (permalink / raw)
  To: linux-ext4

http://bugzilla.kernel.org/show_bug.cgi?id=14830


Eric Sandeen <sandeen@redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandeen@redhat.com




--- Comment #21 from Eric Sandeen <sandeen@redhat.com>  2010-03-10 15:36:20 ---
Michael, you're right: the patch is not in f12 (not in upstream yet, either).

-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
@ 2010-04-02  5:41 ` bugzilla-daemon
  2010-04-02 21:23 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-02  5:41 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #22 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-04-02 05:41:26 ---
I tried another test with 2.6.32.10-90.fc12.x86_64.  I did
not expect an improvement.  But, the results were actually
a lot worse.  After starting an rsync which transferred a
few hundred GB through NFS, I started a sync using time sync.
This caused a number of the usual 2 minute timeout messages.  But, also
it did not close until about 20 minutes after the rsync had
completed.  Altogether it ran for several hours. By the
way, it was not possible to kill the sync using kill -9.

This is clearly hopeless.

Will anything be done about this in 2.6.33 for fc13?

Will the fact that Google is going with ext4 possibly help?

Michael

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
  2010-04-02  5:41 ` [Bug 14830] When other IO is running sync times go to 10 to 20 minutes bugzilla-daemon
@ 2010-04-02 21:23 ` bugzilla-daemon
  2010-04-06 16:47 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-02 21:23 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830

--- Comment #23 from Eric Sandeen <sandeen@redhat.com>  2010-04-02 16:41:39 ---
The patch in comment #18 is still not upstream.  Jan, what's the status of
that?

Michael, I did discover one issue upstream related to fsync, see:

http://marc.info/?l=linux-ext4&m=126987658403214&w=2

This was very inefficient scanning of large files for sync.  However, for
sys_sync, I didn't see the problem because the loop was limited in that case,
so it may not be related.

(In reply to comment #3)
> >This problem prevents production use of systems using this kernel.
> 
> >evokes a question: Do you have a kernel which behaved better for you? Which
> >one?
> 
> Yes.  RHEL5.4 does not show this problem.  It is the production
> system that works in this environment.

RHEL5.4 on ext3 or ext4?

> The response above is disappointing.  Is sync response of 20 minutes,
> including several task timeouts to be considered "normal?"

Probably not, but it really depends.  If you have a system with massive amounts
of memory, and a slow path to the disk, then sure, if you have to flush many
many gigabytes it will be slow.  But that's extreme, and I don't think you're
in that case.  You do have a 12G box though, so that's potentially a lot of
memory to flush.  OTOH your storage should probably be reasonably fast.

It does seem like something else is going on here.

(In reply to comment #22)
> I tried another test with 2.6.32.10-90.fc12.x86_64.  I did
> not expect an improvement.  But, the results were actually
> a lot worse.  After starting an rsync which transferred a
> few hundred GB through NFS, I started a sync using time sync.
> This caused a number of the usual 2 minute timeout messages.  But, also
> it did not close until about 20 minutes after the rsync had
> completed.  Altogether it ran for several hours. By the
> way, it was not possible to kill the sync using kill -9.
> 
> This is clearly hopeless.

Hm, don't give up quite yet ;)

Can you describe this test a little more explicitly; which box was the nfs
server vs. client, which boxes were the rsync servers/clients, which box ran
sync?  I just don't want to make wrong assumptions in trying to recreate this.

> Will anything be done about this in 2.6.33 for fc13?

We still have to get to the bottom of the problem before we can talk about
fixes, I'm afraid.

> Will the fact that Google is going with ext4 possibly help?

I don't think so.

One thing that may be interesting is to run blktrace (or use seekwatcher to do
that for you) during the sync call that is stalling out, to get an idea of what
is happening at the block layer and when.

--- Comment #24 from Eric Sandeen <sandeen@redhat.com>  2010-04-02 21:23:21 ---
For what it's worth, assuming I have replicated the behavior properly, the
long-running sync doesn't seem unique to ext4 at all.

I can replicate it by running a script which creates 4G files in sequence,
putting it in the background, sleeping for a while, and typing "sync" - which
never returns.
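
A reproduction along those lines could be sketched as follows. This is
not the script actually used; the function name, its parameters and the
file naming are assumptions made for illustration, and bs=1M assumes a
dd that accepts that size suffix (GNU dd does).

```shell
# Write COUNT files of SIZE_MB megabytes each into DIR, one after another.
# $1 = target directory, $2 = number of files, $3 = size of each file in MB
# (all names here are illustrative, not from the original test script)
write_files() {
    dir="$1"; count="$2"; size_mb="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if=/dev/zero of="$dir/file.$i" bs=1M count="$size_mb" 2>/dev/null
        i=$((i + 1))
    done
}
```

To mimic the test, start the writer in the background on the
filesystem under test with 4096 MB files, wait a while, then time a
sync (the mount point, file count and sleep duration here are
placeholders):

    write_files /mnt/test 100 4096 &
    sleep 60
    time sync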

I see the same behavior on ext4 as well as xfs and ext3.

I applied Jan's patch from comment #18, and the behavior is unchanged.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
  2010-04-02  5:41 ` [Bug 14830] When other IO is running sync times go to 10 to 20 minutes bugzilla-daemon
  2010-04-02 21:23 ` bugzilla-daemon
@ 2010-04-06 16:47 ` bugzilla-daemon
  2010-04-06 16:49 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-06 16:47 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #25 from Jan Kara <jack@suse.cz>  2010-04-06 16:46:52 ---
As I'm reading through wb_writeback, it could happen that the flushing thread
gets stuck in background flushing, so we never get to processing the work
for sync(1) and sync(1) never finishes. The attached patch should fix
that.
Eric, could you please test whether this patch together with the patch from
comment 18 fixes your testcase?

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (2 preceding siblings ...)
  2010-04-06 16:47 ` bugzilla-daemon
@ 2010-04-06 16:49 ` bugzilla-daemon
  2010-04-06 16:50 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-06 16:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #26 from Jan Kara <jack@suse.cz>  2010-04-06 16:47:54 ---
Created an attachment (id=25882)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=25882)
Patch to stop background writeback when other IO is in progress

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (3 preceding siblings ...)
  2010-04-06 16:49 ` bugzilla-daemon
@ 2010-04-06 16:50 ` bugzilla-daemon
  2010-04-06 16:52 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-06 16:50 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Jan Kara <jack@suse.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #25882|Patch to stop background    |Patch to stop background
        description|writeback when other IO is  |writeback when other work
                   |in progress                 |is queued for the thread





-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (4 preceding siblings ...)
  2010-04-06 16:50 ` bugzilla-daemon
@ 2010-04-06 16:52 ` bugzilla-daemon
  2010-04-06 16:55 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-06 16:52 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #27 from Eric Sandeen <sandeen@redhat.com>  2010-04-06 16:52:28 ---
Jan, will do after lunch.  Thanks for looking into this!

-Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (5 preceding siblings ...)
  2010-04-06 16:52 ` bugzilla-daemon
@ 2010-04-06 16:55 ` bugzilla-daemon
  2010-04-06 23:01 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-06 16:55 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Jan Kara <jack@suse.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #25882|0                           |1
        is obsolete|                            |




--- Comment #28 from Jan Kara <jack@suse.cz>  2010-04-06 16:55:10 ---
Created an attachment (id=25883)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=25883)
Patch to stop background writeback when other work is queued for the thread

Oops, attached a wrong patch. This is the right one.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (6 preceding siblings ...)
  2010-04-06 16:55 ` bugzilla-daemon
@ 2010-04-06 23:01 ` bugzilla-daemon
  2010-04-06 23:06 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-06 23:01 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #29 from Eric Sandeen <sandeen@redhat.com>  2010-04-06 22:59:27 ---
A quick test on ext3 looks good; ext4 still seems to run away on sync :(

But I need to be a bit more methodical, and test a few more filesystems; will
let you know.

I'm not familiar enough with all the new writeback code; does this mean that a
sync will return as soon as any new IO is queued post-sync?  That seems odd if
so - but maybe I misunderstand.

Thanks,
-Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (7 preceding siblings ...)
  2010-04-06 23:01 ` bugzilla-daemon
@ 2010-04-06 23:06 ` bugzilla-daemon
  2010-04-07  9:33 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-06 23:06 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #30 from Eric Sandeen <sandeen@redhat.com>  2010-04-06 23:05:57 ---
Hm maybe I spoke too soon, had a couple runs on ext3 that looked good but now
it's been syncing for many minutes ...

(this is the test where I create 4G files in a loop, let it go for a while,
then time sync - on a 16g box)

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (8 preceding siblings ...)
  2010-04-06 23:06 ` bugzilla-daemon
@ 2010-04-07  9:33 ` bugzilla-daemon
  2010-04-07 15:10 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-07  9:33 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #31 from Jan Kara <jack@suse.cz>  2010-04-07 09:33:00 ---
(In reply to comment #29)
> I'm not familiar enough with all the new writeback code; does this mean that a
> sync will return as soon as any new IO is queued post-sync?  That seems odd if
> so - but maybe I misunderstand.
  No. The patch means that the writeback thread stops doing pdflush-style
writeback when it sees new work queued - work does not mean IO. It means that
someone asks the writeback thread to do some kind of writeout... Now I'm still
not sure my patch is the right approach to the problem but I just wanted to
check whether it at least solves the problem of this particular workload.

  If you still see the problem (only less often) even with ext3, then we
probably have also some other work that is livelockable and thus we never get
to work submitted by sync(1). I guess I'll have to find a machine with enough
disks and memory to try this out...

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (9 preceding siblings ...)
  2010-04-07  9:33 ` bugzilla-daemon
@ 2010-04-07 15:10 ` bugzilla-daemon
  2010-04-15  8:45 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-07 15:10 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #32 from Eric Sandeen <sandeen@redhat.com>  2010-04-07 15:09:56 ---
I'm doing this on a machine with a single spare disk, dedicated to the test, so
disks aren't an issue.   I haven't tried to replicate with less memory.

I'll try to poke at it a bit more today...

-Eric

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (10 preceding siblings ...)
  2010-04-07 15:10 ` bugzilla-daemon
@ 2010-04-15  8:45 ` bugzilla-daemon
  2010-04-20  2:49 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-15  8:45 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Neil Broomfield <neil.broomfield@caasaudio.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |neil.broomfield@caasaudio.c
                   |                            |om




--- Comment #33 from Neil Broomfield <neil.broomfield@caasaudio.com>  2010-04-15 08:44:13 ---
I think I might be seeing the same or a similar issue; see "dmesg | tail"
below. It seems fairly repeatable and usually occurs when I move a file from my
download drive to my RAID 5 array (5 disks, 6TB). I'm running Ubuntu 9.10 with
all the latest official patches.

[301080.930044] INFO: task kjournald2:1583 blocked for more than 120 seconds.
[301080.930048] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[301080.930051] kjournald2    D ffff88007734e8e0     0  1583      2 0x00000000
[301080.930056]  ffff88006c38dd10 0000000000000046 ffff88005695d850 0000000000015880
[301080.930060]  ffff880070d247c0 0000000000015880 0000000000015880 0000000000015880
[301080.930063]  0000000000015880 ffff880070d247c0 0000000000015880 0000000000015880
[301080.930066] Call Trace:
[301080.930075]  [<ffffffff811f096a>] jbd2_journal_commit_transaction+0x1aa/0x1120
[301080.930080]  [<ffffffff8127c336>] ? rb_erase+0xd6/0x160
[301080.930085]  [<ffffffff81010785>] ? __switch_to+0x1e5/0x370
[301080.930088]  [<ffffffff8104f075>] ? finish_task_switch+0x65/0x120
[301080.930093]  [<ffffffff8152c98a>] ? _spin_lock_irqsave+0x2a/0x40
[301080.930097]  [<ffffffff8106af57>] ? lock_timer_base+0x37/0x70
[301080.930101]  [<ffffffff81078a30>] ? autoremove_wake_function+0x0/0x40
[301080.930105]  [<ffffffff811f6963>] kjournald2+0x103/0x270
[301080.930108]  [<ffffffff81078a30>] ? autoremove_wake_function+0x0/0x40
[301080.930111]  [<ffffffff811f6860>] ? kjournald2+0x0/0x270
[301080.930114]  [<ffffffff81078646>] kthread+0xa6/0xb0
[301080.930117]  [<ffffffff8101316a>] child_rip+0xa/0x20
[301080.930120]  [<ffffffff810785a0>] ? kthread+0x0/0xb0
[301080.930123]  [<ffffffff81013160>] ? child_rip+0x0/0x20
[301080.930145] INFO: task mv:2005 blocked for more than 120 seconds.
[301080.930146] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[301080.930148] mv            D ffff880079705928     0  2005   2464 0x00000000
[301080.930152]  ffff8800210f9868 0000000000000086 ffff8800210f97e8 0000000000015880
[301080.930155]  ffff880003b747c0 0000000000015880 0000000000015880 0000000000015880
[301080.930159]  0000000000015880 ffff880003b747c0 0000000000015880 0000000000015880
[301080.930162] Call Trace:
[301080.930165]  [<ffffffff8152c87d>] __down_read+0x8d/0xc6
[301080.930169]  [<ffffffff8152bc49>] down_read+0x19/0x20
[301080.930172]  [<ffffffff811b5fe2>] ext4_get_blocks+0x52/0x210
[301080.930175]  [<ffffffff811b62e7>] ext4_da_get_block_prep+0x77/0x100
[301080.930179]  [<ffffffff81148663>] __block_prepare_write+0x1c3/0x560
[301080.930182]  [<ffffffff811b6270>] ? ext4_da_get_block_prep+0x0/0x100
[301080.930185]  [<ffffffff81148b9f>] block_write_begin+0x5f/0x100
[301080.930188]  [<ffffffff811b8a0d>] ext4_da_write_begin+0x12d/0x260
[301080.930191]  [<ffffffff811b6270>] ? ext4_da_get_block_prep+0x0/0x100
[301080.930194]  [<ffffffff8104f075>] ? finish_task_switch+0x65/0x120
[301080.930198]  [<ffffffff810da1a2>] generic_perform_write+0xb2/0x1d0
[301080.930202]  [<ffffffff811e4dbb>] ? ext4_xattr_get+0x5b/0x90
[301080.930206]  [<ffffffff810dafd3>] generic_file_buffered_write+0x83/0x140
[301080.930209]  [<ffffffff810dc950>] __generic_file_aio_write_nolock+0x240/0x470
[301080.930213]  [<ffffffff811348f3>] ? touch_atime+0x33/0x150
[301080.930216]  [<ffffffff810dcca0>] generic_file_aio_write+0x70/0xf0
[301080.930221]  [<ffffffff811ae7f9>] ext4_file_write+0x49/0x160
[301080.930225]  [<ffffffff8111f342>] do_sync_write+0xf2/0x130
[301080.930229]  [<ffffffff811511db>] ? fsnotify+0xfb/0x140
[301080.930232]  [<ffffffff81078a30>] ? autoremove_wake_function+0x0/0x40
[301080.930235]  [<ffffffff81133333>] ? dput+0xc3/0x190
[301080.930239]  [<ffffffff81224a11>] ? security_file_permission+0x11/0x20
[301080.930242]  [<ffffffff8111f628>] vfs_write+0xb8/0x1a0
[301080.930245]  [<ffffffff811200dc>] sys_write+0x4c/0x80
[301080.930249]  [<ffffffff81012082>] system_call_fastpath+0x16/0x1b

Let me know if I can be of any further assistance. If you need more info or
test runs, I'm happy to help.

Neil


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (11 preceding siblings ...)
  2010-04-15  8:45 ` bugzilla-daemon
@ 2010-04-20  2:49 ` bugzilla-daemon
  2010-04-20  3:46 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-20  2:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Dave Chinner <david@fromorbit.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david@fromorbit.com




--- Comment #34 from Dave Chinner <david@fromorbit.com>  2010-04-20 02:49:34 ---
See:

http://marc.info/?l=linux-fsdevel&m=127166071530948&w=2

Sync is acting as designed right now. I agree it's not ideal, but it's now
defaulting to slow-but-safe behaviour rather than the previous behaviour of
potentially not syncing everything that was dirty at the time of the sync call.

Cheers,

Dave.
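
The trade-off described in this comment can be illustrated with a toy model.
This is only a sketch under stated assumptions: the page counts, the rates and
the function names `sync_snapshot` and `sync_until_clean` are hypothetical and
do not correspond to any kernel code. It contrasts writing back only what was
dirty when sync was called with chasing every dirty page while another writer
keeps dirtying more.

```python
def sync_snapshot(dirty_at_call, write_rate):
    """Write back only the pages that were dirty when sync was called.

    Returns the number of ticks needed. Pages dirtied afterwards are
    ignored, so the call always terminates.
    """
    ticks = 0
    remaining = dirty_at_call
    while remaining > 0:
        remaining -= min(write_rate, remaining)
        ticks += 1
    return ticks


def sync_until_clean(dirty_at_call, write_rate, dirty_rate, max_ticks=10_000):
    """Keep writing until no dirty pages are left at all.

    Returns the number of ticks needed, or None if the call has not
    finished within max_ticks (the concurrent writer dirties pages at
    least as fast as they are written back).
    """
    ticks = 0
    remaining = dirty_at_call
    while remaining > 0:
        if ticks >= max_ticks:
            return None
        remaining += dirty_rate  # a concurrent writer dirties more pages
        remaining -= min(write_rate, remaining)
        ticks += 1
    return ticks
```

In this model, 1000 dirty pages at 100 pages per tick take 10 ticks with the
snapshot approach, 100 ticks when a writer adds 90 pages per tick, and never
finish when the writer adds 100 pages per tick.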


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (12 preceding siblings ...)
  2010-04-20  2:49 ` bugzilla-daemon
@ 2010-04-20  3:46 ` bugzilla-daemon
  2010-04-29  9:31 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-20  3:46 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830

--- Comment #35 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-04-20 03:46:28 ---
Dave (above) said:

"Sync is acting as designed right now. I agree it's not ideal, but it's now
defaulting to slow-but-safe behaviour rather than the previous behaviour of
potentially not syncing everything that was dirty at the time of the sync call.

Dave,"

Are you aware that this blocks other IO so that a user who requests
a read of some data may have to wait for something like 20 minutes
before getting a response?  This includes, for instance, just typing
vi xxx.  Take a look at the reports above which show nfsd being effectively
blocked for periods of more than 20 minutes.

For me this is not just "not ideal" but simply useless.  I do not
see how a system with this behavior can be used.

I also do not see why sync completing with dirty data is a problem.
In an active system there will be new dirty data within milliseconds
of sync completion no matter what it does.

I am well aware that this is not a simple problem.  But a solution
that is consistent with the usability of the system is necessary.

Michael


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (13 preceding siblings ...)
  2010-04-20  3:46 ` bugzilla-daemon
@ 2010-04-29  9:31 ` bugzilla-daemon
  2010-04-30  6:02 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-29  9:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Ritesh Raj Sarraf <rrs@researchut.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rrs@researchut.com




--- Comment #36 from Dave Chinner <david@fromorbit.com>  2010-04-20 05:24:39 ---
(In reply to comment #35)

> Are you aware that this blocks other IO so that a user who requests
> a read of some data may have to wait for something like 20 minutes
> before getting a response? This includes, for instance, just typing
> vi xxx.

That's not just a read - vi(m) writes a backup file when you open
it. So it's blocking on writes.

However, that sort of antisocial behaviour under heavy write loads is usually
caused by a filesystem concurrency limitation or an IO scheduler problem, not
sync. Unless you can reproduce the read hangs on XFS when sync is
running, though, I'm not the expert you're looking for to debug them. ;)

But I do know the endless sync problem is filesystem independent and I'm trying
to do something about mitigating its effects:

http://lkml.org/lkml/2010/4/19/410

> Take a look at the reports above which show nfsd being effectively
> blocked for periods of more than 20 minutes.

Can't say I've heard of any such recent problems on XFS....

Cheers,

Dave.

--- Comment #37 from Neil Broomfield <neil.broomfield@caasaudio.com>  2010-04-20 09:00:28 ---
"Can't say I've heard of any such recent problems on XFS...."

Are these not effectively the same issue?

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/276476

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/494476

Neil


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (14 preceding siblings ...)
  2010-04-29  9:31 ` bugzilla-daemon
@ 2010-04-30  6:02 ` bugzilla-daemon
  2010-05-20  2:11 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-04-30  6:02 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830

--- Comment #38 from Michael Godfrey <godfrey@isl.stanford.edu>  2010-04-30 06:02:32 ---
Just a couple of comments on the posts above:

1. In our environment, home directories are NFS mounted,
   so vi cannot read .vimrc.  It appears that this is where
   it hangs.  The important thing is that nothing
   appears on the user's screen after typing vi. 

2. We do not have an XFS filesystem, so I cannot comment on
   whether these problems exist for XFS.

3. The comments indicate that, as usual, there is more than
   one problem.  I am hopeful that the ext4 problem that Eric
   Sandeen is fixing and proposed fixes for sync will make
   a significant difference.


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (15 preceding siblings ...)
  2010-04-30  6:02 ` bugzilla-daemon
@ 2010-05-20  2:11 ` bugzilla-daemon
  2010-05-20 11:11 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-05-20  2:11 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Justin Maggard <jmaggard10@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmaggard10@gmail.com




--- Comment #39 from Justin Maggard <jmaggard10@gmail.com>  2010-05-20 02:11:11 ---
(In reply to comment #28)
> Created an attachment (id=25883)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=25883) [details]
> Patch to stop background writeback when other work is queued for the thread
> 
> Oops, attached a wrong patch. This is the right one.

I ran into this issue recently (extra long sync times) and have been trying the
patches attached in this thread.  The above patch specifically causes a pretty
significant performance regression for me doing a simple sequential dd write on
a dual-core Atom system running 2.6.33.4 x86_64.  This command:
dd bs=1M conv=fsync if=/dev/zero of=test_file count=10000
went from 135 MB/sec all the way down to 92.9 MB/sec.  Is this expected?
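
For anyone who wants to repeat this kind of measurement without dd, here is a
minimal sketch of a sequential write followed by fsync. It is not the test
that was run above: the helper names `timed_write` and `relative_drop`, the
default sizes and the file path are assumptions chosen for illustration, and
throughput is reported with 1 MB = 10**6 bytes.

```python
import os
import time


def timed_write(path, block_size=1024 * 1024, count=16):
    """Write `count` zero-filled blocks, fsync the file, return MB/s."""
    block = bytes(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = time.perf_counter() - start
    return block_size * count / elapsed / 1e6


def relative_drop(before, after):
    """Fraction of throughput lost when going from `before` to `after`."""
    return (before - after) / before
```

With the figures quoted above, `relative_drop(135, 92.9)` is about 0.31, so
the patch costs roughly 31% of sequential write throughput in this test.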


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (16 preceding siblings ...)
  2010-05-20  2:11 ` bugzilla-daemon
@ 2010-05-20 11:11 ` bugzilla-daemon
  2010-05-20 17:53 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-05-20 11:11 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Jan Kara <jack@suse.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #25883|0                           |1
        is obsolete|                            |




--- Comment #40 from Jan Kara <jack@suse.cz>  2010-05-20 11:11:41 ---
Created an attachment (id=26456)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=26456)
Patch to stop background writeback when other work is queued for the thread

Ah, there's a bug in the condition in the original patch and thus it stops
background writeback when there is *no* other work queued instead of when
there *is*. Could you please test this fixed version?
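
The effect of the negated check can be sketched with a toy loop. This is not
the patch itself: `run_background_writeback`, its parameters and the list
used as a work queue are hypothetical stand-ins, and the `inverted` flag only
exists to show both variants side by side.

```python
def run_background_writeback(dirty_pages, work_queue, chunk=4, inverted=False):
    """Toy background writeback loop.

    Writes up to `chunk` pages per iteration and is meant to stop early
    when other work is waiting in `work_queue`. With inverted=True the
    check is negated, mimicking the bug described above. Returns the
    number of pages written.
    """
    written = 0
    while dirty_pages > 0:
        other_work_queued = len(work_queue) > 0
        if inverted:
            should_stop = not other_work_queued  # buggy: stops when idle
        else:
            should_stop = other_work_queued      # intended: yield to queued work
        if should_stop:
            break
        step = min(chunk, dirty_pages)
        dirty_pages -= step
        written += step
    return written
```

With an empty queue the intended variant writes all pages, while the inverted
one writes none, which is consistent with the throughput drop reported for a
plain sequential write with nothing else queued.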


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (17 preceding siblings ...)
  2010-05-20 11:11 ` bugzilla-daemon
@ 2010-05-20 17:53 ` bugzilla-daemon
  2012-06-18 13:19 ` bugzilla-daemon
  2012-06-18 13:19 ` bugzilla-daemon
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2010-05-20 17:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #41 from Justin Maggard <jmaggard10@gmail.com>  2010-05-20 17:53:08 ---
Ah, much better. :)  Performance for me with this version is now just about the
same as without the patch.


* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (18 preceding siblings ...)
  2010-05-20 17:53 ` bugzilla-daemon
@ 2012-06-18 13:19 ` bugzilla-daemon
  2012-06-18 13:19 ` bugzilla-daemon
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2012-06-18 13:19 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |alan@lxorguk.ukuu.org.uk
         Resolution|                            |OBSOLETE





* [Bug 14830] When other IO is running sync times go to 10 to 20 minutes
       [not found] <bug-14830-13602@https.bugzilla.kernel.org/>
                   ` (19 preceding siblings ...)
  2012-06-18 13:19 ` bugzilla-daemon
@ 2012-06-18 13:19 ` bugzilla-daemon
  20 siblings, 0 replies; 48+ messages in thread
From: bugzilla-daemon @ 2012-06-18 13:19 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=14830


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED





