linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ext3] kjournald writing after each read despite noatime,commit=nnn
@ 2008-12-31 21:28 Dave Johnson
  2009-01-01 14:02 ` Bart Samwel
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Johnson @ 2008-12-31 21:28 UTC (permalink / raw)
  To: linux-kernel, linux-ext4


Since upgrading from 2.6.23.1 to 2.6.27.8, I'm now seeing kjournald
perform a write 2 seconds after each read to an ext3 fs.

laptop mode is on and filesystem is mounted with noatime and a large
commit time (6 hours).

Did I miss something or is this expected?


Note, there is no other activity reported by block_dump at this time
from any other mounted filesystem.

Some examples:

Dec 31 15:43:14 gw kernel: w(26329): READ block 756266 on hda1 
Dec 31 15:43:16 gw kernel: kjournald(760): WRITE block 2152 on hda1 

Dec 31 15:45:05 gw kernel: top(26366): READ block 756008 on hda1 
Dec 31 15:45:05 gw kernel: top(26366): READ block 756016 on hda1 
Dec 31 15:45:05 gw kernel: top(26366): READ block 756026 on hda1 
Dec 31 15:45:05 gw kernel: top(26366): READ block 756098 on hda1 
Dec 31 15:45:07 gw kernel: kjournald(760): WRITE block 2154 on hda1 

Dec 31 15:48:43 gw kernel: cat(26430): READ block 502016 on hda1 
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502024 on hda1 
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502032 on hda1 
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502040 on hda1 
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502042 on hda1 
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502050 on hda1 
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502114 on hda1 
Dec 31 15:48:45 gw kernel: kjournald(760): WRITE block 2156 on hda1 

Dec 31 15:48:55 gw kernel: bash(26436): READ block 744586 on hda1 
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744594 on hda1 
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744602 on hda1 
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744610 on hda1 
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744612 on hda1 
Dec 31 15:48:55 gw kernel: reset(26436): READ block 744620 on hda1 
Dec 31 15:48:57 gw kernel: kjournald(760): WRITE block 2158 on hda1 

$ grep ext3 /proc/mounts 
/dev/root / ext3 rw,noatime,errors=remount-ro,commit=21600,data=ordered 0 0

$ for f in /proc/sys/vm/*; do printf "%-50s" $f; cat $f; done
/proc/sys/vm/block_dump                           1
/proc/sys/vm/dirty_background_ratio               10
/proc/sys/vm/dirty_expire_centisecs               2160000
/proc/sys/vm/dirty_ratio                          60
/proc/sys/vm/dirty_writeback_centisecs            2160000
/proc/sys/vm/drop_caches                          0
/proc/sys/vm/laptop_mode                          2
/proc/sys/vm/legacy_va_layout                     0
/proc/sys/vm/lowmem_reserve_ratio                 256   32
/proc/sys/vm/max_map_count                        65536
/proc/sys/vm/min_free_kbytes                      1440
/proc/sys/vm/nr_pdflush_threads                   2
/proc/sys/vm/oom_dump_tasks                       0
/proc/sys/vm/oom_kill_allocating_task             0
/proc/sys/vm/overcommit_memory                    0
/proc/sys/vm/overcommit_ratio                     50
/proc/sys/vm/page-cluster                         3
/proc/sys/vm/panic_on_oom                         0
/proc/sys/vm/percpu_pagelist_fraction             0
/proc/sys/vm/swappiness                           60
/proc/sys/vm/vdso_enabled                         2
/proc/sys/vm/vfs_cache_pressure                   100
$ 

-- 
Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ext3] kjournald writing after each read despite noatime,commit=nnn
  2008-12-31 21:28 [ext3] kjournald writing after each read despite noatime,commit=nnn Dave Johnson
@ 2009-01-01 14:02 ` Bart Samwel
  2009-01-01 16:49   ` Dave Johnson
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Samwel @ 2009-01-01 14:02 UTC (permalink / raw)
  To: Dave Johnson; +Cc: linux-kernel, linux-ext4

Hi Dave,

Dave Johnson wrote:
> Since upgrading from 2.6.23.1 to 2.6.27.8, I'm now seeing kjournald
> perform a write 2 seconds after each read to an ext3 fs.
> 
> laptop mode is on and filesystem is mounted with noatime and a large
> commit time (6 hours).
> 
> Did I miss something or is this expected?

This is the defined behaviour for laptop_mode. Whenever a *physical*
READ takes place, this is taken to indicate that the disk is spun up at
that time. The laptop_mode functionality then takes that opportunity to
sync any dirty data to disk, two seconds (or whatever value you put in
/proc/sys/vm/laptop_mode) after the physical disk activity has ceased.
The rationale behind this is that you want to sync your stuff when the
disk is spun up, and then you want to hold back writing back stuff for a
very long while. And the only way it can detect that the disk is spun up
is when there is physical disk activity.

This is exactly what happens in your case. The READ activity reported by
block_dump is *physical* read activity: some data was needed that was
not cached in memory. block_dump does not show you what data was
retrieved from the ext3 fs *without* having to access the disk, it only
shows actual physical disk I/O.

Cheers,
Bart

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ext3] kjournald writing after each read despite noatime,commit=nnn
  2009-01-01 14:02 ` Bart Samwel
@ 2009-01-01 16:49   ` Dave Johnson
  2009-01-01 18:59     ` Bart Samwel
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Johnson @ 2009-01-01 16:49 UTC (permalink / raw)
  To: Bart Samwel; +Cc: linux-kernel, linux-ext4

Bart Samwel writes:
> This is the defined behaviour for laptop_mode. Whenever a *physical*
> READ takes place, this is taken to indicate that the disk is spun up at
> that time. The laptop_mode functionality then takes that opportunity to
> sync any dirty data to disk, two seconds (or whatever value you put in
> /proc/sys/vm/laptop_mode) after the physical disk activity has ceased.
> The rationale behind this is that you want to sync your stuff when the
> disk is spun up, and then you want to hold back writing back stuff for a
> very long while. And the only way it can detect that the disk is spun up
> is when there is physical disk activity.
> 
> This is exactly what happens in your case. The READ activity reported by
> block_dump is *physical* read activity: some data was needed that was
> not cached in memory. block_dump does not show you what data was
> retrieved from the ext3 fs *without* having to access the disk, it only
> shows actual physical disk I/O.

Yep sounds good, but this happens even if there is no dirty data
needing a sync back to disk.

$ grep 'Dirty\|Write' /proc/meminfo 
Dirty:               0 kB
Writeback:           0 kB
WritebackTmp:        0 kB
$ cat /some/uncached/file >/dev/null 

Jan  1 11:43:49 gw kernel: cat(6615): READ block 864408 on hda1 
Jan  1 11:43:51 gw kernel: kjournald(760): WRITE block 2376 on hda1 


Note, the reason I ask is this is a SSD so just because a physical
read has taken place recently unneeded writes should be avoided.

Turning laptop_mode to 0, but leaving other settings the same
resolves the uneeded write:

$ echo 0 > /proc/sys/vm/laptop_mode 
$ grep 'Dirty\|Write' /proc/meminfo 
Dirty:               0 kB
Writeback:           0 kB
WritebackTmp:        0 kB
$ cat /some/other/uncached/file >/dev/null

Jan  1 11:47:15 gw kernel: cat(6653): READ block 864258 on hda1 


-- 
Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ext3] kjournald writing after each read despite noatime,commit=nnn
  2009-01-01 16:49   ` Dave Johnson
@ 2009-01-01 18:59     ` Bart Samwel
  2009-01-01 19:18       ` Dave Johnson
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Samwel @ 2009-01-01 18:59 UTC (permalink / raw)
  To: Dave Johnson; +Cc: linux-kernel, linux-ext4

Hi Dave,

Dave Johnson wrote:
> Bart Samwel writes:
>> This is the defined behaviour for laptop_mode. Whenever a *physical*
>> READ takes place, this is taken to indicate that the disk is spun up at
>> that time. The laptop_mode functionality then takes that opportunity to
>> sync any dirty data to disk, two seconds (or whatever value you put in
>> /proc/sys/vm/laptop_mode) after the physical disk activity has ceased.
>> The rationale behind this is that you want to sync your stuff when the
>> disk is spun up, and then you want to hold back writing back stuff for a
>> very long while. And the only way it can detect that the disk is spun up
>> is when there is physical disk activity.
>>
>> This is exactly what happens in your case. The READ activity reported by
>> block_dump is *physical* read activity: some data was needed that was
>> not cached in memory. block_dump does not show you what data was
>> retrieved from the ext3 fs *without* having to access the disk, it only
>> shows actual physical disk I/O.
> 
> Yep sounds good, but this happens even if there is no dirty data
> needing a sync back to disk.
> 
> $ grep 'Dirty\|Write' /proc/meminfo 
> Dirty:               0 kB
> Writeback:           0 kB
> WritebackTmp:        0 kB
> $ cat /some/uncached/file >/dev/null 
> 
> Jan  1 11:43:49 gw kernel: cat(6615): READ block 864408 on hda1 
> Jan  1 11:43:51 gw kernel: kjournald(760): WRITE block 2376 on hda1 

This looks like it's a generic property of syncing an ext3 file system.
Try turning off laptop_mode and then running "sync". You will probably
see the same behaviour.

> Note, the reason I ask is this is a SSD so just because a physical
> read has taken place recently unneeded writes should be avoided.
> 
> Turning laptop_mode to 0, but leaving other settings the same
> resolves the uneeded write:

For your SSD I guess you need to get rid of the
sync-after-disk-activity, but keep the other VM behaviours of
laptop_mode (such as avoiding swapping out pages / writing back dirty
pages in order to free memory as long as it is also possible to just
drop pages that are not dirty).
You can probably achieve this by:
- having a large commit interval etc., like you have now
- setting laptop_mode to a very large value, e.g. a couple of hours.
That will trigger a sync if and only if there has been *no* disk
activity at all for hours on end -- i.e., pretty much never. And the
other write-reducing VM features of laptop_mode will still be enabled.

It would perhaps be a good thing to split these mechanisms into separate
knobs. Write batching (the sync-after-disk-activity stuff and also the
dirty_ratio / dirty_background_ratio changes) are a completely separate
mechanism from write avoidance (the other mechanism I mentioned).

Cheers,
Bart

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ext3] kjournald writing after each read despite noatime,commit=nnn
  2009-01-01 18:59     ` Bart Samwel
@ 2009-01-01 19:18       ` Dave Johnson
  2009-01-01 19:35         ` Dave Johnson
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Johnson @ 2009-01-01 19:18 UTC (permalink / raw)
  To: Bart Samwel; +Cc: linux-kernel, linux-ext4

Bart Samwel writes:
> This looks like it's a generic property of syncing an ext3 file system.
> Try turning off laptop_mode and then running "sync". You will probably
> see the same behaviour.

yep, every sync() causes a gratuitous write to disk N seconds later
even when no data has been read or written since the last sync().

$ while sleep 5; do sync; done
^C
$ 

Jan  1 14:09:31 gw kernel: kjournald(760): WRITE block 2662 on hda1 
Jan  1 14:09:36 gw kernel: kjournald(760): WRITE block 2664 on hda1 
Jan  1 14:09:42 gw kernel: kjournald(760): WRITE block 2666 on hda1 
Jan  1 14:09:47 gw kernel: kjournald(760): WRITE block 2668 on hda1 
Jan  1 14:09:52 gw kernel: kjournald(760): WRITE block 2670 on hda1 

> > Note, the reason I ask is this is a SSD so just because a physical
> > read has taken place recently unneeded writes should be avoided.
> > 
> > Turning laptop_mode to 0, but leaving other settings the same
> > resolves the uneeded write:
> 
> For your SSD I guess you need to get rid of the
> sync-after-disk-activity, but keep the other VM behaviours of
> laptop_mode (such as avoiding swapping out pages / writing back dirty
> pages in order to free memory as long as it is also possible to just
> drop pages that are not dirty).
> 
> You can probably achieve this by:
> - having a large commit interval etc., like you have now
> - setting laptop_mode to a very large value, e.g. a couple of hours.
> That will trigger a sync if and only if there has been *no* disk
> activity at all for hours on end -- i.e., pretty much never. And the
> other write-reducing VM features of laptop_mode will still be enabled.
> 
> It would perhaps be a good thing to split these mechanisms into separate
> knobs. Write batching (the sync-after-disk-activity stuff and also the
> dirty_ratio / dirty_background_ratio changes) are a completely separate
> mechanism from write avoidance (the other mechanism I mentioned).

I'm going to run without laptop mode for a while to see how this
goes.  This system is a small router/firewall with most writes going
to a tmpfs so the number of writes to the ext3 fs is only a few a
month (provided no one logs into the system to do a config change).

Thanks for the help.

-- 
Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ext3] kjournald writing after each read despite noatime,commit=nnn
  2009-01-01 19:18       ` Dave Johnson
@ 2009-01-01 19:35         ` Dave Johnson
  2009-01-01 20:16           ` Eric Sandeen
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Johnson @ 2009-01-01 19:35 UTC (permalink / raw)
  To: Bart Samwel; +Cc: linux-kernel, linux-ext4

Dave Johnson writes:
> Bart Samwel writes:
> > This looks like it's a generic property of syncing an ext3 file system.
> > Try turning off laptop_mode and then running "sync". You will probably
> > see the same behaviour.
> 
> yep, every sync() causes a gratuitous write to disk N seconds later
> even when no data has been read or written since the last sync().

Woops, correction, every sync() call causes a single gratuitous write,
no delay.

$ for n in 1 2 3 4 5; do sync; done

Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2682 on hda1 
Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2684 on hda1 
Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2686 on hda1 
Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2688 on hda1 
Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2690 on hda1 

-- 
Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ext3] kjournald writing after each read despite noatime,commit=nnn
  2009-01-01 19:35         ` Dave Johnson
@ 2009-01-01 20:16           ` Eric Sandeen
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2009-01-01 20:16 UTC (permalink / raw)
  To: Dave Johnson; +Cc: Bart Samwel, linux-kernel, linux-ext4

Dave Johnson wrote:
> Dave Johnson writes:
>> Bart Samwel writes:
>>> This looks like it's a generic property of syncing an ext3 file system.
>>> Try turning off laptop_mode and then running "sync". You will probably
>>> see the same behaviour.
>> yep, every sync() causes a gratuitous write to disk N seconds later
>> even when no data has been read or written since the last sync().
> 
> Woops, correction, every sync() call causes a single gratuitous write,
> no delay.
> 
> $ for n in 1 2 3 4 5; do sync; done
> 
> Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2682 on hda1 
> Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2684 on hda1 
> Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2686 on hda1 
> Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2688 on hda1 
> Jan  1 14:32:54 gw kernel: kjournald(760): WRITE block 2690 on hda1 
> 

This may be http://bugzilla.kernel.org/show_bug.cgi?id=12224

-Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-01-01 20:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-31 21:28 [ext3] kjournald writing after each read despite noatime,commit=nnn Dave Johnson
2009-01-01 14:02 ` Bart Samwel
2009-01-01 16:49   ` Dave Johnson
2009-01-01 18:59     ` Bart Samwel
2009-01-01 19:18       ` Dave Johnson
2009-01-01 19:35         ` Dave Johnson
2009-01-01 20:16           ` Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).