linux-fsdevel.vger.kernel.org archive mirror
* R.I.P. pdflush
@ 2012-07-25 15:11 Artem Bityutskiy
From: Artem Bityutskiy @ 2012-07-25 15:11 UTC
  To: Al Viro; +Cc: Linux Kernel Mailing List, Linux FS Mailing List

Now that all file-systems have been modified to not use the '->write_super()'
superblock method, we can kill the last pdflush leftover - the 'sync_supers'
kernel thread.

The sync_supers kernel thread does a very simple thing: wake up every 5
seconds (see [1]), iterate over all superblocks in the system and flush
dirty superblocks by calling their '->write_super()' method.
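The loop described above can be sketched in user-space C. This is only an
illustration under stated assumptions, not the kernel code: the structures are
simplified, hypothetical stand-ins, 'demo_write_super' and 'sync_supers_pass'
are made-up names, and the 5-second timer that would drive each pass is left
out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct super_block;

struct super_operations {
    void (*write_super)(struct super_block *sb); /* may be NULL */
};

struct super_block {
    const char *name;
    int s_dirt;                           /* non-zero: superblock is dirty */
    const struct super_operations *s_op;
    int writes;                           /* bookkeeping for this sketch only */
};

/* Example ->write_super(): pretend to write the superblock, mark it clean. */
static void demo_write_super(struct super_block *sb)
{
    sb->writes++;
    sb->s_dirt = 0;
}

/*
 * One periodic pass: visit every superblock and flush the dirty ones
 * that registered ->write_super(). Returns the number flushed.
 */
static int sync_supers_pass(struct super_block *sbs, size_t n)
{
    int flushed = 0;

    assert(sbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct super_block *sb = &sbs[i];

        if (sb->s_op && sb->s_op->write_super && sb->s_dirt) {
            sb->s_op->write_super(sb);
            flushed++;
        }
    }
    return flushed;
}
```

A pass over a list with no dirty superblocks, or with file-systems that never
registered '->write_super()', walks the whole list and flushes nothing. That
is the wasted wake-up this patch-set is about.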

The problem is that, from a power-efficiency point of view, it is very wasteful
to have a thread which wakes up every 5 seconds in the very core of the
Linux kernel. Indeed, most of the time this thread wakes the CPU from a deep
sleep state just to find out that there are no dirty superblocks. Besides,
modern file-systems like btrfs and ext4 (journalled mode only) do not even
register '->write_super()', so on many modern systems sync_supers is completely
useless.

As usually happens when modifying old code like this, removing sync_supers
was a tedious job. It required changing 12 file-systems, including ancient
ones. The changes themselves were not that complex; testing all of them was
the most difficult part. Testing mainstream file-systems like ext4 was easy
(just run xfstests and wait a few hours), but testing baroque file-systems was
problematic because they simply oopsed or errored even before I modified them.

For example, reiserfs deadlocked quickly when I tested it using xfstests with
reiserfs quota support enabled. I spent several days trying to fix this, but
reiserfs is quite complex and I'd say its locking is crazy (partially because
of the BKL push-down). I gave up after I realized that the deadlock is
related to the quota support. I disabled quotas and xfstests passed.

I also had some adventures with affs and a few other old file-systems.

The first patch of this patch-set removes the sync_supers thread and it is the
most important one. All the other patches are minor clean-ups: they simply
remove all references to 'write_super' and 'pdflush' from comments
and the documentation.

I suggest that all patches go in via Al's tree. However, not before the ext4,
exofs and udf changes are merged, which I expect to happen before v3.6-rc1.
The rest of the file-systems are merged already - here is the summary.

1.  ext4 - changes sit in Ted Ts'o's tree
    git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
2.  exofs - changes sit in Boaz Harrosh's tree
    git://git.open-osd.org/linux-open-osd linux-next
3.  udf - changes sit in Jan Kara's tree
    git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs for_next
4.  sysv - merged upstream
    9d46be2 fs/sysv: stop using write_super and s_dirt
5.  ufs - merged upstream
    9e9ad5f fs/ufs: get rid of write_super
6.  affs - merged upstream
    3dd8478 affs: get rid of affs_sync_super
7.  hfs - merged upstream
    5687b57 hfs: get rid of hfs_sync_super
8.  hfsplus - merged upstream
    9e6c582 hfsplus: get rid of write_super
9.  ext2 - merged upstream
    f72cf5e ext2: do not register write_super within VFS
10. vfat - merged upstream
    7849118 fat: switch to fsinfo_inode
11. jffs2 - merged upstream
    208b14e jffs2: get rid of jffs2_sync_super
12. reiserfs - merged upstream
    033369d reiserfs: get rid of resierfs_sync_super

These patches are also available here:
git://git.infradead.org/users/dedekind/linux-misc.git sync_supers

And just because this is the final pdflush removal, here is a brief historical
reference.

1. early days...2.6.31 - pdflush is the kernel daemon which periodically
   wakes up and flushes all dirty inodes and superblocks.
2. 2.6.32 - Jens Axboe introduces per-block device BDI flusher threads which
   are now responsible for flushing dirty inodes [2]. The pdflush thread becomes
   very simple, it is renamed to sync_supers and it periodically wakes up
   and flushes superblocks. While overall Jens' change was good, it introduced
   a regression: instead of one pdflush thread waking up every 5 seconds [3]
   we ended up with multiple threads waking up every 5 seconds - sync_supers
   and several flusher threads.
3. 2.6.36 - Artem Bityutskiy :-) fixes the wake-ups regression (see commit
   6467716) and from now on flusher threads do not wake up unless there is
   dirty data for the corresponding block device.

   Attempts are made to similarly optimize sync_supers, but they are vetoed
   by Al Viro who wants sync_supers to be killed altogether instead [4].
4. 3.6 - sync_supers is hopefully finally killed. With this the last
   piece of pdflush is also gone.

I'd like to thank Intel OTC for supporting this project, Jan Kara for help
with ext[24], Andrew Morton, Al Viro, Ted Ts'o, Nick Piggin.

[1] 5 seconds is the default setting and major distributions do not change
    it. But it is tunable via /proc/sys/vm/dirty_writeback_centisecs
[2] http://lwn.net/Articles/326552/
[3] The pdflush thread forked itself if there was a lot of dirty data, but it
    does not matter in this context.
[4] https://lkml.org/lkml/2010/7/22/96
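
As a small aid for footnote [1], here is a sketch of how the interval could be
read and converted to seconds. The helper names 'centisecs_to_seconds' and
'read_long_from_file' are hypothetical; the only thing taken from the text
above is the path /proc/sys/vm/dirty_writeback_centisecs, whose value is in
hundredths of a second.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a value in centiseconds (1/100 s) to seconds. */
static double centisecs_to_seconds(long centisecs)
{
    return (double)centisecs / 100.0;
}

/*
 * Read one integer from a sysctl-style file such as
 * /proc/sys/vm/dirty_writeback_centisecs.
 * Returns 0 on success and stores the value in *out, -1 on failure.
 */
static int read_long_from_file(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

If the file holds 500, centisecs_to_seconds() gives the 5 seconds mentioned in
the text.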

--
Regards,
Artem Bityutskiy


Thread overview: 24+ messages
2012-07-25 15:11 R.I.P. pdflush Artem Bityutskiy
2012-07-25 15:11 ` [PATCH 01/16] vfs: kill write_super and sync_supers Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 02/16] Documentation: get rid of write_super Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 03/16] Documentation: fix the VM knobs descritpion WRT pdflush Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 04/16] ext3: nuke write_super from comments Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 05/16] ext4: " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 06/16] ext4: nuke pdflush " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 07/16] btrfs: nuke write_super " Artem Bityutskiy
2012-07-25 15:46   ` cwillu
2012-07-25 16:06     ` Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 08/16] btrfs: nuke pdflush " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 09/16] jbd/jbd2: nuke write_super " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 10/16] vfs: nuke pdflush " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 11/16] hfs: nuke write_super " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 12/16] ntfs: " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 13/16] nilfs2: " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 14/16] drbd: nuke pdflush " Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 15/16] gfs2: " Artem Bityutskiy
2012-07-25 15:16   ` Bob Peterson
2012-07-26  9:10     ` Steven Whitehouse
2012-07-26  9:24       ` Artem Bityutskiy
2012-07-25 15:12 ` [PATCH 16/16] UBIFS: " Artem Bityutskiy
2012-08-02 21:27 ` R.I.P. pdflush Jeff Mahoney
2012-08-03  6:53   ` Artem Bityutskiy
