linux-ext4.vger.kernel.org archive mirror
* [PATCH 0/3] Lazy itable initialization for Ext4
@ 2010-08-20 17:51 Lukas Czerner
  2010-08-20 17:51 ` [PATCH 1/3] Add helper function for blkdev_issue_zeroout Lukas Czerner
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Lukas Czerner @ 2010-08-20 17:51 UTC (permalink / raw)
  To: linux-ext4; +Cc: rwheeler, sandeen, jack, tytso, adilger, lczerner

Hi, all

This is the initial version of my lazy inode table initialization code
for Ext4. The patch set consists of three patches. The first one adds a
helper function for blkdev_issue_zeroout called sb_issue_zeroout, which I
use to zero out the inode table. The second patch adds a new pair of mount
options (inititable/noinititable), so you can enable or disable this
feature. It is off by default (noinititable), so in order to try the new
code you should mount the fs like this:

  mount -o inititable /dev/sda /mnt/

And finally the third patch adds the inode table initialization code
itself. Thread initialization was heavily inspired by nilfs2 segctord.


DESCRIPTION:

When the file system is mounted with the "inititable" mount option, a new
thread (called itableinitd) is created. This thread walks through the
allocation groups searching for a group whose inode table is not yet
initialized. When such a group is found, it writes zeroes through the whole
inode table and puts itself to sleep for a defined number of seconds so as
not to disturb other ongoing I/O. This is repeated until it has walked
through every allocation group; then the itableinitd thread is stopped.

When regular inode allocations are going too fast, there is a chance that
the allocator hits a group with an uninitialized inode table sooner than
the itableinitd thread does. In that case the allocator just initializes
the itable itself, the same way the itableinitd thread would eventually do.
To prevent race conditions, each group is protected by a mutex.
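
To make the scheme above concrete, here is a rough user-space model of it.
This is only a sketch, not the kernel code from patch 3/3: struct group,
init_group, lazyinit_walk and alloc_inode_in are names made up for the
illustration, the "zeroed" field stands in for EXT4_BG_INODE_ZEROED, and
the per-group lock and the sleep between groups are left out because the
model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NGROUPS      4    /* hypothetical tiny filesystem */
#define ITABLE_BYTES 64   /* hypothetical inode table size per group */

struct group {
	bool zeroed;                        /* stands in for EXT4_BG_INODE_ZEROED */
	unsigned char itable[ITABLE_BYTES]; /* stands in for the on-disk itable */
	int zero_passes;                    /* how many times zeroes were written */
};

/*
 * Zero one group's inode table unless it is already marked as zeroed.
 * In the kernel this check would run under the per-group lock; the lock
 * is omitted here because the sketch is single-threaded.
 */
void init_group(struct group *g)
{
	assert(g != NULL);
	if (g->zeroed)
		return;
	memset(g->itable, 0, sizeof(g->itable));
	g->zeroed = true;
	g->zero_passes++;
}

/* Background pass: what itableinitd does, minus the sleep between groups. */
void lazyinit_walk(struct group *groups, int ngroups)
{
	for (int i = 0; i < ngroups; i++)
		init_group(&groups[i]);
}

/* Allocation path: initialize on demand if the walker has not got here yet. */
void alloc_inode_in(struct group *g)
{
	init_group(g);
	/* ... the actual inode allocation would follow here ... */
}
```

The point of routing both paths through the same flag check is that each
inode table is written exactly once, whichever path reaches the group first.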



For now, there is no way to control the itableinitd thread. It would
probably be nice to have more control over how often it is woken up, maybe
through a sysfs interface, or we could add some more logic to make it more
I/O sensitive so it does not put too much stress on an already stressed
system. However, this may be counterproductive, because we want the inode
table to be initialized before the regular inode allocator hits it.

Personally, I think we should create some kind of sysfs interface
(/sys/fs/ext4/<device>/itable_init_interval ?) so that user space can watch
the system and control the itableinitd wake-up interval itself.

This is the first version of the patch set and it has received very limited
testing so far. In order to try this patch you should do the following
(aside from compiling a kernel with this patch):

 1. Make Ext4 fs with lazy_itable_init extended option.
	mke2fs -T ext4 -E lazy_itable_init=1 <device>

 2. Mount this file system with inititable mount option
	mount -o inititable <device> <dir>



Any comments are more than welcome.

Thanks!
-Lukas

---
[PATCH 1/3] Add helper function for blkdev_issue_zeroout
[PATCH 2/3] Add inititable/noinititable mount options for ext4
[PATCH 3/3] Add inode table initialization code into Ext4

 fs/ext4/ext4.h         |   66 ++++++++++++
 fs/ext4/ialloc.c       |   75 +++++++++++++
 fs/ext4/super.c        |  274 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |    8 ++
 4 files changed, 423 insertions(+), 0 deletions(-)


* [PATCH 1/3] Add helper function for blkdev_issue_zeroout
  2010-08-20 17:51 [PATCH 0/3] Lazy itable initialization for Ext4 Lukas Czerner
@ 2010-08-20 17:51 ` Lukas Czerner
  2010-08-20 17:51 ` [PATCH 2/3] Add inititable/noinititable mount options for ext4 Lukas Czerner
  2010-08-20 17:51 ` [PATCH 3/3] Add inode table initialization code into Ext4 Lukas Czerner
  2 siblings, 0 replies; 12+ messages in thread
From: Lukas Czerner @ 2010-08-20 17:51 UTC (permalink / raw)
  To: linux-ext4; +Cc: rwheeler, sandeen, jack, tytso, adilger, lczerner

This is done the same way as function sb_issue_discard for
blkdev_issue_discard.
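
As a small user-space illustration of the unit conversion the helper does
before calling blkdev_issue_zeroout (filesystem blocks to 512-byte
sectors), here is a sketch. fs_blocks_to_sectors is a name made up for
this example; blocksize_bits plays the role of sb->s_blocksize_bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a block number or block count to 512-byte sectors.
 * blocksize_bits is log2 of the block size (12 for 4 KiB blocks);
 * 9 is log2(512).
 */
uint64_t fs_blocks_to_sectors(uint64_t blocks, unsigned int blocksize_bits)
{
	assert(blocksize_bits >= 9);
	return blocks << (blocksize_bits - 9);
}
```

With 4 KiB blocks the shift is 3, so one block corresponds to 8 sectors.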

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 include/linux/blkdev.h |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 09a8402..a22939d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1023,6 +1023,14 @@ static inline int sb_issue_discard(struct super_block *sb,
 	return blkdev_issue_discard(sb->s_bdev, block, nr_blocks, GFP_KERNEL,
 				   BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
 }
+static inline int sb_issue_zeroout(struct super_block *sb,
+				   sector_t block, sector_t nr_blocks)
+{
+	block <<= (sb->s_blocksize_bits - 9);
+	nr_blocks <<= (sb->s_blocksize_bits - 9);
+	return blkdev_issue_zeroout(sb->s_bdev, block, nr_blocks, GFP_KERNEL,
+				   BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+}
 
 extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
 
-- 
1.7.2.1



* [PATCH 2/3] Add inititable/noinititable mount options for ext4
  2010-08-20 17:51 [PATCH 0/3] Lazy itable initialization for Ext4 Lukas Czerner
  2010-08-20 17:51 ` [PATCH 1/3] Add helper function for blkdev_issue_zeroout Lukas Czerner
@ 2010-08-20 17:51 ` Lukas Czerner
  2010-08-20 17:51 ` [PATCH 3/3] Add inode table initialization code into Ext4 Lukas Czerner
  2 siblings, 0 replies; 12+ messages in thread
From: Lukas Czerner @ 2010-08-20 17:51 UTC (permalink / raw)
  To: linux-ext4; +Cc: rwheeler, sandeen, jack, tytso, adilger, lczerner

Add a new mount flag EXT4_MOUNT_INIT_INODE_TABLE and a new pair of mount
options (inititable/noinititable). When mounted with inititable, the file
system should try to initialize uninitialized inode tables; otherwise it
should not initialize them.
For now, the default is noinititable.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/ext4.h  |    1 +
 fs/ext4/super.c |   12 ++++++++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 19a4de5..dbd6760 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -885,6 +885,7 @@ struct ext4_inode_info {
 #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
 #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
 #define EXT4_MOUNT_DISCARD		0x40000000 /* Issue DISCARD requests */
+#define EXT4_MOUNT_INIT_INODE_TABLE	0x80000000 /* Initialize uninitialized itables */
 
 #define clear_opt(o, opt)		o &= ~EXT4_MOUNT_##opt
 #define set_opt(o, opt)			o |= EXT4_MOUNT_##opt
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4e8983a..515e306 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -986,6 +986,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	if (test_opt(sb, DIOREAD_NOLOCK))
 		seq_puts(seq, ",dioread_nolock");
 
+	if (test_opt(sb, INIT_INODE_TABLE))
+		seq_puts(seq, ",init_inode_table");
+
 	ext4_show_quota_options(seq, sb);
 
 	return 0;
@@ -1161,6 +1164,7 @@ enum {
 	Opt_inode_readahead_blks, Opt_journal_ioprio,
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard,
+	Opt_init_inode_table, Opt_noinit_inode_table,
 };
 
 static const match_table_t tokens = {
@@ -1231,6 +1235,8 @@ static const match_table_t tokens = {
 	{Opt_dioread_lock, "dioread_lock"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
+	{Opt_init_inode_table, "inititable"},
+	{Opt_noinit_inode_table, "noinititable"},
 	{Opt_err, NULL},
 };
 
@@ -1699,6 +1705,12 @@ set_qf_format:
 		case Opt_dioread_lock:
 			clear_opt(sbi->s_mount_opt, DIOREAD_NOLOCK);
 			break;
+		case Opt_init_inode_table:
+			set_opt(sbi->s_mount_opt, INIT_INODE_TABLE);
+			break;
+		case Opt_noinit_inode_table:
+			clear_opt(sbi->s_mount_opt, INIT_INODE_TABLE);
+			break;
 		default:
 			ext4_msg(sb, KERN_ERR,
 			       "Unrecognized mount option \"%s\" "
-- 
1.7.2.1



* [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-20 17:51 [PATCH 0/3] Lazy itable initialization for Ext4 Lukas Czerner
  2010-08-20 17:51 ` [PATCH 1/3] Add helper function for blkdev_issue_zeroout Lukas Czerner
  2010-08-20 17:51 ` [PATCH 2/3] Add inititable/noinititable mount options for ext4 Lukas Czerner
@ 2010-08-20 17:51 ` Lukas Czerner
  2010-08-23 22:59   ` Andreas Dilger
  2010-08-26 11:03   ` Peng Tao
  2 siblings, 2 replies; 12+ messages in thread
From: Lukas Czerner @ 2010-08-20 17:51 UTC (permalink / raw)
  To: linux-ext4; +Cc: rwheeler, sandeen, jack, tytso, adilger, lczerner

When the lazy_itable_init extended option is passed to mke2fs, it
considerably speeds up filesystem creation because inode tables are left
uninitialized and thus contain some old data. When such a fs is mounted,
the filesystem code should initialize (zero out) the uninitialized inode
tables. So far this code was missing for ext4; this patch adds it.

When the file system is mounted with the "inititable" mount option, a new
thread (called itableinitd) is created. This thread walks through the
allocation groups searching for a group whose inode table is not yet
initialized. When such a group is found, it writes zeroes through the whole
inode table and puts itself to sleep for a defined number of seconds so as
not to disturb other ongoing I/O. This is repeated until it has walked
through every allocation group; then the itableinitd thread is stopped.

When regular inode allocations are going too fast, there is a chance that
the allocator hits a group with an uninitialized inode table sooner than
the itableinitd thread does. In that case the allocator just initializes
the itable itself, the same way the itableinitd thread would eventually do.
To prevent race conditions, each group is protected by a mutex.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/ext4.h   |   65 +++++++++++++
 fs/ext4/ialloc.c |   75 ++++++++++++++++
 fs/ext4/super.c  |  262 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 402 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index dbd6760..906af5d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1144,6 +1144,9 @@ struct ext4_sb_info {
 
 	/* workqueue for dio unwritten */
 	struct workqueue_struct *dio_unwritten_wq;
+
+	/* Lazy inode table initialization info */
+	struct ext4_li_info *s_li_info;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
@@ -1467,6 +1470,66 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
 extern struct proc_dir_entry *ext4_proc_root;
 
 /*
+ * Timeout and state flag for lazy initialization inode thread.
+ */
+#define EXT4_LI_DEFAULT_TIMEOUT		5
+#define EXT4_LAZYINIT_QUIT		0x0001
+
+/*
+ * Lazy inode table initialization info
+ */
+struct ext4_li_info {
+	struct super_block	*li_super;
+	struct ext4_sb_info	*li_sbi;
+
+	spinlock_t		li_state_lock;
+	unsigned long		li_state;
+
+	wait_queue_head_t	li_wait_daemon;
+	wait_queue_head_t	li_wait_task;
+
+	unsigned long		li_interval;
+
+	struct timer_list	li_timer;
+	struct task_struct	*li_task;
+
+	struct mutex		*li_mtx;
+};
+
+/* Get pointer to lazyinit thread mutex li_mtx for particular group */
+static inline struct mutex *ext4_li_mutex_ptr(struct super_block *sb,
+						      ext4_group_t group)
+{
+	return &EXT4_SB(sb)->s_li_info->li_mtx[group];
+}
+
+/* Lock lazyinit thread mutex for particular thread */
+static inline void ext4_li_lock(struct super_block *sb,
+					ext4_group_t group)
+{
+	struct mutex *mtx = ext4_li_mutex_ptr(sb, group);
+	mutex_lock(mtx);
+}
+
+/* Unlock lazyinit thread mutex for particular thread */
+static inline void ext4_li_unlock(struct super_block *sb,
+					ext4_group_t group)
+{
+	struct mutex *mtx = ext4_li_mutex_ptr(sb, group);
+	mutex_unlock(mtx);
+}
+
+/*
+ * Inode table initialization is allowed if and only if
+ * s_li_info in ext4_sb_info is initialized
+ */
+static inline int ext4_itable_init_allowed(struct super_block *sb)
+{
+	struct ext4_li_info *eli = EXT4_SB(sb)->s_li_info;
+	return ((NULL == eli) ? 0 : 1);
+}
+
+/*
  * Function prototypes
  */
 
@@ -1539,6 +1602,8 @@ extern unsigned ext4_init_inode_bitmap(struct super_block *sb,
 				       ext4_group_t group,
 				       struct ext4_group_desc *desc);
 extern void mark_bitmap_end(int start_bit, int end_bit, char *bitmap);
+extern int ext4_init_inode_table(struct super_block *sb,
+				 ext4_group_t group);
 
 /* mballoc.c */
 extern long ext4_mb_stats;
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 25c4b31..df10302 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -123,6 +123,11 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
 		unlock_buffer(bh);
 		return bh;
 	}
+
+	if (!(desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) &&
+	      ext4_itable_init_allowed(sb))
+		ext4_init_inode_table(sb, block_group);
+
 	ext4_lock_group(sb, block_group);
 	if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 		ext4_init_inode_bitmap(sb, bh, block_group, desc);
@@ -1205,3 +1210,73 @@ unsigned long ext4_count_dirs(struct super_block * sb)
 	}
 	return count;
 }
+
+/*
+ * Initializes an uninitialized inode table - just write zeroes through
+ * the whole inode table. Must be called without group spinlock. Since
+ * this is called from itableinitd thread as well as from ext4_new_inode
+ * there are mutexes in s_li_info to prevent race conditions. Do not call
+ * this withou s_li_info uninitialized. It s_li_info is not initialized
+ * user does not want to init inode tables, or they are already zeroed.
+ */
+extern int ext4_init_inode_table(struct super_block *sb, ext4_group_t group)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_group_desc *gdp = NULL;
+	struct buffer_head *group_desc_bh;
+	handle_t *handle;
+	ext4_fsblk_t blk;
+	int num, ret = 0;
+
+	BUG_ON(NULL == sbi->s_li_info);
+
+	if (sb->s_flags & MS_RDONLY) {
+		ext4_warning(sb, "Filesystem mounter read only. "
+				 "Lazy itable initialization aborted!");
+		ret = 1;
+		return ret;
+	}
+
+	handle = ext4_journal_start_sb(sb, 1);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		return ret;
+	}
+
+	gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
+	if (!gdp)
+		return ret;
+
+	blk = ext4_inode_table(sb, gdp);
+	num = sbi->s_itb_per_group - 1;
+
+	ext4_li_lock(sb, group);
+	ext4_lock_group(sb, group);
+
+	if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) {
+		BUFFER_TRACE(group_desc_bh, "get_write_access");
+		ret = ext4_journal_get_write_access(handle,
+						    group_desc_bh);
+		if (ret)
+			goto err_out;
+
+		ext4_unlock_group(sb, group);
+		ret = sb_issue_zeroout(sb, blk, num);
+		ext4_lock_group(sb, group);
+		gdp->bg_flags |= cpu_to_le16(EXT4_BG_INODE_ZEROED);
+
+		gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+		BUFFER_TRACE(group_desc_bh,
+			     "call ext4_handle_dirty_metadata");
+		ret = ext4_handle_dirty_metadata(handle, NULL,
+						 group_desc_bh);
+		ext4_debug("zero out inode table in group %d\n", group);
+	}
+
+err_out:
+	ext4_unlock_group(sb, group);
+	ext4_li_unlock(sb, group);
+	ext4_journal_stop(handle);
+
+	return ret;
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 515e306..84a5993 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -41,6 +41,10 @@
 #include <linux/crc16.h>
 #include <asm/uaccess.h>
 
+
+#include <linux/kthread.h>
+#include <linux/freezer.h>
+
 #include "ext4.h"
 #include "ext4_jbd2.h"
 #include "xattr.h"
@@ -70,6 +74,7 @@ static void ext4_write_super(struct super_block *sb);
 static int ext4_freeze(struct super_block *sb);
 static int ext4_get_sb(struct file_system_type *fs_type, int flags,
 		       const char *dev_name, void *data, struct vfsmount *mnt);
+static void ext4_destroy_lazyinit_thread(struct super_block *sb);
 
 #if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
 static struct file_system_type ext3_fs_type = {
@@ -664,6 +669,7 @@ static void ext4_put_super(struct super_block *sb)
 				   "Couldn't clean up the journal");
 	}
 
+	ext4_destroy_lazyinit_thread(sb);
 	ext4_release_system_zone(sb);
 	ext4_mb_release(sb);
 	ext4_ext_release(sb);
@@ -2443,6 +2449,244 @@ static int ext4_feature_set_ok(struct super_block *sb, int readonly)
 	return 1;
 }
 
+static void ext4_lazyinode_timeout(unsigned long data)
+{
+	struct task_struct *p = (struct task_struct *)data;
+	wake_up_process(p);
+}
+
+/*
+ * This is called from itableinitd thread. Its purpose in life is to
+ * initialize (zero out) uninitialized inode tables. It walks through
+ * allocation groups searching for uninitialized inode table. When such
+ * a group is found it calls ext4_init_table and then puts itself into
+ * sleep for defined number seconds so it does not disturb other ongoing
+ * IO.
+ */
+static int ext4_lazyinit_thread(void *arg)
+{
+	struct ext4_li_info *eli = (struct ext4_li_info *)arg;
+	ext4_group_t group, ngroups;
+	struct ext4_group_desc *gdp = NULL;
+	struct super_block *sb;
+	int timeout = 0, ret;
+
+	BUG_ON(NULL == eli);
+
+	sb = eli->li_super;
+	ngroups = EXT4_SB(sb)->s_groups_count;
+	eli->li_timer.data = (unsigned long)current;
+	eli->li_timer.function = ext4_lazyinode_timeout;
+
+	eli->li_task = current;
+	wake_up(&eli->li_wait_task);
+
+	ext4_msg(sb, KERN_INFO,
+		"itableinitd starting. Wakeup interval = %lu seconds.",
+		eli->li_interval / HZ);
+
+	spin_lock(&eli->li_state_lock);
+	for (group = 0; group < ngroups; group++) {
+
+		if (eli->li_state & EXT4_LAZYINIT_QUIT)
+			break;
+
+		gdp = ext4_get_group_desc(sb, group, NULL);
+		if (!gdp)
+			continue;
+
+		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
+			continue;
+
+		ret = ext4_init_inode_table(sb, group);
+		if (ret)
+			goto exit_thread;
+
+		eli->li_timer.expires = jiffies + eli->li_interval;
+		add_timer(&eli->li_timer);
+
+		if (freezing(current)) {
+			spin_unlock(&eli->li_state_lock);
+			refrigerator();
+			spin_lock(&eli->li_state_lock);
+		} else {
+			DEFINE_WAIT(wait);
+
+			prepare_to_wait(&eli->li_wait_daemon, &wait,
+					TASK_INTERRUPTIBLE);
+
+			spin_unlock(&eli->li_state_lock);
+			schedule();
+			spin_lock(&eli->li_state_lock);
+
+			finish_wait(&eli->li_wait_daemon, &wait);
+			timeout = time_after_eq(jiffies, eli->li_timer.expires);
+		}
+	}
+
+exit_thread:
+	spin_unlock(&eli->li_state_lock);
+
+	eli->li_task = NULL;
+	wake_up(&eli->li_wait_task);
+	return 0;
+}
+
+static int ext4_lazyinit_start_thread(struct ext4_li_info *eli)
+{
+	struct task_struct *t;
+
+	t = kthread_run(ext4_lazyinit_thread, eli, "itableinitd");
+	if (IS_ERR(t)) {
+		int err = PTR_ERR(t);
+
+		printk(KERN_CRIT "EXT4: error %d creating inode table "
+				 "initialization thread\n",
+		       err);
+		return err;
+	}
+	wait_event(eli->li_wait_task, eli->li_task != NULL);
+	return 0;
+}
+
+/*
+ * Initialize new ext4_li_info structure */
+static struct ext4_li_info *ext4_lazyinit_new(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	ext4_group_t i, ngroups = sbi->s_groups_count;
+	struct ext4_li_info *eli;
+
+	eli = kzalloc(sizeof(*eli), GFP_KERNEL);
+	if (!eli)
+		return NULL;
+
+	eli->li_mtx = kzalloc(ngroups * sizeof(struct mutex),
+				       GFP_KERNEL);
+	if (!eli->li_mtx)
+		goto free_eli;
+
+	for (i = 0; i < ngroups; i++)
+		mutex_init(&eli->li_mtx[i]);
+
+	eli->li_sbi = sbi;
+	eli->li_super = sb;
+
+	init_waitqueue_head(&eli->li_wait_daemon);
+	init_waitqueue_head(&eli->li_wait_task);
+	spin_lock_init(&eli->li_state_lock);
+	init_timer(&eli->li_timer);
+
+	eli->li_interval = HZ * EXT4_LI_DEFAULT_TIMEOUT;
+
+	return eli;
+
+free_eli:
+	kfree(eli);
+	return NULL;
+}
+
+/*
+ * Check whether it make sense to run itableinitd or not.
+ * If there is at least one uninitialized inode table, return 1,
+ * else return 0.
+ */
+static int ext4_has_uninit_itable(struct super_block *sb)
+{
+	ext4_group_t i, ngroups = EXT4_SB(sb)->s_groups_count;
+	struct ext4_group_desc *gdp = NULL;
+	int ret = 1;
+
+	for (i = 0; i < ngroups; i++) {
+		gdp = ext4_get_group_desc(sb, i, NULL);
+		if (!gdp)
+			continue;
+
+		if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)))
+			goto has_uinit_itb;
+	}
+	ret = 0;
+
+has_uinit_itb:
+	return ret;
+}
+
+static int ext4_create_lazyinit_thread(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	int ret = 0;
+
+	if ((sb->s_flags & MS_RDONLY) ||
+	     !test_opt(sb, INIT_INODE_TABLE) ||
+	     sbi->s_li_info ||
+	     !ext4_has_uninit_itable(sb)) {
+		sbi->s_li_info = NULL;
+		goto out;
+	}
+
+	sbi->s_li_info = ext4_lazyinit_new(sb);
+	if (!sbi->s_li_info) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = ext4_lazyinit_start_thread(sbi->s_li_info);
+	if (ret) {
+		kfree(sbi->s_li_info);
+		sbi->s_li_info = NULL;
+	}
+out:
+	return ret;
+}
+
+static void ext4_stop_lazyinit_thread(struct ext4_li_info *eli)
+{
+	eli->li_state |= EXT4_LAZYINIT_QUIT;
+
+	while (eli->li_task) {
+		wake_up(&eli->li_wait_daemon);
+		spin_unlock(&eli->li_state_lock);
+		wait_event(eli->li_wait_task, eli->li_task == NULL);
+		spin_lock(&eli->li_state_lock);
+	}
+}
+
+static void ext4_destroy_lazyinit_thread(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_li_info *eli = sbi->s_li_info;
+	ext4_group_t i, ngroups = sbi->s_groups_count;
+
+	/*
+	 * If itableinitd thread exited earlier
+	 * there's nothing to be done.
+	 */
+	if (!eli)
+		return;
+
+	spin_lock(&eli->li_state_lock);
+	ext4_stop_lazyinit_thread(eli);
+	spin_unlock(&eli->li_state_lock);
+
+	del_timer_sync(&eli->li_timer);
+
+	/*
+	 * Acquire all semaphores, detach s_li_info from ext4_sb_info,
+	 * drop all semaphores (because of lockdep) and then we can safely
+	 * free the ext4_li_info structure.
+	 */
+	for (i = 0; i < ngroups; i++)
+		mutex_lock(&eli->li_mtx[i]);
+
+	sbi->s_li_info = NULL;
+
+	for (i = 0; i < ngroups; i++)
+		mutex_unlock(&eli->li_mtx[i]);
+
+	kfree(eli->li_mtx);
+	kfree(eli);
+}
+
 static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 				__releases(kernel_lock)
 				__acquires(kernel_lock)
@@ -3037,6 +3281,11 @@ no_journal:
 		goto failed_mount4;
 	};
 
+	err = ext4_create_lazyinit_thread(sb);
+	if (err)
+		ext4_msg(sb, KERN_ERR, "failed to initalize itableinitd (%d)",
+			 err);
+
 	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
 	ext4_orphan_cleanup(sb, es);
 	EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS;
@@ -3723,6 +3972,19 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 			enable_quota = 1;
 		}
 	}
+
+	/*
+	 * Reinitialize lazy itable initialization thread based on
+	 * current settings
+	 */
+	ext4_destroy_lazyinit_thread(sb);
+	err = ext4_create_lazyinit_thread(sb);
+	if (err) {
+		ext4_msg(sb, KERN_ERR,
+			"failed to initalize itableinitd (%d)",
+			 err);
+	}
+
 	ext4_setup_system_zone(sb);
 	if (sbi->s_journal == NULL)
 		ext4_commit_super(sb, 1);
-- 
1.7.2.1



* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-20 17:51 ` [PATCH 3/3] Add inode table initialization code into Ext4 Lukas Czerner
@ 2010-08-23 22:59   ` Andreas Dilger
  2010-08-24  9:01     ` Lukas Czerner
  2010-08-24 12:38     ` Ric Wheeler
  2010-08-26 11:03   ` Peng Tao
  1 sibling, 2 replies; 12+ messages in thread
From: Andreas Dilger @ 2010-08-23 22:59 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, jack, tytso

On 2010-08-20, at 11:51, Lukas Czerner wrote:
> When file system is mounted with "inititable" mount option, new thread
> (called itableinitd) is created. This thread walks through allocation
> groups searching for the group with not yet initialized inode table.
> When such a group is found it write zeroes through whole inode table and
> put itself into sleep for defined number of seconds to not disturb other
> ongoing I/O. This is repeated until it walks through every allocation group
> then the iitableinitd thread is stopped.

Eric and I were discussing this in IRC, and for SSDs and thinly-provisioned storage (including loopback files, VM images, etc) it makes sense to only write zeroes into itable blocks if they are not already zero.  Otherwise, the backing storage will allocate thousands/millions of blocks that might not otherwise be needed.  One of the original reasons for the mke2fs lazy_itable_init option was to allow testing of large filesystems with sparse loopback files smaller than the filesystem being tested.

Reading the itable blocks before writing zeroes into them wouldn't be much (if any) slower than writing them.  It might make sense to just have a binary toggle, so that if any non-zero blocks are read from the filesystem the rest of the blocks will be zeroed out (to avoid doubling the total IO needed).  This has the added advantage that reads from erased SSDs (which return zero on read, the only sane action IMHO) or loopback files never need to do any writes.
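
A rough user-space sketch of the toggle I have in mind, with the itable
modelled as a byte array in memory rather than real block I/O.
block_is_zero and zero_itable are names invented for this illustration,
not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if every byte of the block is zero. */
bool block_is_zero(const unsigned char *blk, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (blk[i] != 0)
			return false;
	return true;
}

/*
 * Walk nblocks blocks of blksz bytes stored back to back in table.
 * Blocks are only inspected until the first non-zero one is seen; from
 * then on every block is overwritten without being read first.
 * Returns the number of blocks written.
 */
size_t zero_itable(unsigned char *table, size_t nblocks, size_t blksz)
{
	bool write_all = false;   /* the binary toggle */
	size_t written = 0;

	assert(blksz > 0);
	for (size_t i = 0; i < nblocks; i++) {
		unsigned char *blk = table + i * blksz;

		if (!write_all && block_is_zero(blk, blksz))
			continue;     /* already zero: skip the write */
		write_all = true;
		memset(blk, 0, blksz);
		written++;
	}
	return written;
}
```

On storage that already reads back as zeroes this never writes at all, and
on anything else the total I/O is bounded by one read pass up to the first
dirty block plus one write pass after it.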

The drawback is that reading the blocks will potentially pollute the cache if done incorrectly, but this can be handled with O_DIRECT reads.  It would also increase the CPU usage during the single zeroing pass.

> When regular inode allocation are going too fast, there is a chance that
> it hits the group with uninitialized inode table sooner than the
> itableinitd thread. In that case it just initializes the itable for
> itself the same way that itableinitd thread would do eventually.

I think this is the correct approach.  It should only take a fraction of a second to zero the few MB in the itable being accessed, and is still much faster than waiting for mke2fs to zero all of them.

> To prevent race conditions, each group is protected by the mutex.

> +/* Get pointer to lazyinit thread mutex li_mtx for particular group */
> +static inline struct mutex *ext4_li_mutex_ptr(struct super_block *sb,
> +						      ext4_group_t group)
> +{
> +	return &EXT4_SB(sb)->s_li_info->li_mtx[group];
> +}

Note that even allocating a single pointer per group is too much on a very large filesystem.  At 1TB this is 8192 groups * 32 bytes/ptr = 256kB (struct mutex may be even larger if debug is enabled).  This will fail allocation.
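
The arithmetic behind those numbers, as a sketch that assumes 4 KiB blocks
and 32768 blocks per group (128 MiB per group) and takes the lock size as a
parameter, since the real sizeof(struct mutex) depends on the kernel
config. group_count and per_group_array_bytes are names made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of block groups for a filesystem of fs_bytes, rounded up. */
uint64_t group_count(uint64_t fs_bytes)
{
	const uint64_t group_bytes = 4096ULL * 32768ULL;   /* 128 MiB */

	return (fs_bytes + group_bytes - 1) / group_bytes;
}

/* Bytes needed for an array holding one lock object per group. */
uint64_t per_group_array_bytes(uint64_t fs_bytes, uint64_t lock_size)
{
	assert(lock_size > 0);
	return group_count(fs_bytes) * lock_size;
}
```

For 1 TiB and a 32-byte lock this gives 8192 groups and 262144 bytes, i.e.
the 256kB above, and the array grows linearly with the filesystem size.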

There is already the per-group alloc_sem in ext4_group_info that could be used.  Yes, it would mean that itable zeroing can conflict with block allocation for that group, but I consider this a very low-probability event, since blocks are normally allocated from the group where the inode was allocated, and this lock is only held once in the lifetime of the filesystem.

> @@ -123,6 +123,11 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t
> +	if (!(desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) &&
> +	      ext4_itable_init_allowed(sb))
> +		ext4_init_inode_table(sb, block_group);

This should be marked "unlikely()" since it will only ever be true right after formatting.
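
Something along these lines, shown as a stand-alone user-space sketch. The
unlikely() definition mirrors the usual kernel macro and relies on the
GCC/Clang __builtin_expect builtin; needs_itable_init and BG_INODE_ZEROED
are stand-ins invented for the example, and the endianness conversion is
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

#define BG_INODE_ZEROED 0x0004   /* same bit as EXT4_BG_INODE_ZEROED */

/* Returns true when the rarely taken zeroing path should run. */
bool needs_itable_init(unsigned int bg_flags, bool init_allowed)
{
	if (unlikely(!(bg_flags & BG_INODE_ZEROED) && init_allowed))
		return true;
	return false;
}
```

The hint does not change the result, only which branch the compiler lays
out as the fall-through path.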

> +/*
> + * Initializes an uninitialized inode table - just write zeroes through
> + * the whole inode table. Must be called without group spinlock. Since
> + * this is called from itableinitd thread as well as from ext4_new_inode
> + * there are mutexes in s_li_info to prevent race conditions.

This comment would need to be changed to reflect that the locks are not in s_li_info.

> Do not call
> + * this withou s_li_info uninitialized. It s_li_info is not initialized

Some typos here...  Should be:

"Do not call this with s_li_info uninitialized.  If s_li_info is not ..."

> + * user does not want to init inode tables, or they are already zeroed.

That said, it makes sense to allow this function to be called without the need
for s_li_info at all.  Consider the case of resizing a filesystem, this can be
used to zero out the inode table, instead of the much-less-efficient code in setup_new_group_blocks().  That code currently journals all of the itable blocks to handle crash recovery with a write cache, but with the improved barrier code that is no longer necessary.

> +extern int ext4_init_inode_table(struct super_block *sb, ext4_group_t group)
> +{
> +	BUG_ON(NULL == sbi->s_li_info);

It looks like s_li_info is only needed for the mutex, which is no longer needed, so this requirement can just be removed.

> +	if (sb->s_flags & MS_RDONLY) {
> +		ext4_warning(sb, "Filesystem mounter read only. "

s/mounter/mounted/

> +	handle = ext4_journal_start_sb(sb, 1);
> +	if (IS_ERR(handle)) {
> +		ret = PTR_ERR(handle);
> +		return ret;
> +	}
> +
> +	gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
> +	if (!gdp)
> +		return ret;
> +
> +	blk = ext4_inode_table(sb, gdp);
> +	num = sbi->s_itb_per_group - 1;
> +
> +	ext4_li_lock(sb, group);
> +	ext4_lock_group(sb, group);

It's generally a bad idea to start a journal handle and then grab blocking locks while holding the handle.  I always try to get all of the potential failure cases out of the way before starting the journal handle, so that the only way it can fail is because the journal was aborted or the IO fails.

There will also be additional lock ordering issues with ext4_group_info->alloc_sem and the journal.  It looks like alloc_sem is gotten after the journal handle in ext4_add_groupblocks(), but alloc_sem looks to be held BEFORE the journal handle in the mballoc code (AFAICS, it isn't totally clear from the code).

> +
> +	if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) {
> +		BUFFER_TRACE(group_desc_bh, "get_write_access");
> +		ret = ext4_journal_get_write_access(handle,
> +						    group_desc_bh);
> +		if (ret)
> +			goto err_out;
> +
> +		ext4_unlock_group(sb, group);
> +		ret = sb_issue_zeroout(sb, blk, num);

Note that there is also ext4_ext_zeroout(), which is zeroing fallocate() chunks on disk in a slightly different manner (submit_bio() with a bio of a zeroed page).  It makes sense to have only a single mechanism for doing this, and if sb_issue_zeroout() is the right way to do that, ext4_ext_zeroout() should also be changed to use this.

> +static int ext4_lazyinit_thread(void *arg)
> +{
> +	spin_lock(&eli->li_state_lock);
> +	for (group = 0; group < ngroups; group++) {
> +
> +		if (eli->li_state & EXT4_LAZYINIT_QUIT)
> +			break;
> +
> +		gdp = ext4_get_group_desc(sb, group, NULL);
> +		if (!gdp)
> +			continue;
> +
> +		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
> +			continue;

For a very large filesystem, this may spin through a very large number of groups, so I wonder if it makes sense to yield the spinlock here.  That may not be needed if other "on demand" itable zeroing threads do not need to get li_state_lock, which then makes me wonder why this lock is needed if there are no other threads contending on it?

> +		ret = ext4_init_inode_table(sb, group);
> +		if (ret)
> +			goto exit_thread;

This can't call ext4_init_inode_table() with the spinlock held, since that is calling sleeping functions (journal start, disk IO, etc).

> +static int ext4_lazyinit_start_thread(struct ext4_li_info *eli)
> +{
> +	t = kthread_run(ext4_lazyinit_thread, eli, "itableinitd");

This name doesn't really tell an uninformed user what the thread is doing.  Something like "ext4init-08:01" or similar would tell the user that this thread is ext4-related, it is initializing something (granted, there isn't room to say it is zeroing the itable, but my hope is that this thread will also do other things like check the group descriptor checksums at startup), and it is working on block device 08:01.
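
For illustration, building such a name could look like the user-space
sketch below. lazyinit_thread_name is a hypothetical helper, the
hexadecimal major:minor format is my assumption about how "08:01" would be
produced, and COMM_LEN mirrors the kernel's 16-byte TASK_COMM_LEN limit,
which is why there is no room for a longer description.

```c
#include <assert.h>
#include <stdio.h>

#define COMM_LEN 16   /* like TASK_COMM_LEN: 15 characters plus NUL */

/*
 * Format a per-device thread name such as "ext4init-08:01".
 * Returns 0 on success, -1 if the name would not fit in len bytes.
 */
int lazyinit_thread_name(char *buf, size_t len, unsigned int major,
			 unsigned int minor)
{
	int n = snprintf(buf, len, "ext4init-%02x:%02x", major, minor);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a COMM_LEN-sized buffer, major 8 and minor 1 give a 14-character name,
which fits with one character to spare.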

On that related note, it may make sense to handle the itable zeroing for all filesystems from a single thread, instead of possibly starting multiple threads all writing to the same block device and causing a lot of seeking.  That means that at thread startup time it should verify that only a single thread is started, and all registered filesystems should put their zeroing tasks onto a list (under spinlock) that the one thread checks before it exits.


> +static int ext4_has_uninit_itable(struct super_block *sb)
> +{
> +	ext4_group_t i, ngroups = EXT4_SB(sb)->s_groups_count;
> +	struct ext4_group_desc *gdp = NULL;
> +	int ret = 1;
> +
> +	for (i = 0; i < ngroups; i++) {
> +		gdp = ext4_get_group_desc(sb, i, NULL);
> +		if (!gdp)
> +			continue;
> +
> +		if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)))
> +			goto has_uinit_itb;

By the time this scan is done, you may as well have started the thread to do the same scan.  That will avoid traversing up to 131072 groups for a 16TB filesystem twice, if the previous thread was interrupted near the end.
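
For reference, the group count can be worked out as below. This is only a sketch and assumes 4kB blocks and the usual layout where one block bitmap block covers the whole group (8 * block size blocks per group); `group_count()` is a made-up helper, not something in the patch.

```c
#include <stdint.h>

/*
 * Number of block groups, assuming each group's block bitmap fills exactly
 * one block, i.e. 8 * block_size blocks per group.
 */
static uint64_t group_count(uint64_t fs_bytes, uint32_t block_size)
{
	uint64_t blocks_per_group = 8ULL * block_size;
	uint64_t group_bytes = blocks_per_group * block_size;

	/* Round up so that a partial last group is counted. */
	return (fs_bytes + group_bytes - 1) / group_bytes;
}
```

With 4kB blocks a group is 128MB, so 16TB gives 131072 groups and 1TB gives 8192.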

> +static void ext4_stop_lazyinit_thread(struct ext4_li_info *eli)
> +{
> +	eli->li_state |= EXT4_LAZYINIT_QUIT;
> +
> +	while (eli->li_task) {
> +		wake_up(&eli->li_wait_daemon);
> +		spin_unlock(&eli->li_state_lock);
> +		wait_event(eli->li_wait_task, eli->li_task == NULL);
> +		spin_lock(&eli->li_state_lock);
> +	}
> +}

This isn't very clear from a readability or usability standpoint.  A function shouldn't unlock a lock it didn't get.  It isn't really clear what the spinlock is needed for in this case?  It can't be needed to access "eli", otherwise the whole loop is unsafe.

> +	err = ext4_create_lazyinit_thread(sb);
> +	if (err)
> +		ext4_msg(sb, KERN_ERR, "failed to initalize itableinitd (%d)",
> +			 err);

This message shouldn't hard-code the name of the thread.  Maybe just write:

"failed to initialize inode table zeroing thread"

> @@ -3723,6 +3972,19 @@ static int ext4_remount(struct super_block *sb, int
> +
> +	ext4_destroy_lazyinit_thread(sb);
> +	err = ext4_create_lazyinit_thread(sb);

Rather than killing the old thread and starting a new one (which will have to scan all of the groups again) it makes sense to just silently fail the startup of the new thread if one is already running.


Cheers, Andreas






^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-23 22:59   ` Andreas Dilger
@ 2010-08-24  9:01     ` Lukas Czerner
  2010-08-25  6:08       ` Andreas Dilger
  2010-08-24 12:38     ` Ric Wheeler
  1 sibling, 1 reply; 12+ messages in thread
From: Lukas Czerner @ 2010-08-24  9:01 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Lukas Czerner, linux-ext4, rwheeler, sandeen, jack, tytso

On Mon, 23 Aug 2010, Andreas Dilger wrote:

> On 2010-08-20, at 11:51, Lukas Czerner wrote:
> > When file system is mounted with "inititable" mount option, new thread
> > (called itableinitd) is created. This thread walks through allocation
> > groups searching for the group with not yet initialized inode table.
> > When such a group is found it write zeroes through whole inode table and
> > put itself into sleep for defined number of seconds to not disturb other
> > ongoing I/O. This is repeated until it walks through every allocation group
> > then the itableinitd thread is stopped.
> 
> Eric and I were discussing this in IRC, and for SSDs and thinly-provisioned storage (including loopback files, VM images, etc) it makes sense to only write zeroes into itable blocks if they are not already zero.  Otherwise, the backing storage will allocate thousands/millions of blocks that might not otherwise be needed.  One of the original reasons for the mke2fs lazy_itable_init option was to allow testing of large filesystems with sparse loopback files smaller than the filesystem being tested.
> 
> Reading the itable blocks before writing zeroes into them wouldn't be much (if any) slower than writing them.  It might make sense to just have a binary toggle, so that if any non-zero blocks are read from the filesystem the rest of the blocks will be zeroed out (to avoid doubling the total IO needed).  This has the added advantage that reads from erased SSDs (which return zero on read, the only sane action IMHO) or loopback files never need to do any writes.
> 
> The drawback is that reading the blocks will potentially pollute the cache if done incorrectly, but this can be handled with O_DIRECT reads.  It would also increase the CPU usage during the single zeroing pass.
> 
> > When regular inode allocation are going too fast, there is a chance that
> > it hits the group with uninitialized inode table sooner than the
> > itableinitd thread. In that case it just initializes the itable for
> > itself the same way that itableinitd thread would do eventually.
> 
> I think this is the correct approach.  It should only take a fraction of a second to zero the few MB in the itable being accessed, and is still much faster than waiting for mke2fs to zero all of them.
> 
> > To prevent race conditions, each group is protected by the mutex.
> 
> > +/* Get pointer to lazyinit thread mutex li_mtx for particular group */
> > +static inline struct mutex *ext4_li_mutex_ptr(struct super_block *sb,
> > +						      ext4_group_t group)
> > +{
> > +	return &EXT4_SB(sb)->s_li_info->li_mtx[group];
> > +}
> 
> Note that even allocating a single pointer per group is too much on a very large filesystem.  At 1TB this is 8192 groups * 32 bytes/ptr = 256kB (struct mutex may be even larger if debug is enabled).  This will fail allocation.
> 
> There is already the per-group alloc_sem in ext4_group_info that could be used.  Yes, it would mean that itable zeroing can conflict with block allocation for that group, but I consider this a very low-probability event, since blocks are normally allocated from the group where the inode was allocated, and this lock is only held once in the lifetime of the filesystem.
> 
> > @@ -123,6 +123,11 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t
> > +	if (!(desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) &&
> > +	      ext4_itable_init_allowed(sb))
> > +		ext4_init_inode_table(sb, block_group);
> 
> This should be marked "unlikely()" since it will only ever be true right after formatting.
> 
> > +/*
> > + * Initializes an uninitialized inode table - just write zeroes through
> > + * the whole inode table. Must be called without group spinlock. Since
> > + * this is called from itableinitd thread as well as from ext4_new_inode
> > + * there are mutexes in s_li_info to prevent race conditions.
> 
> This comment would need to be changed to reflect that the locks are not in s_li_info.
> 
> > Do not call
> > + * this withou s_li_info uninitialized. It s_li_info is not initialized
> 
> Some typos here...  Should be:
> 
> "Do not call this with s_li_info uninitialized.  If s_li_info is not ..."
> 
> > + * user does not want to init inode tables, or they are already zeroed.
> 
> That said, it makes sense to allow this function to be called without the need
> for s_li_info at all.  Consider the case of resizing a filesystem, this can be
> used to zero out the inode table, instead of the much-less-efficient code in setup_new_group_blocks().  That code currently journals all of the itable blocks to handle crash recovery with a write cache, but with the improved barrier code that is no longer necessary.
> 
> > +extern int ext4_init_inode_table(struct super_block *sb, ext4_group_t group)
> > +{
> > +	BUG_ON(NULL == sbi->s_li_info);
> 
> It looks like s_li_info is only needed for the mutex, which is no longer needed, so this requirement can just be removed.
> 
> > +	if (sb->s_flags & MS_RDONLY) {
> > +		ext4_warning(sb, "Filesystem mounter read only. "
> 
> s/mounter/mounted/
> 
> > +	handle = ext4_journal_start_sb(sb, 1);
> > +	if (IS_ERR(handle)) {
> > +		ret = PTR_ERR(handle);
> > +		return ret;
> > +	}
> > +
> > +	gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
> > +	if (!gdp)
> > +		return ret;
> > +
> > +	blk = ext4_inode_table(sb, gdp);
> > +	num = sbi->s_itb_per_group - 1;
> > +
> > +	ext4_li_lock(sb, group);
> > +	ext4_lock_group(sb, group);
> 
> It's generally a bad idea to start a journal handle and then grab blocking locks while holding the handle.  I always try to get all of the potential failure cases out of the way before starting the journal handle, so that the only way it can fail is because the journal was aborted or the IO fails.
> 
> There will also be additional lock ordering issues with ext4_group_info->alloc_sem and the journal.  It looks like alloc_sem is gotten after the journal handle in ext4_add_groupblocks(), but alloc_sem looks to be held BEFORE the journal handle in the mballoc code (AFAICS, it isn't totally clear from the code).
> 
> > +
> > +	if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) {
> > +		BUFFER_TRACE(group_desc_bh, "get_write_access");
> > +		ret = ext4_journal_get_write_access(handle,
> > +						    group_desc_bh);
> > +		if (ret)
> > +			goto err_out;
> > +
> > +		ext4_unlock_group(sb, group);
> > +		ret = sb_issue_zeroout(sb, blk, num);
> 
> > Note that there is also ext4_ext_zeroout(), which is zeroing fallocate() chunks on disk in a slightly different manner (submit_bio() with a bio of a zeroed page).  It makes sense to have only a single mechanism for doing this, and if sb_issue_zeroout() is the right way to do that, ext4_ext_zeroout() should also be changed to use this.
> 
> > +static int ext4_lazyinit_thread(void *arg)
> > +{
> > +	spin_lock(&eli->li_state_lock);
> > +	for (group = 0; group < ngroups; group++) {
> > +
> > +		if (eli->li_state & EXT4_LAZYINIT_QUIT)
> > +			break;
> > +
> > +		gdp = ext4_get_group_desc(sb, group, NULL);
> > +		if (!gdp)
> > +			continue;
> > +
> > +		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
> > +			continue;
> 
> For a very large filesystem, this may spin through a very large number of groups, so I wonder if it makes sense to yield the spinlock here.  That may not be needed if other "on demand" itable zeroing threads do not need to get li_state_lock, which then makes me wonder why this lock is needed if there are no other threads contending on it?
> 
> > +		ret = ext4_init_inode_table(sb, group);
> > +		if (ret)
> > +			goto exit_thread;
> 
> This can't call ext4_init_inode_table() with the spinlock held, since that is calling sleeping functions (journal start, disk IO, etc).
> 
> > +static int ext4_lazyinit_start_thread(struct ext4_li_info *eli)
> > +{
> > +	t = kthread_run(ext4_lazyinit_thread, eli, "itableinitd");
> 
> This name doesn't really tell an uninformed user what the thread is doing.  Something like "ext4init-08:01" or similar would tell the user that this thread is ext4-related, it is initializing something (granted, there isn't room to say it is zeroing the itable, but my hope is that this thread will also do other things like check the group descriptor checksums at startup also), and it is working on block device 08:01.
> 
> On that related note, it may make sense to handle the itable zeroing for all filesystems from a single thread, instead of possibly starting multiple threads all writing to the same block device and causing a lot of seeking.  That means that at thread startup time it should verify that only a single thread is started, and all registered filesystems should put their zeroing tasks onto a list (under spinlock) that the one thread checks before it exits.

So you are suggesting a filesystem-independent thread which can be used by
any filesystem on any block device? I really do not know if other file
systems would be interested in it.

One problem that might emerge when we have one thread for multiple
devices is that it can be fairly complicated to control the thread
according to device load. So it might be better to have one thread per
device (disk). But that is just my first guess, since I need to look more
into it.

> 
> 
> > +static int ext4_has_uninit_itable(struct super_block *sb)
> > +{
> > +	ext4_group_t i, ngroups = EXT4_SB(sb)->s_groups_count;
> > +	struct ext4_group_desc *gdp = NULL;
> > +	int ret = 1;
> > +
> > +	for (i = 0; i < ngroups; i++) {
> > +		gdp = ext4_get_group_desc(sb, i, NULL);
> > +		if (!gdp)
> > +			continue;
> > +
> > +		if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)))
> > +			goto has_uinit_itb;
> 
> By the time this scan is done, you may as well have started the thread to do the same scan.  That will avoid traversing up to 131072 groups for a 16TB filesystem twice, if the previous thread was interrupted near the end.
> 
> > +static void ext4_stop_lazyinit_thread(struct ext4_li_info *eli)
> > +{
> > +	eli->li_state |= EXT4_LAZYINIT_QUIT;
> > +
> > +	while (eli->li_task) {
> > +		wake_up(&eli->li_wait_daemon);
> > +		spin_unlock(&eli->li_state_lock);
> > +		wait_event(eli->li_wait_task, eli->li_task == NULL);
> > +		spin_lock(&eli->li_state_lock);
> > +	}
> > +}
> 
> This isn't very clear from a readability or usability standpoint.  A function shouldn't unlock a lock it didn't get.  It isn't really clear what the spinlock is needed for in this case?  It can't be needed to access "eli", otherwise the whole loop is unsafe.

Oh, the spinlock is probably not needed at all. I'll get rid of it.
> 
> > +	err = ext4_create_lazyinit_thread(sb);
> > +	if (err)
> > +		ext4_msg(sb, KERN_ERR, "failed to initalize itableinitd (%d)",
> > +			 err);
> 
> This message shouldn't hard-code the name of the thread.  Maybe just write:
> 
> "failed to initialize inode table zeroing thread"
> 
> > @@ -3723,6 +3972,19 @@ static int ext4_remount(struct super_block *sb, int
> > +
> > +	ext4_destroy_lazyinit_thread(sb);
> > +	err = ext4_create_lazyinit_thread(sb);
> 
> Rather than killing the old thread and starting a new one (which will have to scan all of the groups again) it makes sense to just silently fail the startup of the new thread if one is already running.
> 
> 
> Cheers, Andreas
> 

Andreas, thank you for your comments and suggestions. They were very helpful
and I really appreciate them.

Thanks!
-Lukas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-23 22:59   ` Andreas Dilger
  2010-08-24  9:01     ` Lukas Czerner
@ 2010-08-24 12:38     ` Ric Wheeler
  1 sibling, 0 replies; 12+ messages in thread
From: Ric Wheeler @ 2010-08-24 12:38 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Lukas Czerner, linux-ext4, sandeen, jack, tytso

On 08/23/2010 06:59 PM, Andreas Dilger wrote:
> On 2010-08-20, at 11:51, Lukas Czerner wrote:
>> When file system is mounted with "inititable" mount option, new thread
>> (called itableinitd) is created. This thread walks through allocation
>> groups searching for the group with not yet initialized inode table.
>> When such a group is found it write zeroes through whole inode table and
>> put itself into sleep for defined number of seconds to not disturb other
>> ongoing I/O. This is repeated until it walks through every allocation group
>> then the itableinitd thread is stopped.
>
> Eric and I were discussing this in IRC, and for SSDs and thinly-provisioned storage (including loopback files, VM images, etc) it makes sense to only write zeroes into itable blocks if they are not already zero.  Otherwise, the backing storage will allocate thousands/millions of blocks that might not otherwise be needed.  One of the original reasons for the mke2fs lazy_itable_init option was to allow testing of large filesystems with sparse loopback files smaller than the filesystem being tested.
>
> Reading the itable blocks before writing zeroes into them wouldn't be much (if any) slower than writing them.  It might make sense to just have a binary toggle, so that if any non-zero blocks are read from the filesystem the rest of the blocks will be zeroed out (to avoid doubling the total IO needed).  This has the added advantage that reads from erased SSDs (which return zero on read, the only sane action IMHO) or loopback files never need to do any writes.
>
> The drawback is that reading the blocks will potentially pollute the cache if done incorrectly, but this can be handled with O_DIRECT reads.  It would also increase the CPU usage during the single zeroing pass.
>

I think that this makes sense. If the LUN is sparse and the range is unmapped,
note that no actual physical read will happen (many arrays will treat this like
we treat allocated but unwritten extents and just return zeroed data to us).
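
A rough model of the read-before-write toggle discussed above, operating on an in-memory buffer rather than a block device. The names `block_is_zero()` and `zero_itable()` are illustrative only and do not come from the patch; real code would read and write through the block layer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the block contains only zero bytes. */
static bool block_is_zero(const unsigned char *blk, size_t block_size)
{
	for (size_t i = 0; i < block_size; i++)
		if (blk[i] != 0)
			return false;
	return true;
}

/*
 * Zero nblocks blocks held in buf.  Blocks are checked first and skipped
 * if they are already zero; after the first non-zero block is seen, the
 * remaining blocks are written without being checked.  Returns the number
 * of blocks written.
 */
static size_t zero_itable(unsigned char *buf, size_t nblocks,
			  size_t block_size)
{
	bool check_first = true;
	size_t written = 0;

	for (size_t b = 0; b < nblocks; b++) {
		unsigned char *blk = buf + b * block_size;

		if (check_first && block_is_zero(blk, block_size))
			continue;	/* already zero, no write needed */

		check_first = false;	/* stale data seen: stop reading */
		memset(blk, 0, block_size);
		written++;
	}
	return written;
}
```

On a device that already returns zeroes, such as a sparse LUN or a fresh loopback file, this does no writes at all.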

ric


>> When regular inode allocation are going too fast, there is a chance that
>> it hits the group with uninitialized inode table sooner than the
>> itableinitd thread. In that case it just initializes the itable for
>> itself the same way that itableinitd thread would do eventually.
>
> I think this is the correct approach.  It should only take a fraction of a second to zero the few MB in the itable being accessed, and is still much faster than waiting for mke2fs to zero all of them.
>
>> To prevent race conditions, each group is protected by the mutex.
>
>> +/* Get pointer to lazyinit thread mutex li_mtx for particular group */
>> +static inline struct mutex *ext4_li_mutex_ptr(struct super_block *sb,
>> +						      ext4_group_t group)
>> +{
>> +	return &EXT4_SB(sb)->s_li_info->li_mtx[group];
>> +}
>
> Note that even allocating a single pointer per group is too much on a very large filesystem.  At 1TB this is 8192 groups * 32 bytes/ptr = 256kB (struct mutex may be even larger if debug is enabled).  This will fail allocation.
>
> There is already the per-group alloc_sem in ext4_group_info that could be used.  Yes, it would mean that itable zeroing can conflict with block allocation for that group, but I consider this a very low-probability event, since blocks are normally allocated from the group where the inode was allocated, and this lock is only held once in the lifetime of the filesystem.
>
>> @@ -123,6 +123,11 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t
>> +	if (!(desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) &&
>> +	      ext4_itable_init_allowed(sb))
>> +		ext4_init_inode_table(sb, block_group);
>
> This should be marked "unlikely()" since it will only ever be true right after formatting.
>
>> +/*
>> + * Initializes an uninitialized inode table - just write zeroes through
>> + * the whole inode table. Must be called without group spinlock. Since
>> + * this is called from itableinitd thread as well as from ext4_new_inode
>> + * there are mutexes in s_li_info to prevent race conditions.
>
> This comment would need to be changed to reflect that the locks are not in s_li_info.
>
>> Do not call
>> + * this withou s_li_info uninitialized. It s_li_info is not initialized
>
> Some typos here...  Should be:
>
> "Do not call this with s_li_info uninitialized.  If s_li_info is not ..."
>
>> + * user does not want to init inode tables, or they are already zeroed.
>
> That said, it makes sense to allow this function to be called without the need
> for s_li_info at all.  Consider the case of resizing a filesystem, this can be
> used to zero out the inode table, instead of the much-less-efficient code in setup_new_group_blocks().  That code currently journals all of the itable blocks to handle crash recovery with a write cache, but with the improved barrier code that is no longer necessary.
>
>> +extern int ext4_init_inode_table(struct super_block *sb, ext4_group_t group)
>> +{
>> +	BUG_ON(NULL == sbi->s_li_info);
>
> It looks like s_li_info is only needed for the mutex, which is no longer needed, so this requirement can just be removed.
>
>> +	if (sb->s_flags & MS_RDONLY) {
>> +		ext4_warning(sb, "Filesystem mounter read only. "
>
> s/mounter/mounted/
>
>> +	handle = ext4_journal_start_sb(sb, 1);
>> +	if (IS_ERR(handle)) {
>> +		ret = PTR_ERR(handle);
>> +		return ret;
>> +	}
>> +
>> +	gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
>> +	if (!gdp)
>> +		return ret;
>> +
>> +	blk = ext4_inode_table(sb, gdp);
>> +	num = sbi->s_itb_per_group - 1;
>> +
>> +	ext4_li_lock(sb, group);
>> +	ext4_lock_group(sb, group);
>
> It's generally a bad idea to start a journal handle and then grab blocking locks while holding the handle.  I always try to get all of the potential failure cases out of the way before starting the journal handle, so that the only way it can fail is because the journal was aborted or the IO fails.
>
> There will also be additional lock ordering issues with ext4_group_info->alloc_sem and the journal.  It looks like alloc_sem is gotten after the journal handle in ext4_add_groupblocks(), but alloc_sem looks to be held BEFORE the journal handle in the mballoc code (AFAICS, it isn't totally clear from the code).
>
>> +
>> +	if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) {
>> +		BUFFER_TRACE(group_desc_bh, "get_write_access");
>> +		ret = ext4_journal_get_write_access(handle,
>> +						    group_desc_bh);
>> +		if (ret)
>> +			goto err_out;
>> +
>> +		ext4_unlock_group(sb, group);
>> +		ret = sb_issue_zeroout(sb, blk, num);
>
> Note that there is also ext4_ext_zeroout(), which is zeroing fallocate() chunks on disk in a slightly different manner (submit_bio() with a bio of a zeroed page).  It makes sense to have only a single mechanism for doing this, and if sb_issue_zeroout() is the right way to do that, ext4_ext_zeroout() should also be changed to use this.
>
>> +static int ext4_lazyinit_thread(void *arg)
>> +{
>> +	spin_lock(&eli->li_state_lock);
>> +	for (group = 0; group < ngroups; group++) {
>> +
>> +		if (eli->li_state & EXT4_LAZYINIT_QUIT)
>> +			break;
>> +
>> +		gdp = ext4_get_group_desc(sb, group, NULL);
>> +		if (!gdp)
>> +			continue;
>> +
>> +		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))
>> +			continue;
>
> For a very large filesystem, this may spin through a very large number of groups, so I wonder if it makes sense to yield the spinlock here.  That may not be needed if other "on demand" itable zeroing threads do not need to get li_state_lock, which then makes me wonder why this lock is needed if there are no other threads contending on it?
>
>> +		ret = ext4_init_inode_table(sb, group);
>> +		if (ret)
>> +			goto exit_thread;
>
> This can't call ext4_init_inode_table() with the spinlock held, since that is calling sleeping functions (journal start, disk IO, etc).
>
>> +static int ext4_lazyinit_start_thread(struct ext4_li_info *eli)
>> +{
>> +	t = kthread_run(ext4_lazyinit_thread, eli, "itableinitd");
>
> This name doesn't really tell an uninformed user what the thread is doing.  Something like "ext4init-08:01" or similar would tell the user that this thread is ext4-related, it is initializing something (granted, there isn't room to say it is zeroing the itable, but my hope is that this thread will also do other things like check the group descriptor checksums at startup also), and it is working on block device 08:01.
>
> On that related note, it may make sense to handle the itable zeroing for all filesystems from a single thread, instead of possibly starting multiple threads all writing to the same block device and causing a lot of seeking.  That means that at thread startup time it should verify that only a single thread is started, and all registered filesystems should put their zeroing tasks onto a list (under spinlock) that the one thread checks before it exits.
>
>
>> +static int ext4_has_uninit_itable(struct super_block *sb)
>> +{
>> +	ext4_group_t i, ngroups = EXT4_SB(sb)->s_groups_count;
>> +	struct ext4_group_desc *gdp = NULL;
>> +	int ret = 1;
>> +
>> +	for (i = 0; i < ngroups; i++) {
>> +		gdp = ext4_get_group_desc(sb, i, NULL);
>> +		if (!gdp)
>> +			continue;
>> +
>> +		if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)))
>> +			goto has_uinit_itb;
>
> By the time this scan is done, you may as well have started the thread to do the same scan.  That will avoid traversing up to 131072 groups for a 16TB filesystem twice, if the previous thread was interrupted near the end.
>
>> +static void ext4_stop_lazyinit_thread(struct ext4_li_info *eli)
>> +{
>> +	eli->li_state |= EXT4_LAZYINIT_QUIT;
>> +
>> +	while (eli->li_task) {
>> +		wake_up(&eli->li_wait_daemon);
>> +		spin_unlock(&eli->li_state_lock);
>> +		wait_event(eli->li_wait_task, eli->li_task == NULL);
>> +		spin_lock(&eli->li_state_lock);
>> +	}
>> +}
>
> This isn't very clear from a readability or usability standpoint.  A function shouldn't unlock a lock it didn't get.  It isn't really clear what the spinlock is needed for in this case?  It can't be needed to access "eli", otherwise the whole loop is unsafe.
>
>> +	err = ext4_create_lazyinit_thread(sb);
>> +	if (err)
>> +		ext4_msg(sb, KERN_ERR, "failed to initalize itableinitd (%d)",
>> +			 err);
>
> This message shouldn't hard-code the name of the thread.  Maybe just write:
>
> "failed to initialize inode table zeroing thread"
>
>> @@ -3723,6 +3972,19 @@ static int ext4_remount(struct super_block *sb, int
>> +
>> +	ext4_destroy_lazyinit_thread(sb);
>> +	err = ext4_create_lazyinit_thread(sb);
>
> Rather than killing the old thread and starting a new one (which will have to scan all of the groups again) it makes sense to just silently fail the startup of the new thread if one is already running.
>
>
> Cheers, Andreas
>
>
>
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-24  9:01     ` Lukas Czerner
@ 2010-08-25  6:08       ` Andreas Dilger
  2010-08-25 13:00         ` Lukas Czerner
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Dilger @ 2010-08-25  6:08 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, jack, tytso

On 2010-08-24, at 03:01, Lukas Czerner wrote:
> On Mon, 23 Aug 2010, Andreas Dilger wrote:
>> 
>> On that related note, it may make sense to handle the itable zeroing for all filesystems from a single thread, instead of possibly starting multiple threads all writing to the same block device and causing a lot of seeking.  That means that at thread startup time it should verify that only a single thread is started, and all registered filesystems should put their zeroing tasks onto a list (under spinlock) that the one thread checks before it exits.
> 
> So you are suggesting filesystem independent thread which can be used by
> any filesystem on any block device ? I really do not know, if other file
> systems will be interested in it.

No, I just mean for ext4 filesystems, not other types of filesystems.  What I'm thinking about is that a system gets newly installed with 3 or 4 ext4 filesystems (/, /var, /home, /usr) in LVM or partitions on the same disk.  If we have multiple zeroing threads going on the same disk at one time it will be slow and seeking, and will significantly impact interactive performance.

> One problem that might emerge when we have one thread for multiple
> devices is, that it can be fairly complicated to control the thread
> according to device load. So it might be better to have one thread per
> device (disk). But it is just my first guess, since I need to look more
> into it.

While one thread per disk would possibly be faster, I don't think this is a critical performance issue, and determining the filesystem-to-disk mapping is complex and not easily done.  Having a single thread initialize the inode tables in series (one filesystem at a time) is fine, and it only needs to control the device load for a single filesystem.
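
Very roughly, and only as a userspace sketch of the idea, the single thread could work through a list of registered filesystems like this. `struct li_request`, `li_register()` and `li_run()` are hypothetical names, the list would need a lock in the kernel, and the increment stands in for zeroing one group's inode table.

```c
#include <stddef.h>

/* One registered filesystem's zeroing job (hypothetical structure). */
struct li_request {
	const char *name;
	size_t next_group;	/* first group not yet zeroed */
	size_t ngroups;
	struct li_request *next;
};

/* Append to the tail; in the kernel this would be done under a lock. */
static void li_register(struct li_request **head, struct li_request *req)
{
	req->next = NULL;
	while (*head)
		head = &(*head)->next;
	*head = req;
}

/*
 * Zero at most `budget` groups, finishing the filesystem at the head of
 * the list before moving on to the next one.  Returns the number of
 * groups processed.
 */
static size_t li_run(struct li_request **head, size_t budget)
{
	size_t done = 0;

	while (*head && done < budget) {
		struct li_request *req = *head;

		if (req->next_group < req->ngroups) {
			req->next_group++;	/* stands in for zeroing one itable */
			done++;
		}
		if (req->next_group >= req->ngroups)
			*head = req->next;	/* finished: drop from the list */
	}
	return done;
}
```

The thread would exit once the list is empty, and mounting another filesystem just appends a request.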

Cheers, Andreas






^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-25  6:08       ` Andreas Dilger
@ 2010-08-25 13:00         ` Lukas Czerner
  0 siblings, 0 replies; 12+ messages in thread
From: Lukas Czerner @ 2010-08-25 13:00 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Lukas Czerner, linux-ext4, rwheeler, sandeen, jack, tytso

On Wed, 25 Aug 2010, Andreas Dilger wrote:

> On 2010-08-24, at 03:01, Lukas Czerner wrote:
> > On Mon, 23 Aug 2010, Andreas Dilger wrote:
> >> 
> >> On that related note, it may make sense to handle the itable zeroing for all filesystems from a single thread, instead of possibly starting multiple threads all writing to the same block device and causing a lot of seeking.  That means that at thread startup time it should verify that only a single thread is started, and all registered filesystems should put their zeroing tasks onto a list (under spinlock) that the one thread checks before it exits.

I agree.

> > 
> > So you are suggesting filesystem independent thread which can be used by
> > any filesystem on any block device ? I really do not know, if other file
> > systems will be interested in it.
> 
> No, I just mean for ext4 filesystems, not other types of filesystems.  What I'm thinking about is that a system gets newly installed with 3 or 4 ext4 filesystems (/, /var, /home, /usr) in LVM or partitions on the same disk.  If we have multiple zeroing threads going on the same disk at one time it will be slow and seeking, and will significantly impact interactive performance.

That sounds good to me. Thanks.

> 
> > One problem that might emerge when we have one thread for multiple
> > devices is, that it can be fairly complicated to control the thread
> > according to device load. So it might be better to have one thread per
> > device (disk). But it is just my first guess, since I need to look more
> > into it.
> 
> While one thread per disk would possibly be faster, I don't think this is a critical performance issue, and determining the filesystem-to-disk mapping is complex and not easily done.  Having a single thread initialize the inode tables in series (one filesystem at a time) is fine, and it only needs to control the device load for a single filesystem.
> 
> Cheers, Andreas

Ok, doing one file system at a time seems reasonable, but in this case
mkfs should really zero out at least some of the inode tables, so we
have some reserve before the thread reaches the next file system.

However, I think we might have a problem when there are several really big
filesystems, so in the end we will surely need some optimization
like "when there is too much traffic on one device, try another", or
"round-robin".

Regards.
-Lukas

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-20 17:51 ` [PATCH 3/3] Add inode table initialization code into Ext4 Lukas Czerner
  2010-08-23 22:59   ` Andreas Dilger
@ 2010-08-26 11:03   ` Peng Tao
  2010-08-26 12:14     ` Lukas Czerner
  1 sibling, 1 reply; 12+ messages in thread
From: Peng Tao @ 2010-08-26 11:03 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, jack, tytso, adilger

Hi, all,

On Sat, Aug 21, 2010 at 1:51 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> When lazy_itable_init extended option is passed to mke2fs, it
> considerably speed up filesystem creation because inode tables are left
> uninitialized, thus contains some old data. When this fs is mounted
> filesystem code should initialize (zero out) uninitialized inode table.
> So far this code was missing for ext4 and this patch adds this feature.
>
> When file system is mounted with "inititable" mount option, new thread
> (called itableinitd) is created. This thread walks through allocation
> groups searching for the group with not yet initialized inode table.
> When such a group is found it write zeroes through whole inode table and
> put itself into sleep for defined number of seconds to not disturb other
> ongoing I/O. This is repeated until it walks through every allocation group
> then the itableinitd thread is stopped.
This will slow down the e2fsck speed that is gained from an uninitialized
itable. Am I missing something? What about having another block group
flag to distinguish an itable that is just zeroed but not used from an
itable that is already in use?


-- 
Thanks,
-Bergwolf

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-26 11:03   ` Peng Tao
@ 2010-08-26 12:14     ` Lukas Czerner
  2010-08-26 14:15       ` Peng Tao
  0 siblings, 1 reply; 12+ messages in thread
From: Lukas Czerner @ 2010-08-26 12:14 UTC (permalink / raw)
  To: Peng Tao; +Cc: Lukas Czerner, linux-ext4, rwheeler, sandeen, jack, tytso,
	adilger

On Thu, 26 Aug 2010, Peng Tao wrote:

> Hi, all,
> 
> On Sat, Aug 21, 2010 at 1:51 AM, Lukas Czerner <lczerner@redhat.com> wrote:
> > When the lazy_itable_init extended option is passed to mke2fs, it
> > considerably speeds up filesystem creation because inode tables are left
> > uninitialized and thus contain some old data. When this fs is mounted,
> > filesystem code should initialize (zero out) the uninitialized inode tables.
> > So far this code was missing for ext4, and this patch adds this feature.
> >
> > When the file system is mounted with the "inititable" mount option, a new
> > thread (called itableinitd) is created. This thread walks through the
> > allocation groups searching for a group whose inode table is not yet
> > initialized. When such a group is found, it writes zeroes through the whole
> > inode table and puts itself to sleep for a defined number of seconds so as
> > not to disturb other ongoing I/O. This is repeated until it has walked
> > through every allocation group; then the itableinitd thread is stopped.
> This will lose the e2fsck speedup that is gained from an uninitialized
> itable. Am I missing something? What about adding another block group
> flag to distinguish an itable that has just been zeroed but not used from
> an itable that is already in use?
> 

Hi,

this is probably my fault. I should have used the term "zeroed inode table"
instead of "initialized inode table". You see, there are two flags.

* EXT4_BG_INODE_UNINIT tells us that the inode BITMAP has not been used yet,
  so no inode has been allocated from that group. The kernel therefore does
  not need to read this bitmap from the disk and can instead construct a
  fresh (zeroed) inode bitmap in memory (see ext4_init_inode_bitmap).

* EXT4_BG_INODE_ZEROED tells us whether or not the inode TABLE was
  zeroed out. This is the flag that mkfs leaves unset when the
  lazy_itable_init extended option is given. Until now this flag was not
  used for anything useful in the kernel or in e2fsck.

So when the thread is done zeroing the inode table, it sets the
EXT4_BG_INODE_ZEROED flag but leaves EXT4_BG_INODE_UNINIT as is, so
e2fsck should not be any slower.

I hope this helps.

Regards.
-Lukas


* Re: [PATCH 3/3] Add inode table initialization code into Ext4
  2010-08-26 12:14     ` Lukas Czerner
@ 2010-08-26 14:15       ` Peng Tao
  0 siblings, 0 replies; 12+ messages in thread
From: Peng Tao @ 2010-08-26 14:15 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-ext4, rwheeler, sandeen, jack, tytso, adilger

Hi, Lukas,

On Thu, Aug 26, 2010 at 8:14 PM, Lukas Czerner <lczerner@redhat.com> wrote:
> On Thu, 26 Aug 2010, Peng Tao wrote:
>
>> Hi, all,
>>
>> On Sat, Aug 21, 2010 at 1:51 AM, Lukas Czerner <lczerner@redhat.com> wrote:
>> > When the lazy_itable_init extended option is passed to mke2fs, it
>> > considerably speeds up filesystem creation because inode tables are left
>> > uninitialized and thus contain some old data. When this fs is mounted,
>> > filesystem code should initialize (zero out) the uninitialized inode tables.
>> > So far this code was missing for ext4, and this patch adds this feature.
>> >
>> > When the file system is mounted with the "inititable" mount option, a new
>> > thread (called itableinitd) is created. This thread walks through the
>> > allocation groups searching for a group whose inode table is not yet
>> > initialized. When such a group is found, it writes zeroes through the whole
>> > inode table and puts itself to sleep for a defined number of seconds so as
>> > not to disturb other ongoing I/O. This is repeated until it has walked
>> > through every allocation group; then the itableinitd thread is stopped.
>> This will lose the e2fsck speedup that is gained from an uninitialized
>> itable. Am I missing something? What about adding another block group
>> flag to distinguish an itable that has just been zeroed but not used from
>> an itable that is already in use?
>>
>
> Hi,
>
> this is probably my fault. I should have used the term "zeroed inode table"
> instead of "initialized inode table". You see, there are two flags.
>
> * EXT4_BG_INODE_UNINIT tells us that the inode BITMAP has not been used yet,
>  so no inode has been allocated from that group. The kernel therefore does
>  not need to read this bitmap from the disk and can instead construct a
>  fresh (zeroed) inode bitmap in memory (see ext4_init_inode_bitmap).
>
> * EXT4_BG_INODE_ZEROED tells us whether or not the inode TABLE was
>  zeroed out. This is the flag that mkfs leaves unset when the
>  lazy_itable_init extended option is given. Until now this flag was not
>  used for anything useful in the kernel or in e2fsck.
>
> So when the thread is done zeroing the inode table, it sets the
> EXT4_BG_INODE_ZEROED flag but leaves EXT4_BG_INODE_UNINIT as is, so
> e2fsck should not be any slower.
I see. Thank you very much for the explanation.



-- 
Thanks,
-Bergwolf


Thread overview: 12+ messages
2010-08-20 17:51 [PATCH 0/3] Lazy itable initialization for Ext4 Lukas Czerner
2010-08-20 17:51 ` [PATCH 1/3] Add helper function for blkdev_issue_zeroout Lukas Czerner
2010-08-20 17:51 ` [PATCH 2/3] Add inititable/noinititable mount options for ext4 Lukas Czerner
2010-08-20 17:51 ` [PATCH 3/3] Add inode table initialization code into Ext4 Lukas Czerner
2010-08-23 22:59   ` Andreas Dilger
2010-08-24  9:01     ` Lukas Czerner
2010-08-25  6:08       ` Andreas Dilger
2010-08-25 13:00         ` Lukas Czerner
2010-08-24 12:38     ` Ric Wheeler
2010-08-26 11:03   ` Peng Tao
2010-08-26 12:14     ` Lukas Czerner
2010-08-26 14:15       ` Peng Tao
