* ext4-block-reservation.patch
@ 2007-06-19 10:29 Aneesh Kumar K.V
2007-06-19 10:42 ` ext4-block-reservation.patch Alex Tomas
2007-06-19 15:05 ` ext4-block-reservation.patch Eric Sandeen
0 siblings, 2 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2007-06-19 10:29 UTC (permalink / raw)
To: alex, linux-ext4
Hi,
In the block reservation code, why does rebalancing the free blocks skip the
reservation slots that have no free blocks left? Rebalancing the free blocks
equally across all the reservation slots would reduce the chance of failure
later, when we try to reserve blocks.
I understand that rebalancing does consider the CPU slot on which the
reservation failed. But what prevents it from also considering other CPU slots
that might have zero blocks left?
+void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
+{
+	int i, used_slots = 0;
+	__u64 chunk;
+
+	/* let's know what slots have been used */
+	for (i = 0; i < NR_CPUS; i++)
+		if (rs[i].rs_reserved || i == smp_processor_id())
+			used_slots++;
+
+	/* chunk is a number of block every used
+	 * slot will get. make sure it isn't 0 */
+	chunk = free + used_slots - 1;
+	do_div(chunk, used_slots);
+
+	for (i = 0; i < NR_CPUS; i++) {
+		if (free < chunk)
+			chunk = free;
+		if (rs[i].rs_reserved || i == smp_processor_id()) {
+			rs[i].rs_reserved = chunk;
+			free -= chunk;
+			BUG_ON(free < 0);
+		}
+	}
+	BUG_ON(free);
+}
-aneesh
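For readers following the arithmetic, here is a minimal userspace model of the loop quoted above. It is a sketch, not kernel code: `NSLOTS`, `rebalance()` and the `self` parameter (standing in for `smp_processor_id()`) are names made up for illustration, and plain division replaces `do_div()`. It shows the behaviour the question is about: an empty slot that is not the caller's receives nothing.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS 4	/* hypothetical: stands in for the per-CPU slot array */

/*
 * Userspace model of the quoted loop: only slots that already hold
 * blocks, plus the caller's slot `self`, receive a share of `free`.
 */
static void rebalance(uint64_t slot[NSLOTS], int self, uint64_t free)
{
	uint64_t chunk;
	int i, used = 0;

	/* count participating slots; `self` always counts, so used >= 1 */
	for (i = 0; i < NSLOTS; i++)
		if (slot[i] || i == self)
			used++;

	/* ceiling division: chunk is nonzero whenever free is nonzero */
	chunk = (free + used - 1) / used;

	for (i = 0; i < NSLOTS; i++) {
		if (free < chunk)
			chunk = free;
		if (slot[i] || i == self) {
			slot[i] = chunk;
			free -= chunk;
		}
	}
	assert(free == 0);	/* everything has been handed out */
}
```

Called with slots {7, 0, 3, 0}, `self` = 1 and 10 blocks to hand out, three slots take part, the chunk is 4, and the slots end up as {4, 4, 2, 0}: slot 3 stays empty. Because `free` is unsigned in the patch, a check of the form `BUG_ON(free < 0)` can never fire, which is why the model leaves it out.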
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ext4-block-reservation.patch
2007-06-19 10:29 ext4-block-reservation.patch Aneesh Kumar K.V
@ 2007-06-19 10:42 ` Alex Tomas
2007-06-19 15:05 ` ext4-block-reservation.patch Eric Sandeen
1 sibling, 0 replies; 6+ messages in thread
From: Alex Tomas @ 2007-06-19 10:42 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linux-ext4
I considered the situation where a few CPUs run out of blocks at the same time to be rare.
thanks, Alex
Aneesh Kumar K.V wrote:
> Hi,
>
> In block reservation code while rebalancing the free blocks why are we not
> looking at the reservation slots that have no free blocks left. Rebalancing
> the free blocks equally across all the reservation slots will make sure
> we have less chances of failure later when we try to reserve blocks.
>
> I understand that we consider the CPU slot on which reservation failed while
> rebalancing. But what is preventing considering other CPU slot that might have
> zero blocks left ?
>
> +void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
> +{
> +	int i, used_slots = 0;
> +	__u64 chunk;
> +
> +	/* let's know what slots have been used */
> +	for (i = 0; i < NR_CPUS; i++)
> +		if (rs[i].rs_reserved || i == smp_processor_id())
> +			used_slots++;
> +
> +	/* chunk is a number of block every used
> +	 * slot will get. make sure it isn't 0 */
> +	chunk = free + used_slots - 1;
> +	do_div(chunk, used_slots);
> +
> +	for (i = 0; i < NR_CPUS; i++) {
> +		if (free < chunk)
> +			chunk = free;
> +		if (rs[i].rs_reserved || i == smp_processor_id()) {
> +			rs[i].rs_reserved = chunk;
> +			free -= chunk;
> +			BUG_ON(free < 0);
> +		}
> +	}
> +	BUG_ON(free);
> +}
>
> -aneesh
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ext4-block-reservation.patch
2007-06-19 10:29 ext4-block-reservation.patch Aneesh Kumar K.V
2007-06-19 10:42 ` ext4-block-reservation.patch Alex Tomas
@ 2007-06-19 15:05 ` Eric Sandeen
2007-06-19 17:08 ` ext4-block-reservation.patch Aneesh Kumar K.V
1 sibling, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2007-06-19 15:05 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: alex, linux-ext4
Aneesh Kumar K.V wrote:
> Hi,
>
> In block reservation code while rebalancing the free blocks why are we not
> looking at the reservation slots that have no free blocks left. Rebalancing
> the free blocks equally across all the reservation slots will make sure
> we have less chances of failure later when we try to reserve blocks.
>
> I understand that we consider the CPU slot on which reservation failed while
> rebalancing. But what is preventing considering other CPU slot that might have
> zero blocks left ?
>
> +void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
> +{
> +	int i, used_slots = 0;
> +	__u64 chunk;
> +
> +	/* let's know what slots have been used */
> +	for (i = 0; i < NR_CPUS; i++)

BTW... I think you really want:

+	for_each_possible_cpu(i) {

in this and other similar places.

NR_CPUS is a config-time option that may be much more than your actual
count of runtime possible CPUs... on ia64 it's 512 by default, for
example. That's a lot of pointlessness on a 2, 4 or 8 cpu box :)

I can whip up a proper patch for current code to send (again)...

-Eric

> +		if (rs[i].rs_reserved || i == smp_processor_id())
> +			used_slots++;
> +
> +	/* chunk is a number of block every used
> +	 * slot will get. make sure it isn't 0 */
> +	chunk = free + used_slots - 1;
> +	do_div(chunk, used_slots);
> +
> +	for (i = 0; i < NR_CPUS; i++) {
> +		if (free < chunk)
> +			chunk = free;
> +		if (rs[i].rs_reserved || i == smp_processor_id()) {
> +			rs[i].rs_reserved = chunk;
> +			free -= chunk;
> +			BUG_ON(free < 0);
> +		}
> +	}
> +	BUG_ON(free);
> +}
>
> -aneesh
^ permalink raw reply [flat|nested] 6+ messages in thread
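The cost difference Eric describes can be sketched in userspace. This is only a model under stated assumptions: `MODEL_NR_CPUS`, `lowest_set_bit()`, `sum_all_slots()` and `sum_possible_slots()` are hypothetical names, and a 64-bit word stands in for the kernel's possible-CPU mask, which `for_each_possible_cpu()` walks in the real kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* hypothetical compile-time ceiling */

/* Index of the lowest set bit; mask must be nonzero. */
static int lowest_set_bit(uint64_t mask)
{
	int bit = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		bit++;
	}
	return bit;
}

/* Sums every slot below the compile-time ceiling; *visits counts loop trips. */
static uint64_t sum_all_slots(const uint64_t slot[MODEL_NR_CPUS], int *visits)
{
	uint64_t total = 0;

	*visits = 0;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		total += slot[cpu];
		(*visits)++;
	}
	return total;
}

/* Sums only the slots whose bit is set in `possible`. */
static uint64_t sum_possible_slots(const uint64_t slot[MODEL_NR_CPUS],
				   uint64_t possible, int *visits)
{
	uint64_t total = 0;

	*visits = 0;
	while (possible) {
		int cpu = lowest_set_bit(possible);

		total += slot[cpu];
		possible &= possible - 1;	/* clear the bit just handled */
		(*visits)++;
	}
	return total;
}
```

With four possible CPUs (mask `0xF`) both functions return the same total, but the fixed-bound loop makes 64 trips where the mask walk makes 4. In the patch each trip also takes a spinlock, so the gap matters more there than in this model.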
* Re: ext4-block-reservation.patch
2007-06-19 15:05 ` ext4-block-reservation.patch Eric Sandeen
@ 2007-06-19 17:08 ` Aneesh Kumar K.V
2007-06-19 17:16 ` ext4-block-reservation.patch Aneesh Kumar K.V
2007-06-19 22:10 ` ext4-block-reservation.patch Andreas Dilger
0 siblings, 2 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2007-06-19 17:08 UTC (permalink / raw)
To: Eric Sandeen; +Cc: alex, linux-ext4
[-- Attachment #1: Type: text/plain, Size: 1374 bytes --]
Eric Sandeen wrote:
> Aneesh Kumar K.V wrote:
>> Hi,
>>
>> In block reservation code while rebalancing the free blocks why are we not
>> looking at the reservation slots that have no free blocks left. Rebalancing
>> the free blocks equally across all the reservation slots will make sure
>> we have less chances of failure later when we try to reserve blocks.
>>
>> I understand that we consider the CPU slot on which reservation failed while
>> rebalancing. But what is preventing considering other CPU slot that might have
>> zero blocks left ?
>>
>> +void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
>> +{
>> +	int i, used_slots = 0;
>> +	__u64 chunk;
>> +
>> +	/* let's know what slots have been used */
>> +	for (i = 0; i < NR_CPUS; i++)
>
> BTW... I think you really want:
>
> +	for_each_possible_cpu(i) {
>
> in this and other similar places.
>
> NR_CPUS is a config-time option that may be much more than your actual
> count of runtime possible CPUs... on ia64 it's 512 by default, for
> example. That's a lot of pointlessness on a 2, 4 or 8 cpu box :)
>
> I can whip up a proper patch for current code to send (again)...
>

This is what I have modified. I have yet to build-test it. I am looking at
forward-porting the mballoc patches and was planning to send them together.

-aneesh
[-- Attachment #2: 0001-RFC-delayed-allocation-for-ext4.patch --]
[-- Type: text/x-patch, Size: 8069 bytes --]
From 12a6e86df40b258c91883b4436a2e2e0d588cda7 Mon Sep 17 00:00:00 2001
From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date: Tue, 19 Jun 2007 22:25:45 +0530
Subject: [PATCH] [RFC] delayed allocation for ext4

ext4-block-reservation.patch

this is scalable free space management. every time we delay
allocation of some page, a space (including metadata) should be reserved

From: Alex Tomas <alex@clusterfs.com>
---
 fs/ext4/balloc.c           |  178 +++++++++++++++++++++++++++++++++++++++++++-
 fs/ext4/super.c            |    2 +
 include/linux/ext4_fs.h    |    5 +
 include/linux/ext4_fs_sb.h |    5 +
 4 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index edde262..ad3f57c 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -630,8 +630,10 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 		return;
 	}
 	ext4_free_blocks_sb(handle, sb, block, count, &dquot_freed_blocks);
-	if (dquot_freed_blocks)
+	if (dquot_freed_blocks) {
+		ext4_release_blocks(sb, dquot_freed_blocks);
 		DQUOT_FREE_BLOCK(inode, dquot_freed_blocks);
+	}
 	return;
 }
 
@@ -1440,7 +1442,7 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 	struct ext4_sb_info *sbi;
 	struct ext4_reserve_window_node *my_rsv = NULL;
 	struct ext4_block_alloc_info *block_i;
-	unsigned short windowsz = 0;
+	unsigned short windowsz = 0, reserved = 0;
 #ifdef EXT4FS_DEBUG
 	static int goal_hits, goal_attempts;
 #endif
@@ -1462,6 +1464,13 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 		return 0;
 	}
 
+	if (!(EXT4_I(inode)->i_state & EXT4_STATE_BLOCKS_RESERVED)) {
+		*errp = ext4_reserve_blocks(sb, num);
+		if (*errp)
+			return 0;
+		reserved = num;
+	}
+
 	sbi = EXT4_SB(sb);
 	es = EXT4_SB(sb)->s_es;
 	ext4_debug("goal=%lu.\n", goal);
@@ -1674,8 +1683,11 @@ out:
 	/*
 	 * Undo the block allocation
 	 */
-	if (!performed_allocation)
+	if (!performed_allocation) {
 		DQUOT_FREE_BLOCK(inode, *count);
+		if (reserved)
+			ext4_release_blocks(sb, reserved);
+	}
 	brelse(bitmap_bh);
 	return 0;
 }
@@ -1834,3 +1846,163 @@ unsigned long ext4_bg_num_gdb(struct super_block *sb, int group)
 	return ext4_bg_num_gdb_meta(sb,group);
 
 }
+
+/*
+ * reservation.c contains routines to reserve blocks.
+ * we need this for delayed allocation, otherwise we
+ * could meet -ENOSPC at flush time
+ */
+
+/*
+ * as ->commit_write() where we're going to reserve
+ * non-allocated-yet blocks is well known hotpath,
+ * we have to make it scalable and avoid global
+ * data as much as possible
+ *
+ * there is per-sb array
+ */
+
+struct ext4_reservation_slot {
+	__u64 rs_reserved;
+	spinlock_t rs_lock;
+} ____cacheline_aligned;
+
+
+int ext4_reserve_local(struct super_block *sb, int blocks)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_reservation_slot *rs;
+	int rc = -ENOSPC;
+
+	preempt_disable();
+	rs = sbi->s_reservation_slots + smp_processor_id();
+
+	spin_lock(&rs->rs_lock);
+	if (likely(rs->rs_reserved >= blocks)) {
+		rs->rs_reserved -= blocks;
+		rc = 0;
+	}
+	spin_unlock(&rs->rs_lock);
+
+	preempt_enable();
+	return rc;
+}
+
+
+void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
+{
+	int i, used_slots = 0;
+	__u64 chunk;
+
+	/* let's know what slots have been used */
+	for_each_possible_cpu(i) {
+		if (rs[i].rs_reserved || i == smp_processor_id())
+			used_slots++;
+	}
+
+	/* chunk is a number of block every used
+	 * slot will get. make sure it isn't 0 */
+	chunk = free + used_slots - 1;
+	do_div(chunk, used_slots);
+
+	for_each_possible_cpu(i) {
+		if (free < chunk)
+			chunk = free;
+		if (rs[i].rs_reserved || i == smp_processor_id()) {
+			rs[i].rs_reserved = chunk;
+			free -= chunk;
+			BUG_ON(free < 0);
+		}
+	}
+	BUG_ON(free);
+}
+
+int ext4_reserve_global(struct super_block *sb, int blocks)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_reservation_slot *rs;
+	int i, rc = -ENOENT;
+	__u64 free = 0;
+
+	rs = sbi->s_reservation_slots;
+
+	/* lock all slots */
+	for_each_possible_cpu(i) {
+		spin_lock(&rs[i].rs_lock);
+		free += rs[i].rs_reserved;
+	}
+
+	if (free >= blocks) {
+		free -= blocks;
+		ext4_rebalance_reservation(rs, free);
+		rc = 0;
+	}
+
+	for_each_possible_cpu(i) {
+		spin_unlock(&rs[i].rs_lock);
+	}
+
+	return rc;
+}
+
+int ext4_reserve_blocks(struct super_block *sb, int blocks)
+{
+	int ret;
+
+	BUG_ON(blocks <= 0);
+
+	ret = ext4_reserve_local(sb, blocks);
+	if (likely(ret == 0))
+		return 0;
+
+	return ext4_reserve_global(sb, blocks);
+}
+
+void ext4_release_blocks(struct super_block *sb, int blocks)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_reservation_slot *rs;
+
+	BUG_ON(blocks <= 0);
+
+	preempt_disable();
+	rs = sbi->s_reservation_slots + smp_processor_id();
+
+	spin_lock(&rs->rs_lock);
+	rs->rs_reserved += blocks;
+	spin_unlock(&rs->rs_lock);
+
+	preempt_enable();
+}
+
+int ext4_reserve_init(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_reservation_slot *rs;
+	int i;
+
+	rs = percpu_alloc(sizeof(struct ext4_reservation_slot), GFP_KERNEL);
+	if (rs == NULL)
+		return -ENOMEM;
+	sbi->s_reservation_slots = rs;
+
+	for_each_possible_cpu(i) {
+		spin_lock_init(&rs[i].rs_lock);
+		rs[i].rs_reserved = 0;
+	}
+	rs[0].rs_reserved = percpu_counter_sum(&sbi->s_freeblocks_counter);
+
+	return 0;
+}
+
+void ext4_reserve_release(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_reservation_slot *rs;
+
+	rs = sbi->s_reservation_slots;
+	BUG_ON(sbi->s_reservation_slots == NULL);
+	kfree(sbi->s_reservation_slots);
+	sbi->s_reservation_slots = NULL;
+}
+
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9013018..8abd919 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -439,6 +439,7 @@ static void ext4_put_super (struct super_block * sb)
 	struct ext4_super_block *es = sbi->s_es;
 	int i;
 
+	ext4_reserve_release(sb);
 	ext4_ext_release(sb);
 	ext4_xattr_put_super(sb);
 	jbd2_journal_destroy(sbi->s_journal);
@@ -1917,6 +1918,7 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 		"writeback");
 
 	ext4_ext_init(sb);
+	ext4_reserve_init(sb);
 
 	lock_kernel();
 	return 0;
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 009dccf..fac942a 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -203,6 +203,7 @@ struct ext4_group_desc
 #define EXT4_STATE_NEW			0x00000002 /* inode is newly created */
 #define EXT4_STATE_XATTR		0x00000004 /* has in-inode xattrs */
 #define EXT4_STATE_NO_EXPAND		0x00000008 /* No space for expansion */
+#define EXT4_STATE_BLOCKS_RESERVED	0x00000010 /* blocks reserved */
 
 /* Used to pass group descriptor data when online resize is done */
 struct ext4_new_group_input {
@@ -901,6 +902,10 @@ extern struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
 extern int ext4_should_retry_alloc(struct super_block *sb, int *retries);
 extern void ext4_init_block_alloc_info(struct inode *);
 extern void ext4_rsv_window_add(struct super_block *sb, struct ext4_reserve_window_node *rsv);
+int ext4_reserve_init(struct super_block *sb);
+void ext4_reserve_release(struct super_block *sb);
+void ext4_release_blocks(struct super_block *sb, int blocks);
+int ext4_reserve_blocks(struct super_block *sb, int blocks);
 
 /* dir.c */
 extern int ext4_check_dir_entry(const char *, struct inode *,
diff --git a/include/linux/ext4_fs_sb.h b/include/linux/ext4_fs_sb.h
index c9dc1d7..6923f65 100644
--- a/include/linux/ext4_fs_sb.h
+++ b/include/linux/ext4_fs_sb.h
@@ -24,6 +24,8 @@
 #endif
 #include <linux/rbtree.h>
 
+struct ext4_reservation_slot;
+
 /*
  * third extended-fs super-block data in memory
  */
@@ -65,6 +67,9 @@ struct ext4_sb_info {
 	struct rb_root s_rsv_window_root;
 	struct ext4_reserve_window_node s_rsv_window_head;
 
+	/* global reservation structures */
+	struct ext4_reservation_slot *s_reservation_slots;
+
 	/* Journaling */
 	struct inode * s_journal_inode;
 	struct journal_s * s_journal;
-- 
1.5.2.2.238.g7cbf2f2-dirty
^ permalink raw reply related [flat|nested] 6+ messages in thread
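The two-tier scheme in the patch above (try the caller's own slot first, fall back to pooling every slot) can be followed in a single-threaded userspace model. This is a sketch under assumptions, not the patch itself: `struct pool`, `NSLOTS` and the function names are hypothetical, locking and preemption control are left out, the CPU number is passed in explicitly, and the global path returns `-ENOSPC` where the patch starts from `-ENOENT`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NSLOTS 4	/* hypothetical: one slot per modelled CPU */

struct pool {
	uint64_t slot[NSLOTS];
};

/* Mirrors ext4_reserve_init(): every free block starts in slot 0. */
static void pool_init(struct pool *p, uint64_t free_blocks)
{
	for (int i = 0; i < NSLOTS; i++)
		p->slot[i] = 0;
	p->slot[0] = free_blocks;
}

static uint64_t pool_total(const struct pool *p)
{
	uint64_t total = 0;

	for (int i = 0; i < NSLOTS; i++)
		total += p->slot[i];
	return total;
}

/* Fast path: succeed only if the caller's own slot covers the request. */
static int reserve_local(struct pool *p, int cpu, uint64_t blocks)
{
	if (p->slot[cpu] < blocks)
		return -ENOSPC;
	p->slot[cpu] -= blocks;
	return 0;
}

/*
 * Slow path: pool every slot, take the request out, then split the rest
 * among the slots that were in use plus the caller's slot.
 */
static int reserve_global(struct pool *p, int cpu, uint64_t blocks)
{
	uint64_t free = pool_total(p), chunk;
	int i, used = 0;

	if (free < blocks)
		return -ENOSPC;	/* assumption: the patch starts from -ENOENT */
	free -= blocks;

	for (i = 0; i < NSLOTS; i++)
		if (p->slot[i] || i == cpu)
			used++;
	chunk = (free + used - 1) / used;	/* ceiling division */

	for (i = 0; i < NSLOTS; i++) {
		if (free < chunk)
			chunk = free;
		if (p->slot[i] || i == cpu) {
			p->slot[i] = chunk;
			free -= chunk;
		}
	}
	assert(free == 0);
	return 0;
}

static int reserve_blocks(struct pool *p, int cpu, uint64_t blocks)
{
	if (reserve_local(p, cpu, blocks) == 0)
		return 0;
	return reserve_global(p, cpu, blocks);
}

/* Mirrors ext4_release_blocks(): blocks return to the caller's slot. */
static void release_blocks(struct pool *p, int cpu, uint64_t blocks)
{
	p->slot[cpu] += blocks;
}
```

Starting from 100 free blocks, a first reservation of 10 on CPU 2 misses locally (slot 2 is empty), goes global, and leaves slots 0 and 2 with 45 blocks each. A follow-up reservation of 5 on CPU 2 is then served from the local slot alone. A request larger than the pooled total fails and leaves every slot untouched.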
* Re: ext4-block-reservation.patch
2007-06-19 17:08 ` ext4-block-reservation.patch Aneesh Kumar K.V
@ 2007-06-19 17:16 ` Aneesh Kumar K.V
2007-06-19 22:10 ` ext4-block-reservation.patch Andreas Dilger
1 sibling, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2007-06-19 17:16 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Eric Sandeen, alex, linux-ext4
[-- Attachment #1: Type: text/plain, Size: 1559 bytes --]
Aneesh Kumar K.V wrote:
>
> Eric Sandeen wrote:
>> Aneesh Kumar K.V wrote:
>>> Hi,
>>>
>>> In block reservation code while rebalancing the free blocks why are we not
>>> looking at the reservation slots that have no free blocks left. Rebalancing
>>> the free blocks equally across all the reservation slots will make sure
>>> we have less chances of failure later when we try to reserve blocks.
>>>
>>> I understand that we consider the CPU slot on which reservation failed while
>>> rebalancing. But what is preventing considering other CPU slot that might have
>>> zero blocks left ?
>>>
>>> +void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
>>> +{
>>> +	int i, used_slots = 0;
>>> +	__u64 chunk;
>>> +
>>> +	/* let's know what slots have been used */
>>> +	for (i = 0; i < NR_CPUS; i++)
>>
>> BTW... I think you really want:
>>
>> +	for_each_possible_cpu(i) {
>>
>> in this and other similar places.
>>
>> NR_CPUS is a config-time option that may be much more than your actual
>> count of runtime possible CPUs... on ia64 it's 512 by default, for
>> example. That's a lot of pointlessness on a 2, 4 or 8 cpu box :)
>>
>> I can whip up a proper patch for current code to send (again)...
>>
>
> This is what i have modified. I am yet to build test it. I am looking at
> forward porting the
> mballoc patches and was planning to send it together.
>

I build-tested the patch. I guess we need the small patch below on top of the
previous one.

-aneesh
[-- Attachment #2: k --]
[-- Type: text/plain, Size: 432 bytes --]
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index ad3f57c..df6b83c 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -1981,7 +1981,7 @@ int ext4_reserve_init(struct super_block *sb)
 	struct ext4_reservation_slot *rs;
 	int i;
 
-	rs = percpu_alloc(sizeof(struct ext4_reservation_slot), GFP_KERNEL);
+	rs = alloc_percpu(struct ext4_reservation_slot);
 	if (rs == NULL)
 		return -ENOMEM;
 	sbi->s_reservation_slots = rs;
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: ext4-block-reservation.patch
2007-06-19 17:08 ` ext4-block-reservation.patch Aneesh Kumar K.V
2007-06-19 17:16 ` ext4-block-reservation.patch Aneesh Kumar K.V
@ 2007-06-19 22:10 ` Andreas Dilger
1 sibling, 0 replies; 6+ messages in thread
From: Andreas Dilger @ 2007-06-19 22:10 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Alex Tomas, linux-ext4, Eric Sandeen
On Jun 19, 2007 22:38 +0530, Aneesh Kumar K.V wrote:
> This is what i have modified. I am yet to build test it. I am looking at
> forward porting the
> mballoc patches and was planning to send it together.

> +int ext4_reserve_local(struct super_block *sb, int blocks)
> +{
> +	preempt_disable();
> +	rs = sbi->s_reservation_slots + smp_processor_id();

Should this instead be "rs = sbi->s_reservation_slots + get_cpu()"?

> +	spin_lock(&rs->rs_lock);
> +	if (likely(rs->rs_reserved >= blocks)) {
> +		rs->rs_reserved -= blocks;
> +		rc = 0;
> +	}
> +	spin_unlock(&rs->rs_lock);
> +
> +	preempt_enable();

And "put_cpu()" here?

> +void ext4_rebalance_reservation(struct ext4_reservation_slot *rs, __u64 free)
> +{
> +	/* chunk is a number of block every used
> +	 * slot will get. make sure it isn't 0 */
> +	chunk = free + used_slots - 1;
> +	do_div(chunk, used_slots);
> +
> +	for_each_possible_cpu(i) {
> +		if (free < chunk)
> +			chunk = free;
> +		if (rs[i].rs_reserved || i == smp_processor_id()) {
> +			rs[i].rs_reserved = chunk;
> +			free -= chunk;
> +			BUG_ON(free < 0);
> +		}
> +	}

Should we be assigning reservations to offline CPUs? Doesn't it make sense
to assign 0 reservation to offline CPUs until they come back? In the first
loop, if it is "for_each_possible_cpu()" it would drop reservations from
offline CPUs, and then the bottom one is "for_each_online_cpu()".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
^ permalink raw reply [flat|nested] 6+ messages in thread
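One possible reading of Andreas's suggestion, sketched as a userspace model: a pass over every possible slot empties the offline ones, and the pooled blocks are then split only among online slots. The names `NSLOTS`, `rebalance_online()`, `online[]` and `self` are hypothetical, and the sketch assumes `free` was summed over all possible slots beforehand, as `ext4_reserve_global()` does, so that blocks held by offline slots are pooled rather than lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 4	/* hypothetical: number of possible CPUs in the model */

/*
 * Offline slots are emptied; `free` is split among online slots that
 * were in use, plus the caller's slot `self` (which must be online).
 */
static void rebalance_online(uint64_t slot[NSLOTS], const bool online[NSLOTS],
			     int self, uint64_t free)
{
	bool take[NSLOTS];
	uint64_t chunk;
	int i, used = 0;

	/* pass over every possible slot: pick participants, drop the rest */
	for (i = 0; i < NSLOTS; i++) {
		take[i] = online[i] && (slot[i] || i == self);
		if (take[i])
			used++;
		if (!online[i])
			slot[i] = 0;
	}
	assert(used > 0);	/* holds because the caller's CPU is online */

	chunk = (free + used - 1) / used;	/* ceiling division */

	/* pass over participants only: hand out the pooled blocks */
	for (i = 0; i < NSLOTS; i++) {
		if (!take[i])
			continue;
		if (free < chunk)
			chunk = free;
		slot[i] = chunk;
		free -= chunk;
	}
	assert(free == 0);
}
```

With slots {6, 0, 4, 0}, CPU 2 offline, `self` = 1 and 10 pooled blocks, the result is {5, 5, 0, 0}: the 4 blocks that sat in the offline slot are redistributed to online CPUs. The participant count is taken over online slots only; counting offline slots as well would make the chunk too small and leave blocks undistributed.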