* [PATCH 0/5] mballoc: trivial code cleanup @ 2011-02-06 4:53 Coly Li 2011-02-06 8:43 ` Amir Goldstein 0 siblings, 1 reply; 7+ messages in thread From: Coly Li @ 2011-02-06 4:53 UTC (permalink / raw) To: linux-ext4; +Cc: Alex Tomas, Theodore Tso This patch set contains several trivial code cleanup to mballoc code. A basic testing is done with Linux 2.6.38-rc3+ (commit 44f2c5c8). Signed-off-by: Coly Li <bosong.ly@taobao.com> Cc: Alex Tomas <alex@clusterfs.com> Cc: Theodore Tso <tytso@google.com> --- fs/ext4/mballoc.c | 15 ++++++++++----- fs/ext4/mballoc.h | 2 +- 2 files changed, 11 insertions(+), 6 deletions(-) ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/5] mballoc: trivial code cleanup 2011-02-06 4:53 [PATCH 0/5] mballoc: trivial code cleanup Coly Li @ 2011-02-06 8:43 ` Amir Goldstein 2011-02-07 17:41 ` Ted Ts'o 0 siblings, 1 reply; 7+ messages in thread From: Amir Goldstein @ 2011-02-06 8:43 UTC (permalink / raw) To: i; +Cc: linux-ext4, Alex Tomas, Theodore Tso Hi Coly, Forgive me for hijacking your thread. I cannot comment on the cleanup, as I am new to mballoc code, but I was wondering if you could offer me a piece of advise. In the current implementation (prototype) of ext4 snapshots, I have an outstanding circular lockdep warning to fix: inode->i_data_sem => grp->alloc_sem => snapshot->i_data_sem/1 => grp->alloc_sem Snapshot block allocation can be nested inside another inode block allocation, because COW of bitmap block is triggered on get_write_access() inside ext4_mb_mark_diskspace_used(). i_data_sem nested lockdep is handled by calling down_read/write_nested() in ext4_map_blocks(): https://github.com/amir73il/ext4-snapshots/blob/ext4-snapshots/fs/ext4/inode.c#L1540 In ext3/next3, handling nested i_truncate_mutex lockdep was enough, but in ext4, I need to take care of alloc_sem and lg_mutex as well. I "handled" lg_mutex by never setting EXT4_MB_HINT_DATA on snapshot files. When looking at alloc_sem, I realized that it is only needed to avoid race with adjacent group buddy initialization. Snapshots feature, however, requires that block_size == page_size, so it seems that alloc_sem can be avoided altogether. My questions are: 1. Am I missing something in my analysis? 2. Wouldn't it make sense to bypass alloc_sem at all times if block_size == page_size (or block_size*2 >= page_size), regardless of the snapshots feature? After all, a group buddy is always initialized before calling allocation routines. Thanks for your time, Amir. On Sun, Feb 6, 2011 at 6:53 AM, Coly Li <i@coly.li> wrote: > This patch set contains several trivial code cleanup to mballoc code. > A basic testing is done with Linux 2.6.38-rc3+ (commit 44f2c5c8). > > Signed-off-by: Coly Li <bosong.ly@taobao.com> > Cc: Alex Tomas <alex@clusterfs.com> > Cc: Theodore Tso <tytso@google.com> > --- > fs/ext4/mballoc.c | 15 ++++++++++----- > fs/ext4/mballoc.h | 2 +- > 2 files changed, 11 insertions(+), 6 deletions(-) > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/5] mballoc: trivial code cleanup 2011-02-06 8:43 ` Amir Goldstein @ 2011-02-07 17:41 ` Ted Ts'o 2011-02-07 20:59 ` Amir Goldstein 0 siblings, 1 reply; 7+ messages in thread From: Ted Ts'o @ 2011-02-07 17:41 UTC (permalink / raw) To: Amir Goldstein; +Cc: i, linux-ext4, Alex Tomas, Theodore Tso On Sun, Feb 06, 2011 at 10:43:58AM +0200, Amir Goldstein wrote: > When looking at alloc_sem, I realized that it is only needed to avoid > race with adjacent group buddy initialization. Actually, alloc_sem is used to protect all of the block group specific data structures; the buddy bitmap counters, adjusting the buddy bitmap itself, the largest free order in a block group, etc. So even in the case where block_size == page_size, alloc_sem is still needed! - Ted ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/5] mballoc: trivial code cleanup 2011-02-07 17:41 ` Ted Ts'o @ 2011-02-07 20:59 ` Amir Goldstein 2011-02-07 22:24 ` Ted Ts'o 0 siblings, 1 reply; 7+ messages in thread From: Amir Goldstein @ 2011-02-07 20:59 UTC (permalink / raw) To: Ted Ts'o; +Cc: i, Ext4 Developers List On Mon, Feb 7, 2011 at 7:41 PM, Ted Ts'o <tytso@mit.edu> wrote: > On Sun, Feb 06, 2011 at 10:43:58AM +0200, Amir Goldstein wrote: >> When looking at alloc_sem, I realized that it is only needed to avoid >> race with adjacent group buddy initialization. > > Actually, alloc_sem is used to protect all of the block group specific > data structures; the buddy bitmap counters, adjusting the buddy bitmap > itself, the largest free order in a block group, etc. So even in the > case where block_size == page_size, alloc_sem is still needed! > This was my assumption by the name of the lock itself, but when I searched where down_write(alloc_sem) is called, I saw that it is called by 3 functions: mb_init_group - for lazy init of buddy cache, once per mount per blockgroup, before any alloc routines can access that blockgroup. add_groupblocks - for online resize, even before lazy init of buddy cache. init_inode_table - for lazy init of inode table, once per mkfs, per blockgroup. it seems to me like down_write(alloc_sem) may be taken in mb_init_group() *only* in case EXT4_MB_GRP_NEED_INIT(grp) and that all down_read(alloc_sem) calls in mballoc.c serve no purpose when block_size == page_size. I concluded that from the comment in mb_load_buddy(): /* Take the read lock on the group alloc * sem. This would make sure a parallel * ext4_mb_init_group happening on other * groups mapped by the page is blocked * till we are done with allocation */ it says alloc_sem protects against lazy init of adjacent groups and says nothing about protecting block group specific data structures... what am I missing??? what am I missing here??? -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/5] mballoc: trivial code cleanup 2011-02-07 20:59 ` Amir Goldstein @ 2011-02-07 22:24 ` Ted Ts'o 2011-02-08 8:54 ` Amir Goldstein 0 siblings, 1 reply; 7+ messages in thread From: Ted Ts'o @ 2011-02-07 22:24 UTC (permalink / raw) To: Amir Goldstein; +Cc: i, Ext4 Developers List On Mon, Feb 07, 2011 at 10:59:54PM +0200, Amir Goldstein wrote: > it says alloc_sem protects against lazy init of adjacent groups > and says nothing about protecting block group specific data structures... > > what am I missing??? You're missing ext4_mb_load_buddy(), which takes grp->alloc_sem, and which is released by ext4_mb_unload_buddy(). No, it's not the most obvious code in the world... - Ted ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/5] mballoc: trivial code cleanup 2011-02-07 22:24 ` Ted Ts'o @ 2011-02-08 8:54 ` Amir Goldstein 2011-02-09 9:09 ` Amir Goldstein 0 siblings, 1 reply; 7+ messages in thread From: Amir Goldstein @ 2011-02-08 8:54 UTC (permalink / raw) To: Ted Ts'o; +Cc: i, Ext4 Developers List On Tue, Feb 8, 2011 at 12:24 AM, Ted Ts'o <tytso@mit.edu> wrote: > On Mon, Feb 07, 2011 at 10:59:54PM +0200, Amir Goldstein wrote: >> it says alloc_sem protects against lazy init of adjacent groups >> and says nothing about protecting block group specific data structures... >> >> what am I missing??? > > You're missing ext4_mb_load_buddy(), which takes grp->alloc_sem, and > which is released by ext4_mb_unload_buddy(). No, it's not the most > obvious code in the world... > OK Ted, you leave me no choice... I need to paste the code of mb_load_buddy(): 1157 e4b->alloc_semp = &grp->alloc_sem; 1158 1159 /* Take the read lock on the group alloc 1160 * sem. This would make sure a parallel 1161 * ext4_mb_init_group happening on other 1162 * groups mapped by the page is blocked 1163 * till we are done with allocation 1164 */ 1165repeat_load_buddy: 1166 down_read(e4b->alloc_semp); 1167 1168 if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) { 1169 /* we need to check for group need init flag 1170 * with alloc_semp held so that we can be sure 1171 * that new blocks didn't get added to the group 1172 * when we are loading the buddy cache 1173 */ 1174 up_read(e4b->alloc_semp); 1175 /* 1176 * we need full data about the group 1177 * to make a good selection 1178 */ 1179 ret = ext4_mb_init_group(sb, group); 1180 if (ret) 1181 return ret; 1182 goto repeat_load_buddy; 1183 } 1184 ext4_mb_load_buddy() *only* takes down_read(grp->alloc_sem), except for the first time after mount, in which ext4_mb_init_group() takes down_write(grp->alloc_sem), releases it, and then repeat_load_buddy label will re-take down_read(grp->alloc_sem). Essentially, this means that after time Ti(group), all users take only read access to grp->alloc_sem, which is kind of futile... Your statement that alloc_sem is needed certainly makes sense, but I just don't see it in the code. As un-obvious as the code may be, you cannot protect data structures without anyone taking write access to the semaphore on allocation routines. Also, I believe that buddy data structures are modified in ext4_mb_generate_buddy() under the protection of ext4_lock_group(). So at the risk of having to buy you a beer on LFS I will repeat my nagging question: What am I missing??? Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/5] mballoc: trivial code cleanup 2011-02-08 8:54 ` Amir Goldstein @ 2011-02-09 9:09 ` Amir Goldstein 0 siblings, 0 replies; 7+ messages in thread From: Amir Goldstein @ 2011-02-09 9:09 UTC (permalink / raw) To: Ted Ts'o, Aneesh Kumar K.V; +Cc: i, Ext4 Developers List On Tue, Feb 8, 2011 at 10:54 AM, Amir Goldstein <amir73il@gmail.com> wrote: > On Tue, Feb 8, 2011 at 12:24 AM, Ted Ts'o <tytso@mit.edu> wrote: >> On Mon, Feb 07, 2011 at 10:59:54PM +0200, Amir Goldstein wrote: >>> it says alloc_sem protects against lazy init of adjacent groups >>> and says nothing about protecting block group specific data structures... >>> >>> what am I missing??? >> >> You're missing ext4_mb_load_buddy(), which takes grp->alloc_sem, and >> which is released by ext4_mb_unload_buddy(). No, it's not the most >> obvious code in the world... >> > > OK Ted, you leave me no choice... I need to paste the code of mb_load_buddy(): > > 1157 e4b->alloc_semp = &grp->alloc_sem; > 1158 > 1159 /* Take the read lock on the group alloc > 1160 * sem. This would make sure a parallel > 1161 * ext4_mb_init_group happening on other > 1162 * groups mapped by the page is blocked > 1163 * till we are done with allocation > 1164 */ > 1165repeat_load_buddy: > 1166 down_read(e4b->alloc_semp); > 1167 > 1168 if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) { > 1169 /* we need to check for group need init flag > 1170 * with alloc_semp held so that we can be sure > 1171 * that new blocks didn't get added to the group > 1172 * when we are loading the buddy cache > 1173 */ > 1174 up_read(e4b->alloc_semp); > 1175 /* > 1176 * we need full data about the group > 1177 * to make a good selection > 1178 */ > 1179 ret = ext4_mb_init_group(sb, group); > 1180 if (ret) > 1181 return ret; > 1182 goto repeat_load_buddy; > 1183 } > 1184 > > ext4_mb_load_buddy() *only* takes down_read(grp->alloc_sem), > except for the first time after mount, in which ext4_mb_init_group() takes > down_write(grp->alloc_sem), releases it, and then repeat_load_buddy label > will re-take down_read(grp->alloc_sem). > > Essentially, this means that after time Ti(group), all users take only read > access to grp->alloc_sem, which is kind of futile... > > Your statement that alloc_sem is needed certainly makes sense, but I just don't > see it in the code. > As un-obvious as the code may be, you cannot protect data structures > without anyone taking write access to the semaphore on allocation routines. > Also, I believe that buddy data structures are modified in > ext4_mb_generate_buddy() > under the protection of ext4_lock_group(). > > So at the risk of having to buy you a beer on LFS I will repeat my > nagging question: > What am I missing??? > I found out what I was missing (it's in the comment in line 1169 above). I wrongly assumed the EXT4_GROUP_INFO_NEED_INIT_BIT is set only once in a lifetime of an ext4_group_info, but I was wrong. It may also be set when adding blocks to an existing group from ext4_group_extend(). Still, I think that the use cases in which down_read(alloc_sem) is needed are very unlikely() and can be covered with the following check: if (blocks_per_page > 2 || group == sbi->s_groups_count - 1) /* Synchronize init of adjacent group and adding of blocks to last group */ e4b->alloc_semp = &grp->alloc_sem; else e4b->alloc_semp = NULL; I will post a patch for review. Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-02-09 9:09 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-06 4:53 [PATCH 0/5] mballoc: trivial code cleanup Coly Li 2011-02-06 8:43 ` Amir Goldstein 2011-02-07 17:41 ` Ted Ts'o 2011-02-07 20:59 ` Amir Goldstein 2011-02-07 22:24 ` Ted Ts'o 2011-02-08 8:54 ` Amir Goldstein 2011-02-09 9:09 ` Amir Goldstein
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).