* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. @ 2010-11-03 11:02 Tristan Ye 2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye 2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker 0 siblings, 2 replies; 13+ messages in thread From: Tristan Ye @ 2010-11-03 11:02 UTC (permalink / raw) To: ocfs2-devel The new code is dedicated to calculate free inodes number of all inode_allocs, then return the info to userpace in terms of an array. Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent' from userspace, is now going to be involved. setting the flag on means no cluster coherency considered, usually, userspace tools choose none-coherency strategy by default for the sake of performace. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> --- fs/ocfs2/ioctl.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/ocfs2_ioctl.h | 11 ++++++ 2 files changed, 106 insertions(+), 0 deletions(-) diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c index 7a48681..a60eaec 100644 --- a/fs/ocfs2/ioctl.c +++ b/fs/ocfs2/ioctl.c @@ -23,6 +23,7 @@ #include "ioctl.h" #include "resize.h" #include "refcounttree.h" +#include "sysfile.h" #include <linux/ext2_fs.h> @@ -299,6 +300,96 @@ bail: return status; } +int ocfs2_info_scan_inode_alloc(struct inode *inode_alloc, + struct ocfs2_info_freeinode *fi, __u32 slot) +{ + int status = 0, unlock = 0; + + struct buffer_head *bh = NULL; + struct ocfs2_dinode *dinode_alloc = NULL; + + mutex_lock(&inode_alloc->i_mutex); + + if (!(fi->ifi_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) { + status = ocfs2_inode_lock(inode_alloc, &bh, 0); + if (status < 0) { + mlog_errno(status); + goto bail; + } + unlock = 1; + + } else { + status = ocfs2_read_inode_block(inode_alloc, &bh); + if (status < 0) { + mlog_errno(status); + goto bail; + } + } + + dinode_alloc = (struct ocfs2_dinode *)bh->b_data; + + fi->ifi_stat[slot].lfi_total = + le32_to_cpu(dinode_alloc->id1.bitmap1.i_total); + fi->ifi_stat[slot].lfi_free = + le32_to_cpu(dinode_alloc->id1.bitmap1.i_total) - + le32_to_cpu(dinode_alloc->id1.bitmap1.i_used); +bail: + if (unlock) + ocfs2_inode_unlock(inode_alloc, 0); + + mutex_unlock(&inode_alloc->i_mutex); + + iput(inode_alloc); + brelse(bh); + + mlog_exit(status); + return status; +} + +int ocfs2_info_handle_freeinode(struct inode *inode, + struct ocfs2_info_request __user *req) +{ + u32 i; + int status = -EFAULT; + struct ocfs2_info_freeinode oifi; + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); + struct inode *inode_alloc = NULL; + + if (o2info_from_user(oifi, req)) + goto bail; + + oifi.ifi_slotnum = osb->max_slots; + + for (i = 0; i < oifi.ifi_slotnum; i++) { + inode_alloc = + ocfs2_get_system_file_inode(osb, + INODE_ALLOC_SYSTEM_INODE, + i); + if (!inode_alloc) { + mlog(ML_ERROR, "unable to get alloc inode in " + "slot %u\n", i); + status = -EIO; + goto bail; + } + + status = ocfs2_info_scan_inode_alloc(inode_alloc, &oifi, i); + if (status < 0) + goto bail; + } + + oifi.ifi_req.ir_flags |= OCFS2_INFO_FL_FILLED; + + if (o2info_to_user(oifi, req)) + goto bail; + + status = 0; +bail: + if (status) + o2info_set_request_error(oifi, req); + + return status; +} + int ocfs2_info_handle_unknown(struct inode *inode, struct ocfs2_info_request __user *req) { @@ -370,6 +461,10 @@ int ocfs2_info_handle_request(struct inode *inode, if (oir.ir_size == sizeof(struct ocfs2_info_journal_size)) status = ocfs2_info_handle_journal_size(inode, req); break; + case OCFS2_INFO_FREEINODE: + if (oir.ir_size == sizeof(struct ocfs2_info_freeinode)) + status = ocfs2_info_handle_freeinode(inode, req); + break; default: status = ocfs2_info_handle_unknown(inode, req); break; diff --git a/fs/ocfs2/ocfs2_ioctl.h b/fs/ocfs2/ocfs2_ioctl.h index b46f39b..6b4b39a 100644 --- a/fs/ocfs2/ocfs2_ioctl.h +++ b/fs/ocfs2/ocfs2_ioctl.h @@ -142,6 +142,16 @@ struct ocfs2_info_journal_size { __u64 ij_journal_size; }; +struct ocfs2_info_freeinode { + struct ocfs2_info_request ifi_req; + struct ocfs2_info_local_freeinode { + __u64 lfi_total; + __u64 lfi_free; + } ifi_stat[OCFS2_MAX_SLOTS]; + __u32 ifi_slotnum; /* out */ + __u32 ifi_pad; +}; + /* Codes for ocfs2_info_request */ enum ocfs2_info_type { OCFS2_INFO_CLUSTERSIZE = 1, @@ -151,6 +161,7 @@ enum ocfs2_info_type { OCFS2_INFO_UUID, OCFS2_INFO_FS_FEATURES, OCFS2_INFO_JOURNAL_SIZE, + OCFS2_INFO_FREEINODE, OCFS2_INFO_NUM_TYPES }; -- 1.5.5 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. 2010-11-03 11:02 [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl Tristan Ye @ 2010-11-03 11:02 ` Tristan Ye 2010-11-04 1:29 ` Joel Becker 2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker 1 sibling, 1 reply; 13+ messages in thread From: Tristan Ye @ 2010-11-03 11:02 UTC (permalink / raw) To: ocfs2-devel This new code is a bit more complicated than former ones, the goal is to show user all statistics required to take a deep insight into filesystem on how the disk is being fragmentaed. The goal is achieved by scaning global bitmap from (cluster)group to group to figure out following factors in the filesystem: - How many free chunks in a fixed size as user requested. - How many real free chunks in all size. - Min/Max/Avg size(in) clusters of free chunks. - How do free chunks distribute(in size) in terms of a histogram, just like following: --------------------------------------------------------- Extent Size Range : Free extents Free Clusters Percent 32K... 64K- : 1 1 0.00% 1M... 2M- : 9 288 0.03% 8M... 16M- : 2 831 0.09% 32M... 64M- : 1 2047 0.23% 128M... 256M- : 1 8191 0.92% 256M... 512M- : 2 21706 2.43% 512M... 1024M- : 27 858623 96.29% --------------------------------------------------------- Userspace ioctl() call eventually gets the above info returned by passing a 'struct ocfs2_info_freefrag' with the chunk_size being specified first. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> --- fs/ocfs2/ioctl.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/ocfs2_ioctl.h | 23 +++++ 2 files changed, 272 insertions(+), 0 deletions(-) diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c index a60eaec..51ccf24 100644 --- a/fs/ocfs2/ioctl.c +++ b/fs/ocfs2/ioctl.c @@ -24,6 +24,8 @@ #include "resize.h" #include "refcounttree.h" #include "sysfile.h" +#include "buffer_head_io.h" +#include "suballoc.h" #include <linux/ext2_fs.h> @@ -390,6 +392,249 @@ bail: return status; } +void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32 chunksize) +{ + int index; + + index = __ilog2_u32(chunksize); + if (index >= OCFS2_INFO_MAX_HIST) + index = OCFS2_INFO_MAX_HIST - 1; + + ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++; + ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize; + + if (chunksize > ffg->iff_ffs.ffs_max) + ffg->iff_ffs.ffs_max = chunksize; + + if (chunksize < ffg->iff_ffs.ffs_min) + ffg->iff_ffs.ffs_min = chunksize; + + ffg->iff_ffs.ffs_avg += chunksize; + ffg->iff_ffs.ffs_free_chunks_real++; +} + +int ocfs2_info_freefrag_scan_chain(struct inode *gb_inode, + struct ocfs2_dinode *gb_dinode, + struct ocfs2_chain_rec *rec, + struct ocfs2_info_freefrag *ffg, + __u32 chunks_in_group) +{ + int status = 0, used; + u64 blkno; + + struct buffer_head *bh = NULL; + struct ocfs2_group_desc *bg = NULL; + + unsigned int max_bits, num_clusters; + unsigned int offset = 0, cluster, chunk; + unsigned int chunk_free, last_chunksize = 0; + + if (!le32_to_cpu(rec->c_free)) + goto bail; + + do { + if (!bg) + blkno = le64_to_cpu(rec->c_blkno); + else + blkno = le64_to_cpu(bg->bg_next_group); + + if (bh) { + brelse(bh); + bh = NULL; + } + + status = ocfs2_read_group_descriptor(gb_inode, gb_dinode, + blkno, &bh); + if (status < 0) { + mlog(ML_ERROR, "Can't read the group descriptor # " + "%llu from device.", (unsigned long long)blkno); + status = -EIO; + goto bail; + } + + bg = (struct ocfs2_group_desc *)bh->b_data; + + if (!le16_to_cpu(bg->bg_free_bits_count)) + continue; + + max_bits = le16_to_cpu(bg->bg_bits); + offset = 0; + + for (chunk = 0; chunk < chunks_in_group; chunk++) { + /* + * last chunk may be not an entire one. + */ + if ((offset + ffg->iff_chunksize) > max_bits) + num_clusters = max_bits - offset; + else + num_clusters = ffg->iff_chunksize; + + chunk_free = 0; + for (cluster = 0; cluster < num_clusters; cluster++) { + used = ocfs2_test_bit(offset, + (unsigned long *)bg->bg_bitmap); + /* + * - chunk_free counts free clusters in #N chunk. + * - last_chunksize records the size(in) clusters + * for the last real free chunk being counted. + */ + if (!used) { + last_chunksize++; + chunk_free++; + } + + if (used && last_chunksize) { + ocfs2_info_update_ffg(ffg, + last_chunksize); + last_chunksize = 0; + } + + offset++; + } + + if (chunk_free == ffg->iff_chunksize) + ffg->iff_ffs.ffs_free_chunks++; + } + + /* + * need to update the info for last free chunk. + */ + if (last_chunksize) + ocfs2_info_update_ffg(ffg, last_chunksize); + + } while (le64_to_cpu(bg->bg_next_group)); + +bail: + brelse(bh); + + mlog_exit(status); + return status; +} + +int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode, + struct ocfs2_info_freefrag *ffg) +{ + __u32 chunks_in_group; + int status = 0, unlock = 0, i; + + struct buffer_head *bh = NULL; + struct ocfs2_chain_list *cl = NULL; + struct ocfs2_chain_rec *rec = NULL; + struct ocfs2_dinode *gb_dinode = NULL; + + mutex_lock(&gb_inode->i_mutex); + + if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) { + status = ocfs2_inode_lock(gb_inode, &bh, 0); + if (status < 0) { + mlog_errno(status); + goto bail; + } + unlock = 1; + } else { + status = ocfs2_read_inode_block(gb_inode, &bh); + if (status < 0) { + mlog_errno(status); + goto bail; + } + } + + gb_dinode = (struct ocfs2_dinode *)bh->b_data; + cl = &(gb_dinode->id2.i_chain); + + /* + * Chunksize(in) clusters from userspace should be + * less than clusters in a group. + */ + if (ffg->iff_chunksize > le16_to_cpu(cl->cl_cpg)) { + status = -EINVAL; + goto bail; + } + + memset(&ffg->iff_ffs, 0, sizeof(struct ocfs2_info_freefrag_stats)); + + ffg->iff_ffs.ffs_min = ~0U; + ffg->iff_ffs.ffs_clusters = + le32_to_cpu(gb_dinode->id1.bitmap1.i_total); + ffg->iff_ffs.ffs_free_clusters = ffg->iff_ffs.ffs_clusters - + le32_to_cpu(gb_dinode->id1.bitmap1.i_used); + + chunks_in_group = le16_to_cpu(cl->cl_cpg) / ffg->iff_chunksize + 1; + + for (i = 0; i < le16_to_cpu(cl->cl_next_free_rec); i++) { + rec = &(cl->cl_recs[i]); + status = ocfs2_info_freefrag_scan_chain(gb_inode, gb_dinode, + rec, ffg, + chunks_in_group); + if (status) + goto bail; + } + + if (ffg->iff_ffs.ffs_free_chunks_real) + ffg->iff_ffs.ffs_avg = (ffg->iff_ffs.ffs_avg / + ffg->iff_ffs.ffs_free_chunks_real); +bail: + if (unlock) + ocfs2_inode_unlock(gb_inode, 0); + + mutex_unlock(&gb_inode->i_mutex); + + iput(gb_inode); + brelse(bh); + + mlog_exit(status); + return status; +} + +int ocfs2_info_handle_freefrag(struct inode *inode, + struct ocfs2_info_request __user *req) +{ + int status = -EFAULT; + + struct ocfs2_info_freefrag oiff; + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); + struct inode *gb_inode = NULL; + + if (o2info_from_user(oiff, req)) + goto bail; + /* + * chunksize from userspace should be power of 2. + */ + if ((oiff.iff_chunksize & (oiff.iff_chunksize - 1)) || + (!oiff.iff_chunksize)) { + status = -EINVAL; + goto bail; + } + + memset(&oiff.iff_ffs, 0, sizeof(struct ocfs2_info_freefrag_stats)); + oiff.iff_ffs.ffs_min = ~0U; + + gb_inode = ocfs2_get_system_file_inode(osb, + GLOBAL_BITMAP_SYSTEM_INODE, + OCFS2_INVALID_SLOT); + if (!gb_inode) { + mlog(ML_ERROR, "unable to get global_bitmap inode\n"); + status = -EIO; + goto bail; + } + + status = ocfs2_info_freefrag_scan_bitmap(gb_inode, &oiff); + if (status < 0) + goto bail; + + oiff.iff_req.ir_flags |= OCFS2_INFO_FL_FILLED; + + if (o2info_to_user(oiff, req)) + goto bail; + + status = 0; +bail: + if (status) + o2info_set_request_error(oiff, req); + + return status; +} + int ocfs2_info_handle_unknown(struct inode *inode, struct ocfs2_info_request __user *req) { @@ -465,6 +710,10 @@ int ocfs2_info_handle_request(struct inode *inode, if (oir.ir_size == sizeof(struct ocfs2_info_freeinode)) status = ocfs2_info_handle_freeinode(inode, req); break; + case OCFS2_INFO_FREEFRAG: + if (oir.ir_size == sizeof(struct ocfs2_info_freefrag)) + status = ocfs2_info_handle_freefrag(inode, req); + break; default: status = ocfs2_info_handle_unknown(inode, req); break; diff --git a/fs/ocfs2/ocfs2_ioctl.h b/fs/ocfs2/ocfs2_ioctl.h index 6b4b39a..18b6770 100644 --- a/fs/ocfs2/ocfs2_ioctl.h +++ b/fs/ocfs2/ocfs2_ioctl.h @@ -152,6 +152,28 @@ struct ocfs2_info_freeinode { __u32 ifi_pad; }; +#define OCFS2_INFO_MAX_HIST (32) + +struct ocfs2_info_freefrag { + struct ocfs2_info_request iff_req; + struct ocfs2_info_freefrag_stats { /* (out) */ + struct ocfs2_info_free_chunk_list { + __u32 fc_chunks[OCFS2_INFO_MAX_HIST]; + __u32 fc_clusters[OCFS2_INFO_MAX_HIST]; + } ffs_fc_hist; + __u32 ffs_clusters; + __u32 ffs_free_clusters; + __u32 ffs_free_chunks; + __u32 ffs_free_chunks_real; + __u32 ffs_min; /* Minimum free chunksize in clusters */ + __u32 ffs_max; + __u32 ffs_avg; + __u32 ffs_pad; + } iff_ffs; + __u32 iff_chunksize; /* chunksize in clusters(in) */ + __u32 iff_pad; +}; + /* Codes for ocfs2_info_request */ enum ocfs2_info_type { OCFS2_INFO_CLUSTERSIZE = 1, @@ -162,6 +184,7 @@ enum ocfs2_info_type { OCFS2_INFO_FS_FEATURES, OCFS2_INFO_JOURNAL_SIZE, OCFS2_INFO_FREEINODE, + OCFS2_INFO_FREEFRAG, OCFS2_INFO_NUM_TYPES }; -- 1.5.5 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. 2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye @ 2010-11-04 1:29 ` Joel Becker 2010-11-04 2:56 ` tristan 0 siblings, 1 reply; 13+ messages in thread From: Joel Becker @ 2010-11-04 1:29 UTC (permalink / raw) To: ocfs2-devel On Wed, Nov 03, 2010 at 07:02:06PM +0800, Tristan Ye wrote: > This new code is a bit more complicated than former ones, the goal is to > show user all statistics required to take a deep insight into filesystem > on how the disk is being fragmentaed. > > The goal is achieved by scaning global bitmap from (cluster)group to group > to figure out following factors in the filesystem: > > - How many free chunks in a fixed size as user requested. > - How many real free chunks in all size. Let me see if I understand the above requirements. In the same call, you are going build a list of free chunk sizes. Then you will also count how many chunks of a user-specified size. Why can't the user just take the list and look at the size they want? What do they get from specifying a size to the ioctl(2)? > +void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32 chunksize) > +{ > + int index; > + > + index = __ilog2_u32(chunksize); > + if (index >= OCFS2_INFO_MAX_HIST) > + index = OCFS2_INFO_MAX_HIST - 1; > + > + ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++; > + ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize; > + > + if (chunksize > ffg->iff_ffs.ffs_max) > + ffg->iff_ffs.ffs_max = chunksize; > + > + if (chunksize < ffg->iff_ffs.ffs_min) > + ffg->iff_ffs.ffs_min = chunksize; > + > + ffg->iff_ffs.ffs_avg += chunksize; > + ffg->iff_ffs.ffs_free_chunks_real++; > +} I'm sorry, but this is just unreadable. I get that we can only abbreviate 'free-frag-scan' so many ways, but there is no way people will understand this. I don't think renaming helps too much. These names are always going to be bad. I think we *can* reduce the long structure strings, mostly with helper functions. For example, here we could have: static void o2ffg_update_histogram(struct ocfs2_info_free_chunk_list *hist, unsigned int chunksize) { int index; index = __ilog2_u32(chunksize); if (index >= OCFS2_INFO_MAX_HIST) index = OCFS2_INFO_MAX_HIST - 1; hist->fc_chunks[index]++; hist->fc_clusters[index] += chunksize; } static void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, unsigned int chunksize) { o2ffg_update_histogram(&ffg->iff_ffs.ffs_fc_hist, chunksize); ... } and so on. Btw, the chunksize can be an unsigned int, as the calling function uses that type and not the more specific __u32. > +int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode, > + struct ocfs2_info_freefrag *ffg) > +{ > + __u32 chunks_in_group; > + int status = 0, unlock = 0, i; > + > + struct buffer_head *bh = NULL; > + struct ocfs2_chain_list *cl = NULL; > + struct ocfs2_chain_rec *rec = NULL; > + struct ocfs2_dinode *gb_dinode = NULL; > + > + mutex_lock(&gb_inode->i_mutex); > + > + if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) { > + status = ocfs2_inode_lock(gb_inode, &bh, 0); > + if (status < 0) { > + mlog_errno(status); > + goto bail; > + } > + unlock = 1; > + } else { > + status = ocfs2_read_inode_block(gb_inode, &bh); > + if (status < 0) { > + mlog_errno(status); > + goto bail; > + } > + } Same thoughts about ocfs2_ilookup() here. Joel -- Life's Little Instruction Book #347 "Never waste the oppourtunity to tell someone you love them." Joel Becker Senior Development Manager Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. 2010-11-04 1:29 ` Joel Becker @ 2010-11-04 2:56 ` tristan 2010-11-04 4:47 ` tristan 2010-11-04 6:32 ` Joel Becker 0 siblings, 2 replies; 13+ messages in thread From: tristan @ 2010-11-04 2:56 UTC (permalink / raw) To: ocfs2-devel Joel Becker wrote: > On Wed, Nov 03, 2010 at 07:02:06PM +0800, Tristan Ye wrote: >> This new code is a bit more complicated than former ones, the goal is to >> show user all statistics required to take a deep insight into filesystem >> on how the disk is being fragmentaed. >> >> The goal is achieved by scaning global bitmap from (cluster)group to group >> to figure out following factors in the filesystem: >> >> - How many free chunks in a fixed size as user requested. >> - How many real free chunks in all size. > > Let me see if I understand the above requirements. In the same > call, you are going build a list of free chunk sizes. Then you will > also count how many chunks of a user-specified size. Why can't the user > just take the list and look at the size they want? What do they get > from specifying a size to the ioctl(2)? Actually, the original idea was borrowed from ext4, while it only did this in userspace however. Let me try to explain this in a detailed way, I'm not surprised that you'll be confused a little bit;-) I did maintain some lists, while they're not lists of free chunk sizes, the list was indexed by size range, see: ------------------------------------------------------------------------------------------- Extent Size Range : Free extents Free Clusters Percent 4K... 8K- : 9 9 0.00% 8K... 16K- : 12 31 0.00% 16K... 32K- : 45 249 0.03% 32K... 64K- : 376 4654 0.59% 8M... 16M- : 2 5180 0.66% 16M... 32M- : 5 27556 3.50% 32M... 64M- : 1 16087 2.04% 64M... 128M- : 24 733653 93.17% ------------------------------------------------------------------------------------------- Here I maintained two lists fc_chunks and fc_clusters, the former one records the number of chunks whose size was within the size range, while the latter one keeps the number of free clusters in that range. While the user-specified chunksize is brand-new concept here;), it splits the whole bitmap from chunk to chunk sequentially. and then to see how many of this sequential/fixed-size chunks are free. it sounds like the concept of page frame for main memory. When a user is doing I/Os in a very size-fixed fashion, they may be happy to see how many chunks were fully free there. Besides, we also keep track of the number/min_size/max_size/avg_size of free chunks in all size being counted. Goal here is to provide user a deep insight into fs's fragmentation. > > >> +void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32 chunksize) >> +{ >> + int index; >> + >> + index = __ilog2_u32(chunksize); >> + if (index >= OCFS2_INFO_MAX_HIST) >> + index = OCFS2_INFO_MAX_HIST - 1; >> + >> + ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++; >> + ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize; >> + >> + if (chunksize > ffg->iff_ffs.ffs_max) >> + ffg->iff_ffs.ffs_max = chunksize; >> + >> + if (chunksize < ffg->iff_ffs.ffs_min) >> + ffg->iff_ffs.ffs_min = chunksize; >> + >> + ffg->iff_ffs.ffs_avg += chunksize; >> + ffg->iff_ffs.ffs_free_chunks_real++; >> +} > > I'm sorry, but this is just unreadable. I get that we can only > abbreviate 'free-frag-scan' so many ways, but there is no way people > will understand this. > I don't think renaming helps too much. These names are always > going to be bad. I think we *can* reduce the long structure strings, > mostly with helper functions. > For example, here we could have: > > static void o2ffg_update_histogram(struct ocfs2_info_free_chunk_list *hist, > unsigned int chunksize) > { > int index; > > index = __ilog2_u32(chunksize); > if (index >= OCFS2_INFO_MAX_HIST) > index = OCFS2_INFO_MAX_HIST - 1; > > hist->fc_chunks[index]++; > hist->fc_clusters[index] += chunksize; > } > > static void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, > unsigned int chunksize) > { > o2ffg_update_histogram(&ffg->iff_ffs.ffs_fc_hist, chunksize); > > ... > } > > and so on. Btw, the chunksize can be an unsigned int, as the calling > function uses that type and not the more specific __u32. Make sense. > >> +int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode, >> + struct ocfs2_info_freefrag *ffg) >> +{ >> + __u32 chunks_in_group; >> + int status = 0, unlock = 0, i; >> + >> + struct buffer_head *bh = NULL; >> + struct ocfs2_chain_list *cl = NULL; >> + struct ocfs2_chain_rec *rec = NULL; >> + struct ocfs2_dinode *gb_dinode = NULL; >> + >> + mutex_lock(&gb_inode->i_mutex); >> + >> + if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) { >> + status = ocfs2_inode_lock(gb_inode, &bh, 0); >> + if (status < 0) { >> + mlog_errno(status); >> + goto bail; >> + } >> + unlock = 1; >> + } else { >> + status = ocfs2_read_inode_block(gb_inode, &bh); >> + if (status < 0) { >> + mlog_errno(status); >> + goto bail; >> + } >> + } > > Same thoughts about ocfs2_ilookup() here. Why we need to think about ocfs2_ilookup? since we're just trying to get a inode block here, not a cached VFS inode. > > Joel > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. 2010-11-04 2:56 ` tristan @ 2010-11-04 4:47 ` tristan 2010-11-04 6:32 ` Joel Becker 1 sibling, 0 replies; 13+ messages in thread From: tristan @ 2010-11-04 4:47 UTC (permalink / raw) To: ocfs2-devel tristan wrote: > Joel Becker wrote: >> On Wed, Nov 03, 2010 at 07:02:06PM +0800, Tristan Ye wrote: >>> This new code is a bit more complicated than former ones, the goal >>> is to >>> show user all statistics required to take a deep insight into >>> filesystem >>> on how the disk is being fragmentaed. >>> >>> The goal is achieved by scaning global bitmap from (cluster)group to >>> group >>> to figure out following factors in the filesystem: >>> >>> - How many free chunks in a fixed size as user requested. >>> - How many real free chunks in all size. >> >> Let me see if I understand the above requirements. In the same >> call, you are going build a list of free chunk sizes. Then you will >> also count how many chunks of a user-specified size. Why can't the user >> just take the list and look at the size they want? What do they get >> from specifying a size to the ioctl(2)? > Actually, the original idea was borrowed from ext4, while it only did > this in userspace however. > > Let me try to explain this in a detailed way, I'm not surprised that > you'll be confused a little bit;-) > > I did maintain some lists, while they're not lists of free chunk > sizes, the list was indexed by size range, see: > ------------------------------------------------------------------------------------------- > > Extent Size Range : Free extents Free Clusters Percent > 4K... 8K- : 9 9 0.00% > 8K... 16K- : 12 31 0.00% > 16K... 32K- : 45 249 0.03% > 32K... 64K- : 376 4654 0.59% > 8M... 16M- : 2 5180 0.66% > 16M... 32M- : 5 27556 3.50% > 32M... 64M- : 1 16087 2.04% > 64M... 128M- : 24 733653 93.17% > ------------------------------------------------------------------------------------------- > I hate that thunderbird suck the format again:(, hope following lines work for me. Extent Size Range : Free extents Free Clusters Percent 4K... 8K- : 9 9 0.00% 8K... 16K- : 12 31 0.00% 16K... 32K- : 45 249 0.03% 32K... 64K- : 376 4654 0.59% 8M... 16M- : 2 5180 0.66% 16M... 32M- : 5 27556 3.50% 32M... 64M- : 1 16087 2.04% 64M... 128M- : 24 733653 93.17% > Here I maintained two lists fc_chunks and fc_clusters, the former one > records the number of chunks whose size was within > the size range, while the latter one keeps the number of free clusters > in that range. > > > While the user-specified chunksize is brand-new concept here;), it > splits the whole bitmap from chunk to chunk sequentially. > and then to see how many of this sequential/fixed-size chunks are > free. it sounds like the concept of page frame for main memory. > When a user is doing I/Os in a very size-fixed fashion, they may be > happy to see how many chunks were fully free there. > > Besides, we also keep track of the number/min_size/max_size/avg_size > of free chunks in all size being counted. > > > Goal here is to provide user a deep insight into fs's fragmentation. > >> >> >>> +void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32 >>> chunksize) >>> +{ >>> + int index; >>> + >>> + index = __ilog2_u32(chunksize); >>> + if (index >= OCFS2_INFO_MAX_HIST) >>> + index = OCFS2_INFO_MAX_HIST - 1; >>> + >>> + ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++; >>> + ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize; >>> + >>> + if (chunksize > ffg->iff_ffs.ffs_max) >>> + ffg->iff_ffs.ffs_max = chunksize; >>> + >>> + if (chunksize < ffg->iff_ffs.ffs_min) >>> + ffg->iff_ffs.ffs_min = chunksize; >>> + >>> + ffg->iff_ffs.ffs_avg += chunksize; >>> + ffg->iff_ffs.ffs_free_chunks_real++; >>> +} >> >> I'm sorry, but this is just unreadable. I get that we can only >> abbreviate 'free-frag-scan' so many ways, but there is no way people >> will understand this. >> I don't think renaming helps too much. These names are always >> going to be bad. I think we *can* reduce the long structure strings, >> mostly with helper functions. >> For example, here we could have: >> >> static void o2ffg_update_histogram(struct ocfs2_info_free_chunk_list >> *hist, >> unsigned int chunksize) >> { >> int index; >> >> index = __ilog2_u32(chunksize); >> if (index >= OCFS2_INFO_MAX_HIST) >> index = OCFS2_INFO_MAX_HIST - 1; >> >> hist->fc_chunks[index]++; >> hist->fc_clusters[index] += chunksize; >> } >> >> static void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, >> unsigned int chunksize) >> { >> o2ffg_update_histogram(&ffg->iff_ffs.ffs_fc_hist, chunksize); >> >> ... >> } >> >> and so on. Btw, the chunksize can be an unsigned int, as the calling >> function uses that type and not the more specific __u32. > > Make sense. > >> >>> +int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode, >>> + struct ocfs2_info_freefrag *ffg) >>> +{ >>> + __u32 chunks_in_group; >>> + int status = 0, unlock = 0, i; >>> + >>> + struct buffer_head *bh = NULL; >>> + struct ocfs2_chain_list *cl = NULL; >>> + struct ocfs2_chain_rec *rec = NULL; >>> + struct ocfs2_dinode *gb_dinode = NULL; >>> + >>> + mutex_lock(&gb_inode->i_mutex); >>> + >>> + if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) { >>> + status = ocfs2_inode_lock(gb_inode, &bh, 0); >>> + if (status < 0) { >>> + mlog_errno(status); >>> + goto bail; >>> + } >>> + unlock = 1; >>> + } else { >>> + status = ocfs2_read_inode_block(gb_inode, &bh); >>> + if (status < 0) { >>> + mlog_errno(status); >>> + goto bail; >>> + } >>> + } >> >> Same thoughts about ocfs2_ilookup() here. > > Why we need to think about ocfs2_ilookup? since we're just trying to > get a inode block here, not a cached VFS inode. > >> >> Joel >> > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. 2010-11-04 2:56 ` tristan 2010-11-04 4:47 ` tristan @ 2010-11-04 6:32 ` Joel Becker 2010-11-04 7:05 ` tristan 1 sibling, 1 reply; 13+ messages in thread From: Joel Becker @ 2010-11-04 6:32 UTC (permalink / raw) To: ocfs2-devel On Thu, Nov 04, 2010 at 10:56:09AM +0800, tristan wrote: > While the user-specified chunksize is brand-new concept here;), it > splits the whole bitmap from chunk to chunk sequentially. > and then to see how many of this sequential/fixed-size chunks are free. > it sounds like the concept of page frame for main memory. > When a user is doing I/Os in a very size-fixed fashion, they may be > happy to see how many chunks were fully free there. Let me see if I understand this. If the user specified a chunksize of 64M and you find a free extent of 640M, your free list will say, "I've got one free extent of more than 128M," while your chunksize logic also reports, "I've got ten 64M chunks?" Is that right? > >> + status = ocfs2_read_inode_block(gb_inode, &bh); > >> + if (status < 0) { > >> + mlog_errno(status); > >> + goto bail; > >> + } > >> + } > > > > Same thoughts about ocfs2_ilookup() here. > > Why we need to think about ocfs2_ilookup? since we're just trying to get > a inode block here, not a cached VFS inode. If you already have the blkno, you're not igetting, and that's fine. My bad. Joel -- "Conservative, n. A statesman who is enamoured of existing evils, as distinguished from the Liberal, who wishes to replace them with others." - Ambrose Bierce, The Devil's Dictionary Joel Becker Senior Development Manager Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. 2010-11-04 6:32 ` Joel Becker @ 2010-11-04 7:05 ` tristan 2010-12-07 0:46 ` Joel Becker 0 siblings, 1 reply; 13+ messages in thread From: tristan @ 2010-11-04 7:05 UTC (permalink / raw) To: ocfs2-devel Joel Becker wrote: > On Thu, Nov 04, 2010 at 10:56:09AM +0800, tristan wrote: >> While the user-specified chunksize is brand-new concept here;), it >> splits the whole bitmap from chunk to chunk sequentially. >> and then to see how many of this sequential/fixed-size chunks are free. >> it sounds like the concept of page frame for main memory. >> When a user is doing I/Os in a very size-fixed fashion, they may be >> happy to see how many chunks were fully free there. > > Let me see if I understand this. If the user specified a > chunksize of 64M and you find a free extent of 640M, your free list will > say, "I've got one free extent of more than 128M," while your chunksize > logic also reports, "I've got ten 64M chunks?" Is that right? Hmmm... Not exactly, while your first guess is right, my free list do says: "I've got one free extent of more than 128M", while chunksize logic may not report 10 free chunks in this case, 10 free chunks will only be reported when the start of this 640M extent was fully 64M aligned. otherwise, 9 free chunks get reported. So the essence of 'user-specified-chunksize' logic is splitting the whole bitmap into chunks with fixed size. then check if these chunks are fully free from chunk to chunk sequentially, a counter called 'free_chunks' will only be increased when all clusters in this chunk were free. it sounds like a situation when we specify the pagesize for a page frame, then counting the free pages in the physical memory. it's all about aligned/fixed-size/sequential things. While another counter called 'real_free_chunks' is dedicated to counting the number of all free chunks we detected when scaning the whole bitmap from cluster to cluster. Take following figure as example, say user specified the chunksize as 4 clusters, and the total size of global bitmap is 16 clusters. '0' denotes free cluster, while '1' stands for used one. #1 chunk #2 chunk #3 chunk #4 chunk 1100 0010 0000 1100 From above case, we found 3 real free chunks, they are: 1st free_chunk_size = 4 2nd free_chunk_size = 5 3rd free_chunk_size = 2 Therefore, real_free_chunks = 3 While for the 'user-specified-chunksize' logic, we have 4 chunks(chunksize = 4), and only one(#3 chunk) was free. Does it make sense? > >>>> + status = ocfs2_read_inode_block(gb_inode, &bh); >>>> + if (status < 0) { >>>> + mlog_errno(status); >>>> + goto bail; >>>> + } >>>> + } >>> Same thoughts about ocfs2_ilookup() here. >> Why we need to think about ocfs2_ilookup? since we're just trying to get >> a inode block here, not a cached VFS inode. > > If you already have the blkno, you're not igetting, and that's > fine. My bad. > > Joel > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. 2010-11-04 7:05 ` tristan @ 2010-12-07 0:46 ` Joel Becker 0 siblings, 0 replies; 13+ messages in thread From: Joel Becker @ 2010-12-07 0:46 UTC (permalink / raw) To: ocfs2-devel On Thu, Nov 04, 2010 at 03:05:14PM +0800, tristan wrote: > #1 chunk #2 chunk #3 chunk #4 chunk > 1100 0010 0000 1100 > > From above case, we found 3 real free chunks, they are: > > 1st free_chunk_size = 4 > > 2nd free_chunk_size = 5 > > 3rd free_chunk_size = 2 > > Therefore, real_free_chunks = 3 > > While for the 'user-specified-chunksize' logic, we have 4 > chunks(chunksize = 4), and only > one(#3 chunk) was free. > > Does it make sense? Yes. Joel -- print STDOUT q Just another Perl hacker, unless $spring - Larry Wall Joel Becker Senior Development Manager Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. 2010-11-03 11:02 [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl Tristan Ye 2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye @ 2010-11-04 1:18 ` Joel Becker 2010-11-04 2:27 ` tristan 1 sibling, 1 reply; 13+ messages in thread From: Joel Becker @ 2010-11-04 1:18 UTC (permalink / raw) To: ocfs2-devel On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote: > The new code is dedicated to calculate free inodes number of all inode_allocs, > then return the info to userpace in terms of an array. > > Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent' > from userspace, is now going to be involved. setting the flag on means no cluster > coherency considered, usually, userspace tools choose none-coherency strategy by > default for the sake of performace. This looks pretty straightforward. Note that any non-cached allocator is going to lock, regardless of the coherency flag. Do we want to use ocfs2_ilookup() instead? > +int ocfs2_info_scan_inode_alloc(struct inode *inode_alloc, > + struct ocfs2_info_freeinode *fi, __u32 slot) > +{ > + int status = 0, unlock = 0; > + > + struct buffer_head *bh = NULL; > + struct ocfs2_dinode *dinode_alloc = NULL; > + > + mutex_lock(&inode_alloc->i_mutex); > + > + if (!(fi->ifi_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) { Also, I'm thinking that this would look much better as: if (!ocfs2_info_coherent(&fi->ifi_req)) { That implies we probably also want ocfs2_info_set_filled(request), etc. Good, bad? Joel -- The Graham Corollary: The longer a socially-moderated news website exists, the probability of an old Paul Graham link appearing at the top approaches certainty. Joel Becker Senior Development Manager Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. 2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker @ 2010-11-04 2:27 ` tristan 2010-11-04 6:28 ` Joel Becker 0 siblings, 1 reply; 13+ messages in thread From: tristan @ 2010-11-04 2:27 UTC (permalink / raw) To: ocfs2-devel Joel Becker wrote: > On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote: >> The new code is dedicated to calculate free inodes number of all inode_allocs, >> then return the info to userpace in terms of an array. >> >> Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent' >> from userspace, is now going to be involved. setting the flag on means no cluster >> coherency considered, usually, userspace tools choose none-coherency strategy by >> default for the sake of performace. > > This looks pretty straightforward. Note that any non-cached > allocator is going to lock, regardless of the coherency flag. Do we > want to use ocfs2_ilookup() instead? A bit confused here, did you mean we use 'ocfs2_ilookup' instead of 'ocfs2_get_system_file_inode' here? coherency flag refers to a cluster-aware lock, while ocfs2_get_system_file_inode will use iget_locked() to get the inode if it didn't exist in cache, does iget_locked() also refer to a cluster-aware lock somehow? Or you mean using following logic is not fine? if (!coherency) ocfs2_inode_lock(); else ocfs2_read_inode_block(); > >> +int ocfs2_info_scan_inode_alloc(struct inode *inode_alloc, >> + struct ocfs2_info_freeinode *fi, __u32 slot) >> +{ >> + int status = 0, unlock = 0; >> + >> + struct buffer_head *bh = NULL; >> + struct ocfs2_dinode *dinode_alloc = NULL; >> + >> + mutex_lock(&inode_alloc->i_mutex); >> + >> + if (!(fi->ifi_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) { > > Also, I'm thinking that this would look much better as: > > if (!ocfs2_info_coherent(&fi->ifi_req)) { > > That implies we probably also want ocfs2_info_set_filled(request), etc. > Good, bad? Good idea, it does simplify the codes. > > Joel > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. 2010-11-04 2:27 ` tristan @ 2010-11-04 6:28 ` Joel Becker 2010-11-04 8:10 ` tristan 0 siblings, 1 reply; 13+ messages in thread From: Joel Becker @ 2010-11-04 6:28 UTC (permalink / raw) To: ocfs2-devel On Thu, Nov 04, 2010 at 10:27:01AM +0800, tristan wrote: > Joel Becker wrote: > > On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote: > >> The new code is dedicated to calculate free inodes number of all inode_allocs, > >> then return the info to userpace in terms of an array. > >> > >> Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent' > >> from userspace, is now going to be involved. setting the flag on means no cluster > >> coherency considered, usually, userspace tools choose none-coherency strategy by > >> default for the sake of performace. > > > > This looks pretty straightforward. Note that any non-cached > > allocator is going to lock, regardless of the coherency flag. Do we > > want to use ocfs2_ilookup() instead? > > A bit confused here, did you mean we use 'ocfs2_ilookup' instead of > 'ocfs2_get_system_file_inode' here? I do. ocfs2_get_system_file_inode() calls ocfs2_iget(), which will read and lock the inode if it is not in the inode cache. Now, the cache-coherent case obviously wants this. But in the non-coherent case we have the following conditions: 1) We have an inode in the inode cache, we can use it to look up the blkno. 2) We have no inode in the cache, and we have to go get it. Case (1) is fast. Case (2) is not, especially because it locks the inode. Why not merely look up blkno via ocfs2_lookup_ino_from_name() and go from there? Note that ocfs2_ilookup() isn't even needed, unless you have a faster way to get to the blkno. > coherency flag refers to a cluster-aware lock, while > ocfs2_get_system_file_inode will use iget_locked() to get > the inode if it didn't exist in cache, does iget_locked() also refer to > a cluster-aware lock somehow? Yes. If an inode is not in cache, it will eventually call ocfs2_read_inode_locked(). Joel -- "The trouble with being punctual is that nobody's there to appreciate it." - Franklin P. Jones Joel Becker Senior Development Manager Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. 2010-11-04 6:28 ` Joel Becker @ 2010-11-04 8:10 ` tristan 2010-11-04 21:24 ` Joel Becker 0 siblings, 1 reply; 13+ messages in thread From: tristan @ 2010-11-04 8:10 UTC (permalink / raw) To: ocfs2-devel Joel Becker wrote: > On Thu, Nov 04, 2010 at 10:27:01AM +0800, tristan wrote: >> Joel Becker wrote: >>> On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote: >>>> The new code is dedicated to calculate free inodes number of all inode_allocs, >>>> then return the info to userpace in terms of an array. >>>> >>>> Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent' >>>> from userspace, is now going to be involved. setting the flag on means no cluster >>>> coherency considered, usually, userspace tools choose none-coherency strategy by >>>> default for the sake of performace. >>> This looks pretty straightforward. Note that any non-cached >>> allocator is going to lock, regardless of the coherency flag. Do we >>> want to use ocfs2_ilookup() instead? >> A bit confused here, did you mean we use 'ocfs2_ilookup' instead of >> 'ocfs2_get_system_file_inode' here? > > I do. ocfs2_get_system_file_inode() calls ocfs2_iget(), which > will read and lock the inode if it is not in the inode cache. Now, the > cache-coherent case obviously wants this. > But in the non-coherent case we have the following conditions: > > 1) We have an inode in the inode cache, we can use it to look up the > blkno. > 2) We have no inode in the cache, and we have to go get it. > > Case (1) is fast. Case (2) is not, especially because it locks > the inode. Why not merely look up blkno via > ocfs2_lookup_ino_from_name() and go from there? Note that > ocfs2_ilookup() isn't even needed, unless you have a faster way to get > to the blkno. > Yep, you're correct, other than the operation we read inode block, the lookup of blkno/inode should also be treated differently according to coherency flag, I guess following logic could help: if (cluster_coherent) { alloc_inode = ocfs2_get_system_file_inode(); ocfs2_inode_lock(alloc_inode, &bh); } else { ocfs2_lookup_ino_from_name("global_bitmap", &blkno); ocfs2_read_blocks(blkno, 1, &bh); } ... Above logic guarantee the performance for none-coherency case. >> coherency flag refers to a cluster-aware lock, while >> ocfs2_get_system_file_inode will use iget_locked() to get >> the inode if it didn't exist in cache, does iget_locked() also refer to >> a cluster-aware lock somehow? > > Yes. If an inode is not in cache, it will eventually call > ocfs2_read_inode_locked(). > > Joel > ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. 2010-11-04 8:10 ` tristan @ 2010-11-04 21:24 ` Joel Becker 0 siblings, 0 replies; 13+ messages in thread From: Joel Becker @ 2010-11-04 21:24 UTC (permalink / raw) To: ocfs2-devel On Thu, Nov 04, 2010 at 04:10:50PM +0800, tristan wrote: > Yep, you're correct, other than the operation we read inode block, > the lookup > of blkno/inode should also be treated differently according to coherency > flag, I > guess following logic could help: > > if (cluster_coherent) { > alloc_inode = ocfs2_get_system_file_inode(); > ocfs2_inode_lock(alloc_inode, &bh); > } else { > ocfs2_lookup_ino_from_name("global_bitmap", &blkno); > ocfs2_read_blocks(blkno, 1, &bh); > } > ... > > Above logic guarantee the performance for none-coherency case. You got it. But don't hardcode file names. Use the sprintf function for type+slot as in the top of _ocfs2_get_system_file_inode(). or write a lookup_ino_by_type. Or something like that. Joel -- Life's Little Instruction Book #80 "Slow dance" Joel Becker Senior Development Manager Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-12-07 0:46 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-03 11:02 [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl Tristan Ye 2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye 2010-11-04 1:29 ` Joel Becker 2010-11-04 2:56 ` tristan 2010-11-04 4:47 ` tristan 2010-11-04 6:32 ` Joel Becker 2010-11-04 7:05 ` tristan 2010-12-07 0:46 ` Joel Becker 2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker 2010-11-04 2:27 ` tristan 2010-11-04 6:28 ` Joel Becker 2010-11-04 8:10 ` tristan 2010-11-04 21:24 ` Joel Becker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).