* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
@ 2010-11-03 11:02 Tristan Ye
2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye
2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker
0 siblings, 2 replies; 13+ messages in thread
From: Tristan Ye @ 2010-11-03 11:02 UTC (permalink / raw)
To: ocfs2-devel
The new code is dedicated to calculate free inodes number of all inode_allocs,
then return the info to userpace in terms of an array.
Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent'
from userspace, is now going to be involved. setting the flag on means no cluster
coherency considered, usually, userspace tools choose none-coherency strategy by
default for the sake of performace.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
---
fs/ocfs2/ioctl.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/ocfs2_ioctl.h | 11 ++++++
2 files changed, 106 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index 7a48681..a60eaec 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -23,6 +23,7 @@
#include "ioctl.h"
#include "resize.h"
#include "refcounttree.h"
+#include "sysfile.h"
#include <linux/ext2_fs.h>
@@ -299,6 +300,96 @@ bail:
return status;
}
+int ocfs2_info_scan_inode_alloc(struct inode *inode_alloc,
+ struct ocfs2_info_freeinode *fi, __u32 slot)
+{
+ int status = 0, unlock = 0;
+
+ struct buffer_head *bh = NULL;
+ struct ocfs2_dinode *dinode_alloc = NULL;
+
+ mutex_lock(&inode_alloc->i_mutex);
+
+ if (!(fi->ifi_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) {
+ status = ocfs2_inode_lock(inode_alloc, &bh, 0);
+ if (status < 0) {
+ mlog_errno(status);
+ goto bail;
+ }
+ unlock = 1;
+
+ } else {
+ status = ocfs2_read_inode_block(inode_alloc, &bh);
+ if (status < 0) {
+ mlog_errno(status);
+ goto bail;
+ }
+ }
+
+ dinode_alloc = (struct ocfs2_dinode *)bh->b_data;
+
+ fi->ifi_stat[slot].lfi_total =
+ le32_to_cpu(dinode_alloc->id1.bitmap1.i_total);
+ fi->ifi_stat[slot].lfi_free =
+ le32_to_cpu(dinode_alloc->id1.bitmap1.i_total) -
+ le32_to_cpu(dinode_alloc->id1.bitmap1.i_used);
+bail:
+ if (unlock)
+ ocfs2_inode_unlock(inode_alloc, 0);
+
+ mutex_unlock(&inode_alloc->i_mutex);
+
+ iput(inode_alloc);
+ brelse(bh);
+
+ mlog_exit(status);
+ return status;
+}
+
+int ocfs2_info_handle_freeinode(struct inode *inode,
+ struct ocfs2_info_request __user *req)
+{
+ u32 i;
+ int status = -EFAULT;
+ struct ocfs2_info_freeinode oifi;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct inode *inode_alloc = NULL;
+
+ if (o2info_from_user(oifi, req))
+ goto bail;
+
+ oifi.ifi_slotnum = osb->max_slots;
+
+ for (i = 0; i < oifi.ifi_slotnum; i++) {
+ inode_alloc =
+ ocfs2_get_system_file_inode(osb,
+ INODE_ALLOC_SYSTEM_INODE,
+ i);
+ if (!inode_alloc) {
+ mlog(ML_ERROR, "unable to get alloc inode in "
+ "slot %u\n", i);
+ status = -EIO;
+ goto bail;
+ }
+
+ status = ocfs2_info_scan_inode_alloc(inode_alloc, &oifi, i);
+ if (status < 0)
+ goto bail;
+ }
+
+ oifi.ifi_req.ir_flags |= OCFS2_INFO_FL_FILLED;
+
+ if (o2info_to_user(oifi, req))
+ goto bail;
+
+ status = 0;
+bail:
+ if (status)
+ o2info_set_request_error(oifi, req);
+
+ return status;
+}
+
int ocfs2_info_handle_unknown(struct inode *inode,
struct ocfs2_info_request __user *req)
{
@@ -370,6 +461,10 @@ int ocfs2_info_handle_request(struct inode *inode,
if (oir.ir_size == sizeof(struct ocfs2_info_journal_size))
status = ocfs2_info_handle_journal_size(inode, req);
break;
+ case OCFS2_INFO_FREEINODE:
+ if (oir.ir_size == sizeof(struct ocfs2_info_freeinode))
+ status = ocfs2_info_handle_freeinode(inode, req);
+ break;
default:
status = ocfs2_info_handle_unknown(inode, req);
break;
diff --git a/fs/ocfs2/ocfs2_ioctl.h b/fs/ocfs2/ocfs2_ioctl.h
index b46f39b..6b4b39a 100644
--- a/fs/ocfs2/ocfs2_ioctl.h
+++ b/fs/ocfs2/ocfs2_ioctl.h
@@ -142,6 +142,16 @@ struct ocfs2_info_journal_size {
__u64 ij_journal_size;
};
+struct ocfs2_info_freeinode {
+ struct ocfs2_info_request ifi_req;
+ struct ocfs2_info_local_freeinode {
+ __u64 lfi_total;
+ __u64 lfi_free;
+ } ifi_stat[OCFS2_MAX_SLOTS];
+ __u32 ifi_slotnum; /* out */
+ __u32 ifi_pad;
+};
+
/* Codes for ocfs2_info_request */
enum ocfs2_info_type {
OCFS2_INFO_CLUSTERSIZE = 1,
@@ -151,6 +161,7 @@ enum ocfs2_info_type {
OCFS2_INFO_UUID,
OCFS2_INFO_FS_FEATURES,
OCFS2_INFO_JOURNAL_SIZE,
+ OCFS2_INFO_FREEINODE,
OCFS2_INFO_NUM_TYPES
};
--
1.5.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
2010-11-03 11:02 [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl Tristan Ye
@ 2010-11-03 11:02 ` Tristan Ye
2010-11-04 1:29 ` Joel Becker
2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker
1 sibling, 1 reply; 13+ messages in thread
From: Tristan Ye @ 2010-11-03 11:02 UTC (permalink / raw)
To: ocfs2-devel
This new code is a bit more complicated than former ones, the goal is to
show user all statistics required to take a deep insight into filesystem
on how the disk is being fragmentaed.
The goal is achieved by scaning global bitmap from (cluster)group to group
to figure out following factors in the filesystem:
- How many free chunks in a fixed size as user requested.
- How many real free chunks in all size.
- Min/Max/Avg size(in) clusters of free chunks.
- How do free chunks distribute(in size) in terms of a histogram,
just like following:
---------------------------------------------------------
Extent Size Range : Free extents Free Clusters Percent
32K... 64K- : 1 1 0.00%
1M... 2M- : 9 288 0.03%
8M... 16M- : 2 831 0.09%
32M... 64M- : 1 2047 0.23%
128M... 256M- : 1 8191 0.92%
256M... 512M- : 2 21706 2.43%
512M... 1024M- : 27 858623 96.29%
---------------------------------------------------------
Userspace ioctl() call eventually gets the above info returned by passing
a 'struct ocfs2_info_freefrag' with the chunk_size being specified first.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
---
fs/ocfs2/ioctl.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/ocfs2/ocfs2_ioctl.h | 23 +++++
2 files changed, 272 insertions(+), 0 deletions(-)
diff --git a/fs/ocfs2/ioctl.c b/fs/ocfs2/ioctl.c
index a60eaec..51ccf24 100644
--- a/fs/ocfs2/ioctl.c
+++ b/fs/ocfs2/ioctl.c
@@ -24,6 +24,8 @@
#include "resize.h"
#include "refcounttree.h"
#include "sysfile.h"
+#include "buffer_head_io.h"
+#include "suballoc.h"
#include <linux/ext2_fs.h>
@@ -390,6 +392,249 @@ bail:
return status;
}
+void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32 chunksize)
+{
+ int index;
+
+ index = __ilog2_u32(chunksize);
+ if (index >= OCFS2_INFO_MAX_HIST)
+ index = OCFS2_INFO_MAX_HIST - 1;
+
+ ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++;
+ ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize;
+
+ if (chunksize > ffg->iff_ffs.ffs_max)
+ ffg->iff_ffs.ffs_max = chunksize;
+
+ if (chunksize < ffg->iff_ffs.ffs_min)
+ ffg->iff_ffs.ffs_min = chunksize;
+
+ ffg->iff_ffs.ffs_avg += chunksize;
+ ffg->iff_ffs.ffs_free_chunks_real++;
+}
+
+int ocfs2_info_freefrag_scan_chain(struct inode *gb_inode,
+ struct ocfs2_dinode *gb_dinode,
+ struct ocfs2_chain_rec *rec,
+ struct ocfs2_info_freefrag *ffg,
+ __u32 chunks_in_group)
+{
+ int status = 0, used;
+ u64 blkno;
+
+ struct buffer_head *bh = NULL;
+ struct ocfs2_group_desc *bg = NULL;
+
+ unsigned int max_bits, num_clusters;
+ unsigned int offset = 0, cluster, chunk;
+ unsigned int chunk_free, last_chunksize = 0;
+
+ if (!le32_to_cpu(rec->c_free))
+ goto bail;
+
+ do {
+ if (!bg)
+ blkno = le64_to_cpu(rec->c_blkno);
+ else
+ blkno = le64_to_cpu(bg->bg_next_group);
+
+ if (bh) {
+ brelse(bh);
+ bh = NULL;
+ }
+
+ status = ocfs2_read_group_descriptor(gb_inode, gb_dinode,
+ blkno, &bh);
+ if (status < 0) {
+ mlog(ML_ERROR, "Can't read the group descriptor # "
+ "%llu from device.", (unsigned long long)blkno);
+ status = -EIO;
+ goto bail;
+ }
+
+ bg = (struct ocfs2_group_desc *)bh->b_data;
+
+ if (!le16_to_cpu(bg->bg_free_bits_count))
+ continue;
+
+ max_bits = le16_to_cpu(bg->bg_bits);
+ offset = 0;
+
+ for (chunk = 0; chunk < chunks_in_group; chunk++) {
+ /*
+ * last chunk may be not an entire one.
+ */
+ if ((offset + ffg->iff_chunksize) > max_bits)
+ num_clusters = max_bits - offset;
+ else
+ num_clusters = ffg->iff_chunksize;
+
+ chunk_free = 0;
+ for (cluster = 0; cluster < num_clusters; cluster++) {
+ used = ocfs2_test_bit(offset,
+ (unsigned long *)bg->bg_bitmap);
+ /*
+ * - chunk_free counts free clusters in #N chunk.
+ * - last_chunksize records the size(in) clusters
+ * for the last real free chunk being counted.
+ */
+ if (!used) {
+ last_chunksize++;
+ chunk_free++;
+ }
+
+ if (used && last_chunksize) {
+ ocfs2_info_update_ffg(ffg,
+ last_chunksize);
+ last_chunksize = 0;
+ }
+
+ offset++;
+ }
+
+ if (chunk_free == ffg->iff_chunksize)
+ ffg->iff_ffs.ffs_free_chunks++;
+ }
+
+ /*
+ * need to update the info for last free chunk.
+ */
+ if (last_chunksize)
+ ocfs2_info_update_ffg(ffg, last_chunksize);
+
+ } while (le64_to_cpu(bg->bg_next_group));
+
+bail:
+ brelse(bh);
+
+ mlog_exit(status);
+ return status;
+}
+
+int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode,
+ struct ocfs2_info_freefrag *ffg)
+{
+ __u32 chunks_in_group;
+ int status = 0, unlock = 0, i;
+
+ struct buffer_head *bh = NULL;
+ struct ocfs2_chain_list *cl = NULL;
+ struct ocfs2_chain_rec *rec = NULL;
+ struct ocfs2_dinode *gb_dinode = NULL;
+
+ mutex_lock(&gb_inode->i_mutex);
+
+ if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) {
+ status = ocfs2_inode_lock(gb_inode, &bh, 0);
+ if (status < 0) {
+ mlog_errno(status);
+ goto bail;
+ }
+ unlock = 1;
+ } else {
+ status = ocfs2_read_inode_block(gb_inode, &bh);
+ if (status < 0) {
+ mlog_errno(status);
+ goto bail;
+ }
+ }
+
+ gb_dinode = (struct ocfs2_dinode *)bh->b_data;
+ cl = &(gb_dinode->id2.i_chain);
+
+ /*
+ * Chunksize(in) clusters from userspace should be
+ * less than clusters in a group.
+ */
+ if (ffg->iff_chunksize > le16_to_cpu(cl->cl_cpg)) {
+ status = -EINVAL;
+ goto bail;
+ }
+
+ memset(&ffg->iff_ffs, 0, sizeof(struct ocfs2_info_freefrag_stats));
+
+ ffg->iff_ffs.ffs_min = ~0U;
+ ffg->iff_ffs.ffs_clusters =
+ le32_to_cpu(gb_dinode->id1.bitmap1.i_total);
+ ffg->iff_ffs.ffs_free_clusters = ffg->iff_ffs.ffs_clusters -
+ le32_to_cpu(gb_dinode->id1.bitmap1.i_used);
+
+ chunks_in_group = le16_to_cpu(cl->cl_cpg) / ffg->iff_chunksize + 1;
+
+ for (i = 0; i < le16_to_cpu(cl->cl_next_free_rec); i++) {
+ rec = &(cl->cl_recs[i]);
+ status = ocfs2_info_freefrag_scan_chain(gb_inode, gb_dinode,
+ rec, ffg,
+ chunks_in_group);
+ if (status)
+ goto bail;
+ }
+
+ if (ffg->iff_ffs.ffs_free_chunks_real)
+ ffg->iff_ffs.ffs_avg = (ffg->iff_ffs.ffs_avg /
+ ffg->iff_ffs.ffs_free_chunks_real);
+bail:
+ if (unlock)
+ ocfs2_inode_unlock(gb_inode, 0);
+
+ mutex_unlock(&gb_inode->i_mutex);
+
+ iput(gb_inode);
+ brelse(bh);
+
+ mlog_exit(status);
+ return status;
+}
+
+int ocfs2_info_handle_freefrag(struct inode *inode,
+ struct ocfs2_info_request __user *req)
+{
+ int status = -EFAULT;
+
+ struct ocfs2_info_freefrag oiff;
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct inode *gb_inode = NULL;
+
+ if (o2info_from_user(oiff, req))
+ goto bail;
+ /*
+ * chunksize from userspace should be power of 2.
+ */
+ if ((oiff.iff_chunksize & (oiff.iff_chunksize - 1)) ||
+ (!oiff.iff_chunksize)) {
+ status = -EINVAL;
+ goto bail;
+ }
+
+ memset(&oiff.iff_ffs, 0, sizeof(struct ocfs2_info_freefrag_stats));
+ oiff.iff_ffs.ffs_min = ~0U;
+
+ gb_inode = ocfs2_get_system_file_inode(osb,
+ GLOBAL_BITMAP_SYSTEM_INODE,
+ OCFS2_INVALID_SLOT);
+ if (!gb_inode) {
+ mlog(ML_ERROR, "unable to get global_bitmap inode\n");
+ status = -EIO;
+ goto bail;
+ }
+
+ status = ocfs2_info_freefrag_scan_bitmap(gb_inode, &oiff);
+ if (status < 0)
+ goto bail;
+
+ oiff.iff_req.ir_flags |= OCFS2_INFO_FL_FILLED;
+
+ if (o2info_to_user(oiff, req))
+ goto bail;
+
+ status = 0;
+bail:
+ if (status)
+ o2info_set_request_error(oiff, req);
+
+ return status;
+}
+
int ocfs2_info_handle_unknown(struct inode *inode,
struct ocfs2_info_request __user *req)
{
@@ -465,6 +710,10 @@ int ocfs2_info_handle_request(struct inode *inode,
if (oir.ir_size == sizeof(struct ocfs2_info_freeinode))
status = ocfs2_info_handle_freeinode(inode, req);
break;
+ case OCFS2_INFO_FREEFRAG:
+ if (oir.ir_size == sizeof(struct ocfs2_info_freefrag))
+ status = ocfs2_info_handle_freefrag(inode, req);
+ break;
default:
status = ocfs2_info_handle_unknown(inode, req);
break;
diff --git a/fs/ocfs2/ocfs2_ioctl.h b/fs/ocfs2/ocfs2_ioctl.h
index 6b4b39a..18b6770 100644
--- a/fs/ocfs2/ocfs2_ioctl.h
+++ b/fs/ocfs2/ocfs2_ioctl.h
@@ -152,6 +152,28 @@ struct ocfs2_info_freeinode {
__u32 ifi_pad;
};
+#define OCFS2_INFO_MAX_HIST (32)
+
+struct ocfs2_info_freefrag {
+ struct ocfs2_info_request iff_req;
+ struct ocfs2_info_freefrag_stats { /* (out) */
+ struct ocfs2_info_free_chunk_list {
+ __u32 fc_chunks[OCFS2_INFO_MAX_HIST];
+ __u32 fc_clusters[OCFS2_INFO_MAX_HIST];
+ } ffs_fc_hist;
+ __u32 ffs_clusters;
+ __u32 ffs_free_clusters;
+ __u32 ffs_free_chunks;
+ __u32 ffs_free_chunks_real;
+ __u32 ffs_min; /* Minimum free chunksize in clusters */
+ __u32 ffs_max;
+ __u32 ffs_avg;
+ __u32 ffs_pad;
+ } iff_ffs;
+ __u32 iff_chunksize; /* chunksize in clusters(in) */
+ __u32 iff_pad;
+};
+
/* Codes for ocfs2_info_request */
enum ocfs2_info_type {
OCFS2_INFO_CLUSTERSIZE = 1,
@@ -162,6 +184,7 @@ enum ocfs2_info_type {
OCFS2_INFO_FS_FEATURES,
OCFS2_INFO_JOURNAL_SIZE,
OCFS2_INFO_FREEINODE,
+ OCFS2_INFO_FREEFRAG,
OCFS2_INFO_NUM_TYPES
};
--
1.5.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
2010-11-03 11:02 [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl Tristan Ye
2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye
@ 2010-11-04 1:18 ` Joel Becker
2010-11-04 2:27 ` tristan
1 sibling, 1 reply; 13+ messages in thread
From: Joel Becker @ 2010-11-04 1:18 UTC (permalink / raw)
To: ocfs2-devel
On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote:
> The new code is dedicated to calculate free inodes number of all inode_allocs,
> then return the info to userpace in terms of an array.
>
> Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent'
> from userspace, is now going to be involved. setting the flag on means no cluster
> coherency considered, usually, userspace tools choose none-coherency strategy by
> default for the sake of performace.
This looks pretty straightforward. Note that any non-cached
allocator is going to lock, regardless of the coherency flag. Do we
want to use ocfs2_ilookup() instead?
> +int ocfs2_info_scan_inode_alloc(struct inode *inode_alloc,
> + struct ocfs2_info_freeinode *fi, __u32 slot)
> +{
> + int status = 0, unlock = 0;
> +
> + struct buffer_head *bh = NULL;
> + struct ocfs2_dinode *dinode_alloc = NULL;
> +
> + mutex_lock(&inode_alloc->i_mutex);
> +
> + if (!(fi->ifi_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) {
Also, I'm thinking that this would look much better as:
if (!ocfs2_info_coherent(&fi->ifi_req)) {
That implies we probably also want ocfs2_info_set_filled(request), etc.
Good, bad?
Joel
--
The Graham Corollary:
The longer a socially-moderated news website exists, the
probability of an old Paul Graham link appearing at the top
approaches certainty.
Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye
@ 2010-11-04 1:29 ` Joel Becker
2010-11-04 2:56 ` tristan
0 siblings, 1 reply; 13+ messages in thread
From: Joel Becker @ 2010-11-04 1:29 UTC (permalink / raw)
To: ocfs2-devel
On Wed, Nov 03, 2010 at 07:02:06PM +0800, Tristan Ye wrote:
> This new code is a bit more complicated than former ones, the goal is to
> show user all statistics required to take a deep insight into filesystem
> on how the disk is being fragmentaed.
>
> The goal is achieved by scaning global bitmap from (cluster)group to group
> to figure out following factors in the filesystem:
>
> - How many free chunks in a fixed size as user requested.
> - How many real free chunks in all size.
Let me see if I understand the above requirements. In the same
call, you are going build a list of free chunk sizes. Then you will
also count how many chunks of a user-specified size. Why can't the user
just take the list and look at the size they want? What do they get
from specifying a size to the ioctl(2)?
> +void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32 chunksize)
> +{
> + int index;
> +
> + index = __ilog2_u32(chunksize);
> + if (index >= OCFS2_INFO_MAX_HIST)
> + index = OCFS2_INFO_MAX_HIST - 1;
> +
> + ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++;
> + ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize;
> +
> + if (chunksize > ffg->iff_ffs.ffs_max)
> + ffg->iff_ffs.ffs_max = chunksize;
> +
> + if (chunksize < ffg->iff_ffs.ffs_min)
> + ffg->iff_ffs.ffs_min = chunksize;
> +
> + ffg->iff_ffs.ffs_avg += chunksize;
> + ffg->iff_ffs.ffs_free_chunks_real++;
> +}
I'm sorry, but this is just unreadable. I get that we can only
abbreviate 'free-frag-scan' so many ways, but there is no way people
will understand this.
I don't think renaming helps too much. These names are always
going to be bad. I think we *can* reduce the long structure strings,
mostly with helper functions.
For example, here we could have:
static void o2ffg_update_histogram(struct ocfs2_info_free_chunk_list *hist,
unsigned int chunksize)
{
int index;
index = __ilog2_u32(chunksize);
if (index >= OCFS2_INFO_MAX_HIST)
index = OCFS2_INFO_MAX_HIST - 1;
hist->fc_chunks[index]++;
hist->fc_clusters[index] += chunksize;
}
static void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg,
unsigned int chunksize)
{
o2ffg_update_histogram(&ffg->iff_ffs.ffs_fc_hist, chunksize);
...
}
and so on. Btw, the chunksize can be an unsigned int, as the calling
function uses that type and not the more specific __u32.
> +int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode,
> + struct ocfs2_info_freefrag *ffg)
> +{
> + __u32 chunks_in_group;
> + int status = 0, unlock = 0, i;
> +
> + struct buffer_head *bh = NULL;
> + struct ocfs2_chain_list *cl = NULL;
> + struct ocfs2_chain_rec *rec = NULL;
> + struct ocfs2_dinode *gb_dinode = NULL;
> +
> + mutex_lock(&gb_inode->i_mutex);
> +
> + if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) {
> + status = ocfs2_inode_lock(gb_inode, &bh, 0);
> + if (status < 0) {
> + mlog_errno(status);
> + goto bail;
> + }
> + unlock = 1;
> + } else {
> + status = ocfs2_read_inode_block(gb_inode, &bh);
> + if (status < 0) {
> + mlog_errno(status);
> + goto bail;
> + }
> + }
Same thoughts about ocfs2_ilookup() here.
Joel
--
Life's Little Instruction Book #347
"Never waste the oppourtunity to tell someone you love them."
Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker
@ 2010-11-04 2:27 ` tristan
2010-11-04 6:28 ` Joel Becker
0 siblings, 1 reply; 13+ messages in thread
From: tristan @ 2010-11-04 2:27 UTC (permalink / raw)
To: ocfs2-devel
Joel Becker wrote:
> On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote:
>> The new code is dedicated to calculate free inodes number of all inode_allocs,
>> then return the info to userpace in terms of an array.
>>
>> Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent'
>> from userspace, is now going to be involved. setting the flag on means no cluster
>> coherency considered, usually, userspace tools choose none-coherency strategy by
>> default for the sake of performace.
>
> This looks pretty straightforward. Note that any non-cached
> allocator is going to lock, regardless of the coherency flag. Do we
> want to use ocfs2_ilookup() instead?
A bit confused here, did you mean we use 'ocfs2_ilookup' instead of
'ocfs2_get_system_file_inode' here?
coherency flag refers to a cluster-aware lock, while
ocfs2_get_system_file_inode will use iget_locked() to get
the inode if it didn't exist in cache, does iget_locked() also refer to
a cluster-aware lock somehow?
Or you mean using following logic is not fine?
if (!coherency)
ocfs2_inode_lock();
else
ocfs2_read_inode_block();
>
>> +int ocfs2_info_scan_inode_alloc(struct inode *inode_alloc,
>> + struct ocfs2_info_freeinode *fi, __u32 slot)
>> +{
>> + int status = 0, unlock = 0;
>> +
>> + struct buffer_head *bh = NULL;
>> + struct ocfs2_dinode *dinode_alloc = NULL;
>> +
>> + mutex_lock(&inode_alloc->i_mutex);
>> +
>> + if (!(fi->ifi_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) {
>
> Also, I'm thinking that this would look much better as:
>
> if (!ocfs2_info_coherent(&fi->ifi_req)) {
>
> That implies we probably also want ocfs2_info_set_filled(request), etc.
> Good, bad?
Good idea, it does simplify the codes.
>
> Joel
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
2010-11-04 1:29 ` Joel Becker
@ 2010-11-04 2:56 ` tristan
2010-11-04 4:47 ` tristan
2010-11-04 6:32 ` Joel Becker
0 siblings, 2 replies; 13+ messages in thread
From: tristan @ 2010-11-04 2:56 UTC (permalink / raw)
To: ocfs2-devel
Joel Becker wrote:
> On Wed, Nov 03, 2010 at 07:02:06PM +0800, Tristan Ye wrote:
>> This new code is a bit more complicated than former ones, the goal is to
>> show user all statistics required to take a deep insight into filesystem
>> on how the disk is being fragmentaed.
>>
>> The goal is achieved by scaning global bitmap from (cluster)group to group
>> to figure out following factors in the filesystem:
>>
>> - How many free chunks in a fixed size as user requested.
>> - How many real free chunks in all size.
>
> Let me see if I understand the above requirements. In the same
> call, you are going build a list of free chunk sizes. Then you will
> also count how many chunks of a user-specified size. Why can't the user
> just take the list and look at the size they want? What do they get
> from specifying a size to the ioctl(2)?
Actually, the original idea was borrowed from ext4, while it only did
this in userspace however.
Let me try to explain this in a detailed way, I'm not surprised that
you'll be confused a little bit;-)
I did maintain some lists, while they're not lists of free chunk sizes,
the list was indexed by size range, see:
-------------------------------------------------------------------------------------------
Extent Size Range : Free extents Free Clusters Percent
4K... 8K- : 9 9 0.00%
8K... 16K- : 12 31 0.00%
16K... 32K- : 45 249 0.03%
32K... 64K- : 376 4654 0.59%
8M... 16M- : 2 5180 0.66%
16M... 32M- : 5 27556 3.50%
32M... 64M- : 1 16087 2.04%
64M... 128M- : 24 733653 93.17%
-------------------------------------------------------------------------------------------
Here I maintained two lists fc_chunks and fc_clusters, the former one
records the number of chunks whose size was within
the size range, while the latter one keeps the number of free clusters
in that range.
While the user-specified chunksize is brand-new concept here;), it
splits the whole bitmap from chunk to chunk sequentially.
and then to see how many of this sequential/fixed-size chunks are free.
it sounds like the concept of page frame for main memory.
When a user is doing I/Os in a very size-fixed fashion, they may be
happy to see how many chunks were fully free there.
Besides, we also keep track of the number/min_size/max_size/avg_size of
free chunks in all size being counted.
Goal here is to provide user a deep insight into fs's fragmentation.
>
>
>> +void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32 chunksize)
>> +{
>> + int index;
>> +
>> + index = __ilog2_u32(chunksize);
>> + if (index >= OCFS2_INFO_MAX_HIST)
>> + index = OCFS2_INFO_MAX_HIST - 1;
>> +
>> + ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++;
>> + ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize;
>> +
>> + if (chunksize > ffg->iff_ffs.ffs_max)
>> + ffg->iff_ffs.ffs_max = chunksize;
>> +
>> + if (chunksize < ffg->iff_ffs.ffs_min)
>> + ffg->iff_ffs.ffs_min = chunksize;
>> +
>> + ffg->iff_ffs.ffs_avg += chunksize;
>> + ffg->iff_ffs.ffs_free_chunks_real++;
>> +}
>
> I'm sorry, but this is just unreadable. I get that we can only
> abbreviate 'free-frag-scan' so many ways, but there is no way people
> will understand this.
> I don't think renaming helps too much. These names are always
> going to be bad. I think we *can* reduce the long structure strings,
> mostly with helper functions.
> For example, here we could have:
>
> static void o2ffg_update_histogram(struct ocfs2_info_free_chunk_list *hist,
> unsigned int chunksize)
> {
> int index;
>
> index = __ilog2_u32(chunksize);
> if (index >= OCFS2_INFO_MAX_HIST)
> index = OCFS2_INFO_MAX_HIST - 1;
>
> hist->fc_chunks[index]++;
> hist->fc_clusters[index] += chunksize;
> }
>
> static void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg,
> unsigned int chunksize)
> {
> o2ffg_update_histogram(&ffg->iff_ffs.ffs_fc_hist, chunksize);
>
> ...
> }
>
> and so on. Btw, the chunksize can be an unsigned int, as the calling
> function uses that type and not the more specific __u32.
Make sense.
>
>> +int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode,
>> + struct ocfs2_info_freefrag *ffg)
>> +{
>> + __u32 chunks_in_group;
>> + int status = 0, unlock = 0, i;
>> +
>> + struct buffer_head *bh = NULL;
>> + struct ocfs2_chain_list *cl = NULL;
>> + struct ocfs2_chain_rec *rec = NULL;
>> + struct ocfs2_dinode *gb_dinode = NULL;
>> +
>> + mutex_lock(&gb_inode->i_mutex);
>> +
>> + if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) {
>> + status = ocfs2_inode_lock(gb_inode, &bh, 0);
>> + if (status < 0) {
>> + mlog_errno(status);
>> + goto bail;
>> + }
>> + unlock = 1;
>> + } else {
>> + status = ocfs2_read_inode_block(gb_inode, &bh);
>> + if (status < 0) {
>> + mlog_errno(status);
>> + goto bail;
>> + }
>> + }
>
> Same thoughts about ocfs2_ilookup() here.
Why we need to think about ocfs2_ilookup? since we're just trying to get
a inode block here, not a cached VFS inode.
>
> Joel
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
2010-11-04 2:56 ` tristan
@ 2010-11-04 4:47 ` tristan
2010-11-04 6:32 ` Joel Becker
1 sibling, 0 replies; 13+ messages in thread
From: tristan @ 2010-11-04 4:47 UTC (permalink / raw)
To: ocfs2-devel
tristan wrote:
> Joel Becker wrote:
>> On Wed, Nov 03, 2010 at 07:02:06PM +0800, Tristan Ye wrote:
>>> This new code is a bit more complicated than former ones, the goal
>>> is to
>>> show user all statistics required to take a deep insight into
>>> filesystem
>>> on how the disk is being fragmentaed.
>>>
>>> The goal is achieved by scaning global bitmap from (cluster)group to
>>> group
>>> to figure out following factors in the filesystem:
>>>
>>> - How many free chunks in a fixed size as user requested.
>>> - How many real free chunks in all size.
>>
>> Let me see if I understand the above requirements. In the same
>> call, you are going build a list of free chunk sizes. Then you will
>> also count how many chunks of a user-specified size. Why can't the user
>> just take the list and look at the size they want? What do they get
>> from specifying a size to the ioctl(2)?
> Actually, the original idea was borrowed from ext4, while it only did
> this in userspace however.
>
> Let me try to explain this in a detailed way, I'm not surprised that
> you'll be confused a little bit;-)
>
> I did maintain some lists, while they're not lists of free chunk
> sizes, the list was indexed by size range, see:
> -------------------------------------------------------------------------------------------
>
> Extent Size Range : Free extents Free Clusters Percent
> 4K... 8K- : 9 9 0.00%
> 8K... 16K- : 12 31 0.00%
> 16K... 32K- : 45 249 0.03%
> 32K... 64K- : 376 4654 0.59%
> 8M... 16M- : 2 5180 0.66%
> 16M... 32M- : 5 27556 3.50%
> 32M... 64M- : 1 16087 2.04%
> 64M... 128M- : 24 733653 93.17%
> -------------------------------------------------------------------------------------------
>
I hate that thunderbird suck the format again:(, hope following lines
work for me.
Extent Size Range : Free extents Free Clusters Percent
4K... 8K- : 9 9 0.00%
8K... 16K- : 12 31 0.00%
16K... 32K- : 45 249 0.03%
32K... 64K- : 376 4654 0.59%
8M... 16M- : 2 5180 0.66%
16M... 32M- : 5 27556 3.50%
32M... 64M- : 1 16087 2.04%
64M... 128M- : 24 733653 93.17%
> Here I maintained two lists fc_chunks and fc_clusters, the former one
> records the number of chunks whose size was within
> the size range, while the latter one keeps the number of free clusters
> in that range.
>
>
> While the user-specified chunksize is brand-new concept here;), it
> splits the whole bitmap from chunk to chunk sequentially.
> and then to see how many of this sequential/fixed-size chunks are
> free. it sounds like the concept of page frame for main memory.
> When a user is doing I/Os in a very size-fixed fashion, they may be
> happy to see how many chunks were fully free there.
>
> Besides, we also keep track of the number/min_size/max_size/avg_size
> of free chunks in all size being counted.
>
>
> Goal here is to provide user a deep insight into fs's fragmentation.
>
>>
>>
>>> +void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg, __u32
>>> chunksize)
>>> +{
>>> + int index;
>>> +
>>> + index = __ilog2_u32(chunksize);
>>> + if (index >= OCFS2_INFO_MAX_HIST)
>>> + index = OCFS2_INFO_MAX_HIST - 1;
>>> +
>>> + ffg->iff_ffs.ffs_fc_hist.fc_chunks[index]++;
>>> + ffg->iff_ffs.ffs_fc_hist.fc_clusters[index] += chunksize;
>>> +
>>> + if (chunksize > ffg->iff_ffs.ffs_max)
>>> + ffg->iff_ffs.ffs_max = chunksize;
>>> +
>>> + if (chunksize < ffg->iff_ffs.ffs_min)
>>> + ffg->iff_ffs.ffs_min = chunksize;
>>> +
>>> + ffg->iff_ffs.ffs_avg += chunksize;
>>> + ffg->iff_ffs.ffs_free_chunks_real++;
>>> +}
>>
>> I'm sorry, but this is just unreadable. I get that we can only
>> abbreviate 'free-frag-scan' so many ways, but there is no way people
>> will understand this.
>> I don't think renaming helps too much. These names are always
>> going to be bad. I think we *can* reduce the long structure strings,
>> mostly with helper functions.
>> For example, here we could have:
>>
>> static void o2ffg_update_histogram(struct ocfs2_info_free_chunk_list
>> *hist,
>> unsigned int chunksize)
>> {
>> int index;
>>
>> index = __ilog2_u32(chunksize);
>> if (index >= OCFS2_INFO_MAX_HIST)
>> index = OCFS2_INFO_MAX_HIST - 1;
>>
>> hist->fc_chunks[index]++;
>> hist->fc_clusters[index] += chunksize;
>> }
>>
>> static void ocfs2_info_update_ffg(struct ocfs2_info_freefrag *ffg,
>> unsigned int chunksize)
>> {
>> o2ffg_update_histogram(&ffg->iff_ffs.ffs_fc_hist, chunksize);
>>
>> ...
>> }
>>
>> and so on. Btw, the chunksize can be an unsigned int, as the calling
>> function uses that type and not the more specific __u32.
>
> Make sense.
>
>>
>>> +int ocfs2_info_freefrag_scan_bitmap(struct inode *gb_inode,
>>> + struct ocfs2_info_freefrag *ffg)
>>> +{
>>> + __u32 chunks_in_group;
>>> + int status = 0, unlock = 0, i;
>>> +
>>> + struct buffer_head *bh = NULL;
>>> + struct ocfs2_chain_list *cl = NULL;
>>> + struct ocfs2_chain_rec *rec = NULL;
>>> + struct ocfs2_dinode *gb_dinode = NULL;
>>> +
>>> + mutex_lock(&gb_inode->i_mutex);
>>> +
>>> + if (!(ffg->iff_req.ir_flags & OCFS2_INFO_FL_NON_COHERENT)) {
>>> + status = ocfs2_inode_lock(gb_inode, &bh, 0);
>>> + if (status < 0) {
>>> + mlog_errno(status);
>>> + goto bail;
>>> + }
>>> + unlock = 1;
>>> + } else {
>>> + status = ocfs2_read_inode_block(gb_inode, &bh);
>>> + if (status < 0) {
>>> + mlog_errno(status);
>>> + goto bail;
>>> + }
>>> + }
>>
>> Same thoughts about ocfs2_ilookup() here.
>
> Why we need to think about ocfs2_ilookup? since we're just trying to
> get a inode block here, not a cached VFS inode.
>
>>
>> Joel
>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
2010-11-04 2:27 ` tristan
@ 2010-11-04 6:28 ` Joel Becker
2010-11-04 8:10 ` tristan
0 siblings, 1 reply; 13+ messages in thread
From: Joel Becker @ 2010-11-04 6:28 UTC (permalink / raw)
To: ocfs2-devel
On Thu, Nov 04, 2010 at 10:27:01AM +0800, tristan wrote:
> Joel Becker wrote:
> > On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote:
> >> The new code is dedicated to calculate free inodes number of all inode_allocs,
> >> then return the info to userpace in terms of an array.
> >>
> >> Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent'
> >> from userspace, is now going to be involved. setting the flag on means no cluster
> >> coherency considered, usually, userspace tools choose none-coherency strategy by
> >> default for the sake of performace.
> >
> > This looks pretty straightforward. Note that any non-cached
> > allocator is going to lock, regardless of the coherency flag. Do we
> > want to use ocfs2_ilookup() instead?
>
> A bit confused here, did you mean we use 'ocfs2_ilookup' instead of
> 'ocfs2_get_system_file_inode' here?
I do. ocfs2_get_system_file_inode() calls ocfs2_iget(), which
will read and lock the inode if it is not in the inode cache. Now, the
cache-coherent case obviously wants this.
But in the non-coherent case we have the following conditions:
1) We have an inode in the inode cache, we can use it to look up the
blkno.
2) We have no inode in the cache, and we have to go get it.
Case (1) is fast. Case (2) is not, especially because it locks
the inode. Why not merely look up blkno via
ocfs2_lookup_ino_from_name() and go from there? Note that
ocfs2_ilookup() isn't even needed, unless you have a faster way to get
to the blkno.
> coherency flag refers to a cluster-aware lock, while
> ocfs2_get_system_file_inode will use iget_locked() to get
> the inode if it didn't exist in cache, does iget_locked() also refer to
> a cluster-aware lock somehow?
Yes. If an inode is not in cache, it will eventually call
ocfs2_read_inode_locked().
Joel
--
"The trouble with being punctual is that nobody's there to
appreciate it."
- Franklin P. Jones
Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
2010-11-04 2:56 ` tristan
2010-11-04 4:47 ` tristan
@ 2010-11-04 6:32 ` Joel Becker
2010-11-04 7:05 ` tristan
1 sibling, 1 reply; 13+ messages in thread
From: Joel Becker @ 2010-11-04 6:32 UTC (permalink / raw)
To: ocfs2-devel
On Thu, Nov 04, 2010 at 10:56:09AM +0800, tristan wrote:
> While the user-specified chunksize is brand-new concept here;), it
> splits the whole bitmap from chunk to chunk sequentially.
> and then to see how many of this sequential/fixed-size chunks are free.
> it sounds like the concept of page frame for main memory.
> When a user is doing I/Os in a very size-fixed fashion, they may be
> happy to see how many chunks were fully free there.
Let me see if I understand this. If the user specified a
chunksize of 64M and you find a free extent of 640M, your free list will
say, "I've got one free extent of more than 128M," while your chunksize
logic also reports, "I've got ten 64M chunks?" Is that right?
> >> + status = ocfs2_read_inode_block(gb_inode, &bh);
> >> + if (status < 0) {
> >> + mlog_errno(status);
> >> + goto bail;
> >> + }
> >> + }
> >
> > Same thoughts about ocfs2_ilookup() here.
>
> Why we need to think about ocfs2_ilookup? since we're just trying to get
> a inode block here, not a cached VFS inode.
If you already have the blkno, you're not igetting, and that's
fine. My bad.
Joel
--
"Conservative, n. A statesman who is enamoured of existing evils,
as distinguished from the Liberal, who wishes to replace them
with others."
- Ambrose Bierce, The Devil's Dictionary
Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
2010-11-04 6:32 ` Joel Becker
@ 2010-11-04 7:05 ` tristan
2010-12-07 0:46 ` Joel Becker
0 siblings, 1 reply; 13+ messages in thread
From: tristan @ 2010-11-04 7:05 UTC (permalink / raw)
To: ocfs2-devel
Joel Becker wrote:
> On Thu, Nov 04, 2010 at 10:56:09AM +0800, tristan wrote:
>> While the user-specified chunksize is brand-new concept here;), it
>> splits the whole bitmap from chunk to chunk sequentially.
>> and then to see how many of this sequential/fixed-size chunks are free.
>> it sounds like the concept of page frame for main memory.
>> When a user is doing I/Os in a very size-fixed fashion, they may be
>> happy to see how many chunks were fully free there.
>
> Let me see if I understand this. If the user specified a
> chunksize of 64M and you find a free extent of 640M, your free list will
> say, "I've got one free extent of more than 128M," while your chunksize
> logic also reports, "I've got ten 64M chunks?" Is that right?
Hmmm...
Not exactly, while your first guess is right, my free list do says:
"I've got
one free extent of more than 128M", while chunksize logic may not report
10 free
chunks in this case, 10 free chunks will only be reported when the start
of this
640M extent was fully 64M aligned. otherwise, 9 free chunks get reported.
So the essence of 'user-specified-chunksize' logic is splitting the
whole bitmap
into chunks with fixed size. then check if these chunks are fully free
from chunk to
chunk sequentially, a counter called 'free_chunks' will only be
increased when all
clusters in this chunk were free. it sounds like a situation when we
specify the
pagesize for a page frame, then counting the free pages in the physical
memory.
it's all about aligned/fixed-size/sequential things.
While another counter called 'real_free_chunks' is dedicated to counting
the number
of all free chunks we detected when scaning the whole bitmap from
cluster to cluster.
Take following figure as example, say user specified the chunksize as 4
clusters, and
the total size of global bitmap is 16 clusters. '0' denotes free
cluster, while '1'
stands for used one.
#1 chunk #2 chunk #3 chunk #4 chunk
1100 0010 0000 1100
From above case, we found 3 real free chunks, they are:
1st free_chunk_size = 4
2nd free_chunk_size = 5
3rd free_chunk_size = 2
Therefore, real_free_chunks = 3
While for the 'user-specified-chunksize' logic, we have 4
chunks(chunksize = 4), and only
one(#3 chunk) was free.
Does it make sense?
>
>>>> + status = ocfs2_read_inode_block(gb_inode, &bh);
>>>> + if (status < 0) {
>>>> + mlog_errno(status);
>>>> + goto bail;
>>>> + }
>>>> + }
>>> Same thoughts about ocfs2_ilookup() here.
>> Why we need to think about ocfs2_ilookup? since we're just trying to get
>> a inode block here, not a cached VFS inode.
>
> If you already have the blkno, you're not igetting, and that's
> fine. My bad.
>
> Joel
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
2010-11-04 6:28 ` Joel Becker
@ 2010-11-04 8:10 ` tristan
2010-11-04 21:24 ` Joel Becker
0 siblings, 1 reply; 13+ messages in thread
From: tristan @ 2010-11-04 8:10 UTC (permalink / raw)
To: ocfs2-devel
Joel Becker wrote:
> On Thu, Nov 04, 2010 at 10:27:01AM +0800, tristan wrote:
>> Joel Becker wrote:
>>> On Wed, Nov 03, 2010 at 07:02:05PM +0800, Tristan Ye wrote:
>>>> The new code is dedicated to calculate free inodes number of all inode_allocs,
>>>> then return the info to userpace in terms of an array.
>>>>
>>>> Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent'
>>>> from userspace, is now going to be involved. setting the flag on means no cluster
>>>> coherency considered, usually, userspace tools choose none-coherency strategy by
>>>> default for the sake of performace.
>>> This looks pretty straightforward. Note that any non-cached
>>> allocator is going to lock, regardless of the coherency flag. Do we
>>> want to use ocfs2_ilookup() instead?
>> A bit confused here, did you mean we use 'ocfs2_ilookup' instead of
>> 'ocfs2_get_system_file_inode' here?
>
> I do. ocfs2_get_system_file_inode() calls ocfs2_iget(), which
> will read and lock the inode if it is not in the inode cache. Now, the
> cache-coherent case obviously wants this.
> But in the non-coherent case we have the following conditions:
>
> 1) We have an inode in the inode cache, we can use it to look up the
> blkno.
> 2) We have no inode in the cache, and we have to go get it.
>
> Case (1) is fast. Case (2) is not, especially because it locks
> the inode. Why not merely look up blkno via
> ocfs2_lookup_ino_from_name() and go from there? Note that
> ocfs2_ilookup() isn't even needed, unless you have a faster way to get
> to the blkno.
>
Yep, you're correct, other than the operation we read inode block,
the lookup
of blkno/inode should also be treated differently according to coherency
flag, I
guess following logic could help:
if (cluster_coherent) {
alloc_inode = ocfs2_get_system_file_inode();
ocfs2_inode_lock(alloc_inode, &bh);
} else {
ocfs2_lookup_ino_from_name("global_bitmap", &blkno);
ocfs2_read_blocks(blkno, 1, &bh);
}
...
Above logic guarantee the performance for none-coherency case.
>> coherency flag refers to a cluster-aware lock, while
>> ocfs2_get_system_file_inode will use iget_locked() to get
>> the inode if it didn't exist in cache, does iget_locked() also refer to
>> a cluster-aware lock somehow?
>
> Yes. If an inode is not in cache, it will eventually call
> ocfs2_read_inode_locked().
>
> Joel
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
2010-11-04 8:10 ` tristan
@ 2010-11-04 21:24 ` Joel Becker
0 siblings, 0 replies; 13+ messages in thread
From: Joel Becker @ 2010-11-04 21:24 UTC (permalink / raw)
To: ocfs2-devel
On Thu, Nov 04, 2010 at 04:10:50PM +0800, tristan wrote:
> Yep, you're correct, other than the operation we read inode block,
> the lookup
> of blkno/inode should also be treated differently according to coherency
> flag, I
> guess following logic could help:
>
> if (cluster_coherent) {
> alloc_inode = ocfs2_get_system_file_inode();
> ocfs2_inode_lock(alloc_inode, &bh);
> } else {
> ocfs2_lookup_ino_from_name("global_bitmap", &blkno);
> ocfs2_read_blocks(blkno, 1, &bh);
> }
> ...
>
> Above logic guarantee the performance for none-coherency case.
You got it. But don't hardcode file names. Use the sprintf
function for type+slot as in the top of _ocfs2_get_system_file_inode().
or write a lookup_ino_by_type. Or something like that.
Joel
--
Life's Little Instruction Book #80
"Slow dance"
Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
2010-11-04 7:05 ` tristan
@ 2010-12-07 0:46 ` Joel Becker
0 siblings, 0 replies; 13+ messages in thread
From: Joel Becker @ 2010-12-07 0:46 UTC (permalink / raw)
To: ocfs2-devel
On Thu, Nov 04, 2010 at 03:05:14PM +0800, tristan wrote:
> #1 chunk #2 chunk #3 chunk #4 chunk
> 1100 0010 0000 1100
>
> From above case, we found 3 real free chunks, they are:
>
> 1st free_chunk_size = 4
>
> 2nd free_chunk_size = 5
>
> 3rd free_chunk_size = 2
>
> Therefore, real_free_chunks = 3
>
> While for the 'user-specified-chunksize' logic, we have 4
> chunks(chunksize = 4), and only
> one(#3 chunk) was free.
>
> Does it make sense?
Yes.
Joel
--
print STDOUT q
Just another Perl hacker,
unless $spring
- Larry Wall
Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-12-07 0:46 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-03 11:02 [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl Tristan Ye
2010-11-03 11:02 ` [Ocfs2-devel] [PATCH 2/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' " Tristan Ye
2010-11-04 1:29 ` Joel Becker
2010-11-04 2:56 ` tristan
2010-11-04 4:47 ` tristan
2010-11-04 6:32 ` Joel Becker
2010-11-04 7:05 ` tristan
2010-12-07 0:46 ` Joel Becker
2010-11-04 1:18 ` [Ocfs2-devel] [PATCH 1/2] Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' " Joel Becker
2010-11-04 2:27 ` tristan
2010-11-04 6:28 ` Joel Becker
2010-11-04 8:10 ` tristan
2010-11-04 21:24 ` Joel Becker
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).