* [Ocfs2-devel] [PATCH 0/2 V2] ocfs2: Resolve the problem of truncate log flush.
@ 2010-09-19 7:19 Tao Ma
2010-09-19 7:20 ` [Ocfs2-devel] [PATCH 1/2] ocfs2: flush truncate log in case it contains too many clusters Tao Ma
2010-09-19 7:20 ` [Ocfs2-devel] [PATCH 2/2] ocfs2: Start journal checkpoint if we have too many truncated clusters Tao Ma
0 siblings, 2 replies; 5+ messages in thread
From: Tao Ma @ 2010-09-19 7:19 UTC (permalink / raw)
To: ocfs2-devel
Hi all,
Changelog from v1 to v2:
0001: no change.
0002 is removed, and we now use jbd2_journal_start_commit in local mode,
as suggested by Joel.
Recently, one of our colleagues hit a problem: if we repeatedly
write and delete a 32MB file, we eventually get ENOSPC. The
corresponding bug is 1288.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288
This patch set tries to resolve it. It includes 2 patches:
0001 adds a new watermark for the truncate log, FLUSH_TRUNCATE_LOG_RATIO.
If the truncate log has collected too many clusters,
ocfs2_truncate_log_needs_flush tells the caller to flush immediately.
0002 adds journal checkpoint support for when we find that we need to
checkpoint what the truncate log has freed. For a cluster mount, it is
simple: we just need to wake up ocfs2cmt and let it work for us.
For local mode, we call jbd2_journal_start_commit directly, which
starts the checkpoint.
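
The watermark check described for 0001 can be modeled in user space. The following is only a sketch: struct tl_model and the tl_* function names are hypothetical stand-ins for the fields the patch reads from ocfs2_super and the truncate log, and endianness conversion is left out. It keeps the same order of operations as the patch (divide by 100, then multiply by the ratio).

```c
#include <assert.h>
#include <stdint.h>

#define FLUSH_TRUNCATE_LOG_RATIO 10

/* Hypothetical model of the state the check reads; not the real ocfs2_super. */
struct tl_model {
	uint16_t tl_used;            /* records currently in the truncate log */
	uint16_t tl_count;           /* record capacity of the truncate log */
	uint32_t clusters_at_boot;   /* volume size in clusters at mount time */
	uint32_t truncated_clusters; /* clusters queued since the last flush */
};

/* Watermark in clusters: integer division first, as in the patch. */
static uint32_t tl_watermark(const struct tl_model *m)
{
	return m->clusters_at_boot / 100 * FLUSH_TRUNCATE_LOG_RATIO;
}

/* Returns 1 if the log is full or the queued clusters exceed the watermark. */
static int tl_needs_flush(const struct tl_model *m)
{
	/* The kernel code reports invalid parameters; the model just asserts. */
	assert(m->tl_used <= m->tl_count);

	if (m->tl_used == m->tl_count)
		return 1;

	return tl_watermark(m) < m->truncated_clusters;
}
```

With 25600 clusters at boot, the watermark is 2560 clusters, so the flush is requested once more than 2560 clusters are queued. Because the division happens first, a volume with fewer than 100 clusters gets a watermark of 0 in this model.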
Regards,
Tao
* [Ocfs2-devel] [PATCH 1/2] ocfs2: flush truncate log in case it contains too many clusters.
2010-09-19 7:19 [Ocfs2-devel] [PATCH 0/2 V2] ocfs2: Resolve the problem of truncate log flush Tao Ma
@ 2010-09-19 7:20 ` Tao Ma
2010-10-12 0:23 ` Mark Fasheh
2010-09-19 7:20 ` [Ocfs2-devel] [PATCH 2/2] ocfs2: Start journal checkpoint if we have too many truncated clusters Tao Ma
1 sibling, 1 reply; 5+ messages in thread
From: Tao Ma @ 2010-09-19 7:20 UTC (permalink / raw)
To: ocfs2-devel
When we test whether we need to flush the truncate log in
ocfs2_truncate_log_needs_flush, we only check
whether the truncate log is full. But if the volume is
small and the block size is large (in which case the truncate
log can store a great many records), we may flush too late
if the user creates/writes/deletes files quickly.
So I add a new FLUSH_TRUNCATE_LOG_RATIO so that we also
check whether the number of truncated clusters has exceeded
a watermark; if so, we flush the truncate log.
This goes some way toward resolving ossbug #1288.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/alloc.c | 25 ++++++++++++++++++++++++-
fs/ocfs2/ocfs2.h | 2 ++
2 files changed, 26 insertions(+), 1 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 592fae5..c765447 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5751,11 +5751,13 @@ out:
return ret;
}
+#define FLUSH_TRUNCATE_LOG_RATIO 10
int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb)
{
struct buffer_head *tl_bh = osb->osb_tl_bh;
struct ocfs2_dinode *di;
struct ocfs2_truncate_log *tl;
+ int flush = 0;
di = (struct ocfs2_dinode *) tl_bh->b_data;
tl = &di->id2.i_dealloc;
@@ -5764,7 +5766,25 @@ int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb)
"slot %d, invalid truncate log parameters: used = "
"%u, count = %u\n", osb->slot_num,
le16_to_cpu(tl->tl_used), le16_to_cpu(tl->tl_count));
- return le16_to_cpu(tl->tl_used) == le16_to_cpu(tl->tl_count);
+ if (le16_to_cpu(tl->tl_used) == le16_to_cpu(tl->tl_count))
+ flush = 1;
+ else {
+ /*
+ * Check whether we have reserved enough clusters
+ * to flush.
+ */
+ u32 watermark = osb->osb_clusters_at_boot / 100 *
+ FLUSH_TRUNCATE_LOG_RATIO;
+
+ if (watermark < osb->truncated_clusters) {
+ mlog(0, "flush truncate log: watermark %u,"
+ " we have %u clusters truncated\n",
+ watermark, osb->truncated_clusters);
+ flush = 1;
+ }
+ }
+
+ return flush;
}
static int ocfs2_truncate_log_can_coalesce(struct ocfs2_truncate_log *tl,
@@ -5858,6 +5878,7 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
ocfs2_journal_dirty(handle, tl_bh);
+ osb->truncated_clusters += num_clusters;
bail:
mlog_exit(status);
return status;
@@ -5929,6 +5950,8 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
i--;
}
+ osb->truncated_clusters = 0;
+
bail:
mlog_exit(status);
return status;
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index c67003b..5f47883 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -425,6 +425,8 @@ struct ocfs2_super
/* rb tree root for refcount lock. */
struct rb_root osb_rf_lock_tree;
struct ocfs2_refcount_tree *osb_ref_tree_lru;
+
+ unsigned int truncated_clusters;
};
#define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info)
--
1.7.1.571.gba4d01
* [Ocfs2-devel] [PATCH 2/2] ocfs2: Start journal checkpoint if we have too many truncated clusters.
2010-09-19 7:19 [Ocfs2-devel] [PATCH 0/2 V2] ocfs2: Resolve the problem of truncate log flush Tao Ma
2010-09-19 7:20 ` [Ocfs2-devel] [PATCH 1/2] ocfs2: flush truncate log in case it contains too many clusters Tao Ma
@ 2010-09-19 7:20 ` Tao Ma
1 sibling, 0 replies; 5+ messages in thread
From: Tao Ma @ 2010-09-19 7:20 UTC (permalink / raw)
To: ocfs2-devel
Add a new parameter 'checkpoint' to ocfs2_truncate_log_needs_flush.
If it finds that we need to checkpoint the journal:
1) call ocfs2_start_checkpoint if we are a cluster mount.
2) call jbd2_journal_start_commit if we are in local mode.
For xattr truncate, I don't pass the parameter for now, since the value
currently has a limit of 64K, so I don't think we need a checkpoint there.
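
The optional out-parameter and the mount-mode dispatch described above can be sketched in user space. This is an illustrative model only: needs_flush, pick_commit_action and enum commit_action are hypothetical names, and the enum values merely label which of the two real calls (jbd2_journal_start_commit or ocfs2_start_checkpoint) a caller would make. As in the patch, only the watermark branch sets the flag, and passing NULL (as the xattr callers do) skips it.

```c
#include <stddef.h>

/* Hypothetical labels for what the caller would do after a successful flush. */
enum commit_action {
	COMMIT_NONE,          /* no checkpoint requested, or the operation failed */
	COMMIT_LOCAL_JBD2,    /* local mount: start the journal commit directly */
	COMMIT_WAKE_OCFS2CMT  /* cluster mount: wake ocfs2cmt to do the work */
};

/*
 * Model of the flush test with an optional out-parameter.
 * A full log requests a flush but no checkpoint; exceeding the
 * watermark requests both. checkpoint may be NULL.
 */
static int needs_flush(int log_full, unsigned int watermark,
		       unsigned int truncated, int *checkpoint)
{
	if (log_full)
		return 1;

	if (watermark < truncated) {
		if (checkpoint)
			*checkpoint = 1;
		return 1;
	}

	return 0;
}

/* Mirrors "if (!ret && checkpoint)" followed by the mount-mode test. */
static enum commit_action pick_commit_action(int ret, int checkpoint,
					     int mount_local)
{
	if (ret || !checkpoint)
		return COMMIT_NONE;

	return mount_local ? COMMIT_LOCAL_JBD2 : COMMIT_WAKE_OCFS2CMT;
}
```

Keeping the flag in a caller-owned int lets each call site decide after it has dropped its locks whether to kick the journal, instead of doing so inside the flush test itself.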
Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
fs/ocfs2/alloc.c | 25 ++++++++++++++++++-------
fs/ocfs2/alloc.h | 2 +-
fs/ocfs2/journal.c | 13 +++++++++++++
fs/ocfs2/journal.h | 1 +
fs/ocfs2/xattr.c | 4 ++--
5 files changed, 35 insertions(+), 10 deletions(-)
diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index c765447..d2c6d3a 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5646,7 +5646,7 @@ int ocfs2_remove_btree_range(struct inode *inode,
struct ocfs2_cached_dealloc_ctxt *dealloc,
u64 refcount_loc)
{
- int ret, credits = 0, extra_blocks = 0;
+ int ret, credits = 0, extra_blocks = 0, checkpoint = 0;
u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct inode *tl_inode = osb->osb_tl_inode;
@@ -5686,7 +5686,7 @@ int ocfs2_remove_btree_range(struct inode *inode,
mutex_lock(&tl_inode->i_mutex);
- if (ocfs2_truncate_log_needs_flush(osb)) {
+ if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) {
ret = __ocfs2_flush_truncate_log(osb);
if (ret < 0) {
mlog_errno(ret);
@@ -5748,11 +5748,14 @@ out:
if (ref_tree)
ocfs2_unlock_refcount_tree(osb, ref_tree, 1);
+ if (!ret && checkpoint)
+ ocfs2_start_journal_commit(osb);
+
return ret;
}
#define FLUSH_TRUNCATE_LOG_RATIO 10
-int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb)
+int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb, int *checkpoint)
{
struct buffer_head *tl_bh = osb->osb_tl_bh;
struct ocfs2_dinode *di;
@@ -5781,6 +5784,9 @@ int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb)
" we have %u clusters truncated\n",
watermark, osb->truncated_clusters);
flush = 1;
+
+ if (checkpoint)
+ *checkpoint = 1;
}
}
@@ -6187,7 +6193,7 @@ bail:
int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb,
struct ocfs2_dinode *tl_copy)
{
- int status = 0;
+ int status = 0, checkpoint = 0;
int i;
unsigned int clusters, num_recs, start_cluster;
u64 start_blk;
@@ -6209,7 +6215,7 @@ int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb,
mutex_lock(&tl_inode->i_mutex);
for(i = 0; i < num_recs; i++) {
- if (ocfs2_truncate_log_needs_flush(osb)) {
+ if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) {
status = __ocfs2_flush_truncate_log(osb);
if (status < 0) {
mlog_errno(status);
@@ -6240,6 +6246,9 @@ int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb,
bail_up:
mutex_unlock(&tl_inode->i_mutex);
+ if (!status && checkpoint)
+ ocfs2_start_journal_commit(osb);
+
mlog_exit(status);
return status;
}
@@ -6442,12 +6451,12 @@ static int ocfs2_free_cached_clusters(struct ocfs2_super *osb,
struct ocfs2_cached_block_free *tmp;
struct inode *tl_inode = osb->osb_tl_inode;
handle_t *handle;
- int ret = 0;
+ int ret = 0, checkpoint = 0;
mutex_lock(&tl_inode->i_mutex);
while (head) {
- if (ocfs2_truncate_log_needs_flush(osb)) {
+ if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) {
ret = __ocfs2_flush_truncate_log(osb);
if (ret < 0) {
mlog_errno(ret);
@@ -6485,6 +6494,8 @@ static int ocfs2_free_cached_clusters(struct ocfs2_super *osb,
kfree(tmp);
}
+ if (!ret && checkpoint)
+ ocfs2_start_journal_commit(osb);
return ret;
}
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 55762b5..967ee21 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -182,7 +182,7 @@ int ocfs2_begin_truncate_log_recovery(struct ocfs2_super *osb,
struct ocfs2_dinode **tl_copy);
int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb,
struct ocfs2_dinode *tl_copy);
-int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb);
+int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb, int *checkpoint);
int ocfs2_truncate_log_append(struct ocfs2_super *osb,
handle_t *handle,
u64 start_blk,
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 9b57c03..f2a931d 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -2248,3 +2248,16 @@ out:
ret = -EROFS;
return ret;
}
+
+void ocfs2_start_journal_commit(struct ocfs2_super *osb)
+{
+ /*
+ * If we are mounted in local mode, start the journal commit
+ * directly. If not, just wake up thread ocfs2cmt and let it
+ * work for us.
+ */
+ if (ocfs2_mount_local(osb))
+ jbd2_journal_start_commit(osb->journal->j_journal, NULL);
+ else
+ ocfs2_start_checkpoint(osb);
+}
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index b5baaa8..a0ff12b 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -196,6 +196,7 @@ void ocfs2_recovery_thread(struct ocfs2_super *osb,
int ocfs2_mark_dead_nodes(struct ocfs2_super *osb);
void ocfs2_complete_mount_recovery(struct ocfs2_super *osb);
void ocfs2_complete_quota_recovery(struct ocfs2_super *osb);
+void ocfs2_start_journal_commit(struct ocfs2_super *osb);
static inline void ocfs2_start_checkpoint(struct ocfs2_super *osb)
{
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index d03469f..82f1fa2 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -3595,7 +3595,7 @@ int ocfs2_xattr_set(struct inode *inode,
mutex_lock(&tl_inode->i_mutex);
- if (ocfs2_truncate_log_needs_flush(osb)) {
+ if (ocfs2_truncate_log_needs_flush(osb, NULL)) {
ret = __ocfs2_flush_truncate_log(osb);
if (ret < 0) {
mutex_unlock(&tl_inode->i_mutex);
@@ -5447,7 +5447,7 @@ static int ocfs2_rm_xattr_cluster(struct inode *inode,
mutex_lock(&tl_inode->i_mutex);
- if (ocfs2_truncate_log_needs_flush(osb)) {
+ if (ocfs2_truncate_log_needs_flush(osb, NULL)) {
ret = __ocfs2_flush_truncate_log(osb);
if (ret < 0) {
mlog_errno(ret);
--
1.7.1.571.gba4d01
* [Ocfs2-devel] [PATCH 1/2] ocfs2: flush truncate log in case it contains too many clusters.
2010-09-19 7:20 ` [Ocfs2-devel] [PATCH 1/2] ocfs2: flush truncate log in case it contains too many clusters Tao Ma
@ 2010-10-12 0:23 ` Mark Fasheh
2010-10-12 4:55 ` Tao Ma
0 siblings, 1 reply; 5+ messages in thread
From: Mark Fasheh @ 2010-10-12 0:23 UTC (permalink / raw)
To: ocfs2-devel
Hi Tao,
On Sun, Sep 19, 2010 at 03:20:28PM +0800, Tao Ma wrote:
> When we test whether we need to flush truncate log in
> ocfs2_truncate_log_needs_flush, we only take care of
> whether the truncate log is full. But if the volume is
> small and we have large block size(in this case truncate
> log can store too many records), we may be too late
> for flushing if the user create/write/delete files quickly.
>
> So I add a new FLUSH_TRUNCATE_LOG_RATIO so that we will also
> check whether the number of truncated clusters has reached
> a watermark, if yes, flush the truncate log.
> It resolves the ossbug #1288 somehow.
The problem with the ratio of course is that it doesn't necessarily
correlate to when we actually run out of space. Also, I am concerned that a
10% disk-size limit on truncate log could hurt us in some cases (maybe
truncate of a huge file in the middle of some other rms?)
How about an alternate solution - at the time we see -ENOSPC (during write
for example), why not drop all the allocator locks (global bitmap, local
alloc) and then look at the truncate log for a sufficiently sized extent?
Removing it from the truncate log at that point shouldn't be too hard and
we'd always be able to fill it up completely.
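
As a rough illustration of the alternative above, the search for a sufficiently sized extent could look like the following user-space sketch. Everything here is an assumption for illustration: struct tl_rec, struct tl_log, tl_find_extent and tl_take are hypothetical, the log is a small fixed array in host byte order, and locking, journaling and the on-disk format are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory truncate record: a run of freed clusters. */
struct tl_rec {
	uint32_t start;    /* first cluster of the freed extent */
	uint32_t clusters; /* length of the extent in clusters */
};

/* Hypothetical in-memory truncate log with a small fixed capacity. */
struct tl_log {
	size_t used;
	struct tl_rec recs[8];
};

/* Return the index of the first record with at least 'wanted' clusters, or -1. */
static int tl_find_extent(const struct tl_log *tl, uint32_t wanted)
{
	for (size_t i = 0; i < tl->used; i++)
		if (tl->recs[i].clusters >= wanted)
			return (int)i;

	return -1;
}

/* Remove record 'idx' from the log, keeping the remaining records in order. */
static struct tl_rec tl_take(struct tl_log *tl, size_t idx)
{
	assert(idx < tl->used);

	struct tl_rec rec = tl->recs[idx];

	for (size_t i = idx + 1; i < tl->used; i++)
		tl->recs[i - 1] = tl->recs[i];
	tl->used--;

	return rec;
}
```

A caller that has just seen -ENOSPC would look for an extent of the size it needs and, if one is found, take it out of the log and allocate from it instead of failing.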
--Mark
--
Mark Fasheh
* [Ocfs2-devel] [PATCH 1/2] ocfs2: flush truncate log in case it contains too many clusters.
2010-10-12 0:23 ` Mark Fasheh
@ 2010-10-12 4:55 ` Tao Ma
0 siblings, 0 replies; 5+ messages in thread
From: Tao Ma @ 2010-10-12 4:55 UTC (permalink / raw)
To: ocfs2-devel
Hi Mark,
On 10/12/2010 08:23 AM, Mark Fasheh wrote:
> Hi Tao,
>
> On Sun, Sep 19, 2010 at 03:20:28PM +0800, Tao Ma wrote:
>> When we test whether we need to flush truncate log in
>> ocfs2_truncate_log_needs_flush, we only take care of
>> whether the truncate log is full. But if the volume is
>> small and we have large block size(in this case truncate
>> log can store too many records), we may be too late
>> for flushing if the user create/write/delete files quickly.
>>
>> So I add a new FLUSH_TRUNCATE_LOG_RATIO so that we will also
>> check whether the number of truncated clusters has reached
>> a watermark, if yes, flush the truncate log.
>> It resolves the ossbug #1288 somehow.
>
> The problem with the ratio of course is that it doesn't necessarily
> correlate to when we actually run out of space. Also, I am concerned that a
> 10% disk-size limit on truncate log could hurt us in some cases (maybe
> truncate of a huge file in the middle of some other rms?)
>
>
> How about an alternate solution - at the time we see -ENOSPC (during write
> for example), why not drop all the allocator locks (global bitmap, local
> alloc) and then look at the truncate log for a sufficiently sized extent?
> Removing it from the truncate log at that point shouldn't be too hard and
> we'd always be able to fill it up completely.
Thanks for the review. Your suggestion does make sense, but I am afraid
it will surely be more complicated than this patch.
Anyway, I will be on a trip until next week and I will work on it when I
come back.
Regards,
Tao