From mboxrd@z Thu Jan 1 00:00:00 1970 From: Joseph Qi Date: Fri, 18 Apr 2014 17:18:27 +0800 Subject: [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted In-Reply-To: <20140418024539.GE27178@wotan.suse.de> References: <534FB63A.9090402@huawei.com> <20140417210132.GB27178@wotan.suse.de> <535079A9.9060603@huawei.com> <20140418024539.GE27178@wotan.suse.de> Message-ID: <5350EDE3.2040605@huawei.com> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: ocfs2-devel@oss.oracle.com On 2014/4/18 10:45, Mark Fasheh wrote: > On Fri, Apr 18, 2014 at 09:02:33AM +0800, Joseph Qi wrote: >> On 2014/4/18 5:01, Mark Fasheh wrote: >>> On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote: >>>> >>>> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in >>>> ocfs2_commit_thread. Then it will get into a loop with mass logs. This >>>> will meaninglessly consume a larger number of resource and may lead to >>>> system hung at last. >>>> So limit printk in this case. >>>> >>>> Signed-off-by: Joseph Qi >>>> --- >>>> fs/ocfs2/journal.c | 12 ++++++++++-- >>>> 1 file changed, 10 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c >>>> index 44fc3e5..cfefbd1 100644 >>>> --- a/fs/ocfs2/journal.c >>>> +++ b/fs/ocfs2/journal.c >>>> @@ -30,6 +30,7 @@ >>>> #include >>>> #include >>>> #include >>>> +#include >>>> >>>> #include >>>> >>>> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg) >>>> || kthread_should_stop()); >>>> >>>> status = ocfs2_commit_cache(osb); >>>> - if (status < 0) >>>> - mlog_errno(status); >>>> + if (status < 0) { >>>> + static unsigned long abort_warn_time; >>>> + >>>> + /* Warn about this once per minute */ >>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ)) >>>> + mlog(ML_ERROR, "status = %d, journal is " >>>> + "already aborted.\n", status); >>>> + msleep_interruptible(1000); >>>> + } >>> >>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue >>> right after this anyway - is there a problem with it waiting on that? >>> >> Since jbd2 is already aborted, commit cache is meaningless. > > I understand that, but I'm asking why the msleep and whether we can avoid > that. To go back to my question: > > "ocfs2_commit_thread will wait on the checkpoint_event queue right after > this anyway - is there a problem with it waiting on that?" > > Thanks, > --Mark Sorry for my obscure description. If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared. Then the condition of checkpoint event still evaluates true, so it won't wait. > > -- > Mark Fasheh > > . >