* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
@ 2014-04-17 11:08 Joseph Qi
2014-04-17 21:01 ` Mark Fasheh
0 siblings, 1 reply; 8+ messages in thread
From: Joseph Qi @ 2014-04-17 11:08 UTC
To: ocfs2-devel
Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
ocfs2_commit_thread. The thread then loops, flooding the log with
error messages. This pointlessly consumes a large amount of resources
and may eventually hang the system.
So rate-limit the printk in this case.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
---
fs/ocfs2/journal.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 44fc3e5..cfefbd1 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -30,6 +30,7 @@
 #include <linux/kthread.h>
 #include <linux/time.h>
 #include <linux/random.h>
+#include <linux/delay.h>

 #include <cluster/masklog.h>

@@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg)
 					 || kthread_should_stop());

 		status = ocfs2_commit_cache(osb);
-		if (status < 0)
-			mlog_errno(status);
+		if (status < 0) {
+			static unsigned long abort_warn_time;
+
+			/* Warn about this once per minute */
+			if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
+				mlog(ML_ERROR, "status = %d, journal is "
+						"already aborted.\n", status);
+			msleep_interruptible(1000);
+		}

 		if (kthread_should_stop() && atomic_read(&journal->j_num_trans)){
 			mlog(ML_KTHREAD,
--
1.8.4.3
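
For readers unfamiliar with time-based rate limiting, here is a minimal userspace sketch of the idea the patch relies on. It is not the kernel's implementation: timed_ratelimit_model() and count_logged() are hypothetical names, and time is a plain tick counter passed in by the caller. As far as I can tell, the real printk_timed_ratelimit() takes its interval in milliseconds, so the 60*HZ argument in the patch would only equal one minute when HZ is 1000; that is worth double-checking against the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of a timed rate limiter (hypothetical helper, not the
 * kernel's code). `last` holds the tick of the last allowed message;
 * 0 means "nothing has been printed yet".
 */
static bool timed_ratelimit_model(unsigned long *last, unsigned long now,
				  unsigned long interval)
{
	assert(last != NULL);

	/* Unsigned subtraction stays correct if the tick counter wraps. */
	if (*last != 0 && now - *last <= interval)
		return false;

	*last = now;
	return true;
}

/*
 * Count how many of `attempts` back-to-back failures, one per tick
 * beginning at tick `start`, would actually be logged.
 */
static unsigned long count_logged(unsigned long start, unsigned long attempts,
				  unsigned long interval)
{
	unsigned long last = 0, logged = 0, i;

	for (i = 0; i < attempts; i++)
		if (timed_ratelimit_model(&last, start + i, interval))
			logged++;
	return logged;
}
```

With an interval of 60 ticks, 1000 consecutive failures starting at tick 1 produce 17 messages in this model instead of 1000. Note that the limiter only bounds the log output; the loop itself still runs on every tick, which is what the msleep in the patch is meant to address.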
* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
2014-04-17 11:08 [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted Joseph Qi
@ 2014-04-17 21:01 ` Mark Fasheh
2014-04-18 1:02 ` Joseph Qi
0 siblings, 1 reply; 8+ messages in thread
From: Mark Fasheh @ 2014-04-17 21:01 UTC
To: ocfs2-devel
On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote:
>
> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
> ocfs2_commit_thread. Then it will get into a loop with mass logs. This
> will meaninglessly consume a larger number of resource and may lead to
> system hung at last.
> So limit printk in this case.
>
> Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
> ---
> fs/ocfs2/journal.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index 44fc3e5..cfefbd1 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -30,6 +30,7 @@
> #include <linux/kthread.h>
> #include <linux/time.h>
> #include <linux/random.h>
> +#include <linux/delay.h>
>
> #include <cluster/masklog.h>
>
> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg)
> || kthread_should_stop());
>
> status = ocfs2_commit_cache(osb);
> - if (status < 0)
> - mlog_errno(status);
> + if (status < 0) {
> + static unsigned long abort_warn_time;
> +
> + /* Warn about this once per minute */
> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
> + mlog(ML_ERROR, "status = %d, journal is "
> + "already aborted.\n", status);
> + msleep_interruptible(1000);
> + }
Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
right after this anyway - is there a problem with it waiting on that?
Generally I really don't like peppering msleep() into the code where we
might need to sleep - there is often a more elegant solution available.
Thanks,
--Mark
--
Mark Fasheh
* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
2014-04-17 21:01 ` Mark Fasheh
@ 2014-04-18 1:02 ` Joseph Qi
2014-04-18 2:45 ` Mark Fasheh
0 siblings, 1 reply; 8+ messages in thread
From: Joseph Qi @ 2014-04-18 1:02 UTC
To: ocfs2-devel
On 2014/4/18 5:01, Mark Fasheh wrote:
> On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote:
>>
>> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
>> ocfs2_commit_thread. Then it will get into a loop with mass logs. This
>> will meaninglessly consume a larger number of resource and may lead to
>> system hung at last.
>> So limit printk in this case.
>>
>> Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
>> ---
>> fs/ocfs2/journal.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>> index 44fc3e5..cfefbd1 100644
>> --- a/fs/ocfs2/journal.c
>> +++ b/fs/ocfs2/journal.c
>> @@ -30,6 +30,7 @@
>> #include <linux/kthread.h>
>> #include <linux/time.h>
>> #include <linux/random.h>
>> +#include <linux/delay.h>
>>
>> #include <cluster/masklog.h>
>>
>> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg)
>> || kthread_should_stop());
>>
>> status = ocfs2_commit_cache(osb);
>> - if (status < 0)
>> - mlog_errno(status);
>> + if (status < 0) {
>> + static unsigned long abort_warn_time;
>> +
>> + /* Warn about this once per minute */
>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
>> + mlog(ML_ERROR, "status = %d, journal is "
>> + "already aborted.\n", status);
>> + msleep_interruptible(1000);
>> + }
>
> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
> right after this anyway - is there a problem with it waiting on that?
>
Since the jbd2 journal is already aborted, committing the cache is meaningless.
> Generally I really don't like peppering msleep() into the code where we
> might need to sleep - there is often a more elegant solution available.
>
> Thanks,
> --Mark
>
> --
> Mark Fasheh
>
>
* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
2014-04-18 1:02 ` Joseph Qi
@ 2014-04-18 2:45 ` Mark Fasheh
2014-04-18 9:18 ` Joseph Qi
0 siblings, 1 reply; 8+ messages in thread
From: Mark Fasheh @ 2014-04-18 2:45 UTC
To: ocfs2-devel
On Fri, Apr 18, 2014 at 09:02:33AM +0800, Joseph Qi wrote:
> On 2014/4/18 5:01, Mark Fasheh wrote:
> > On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote:
> >>
> >> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
> >> ocfs2_commit_thread. Then it will get into a loop with mass logs. This
> >> will meaninglessly consume a larger number of resource and may lead to
> >> system hung at last.
> >> So limit printk in this case.
> >>
> >> Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
> >> ---
> >> fs/ocfs2/journal.c | 12 ++++++++++--
> >> 1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> >> index 44fc3e5..cfefbd1 100644
> >> --- a/fs/ocfs2/journal.c
> >> +++ b/fs/ocfs2/journal.c
> >> @@ -30,6 +30,7 @@
> >> #include <linux/kthread.h>
> >> #include <linux/time.h>
> >> #include <linux/random.h>
> >> +#include <linux/delay.h>
> >>
> >> #include <cluster/masklog.h>
> >>
> >> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg)
> >> || kthread_should_stop());
> >>
> >> status = ocfs2_commit_cache(osb);
> >> - if (status < 0)
> >> - mlog_errno(status);
> >> + if (status < 0) {
> >> + static unsigned long abort_warn_time;
> >> +
> >> + /* Warn about this once per minute */
> >> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
> >> + mlog(ML_ERROR, "status = %d, journal is "
> >> + "already aborted.\n", status);
> >> + msleep_interruptible(1000);
> >> + }
> >
> > Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
> > right after this anyway - is there a problem with it waiting on that?
> >
> Since jbd2 is already aborted, commit cache is meaningless.
I understand that, but I'm asking why the msleep and whether we can avoid
that. To go back to my question:
"ocfs2_commit_thread will wait on the checkpoint_event queue right after
this anyway - is there a problem with it waiting on that?"
Thanks,
--Mark
--
Mark Fasheh
* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
2014-04-18 2:45 ` Mark Fasheh
@ 2014-04-18 9:18 ` Joseph Qi
2014-04-21 19:18 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Joseph Qi @ 2014-04-18 9:18 UTC
To: ocfs2-devel
On 2014/4/18 10:45, Mark Fasheh wrote:
> On Fri, Apr 18, 2014 at 09:02:33AM +0800, Joseph Qi wrote:
>> On 2014/4/18 5:01, Mark Fasheh wrote:
>>> On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote:
>>>>
>>>> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
>>>> ocfs2_commit_thread. Then it will get into a loop with mass logs. This
>>>> will meaninglessly consume a larger number of resource and may lead to
>>>> system hung at last.
>>>> So limit printk in this case.
>>>>
>>>> Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
>>>> ---
>>>> fs/ocfs2/journal.c | 12 ++++++++++--
>>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>>>> index 44fc3e5..cfefbd1 100644
>>>> --- a/fs/ocfs2/journal.c
>>>> +++ b/fs/ocfs2/journal.c
>>>> @@ -30,6 +30,7 @@
>>>> #include <linux/kthread.h>
>>>> #include <linux/time.h>
>>>> #include <linux/random.h>
>>>> +#include <linux/delay.h>
>>>>
>>>> #include <cluster/masklog.h>
>>>>
>>>> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg)
>>>> || kthread_should_stop());
>>>>
>>>> status = ocfs2_commit_cache(osb);
>>>> - if (status < 0)
>>>> - mlog_errno(status);
>>>> + if (status < 0) {
>>>> + static unsigned long abort_warn_time;
>>>> +
>>>> + /* Warn about this once per minute */
>>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
>>>> + mlog(ML_ERROR, "status = %d, journal is "
>>>> + "already aborted.\n", status);
>>>> + msleep_interruptible(1000);
>>>> + }
>>>
>>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
>>> right after this anyway - is there a problem with it waiting on that?
>>>
>> Since jbd2 is already aborted, commit cache is meaningless.
>
> I understand that, but I'm asking why the msleep and whether we can avoid
> that. To go back to my question:
>
> "ocfs2_commit_thread will wait on the checkpoint_event queue right after
> this anyway - is there a problem with it waiting on that?"
>
> Thanks,
> --Mark
Sorry, my earlier description was unclear.
If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared.
The wait condition on checkpoint_event then still evaluates to true, so the
thread won't wait.
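
To make the busy loop concrete, here is a small userspace model of the behaviour described above. It is only a sketch under stated assumptions: struct journal_model, commit_cache_model(), wait_condition_model() and count_spins() are hypothetical stand-ins, not ocfs2 code, and the only property they try to capture is that a failed commit leaves the transaction counter non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the journal state discussed above. */
struct journal_model {
	int num_trans;	/* models j_num_trans */
	bool aborted;	/* models the JBD2_ABORT condition */
};

/* Models the commit step: on abort it fails and leaves num_trans alone. */
static int commit_cache_model(struct journal_model *j)
{
	assert(j != NULL);
	if (j->aborted)
		return -1;
	j->num_trans = 0;
	return 0;
}

/* Models the wake-up condition: pending transactions or a stop request. */
static bool wait_condition_model(const struct journal_model *j, bool stop)
{
	assert(j != NULL);
	return j->num_trans != 0 || stop;
}

/*
 * Run up to max_iters loop iterations and count how many times the
 * thread would skip sleeping because the condition is already true.
 */
static int count_spins(struct journal_model *j, int max_iters)
{
	int spins = 0;

	for (int i = 0; i < max_iters; i++) {
		if (!wait_condition_model(j, false))
			break;		/* the thread would block here */
		spins++;
		(void)commit_cache_model(j);
	}
	return spins;
}
```

In this model a healthy journal with pending transactions goes around the loop once and then blocks, while an aborted journal never blocks and spins for every iteration it is allowed.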
>
> --
> Mark Fasheh
>
> .
>
* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
2014-04-18 9:18 ` Joseph Qi
@ 2014-04-21 19:18 ` Andrew Morton
2014-04-21 20:51 ` Mark Fasheh
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2014-04-21 19:18 UTC
To: ocfs2-devel
On Fri, 18 Apr 2014 17:18:27 +0800 Joseph Qi <joseph.qi@huawei.com> wrote:
> >>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
> >>>> + mlog(ML_ERROR, "status = %d, journal is "
> >>>> + "already aborted.\n", status);
> >>>> + msleep_interruptible(1000);
> >>>> + }
> >>>
> >>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
> >>> right after this anyway - is there a problem with it waiting on that?
> >>>
> >> Since jbd2 is already aborted, commit cache is meaningless.
> >
> > I understand that, but I'm asking why the msleep and whether we can avoid
> > that. To go back to my question:
> >
> > "ocfs2_commit_thread will wait on the checkpoint_event queue right after
> > this anyway - is there a problem with it waiting on that?"
> >
> > Thanks,
> > --Mark
> Sorry for my obscure description.
> If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared.
> Then the condition of checkpoint event still evaluates true, so it won't wait.
If Mark didn't understand the reason for the msleep then nobody will,
so we need to add a comment. This?
--- a/fs/ocfs2/journal.c~ocfs2-limit-printk-when-journal-is-aborted-fix
+++ a/fs/ocfs2/journal.c
@@ -2193,6 +2193,11 @@ static int ocfs2_commit_thread(void *arg
 			if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
 				mlog(ML_ERROR, "status = %d, journal is "
 						"already aborted.\n", status);
+			/*
+			 * After ocfs2_commit_cache() fails, j_num_trans has a
+			 * non-zero value.  Sleep here to avoid a busy-wait
+			 * loop.
+			 */
 			msleep_interruptible(1000);
 		}
This patch seems rather hacky :( Isn't there a better solution?
Why even keep the kernel thread running after an abort?
* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
2014-04-21 19:18 ` Andrew Morton
@ 2014-04-21 20:51 ` Mark Fasheh
2014-04-22 1:08 ` Joseph Qi
0 siblings, 1 reply; 8+ messages in thread
From: Mark Fasheh @ 2014-04-21 20:51 UTC
To: ocfs2-devel
On Mon, Apr 21, 2014 at 12:18:24PM -0700, Andrew Morton wrote:
> On Fri, 18 Apr 2014 17:18:27 +0800 Joseph Qi <joseph.qi@huawei.com> wrote:
>
> > >>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
> > >>>> + mlog(ML_ERROR, "status = %d, journal is "
> > >>>> + "already aborted.\n", status);
> > >>>> + msleep_interruptible(1000);
> > >>>> + }
> > >>>
> > >>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
> > >>> right after this anyway - is there a problem with it waiting on that?
> > >>>
> > >> Since jbd2 is already aborted, commit cache is meaningless.
> > >
> > > I understand that, but I'm asking why the msleep and whether we can avoid
> > > that. To go back to my question:
> > >
> > > "ocfs2_commit_thread will wait on the checkpoint_event queue right after
> > > this anyway - is there a problem with it waiting on that?"
> > >
> > > Thanks,
> > > --Mark
> > Sorry for my obscure description.
> > If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared.
> > Then the condition of checkpoint event still evaluates true, so it won't wait.
>
> If Mark didn't understand the reason for the msleep then nobody will,
> so we need to add a comment. This?
>
> --- a/fs/ocfs2/journal.c~ocfs2-limit-printk-when-journal-is-aborted-fix
> +++ a/fs/ocfs2/journal.c
> @@ -2193,6 +2193,11 @@ static int ocfs2_commit_thread(void *arg
> if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
> mlog(ML_ERROR, "status = %d, journal is "
> "already aborted.\n", status);
> + /*
> + * After ocfs2_commit_cache() fails, j_num_trans has a
> + * non-zero value. Sleep here to avoid a busy-wait
> + * loop.
> + */
> msleep_interruptible(1000);
> }
>
>
> This patch seems rather hacky :( Isn't there a better solution?
Right, that's what I was getting at with my followup later on in the mail
thread about this.
> Why even keep the kernel thread running after an abort?
The msleep is papering over the real issue. Either the thread should shut
down or we need to re-evaluate usage of j_num_trans which is the condition
that keeps it from sleeping (and from a quick glance it doesn't seem like
j_num_trans does anything for us).
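
As a purely illustrative sketch of the second option (re-evaluating the wake-up condition), the userspace model below stops treating pending transactions as a reason to run once the journal is aborted. Everything here is an assumption made for illustration: struct commit_state and should_run() are hypothetical, nothing in this thread proposes this exact predicate, and a real fix would need to consider how the abort state is read and who wakes the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what the commit thread looks at. */
struct commit_state {
	int num_trans;	/* models j_num_trans */
	bool aborted;	/* models an aborted journal */
	bool stop;	/* models kthread_should_stop() */
};

/* Decide whether the thread should run instead of sleeping. */
static bool should_run(const struct commit_state *s)
{
	assert(s != NULL);

	/* A stop request must always wake the thread. */
	if (s->stop)
		return true;

	/* Pending transactions only matter while they can be committed. */
	return s->num_trans != 0 && !s->aborted;
}
```

With a predicate of this shape, an aborted journal with a non-zero counter would let the thread sleep until it is asked to stop, so no fixed-length sleep would be needed in the error path.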
--Mark
--
Mark Fasheh
* [Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted
2014-04-21 20:51 ` Mark Fasheh
@ 2014-04-22 1:08 ` Joseph Qi
0 siblings, 0 replies; 8+ messages in thread
From: Joseph Qi @ 2014-04-22 1:08 UTC
To: ocfs2-devel
On 2014/4/22 4:51, Mark Fasheh wrote:
> On Mon, Apr 21, 2014 at 12:18:24PM -0700, Andrew Morton wrote:
>> On Fri, 18 Apr 2014 17:18:27 +0800 Joseph Qi <joseph.qi@huawei.com> wrote:
>>
>>>>>>> + if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
>>>>>>> + mlog(ML_ERROR, "status = %d, journal is "
>>>>>>> + "already aborted.\n", status);
>>>>>>> + msleep_interruptible(1000);
>>>>>>> + }
>>>>>>
>>>>>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
>>>>>> right after this anyway - is there a problem with it waiting on that?
>>>>>>
>>>>> Since jbd2 is already aborted, commit cache is meaningless.
>>>>
>>>> I understand that, but I'm asking why the msleep and whether we can avoid
>>>> that. To go back to my question:
>>>>
>>>> "ocfs2_commit_thread will wait on the checkpoint_event queue right after
>>>> this anyway - is there a problem with it waiting on that?"
>>>>
>>>> Thanks,
>>>> --Mark
>>> Sorry for my obscure description.
>>> If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared.
>>> Then the condition of checkpoint event still evaluates true, so it won't wait.
>>
>> If Mark didn't understand the reason for the msleep then nobody will,
>> so we need to add a comment. This?
>>
>> --- a/fs/ocfs2/journal.c~ocfs2-limit-printk-when-journal-is-aborted-fix
>> +++ a/fs/ocfs2/journal.c
>> @@ -2193,6 +2193,11 @@ static int ocfs2_commit_thread(void *arg
>> if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
>> mlog(ML_ERROR, "status = %d, journal is "
>> "already aborted.\n", status);
>> + /*
>> + * After ocfs2_commit_cache() fails, j_num_trans has a
>> + * non-zero value. Sleep here to avoid a busy-wait
>> + * loop.
>> + */
>> msleep_interruptible(1000);
>> }
>>
>>
>> This patch seems rather hacky :( Isn't there a better solution?
>
> Right, that's what I was getting at with my followup later on in the mail
> thread about this.
>
>
>> Why even keep the kernel thread running after an abort?
>
> The msleep is papering over the real issue. Either the thread should shut
> down or we need to re-evaluate usage of j_num_trans which is the condition
> that keeps it from sleeping (and from a quick glance it doesn't seem like
> j_num_trans does anything for us).
> --Mark
>
AFAIK, the commit thread exits only when the volume is unmounted. A journal
abort is different from a journal shutdown, which is why the thread is left
running.
> --
> Mark Fasheh
>
> .
>