* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-10-28 5:34 ` Kazuya Mio
@ 2011-11-01 23:13 ` Jan Kara
2011-11-02 5:24 ` Kazuya Mio
2011-11-07 8:00 ` Dmitry Monakhov
2011-11-08 0:03 ` Jan Kara
2 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2011-11-01 23:13 UTC (permalink / raw)
To: Kazuya Mio; +Cc: Jan Kara, ext4, Theodore Tso, Andreas Dilger
On Fri 28-10-11 14:34:31, Kazuya Mio wrote:
> 2011/10/25 22:40, Jan Kara wrote:
> > Please no. Generally this boils down to what do we do with dirty data
> >when there's error in writing them out. Currently we just throw them away
> >(e.g. in media error case) but I don't think that's a generally good thing
> >because e.g. admin may want to copy the data to other working storage or
> >so. So I think we should rather keep the data and provide a mechanism for
> >userspace to ask kernel to get rid of the data (so that we don't eventually
> >run OOM).
>
> I see. I agree with you.
>
> >>Do you have any ideas?
> > So the question is what would you like to achieve. If you just want to
> >unblock a thread then a solution would be to make a thread at
> >balance_dirty_pages() killable. If generally you want to get rid of dirty
> >memory, then I don't have a really good answer but throwing dirty data away
> >seems like a bad answer to me.
>
> The problem is that we cannot unmount the corrupted filesystem due to
> un-killable dd process. We must bring down the system to resume the service
> with no dirty pages. I think it is important for the service continuity
> to be able to kill the thread handling in balance_dirty_pages().
Sure. Then allowing a process to be killed while waiting in
balance_dirty_pages() would solve your problem. That can be done relatively
easily. I can write the patch, but that code is currently being rewritten by
the IO-less dirty throttling patches, so I'll wait a while for it to settle
down.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-01 23:13 ` Jan Kara
@ 2011-11-02 5:24 ` Kazuya Mio
0 siblings, 0 replies; 14+ messages in thread
From: Kazuya Mio @ 2011-11-02 5:24 UTC (permalink / raw)
To: Jan Kara; +Cc: ext4, Theodore Tso, Andreas Dilger
2011/11/02 8:13, Jan Kara wrote:
>> The problem is that we cannot unmount the corrupted filesystem due to
>> un-killable dd process. We must bring down the system to resume the service
>> with no dirty pages. I think it is important for the service continuity
>> to be able to kill the thread handling in balance_dirty_pages().
> Sure. Then allowing a process to be killed while waiting in
> balance_dirty_pages() would solve your problem. That can be done relatively
> easily. I can write the patch, but that code is currently being rewritten by
> the IO-less dirty throttling patches, so I'll wait a while for it to settle
> down.
Thanks for working on this.
Regards,
Kazuya Mio
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-10-28 5:34 ` Kazuya Mio
2011-11-01 23:13 ` Jan Kara
@ 2011-11-07 8:00 ` Dmitry Monakhov
2011-11-07 17:29 ` Jan Kara
2011-11-08 0:03 ` Jan Kara
2 siblings, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2011-11-07 8:00 UTC (permalink / raw)
To: Kazuya Mio, Jan Kara; +Cc: ext4, Theodore Tso, Andreas Dilger
On Fri, 28 Oct 2011 14:34:31 +0900, Kazuya Mio <k-mio@sx.jp.nec.com> wrote:
> 2011/10/25 22:40, Jan Kara wrote:
> > Please no. Generally this boils down to what do we do with dirty data
> > when there's error in writing them out. Currently we just throw them away
> > (e.g. in media error case) but I don't think that's a generally good thing
> > because e.g. admin may want to copy the data to other working storage or
> > so. So I think we should rather keep the data and provide a mechanism for
> > userspace to ask kernel to get rid of the data (so that we don't eventually
> > run OOM).
>
> I see. I agree with you.
>
> >> Do you have any ideas?
> > So the question is what would you like to achieve. If you just want to
> > unblock a thread then a solution would be to make a thread at
> > balance_dirty_pages() killable. If generally you want to get rid of dirty
> > memory, then I don't have a really good answer but throwing dirty data away
> > seems like a bad answer to me.
>
> The problem is that we cannot unmount the corrupted filesystem due to
> un-killable dd process. We must bring down the system to resume the service
> with no dirty pages. I think it is important for the service continuity
> to be able to kill the thread handling in balance_dirty_pages().
In fact you are very lucky that dd is merely deadlocked; in many cases a
journal abort results in a BUG_ON triggering (if the IO load is high enough).
This is because the transaction abort check is racy. Right now I have no good
fix with reasonable performance. My latest idea is to protect the
transaction abort check with SRCU.
>
> Regards,
> Kazuya Mio
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-07 8:00 ` Dmitry Monakhov
@ 2011-11-07 17:29 ` Jan Kara
2011-11-07 17:45 ` Dmitry Monakhov
0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2011-11-07 17:29 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: Kazuya Mio, Jan Kara, ext4, Theodore Tso, Andreas Dilger
On Mon 07-11-11 12:00:41, Dmitry Monakhov wrote:
> In fact you are very lucky that dd is merely deadlocked; in many cases a
> journal abort results in a BUG_ON triggering (if the IO load is high enough).
Can you provide the exact kernel message? I'd be interested...
> This is because the transaction abort check is racy. Right now I have no good
> fix with reasonable performance. My latest idea is to protect the
> transaction abort check with SRCU.
Yeah, the code does not seem to care about races too much but I don't see
which BUG_ON would be triggered...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-07 17:29 ` Jan Kara
@ 2011-11-07 17:45 ` Dmitry Monakhov
2011-11-07 21:23 ` Jan Kara
0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Monakhov @ 2011-11-07 17:45 UTC (permalink / raw)
To: Jan Kara; +Cc: Kazuya Mio, Jan Kara, ext4, Theodore Tso, Andreas Dilger
On Mon, 7 Nov 2011 18:29:39 +0100, Jan Kara <jack@suse.cz> wrote:
> On Mon 07-11-11 12:00:41, Dmitry Monakhov wrote:
> > In fact you are very lucky that dd is merely deadlocked; in many cases a
> > journal abort results in a BUG_ON triggering (if the IO load is high enough).
> Can you provide the exact kernel message? I'd be interested...
Several times I've hit the assertion in jbd2_journal_stop() here:
int jbd2_journal_stop(handle_t *handle)
{
	transaction_t *transaction = handle->h_transaction;
	journal_t *journal = transaction->t_journal;
	int err, wait_for_commit = 0;
	tid_t tid;
	pid_t pid;
	J_ASSERT(journal_current_handle() == handle);
	if (is_handle_aborted(handle))
		err = -EIO;
	else {
		J_ASSERT(atomic_read(&transaction->t_updates) > 0);
##FAILED HERE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		err = 0;
	}
>
> > This is because the transaction abort check is racy. Right now I have no good
> > fix with reasonable performance. My latest idea is to protect the
> > transaction abort check with SRCU.
> Yeah, the code does not seem to care about races too much but I don't see
> which BUG_ON would be triggered...
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-07 17:45 ` Dmitry Monakhov
@ 2011-11-07 21:23 ` Jan Kara
0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2011-11-07 21:23 UTC (permalink / raw)
To: Dmitry Monakhov; +Cc: Jan Kara, Kazuya Mio, ext4, Theodore Tso, Andreas Dilger
On Mon 07-11-11 21:45:31, Dmitry Monakhov wrote:
> On Mon, 7 Nov 2011 18:29:39 +0100, Jan Kara <jack@suse.cz> wrote:
> > On Mon 07-11-11 12:00:41, Dmitry Monakhov wrote:
> > > In fact you are very lucky that dd is merely deadlocked; in many cases a
> > > journal abort results in a BUG_ON triggering (if the IO load is high enough).
> > Can you provide the exact kernel message? I'd be interested...
> Several times I've hit the assertion in jbd2_journal_stop() here:
> int jbd2_journal_stop(handle_t *handle)
> {
> 	transaction_t *transaction = handle->h_transaction;
> 	journal_t *journal = transaction->t_journal;
> 	int err, wait_for_commit = 0;
> 	tid_t tid;
> 	pid_t pid;
>
> 	J_ASSERT(journal_current_handle() == handle);
>
> 	if (is_handle_aborted(handle))
> 		err = -EIO;
> 	else {
> 		J_ASSERT(atomic_read(&transaction->t_updates) > 0);
> ##FAILED HERE ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 		err = 0;
> 	}
Hum, interesting. The logic wrt t_updates looks correct to me. Whenever
we create a new handle in a transaction, we increase t_updates. Whenever we
remove the handle, we decrease t_updates. Whether the journal / handle is
aborted or not does not play any role here. So I fail to see how the
assertion can be triggered, unless we tried to release a handle twice or
something like that...
Honza
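Note: the t_updates accounting described above can be modelled in userspace.
The sketch below is only an illustration under stated assumptions:
model_transaction, model_handle, model_start() and model_stop() are
hypothetical names, not jbd2 code, and the real jbd2_journal_stop() does far
more. It shows that with one increment per start and one decrement per stop,
the counter check can only fail when a handle is released more often than it
was started.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the real jbd2 structures. */
struct model_transaction {
	atomic_int t_updates;	/* number of handles currently attached */
};

struct model_handle {
	struct model_transaction *h_transaction;
};

/* Attaching a handle always increments t_updates. */
static void model_start(struct model_transaction *t, struct model_handle *h)
{
	assert(t != NULL && h != NULL);
	h->h_transaction = t;
	atomic_fetch_add(&t->t_updates, 1);
}

/*
 * Detaching a handle always decrements t_updates, whether or not the
 * journal is aborted. Returns false in the situation where the real code
 * would trip J_ASSERT(t_updates > 0), e.g. a handle stopped twice.
 */
static bool model_stop(struct model_handle *h)
{
	struct model_transaction *t;

	assert(h != NULL && h->h_transaction != NULL);
	t = h->h_transaction;
	if (atomic_load(&t->t_updates) <= 0)
		return false;
	atomic_fetch_sub(&t->t_updates, 1);
	return true;
}
```

In this model a single model_start() followed by model_stop() leaves the
counter at zero, and a second model_stop() on the same handle is the only way
to reach the failing branch. Any race in the real code would have to break
that pairing somewhere.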
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-10-28 5:34 ` Kazuya Mio
2011-11-01 23:13 ` Jan Kara
2011-11-07 8:00 ` Dmitry Monakhov
@ 2011-11-08 0:03 ` Jan Kara
2011-11-09 8:28 ` Kazuya Mio
2011-11-14 10:06 ` Kazuya Mio
2 siblings, 2 replies; 14+ messages in thread
From: Jan Kara @ 2011-11-08 0:03 UTC (permalink / raw)
To: Kazuya Mio; +Cc: Jan Kara, ext4, Theodore Tso, Andreas Dilger
[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]
On Fri 28-10-11 14:34:31, Kazuya Mio wrote:
> 2011/10/25 22:40, Jan Kara wrote:
> > Please no. Generally this boils down to what do we do with dirty data
> >when there's error in writing them out. Currently we just throw them away
> >(e.g. in media error case) but I don't think that's a generally good thing
> >because e.g. admin may want to copy the data to other working storage or
> >so. So I think we should rather keep the data and provide a mechanism for
> >userspace to ask kernel to get rid of the data (so that we don't eventually
> >run OOM).
>
> I see. I agree with you.
>
> >>Do you have any ideas?
> > So the question is what would you like to achieve. If you just want to
> >unblock a thread then a solution would be to make a thread at
> >balance_dirty_pages() killable. If generally you want to get rid of dirty
> >memory, then I don't have a really good answer but throwing dirty data away
> >seems like a bad answer to me.
>
> The problem is that we cannot unmount the corrupted filesystem due to
> un-killable dd process. We must bring down the system to resume the service
> with no dirty pages. I think it is important for the service continuity
> to be able to kill the thread handling in balance_dirty_pages().
OK, attached are two patches based on latest Linus's tree that should
make your task killable. Can you test them?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
[-- Attachment #2: 0001-mm-Make-task-in-balance_dirty_pages-killable.patch --]
[-- Type: text/x-patch, Size: 1076 bytes --]
From 62d9916059c0441b3f545158f723c7006bcdc1e8 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 7 Nov 2011 18:41:05 +0100
Subject: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
There is no reason why a task in balance_dirty_pages() shouldn't be killable,
and it helps in recovering from some error conditions (like when a filesystem
goes into an error state and cannot accept writeback anymore but we still want
to kill processes using it to be able to unmount it).
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0360d1b..e83c286 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1133,8 +1133,10 @@ pause:
 					  pages_dirtied,
 					  pause,
 					  start_time);
-		__set_current_state(TASK_UNINTERRUPTIBLE);
+		__set_current_state(TASK_KILLABLE);
 		io_schedule_timeout(pause);
+		if (fatal_signal_pending(current))
+			break;
 
 		dirty_thresh = hard_dirty_limit(dirty_thresh);
 		/*
--
1.7.1
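Note: the effect of patch 1/2 can be illustrated with a small userspace model.
This is a sketch under stated assumptions, not kernel code: throttle_model,
throttle() and all of their fields are hypothetical names, the pause is
reduced to a loop iteration standing in for io_schedule_timeout(), and the
kill_after field stands in for fatal_signal_pending(current). The max_pauses
bound exists only so that the unkillable case terminates in the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; none of these names are kernel APIs. */
struct throttle_model {
	long nr_dirty;		/* pages currently dirty */
	long dirty_thresh;	/* throttle while nr_dirty > dirty_thresh */
	long cleaned_per_pause;	/* 0 models a filesystem that stopped writeback */
	int kill_after;		/* pause count at which SIGKILL "arrives", -1 = never */
};

/*
 * Returns the number of pauses taken. *killed is set when the loop was left
 * because of the modelled fatal signal rather than writeback progress.
 */
static int throttle(struct throttle_model *m, bool killable, int max_pauses,
		    bool *killed)
{
	int pauses = 0;

	assert(m != NULL && killed != NULL);
	*killed = false;
	while (m->nr_dirty > m->dirty_thresh && pauses < max_pauses) {
		/* Stands in for io_schedule_timeout(pause). */
		pauses++;
		m->nr_dirty -= m->cleaned_per_pause;
		/* Stands in for the fatal_signal_pending(current) check. */
		if (killable && m->kill_after >= 0 && pauses >= m->kill_after) {
			*killed = true;
			break;
		}
	}
	return pauses;
}
```

With cleaned_per_pause set to 0 (an aborted filesystem), the non-killable
variant only stops at max_pauses, while the killable variant leaves the loop
at the pause where the signal arrives even though nr_dirty never drops.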
[-- Attachment #3: 0002-fs-Make-write-2-interruptible-by-a-signal.patch --]
[-- Type: text/x-patch, Size: 979 bytes --]
From 6eefa10d92cc35b66a8166cc26472d383b572b0d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 7 Nov 2011 18:46:39 +0100
Subject: [PATCH 2/2] fs: Make write(2) interruptible by a signal
Currently write(2) to a file is not interruptible by a signal. Sometimes
interrupting it is desirable (e.g. when you want to quickly kill a process
hogging your disk or when some process gets blocked in balance_dirty_pages()
indefinitely due to a filesystem being in an error condition).
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/filemap.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c0018f2..6b01d2f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
 						iov_iter_count(i));
 
 again:
+		if (signal_pending(current)) {
+			status = -EINTR;
+			break;
+		}
 
 		/*
 		 * Bring in the user page that we will copy from _first_.
--
1.7.1
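Note: the control flow that patch 2/2 adds can be sketched in userspace. The
model below rests on assumptions: model_perform_write(), MODEL_CHUNK and the
signal_at parameter are hypothetical, memcpy() stands in for copying one page
of user data, and the final "bytes written if any, else the error status"
return is modelled on how generic_perform_write() is understood to end. It is
not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_CHUNK 4096	/* stands in for one page */

/*
 * Copies len bytes in page-sized chunks. signal_at is the index of the
 * chunk before which a signal "becomes pending" (negative = never).
 * Returns the bytes copied if any progress was made, otherwise -EINTR.
 */
static ssize_t model_perform_write(char *dst, const char *src, size_t len,
				   long signal_at)
{
	ssize_t written = 0;
	ssize_t status = 0;
	long chunk = 0;

	assert(dst != NULL && src != NULL);
	while (len > 0) {
		size_t n = len < MODEL_CHUNK ? len : MODEL_CHUNK;

		/* Stands in for signal_pending(current) at the top of the loop. */
		if (signal_at >= 0 && chunk >= signal_at) {
			status = -EINTR;
			break;
		}
		memcpy(dst + written, src + written, n);
		written += (ssize_t)n;
		len -= n;
		chunk++;
	}
	/* Partial progress wins over the error, so the caller sees a short write. */
	return written ? written : status;
}
```

In this model a signal that arrives before the first chunk yields -EINTR,
while a signal that arrives later yields a short write. A caller therefore
has to be prepared for both outcomes.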
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-08 0:03 ` Jan Kara
@ 2011-11-09 8:28 ` Kazuya Mio
2011-11-09 11:15 ` Jan Kara
2011-11-14 10:06 ` Kazuya Mio
1 sibling, 1 reply; 14+ messages in thread
From: Kazuya Mio @ 2011-11-09 8:28 UTC (permalink / raw)
To: Jan Kara; +Cc: ext4, Theodore Tso, Andreas Dilger
2011/11/08 9:03, Jan Kara wrote:
> On Fri 28-10-11 14:34:31, Kazuya Mio wrote:
>> 2011/10/25 22:40, Jan Kara wrote:
>>> Please no. Generally this boils down to what do we do with dirty data
>>> when there's error in writing them out. Currently we just throw them away
>>> (e.g. in media error case) but I don't think that's a generally good thing
>>> because e.g. admin may want to copy the data to other working storage or
>>> so. So I think we should rather keep the data and provide a mechanism for
>>> userspace to ask kernel to get rid of the data (so that we don't eventually
>>> run OOM).
>>
>> I see. I agree with you.
>>
>>>> Do you have any ideas?
>>> So the question is what would you like to achieve. If you just want to
>>> unblock a thread then a solution would be to make a thread at
>>> balance_dirty_pages() killable. If generally you want to get rid of dirty
>>> memory, then I don't have a really good answer but throwing dirty data away
>>> seems like a bad answer to me.
>>
>> The problem is that we cannot unmount the corrupted filesystem due to
>> un-killable dd process. We must bring down the system to resume the service
>> with no dirty pages. I think it is important for the service continuity
>> to be able to kill the thread handling in balance_dirty_pages().
> OK, attached are two patches based on latest Linus's tree that should
> make your task killable. Can you test them?
I'm trying to reproduce now, but it's hard. Could you wait a few days?
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-09 8:28 ` Kazuya Mio
@ 2011-11-09 11:15 ` Jan Kara
0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2011-11-09 11:15 UTC (permalink / raw)
To: Kazuya Mio; +Cc: Jan Kara, ext4, Theodore Tso, Andreas Dilger
On Wed 09-11-11 17:28:20, Kazuya Mio wrote:
> 2011/11/08 9:03, Jan Kara wrote:
> > On Fri 28-10-11 14:34:31, Kazuya Mio wrote:
> >> 2011/10/25 22:40, Jan Kara wrote:
> >>> Please no. Generally this boils down to what do we do with dirty data
> >>> when there's error in writing them out. Currently we just throw them away
> >>> (e.g. in media error case) but I don't think that's a generally good thing
> >>> because e.g. admin may want to copy the data to other working storage or
> >>> so. So I think we should rather keep the data and provide a mechanism for
> >>> userspace to ask kernel to get rid of the data (so that we don't eventually
> >>> run OOM).
> >>
> >> I see. I agree with you.
> >>
> >>>> Do you have any ideas?
> >>> So the question is what would you like to achieve. If you just want to
> >>> unblock a thread then a solution would be to make a thread at
> >>> balance_dirty_pages() killable. If generally you want to get rid of dirty
> >>> memory, then I don't have a really good answer but throwing dirty data away
> >>> seems like a bad answer to me.
> >>
> >> The problem is that we cannot unmount the corrupted filesystem due to
> >> un-killable dd process. We must bring down the system to resume the service
> >> with no dirty pages. I think it is important for the service continuity
> >> to be able to kill the thread handling in balance_dirty_pages().
> > OK, attached are two patches based on latest Linus's tree that should
> > make your task killable. Can you test them?
>
> I'm trying to reproduce now, but it's hard. Could you wait a few days?
Sure, take as much time as you need.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-08 0:03 ` Jan Kara
2011-11-09 8:28 ` Kazuya Mio
@ 2011-11-14 10:06 ` Kazuya Mio
2011-11-14 11:11 ` Jan Kara
1 sibling, 1 reply; 14+ messages in thread
From: Kazuya Mio @ 2011-11-14 10:06 UTC (permalink / raw)
To: Jan Kara; +Cc: ext4, Theodore Tso, Andreas Dilger
2011/11/08 9:03, Jan Kara wrote:
> On Fri 28-10-11 14:34:31, Kazuya Mio wrote:
>> 2011/10/25 22:40, Jan Kara wrote:
>>> Please no. Generally this boils down to what do we do with dirty data
>>> when there's error in writing them out. Currently we just throw them away
>>> (e.g. in media error case) but I don't think that's a generally good thing
>>> because e.g. admin may want to copy the data to other working storage or
>>> so. So I think we should rather keep the data and provide a mechanism for
>>> userspace to ask kernel to get rid of the data (so that we don't eventually
>>> run OOM).
>>
>> I see. I agree with you.
>>
>>>> Do you have any ideas?
>>> So the question is what would you like to achieve. If you just want to
>>> unblock a thread then a solution would be to make a thread at
>>> balance_dirty_pages() killable. If generally you want to get rid of dirty
>>> memory, then I don't have a really good answer but throwing dirty data away
>>> seems like a bad answer to me.
>>
>> The problem is that we cannot unmount the corrupted filesystem due to
>> un-killable dd process. We must bring down the system to resume the service
>> with no dirty pages. I think it is important for the service continuity
>> to be able to kill the thread handling in balance_dirty_pages().
> OK, attached are two patches based on latest Linus's tree that should
> make your task killable. Can you test them?
Sorry for the late reply.
I confirmed that these patches fix the problem.
Reported-and-tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Regards,
Kazuya Mio
* Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
2011-11-14 10:06 ` Kazuya Mio
@ 2011-11-14 11:11 ` Jan Kara
0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2011-11-14 11:11 UTC (permalink / raw)
To: Kazuya Mio; +Cc: Jan Kara, ext4, Theodore Tso, Andreas Dilger
On Mon 14-11-11 19:06:31, Kazuya Mio wrote:
> 2011/11/08 9:03, Jan Kara wrote:
> > On Fri 28-10-11 14:34:31, Kazuya Mio wrote:
> >> 2011/10/25 22:40, Jan Kara wrote:
> >>> Please no. Generally this boils down to what do we do with dirty data
> >>> when there's error in writing them out. Currently we just throw them away
> >>> (e.g. in media error case) but I don't think that's a generally good thing
> >>> because e.g. admin may want to copy the data to other working storage or
> >>> so. So I think we should rather keep the data and provide a mechanism for
> >>> userspace to ask kernel to get rid of the data (so that we don't eventually
> >>> run OOM).
> >>
> >> I see. I agree with you.
> >>
> >>>> Do you have any ideas?
> >>> So the question is what would you like to achieve. If you just want to
> >>> unblock a thread then a solution would be to make a thread at
> >>> balance_dirty_pages() killable. If generally you want to get rid of dirty
> >>> memory, then I don't have a really good answer but throwing dirty data away
> >>> seems like a bad answer to me.
> >>
> >> The problem is that we cannot unmount the corrupted filesystem due to
> >> un-killable dd process. We must bring down the system to resume the service
> >> with no dirty pages. I think it is important for the service continuity
> >> to be able to kill the thread handling in balance_dirty_pages().
> > OK, attached are two patches based on latest Linus's tree that should
> > make your task killable. Can you test them?
>
> Sorry for the late reply.
> I confirmed that these patches fix the problem.
>
> Reported-and-tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Thanks for testing! I've sent patches for inclusion...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR