* [PATCH 0/2 v3] Make task in balance_dirty_pages() killable @ 2011-11-16 11:12 Jan Kara 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw) To: Wu Fengguang; +Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel Hello, since the discussion has calmed down, here is a third iteration of the patches. Fengguang, can you put them into your tree and merge them with Linus? Thanks. Changes since v2: * removed return value of balance_dirty_pages() since it didn't seem that useful after all. * kept fatal_signal_pending() check in generic_perform_write() since I don't think possible partial writes when app receives SIGKILL are a problem. Changes since v1: * slightly moved the check in balance_dirty_pages() as Fengguang requested * made balance_dirty_pages() return EINTR if fatal signal was detected * changed check for signal to check for fatal signal in generic_perform_write() to avoid unexpected results for userspace applications. Honza ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-16 11:12 ` Jan Kara 2011-11-16 11:28 ` Wu Fengguang 2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-16 11:23 ` [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Wu Fengguang 2 siblings, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara There is no reason why task in balance_dirty_pages() shouldn't be killable and it helps in recovering from some error conditions (like when filesystem goes in error state and cannot accept writeback anymore but we still want to kill processes using it to be able to unmount it). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/page-writeback.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0360d1b..39f5805 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1133,7 +1133,7 @@ pause: pages_dirtied, pause, start_time); - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(TASK_KILLABLE); io_schedule_timeout(pause); dirty_thresh = hard_dirty_limit(dirty_thresh); @@ -1145,6 +1145,9 @@ pause: */ if (nr_dirty < dirty_thresh) break; + + if (fatal_signal_pending(current)) + break; } if (!dirty_exceeded && bdi->dirty_exceeded) -- 1.7.1 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara @ 2011-11-16 11:28 ` Wu Fengguang 2011-11-16 12:58 ` Jan Kara 0 siblings, 1 reply; 34+ messages in thread From: Wu Fengguang @ 2011-11-16 11:28 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Neil Brown, KOSAKI Motohiro, Anton Altaparmakov On Wed, Nov 16, 2011 at 07:12:14PM +0800, Jan Kara wrote: > There is no reason why task in balance_dirty_pages() shouldn't be killable > and it helps in recovering from some error conditions (like when filesystem > goes in error state and cannot accept writeback anymore but we still want to > kill processes using it to be able to unmount it). Here I'd like to add a paragraph from my original patch: The follow up patches will further abort the generic_perform_write() and other filesystem write loops, to avoid large write + SIGKILL combination exceeding the dirty limit and possibly strange OOM. Thanks, Fengguang > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/page-writeback.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 0360d1b..39f5805 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1133,7 +1133,7 @@ pause: > pages_dirtied, > pause, > start_time); > - __set_current_state(TASK_UNINTERRUPTIBLE); > + __set_current_state(TASK_KILLABLE); > io_schedule_timeout(pause); > > dirty_thresh = hard_dirty_limit(dirty_thresh); > @@ -1145,6 +1145,9 @@ pause: > */ > if (nr_dirty < dirty_thresh) > break; > + > + if (fatal_signal_pending(current)) > + break; > } > > if (!dirty_exceeded && bdi->dirty_exceeded) > -- > 1.7.1 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-16 11:28 ` Wu Fengguang @ 2011-11-16 12:58 ` Jan Kara 0 siblings, 0 replies; 34+ messages in thread From: Jan Kara @ 2011-11-16 12:58 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Neil Brown, KOSAKI Motohiro, Anton Altaparmakov On Wed 16-11-11 19:28:19, Wu Fengguang wrote: > On Wed, Nov 16, 2011 at 07:12:14PM +0800, Jan Kara wrote: > > There is no reason why task in balance_dirty_pages() shouldn't be killable > > and it helps in recovering from some error conditions (like when filesystem > > goes in error state and cannot accept writeback anymore but we still want to > > kill processes using it to be able to unmount it). > > Here I'd like to add a paragraph from my original patch: > > The follow up patches will further abort the generic_perform_write() and > other filesystem write loops, to avoid large write + SIGKILL combination > exceeding the dirty limit and possibly strange OOM. Sure, feel free to add the paragraph to the changelog. Honza > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/page-writeback.c | 5 ++++- > > 1 files changed, 4 insertions(+), 1 deletions(-) > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index 0360d1b..39f5805 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -1133,7 +1133,7 @@ pause: > > pages_dirtied, > > pause, > > start_time); > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > + __set_current_state(TASK_KILLABLE); > > io_schedule_timeout(pause); > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > @@ -1145,6 +1145,9 @@ pause: > > */ > > if (nr_dirty < dirty_thresh) > > break; > > + > > + if (fatal_signal_pending(current)) > > + break; > > } > > > > if (!dirty_exceeded && bdi->dirty_exceeded) > > -- > > 1.7.1 -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara @ 2011-11-16 11:12 ` Jan Kara 2011-11-16 11:44 ` Wu Fengguang 2011-11-16 11:23 ` [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Wu Fengguang 2 siblings, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara Currently write(2) to a file is not interruptible by a signal. Sometimes this is desirable (e.g. when you want to quickly kill a process hogging your disk or when some process gets blocked in balance_dirty_pages() indefinitely due to a filesystem being in an error condition). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/filemap.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index c0018f2..c106d3b 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, iov_iter_count(i)); again: - /* * Bring in the user page that we will copy from _first_. * Otherwise there's a nasty deadlock on copying from the @@ -2463,7 +2462,10 @@ again: written += copied; balance_dirty_pages_ratelimited(mapping); - + if (fatal_signal_pending(current)) { + status = -EINTR; + break; + } } while (iov_iter_count(i)); return written ? written : status; -- 1.7.1 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara @ 2011-11-16 11:44 ` Wu Fengguang 2011-11-16 12:54 ` Jan Kara 2011-11-22 22:28 ` Andrew Morton 0 siblings, 2 replies; 34+ messages in thread From: Wu Fengguang @ 2011-11-16 11:44 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org Jan, Due to the (very low) possibility of data loss by partial writes, IMHO it would safer to test this patch in linux-next until next merge window, would you agree? Pushing the first patch will address the main problem, anyway. Thanks, Fengguang On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote: > Currently write(2) to a file is not interruptible by a signal. Sometimes this > is desirable (e.g. when you want to quickly kill a process hogging your disk or > when some process gets blocked in balance_dirty_pages() indefinitely due to a > filesystem being in an error condition). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/filemap.c | 6 ++++-- > 1 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index c0018f2..c106d3b 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > iov_iter_count(i)); > > again: > - > /* > * Bring in the user page that we will copy from _first_. > * Otherwise there's a nasty deadlock on copying from the > @@ -2463,7 +2462,10 @@ again: > written += copied; > > balance_dirty_pages_ratelimited(mapping); > - > + if (fatal_signal_pending(current)) { > + status = -EINTR; > + break; > + } > } while (iov_iter_count(i)); > > return written ? written : status; > -- > 1.7.1 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:44 ` Wu Fengguang @ 2011-11-16 12:54 ` Jan Kara 2011-11-16 13:11 ` Wu Fengguang 2011-11-22 22:28 ` Andrew Morton 1 sibling, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-16 12:54 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed 16-11-11 19:44:21, Wu Fengguang wrote: > Jan, > > Due to the (very low) possibility of data loss by partial writes, IMHO > it would safer to test this patch in linux-next until next merge window, > would you agree? Fine with me. Thanks. > Pushing the first patch will address the main problem, anyway. Hopefully, yes. Honza > On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote: > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > filesystem being in an error condition). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/filemap.c | 6 ++++-- > > 1 files changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index c0018f2..c106d3b 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > iov_iter_count(i)); > > > > again: > > - > > /* > > * Bring in the user page that we will copy from _first_. > > * Otherwise there's a nasty deadlock on copying from the > > @@ -2463,7 +2462,10 @@ again: > > written += copied; > > > > balance_dirty_pages_ratelimited(mapping); > > - > > + if (fatal_signal_pending(current)) { > > + status = -EINTR; > > + break; > > + } > > } while (iov_iter_count(i)); > > > > return written ? written : status; > > -- > > 1.7.1 -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 12:54 ` Jan Kara @ 2011-11-16 13:11 ` Wu Fengguang 0 siblings, 0 replies; 34+ messages in thread From: Wu Fengguang @ 2011-11-16 13:11 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio On Wed, Nov 16, 2011 at 08:54:45PM +0800, Jan Kara wrote: > On Wed 16-11-11 19:44:21, Wu Fengguang wrote: > > Jan, > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > it would safer to test this patch in linux-next until next merge window, > > would you agree? > Fine with me. Thanks. Great. The two patches are now in: http://git.kernel.org/?p=linux/kernel/git/wfg/linux.git;a=shortlog;h=refs/heads/writeback-for-next When applying the patch I changed the title "... by a signal" to "... by SIGKILL" to reflect the updated patch content. Hopefully this is also what's in your mind. > > Pushing the first patch will address the main problem, anyway. > Hopefully, yes. I looked over Kazuya's posts in linux-ext4 and think the first patch alone does have good chance address the problem of busy looping in balance_dirty_pages() due to dirty pages never drop under fs error conditions. Thanks, Fengguang > > On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote: > > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > > filesystem being in an error condition). > > > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > > Signed-off-by: Jan Kara <jack@suse.cz> > > > --- > > > mm/filemap.c | 6 ++++-- > > > 1 files changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index c0018f2..c106d3b 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > > iov_iter_count(i)); > > > > > > again: > > > - > > > /* > > > * Bring in the user page that we will copy from _first_. > > > * Otherwise there's a nasty deadlock on copying from the > > > @@ -2463,7 +2462,10 @@ again: > > > written += copied; > > > > > > balance_dirty_pages_ratelimited(mapping); > > > - > > > + if (fatal_signal_pending(current)) { > > > + status = -EINTR; > > > + break; > > > + } > > > } while (iov_iter_count(i)); > > > > > > return written ? written : status; > > > -- > > > 1.7.1 > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:44 ` Wu Fengguang 2011-11-16 12:54 ` Jan Kara @ 2011-11-22 22:28 ` Andrew Morton 2011-11-23 9:05 ` Wu Fengguang 1 sibling, 1 reply; 34+ messages in thread From: Andrew Morton @ 2011-11-22 22:28 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, 16 Nov 2011 19:44:21 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote: > Due to the (very low) possibility of data loss by partial writes, IMHO > it would safer to test this patch in linux-next until next merge window, Any such bugs will not be discovered in linux-next testing. The only way to find these things in a reasonable period of time is to go in and find them. For example, intensive fsx-linux testing with concurrent heavy memory pressure on various filesystems with various block sizes. And of course concurrent signalling. If you're talking about O_DIRECT then iirc I hacked support for that into fsx-linux. I think. Anyway, what _are_ the scenarios in which we think data can be lost? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-22 22:28 ` Andrew Morton @ 2011-11-23 9:05 ` Wu Fengguang 2011-11-23 9:50 ` Andrew Morton 2011-11-23 13:08 ` [PATCH 2/2] fs: " Jan Kara 0 siblings, 2 replies; 34+ messages in thread From: Wu Fengguang @ 2011-11-23 9:05 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > On Wed, 16 Nov 2011 19:44:21 +0800 > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > it would safer to test this patch in linux-next until next merge window, > > Any such bugs will not be discovered in linux-next testing. Yup, I'm afraid. > The only way to find these things in a reasonable period of time is to > go in and find them. For example, intensive fsx-linux testing with > concurrent heavy memory pressure on various filesystems with various > block sizes. And of course concurrent signalling. If you're talking > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > think. How are we going to measure the success/failure? Check if it eventually resulted in filesystem corruption or whatever? When received SIGKILL, fsx-linux itself will just die. > Anyway, what _are_ the scenarios in which we think data can be lost? It's the vision that there may be partial writes on SIGKILL. Before patch, the write will either succeed as a whole or not started at all, depending on the timing of write/SIGKILL. This is kind of atomic operation. However now the write can be half done. If the application really cares about atomic behavior, it can do create-and-rename. However there are always the possibility of broken applications. Maybe this is not that big problem as SIGKILL is considered be to destructive already. Thanks, Fengguang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 9:05 ` Wu Fengguang @ 2011-11-23 9:50 ` Andrew Morton 2011-11-23 12:27 ` [PATCH 2/2] " Theodore Tso 2011-11-23 13:08 ` [PATCH 2/2] fs: " Jan Kara 1 sibling, 1 reply; 34+ messages in thread From: Andrew Morton @ 2011-11-23 9:50 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, 23 Nov 2011 17:05:33 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote: > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > > On Wed, 16 Nov 2011 19:44:21 +0800 > > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > > it would safer to test this patch in linux-next until next merge window, > > > > Any such bugs will not be discovered in linux-next testing. > > Yup, I'm afraid. > > > The only way to find these things in a reasonable period of time is to > > go in and find them. For example, intensive fsx-linux testing with > > concurrent heavy memory pressure on various filesystems with various > > block sizes. And of course concurrent signalling. If you're talking > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > > think. > > How are we going to measure the success/failure? Check if it > eventually resulted in filesystem corruption or whatever? yup. > When received SIGKILL, fsx-linux itself will just die. Well, there are ways of simulating its effect. For example, bale out of the write() every seventh time if current->comm=="fsx-linux". Or set a rearming timer which triggers a baled-out write. You'll work it out ;) > > Anyway, what _are_ the scenarios in which we think data can be lost? > > It's the vision that there may be partial writes on SIGKILL. Before > patch, the write will either succeed as a whole or not started at > all, depending on the timing of write/SIGKILL. This is kind of atomic > operation. However now the write can be half done. > > If the application really cares about atomic behavior, it can do > create-and-rename. However there are always the possibility of broken > applications. > > Maybe this is not that big problem as SIGKILL is considered be to > destructive already. Yeah, I have dim dark memories that there are subtle problems with interrupting write(). Linus might remember. Others might remember too, but we're only talking to fsdevel which nobody reads :( ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] Make write(2) interruptible by a signal 2011-11-23 9:50 ` Andrew Morton @ 2011-11-23 12:27 ` Theodore Tso 2011-11-23 20:29 ` Andrew Morton 0 siblings, 1 reply; 34+ messages in thread From: Theodore Tso @ 2011-11-23 12:27 UTC (permalink / raw) To: Andrew Morton Cc: Theodore Tso, Wu Fengguang, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Nov 23, 2011, at 4:50 AM, Andrew Morton wrote: >> >> If the application really cares about atomic behavior, it can do >> create-and-rename. However there are always the possibility of broken >> applications. Heh. Application programmers whine a lot about not doing create-and-rename already (and we had to introduce a work-around since so many programs assume close() implies fsync(). >> >> Maybe this is not that big problem as SIGKILL is considered be to >> destructive already. > > Yeah, I have dim dark memories that there are subtle problems with > interrupting write(). Linus might remember. The big one is that you're lucky if application programmers check the return values of write(2), and if they do, it's likely they will only check for error returns and not necessarily for partial writes --- since no other Unix-like or Linux-like system has ever returned partial reads or writes for files except in error conditions. We've barely gotten them trained to check for partial writes and reads with TCP connections and character devices, but I wouldn't bet on application programmers getting things right for files. Still, if it's ***only*** for SIGKILL, we'll probably be OK, since for that one case there's no chance userspace can intercept the signal, so it can't do any recovery anyway. (I could imagine some HPC program doing a massive 2GB write, and some user of that program depending on the fact that he can kill it at a predefined place by sending a SIGKILL and knowing that the file would be written up to that 2GB chunk --- but that's clearly an edge situation, as opposed to something that would effect most GNOME and KDE apps.) We just need to make sure we never try to do this for any other signal that could be caught, such as SIGINT or SIGQUIT or (worse yet) SIGTSTP. -- Ted ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] Make write(2) interruptible by a signal 2011-11-23 12:27 ` [PATCH 2/2] " Theodore Tso @ 2011-11-23 20:29 ` Andrew Morton 2011-11-24 19:27 ` Matthew Wilcox 0 siblings, 1 reply; 34+ messages in thread From: Andrew Morton @ 2011-11-23 20:29 UTC (permalink / raw) To: Theodore Tso Cc: Wu Fengguang, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, 23 Nov 2011 07:27:43 -0500 Theodore Tso <tytso@mit.edu> wrote: > >> > >> Maybe this is not that big problem as SIGKILL is considered be to > >> destructive already. > > > > Yeah, I have dim dark memories that there are subtle problems with > > interrupting write(). Linus might remember. (err, you're sending 600-column emails) > The big one is that you're lucky if application programmers check the > return values of write(2), and if they do, it's likely they will only > check for error returns and not necessarily for partial writes --- > since no other Unix-like or Linux-like system has ever returned partial > reads or writes for files except in error conditions. We've barely > gotten them trained to check for partial writes and reads with TCP > connections and character devices, but I wouldn't bet on application > programmers getting things right for files. > > Still, if it's ***only*** for SIGKILL, we'll probably be OK, since > for that one case there's no chance userspace can intercept the signal, > so it can't do any recovery anyway. (I could imagine some HPC program > doing a massive 2GB write, and some user of that program depending on > the fact that he can kill it at a predefined place by sending a SIGKILL > and knowing that the file would be written up to that 2GB chunk --- but > that's clearly an edge situation, as opposed to something that would > effect most GNOME and KDE apps.) We just need to make sure we never try > to do this for any other signal that could be caught, such as SIGINT or > SIGQUIT or (worse yet) SIGTSTP. That it is a fatal SIGKILL means that the *current* application doesn't care. But other processes will sometimes notice this change. Previously if an app did write(file, 128k) and was hit with SIGKILL, it would write either 0 bytes or 128k bytes. Now, it can write 36k bytes, yes? If the target file consisted of a stream of 128k records then the user will claim, with some justification, that Linux corrupted it. Dunno. People do lots of weird and flakey things. I have a suspicion that we'll be hearing back from them about this change. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] Make write(2) interruptible by a signal 2011-11-23 20:29 ` Andrew Morton @ 2011-11-24 19:27 ` Matthew Wilcox 2011-11-24 20:53 ` Ted Ts'o 2011-11-24 20:53 ` Jan Kara 0 siblings, 2 replies; 34+ messages in thread From: Matthew Wilcox @ 2011-11-24 19:27 UTC (permalink / raw) To: Andrew Morton Cc: Theodore Tso, Wu Fengguang, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, Nov 23, 2011 at 12:29:48PM -0800, Andrew Morton wrote: > > Still, if it's ***only*** for SIGKILL, we'll probably be OK, since > > for that one case there's no chance userspace can intercept the signal, > > so it can't do any recovery anyway. (I could imagine some HPC program > > doing a massive 2GB write, and some user of that program depending on > > the fact that he can kill it at a predefined place by sending a SIGKILL > > and knowing that the file would be written up to that 2GB chunk --- but > > that's clearly an edge situation, as opposed to something that would > > effect most GNOME and KDE apps.) We just need to make sure we never try > > to do this for any other signal that could be caught, such as SIGINT or > > SIGQUIT or (worse yet) SIGTSTP. > > That it is a fatal SIGKILL means that the *current* application doesn't > care. But other processes will sometimes notice this change. > Previously if an app did write(file, 128k) and was hit with SIGKILL, it > would write either 0 bytes or 128k bytes. Now, it can write 36k bytes, > yes? If the target file consisted of a stream of 128k records then the > user will claim, with some justification, that Linux corrupted it. On the other hand, if there was a crash mid-write, they might also get a 36k write that actually hit media (right? Or do we guarantee that on reboot you see a multiple of 128k?) We could put in some nice code that rewinds i_size to where it used to be if the write was interrupted. Or do we need to? Presumably write_end would not have been called, so i_size would not have been updated. > Dunno. People do lots of weird and flakey things. I have a suspicion > that we'll be hearing back from them about this change. The problem is that we may not hear from them for 6 years ... or whenever they decide to move off RHEL 3. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] Make write(2) interruptible by a signal 2011-11-24 19:27 ` Matthew Wilcox @ 2011-11-24 20:53 ` Ted Ts'o 2011-11-25 0:10 ` Matthew Wilcox 2011-11-24 20:53 ` Jan Kara 1 sibling, 1 reply; 34+ messages in thread From: Ted Ts'o @ 2011-11-24 20:53 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, Wu Fengguang, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Thu, Nov 24, 2011 at 12:27:11PM -0700, Matthew Wilcox wrote: > > On the other hand, if there was a crash mid-write, they might also get a > 36k write that actually hit media (right? Or do we guarantee that on > reboot you see a multiple of 128k?) Sure, but in the case the crash we expect things to be in a wonky state. The problem is if people assume atomic writes to files in a non-crash case, which has been a traditional Unix/Linux "feature". It's guaranteed by the standards as much a "close() implies fsync()", but once application programmers start coding to such assumptions, they refuse to admit they were wrong, and blame the kernel programmers. > > Dunno. People do lots of weird and flakey things. I have a suspicion > > that we'll be hearing back from them about this change. > > The problem is that we may not hear from them for 6 years ... or whenever > they decide to move off RHEL 3. Funny you mention RHEL 3. I once had to fly on site to a customer because their application programs depended on the order of addresses handed back from mmap(), and that changed between 2.4 and 2.6, and hence between RHEL 3 and RHEL 4 to debug the problem --- which turned out to be caused in a change of an undocumented problem in mmap(2)... (Well, it didn't break per se, but it turned an algorithm hidden inside their app that had been O(n) to O(n**2), and that caused it to slow down to the point that it was stalling long enough to make other programs time out. All the customer could tell us was "Og had program. Program worked on RHEL 3. Broke on RHEL 4. Og need support." :-) Still, I'm not too worried about those folks. Those are the customers who keep people who do advanced support at Red Hat and IBM employed. :-) I'm worried about the problems that break thousands of users using open source code --- and so long as we it only happens on SIGKILL, and not on any other signal, it's _probably_ going to be ok. - Ted ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] Make write(2) interruptible by a signal 2011-11-24 20:53 ` Ted Ts'o @ 2011-11-25 0:10 ` Matthew Wilcox 0 siblings, 0 replies; 34+ messages in thread From: Matthew Wilcox @ 2011-11-25 0:10 UTC (permalink / raw) To: Ted Ts'o Cc: Andrew Morton, Wu Fengguang, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Thu, Nov 24, 2011 at 03:53:43PM -0500, Ted Ts'o wrote: > On Thu, Nov 24, 2011 at 12:27:11PM -0700, Matthew Wilcox wrote: > > On the other hand, if there was a crash mid-write, they might also get a > > 36k write that actually hit media (right? Or do we guarantee that on > > reboot you see a multiple of 128k?) > > Sure, but in the case the crash we expect things to be in a wonky > state. The problem is if people assume atomic writes to files in a > non-crash case, which has been a traditional Unix/Linux "feature". > It's guaranteed by the standards as much a "close() implies fsync()", > but once application programmers start coding to such assumptions, > they refuse to admit they were wrong, and blame the kernel > programmers. Sure, but resorting to kill -9 is almost the same as pushing the BRS. Nobody's arguing in favour of non-fatal signals interrupting write() [well, Honza was earlier, but we all talked him out of it]. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] Make write(2) interruptible by a signal 2011-11-24 19:27 ` Matthew Wilcox 2011-11-24 20:53 ` Ted Ts'o @ 2011-11-24 20:53 ` Jan Kara 1 sibling, 0 replies; 34+ messages in thread From: Jan Kara @ 2011-11-24 20:53 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, Theodore Tso, Wu Fengguang, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Thu 24-11-11 12:27:11, Matthew Wilcox wrote: > On Wed, Nov 23, 2011 at 12:29:48PM -0800, Andrew Morton wrote: > > > Still, if it's ***only*** for SIGKILL, we'll probably be OK, since > > > for that one case there's no chance userspace can intercept the signal, > > > so it can't do any recovery anyway. (I could imagine some HPC program > > > doing a massive 2GB write, and some user of that program depending on > > > the fact that he can kill it at a predefined place by sending a SIGKILL > > > and knowing that the file would be written up to that 2GB chunk --- but > > > that's clearly an edge situation, as opposed to something that would > > > effect most GNOME and KDE apps.) We just need to make sure we never try > > > to do this for any other signal that could be caught, such as SIGINT or > > > SIGQUIT or (worse yet) SIGTSTP. > > > > That it is a fatal SIGKILL means that the *current* application doesn't > > care. But other processes will sometimes notice this change. > > Previously if an app did write(file, 128k) and was hit with SIGKILL, it > > would write either 0 bytes or 128k bytes. Now, it can write 36k bytes, > > yes? If the target file consisted of a stream of 128k records then the > > user will claim, with some justification, that Linux corrupted it. > > On the other hand, if there was a crash mid-write, they might also get a > 36k write that actually hit media (right? Or do we guarantee that on > reboot you see a multiple of 128k?) They might see only 36k written for all filesystems I know... > We could put in some nice code that rewinds i_size to where it used to > be if the write was interrupted. Or do we need to? Presumably write_end > would not have been called, so i_size would not have been updated. But that would solve only extending writes and not overwrites. So I'm not sure it would be worth it... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 9:05 ` Wu Fengguang 2011-11-23 9:50 ` Andrew Morton @ 2011-11-23 13:08 ` Jan Kara 2011-11-23 13:27 ` Wu Fengguang 1 sibling, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-23 13:08 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed 23-11-11 17:05:33, Wu Fengguang wrote: > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > > On Wed, 16 Nov 2011 19:44:21 +0800 > > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > > it would safer to test this patch in linux-next until next merge window, > > > > Any such bugs will not be discovered in linux-next testing. > > Yup, I'm afraid. > > > The only way to find these things in a reasonable period of time is to > > go in and find them. For example, intensive fsx-linux testing with > > concurrent heavy memory pressure on various filesystems with various > > block sizes. And of course concurrent signalling. If you're talking > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > > think. > > How are we going to measure the success/failure? Check if it > eventually resulted in filesystem corruption or whatever? There are a few different questions: 1) Checking for filesystem corruption via fsck - I find such corruption caused by stopping write early extremely unlikely. 2) Checking that we do not expose uninitialized data after a partial (possibly DIRECT_IO) write - I did not find a place where that could happen but this would be worth testing. I think I can write a test for this if people are afraid of data exposure problems. 3) Is it acceptable for write(2) to be interrupted by SIGKILL in the middle? That obviously does happen with my patches so there's no reason to test that. The question is whether someone cares or not and that can be tested only by reality check :). Since the signal is SIGKILL, the process itself cannot notice the interrupted write but someone else can. But as I already said earlier, partial writes can already be observed when the machine crashes, filesystem is close to ENOSPC or so. Arguably these are more severe error conditions than application catching SIGKILL so my patch lowers the bar for observing partial writes. But I wouldn't like to throw away a sensible thing - allow SIGKILL to interrupt a system call - just because of fear of possibility some broken app could rely on this. Sure if the reality check shows there are such broken apps and users who care enough to report, then I have nothing against biting the bullet and reverting the change... Opinions? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 13:08 ` [PATCH 2/2] fs: " Jan Kara @ 2011-11-23 13:27 ` Wu Fengguang 2011-11-23 15:06 ` Theodore Tso 0 siblings, 1 reply; 34+ messages in thread From: Wu Fengguang @ 2011-11-23 13:27 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Theodore Ts'o On Wed, Nov 23, 2011 at 09:08:03PM +0800, Jan Kara wrote: > On Wed 23-11-11 17:05:33, Wu Fengguang wrote: > > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > > > On Wed, 16 Nov 2011 19:44:21 +0800 > > > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > > > it would safer to test this patch in linux-next until next merge window, > > > > > > Any such bugs will not be discovered in linux-next testing. > > > > Yup, I'm afraid. > > > > > The only way to find these things in a reasonable period of time is to > > > go in and find them. For example, intensive fsx-linux testing with > > > concurrent heavy memory pressure on various filesystems with various > > > block sizes. And of course concurrent signalling. If you're talking > > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > > > think. > > > > How are we going to measure the success/failure? Check if it > > eventually resulted in filesystem corruption or whatever? > There are a few different questions: > 1) Checking for filesystem corruption via fsck - I find such corruption > caused by stopping write early extremely unlikely. Agreed. > 2) Checking that we do not expose uninitialized data after a partial > (possibly DIRECT_IO) write - I did not find a place where that could happen > but this would be worth testing. I think I can write a test for this if > people are afraid of data exposure problems. Do we already have such kind of tests in xfstests? If not, it sounds like a good gap to fill :-) > 3) Is it acceptable for write(2) to be interrupted by SIGKILL in the > middle? That obviously does happen with my patches so there's no reason > to test that. The question is whether someone cares or not and that can be > tested only by reality check :). Since the signal is SIGKILL, the process > itself cannot notice the interrupted write but someone else can. But as I > already said earlier, partial writes can already be observed when the > machine crashes, filesystem is close to ENOSPC or so. Arguably these are > more severe error conditions than application catching SIGKILL so my > patch lowers the bar for observing partial writes. But I wouldn't like to > throw away a sensible thing - allow SIGKILL to interrupt a system call - > just because of fear of possibility some broken app could rely on this. > Sure if the reality check shows there are such broken apps and users who > care enough to report, then I have nothing against biting the bullet > and reverting the change... Opinions? Reading Ted's information feed, I tend to disregard the partial write issue: since the "broken" applications will already fail and get punished in various other cases, I don't care adding one more penalty case to them :-P Thanks, Fengguang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 13:27 ` Wu Fengguang @ 2011-11-23 15:06 ` Theodore Tso 2011-11-28 3:08 ` Wu Fengguang 0 siblings, 1 reply; 34+ messages in thread From: Theodore Tso @ 2011-11-23 15:06 UTC (permalink / raw) To: Wu Fengguang Cc: Theodore Tso, Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > Reading Ted's information feed, I tend to disregard the partial write > issue: since the "broken" applications will already fail and get > punished in various other cases, I don't care adding one more penalty > case to them :-P Just wait until you have a bunch of rabid application programmers, questioning your judgement, your morality, and even your paternity. :-) -- Ted ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 15:06 ` Theodore Tso @ 2011-11-28 3:08 ` Wu Fengguang 2011-11-28 3:33 ` [PATCH] writeback: add a safety limit to the SIGKILL abort Wu Fengguang 2011-11-29 14:16 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 0 siblings, 2 replies; 34+ messages in thread From: Wu Fengguang @ 2011-11-28 3:08 UTC (permalink / raw) To: Theodore Tso Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov, Matthew Wilcox On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote: > > On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > > > > Reading Ted's information feed, I tend to disregard the partial write > > issue: since the "broken" applications will already fail and get > > punished in various other cases, I don't care adding one more penalty > > case to them :-P > > Just wait until you have a bunch of rabid application programmers, > questioning your judgement, your morality, and even your paternity. > :-) Ah OK, that sounds frightening. Hmm, till now every one have acknowledged the possibility of data corruption, only that people have different perceptions of the severeness. Let's rethink things this way: "Is it a _worthwhile_ risk at all?" I'm afraid not. Considering the origin of this patch [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages http://www.spinics.net/lists/linux-ext4/msg28464.html I *think* Jan's first patch is already enough for fixing the bug. IWO the patch we worried/discussed so much is really an optional one. I would imagine the easy and safe solution is to just drop it. Any objections? Thanks, Fengguang ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] writeback: add a safety limit to the SIGKILL abort 2011-11-28 3:08 ` Wu Fengguang @ 2011-11-28 3:33 ` Wu Fengguang 2011-11-29 14:18 ` Jan Kara 2011-11-29 14:16 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 1 sibling, 1 reply; 34+ messages in thread From: Wu Fengguang @ 2011-11-28 3:33 UTC (permalink / raw) To: Theodore Tso Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov, Matthew Wilcox On Mon, Nov 28, 2011 at 11:08:42AM +0800, Wu Fengguang wrote: > On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote: > > > > On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > > > > > > > Reading Ted's information feed, I tend to disregard the partial write > > > issue: since the "broken" applications will already fail and get > > > punished in various other cases, I don't care adding one more penalty > > > case to them :-P > > > > Just wait until you have a bunch of rabid application programmers, > > questioning your judgement, your morality, and even your paternity. > > :-) > > Ah OK, that sounds frightening. Hmm, till now every one have > acknowledged the possibility of data corruption, only that > people have different perceptions of the severeness. > > Let's rethink things this way: "Is it a _worthwhile_ risk at all?" > I'm afraid not. Considering the origin of this patch > > [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages > http://www.spinics.net/lists/linux-ext4/msg28464.html > > I *think* Jan's first patch is already enough for fixing the bug. IWO > the patch we worried/discussed so much is really an optional one. I > would imagine the easy and safe solution is to just drop it. Any > objections? Here is the replacement patch. --- Subject: writeback: add a safety limit to the SIGKILL abort Date: Mon Nov 28 11:16:54 CST 2011 This adds a safety limit to the SIGKILL abort in commit 499d05ecf990 ("mm: Make task in balance_dirty_pages() killable"). This will avoid dirty pages rushing arbitrarily high in the case some task receives SIGKILL and hence become *unthrottled* when doing a huge sized write. The alternative way is to check SIGKILL and return partial write in generic_perform_write(). However it will lead to data corruption as put by Andrew Morton: Previously if an app did write(file, 128k) and was hit with SIGKILL, it would write either 0 bytes or 128k bytes. Now, it can write 36k bytes, yes? If the target file consisted of a stream of 128k records then the user will claim, with some justification, that Linux corrupted it. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- mm/page-writeback.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- linux-next.orig/mm/page-writeback.c 2011-11-28 11:13:58.000000000 +0800 +++ linux-next/mm/page-writeback.c 2011-11-28 11:16:52.000000000 +0800 @@ -1136,7 +1136,8 @@ pause: if (task_ratelimit) break; - if (fatal_signal_pending(current)) + if (fatal_signal_pending(current) && + nr_dirty <= dirty_thresh + dirty_thresh / 2) break; } ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] writeback: add a safety limit to the SIGKILL abort 2011-11-28 3:33 ` [PATCH] writeback: add a safety limit to the SIGKILL abort Wu Fengguang @ 2011-11-29 14:18 ` Jan Kara 0 siblings, 0 replies; 34+ messages in thread From: Jan Kara @ 2011-11-29 14:18 UTC (permalink / raw) To: Wu Fengguang Cc: Theodore Tso, Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov, Matthew Wilcox On Mon 28-11-11 11:33:40, Wu Fengguang wrote: > On Mon, Nov 28, 2011 at 11:08:42AM +0800, Wu Fengguang wrote: > > On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote: > > > > > > On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > > > > > > > > > > Reading Ted's information feed, I tend to disregard the partial write > > > > issue: since the "broken" applications will already fail and get > > > > punished in various other cases, I don't care adding one more penalty > > > > case to them :-P > > > > > > Just wait until you have a bunch of rabid application programmers, > > > questioning your judgement, your morality, and even your paternity. > > > :-) > > > > Ah OK, that sounds frightening. Hmm, till now every one have > > acknowledged the possibility of data corruption, only that > > people have different perceptions of the severeness. > > > > Let's rethink things this way: "Is it a _worthwhile_ risk at all?" > > I'm afraid not. Considering the origin of this patch > > > > [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages > > http://www.spinics.net/lists/linux-ext4/msg28464.html > > > > I *think* Jan's first patch is already enough for fixing the bug. IWO > > the patch we worried/discussed so much is really an optional one. I > > would imagine the easy and safe solution is to just drop it. Any > > objections? > > Here is the replacement patch. > > --- > Subject: writeback: add a safety limit to the SIGKILL abort > Date: Mon Nov 28 11:16:54 CST 2011 > > This adds a safety limit to the SIGKILL abort in commit 499d05ecf990 > ("mm: Make task in balance_dirty_pages() killable"). This will avoid > dirty pages rushing arbitrarily high in the case some task receives > SIGKILL and hence become *unthrottled* when doing a huge sized write. > > The alternative way is to check SIGKILL and return partial write in > generic_perform_write(). However it will lead to data corruption as > put by Andrew Morton: > > Previously if an app did write(file, 128k) and was hit with SIGKILL, it > would write either 0 bytes or 128k bytes. Now, it can write 36k bytes, > yes? If the target file consisted of a stream of 128k records then the > user will claim, with some justification, that Linux corrupted it. > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> The patch looks good. You can add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > mm/page-writeback.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > --- linux-next.orig/mm/page-writeback.c 2011-11-28 11:13:58.000000000 +0800 > +++ linux-next/mm/page-writeback.c 2011-11-28 11:16:52.000000000 +0800 > @@ -1136,7 +1136,8 @@ pause: > if (task_ratelimit) > break; > > - if (fatal_signal_pending(current)) > + if (fatal_signal_pending(current) && > + nr_dirty <= dirty_thresh + dirty_thresh / 2) > break; > } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-28 3:08 ` Wu Fengguang 2011-11-28 3:33 ` [PATCH] writeback: add a safety limit to the SIGKILL abort Wu Fengguang @ 2011-11-29 14:16 ` Jan Kara 1 sibling, 0 replies; 34+ messages in thread From: Jan Kara @ 2011-11-29 14:16 UTC (permalink / raw) To: Wu Fengguang Cc: Theodore Tso, Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov, Matthew Wilcox On Mon 28-11-11 11:08:42, Wu Fengguang wrote: > On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote: > > > > On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > > > > > > > Reading Ted's information feed, I tend to disregard the partial write > > > issue: since the "broken" applications will already fail and get > > > punished in various other cases, I don't care adding one more penalty > > > case to them :-P > > > > Just wait until you have a bunch of rabid application programmers, > > questioning your judgement, your morality, and even your paternity. > > :-) > > Ah OK, that sounds frightening. Hmm, till now every one have > acknowledged the possibility of data corruption, only that > people have different perceptions of the severeness. > > Let's rethink things this way: "Is it a _worthwhile_ risk at all?" > I'm afraid not. Considering the origin of this patch > > [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages > http://www.spinics.net/lists/linux-ext4/msg28464.html > > I *think* Jan's first patch is already enough for fixing the bug. IWO > the patch we worried/discussed so much is really an optional one. I > would imagine the easy and safe solution is to just drop it. Any > objections? I still think aborting write is a cleaner solution. But I don't want to argue to death about this so if you still don't think it's a good idea, I'll respect your decision. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 0/2 v3] Make task in balance_dirty_pages() killable 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara 2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara @ 2011-11-16 11:23 ` Wu Fengguang 2 siblings, 0 replies; 34+ messages in thread From: Wu Fengguang @ 2011-11-16 11:23 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, Nov 16, 2011 at 07:12:13PM +0800, Jan Kara wrote: > > Hello, > > since the discussion has calmed down, here is a third iteration of the patches. > Fengguang, can you put them into your tree and merge them with Linus? Thanks. Sure, thanks for pushing this forward. I'll push them to linux-next and then to Linus. Thanks, Fengguang > Changes since v2: > * removed return value of balance_dirty_pages() since it didn't seem that > useful after all. > * kept fatal_signal_pending() check in generic_perform_write() since I don't think > possible partial writes when app receives SIGKILL are a problem. > > Changes since v1: > * slightly moved the check in balance_dirty_pages() as Fengguang requested > * made balance_dirty_pages() return EINTR if fatal signal was detected > * changed check for signal to check for fatal signal in generic_perform_write() > to avoid unexpected results for userspace applications. > > Honza ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 0/2 v2] Make task in balance_dirty_pages() killable @ 2011-11-14 16:15 Jan Kara 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara 0 siblings, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw) To: Wu Fengguang; +Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel Hello, here is a second iteration of the patches. Changes since v1: * slightly moved the check in balance_dirty_pages() as Fengguang requested * made balance_dirty_pages() return EINTR if fatal signal was detected * changed check for signal to check for fatal signal in generic_perform_write() to avoid unexpected results for userspace applications. Honza ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 16:15 [PATCH 0/2 v2] " Jan Kara @ 2011-11-14 16:15 ` Jan Kara 2011-11-14 16:23 ` Christoph Hellwig 2011-11-15 11:48 ` Wu Fengguang 0 siblings, 2 replies; 34+ messages in thread From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara There is no reason why task in balance_dirty_pages() shouldn't be killable and it helps in recovering from some error conditions (like when filesystem goes in error state and cannot accept writeback anymore but we still want to kill processes using it to be able to unmount it). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- include/linux/writeback.h | 8 ++++---- mm/page-writeback.c | 27 +++++++++++++++++++-------- 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index a378c29..eb3ecca 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -170,13 +170,13 @@ void __bdi_update_bandwidth(struct backing_dev_info *bdi, unsigned long start_time); void page_writeback_init(void); -void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, - unsigned long nr_pages_dirtied); +int balance_dirty_pages_ratelimited_nr(struct address_space *mapping, + unsigned long nr_pages_dirtied); -static inline void +static inline int balance_dirty_pages_ratelimited(struct address_space *mapping) { - balance_dirty_pages_ratelimited_nr(mapping, 1); + return balance_dirty_pages_ratelimited_nr(mapping, 1); } typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc, diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0360d1b..380279c 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1000,8 +1000,11 @@ static unsigned long bdi_max_pause(struct backing_dev_info *bdi, * the caller to wait once crossing the (background_thresh + dirty_thresh) / 2. * If we're over `background_thresh' then the writeback threads are woken to * perform some writeout. + * + * We return 0 if nothing special happened, EINTR if our wait has been + * interrupted by a fatal signal. */ -static void balance_dirty_pages(struct address_space *mapping, +static int balance_dirty_pages(struct address_space *mapping, unsigned long pages_dirtied) { unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, unsigned long pos_ratio; struct backing_dev_info *bdi = mapping->backing_dev_info; unsigned long start_time = jiffies; + int err = 0; for (;;) { /* @@ -1133,7 +1137,7 @@ pause: pages_dirtied, pause, start_time); - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(TASK_KILLABLE); io_schedule_timeout(pause); dirty_thresh = hard_dirty_limit(dirty_thresh); @@ -1145,6 +1149,11 @@ pause: */ if (nr_dirty < dirty_thresh) break; + + if (fatal_signal_pending(current)) { + err = -EINTR; + break; + } } if (!dirty_exceeded && bdi->dirty_exceeded) @@ -1168,7 +1177,7 @@ pause: } if (writeback_in_progress(bdi)) - return; + return err; /* * In laptop mode, we wait until hitting the higher threshold before @@ -1179,10 +1188,11 @@ pause: * background_thresh, to keep the amount of dirty memory low. */ if (laptop_mode) - return; + return err; if (nr_reclaimable > background_thresh) bdi_start_background_writeback(bdi); + return err; } void set_page_dirty_balance(struct page *page, int page_mkwrite) @@ -1211,15 +1221,15 @@ static DEFINE_PER_CPU(int, bdp_ratelimits); * limit we decrease the ratelimiting by a lot, to prevent individual processes * from overshooting the limit by (ratelimit_pages) each. */ -void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, - unsigned long nr_pages_dirtied) +int balance_dirty_pages_ratelimited_nr(struct address_space *mapping, + unsigned long nr_pages_dirtied) { struct backing_dev_info *bdi = mapping->backing_dev_info; int ratelimit; int *p; if (!bdi_cap_account_dirty(bdi)) - return; + return 0; ratelimit = current->nr_dirtied_pause; if (bdi->dirty_exceeded) @@ -1247,7 +1257,8 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, preempt_enable(); if (unlikely(current->nr_dirtied >= ratelimit)) - balance_dirty_pages(mapping, current->nr_dirtied); + return balance_dirty_pages(mapping, current->nr_dirtied); + return 0; } EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr); -- 1.7.1 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara @ 2011-11-14 16:23 ` Christoph Hellwig 2011-11-15 11:48 ` Wu Fengguang 1 sibling, 0 replies; 34+ messages in thread From: Christoph Hellwig @ 2011-11-14 16:23 UTC (permalink / raw) To: Jan Kara Cc: Wu Fengguang, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel On Mon, Nov 14, 2011 at 05:15:23PM +0100, Jan Kara wrote: > There is no reason why task in balance_dirty_pages() shouldn't be killable > and it helps in recovering from some error conditions (like when filesystem > goes in error state and cannot accept writeback anymore but we still want to > kill processes using it to be able to unmount it). Sorry for noticing this so late, but what about killing the balance_dirty_pages_ratelimited wrapper, and only exporting a version that takes a number of pages now that we have to change the signature anyway? I'd recommend to call the new one simply balance_dirty_pages_ratelimited, that will give a compile breakge to anyone using one of the old versions and thus cause them to actually adapt and hopefull check the return value. (and not for this series but soon later - we probably really should call into balance_dirty_pages in larger batch sizes in most places). ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara 2011-11-14 16:23 ` Christoph Hellwig @ 2011-11-15 11:48 ` Wu Fengguang 2011-11-15 13:41 ` Jan Kara 1 sibling, 1 reply; 34+ messages in thread From: Wu Fengguang @ 2011-11-15 11:48 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org > +static int balance_dirty_pages(struct address_space *mapping, > unsigned long pages_dirtied) > { > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > unsigned long pos_ratio; > struct backing_dev_info *bdi = mapping->backing_dev_info; > unsigned long start_time = jiffies; > + int err = 0; > > for (;;) { > /* > @@ -1133,7 +1137,7 @@ pause: > pages_dirtied, > pause, > start_time); > - __set_current_state(TASK_UNINTERRUPTIBLE); > + __set_current_state(TASK_KILLABLE); > io_schedule_timeout(pause); > > dirty_thresh = hard_dirty_limit(dirty_thresh); > @@ -1145,6 +1149,11 @@ pause: > */ > if (nr_dirty < dirty_thresh) > break; > + > + if (fatal_signal_pending(current)) { > + err = -EINTR; > + break; > + } > } The other alternative is to raise the limit on fatal_signal_pending: if (fatal_signal_pending(current) && nr_dirty < dirty_thresh + dirty_thresh / 2) break; That should work well enough in practice and avoids touching the fs code. Thanks, Fengguang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-15 11:48 ` Wu Fengguang @ 2011-11-15 13:41 ` Jan Kara 2011-11-15 14:15 ` Wu Fengguang 0 siblings, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-15 13:41 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Tue 15-11-11 19:48:44, Wu Fengguang wrote: > > +static int balance_dirty_pages(struct address_space *mapping, > > unsigned long pages_dirtied) > > { > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > > unsigned long pos_ratio; > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > unsigned long start_time = jiffies; > > + int err = 0; > > > > for (;;) { > > /* > > @@ -1133,7 +1137,7 @@ pause: > > pages_dirtied, > > pause, > > start_time); > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > + __set_current_state(TASK_KILLABLE); > > io_schedule_timeout(pause); > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > @@ -1145,6 +1149,11 @@ pause: > > */ > > if (nr_dirty < dirty_thresh) > > break; > > + > > + if (fatal_signal_pending(current)) { > > + err = -EINTR; > > + break; > > + } > > } > > The other alternative is to raise the limit on fatal_signal_pending: > > if (fatal_signal_pending(current) && > nr_dirty < dirty_thresh + dirty_thresh / 2) > break; > > That should work well enough in practice and avoids touching the fs code. Sorry, but I fail to see what would this bring us... Can you elaborate a bit please? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-15 13:41 ` Jan Kara @ 2011-11-15 14:15 ` Wu Fengguang 2011-11-15 14:44 ` Jan Kara 0 siblings, 1 reply; 34+ messages in thread From: Wu Fengguang @ 2011-11-15 14:15 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Tue, Nov 15, 2011 at 09:41:27PM +0800, Jan Kara wrote: > On Tue 15-11-11 19:48:44, Wu Fengguang wrote: > > > +static int balance_dirty_pages(struct address_space *mapping, > > > unsigned long pages_dirtied) > > > { > > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > > > unsigned long pos_ratio; > > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > > unsigned long start_time = jiffies; > > > + int err = 0; > > > > > > for (;;) { > > > /* > > > @@ -1133,7 +1137,7 @@ pause: > > > pages_dirtied, > > > pause, > > > start_time); > > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > > + __set_current_state(TASK_KILLABLE); > > > io_schedule_timeout(pause); > > > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > > @@ -1145,6 +1149,11 @@ pause: > > > */ > > > if (nr_dirty < dirty_thresh) > > > break; > > > + > > > + if (fatal_signal_pending(current)) { > > > + err = -EINTR; > > > + break; > > > + } > > > } > > > > The other alternative is to raise the limit on fatal_signal_pending: > > > > if (fatal_signal_pending(current) && > > nr_dirty < dirty_thresh + dirty_thresh / 2) > > break; > > > > That should work well enough in practice and avoids touching the fs code. > Sorry, but I fail to see what would this bring us... Can you elaborate a > bit please? It's not bringing us something, but allows us to get rid of patch 2 as well as adding fatal_signal_pending() tests to all the other fs. So no more worries about partial writes :) Sure it leaves ->write_begin() blocks unhandled, however that should be much less a problem than the blocks inside balance_dirty_pages(). Thanks, Fengguang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-15 14:15 ` Wu Fengguang @ 2011-11-15 14:44 ` Jan Kara 0 siblings, 0 replies; 34+ messages in thread From: Jan Kara @ 2011-11-15 14:44 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Tue 15-11-11 22:15:28, Wu Fengguang wrote: > On Tue, Nov 15, 2011 at 09:41:27PM +0800, Jan Kara wrote: > > On Tue 15-11-11 19:48:44, Wu Fengguang wrote: > > > > +static int balance_dirty_pages(struct address_space *mapping, > > > > unsigned long pages_dirtied) > > > > { > > > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > > > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > > > > unsigned long pos_ratio; > > > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > > > unsigned long start_time = jiffies; > > > > + int err = 0; > > > > > > > > for (;;) { > > > > /* > > > > @@ -1133,7 +1137,7 @@ pause: > > > > pages_dirtied, > > > > pause, > > > > start_time); > > > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > > > + __set_current_state(TASK_KILLABLE); > > > > io_schedule_timeout(pause); > > > > > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > > > @@ -1145,6 +1149,11 @@ pause: > > > > */ > > > > if (nr_dirty < dirty_thresh) > > > > break; > > > > + > > > > + if (fatal_signal_pending(current)) { > > > > + err = -EINTR; > > > > + break; > > > > + } > > > > } > > > > > > The other alternative is to raise the limit on fatal_signal_pending: > > > > > > if (fatal_signal_pending(current) && > > > nr_dirty < dirty_thresh + dirty_thresh / 2) > > > break; > > > > > > That should work well enough in practice and avoids touching the fs code. > > Sorry, but I fail to see what would this bring us... Can you elaborate a > > bit please? > > It's not bringing us something, but allows us to get rid of patch 2 > as well as adding fatal_signal_pending() tests to all the other fs. > > So no more worries about partial writes :) > > Sure it leaves ->write_begin() blocks unhandled, however that should > be much less a problem than the blocks inside balance_dirty_pages(). Ah, OK, I see. Frankly, I'd rather keep the loop break in generic_perform_write() and have the code straightforward. I don't see any security (data exposure) problem there and if someone complains on the grounds that the behavior has changed, we can revert the change in the worst case. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 0/2] Make task doing heavy writing killable @ 2011-11-14 11:10 Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara 0 siblings, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw) To: linux-fsdevel Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig Hello, these two patches aim at making task waiting in balance_dirty_pages() killable. This is desirable because otherwise if filesystem stops accepting writes (e.g. if device has been removed or other serious error condidion) we have a task stuck in D state forever. I'm not sure who should merge these two patches... Al, Fengguang? Honza ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara @ 2011-11-14 11:10 ` Jan Kara 2011-11-14 12:12 ` Wu Fengguang 0 siblings, 1 reply; 34+ messages in thread From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw) To: linux-fsdevel Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig, Jan Kara There is no reason why task in balance_dirty_pages() shouldn't be killable and it helps in recovering from some error conditions (like when filesystem goes in error state and cannot accept writeback anymore but we still want to kill processes using it to be able to unmount it). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/page-writeback.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0360d1b..e83c286 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1133,8 +1133,10 @@ pause: pages_dirtied, pause, start_time); - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(TASK_KILLABLE); io_schedule_timeout(pause); + if (fatal_signal_pending(current)) + break; dirty_thresh = hard_dirty_limit(dirty_thresh); /* -- 1.7.1 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-14 12:12 ` Wu Fengguang 2011-11-14 12:37 ` Jan Kara 0 siblings, 1 reply; 34+ messages in thread From: Wu Fengguang @ 2011-11-14 12:12 UTC (permalink / raw) To: Jan Kara Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote: > There is no reason why task in balance_dirty_pages() shouldn't be killable > and it helps in recovering from some error conditions (like when filesystem > goes in error state and cannot accept writeback anymore but we still want to > kill processes using it to be able to unmount it). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/page-writeback.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 0360d1b..e83c286 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1133,8 +1133,10 @@ pause: > pages_dirtied, > pause, > start_time); > - __set_current_state(TASK_UNINTERRUPTIBLE); > + __set_current_state(TASK_KILLABLE); Dumb question: will the task continue to show up as 'D' in ps? > io_schedule_timeout(pause); > + if (fatal_signal_pending(current)) > + break; I'd like to move that several lines below, to the end of the loop, so that the condition is not tested at all in normal executions. > dirty_thresh = hard_dirty_limit(dirty_thresh); > /* > -- > 1.7.1 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 12:12 ` Wu Fengguang @ 2011-11-14 12:37 ` Jan Kara 0 siblings, 0 replies; 34+ messages in thread From: Jan Kara @ 2011-11-14 12:37 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon 14-11-11 20:12:45, Wu Fengguang wrote: > On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote: > > There is no reason why task in balance_dirty_pages() shouldn't be killable > > and it helps in recovering from some error conditions (like when filesystem > > goes in error state and cannot accept writeback anymore but we still want to > > kill processes using it to be able to unmount it). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/page-writeback.c | 4 +++- > > 1 files changed, 3 insertions(+), 1 deletions(-) > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index 0360d1b..e83c286 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -1133,8 +1133,10 @@ pause: > > pages_dirtied, > > pause, > > start_time); > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > + __set_current_state(TASK_KILLABLE); > > Dumb question: will the task continue to show up as 'D' in ps? Yes, it will. The only difference is that the task will be woken when SIGKILL arrives. > > io_schedule_timeout(pause); > > + if (fatal_signal_pending(current)) > > + break; > > I'd like to move that several lines below, to the end of the loop, > so that the condition is not tested at all in normal executions. OK, will do that. > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > /* Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2011-11-29 14:18 UTC | newest] Thread overview: 34+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara 2011-11-16 11:28 ` Wu Fengguang 2011-11-16 12:58 ` Jan Kara 2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-16 11:44 ` Wu Fengguang 2011-11-16 12:54 ` Jan Kara 2011-11-16 13:11 ` Wu Fengguang 2011-11-22 22:28 ` Andrew Morton 2011-11-23 9:05 ` Wu Fengguang 2011-11-23 9:50 ` Andrew Morton 2011-11-23 12:27 ` [PATCH 2/2] " Theodore Tso 2011-11-23 20:29 ` Andrew Morton 2011-11-24 19:27 ` Matthew Wilcox 2011-11-24 20:53 ` Ted Ts'o 2011-11-25 0:10 ` Matthew Wilcox 2011-11-24 20:53 ` Jan Kara 2011-11-23 13:08 ` [PATCH 2/2] fs: " Jan Kara 2011-11-23 13:27 ` Wu Fengguang 2011-11-23 15:06 ` Theodore Tso 2011-11-28 3:08 ` Wu Fengguang 2011-11-28 3:33 ` [PATCH] writeback: add a safety limit to the SIGKILL abort Wu Fengguang 2011-11-29 14:18 ` Jan Kara 2011-11-29 14:16 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-16 11:23 ` [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Wu Fengguang -- strict thread matches above, loose matches on Subject: below -- 2011-11-14 16:15 [PATCH 0/2 v2] " Jan Kara 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara 2011-11-14 16:23 ` Christoph Hellwig 2011-11-15 11:48 ` Wu Fengguang 2011-11-15 13:41 ` Jan Kara 2011-11-15 14:15 ` Wu Fengguang 2011-11-15 14:44 ` Jan Kara 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara 2011-11-14 12:12 ` Wu Fengguang 2011-11-14 12:37 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).