* [PATCH 0/2] Make task doing heavy writing killable
@ 2011-11-14 11:10 Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw)
To: linux-fsdevel
Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig
Hello,
these two patches aim at making task waiting in balance_dirty_pages()
killable. This is desirable because otherwise if filesystem stops accepting
writes (e.g. if device has been removed or other serious error condidion) we
have a task stuck in D state forever.
I'm not sure who should merge these two patches... Al, Fengguang?
Honza
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
@ 2011-11-14 11:10 ` Jan Kara
2011-11-14 12:12 ` Wu Fengguang
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw)
To: linux-fsdevel
Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig,
Jan Kara
There is no reason why task in balance_dirty_pages() shouldn't be killable
and it helps in recovering from some error conditions (like when filesystem
goes in error state and cannot accept writeback anymore but we still want to
kill processes using it to be able to unmount it).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0360d1b..e83c286 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1133,8 +1133,10 @@ pause:
pages_dirtied,
pause,
start_time);
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ __set_current_state(TASK_KILLABLE);
io_schedule_timeout(pause);
+ if (fatal_signal_pending(current))
+ break;
dirty_thresh = hard_dirty_limit(dirty_thresh);
/*
--
1.7.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
@ 2011-11-14 11:10 ` Jan Kara
2011-11-14 12:12 ` Matthew Wilcox
2011-11-14 12:15 ` Wu Fengguang
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw)
To: linux-fsdevel
Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig,
Jan Kara
Currently write(2) to a file is not interruptible by a signal. Sometimes this
is desirable (e.g. when you want to quickly kill a process hogging your disk or
when some process gets blocked in balance_dirty_pages() indefinitely due to a
filesystem being in an error condition).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/filemap.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c0018f2..6b01d2f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
iov_iter_count(i));
again:
+ if (signal_pending(current)) {
+ status = -EINTR;
+ break;
+ }
/*
* Bring in the user page that we will copy from _first_.
--
1.7.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
@ 2011-11-14 11:59 ` Wu Fengguang
2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:12 ` Jan Kara
2 siblings, 2 replies; 30+ messages in thread
From: Wu Fengguang @ 2011-11-14 11:59 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com,
Andrew Morton, Christoph Hellwig
[-- Attachment #1: Type: text/plain, Size: 955 bytes --]
On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
>
> Hello,
>
> these two patches aim at making task waiting in balance_dirty_pages()
> killable. This is desirable because otherwise if filesystem stops accepting
> writes (e.g. if device has been removed or other serious error condidion) we
> have a task stuck in D state forever.
Agreed totally. I myself has run into such conditions and get very
annoyed not being able to kill the hard throttled tasks -- they just
stuck there for ever if the error condition does not change.
> I'm not sure who should merge these two patches... Al, Fengguang?
I'd like to do it -- otherwise there will obviously be merge conflicts.
Actually I also queued a patch to do this (attached). Your patches do
better on TASK_KILLABLE and the use of signal_pending() in write
routines, while mine goes further to add the break to various
filesystems. How about combining them together?
Thanks,
Fengguang
[-- Attachment #2: writeback-break-on-signal-pending.patch --]
[-- Type: text/x-diff, Size: 4638 bytes --]
Subject: writeback: quit throttling when fatal signal pending
Date: Wed Sep 08 17:40:22 CST 2010
This allows quick response to Ctrl-C etc. for impatient users.
It's necessary to abort the generic_perform_write() and other filesystem
write loops too, to avoid large write + SIGKILL combination exceeding
the dirty limit and possibly strange OOM.
It mainly helps the rare bdi/global dirty exceeded cases.
In the normal case of not exceeded, it will quit the loop anyway.
if (fatal_signal_pending(current) &&
nr_reclaimable + nr_writeback <= dirty_thresh)
break;
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
cc: Chris Mason <chris.mason@oracle.com>
cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
fs/btrfs/file.c | 4 ++++
fs/btrfs/ioctl.c | 4 ++++
fs/btrfs/relocation.c | 4 ++++
fs/buffer.c | 4 ++++
fs/ntfs/attrib.c | 2 ++
fs/ntfs/file.c | 6 ++++++
mm/filemap.c | 3 ++-
mm/page-writeback.c | 2 ++
8 files changed, 28 insertions(+), 1 deletion(-)
--- linux-next.orig/mm/page-writeback.c 2011-11-13 20:37:11.000000000 +0800
+++ linux-next/mm/page-writeback.c 2011-11-13 20:37:57.000000000 +0800
@@ -1185,6 +1185,8 @@ pause:
*/
if (task_ratelimit)
break;
+ if (fatal_signal_pending(current))
+ break;
}
if (!dirty_exceeded && bdi->dirty_exceeded)
--- linux-next.orig/mm/filemap.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/mm/filemap.c 2011-11-13 20:37:17.000000000 +0800
@@ -2463,7 +2463,8 @@ again:
written += copied;
balance_dirty_pages_ratelimited(mapping);
-
+ if (fatal_signal_pending(current))
+ break;
} while (iov_iter_count(i));
return written ? written : status;
--- linux-next.orig/fs/btrfs/file.c 2011-11-13 20:37:11.000000000 +0800
+++ linux-next/fs/btrfs/file.c 2011-11-13 20:37:17.000000000 +0800
@@ -1274,6 +1274,10 @@ static noinline ssize_t __btrfs_buffered
dirty_pages);
if (dirty_pages < (root->leafsize >> PAGE_CACHE_SHIFT) + 1)
btrfs_btree_balance_dirty(root, 1);
+ if (fatal_signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
btrfs_throttle(root);
pos += copied;
--- linux-next.orig/fs/btrfs/ioctl.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/btrfs/ioctl.c 2011-11-13 20:37:17.000000000 +0800
@@ -1115,6 +1115,10 @@ int btrfs_defrag_file(struct inode *inod
defrag_count += ret;
balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret);
+ if (fatal_signal_pending(current)) {
+ ret = -EINTR;
+ goto out_ra;
+ }
if (newer_than) {
if (newer_off == (u64)-1)
--- linux-next.orig/fs/btrfs/relocation.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/btrfs/relocation.c 2011-11-13 20:37:17.000000000 +0800
@@ -3009,6 +3009,10 @@ static int relocate_file_extent_cluster(
index++;
balance_dirty_pages_ratelimited(inode->i_mapping);
+ if (fatal_signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
btrfs_throttle(BTRFS_I(inode)->root);
}
WARN_ON(nr != cluster->nr);
--- linux-next.orig/fs/buffer.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/buffer.c 2011-11-13 20:37:17.000000000 +0800
@@ -2255,6 +2255,10 @@ static int cont_expand_zero(struct file
err = 0;
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ goto out;
+ }
}
/* page covers the boundary, find the boundary offset */
--- linux-next.orig/fs/ntfs/attrib.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/ntfs/attrib.c 2011-11-13 20:37:17.000000000 +0800
@@ -2588,6 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const
unlock_page(page);
page_cache_release(page);
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current))
+ return -EINTR;
cond_resched();
}
/* If there is a last partial page, need to do it the slow way. */
--- linux-next.orig/fs/ntfs/file.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/ntfs/file.c 2011-11-13 20:37:17.000000000 +0800
@@ -278,6 +278,10 @@ do_non_resident_extend:
* files.
*/
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ goto init_err_out;
+ }
cond_resched();
} while (++index < end_index);
read_lock_irqsave(&ni->size_lock, flags);
@@ -2054,6 +2058,8 @@ static ssize_t ntfs_file_buffered_write(
if (unlikely(status))
break;
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current))
+ break;
cond_resched();
} while (count);
err_out:
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
@ 2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:24 ` Jan Kara
2011-11-14 12:29 ` Wu Fengguang
2011-11-14 12:12 ` Jan Kara
1 sibling, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-11-14 12:05 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 07:59:12PM +0800, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
> >
> > Hello,
> >
> > these two patches aim at making task waiting in balance_dirty_pages()
> > killable. This is desirable because otherwise if filesystem stops accepting
> > writes (e.g. if device has been removed or other serious error condidion) we
> > have a task stuck in D state forever.
>
> Agreed totally. I myself has run into such conditions and get very
> annoyed not being able to kill the hard throttled tasks -- they just
> stuck there for ever if the error condition does not change.
>
> > I'm not sure who should merge these two patches... Al, Fengguang?
>
> I'd like to do it -- otherwise there will obviously be merge conflicts.
>
> Actually I also queued a patch to do this (attached). Your patches do
> better on TASK_KILLABLE and the use of signal_pending() in write
> routines, while mine goes further to add the break to various
> filesystems. How about combining them together?
Can you make balance_dirty_pages(_ratelimited) return an error instead
of opencoding the fatal signal check everywhere? That would make the
interface a bit more obvious.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2011-11-14 12:05 ` Christoph Hellwig
@ 2011-11-14 12:12 ` Jan Kara
1 sibling, 0 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-14 12:12 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 19:59:12, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
> >
> > Hello,
> >
> > these two patches aim at making task waiting in balance_dirty_pages()
> > killable. This is desirable because otherwise if filesystem stops accepting
> > writes (e.g. if device has been removed or other serious error condidion) we
> > have a task stuck in D state forever.
>
> Agreed totally. I myself has run into such conditions and get very
> annoyed not being able to kill the hard throttled tasks -- they just
> stuck there for ever if the error condition does not change.
>
> > I'm not sure who should merge these two patches... Al, Fengguang?
>
> I'd like to do it -- otherwise there will obviously be merge conflicts.
Good.
> Actually I also queued a patch to do this (attached). Your patches do
> better on TASK_KILLABLE and the use of signal_pending() in write
> routines, while mine goes further to add the break to various
> filesystems. How about combining them together?
>
> Subject: writeback: quit throttling when fatal signal pending
> Date: Wed Sep 08 17:40:22 CST 2010
>
> This allows quick response to Ctrl-C etc. for impatient users.
>
> It's necessary to abort the generic_perform_write() and other filesystem
> write loops too, to avoid large write + SIGKILL combination exceeding
> the dirty limit and possibly strange OOM.
>
> It mainly helps the rare bdi/global dirty exceeded cases.
> In the normal case of not exceeded, it will quit the loop anyway.
>
> if (fatal_signal_pending(current) &&
> nr_reclaimable + nr_writeback <= dirty_thresh)
> break;
>
> Reviewed-by: Neil Brown <neilb@suse.de>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> cc: Chris Mason <chris.mason@oracle.com>
> cc: Anton Altaparmakov <aia21@cantab.net>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
> fs/btrfs/file.c | 4 ++++
> fs/btrfs/ioctl.c | 4 ++++
> fs/btrfs/relocation.c | 4 ++++
> fs/buffer.c | 4 ++++
> fs/ntfs/attrib.c | 2 ++
> fs/ntfs/file.c | 6 ++++++
> mm/filemap.c | 3 ++-
> mm/page-writeback.c | 2 ++
> 8 files changed, 28 insertions(+), 1 deletion(-)
>
> --- linux-next.orig/mm/page-writeback.c 2011-11-13 20:37:11.000000000 +0800
> +++ linux-next/mm/page-writeback.c 2011-11-13 20:37:57.000000000 +0800
> @@ -1185,6 +1185,8 @@ pause:
> */
> if (task_ratelimit)
> break;
> + if (fatal_signal_pending(current))
> + break;
> }
>
> if (!dirty_exceeded && bdi->dirty_exceeded)
> --- linux-next.orig/mm/filemap.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/mm/filemap.c 2011-11-13 20:37:17.000000000 +0800
> @@ -2463,7 +2463,8 @@ again:
> written += copied;
>
> balance_dirty_pages_ratelimited(mapping);
> -
> + if (fatal_signal_pending(current))
> + break;
> } while (iov_iter_count(i));
>
> return written ? written : status;
Above two hunks are already part of my patches (effectively). The hunks
below look good but I guess they should be split to per-filesystem patches
and sent to filesystem maintainers for merging. The changes are independent
so there's no reason to do them in one big patch or skip fs maintainers...
Honza
> --- linux-next.orig/fs/btrfs/file.c 2011-11-13 20:37:11.000000000 +0800
> +++ linux-next/fs/btrfs/file.c 2011-11-13 20:37:17.000000000 +0800
> @@ -1274,6 +1274,10 @@ static noinline ssize_t __btrfs_buffered
> dirty_pages);
> if (dirty_pages < (root->leafsize >> PAGE_CACHE_SHIFT) + 1)
> btrfs_btree_balance_dirty(root, 1);
> + if (fatal_signal_pending(current)) {
> + ret = -EINTR;
> + break;
> + }
> btrfs_throttle(root);
>
> pos += copied;
> --- linux-next.orig/fs/btrfs/ioctl.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/btrfs/ioctl.c 2011-11-13 20:37:17.000000000 +0800
> @@ -1115,6 +1115,10 @@ int btrfs_defrag_file(struct inode *inod
>
> defrag_count += ret;
> balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret);
> + if (fatal_signal_pending(current)) {
> + ret = -EINTR;
> + goto out_ra;
> + }
>
> if (newer_than) {
> if (newer_off == (u64)-1)
> --- linux-next.orig/fs/btrfs/relocation.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/btrfs/relocation.c 2011-11-13 20:37:17.000000000 +0800
> @@ -3009,6 +3009,10 @@ static int relocate_file_extent_cluster(
>
> index++;
> balance_dirty_pages_ratelimited(inode->i_mapping);
> + if (fatal_signal_pending(current)) {
> + ret = -EINTR;
> + break;
> + }
> btrfs_throttle(BTRFS_I(inode)->root);
> }
> WARN_ON(nr != cluster->nr);
> --- linux-next.orig/fs/buffer.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/buffer.c 2011-11-13 20:37:17.000000000 +0800
> @@ -2255,6 +2255,10 @@ static int cont_expand_zero(struct file
> err = 0;
>
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current)) {
> + err = -EINTR;
> + goto out;
> + }
> }
>
> /* page covers the boundary, find the boundary offset */
> --- linux-next.orig/fs/ntfs/attrib.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/ntfs/attrib.c 2011-11-13 20:37:17.000000000 +0800
> @@ -2588,6 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const
> unlock_page(page);
> page_cache_release(page);
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current))
> + return -EINTR;
> cond_resched();
> }
> /* If there is a last partial page, need to do it the slow way. */
> --- linux-next.orig/fs/ntfs/file.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/ntfs/file.c 2011-11-13 20:37:17.000000000 +0800
> @@ -278,6 +278,10 @@ do_non_resident_extend:
> * files.
> */
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current)) {
> + err = -EINTR;
> + goto init_err_out;
> + }
> cond_resched();
> } while (++index < end_index);
> read_lock_irqsave(&ni->size_lock, flags);
> @@ -2054,6 +2058,8 @@ static ssize_t ntfs_file_buffered_write(
> if (unlikely(status))
> break;
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current))
> + break;
> cond_resched();
> } while (count);
> err_out:
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
@ 2011-11-14 12:12 ` Wu Fengguang
2011-11-14 12:37 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2011-11-14 12:12 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com,
Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote:
> There is no reason why task in balance_dirty_pages() shouldn't be killable
> and it helps in recovering from some error conditions (like when filesystem
> goes in error state and cannot accept writeback anymore but we still want to
> kill processes using it to be able to unmount it).
>
> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> mm/page-writeback.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0360d1b..e83c286 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1133,8 +1133,10 @@ pause:
> pages_dirtied,
> pause,
> start_time);
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> + __set_current_state(TASK_KILLABLE);
Dumb question: will the task continue to show up as 'D' in ps?
> io_schedule_timeout(pause);
> + if (fatal_signal_pending(current))
> + break;
I'd like to move that several lines below, to the end of the loop,
so that the condition is not tested at all in normal executions.
> dirty_thresh = hard_dirty_limit(dirty_thresh);
> /*
> --
> 1.7.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
@ 2011-11-14 12:12 ` Matthew Wilcox
2011-11-14 12:15 ` Wu Fengguang
1 sibling, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2011-11-14 12:12 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Wu Fengguang, Al Viro, k-mio, Andrew Morton,
Christoph Hellwig
On Mon, Nov 14, 2011 at 12:10:30PM +0100, Jan Kara wrote:
> again:
> + if (signal_pending(current)) {
Uhh ... surely 'fatal_signal_pending', no?
> + status = -EINTR;
> + break;
> + }
>
> /*
> * Bring in the user page that we will copy from _first_.
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 12:12 ` Matthew Wilcox
@ 2011-11-14 12:15 ` Wu Fengguang
2011-11-14 12:34 ` Jan Kara
1 sibling, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2011-11-14 12:15 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com,
Andrew Morton, Christoph Hellwig
> @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> iov_iter_count(i));
>
> again:
> + if (signal_pending(current)) {
signal_pending looks more useful than fatal_signal_pending in that it
covers normal signals too. However it's exactly the broader coverage
that makes it an interface change -- will this possibly break casually
written applications?
> + status = -EINTR;
> + break;
> + }
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:05 ` Christoph Hellwig
@ 2011-11-14 12:24 ` Jan Kara
2011-11-14 12:29 ` Wu Fengguang
1 sibling, 0 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-14 12:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Wu Fengguang, Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
On Mon 14-11-11 07:05:46, Christoph Hellwig wrote:
> On Mon, Nov 14, 2011 at 07:59:12PM +0800, Wu Fengguang wrote:
> > On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
> > >
> > > Hello,
> > >
> > > these two patches aim at making task waiting in balance_dirty_pages()
> > > killable. This is desirable because otherwise if filesystem stops accepting
> > > writes (e.g. if device has been removed or other serious error condidion) we
> > > have a task stuck in D state forever.
> >
> > Agreed totally. I myself has run into such conditions and get very
> > annoyed not being able to kill the hard throttled tasks -- they just
> > stuck there for ever if the error condition does not change.
> >
> > > I'm not sure who should merge these two patches... Al, Fengguang?
> >
> > I'd like to do it -- otherwise there will obviously be merge conflicts.
> >
> > Actually I also queued a patch to do this (attached). Your patches do
> > better on TASK_KILLABLE and the use of signal_pending() in write
> > routines, while mine goes further to add the break to various
> > filesystems. How about combining them together?
>
> Can you make balance_dirty_pages(_ratelimited) return an error instead
> of opencoding the fatal signal check everywhere? That would make the
> interface a bit more obvious.
We can do this. It's just that signal_pending() check e.g. in
generic_perform_write() has a sense even when balance_dirty_pages() will be
returning error because it also catches other cases...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:24 ` Jan Kara
@ 2011-11-14 12:29 ` Wu Fengguang
2011-11-14 12:41 ` Christoph Hellwig
1 sibling, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2011-11-14 12:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
> Can you make balance_dirty_pages(_ratelimited) return an error instead
> of opencoding the fatal signal che
Hmm that means a bigger change to the prototype of the _exported_
balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 12:15 ` Wu Fengguang
@ 2011-11-14 12:34 ` Jan Kara
2011-11-14 14:16 ` Matthew Wilcox
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2011-11-14 12:34 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > iov_iter_count(i));
> >
> > again:
> > + if (signal_pending(current)) {
>
> signal_pending looks more useful than fatal_signal_pending in that it
> covers normal signals too. However it's exactly the broader coverage
> that makes it an interface change -- will this possibly break casually
> written applications?
Yeah, this is upto discussion. Historically, write() (or any other system
call) could have returned EINTR. In fact, write() to a socket can return
EINTR even now. But you are right that we didn't return EINTR from write()
to a regular file. So if you prefer to never return EINTR from a write to a
regular file, I can change the check since I'm also slightly worried that
some badly written app can notice.
Honza
>
> > + status = -EINTR;
> > + break;
> > + }
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 12:12 ` Wu Fengguang
@ 2011-11-14 12:37 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-14 12:37 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 20:12:45, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote:
> > There is no reason why task in balance_dirty_pages() shouldn't be killable
> > and it helps in recovering from some error conditions (like when filesystem
> > goes in error state and cannot accept writeback anymore but we still want to
> > kill processes using it to be able to unmount it).
> >
> > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > mm/page-writeback.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 0360d1b..e83c286 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1133,8 +1133,10 @@ pause:
> > pages_dirtied,
> > pause,
> > start_time);
> > - __set_current_state(TASK_UNINTERRUPTIBLE);
> > + __set_current_state(TASK_KILLABLE);
>
> Dumb question: will the task continue to show up as 'D' in ps?
Yes, it will. The only difference is that the task will be woken when
SIGKILL arrives.
> > io_schedule_timeout(pause);
> > + if (fatal_signal_pending(current))
> > + break;
>
> I'd like to move that several lines below, to the end of the loop,
> so that the condition is not tested at all in normal executions.
OK, will do that.
> > dirty_thresh = hard_dirty_limit(dirty_thresh);
> > /*
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:29 ` Wu Fengguang
@ 2011-11-14 12:41 ` Christoph Hellwig
2011-11-14 13:01 ` Wu Fengguang
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-11-14 12:41 UTC (permalink / raw)
To: Wu Fengguang
Cc: Christoph Hellwig, Jan Kara, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton
On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > of opencoding the fatal signal che
>
> Hmm that means a bigger change to the prototype of the _exported_
> balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
So what? out of tree fses have always been entirely supported on their
own, and not a matter for keeping progress back.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:41 ` Christoph Hellwig
@ 2011-11-14 13:01 ` Wu Fengguang
2011-11-14 15:28 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2011-11-14 13:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > of opencoding the fatal signal che
> >
> > Hmm that means a bigger change to the prototype of the _exported_
> > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
>
> So what? out of tree fses have always been entirely supported on their
> own, and not a matter for keeping progress back.
OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
parts.
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 12:34 ` Jan Kara
@ 2011-11-14 14:16 ` Matthew Wilcox
2011-11-14 15:30 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2011-11-14 14:16 UTC (permalink / raw)
To: Jan Kara
Cc: Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote:
> On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > > iov_iter_count(i));
> > >
> > > again:
> > > + if (signal_pending(current)) {
> >
> > signal_pending looks more useful than fatal_signal_pending in that it
> > covers normal signals too. However it's exactly the broader coverage
> > that makes it an interface change -- will this possibly break casually
> > written applications?
> Yeah, this is upto discussion. Historically, write() (or any other system
> call) could have returned EINTR. In fact, write() to a socket can return
> EINTR even now. But you are right that we didn't return EINTR from write()
> to a regular file. So if you prefer to never return EINTR from a write to a
> regular file, I can change the check since I'm also slightly worried that
> some badly written app can notice.
No, this is not up for discussion. You can't return short writes (or
reads). This is why the 'fatal_signal_pending' API exists -- if the
signal is fatal, the task is never returned to, so its bug (not checking
the return from read/write) is not exposed.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 13:01 ` Wu Fengguang
@ 2011-11-14 15:28 ` Jan Kara
2011-11-14 15:32 ` Christoph Hellwig
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2011-11-14 15:28 UTC (permalink / raw)
To: Wu Fengguang
Cc: Christoph Hellwig, Jan Kara, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton
On Mon 14-11-11 21:01:53, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > > of opencoding the fatal signal che
> > >
> > > Hmm that means a bigger change to the prototype of the _exported_
> > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
> >
> > So what? out of tree fses have always been entirely supported on their
> > own, and not a matter for keeping progress back.
>
> OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
> parts.
Sure, I'll update my patch to balance_dirty_pages(). Should I also update
my patch to generic_perform_write() or does that fall under "various FS
parts" cathegory?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 14:16 ` Matthew Wilcox
@ 2011-11-14 15:30 ` Jan Kara
2011-11-14 18:44 ` Jeremy Allison
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2011-11-14 15:30 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jan Kara, Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 07:16:26, Matthew Wilcox wrote:
> On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote:
> > On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > > > iov_iter_count(i));
> > > >
> > > > again:
> > > > + if (signal_pending(current)) {
> > >
> > > signal_pending looks more useful than fatal_signal_pending in that it
> > > covers normal signals too. However it's exactly the broader coverage
> > > that makes it an interface change -- will this possibly break casually
> > > written applications?
> > Yeah, this is upto discussion. Historically, write() (or any other system
> > call) could have returned EINTR. In fact, write() to a socket can return
> > EINTR even now. But you are right that we didn't return EINTR from write()
> > to a regular file. So if you prefer to never return EINTR from a write to a
> > regular file, I can change the check since I'm also slightly worried that
> > some badly written app can notice.
>
> No, this is not up for discussion. You can't return short writes (or
> reads). This is why the 'fatal_signal_pending' API exists -- if the
> signal is fatal, the task is never returned to, so its bug (not checking
> the return from read/write) is not exposed.
By "can't return" you mean userspace need not be expecting it so we
shouldn't break it or is there some standard which forbids it? Just
curious...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 15:28 ` Jan Kara
@ 2011-11-14 15:32 ` Christoph Hellwig
2011-11-14 16:19 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-11-14 15:32 UTC (permalink / raw)
To: Jan Kara
Cc: Wu Fengguang, Christoph Hellwig, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton
On Mon, Nov 14, 2011 at 04:28:32PM +0100, Jan Kara wrote:
> On Mon 14-11-11 21:01:53, Wu Fengguang wrote:
> > On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> > > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > > > of opencoding the fatal signal che
> > > >
> > > > Hmm that means a bigger change to the prototype of the _exported_
> > > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
> > >
> > > So what? out of tree fses have always been entirely supported on their
> > > own, and not a matter for keeping progress back.
> >
> > OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
> > parts.
> Sure, I'll update my patch to balance_dirty_pages(). Should I also update
> my patch to generic_perform_write() or does that fall under "various FS
> parts" cathegory?
I'd be much happier if all bits go in together. IMHO this is a change
worthwhile for 3.2 still, so we should probably try to get it in ASAP.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara
@ 2011-11-14 16:15 ` Jan Kara
2011-11-14 16:23 ` Christoph Hellwig
2011-11-15 11:48 ` Wu Fengguang
0 siblings, 2 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel,
Jan Kara
There is no reason why task in balance_dirty_pages() shouldn't be killable
and it helps in recovering from some error conditions (like when filesystem
goes in error state and cannot accept writeback anymore but we still want to
kill processes using it to be able to unmount it).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
include/linux/writeback.h | 8 ++++----
mm/page-writeback.c | 27 +++++++++++++++++++--------
2 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a378c29..eb3ecca 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -170,13 +170,13 @@ void __bdi_update_bandwidth(struct backing_dev_info *bdi,
unsigned long start_time);
void page_writeback_init(void);
-void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
- unsigned long nr_pages_dirtied);
+int balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
+ unsigned long nr_pages_dirtied);
-static inline void
+static inline int
balance_dirty_pages_ratelimited(struct address_space *mapping)
{
- balance_dirty_pages_ratelimited_nr(mapping, 1);
+ return balance_dirty_pages_ratelimited_nr(mapping, 1);
}
typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0360d1b..380279c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1000,8 +1000,11 @@ static unsigned long bdi_max_pause(struct backing_dev_info *bdi,
* the caller to wait once crossing the (background_thresh + dirty_thresh) / 2.
* If we're over `background_thresh' then the writeback threads are woken to
* perform some writeout.
+ *
+ * We return 0 if nothing special happened, EINTR if our wait has been
+ * interrupted by a fatal signal.
*/
-static void balance_dirty_pages(struct address_space *mapping,
+static int balance_dirty_pages(struct address_space *mapping,
unsigned long pages_dirtied)
{
unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */
@@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping,
unsigned long pos_ratio;
struct backing_dev_info *bdi = mapping->backing_dev_info;
unsigned long start_time = jiffies;
+ int err = 0;
for (;;) {
/*
@@ -1133,7 +1137,7 @@ pause:
pages_dirtied,
pause,
start_time);
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ __set_current_state(TASK_KILLABLE);
io_schedule_timeout(pause);
dirty_thresh = hard_dirty_limit(dirty_thresh);
@@ -1145,6 +1149,11 @@ pause:
*/
if (nr_dirty < dirty_thresh)
break;
+
+ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ break;
+ }
}
if (!dirty_exceeded && bdi->dirty_exceeded)
@@ -1168,7 +1177,7 @@ pause:
}
if (writeback_in_progress(bdi))
- return;
+ return err;
/*
* In laptop mode, we wait until hitting the higher threshold before
@@ -1179,10 +1188,11 @@ pause:
* background_thresh, to keep the amount of dirty memory low.
*/
if (laptop_mode)
- return;
+ return err;
if (nr_reclaimable > background_thresh)
bdi_start_background_writeback(bdi);
+ return err;
}
void set_page_dirty_balance(struct page *page, int page_mkwrite)
@@ -1211,15 +1221,15 @@ static DEFINE_PER_CPU(int, bdp_ratelimits);
* limit we decrease the ratelimiting by a lot, to prevent individual processes
* from overshooting the limit by (ratelimit_pages) each.
*/
-void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
- unsigned long nr_pages_dirtied)
+int balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
+ unsigned long nr_pages_dirtied)
{
struct backing_dev_info *bdi = mapping->backing_dev_info;
int ratelimit;
int *p;
if (!bdi_cap_account_dirty(bdi))
- return;
+ return 0;
ratelimit = current->nr_dirtied_pause;
if (bdi->dirty_exceeded)
@@ -1247,7 +1257,8 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
preempt_enable();
if (unlikely(current->nr_dirtied >= ratelimit))
- balance_dirty_pages(mapping, current->nr_dirtied);
+ return balance_dirty_pages(mapping, current->nr_dirtied);
+ return 0;
}
EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
--
1.7.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 15:32 ` Christoph Hellwig
@ 2011-11-14 16:19 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-14 16:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
On Mon 14-11-11 10:32:26, Christoph Hellwig wrote:
> On Mon, Nov 14, 2011 at 04:28:32PM +0100, Jan Kara wrote:
> > On Mon 14-11-11 21:01:53, Wu Fengguang wrote:
> > > On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> > > > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > > > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > > > > of opencoding the fatal signal che
> > > > >
> > > > > Hmm that means a bigger change to the prototype of the _exported_
> > > > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
> > > >
> > > > So what? out of tree fses have always been entirely supported on their
> > > > own, and not a matter for keeping progress back.
> > >
> > > OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
> > > parts.
> > Sure, I'll update my patch to balance_dirty_pages(). Should I also update
> > my patch to generic_perform_write() or does that fall under "various FS
> > parts" cathegory?
>
> I'd be much happier if all bits go in together. IMHO this is a change
> worthwhile for 3.2 still, so we should probably try to get it in ASAP.
OK, I've resent my patches with requested changes. I think those two
patches make sense on it's own and fix the biggest single pain point.
Fengguang can add other fs bits to these two patches (although personally
I'd be happier to have respective fs maintainers acks on them before
pushing).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara
@ 2011-11-14 16:23 ` Christoph Hellwig
2011-11-15 11:48 ` Wu Fengguang
1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-11-14 16:23 UTC (permalink / raw)
To: Jan Kara
Cc: Wu Fengguang, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel
On Mon, Nov 14, 2011 at 05:15:23PM +0100, Jan Kara wrote:
> There is no reason why task in balance_dirty_pages() shouldn't be killable
> and it helps in recovering from some error conditions (like when filesystem
> goes in error state and cannot accept writeback anymore but we still want to
> kill processes using it to be able to unmount it).
Sorry for noticing this so late, but what about killing the
balance_dirty_pages_ratelimited wrapper, and only exporting a version
that takes a number of pages now that we have to change the signature
anyway? I'd recommend to call the new one simply
balance_dirty_pages_ratelimited, that will give a compile breakge to
anyone using one of the old versions and thus cause them to actually
adapt and hopefull check the return value.
(and not for this series but soon later - we probably really should call
into balance_dirty_pages in larger batch sizes in most places).
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 15:30 ` Jan Kara
@ 2011-11-14 18:44 ` Jeremy Allison
0 siblings, 0 replies; 30+ messages in thread
From: Jeremy Allison @ 2011-11-14 18:44 UTC (permalink / raw)
To: Jan Kara
Cc: Matthew Wilcox, Wu Fengguang, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 04:30:47PM +0100, Jan Kara wrote:
> On Mon 14-11-11 07:16:26, Matthew Wilcox wrote:
> > On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote:
> > > On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > > > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > > > > iov_iter_count(i));
> > > > >
> > > > > again:
> > > > > + if (signal_pending(current)) {
> > > >
> > > > signal_pending looks more useful than fatal_signal_pending in that it
> > > > covers normal signals too. However it's exactly the broader coverage
> > > > that makes it an interface change -- will this possibly break casually
> > > > written applications?
> > > Yeah, this is upto discussion. Historically, write() (or any other system
> > > call) could have returned EINTR. In fact, write() to a socket can return
> > > EINTR even now. But you are right that we didn't return EINTR from write()
> > > to a regular file. So if you prefer to never return EINTR from a write to a
> > > regular file, I can change the check since I'm also slightly worried that
> > > some badly written app can notice.
> >
> > No, this is not up for discussion. You can't return short writes (or
> > reads). This is why the 'fatal_signal_pending' API exists -- if the
> > signal is fatal, the task is never returned to, so its bug (not checking
> > the return from read/write) is not exposed.
> By "can't return" you mean userspace need not be expecting it so we
> shouldn't break it or is there some standard which forbids it? Just
> curious...
This *WILL* break userspace code if you return short writes on a
filesystem fd. Guarenteed. I originally wrote code in Samba to
take care of it back before I learned the difference between
"slow" and "fast" interruptable system calls (see W.R.Stevens
for details).
Don't return short writes or reads on a filesystem fd.
Jeremy.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara
2011-11-14 16:23 ` Christoph Hellwig
@ 2011-11-15 11:48 ` Wu Fengguang
2011-11-15 13:41 ` Jan Kara
1 sibling, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2011-11-15 11:48 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
> +static int balance_dirty_pages(struct address_space *mapping,
> unsigned long pages_dirtied)
> {
> unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */
> @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> unsigned long pos_ratio;
> struct backing_dev_info *bdi = mapping->backing_dev_info;
> unsigned long start_time = jiffies;
> + int err = 0;
>
> for (;;) {
> /*
> @@ -1133,7 +1137,7 @@ pause:
> pages_dirtied,
> pause,
> start_time);
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> + __set_current_state(TASK_KILLABLE);
> io_schedule_timeout(pause);
>
> dirty_thresh = hard_dirty_limit(dirty_thresh);
> @@ -1145,6 +1149,11 @@ pause:
> */
> if (nr_dirty < dirty_thresh)
> break;
> +
> + if (fatal_signal_pending(current)) {
> + err = -EINTR;
> + break;
> + }
> }
The other alternative is to raise the limit on fatal_signal_pending:
if (fatal_signal_pending(current) &&
nr_dirty < dirty_thresh + dirty_thresh / 2)
break;
That should work well enough in practice and avoids touching the fs code.
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-15 11:48 ` Wu Fengguang
@ 2011-11-15 13:41 ` Jan Kara
2011-11-15 14:15 ` Wu Fengguang
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2011-11-15 13:41 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Tue 15-11-11 19:48:44, Wu Fengguang wrote:
> > +static int balance_dirty_pages(struct address_space *mapping,
> > unsigned long pages_dirtied)
> > {
> > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */
> > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> > unsigned long pos_ratio;
> > struct backing_dev_info *bdi = mapping->backing_dev_info;
> > unsigned long start_time = jiffies;
> > + int err = 0;
> >
> > for (;;) {
> > /*
> > @@ -1133,7 +1137,7 @@ pause:
> > pages_dirtied,
> > pause,
> > start_time);
> > - __set_current_state(TASK_UNINTERRUPTIBLE);
> > + __set_current_state(TASK_KILLABLE);
> > io_schedule_timeout(pause);
> >
> > dirty_thresh = hard_dirty_limit(dirty_thresh);
> > @@ -1145,6 +1149,11 @@ pause:
> > */
> > if (nr_dirty < dirty_thresh)
> > break;
> > +
> > + if (fatal_signal_pending(current)) {
> > + err = -EINTR;
> > + break;
> > + }
> > }
>
> The other alternative is to raise the limit on fatal_signal_pending:
>
> if (fatal_signal_pending(current) &&
> nr_dirty < dirty_thresh + dirty_thresh / 2)
> break;
>
> That should work well enough in practice and avoids touching the fs code.
Sorry, but I fail to see what would this bring us... Can you elaborate a
bit please?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-15 13:41 ` Jan Kara
@ 2011-11-15 14:15 ` Wu Fengguang
2011-11-15 14:44 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2011-11-15 14:15 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Tue, Nov 15, 2011 at 09:41:27PM +0800, Jan Kara wrote:
> On Tue 15-11-11 19:48:44, Wu Fengguang wrote:
> > > +static int balance_dirty_pages(struct address_space *mapping,
> > > unsigned long pages_dirtied)
> > > {
> > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */
> > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> > > unsigned long pos_ratio;
> > > struct backing_dev_info *bdi = mapping->backing_dev_info;
> > > unsigned long start_time = jiffies;
> > > + int err = 0;
> > >
> > > for (;;) {
> > > /*
> > > @@ -1133,7 +1137,7 @@ pause:
> > > pages_dirtied,
> > > pause,
> > > start_time);
> > > - __set_current_state(TASK_UNINTERRUPTIBLE);
> > > + __set_current_state(TASK_KILLABLE);
> > > io_schedule_timeout(pause);
> > >
> > > dirty_thresh = hard_dirty_limit(dirty_thresh);
> > > @@ -1145,6 +1149,11 @@ pause:
> > > */
> > > if (nr_dirty < dirty_thresh)
> > > break;
> > > +
> > > + if (fatal_signal_pending(current)) {
> > > + err = -EINTR;
> > > + break;
> > > + }
> > > }
> >
> > The other alternative is to raise the limit on fatal_signal_pending:
> >
> > if (fatal_signal_pending(current) &&
> > nr_dirty < dirty_thresh + dirty_thresh / 2)
> > break;
> >
> > That should work well enough in practice and avoids touching the fs code.
> Sorry, but I fail to see what would this bring us... Can you elaborate a
> bit please?
It's not bringing us something, but allows us to get rid of patch 2
as well as adding fatal_signal_pending() tests to all the other fs.
So no more worries about partial writes :)
Sure it leaves ->write_begin() blocks unhandled, however that should
be much less a problem than the blocks inside balance_dirty_pages().
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-15 14:15 ` Wu Fengguang
@ 2011-11-15 14:44 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-15 14:44 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Tue 15-11-11 22:15:28, Wu Fengguang wrote:
> On Tue, Nov 15, 2011 at 09:41:27PM +0800, Jan Kara wrote:
> > On Tue 15-11-11 19:48:44, Wu Fengguang wrote:
> > > > +static int balance_dirty_pages(struct address_space *mapping,
> > > > unsigned long pages_dirtied)
> > > > {
> > > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */
> > > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> > > > unsigned long pos_ratio;
> > > > struct backing_dev_info *bdi = mapping->backing_dev_info;
> > > > unsigned long start_time = jiffies;
> > > > + int err = 0;
> > > >
> > > > for (;;) {
> > > > /*
> > > > @@ -1133,7 +1137,7 @@ pause:
> > > > pages_dirtied,
> > > > pause,
> > > > start_time);
> > > > - __set_current_state(TASK_UNINTERRUPTIBLE);
> > > > + __set_current_state(TASK_KILLABLE);
> > > > io_schedule_timeout(pause);
> > > >
> > > > dirty_thresh = hard_dirty_limit(dirty_thresh);
> > > > @@ -1145,6 +1149,11 @@ pause:
> > > > */
> > > > if (nr_dirty < dirty_thresh)
> > > > break;
> > > > +
> > > > + if (fatal_signal_pending(current)) {
> > > > + err = -EINTR;
> > > > + break;
> > > > + }
> > > > }
> > >
> > > The other alternative is to raise the limit on fatal_signal_pending:
> > >
> > > if (fatal_signal_pending(current) &&
> > > nr_dirty < dirty_thresh + dirty_thresh / 2)
> > > break;
> > >
> > > That should work well enough in practice and avoids touching the fs code.
> > Sorry, but I fail to see what would this bring us... Can you elaborate a
> > bit please?
>
> It's not bringing us something, but allows us to get rid of patch 2
> as well as adding fatal_signal_pending() tests to all the other fs.
>
> So no more worries about partial writes :)
>
> Sure it leaves ->write_begin() blocks unhandled, however that should
> be much less a problem than the blocks inside balance_dirty_pages().
Ah, OK, I see. Frankly, I'd rather keep the loop break in
generic_perform_write() and have the code straightforward. I don't see any
security (data exposure) problem there and if someone complains on the
grounds that the behavior has changed, we can revert the change in the
worst case.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-16 11:12 [PATCH 0/2 v3] " Jan Kara
@ 2011-11-16 11:12 ` Jan Kara
2011-11-16 11:28 ` Wu Fengguang
0 siblings, 1 reply; 30+ messages in thread
From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel,
Jan Kara
There is no reason why task in balance_dirty_pages() shouldn't be killable
and it helps in recovering from some error conditions (like when filesystem
goes in error state and cannot accept writeback anymore but we still want to
kill processes using it to be able to unmount it).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0360d1b..39f5805 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1133,7 +1133,7 @@ pause:
pages_dirtied,
pause,
start_time);
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ __set_current_state(TASK_KILLABLE);
io_schedule_timeout(pause);
dirty_thresh = hard_dirty_limit(dirty_thresh);
@@ -1145,6 +1145,9 @@ pause:
*/
if (nr_dirty < dirty_thresh)
break;
+
+ if (fatal_signal_pending(current))
+ break;
}
if (!dirty_exceeded && bdi->dirty_exceeded)
--
1.7.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara
@ 2011-11-16 11:28 ` Wu Fengguang
2011-11-16 12:58 ` Jan Kara
0 siblings, 1 reply; 30+ messages in thread
From: Wu Fengguang @ 2011-11-16 11:28 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org, Neil Brown, KOSAKI Motohiro,
Anton Altaparmakov
On Wed, Nov 16, 2011 at 07:12:14PM +0800, Jan Kara wrote:
> There is no reason why task in balance_dirty_pages() shouldn't be killable
> and it helps in recovering from some error conditions (like when filesystem
> goes in error state and cannot accept writeback anymore but we still want to
> kill processes using it to be able to unmount it).
Here I'd like to add a paragraph from my original patch:
The follow up patches will further abort the generic_perform_write() and
other filesystem write loops, to avoid large write + SIGKILL combination
exceeding the dirty limit and possibly strange OOM.
Thanks,
Fengguang
> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> mm/page-writeback.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0360d1b..39f5805 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1133,7 +1133,7 @@ pause:
> pages_dirtied,
> pause,
> start_time);
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> + __set_current_state(TASK_KILLABLE);
> io_schedule_timeout(pause);
>
> dirty_thresh = hard_dirty_limit(dirty_thresh);
> @@ -1145,6 +1145,9 @@ pause:
> */
> if (nr_dirty < dirty_thresh)
> break;
> +
> + if (fatal_signal_pending(current))
> + break;
> }
>
> if (!dirty_exceeded && bdi->dirty_exceeded)
> --
> 1.7.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-16 11:28 ` Wu Fengguang
@ 2011-11-16 12:58 ` Jan Kara
0 siblings, 0 replies; 30+ messages in thread
From: Jan Kara @ 2011-11-16 12:58 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org, Neil Brown, KOSAKI Motohiro,
Anton Altaparmakov
On Wed 16-11-11 19:28:19, Wu Fengguang wrote:
> On Wed, Nov 16, 2011 at 07:12:14PM +0800, Jan Kara wrote:
> > There is no reason why task in balance_dirty_pages() shouldn't be killable
> > and it helps in recovering from some error conditions (like when filesystem
> > goes in error state and cannot accept writeback anymore but we still want to
> > kill processes using it to be able to unmount it).
>
> Here I'd like to add a paragraph from my original patch:
>
> The follow up patches will further abort the generic_perform_write() and
> other filesystem write loops, to avoid large write + SIGKILL combination
> exceeding the dirty limit and possibly strange OOM.
Sure, feel free to add the paragraph to the changelog.
Honza
> > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > mm/page-writeback.c | 5 ++++-
> > 1 files changed, 4 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 0360d1b..39f5805 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1133,7 +1133,7 @@ pause:
> > pages_dirtied,
> > pause,
> > start_time);
> > - __set_current_state(TASK_UNINTERRUPTIBLE);
> > + __set_current_state(TASK_KILLABLE);
> > io_schedule_timeout(pause);
> >
> > dirty_thresh = hard_dirty_limit(dirty_thresh);
> > @@ -1145,6 +1145,9 @@ pause:
> > */
> > if (nr_dirty < dirty_thresh)
> > break;
> > +
> > + if (fatal_signal_pending(current))
> > + break;
> > }
> >
> > if (!dirty_exceeded && bdi->dirty_exceeded)
> > --
> > 1.7.1
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2011-11-16 12:58 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
2011-11-14 12:12 ` Wu Fengguang
2011-11-14 12:37 ` Jan Kara
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 12:12 ` Matthew Wilcox
2011-11-14 12:15 ` Wu Fengguang
2011-11-14 12:34 ` Jan Kara
2011-11-14 14:16 ` Matthew Wilcox
2011-11-14 15:30 ` Jan Kara
2011-11-14 18:44 ` Jeremy Allison
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:24 ` Jan Kara
2011-11-14 12:29 ` Wu Fengguang
2011-11-14 12:41 ` Christoph Hellwig
2011-11-14 13:01 ` Wu Fengguang
2011-11-14 15:28 ` Jan Kara
2011-11-14 15:32 ` Christoph Hellwig
2011-11-14 16:19 ` Jan Kara
2011-11-14 12:12 ` Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara
2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara
2011-11-14 16:23 ` Christoph Hellwig
2011-11-15 11:48 ` Wu Fengguang
2011-11-15 13:41 ` Jan Kara
2011-11-15 14:15 ` Wu Fengguang
2011-11-15 14:44 ` Jan Kara
2011-11-16 11:12 [PATCH 0/2 v3] " Jan Kara
2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara
2011-11-16 11:28 ` Wu Fengguang
2011-11-16 12:58 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).