* [PATCH 0/2] Make task doing heavy writing killable
@ 2011-11-14 11:10 Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
` (2 more replies)
0 siblings, 3 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw)
To: linux-fsdevel
Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig
Hello,
these two patches aim at making task waiting in balance_dirty_pages()
killable. This is desirable because otherwise if filesystem stops accepting
writes (e.g. if device has been removed or other serious error condidion) we
have a task stuck in D state forever.
I'm not sure who should merge these two patches... Al, Fengguang?
Honza
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
@ 2011-11-14 11:10 ` Jan Kara
2011-11-14 12:12 ` Wu Fengguang
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw)
To: linux-fsdevel
Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig,
Jan Kara
There is no reason why task in balance_dirty_pages() shouldn't be killable
and it helps in recovering from some error conditions (like when filesystem
goes in error state and cannot accept writeback anymore but we still want to
kill processes using it to be able to unmount it).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0360d1b..e83c286 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1133,8 +1133,10 @@ pause:
pages_dirtied,
pause,
start_time);
- __set_current_state(TASK_UNINTERRUPTIBLE);
+ __set_current_state(TASK_KILLABLE);
io_schedule_timeout(pause);
+ if (fatal_signal_pending(current))
+ break;
dirty_thresh = hard_dirty_limit(dirty_thresh);
/*
--
1.7.1
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
@ 2011-11-14 11:10 ` Jan Kara
2011-11-14 12:12 ` Matthew Wilcox
2011-11-14 12:15 ` Wu Fengguang
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2 siblings, 2 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw)
To: linux-fsdevel
Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig,
Jan Kara
Currently write(2) to a file is not interruptible by a signal. Sometimes this
is desirable (e.g. when you want to quickly kill a process hogging your disk or
when some process gets blocked in balance_dirty_pages() indefinitely due to a
filesystem being in an error condition).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/filemap.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c0018f2..6b01d2f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
iov_iter_count(i));
again:
+ if (signal_pending(current)) {
+ status = -EINTR;
+ break;
+ }
/*
* Bring in the user page that we will copy from _first_.
--
1.7.1
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
@ 2011-11-14 11:59 ` Wu Fengguang
2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:12 ` Jan Kara
2 siblings, 2 replies; 39+ messages in thread
From: Wu Fengguang @ 2011-11-14 11:59 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com,
Andrew Morton, Christoph Hellwig
[-- Attachment #1: Type: text/plain, Size: 955 bytes --]
On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
>
> Hello,
>
> these two patches aim at making task waiting in balance_dirty_pages()
> killable. This is desirable because otherwise if filesystem stops accepting
> writes (e.g. if device has been removed or other serious error condidion) we
> have a task stuck in D state forever.
Agreed totally. I myself has run into such conditions and get very
annoyed not being able to kill the hard throttled tasks -- they just
stuck there for ever if the error condition does not change.
> I'm not sure who should merge these two patches... Al, Fengguang?
I'd like to do it -- otherwise there will obviously be merge conflicts.
Actually I also queued a patch to do this (attached). Your patches do
better on TASK_KILLABLE and the use of signal_pending() in write
routines, while mine goes further to add the break to various
filesystems. How about combining them together?
Thanks,
Fengguang
[-- Attachment #2: writeback-break-on-signal-pending.patch --]
[-- Type: text/x-diff, Size: 4638 bytes --]
Subject: writeback: quit throttling when fatal signal pending
Date: Wed Sep 08 17:40:22 CST 2010
This allows quick response to Ctrl-C etc. for impatient users.
It's necessary to abort the generic_perform_write() and other filesystem
write loops too, to avoid large write + SIGKILL combination exceeding
the dirty limit and possibly strange OOM.
It mainly helps the rare bdi/global dirty exceeded cases.
In the normal case of not exceeded, it will quit the loop anyway.
if (fatal_signal_pending(current) &&
nr_reclaimable + nr_writeback <= dirty_thresh)
break;
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
cc: Chris Mason <chris.mason@oracle.com>
cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
fs/btrfs/file.c | 4 ++++
fs/btrfs/ioctl.c | 4 ++++
fs/btrfs/relocation.c | 4 ++++
fs/buffer.c | 4 ++++
fs/ntfs/attrib.c | 2 ++
fs/ntfs/file.c | 6 ++++++
mm/filemap.c | 3 ++-
mm/page-writeback.c | 2 ++
8 files changed, 28 insertions(+), 1 deletion(-)
--- linux-next.orig/mm/page-writeback.c 2011-11-13 20:37:11.000000000 +0800
+++ linux-next/mm/page-writeback.c 2011-11-13 20:37:57.000000000 +0800
@@ -1185,6 +1185,8 @@ pause:
*/
if (task_ratelimit)
break;
+ if (fatal_signal_pending(current))
+ break;
}
if (!dirty_exceeded && bdi->dirty_exceeded)
--- linux-next.orig/mm/filemap.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/mm/filemap.c 2011-11-13 20:37:17.000000000 +0800
@@ -2463,7 +2463,8 @@ again:
written += copied;
balance_dirty_pages_ratelimited(mapping);
-
+ if (fatal_signal_pending(current))
+ break;
} while (iov_iter_count(i));
return written ? written : status;
--- linux-next.orig/fs/btrfs/file.c 2011-11-13 20:37:11.000000000 +0800
+++ linux-next/fs/btrfs/file.c 2011-11-13 20:37:17.000000000 +0800
@@ -1274,6 +1274,10 @@ static noinline ssize_t __btrfs_buffered
dirty_pages);
if (dirty_pages < (root->leafsize >> PAGE_CACHE_SHIFT) + 1)
btrfs_btree_balance_dirty(root, 1);
+ if (fatal_signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
btrfs_throttle(root);
pos += copied;
--- linux-next.orig/fs/btrfs/ioctl.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/btrfs/ioctl.c 2011-11-13 20:37:17.000000000 +0800
@@ -1115,6 +1115,10 @@ int btrfs_defrag_file(struct inode *inod
defrag_count += ret;
balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret);
+ if (fatal_signal_pending(current)) {
+ ret = -EINTR;
+ goto out_ra;
+ }
if (newer_than) {
if (newer_off == (u64)-1)
--- linux-next.orig/fs/btrfs/relocation.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/btrfs/relocation.c 2011-11-13 20:37:17.000000000 +0800
@@ -3009,6 +3009,10 @@ static int relocate_file_extent_cluster(
index++;
balance_dirty_pages_ratelimited(inode->i_mapping);
+ if (fatal_signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
btrfs_throttle(BTRFS_I(inode)->root);
}
WARN_ON(nr != cluster->nr);
--- linux-next.orig/fs/buffer.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/buffer.c 2011-11-13 20:37:17.000000000 +0800
@@ -2255,6 +2255,10 @@ static int cont_expand_zero(struct file
err = 0;
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ goto out;
+ }
}
/* page covers the boundary, find the boundary offset */
--- linux-next.orig/fs/ntfs/attrib.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/ntfs/attrib.c 2011-11-13 20:37:17.000000000 +0800
@@ -2588,6 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const
unlock_page(page);
page_cache_release(page);
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current))
+ return -EINTR;
cond_resched();
}
/* If there is a last partial page, need to do it the slow way. */
--- linux-next.orig/fs/ntfs/file.c 2011-11-13 20:03:23.000000000 +0800
+++ linux-next/fs/ntfs/file.c 2011-11-13 20:37:17.000000000 +0800
@@ -278,6 +278,10 @@ do_non_resident_extend:
* files.
*/
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ goto init_err_out;
+ }
cond_resched();
} while (++index < end_index);
read_lock_irqsave(&ni->size_lock, flags);
@@ -2054,6 +2058,8 @@ static ssize_t ntfs_file_buffered_write(
if (unlikely(status))
break;
balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current))
+ break;
cond_resched();
} while (count);
err_out:
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
@ 2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:24 ` Jan Kara
2011-11-14 12:29 ` Wu Fengguang
2011-11-14 12:12 ` Jan Kara
1 sibling, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2011-11-14 12:05 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 07:59:12PM +0800, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
> >
> > Hello,
> >
> > these two patches aim at making task waiting in balance_dirty_pages()
> > killable. This is desirable because otherwise if filesystem stops accepting
> > writes (e.g. if device has been removed or other serious error condidion) we
> > have a task stuck in D state forever.
>
> Agreed totally. I myself has run into such conditions and get very
> annoyed not being able to kill the hard throttled tasks -- they just
> stuck there for ever if the error condition does not change.
>
> > I'm not sure who should merge these two patches... Al, Fengguang?
>
> I'd like to do it -- otherwise there will obviously be merge conflicts.
>
> Actually I also queued a patch to do this (attached). Your patches do
> better on TASK_KILLABLE and the use of signal_pending() in write
> routines, while mine goes further to add the break to various
> filesystems. How about combining them together?
Can you make balance_dirty_pages(_ratelimited) return an error instead
of opencoding the fatal signal check everywhere? That would make the
interface a bit more obvious.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2011-11-14 12:05 ` Christoph Hellwig
@ 2011-11-14 12:12 ` Jan Kara
1 sibling, 0 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-14 12:12 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 19:59:12, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
> >
> > Hello,
> >
> > these two patches aim at making task waiting in balance_dirty_pages()
> > killable. This is desirable because otherwise if filesystem stops accepting
> > writes (e.g. if device has been removed or other serious error condidion) we
> > have a task stuck in D state forever.
>
> Agreed totally. I myself has run into such conditions and get very
> annoyed not being able to kill the hard throttled tasks -- they just
> stuck there for ever if the error condition does not change.
>
> > I'm not sure who should merge these two patches... Al, Fengguang?
>
> I'd like to do it -- otherwise there will obviously be merge conflicts.
Good.
> Actually I also queued a patch to do this (attached). Your patches do
> better on TASK_KILLABLE and the use of signal_pending() in write
> routines, while mine goes further to add the break to various
> filesystems. How about combining them together?
>
> Subject: writeback: quit throttling when fatal signal pending
> Date: Wed Sep 08 17:40:22 CST 2010
>
> This allows quick response to Ctrl-C etc. for impatient users.
>
> It's necessary to abort the generic_perform_write() and other filesystem
> write loops too, to avoid large write + SIGKILL combination exceeding
> the dirty limit and possibly strange OOM.
>
> It mainly helps the rare bdi/global dirty exceeded cases.
> In the normal case of not exceeded, it will quit the loop anyway.
>
> if (fatal_signal_pending(current) &&
> nr_reclaimable + nr_writeback <= dirty_thresh)
> break;
>
> Reviewed-by: Neil Brown <neilb@suse.de>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> cc: Chris Mason <chris.mason@oracle.com>
> cc: Anton Altaparmakov <aia21@cantab.net>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
> fs/btrfs/file.c | 4 ++++
> fs/btrfs/ioctl.c | 4 ++++
> fs/btrfs/relocation.c | 4 ++++
> fs/buffer.c | 4 ++++
> fs/ntfs/attrib.c | 2 ++
> fs/ntfs/file.c | 6 ++++++
> mm/filemap.c | 3 ++-
> mm/page-writeback.c | 2 ++
> 8 files changed, 28 insertions(+), 1 deletion(-)
>
> --- linux-next.orig/mm/page-writeback.c 2011-11-13 20:37:11.000000000 +0800
> +++ linux-next/mm/page-writeback.c 2011-11-13 20:37:57.000000000 +0800
> @@ -1185,6 +1185,8 @@ pause:
> */
> if (task_ratelimit)
> break;
> + if (fatal_signal_pending(current))
> + break;
> }
>
> if (!dirty_exceeded && bdi->dirty_exceeded)
> --- linux-next.orig/mm/filemap.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/mm/filemap.c 2011-11-13 20:37:17.000000000 +0800
> @@ -2463,7 +2463,8 @@ again:
> written += copied;
>
> balance_dirty_pages_ratelimited(mapping);
> -
> + if (fatal_signal_pending(current))
> + break;
> } while (iov_iter_count(i));
>
> return written ? written : status;
Above two hunks are already part of my patches (effectively). The hunks
below look good but I guess they should be split to per-filesystem patches
and sent to filesystem maintainers for merging. The changes are independent
so there's no reason to do them in one big patch or skip fs maintainers...
Honza
> --- linux-next.orig/fs/btrfs/file.c 2011-11-13 20:37:11.000000000 +0800
> +++ linux-next/fs/btrfs/file.c 2011-11-13 20:37:17.000000000 +0800
> @@ -1274,6 +1274,10 @@ static noinline ssize_t __btrfs_buffered
> dirty_pages);
> if (dirty_pages < (root->leafsize >> PAGE_CACHE_SHIFT) + 1)
> btrfs_btree_balance_dirty(root, 1);
> + if (fatal_signal_pending(current)) {
> + ret = -EINTR;
> + break;
> + }
> btrfs_throttle(root);
>
> pos += copied;
> --- linux-next.orig/fs/btrfs/ioctl.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/btrfs/ioctl.c 2011-11-13 20:37:17.000000000 +0800
> @@ -1115,6 +1115,10 @@ int btrfs_defrag_file(struct inode *inod
>
> defrag_count += ret;
> balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret);
> + if (fatal_signal_pending(current)) {
> + ret = -EINTR;
> + goto out_ra;
> + }
>
> if (newer_than) {
> if (newer_off == (u64)-1)
> --- linux-next.orig/fs/btrfs/relocation.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/btrfs/relocation.c 2011-11-13 20:37:17.000000000 +0800
> @@ -3009,6 +3009,10 @@ static int relocate_file_extent_cluster(
>
> index++;
> balance_dirty_pages_ratelimited(inode->i_mapping);
> + if (fatal_signal_pending(current)) {
> + ret = -EINTR;
> + break;
> + }
> btrfs_throttle(BTRFS_I(inode)->root);
> }
> WARN_ON(nr != cluster->nr);
> --- linux-next.orig/fs/buffer.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/buffer.c 2011-11-13 20:37:17.000000000 +0800
> @@ -2255,6 +2255,10 @@ static int cont_expand_zero(struct file
> err = 0;
>
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current)) {
> + err = -EINTR;
> + goto out;
> + }
> }
>
> /* page covers the boundary, find the boundary offset */
> --- linux-next.orig/fs/ntfs/attrib.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/ntfs/attrib.c 2011-11-13 20:37:17.000000000 +0800
> @@ -2588,6 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const
> unlock_page(page);
> page_cache_release(page);
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current))
> + return -EINTR;
> cond_resched();
> }
> /* If there is a last partial page, need to do it the slow way. */
> --- linux-next.orig/fs/ntfs/file.c 2011-11-13 20:03:23.000000000 +0800
> +++ linux-next/fs/ntfs/file.c 2011-11-13 20:37:17.000000000 +0800
> @@ -278,6 +278,10 @@ do_non_resident_extend:
> * files.
> */
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current)) {
> + err = -EINTR;
> + goto init_err_out;
> + }
> cond_resched();
> } while (++index < end_index);
> read_lock_irqsave(&ni->size_lock, flags);
> @@ -2054,6 +2058,8 @@ static ssize_t ntfs_file_buffered_write(
> if (unlikely(status))
> break;
> balance_dirty_pages_ratelimited(mapping);
> + if (fatal_signal_pending(current))
> + break;
> cond_resched();
> } while (count);
> err_out:
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
@ 2011-11-14 12:12 ` Wu Fengguang
2011-11-14 12:37 ` Jan Kara
0 siblings, 1 reply; 39+ messages in thread
From: Wu Fengguang @ 2011-11-14 12:12 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com,
Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote:
> There is no reason why task in balance_dirty_pages() shouldn't be killable
> and it helps in recovering from some error conditions (like when filesystem
> goes in error state and cannot accept writeback anymore but we still want to
> kill processes using it to be able to unmount it).
>
> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> mm/page-writeback.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0360d1b..e83c286 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1133,8 +1133,10 @@ pause:
> pages_dirtied,
> pause,
> start_time);
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> + __set_current_state(TASK_KILLABLE);
Dumb question: will the task continue to show up as 'D' in ps?
> io_schedule_timeout(pause);
> + if (fatal_signal_pending(current))
> + break;
I'd like to move that several lines below, to the end of the loop,
so that the condition is not tested at all in normal executions.
> dirty_thresh = hard_dirty_limit(dirty_thresh);
> /*
> --
> 1.7.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
@ 2011-11-14 12:12 ` Matthew Wilcox
2011-11-14 12:15 ` Wu Fengguang
1 sibling, 0 replies; 39+ messages in thread
From: Matthew Wilcox @ 2011-11-14 12:12 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, Wu Fengguang, Al Viro, k-mio, Andrew Morton,
Christoph Hellwig
On Mon, Nov 14, 2011 at 12:10:30PM +0100, Jan Kara wrote:
> again:
> + if (signal_pending(current)) {
Uhh ... surely 'fatal_signal_pending', no?
> + status = -EINTR;
> + break;
> + }
>
> /*
> * Bring in the user page that we will copy from _first_.
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 12:12 ` Matthew Wilcox
@ 2011-11-14 12:15 ` Wu Fengguang
2011-11-14 12:34 ` Jan Kara
1 sibling, 1 reply; 39+ messages in thread
From: Wu Fengguang @ 2011-11-14 12:15 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com,
Andrew Morton, Christoph Hellwig
> @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> iov_iter_count(i));
>
> again:
> + if (signal_pending(current)) {
signal_pending looks more useful than fatal_signal_pending in that it
covers normal signals too. However it's exactly the broader coverage
that makes it an interface change -- will this possibly break casually
written applications?
> + status = -EINTR;
> + break;
> + }
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:05 ` Christoph Hellwig
@ 2011-11-14 12:24 ` Jan Kara
2011-11-14 12:29 ` Wu Fengguang
1 sibling, 0 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-14 12:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Wu Fengguang, Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
On Mon 14-11-11 07:05:46, Christoph Hellwig wrote:
> On Mon, Nov 14, 2011 at 07:59:12PM +0800, Wu Fengguang wrote:
> > On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote:
> > >
> > > Hello,
> > >
> > > these two patches aim at making task waiting in balance_dirty_pages()
> > > killable. This is desirable because otherwise if filesystem stops accepting
> > > writes (e.g. if device has been removed or other serious error condidion) we
> > > have a task stuck in D state forever.
> >
> > Agreed totally. I myself has run into such conditions and get very
> > annoyed not being able to kill the hard throttled tasks -- they just
> > stuck there for ever if the error condition does not change.
> >
> > > I'm not sure who should merge these two patches... Al, Fengguang?
> >
> > I'd like to do it -- otherwise there will obviously be merge conflicts.
> >
> > Actually I also queued a patch to do this (attached). Your patches do
> > better on TASK_KILLABLE and the use of signal_pending() in write
> > routines, while mine goes further to add the break to various
> > filesystems. How about combining them together?
>
> Can you make balance_dirty_pages(_ratelimited) return an error instead
> of opencoding the fatal signal check everywhere? That would make the
> interface a bit more obvious.
We can do this. It's just that signal_pending() check e.g. in
generic_perform_write() has a sense even when balance_dirty_pages() will be
returning error because it also catches other cases...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:24 ` Jan Kara
@ 2011-11-14 12:29 ` Wu Fengguang
2011-11-14 12:41 ` Christoph Hellwig
1 sibling, 1 reply; 39+ messages in thread
From: Wu Fengguang @ 2011-11-14 12:29 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
> Can you make balance_dirty_pages(_ratelimited) return an error instead
> of opencoding the fatal signal che
Hmm that means a bigger change to the prototype of the _exported_
balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 12:15 ` Wu Fengguang
@ 2011-11-14 12:34 ` Jan Kara
2011-11-14 14:16 ` Matthew Wilcox
0 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-14 12:34 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > iov_iter_count(i));
> >
> > again:
> > + if (signal_pending(current)) {
>
> signal_pending looks more useful than fatal_signal_pending in that it
> covers normal signals too. However it's exactly the broader coverage
> that makes it an interface change -- will this possibly break casually
> written applications?
Yeah, this is upto discussion. Historically, write() (or any other system
call) could have returned EINTR. In fact, write() to a socket can return
EINTR even now. But you are right that we didn't return EINTR from write()
to a regular file. So if you prefer to never return EINTR from a write to a
regular file, I can change the check since I'm also slightly worried that
some badly written app can notice.
Honza
>
> > + status = -EINTR;
> > + break;
> > + }
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable
2011-11-14 12:12 ` Wu Fengguang
@ 2011-11-14 12:37 ` Jan Kara
0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-14 12:37 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 20:12:45, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote:
> > There is no reason why task in balance_dirty_pages() shouldn't be killable
> > and it helps in recovering from some error conditions (like when filesystem
> > goes in error state and cannot accept writeback anymore but we still want to
> > kill processes using it to be able to unmount it).
> >
> > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > mm/page-writeback.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 0360d1b..e83c286 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1133,8 +1133,10 @@ pause:
> > pages_dirtied,
> > pause,
> > start_time);
> > - __set_current_state(TASK_UNINTERRUPTIBLE);
> > + __set_current_state(TASK_KILLABLE);
>
> Dumb question: will the task continue to show up as 'D' in ps?
Yes, it will. The only difference is that the task will be woken when
SIGKILL arrives.
> > io_schedule_timeout(pause);
> > + if (fatal_signal_pending(current))
> > + break;
>
> I'd like to move that several lines below, to the end of the loop,
> so that the condition is not tested at all in normal executions.
OK, will do that.
> > dirty_thresh = hard_dirty_limit(dirty_thresh);
> > /*
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:29 ` Wu Fengguang
@ 2011-11-14 12:41 ` Christoph Hellwig
2011-11-14 13:01 ` Wu Fengguang
0 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2011-11-14 12:41 UTC (permalink / raw)
To: Wu Fengguang
Cc: Christoph Hellwig, Jan Kara, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton
On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > of opencoding the fatal signal che
>
> Hmm that means a bigger change to the prototype of the _exported_
> balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
So what? out of tree fses have always been entirely supported on their
own, and not a matter for keeping progress back.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 12:41 ` Christoph Hellwig
@ 2011-11-14 13:01 ` Wu Fengguang
2011-11-14 15:28 ` Jan Kara
0 siblings, 1 reply; 39+ messages in thread
From: Wu Fengguang @ 2011-11-14 13:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > of opencoding the fatal signal che
> >
> > Hmm that means a bigger change to the prototype of the _exported_
> > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
>
> So what? out of tree fses have always been entirely supported on their
> own, and not a matter for keeping progress back.
OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
parts.
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 12:34 ` Jan Kara
@ 2011-11-14 14:16 ` Matthew Wilcox
2011-11-14 15:30 ` Jan Kara
0 siblings, 1 reply; 39+ messages in thread
From: Matthew Wilcox @ 2011-11-14 14:16 UTC (permalink / raw)
To: Jan Kara
Cc: Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote:
> On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > > iov_iter_count(i));
> > >
> > > again:
> > > + if (signal_pending(current)) {
> >
> > signal_pending looks more useful than fatal_signal_pending in that it
> > covers normal signals too. However it's exactly the broader coverage
> > that makes it an interface change -- will this possibly break casually
> > written applications?
> Yeah, this is upto discussion. Historically, write() (or any other system
> call) could have returned EINTR. In fact, write() to a socket can return
> EINTR even now. But you are right that we didn't return EINTR from write()
> to a regular file. So if you prefer to never return EINTR from a write to a
> regular file, I can change the check since I'm also slightly worried that
> some badly written app can notice.
No, this is not up for discussion. You can't return short writes (or
reads). This is why the 'fatal_signal_pending' API exists -- if the
signal is fatal, the task is never returned to, so its bug (not checking
the return from read/write) is not exposed.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 13:01 ` Wu Fengguang
@ 2011-11-14 15:28 ` Jan Kara
2011-11-14 15:32 ` Christoph Hellwig
0 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-14 15:28 UTC (permalink / raw)
To: Wu Fengguang
Cc: Christoph Hellwig, Jan Kara, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton
On Mon 14-11-11 21:01:53, Wu Fengguang wrote:
> On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > > of opencoding the fatal signal che
> > >
> > > Hmm that means a bigger change to the prototype of the _exported_
> > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
> >
> > So what? out of tree fses have always been entirely supported on their
> > own, and not a matter for keeping progress back.
>
> OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
> parts.
Sure, I'll update my patch to balance_dirty_pages(). Should I also update
my patch to generic_perform_write() or does that fall under "various FS
parts" cathegory?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 14:16 ` Matthew Wilcox
@ 2011-11-14 15:30 ` Jan Kara
2011-11-14 18:44 ` Jeremy Allison
0 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-14 15:30 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Jan Kara, Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon 14-11-11 07:16:26, Matthew Wilcox wrote:
> On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote:
> > On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > > > iov_iter_count(i));
> > > >
> > > > again:
> > > > + if (signal_pending(current)) {
> > >
> > > signal_pending looks more useful than fatal_signal_pending in that it
> > > covers normal signals too. However it's exactly the broader coverage
> > > that makes it an interface change -- will this possibly break casually
> > > written applications?
> > Yeah, this is upto discussion. Historically, write() (or any other system
> > call) could have returned EINTR. In fact, write() to a socket can return
> > EINTR even now. But you are right that we didn't return EINTR from write()
> > to a regular file. So if you prefer to never return EINTR from a write to a
> > regular file, I can change the check since I'm also slightly worried that
> > some badly written app can notice.
>
> No, this is not up for discussion. You can't return short writes (or
> reads). This is why the 'fatal_signal_pending' API exists -- if the
> signal is fatal, the task is never returned to, so its bug (not checking
> the return from read/write) is not exposed.
By "can't return" you mean userspace need not be expecting it so we
shouldn't break it or is there some standard which forbids it? Just
curious...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 15:28 ` Jan Kara
@ 2011-11-14 15:32 ` Christoph Hellwig
2011-11-14 16:19 ` Jan Kara
0 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2011-11-14 15:32 UTC (permalink / raw)
To: Jan Kara
Cc: Wu Fengguang, Christoph Hellwig, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton
On Mon, Nov 14, 2011 at 04:28:32PM +0100, Jan Kara wrote:
> On Mon 14-11-11 21:01:53, Wu Fengguang wrote:
> > On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> > > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > > > of opencoding the fatal signal che
> > > >
> > > > Hmm that means a bigger change to the prototype of the _exported_
> > > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
> > >
> > > So what? out of tree fses have always been entirely supported on their
> > > own, and not a matter for keeping progress back.
> >
> > OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
> > parts.
> Sure, I'll update my patch to balance_dirty_pages(). Should I also update
> my patch to generic_perform_write() or does that fall under "various FS
> parts" cathegory?
I'd be much happier if all bits go in together. IMHO this is a change
worthwhile for 3.2 still, so we should probably try to get it in ASAP.
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara
@ 2011-11-14 16:15 ` Jan Kara
2011-11-14 16:26 ` Christoph Hellwig
2011-11-14 22:19 ` Andrew Morton
0 siblings, 2 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel,
Jan Kara
Currently write(2) to a file is not interruptible by a signal. Sometimes this
is desirable (e.g. when you want to quickly kill a process hogging your disk or
when some process gets blocked in balance_dirty_pages() indefinitely due to a
filesystem being in an error condition).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/filemap.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c0018f2..166b30e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
iov_iter_count(i));
again:
-
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
@@ -2463,7 +2462,15 @@ again:
written += copied;
balance_dirty_pages_ratelimited(mapping);
-
+ /*
+ * We check the signal independently of balance_dirty_pages()
+ * because we need not wait and check for signal there although
+ * this loop could have taken significant amount of time...
+ */
+ if (fatal_signal_pending(current)) {
+ status = -EINTR;
+ break;
+ }
} while (iov_iter_count(i));
return written ? written : status;
--
1.7.1
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable
2011-11-14 15:32 ` Christoph Hellwig
@ 2011-11-14 16:19 ` Jan Kara
0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-14 16:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro,
k-mio@sx.jp.nec.com, Andrew Morton
On Mon 14-11-11 10:32:26, Christoph Hellwig wrote:
> On Mon, Nov 14, 2011 at 04:28:32PM +0100, Jan Kara wrote:
> > On Mon 14-11-11 21:01:53, Wu Fengguang wrote:
> > > On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote:
> > > > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote:
> > > > > > Can you make balance_dirty_pages(_ratelimited) return an error instead
> > > > > > of opencoding the fatal signal che
> > > > >
> > > > > Hmm that means a bigger change to the prototype of the _exported_
> > > > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS...
> > > >
> > > > So what? out of tree fses have always been entirely supported on their
> > > > own, and not a matter for keeping progress back.
> > >
> > > OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS
> > > parts.
> > Sure, I'll update my patch to balance_dirty_pages(). Should I also update
> > my patch to generic_perform_write() or does that fall under "various FS
> > parts" cathegory?
>
> I'd be much happier if all bits go in together. IMHO this is a change
> worthwhile for 3.2 still, so we should probably try to get it in ASAP.
OK, I've resent my patches with requested changes. I think those two
patches make sense on it's own and fix the biggest single pain point.
Fengguang can add other fs bits to these two patches (although personally
I'd be happier to have respective fs maintainers acks on them before
pushing).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
@ 2011-11-14 16:26 ` Christoph Hellwig
2011-11-14 16:46 ` Jan Kara
2011-11-14 22:19 ` Andrew Morton
1 sibling, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2011-11-14 16:26 UTC (permalink / raw)
To: Jan Kara
Cc: Wu Fengguang, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel
On Mon, Nov 14, 2011 at 05:15:24PM +0100, Jan Kara wrote:
> Currently write(2) to a file is not interruptible by a signal. Sometimes this
> is desirable (e.g. when you want to quickly kill a process hogging your disk or
> when some process gets blocked in balance_dirty_pages() indefinitely due to a
> filesystem being in an error condition).
>
> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> mm/filemap.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c0018f2..166b30e 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
> iov_iter_count(i));
>
> again:
> -
> /*
> * Bring in the user page that we will copy from _first_.
> * Otherwise there's a nasty deadlock on copying from the
> @@ -2463,7 +2462,15 @@ again:
> written += copied;
>
> balance_dirty_pages_ratelimited(mapping);
> -
> + /*
> + * We check the signal independently of balance_dirty_pages()
> + * because we need not wait and check for signal there although
> + * this loop could have taken significant amount of time...
> + */
> + if (fatal_signal_pending(current)) {
> + status = -EINTR;
> + break;
> + }
Hmm. If we need to check again every time adding the return value to
balance_dirty_pages is rather pointless.
I have a bit of a problem parsing the comment - does it try to say that
we might spend too much time after the fatal_signal_pending in the
balance_dirty_pages code so that we have to check it again? Why not
repeat the check at the end of balance_dirty_pages_ratelimited and thus
avoid having to duplicate the thing in all callers?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 16:26 ` Christoph Hellwig
@ 2011-11-14 16:46 ` Jan Kara
2011-11-14 20:13 ` Christoph Hellwig
0 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-14 16:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Wu Fengguang, Andrew Morton, Al Viro, linux-fsdevel
On Mon 14-11-11 11:26:00, Christoph Hellwig wrote:
> On Mon, Nov 14, 2011 at 05:15:24PM +0100, Jan Kara wrote:
> > Currently write(2) to a file is not interruptible by a signal. Sometimes this
> > is desirable (e.g. when you want to quickly kill a process hogging your disk or
> > when some process gets blocked in balance_dirty_pages() indefinitely due to a
> > filesystem being in an error condition).
> >
> > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > mm/filemap.c | 11 +++++++++--
> > 1 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index c0018f2..166b30e 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
> > iov_iter_count(i));
> >
> > again:
> > -
> > /*
> > * Bring in the user page that we will copy from _first_.
> > * Otherwise there's a nasty deadlock on copying from the
> > @@ -2463,7 +2462,15 @@ again:
> > written += copied;
> >
> > balance_dirty_pages_ratelimited(mapping);
> > -
> > + /*
> > + * We check the signal independently of balance_dirty_pages()
> > + * because we need not wait and check for signal there although
> > + * this loop could have taken significant amount of time...
> > + */
> > + if (fatal_signal_pending(current)) {
> > + status = -EINTR;
> > + break;
> > + }
>
> Hmm. If we need to check again every time adding the return value to
> balance_dirty_pages is rather pointless.
>
> I have a bit of a problem parsing the comment - does it try to say that
> we might spend too much time after the fatal_signal_pending in the
> balance_dirty_pages code so that we have to check it again? Why not
> repeat the check at the end of balance_dirty_pages_ratelimited and thus
> avoid having to duplicate the thing in all callers?
The point I was trying to get across is that all:
iov_iter_fault_in_readable()
->write_begin()
...
->write_end()
can take a rather long time. Considering that
balance_dirty_pages_ratelimited() needn't call balance_dirty_pages() or
balance_dirty_pages() can just return without doing anything because there
aren't many dirty pages, the end result is that
balance_dirty_pages_ratelimited() won't return EINTR for rather long time
although there is a signal pending. That's why I want to check for signal
pending directly in the loop in generic_perform_write().
There can be loops where the only blocking point is in
balance_dirty_pages() and then using the return value makes sense but
I think such loops would be in minority so maybe the return value doesn't
make much sense after all.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 15:30 ` Jan Kara
@ 2011-11-14 18:44 ` Jeremy Allison
0 siblings, 0 replies; 39+ messages in thread
From: Jeremy Allison @ 2011-11-14 18:44 UTC (permalink / raw)
To: Jan Kara
Cc: Matthew Wilcox, Wu Fengguang, linux-fsdevel@vger.kernel.org,
Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig
On Mon, Nov 14, 2011 at 04:30:47PM +0100, Jan Kara wrote:
> On Mon 14-11-11 07:16:26, Matthew Wilcox wrote:
> > On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote:
> > > On Mon 14-11-11 20:15:56, Wu Fengguang wrote:
> > > > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file,
> > > > > iov_iter_count(i));
> > > > >
> > > > > again:
> > > > > + if (signal_pending(current)) {
> > > >
> > > > signal_pending looks more useful than fatal_signal_pending in that it
> > > > covers normal signals too. However it's exactly the broader coverage
> > > > that makes it an interface change -- will this possibly break casually
> > > > written applications?
> > > Yeah, this is upto discussion. Historically, write() (or any other system
> > > call) could have returned EINTR. In fact, write() to a socket can return
> > > EINTR even now. But you are right that we didn't return EINTR from write()
> > > to a regular file. So if you prefer to never return EINTR from a write to a
> > > regular file, I can change the check since I'm also slightly worried that
> > > some badly written app can notice.
> >
> > No, this is not up for discussion. You can't return short writes (or
> > reads). This is why the 'fatal_signal_pending' API exists -- if the
> > signal is fatal, the task is never returned to, so its bug (not checking
> > the return from read/write) is not exposed.
> By "can't return" you mean userspace need not be expecting it so we
> shouldn't break it or is there some standard which forbids it? Just
> curious...
This *WILL* break userspace code if you return short writes on a
filesystem fd. Guarenteed. I originally wrote code in Samba to
take care of it back before I learned the difference between
"slow" and "fast" interruptable system calls (see W.R.Stevens
for details).
Don't return short writes or reads on a filesystem fd.
Jeremy.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 16:46 ` Jan Kara
@ 2011-11-14 20:13 ` Christoph Hellwig
0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2011-11-14 20:13 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Wu Fengguang, Andrew Morton, Al Viro,
linux-fsdevel
On Mon, Nov 14, 2011 at 05:46:47PM +0100, Jan Kara wrote:
> There can be loops where the only blocking point is in
> balance_dirty_pages() and then using the return value makes sense but
> I think such loops would be in minority so maybe the return value doesn't
> make much sense after all.
Yes, maybe the return value isn't that smart. Sorry for all the noise.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 16:26 ` Christoph Hellwig
@ 2011-11-14 22:19 ` Andrew Morton
2011-11-15 11:23 ` Jan Kara
1 sibling, 1 reply; 39+ messages in thread
From: Andrew Morton @ 2011-11-14 22:19 UTC (permalink / raw)
To: Jan Kara; +Cc: Wu Fengguang, Christoph Hellwig, Al Viro, linux-fsdevel
On Mon, 14 Nov 2011 17:15:24 +0100
Jan Kara <jack@suse.cz> wrote:
> Currently write(2) to a file is not interruptible by a signal. Sometimes this
> is desirable (e.g. when you want to quickly kill a process hogging your disk or
> when some process gets blocked in balance_dirty_pages() indefinitely due to a
> filesystem being in an error condition).
>
> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> mm/filemap.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c0018f2..166b30e 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
> iov_iter_count(i));
>
> again:
> -
> /*
> * Bring in the user page that we will copy from _first_.
> * Otherwise there's a nasty deadlock on copying from the
> @@ -2463,7 +2462,15 @@ again:
> written += copied;
>
> balance_dirty_pages_ratelimited(mapping);
> -
> + /*
> + * We check the signal independently of balance_dirty_pages()
> + * because we need not wait and check for signal there although
> + * this loop could have taken significant amount of time...
> + */
> + if (fatal_signal_pending(current)) {
> + status = -EINTR;
> + break;
> + }
> } while (iov_iter_count(i));
>
> return written ? written : status;
Will this permit the interruption of things like fsync() or sync()? If
so, considerable pondering is needed.
Also I worry about stuff like the use of buffered write to finish off
loose ends in direct-IO writing. Sometimes these writes MUST complete,
to prevent exposing unwritten disk blocks to a subsequent read. Will a
well-timed ^C break this? If "no" then does this change introduce risk
that we'll later accidentally introduce such a security hole?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-14 22:19 ` Andrew Morton
@ 2011-11-15 11:23 ` Jan Kara
0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-15 11:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Jan Kara, Wu Fengguang, Christoph Hellwig, Al Viro, linux-fsdevel
On Mon 14-11-11 14:19:54, Andrew Morton wrote:
> On Mon, 14 Nov 2011 17:15:24 +0100
> Jan Kara <jack@suse.cz> wrote:
>
> > Currently write(2) to a file is not interruptible by a signal. Sometimes this
> > is desirable (e.g. when you want to quickly kill a process hogging your disk or
> > when some process gets blocked in balance_dirty_pages() indefinitely due to a
> > filesystem being in an error condition).
> >
> > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > mm/filemap.c | 11 +++++++++--
> > 1 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index c0018f2..166b30e 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
> > iov_iter_count(i));
> >
> > again:
> > -
> > /*
> > * Bring in the user page that we will copy from _first_.
> > * Otherwise there's a nasty deadlock on copying from the
> > @@ -2463,7 +2462,15 @@ again:
> > written += copied;
> >
> > balance_dirty_pages_ratelimited(mapping);
> > -
> > + /*
> > + * We check the signal independently of balance_dirty_pages()
> > + * because we need not wait and check for signal there although
> > + * this loop could have taken significant amount of time...
> > + */
> > + if (fatal_signal_pending(current)) {
> > + status = -EINTR;
> > + break;
> > + }
> > } while (iov_iter_count(i));
> >
> > return written ? written : status;
>
> Will this permit the interruption of things like fsync() or sync()? If
> so, considerable pondering is needed.
No, fsync() or sync() are not influenced.
> Also I worry about stuff like the use of buffered write to finish off
> loose ends in direct-IO writing. Sometimes these writes MUST complete,
> to prevent exposing unwritten disk blocks to a subsequent read. Will a
> well-timed ^C break this? If "no" then does this change introduce risk
> that we'll later accidentally introduce such a security hole?
That's a good question! Well timed SIGKILL can stop buffered write
completing bits of direct IO write. But looking into the code I fail to see
where we rely on buffered write being completed. I find it somewhere on a
boundary of a filesystem bug to return from direct IO with a potential of
stale data exposure...
For writes beyond i_size, we increase i_size only by the amount which was
successfully written so there is no risk of exposure. Otherwise, direct IO
would have to fill a hole to allocate a block - ext? filesystems, reiserfs,
ocfs2, and other simple filesystems avoid doing that. XFS fills a hole but
takes care to init everything. So I don't think anyone really expects
buffered write to cover his back.
To sum up the only result of interrupted buffered write after direct IO
write seems to be that some other application could see only part of the
write completed. That is possible in case of multipage buffered writes as
well and e.g. with ext3/4 such situation is possible even without these
patches (due to conditions like ENOSPC or due to a system crash -
admittedly more serious conditions than someone sending SIGKILL).
Thanks for your comments.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara
@ 2011-11-16 11:12 ` Jan Kara
2011-11-16 11:44 ` Wu Fengguang
0 siblings, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel,
Jan Kara
Currently write(2) to a file is not interruptible by a signal. Sometimes this
is desirable (e.g. when you want to quickly kill a process hogging your disk or
when some process gets blocked in balance_dirty_pages() indefinitely due to a
filesystem being in an error condition).
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/filemap.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c0018f2..c106d3b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
iov_iter_count(i));
again:
-
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
@@ -2463,7 +2462,10 @@ again:
written += copied;
balance_dirty_pages_ratelimited(mapping);
-
+ if (fatal_signal_pending(current)) {
+ status = -EINTR;
+ break;
+ }
} while (iov_iter_count(i));
return written ? written : status;
--
1.7.1
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
@ 2011-11-16 11:44 ` Wu Fengguang
2011-11-16 12:54 ` Jan Kara
2011-11-22 22:28 ` Andrew Morton
0 siblings, 2 replies; 39+ messages in thread
From: Wu Fengguang @ 2011-11-16 11:44 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
Jan,
Due to the (very low) possibility of data loss by partial writes, IMHO
it would safer to test this patch in linux-next until next merge window,
would you agree?
Pushing the first patch will address the main problem, anyway.
Thanks,
Fengguang
On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote:
> Currently write(2) to a file is not interruptible by a signal. Sometimes this
> is desirable (e.g. when you want to quickly kill a process hogging your disk or
> when some process gets blocked in balance_dirty_pages() indefinitely due to a
> filesystem being in an error condition).
>
> Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> mm/filemap.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c0018f2..c106d3b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
> iov_iter_count(i));
>
> again:
> -
> /*
> * Bring in the user page that we will copy from _first_.
> * Otherwise there's a nasty deadlock on copying from the
> @@ -2463,7 +2462,10 @@ again:
> written += copied;
>
> balance_dirty_pages_ratelimited(mapping);
> -
> + if (fatal_signal_pending(current)) {
> + status = -EINTR;
> + break;
> + }
> } while (iov_iter_count(i));
>
> return written ? written : status;
> --
> 1.7.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-16 11:44 ` Wu Fengguang
@ 2011-11-16 12:54 ` Jan Kara
2011-11-16 13:11 ` Wu Fengguang
2011-11-22 22:28 ` Andrew Morton
1 sibling, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-16 12:54 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Wed 16-11-11 19:44:21, Wu Fengguang wrote:
> Jan,
>
> Due to the (very low) possibility of data loss by partial writes, IMHO
> it would safer to test this patch in linux-next until next merge window,
> would you agree?
Fine with me. Thanks.
> Pushing the first patch will address the main problem, anyway.
Hopefully, yes.
Honza
> On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote:
> > Currently write(2) to a file is not interruptible by a signal. Sometimes this
> > is desirable (e.g. when you want to quickly kill a process hogging your disk or
> > when some process gets blocked in balance_dirty_pages() indefinitely due to a
> > filesystem being in an error condition).
> >
> > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > mm/filemap.c | 6 ++++--
> > 1 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index c0018f2..c106d3b 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
> > iov_iter_count(i));
> >
> > again:
> > -
> > /*
> > * Bring in the user page that we will copy from _first_.
> > * Otherwise there's a nasty deadlock on copying from the
> > @@ -2463,7 +2462,10 @@ again:
> > written += copied;
> >
> > balance_dirty_pages_ratelimited(mapping);
> > -
> > + if (fatal_signal_pending(current)) {
> > + status = -EINTR;
> > + break;
> > + }
> > } while (iov_iter_count(i));
> >
> > return written ? written : status;
> > --
> > 1.7.1
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-16 12:54 ` Jan Kara
@ 2011-11-16 13:11 ` Wu Fengguang
0 siblings, 0 replies; 39+ messages in thread
From: Wu Fengguang @ 2011-11-16 13:11 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org, Kazuya Mio
On Wed, Nov 16, 2011 at 08:54:45PM +0800, Jan Kara wrote:
> On Wed 16-11-11 19:44:21, Wu Fengguang wrote:
> > Jan,
> >
> > Due to the (very low) possibility of data loss by partial writes, IMHO
> > it would safer to test this patch in linux-next until next merge window,
> > would you agree?
> Fine with me. Thanks.
Great. The two patches are now in:
http://git.kernel.org/?p=linux/kernel/git/wfg/linux.git;a=shortlog;h=refs/heads/writeback-for-next
When applying the patch I changed the title "... by a signal" to
"... by SIGKILL" to reflect the updated patch content. Hopefully
this is also what's in your mind.
> > Pushing the first patch will address the main problem, anyway.
> Hopefully, yes.
I looked over Kazuya's posts in linux-ext4 and think the first patch
alone does have good chance address the problem of busy looping in
balance_dirty_pages() due to dirty pages never drop under fs error
conditions.
Thanks,
Fengguang
> > On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote:
> > > Currently write(2) to a file is not interruptible by a signal. Sometimes this
> > > is desirable (e.g. when you want to quickly kill a process hogging your disk or
> > > when some process gets blocked in balance_dirty_pages() indefinitely due to a
> > > filesystem being in an error condition).
> > >
> > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > > mm/filemap.c | 6 ++++--
> > > 1 files changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index c0018f2..c106d3b 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
> > > iov_iter_count(i));
> > >
> > > again:
> > > -
> > > /*
> > > * Bring in the user page that we will copy from _first_.
> > > * Otherwise there's a nasty deadlock on copying from the
> > > @@ -2463,7 +2462,10 @@ again:
> > > written += copied;
> > >
> > > balance_dirty_pages_ratelimited(mapping);
> > > -
> > > + if (fatal_signal_pending(current)) {
> > > + status = -EINTR;
> > > + break;
> > > + }
> > > } while (iov_iter_count(i));
> > >
> > > return written ? written : status;
> > > --
> > > 1.7.1
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-16 11:44 ` Wu Fengguang
2011-11-16 12:54 ` Jan Kara
@ 2011-11-22 22:28 ` Andrew Morton
2011-11-23 9:05 ` Wu Fengguang
1 sibling, 1 reply; 39+ messages in thread
From: Andrew Morton @ 2011-11-22 22:28 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Wed, 16 Nov 2011 19:44:21 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:
> Due to the (very low) possibility of data loss by partial writes, IMHO
> it would safer to test this patch in linux-next until next merge window,
Any such bugs will not be discovered in linux-next testing.
The only way to find these things in a reasonable period of time is to
go in and find them. For example, intensive fsx-linux testing with
concurrent heavy memory pressure on various filesystems with various
block sizes. And of course concurrent signalling. If you're talking
about O_DIRECT then iirc I hacked support for that into fsx-linux. I
think.
Anyway, what _are_ the scenarios in which we think data can be lost?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-22 22:28 ` Andrew Morton
@ 2011-11-23 9:05 ` Wu Fengguang
2011-11-23 9:50 ` Andrew Morton
2011-11-23 13:08 ` Jan Kara
0 siblings, 2 replies; 39+ messages in thread
From: Wu Fengguang @ 2011-11-23 9:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Jan Kara, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> On Wed, 16 Nov 2011 19:44:21 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > Due to the (very low) possibility of data loss by partial writes, IMHO
> > it would safer to test this patch in linux-next until next merge window,
>
> Any such bugs will not be discovered in linux-next testing.
Yup, I'm afraid.
> The only way to find these things in a reasonable period of time is to
> go in and find them. For example, intensive fsx-linux testing with
> concurrent heavy memory pressure on various filesystems with various
> block sizes. And of course concurrent signalling. If you're talking
> about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> think.
How are we going to measure the success/failure? Check if it
eventually resulted in filesystem corruption or whatever?
When received SIGKILL, fsx-linux itself will just die.
> Anyway, what _are_ the scenarios in which we think data can be lost?
It's the vision that there may be partial writes on SIGKILL. Before
patch, the write will either succeed as a whole or not started at
all, depending on the timing of write/SIGKILL. This is kind of atomic
operation. However now the write can be half done.
If the application really cares about atomic behavior, it can do
create-and-rename. However there are always the possibility of broken
applications.
Maybe this is not that big problem as SIGKILL is considered be to
destructive already.
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-23 9:05 ` Wu Fengguang
@ 2011-11-23 9:50 ` Andrew Morton
2011-11-23 13:08 ` Jan Kara
1 sibling, 0 replies; 39+ messages in thread
From: Andrew Morton @ 2011-11-23 9:50 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Wed, 23 Nov 2011 17:05:33 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > On Wed, 16 Nov 2011 19:44:21 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> >
> > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > it would safer to test this patch in linux-next until next merge window,
> >
> > Any such bugs will not be discovered in linux-next testing.
>
> Yup, I'm afraid.
>
> > The only way to find these things in a reasonable period of time is to
> > go in and find them. For example, intensive fsx-linux testing with
> > concurrent heavy memory pressure on various filesystems with various
> > block sizes. And of course concurrent signalling. If you're talking
> > about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> > think.
>
> How are we going to measure the success/failure? Check if it
> eventually resulted in filesystem corruption or whatever?
yup.
> When received SIGKILL, fsx-linux itself will just die.
Well, there are ways of simulating its effect. For example, bale out
of the write() every seventh time if current->comm=="fsx-linux". Or
set a rearming timer which triggers a baled-out write. You'll work it
out ;)
> > Anyway, what _are_ the scenarios in which we think data can be lost?
>
> It's the vision that there may be partial writes on SIGKILL. Before
> patch, the write will either succeed as a whole or not started at
> all, depending on the timing of write/SIGKILL. This is kind of atomic
> operation. However now the write can be half done.
>
> If the application really cares about atomic behavior, it can do
> create-and-rename. However there are always the possibility of broken
> applications.
>
> Maybe this is not that big problem as SIGKILL is considered be to
> destructive already.
Yeah, I have dim dark memories that there are subtle problems with
interrupting write(). Linus might remember.
Others might remember too, but we're only talking to fsdevel which
nobody reads :(
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-23 9:05 ` Wu Fengguang
2011-11-23 9:50 ` Andrew Morton
@ 2011-11-23 13:08 ` Jan Kara
2011-11-23 13:27 ` Wu Fengguang
1 sibling, 1 reply; 39+ messages in thread
From: Jan Kara @ 2011-11-23 13:08 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Jan Kara, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Wed 23-11-11 17:05:33, Wu Fengguang wrote:
> On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > On Wed, 16 Nov 2011 19:44:21 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> >
> > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > it would safer to test this patch in linux-next until next merge window,
> >
> > Any such bugs will not be discovered in linux-next testing.
>
> Yup, I'm afraid.
>
> > The only way to find these things in a reasonable period of time is to
> > go in and find them. For example, intensive fsx-linux testing with
> > concurrent heavy memory pressure on various filesystems with various
> > block sizes. And of course concurrent signalling. If you're talking
> > about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> > think.
>
> How are we going to measure the success/failure? Check if it
> eventually resulted in filesystem corruption or whatever?
There are a few different questions:
1) Checking for filesystem corruption via fsck - I find such corruption
caused by stopping write early extremely unlikely.
2) Checking that we do not expose uninitialized data after a partial
(possibly DIRECT_IO) write - I did not find a place where that could happen
but this would be worth testing. I think I can write a test for this if
people are afraid of data exposure problems.
3) Is it acceptable for write(2) to be interrupted by SIGKILL in the
middle? That obviously does happen with my patches so there's no reason
to test that. The question is whether someone cares or not and that can be
tested only by reality check :). Since the signal is SIGKILL, the process
itself cannot notice the interrupted write but someone else can. But as I
already said earlier, partial writes can already be observed when the
machine crashes, filesystem is close to ENOSPC or so. Arguably these are
more severe error conditions than application catching SIGKILL so my
patch lowers the bar for observing partial writes. But I wouldn't like to
throw away a sensible thing - allow SIGKILL to interrupt a system call -
just because of fear of possibility some broken app could rely on this.
Sure if the reality check shows there are such broken apps and users who
care enough to report, then I have nothing against biting the bullet
and reverting the change... Opinions?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-23 13:08 ` Jan Kara
@ 2011-11-23 13:27 ` Wu Fengguang
2011-11-23 15:06 ` Theodore Tso
0 siblings, 1 reply; 39+ messages in thread
From: Wu Fengguang @ 2011-11-23 13:27 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org, Theodore Ts'o
On Wed, Nov 23, 2011 at 09:08:03PM +0800, Jan Kara wrote:
> On Wed 23-11-11 17:05:33, Wu Fengguang wrote:
> > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote:
> > > On Wed, 16 Nov 2011 19:44:21 +0800
> > > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > >
> > > > Due to the (very low) possibility of data loss by partial writes, IMHO
> > > > it would safer to test this patch in linux-next until next merge window,
> > >
> > > Any such bugs will not be discovered in linux-next testing.
> >
> > Yup, I'm afraid.
> >
> > > The only way to find these things in a reasonable period of time is to
> > > go in and find them. For example, intensive fsx-linux testing with
> > > concurrent heavy memory pressure on various filesystems with various
> > > block sizes. And of course concurrent signalling. If you're talking
> > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I
> > > think.
> >
> > How are we going to measure the success/failure? Check if it
> > eventually resulted in filesystem corruption or whatever?
> There are a few different questions:
> 1) Checking for filesystem corruption via fsck - I find such corruption
> caused by stopping write early extremely unlikely.
Agreed.
> 2) Checking that we do not expose uninitialized data after a partial
> (possibly DIRECT_IO) write - I did not find a place where that could happen
> but this would be worth testing. I think I can write a test for this if
> people are afraid of data exposure problems.
Do we already have such kind of tests in xfstests? If not, it sounds
like a good gap to fill :-)
> 3) Is it acceptable for write(2) to be interrupted by SIGKILL in the
> middle? That obviously does happen with my patches so there's no reason
> to test that. The question is whether someone cares or not and that can be
> tested only by reality check :). Since the signal is SIGKILL, the process
> itself cannot notice the interrupted write but someone else can. But as I
> already said earlier, partial writes can already be observed when the
> machine crashes, filesystem is close to ENOSPC or so. Arguably these are
> more severe error conditions than application catching SIGKILL so my
> patch lowers the bar for observing partial writes. But I wouldn't like to
> throw away a sensible thing - allow SIGKILL to interrupt a system call -
> just because of fear of possibility some broken app could rely on this.
> Sure if the reality check shows there are such broken apps and users who
> care enough to report, then I have nothing against biting the bullet
> and reverting the change... Opinions?
Reading Ted's information feed, I tend to disregard the partial write
issue: since the "broken" applications will already fail and get
punished in various other cases, I don't care adding one more penalty
case to them :-P
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-23 13:27 ` Wu Fengguang
@ 2011-11-23 15:06 ` Theodore Tso
2011-11-28 3:08 ` Wu Fengguang
0 siblings, 1 reply; 39+ messages in thread
From: Theodore Tso @ 2011-11-23 15:06 UTC (permalink / raw)
To: Wu Fengguang
Cc: Theodore Tso, Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org
On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote:
>
> Reading Ted's information feed, I tend to disregard the partial write
> issue: since the "broken" applications will already fail and get
> punished in various other cases, I don't care adding one more penalty
> case to them :-P
Just wait until you have a bunch of rabid application programmers, questioning your judgement, your morality, and even your paternity. :-)
-- Ted
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-23 15:06 ` Theodore Tso
@ 2011-11-28 3:08 ` Wu Fengguang
2011-11-29 14:16 ` Jan Kara
0 siblings, 1 reply; 39+ messages in thread
From: Wu Fengguang @ 2011-11-28 3:08 UTC (permalink / raw)
To: Theodore Tso
Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov,
Matthew Wilcox
On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote:
>
> On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote:
>
> >
> > Reading Ted's information feed, I tend to disregard the partial write
> > issue: since the "broken" applications will already fail and get
> > punished in various other cases, I don't care adding one more penalty
> > case to them :-P
>
> Just wait until you have a bunch of rabid application programmers,
> questioning your judgement, your morality, and even your paternity.
> :-)
Ah OK, that sounds frightening. Hmm, till now every one have
acknowledged the possibility of data corruption, only that
people have different perceptions of the severeness.
Let's rethink things this way: "Is it a _worthwhile_ risk at all?"
I'm afraid not. Considering the origin of this patch
[BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
http://www.spinics.net/lists/linux-ext4/msg28464.html
I *think* Jan's first patch is already enough for fixing the bug. IWO
the patch we worried/discussed so much is really an optional one. I
would imagine the easy and safe solution is to just drop it. Any
objections?
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal
2011-11-28 3:08 ` Wu Fengguang
@ 2011-11-29 14:16 ` Jan Kara
0 siblings, 0 replies; 39+ messages in thread
From: Jan Kara @ 2011-11-29 14:16 UTC (permalink / raw)
To: Wu Fengguang
Cc: Theodore Tso, Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro,
linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov,
Matthew Wilcox
On Mon 28-11-11 11:08:42, Wu Fengguang wrote:
> On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote:
> >
> > On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote:
> >
> > >
> > > Reading Ted's information feed, I tend to disregard the partial write
> > > issue: since the "broken" applications will already fail and get
> > > punished in various other cases, I don't care adding one more penalty
> > > case to them :-P
> >
> > Just wait until you have a bunch of rabid application programmers,
> > questioning your judgement, your morality, and even your paternity.
> > :-)
>
> Ah OK, that sounds frightening. Hmm, till now every one have
> acknowledged the possibility of data corruption, only that
> people have different perceptions of the severeness.
>
> Let's rethink things this way: "Is it a _worthwhile_ risk at all?"
> I'm afraid not. Considering the origin of this patch
>
> [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages
> http://www.spinics.net/lists/linux-ext4/msg28464.html
>
> I *think* Jan's first patch is already enough for fixing the bug. IWO
> the patch we worried/discussed so much is really an optional one. I
> would imagine the easy and safe solution is to just drop it. Any
> objections?
I still think aborting write is a cleaner solution. But I don't want to
argue to death about this so if you still don't think it's a good idea,
I'll respect your decision.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2011-11-29 14:16 UTC | newest]
Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara
2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara
2011-11-14 12:12 ` Wu Fengguang
2011-11-14 12:37 ` Jan Kara
2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 12:12 ` Matthew Wilcox
2011-11-14 12:15 ` Wu Fengguang
2011-11-14 12:34 ` Jan Kara
2011-11-14 14:16 ` Matthew Wilcox
2011-11-14 15:30 ` Jan Kara
2011-11-14 18:44 ` Jeremy Allison
2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang
2011-11-14 12:05 ` Christoph Hellwig
2011-11-14 12:24 ` Jan Kara
2011-11-14 12:29 ` Wu Fengguang
2011-11-14 12:41 ` Christoph Hellwig
2011-11-14 13:01 ` Wu Fengguang
2011-11-14 15:28 ` Jan Kara
2011-11-14 15:32 ` Christoph Hellwig
2011-11-14 16:19 ` Jan Kara
2011-11-14 12:12 ` Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara
2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-14 16:26 ` Christoph Hellwig
2011-11-14 16:46 ` Jan Kara
2011-11-14 20:13 ` Christoph Hellwig
2011-11-14 22:19 ` Andrew Morton
2011-11-15 11:23 ` Jan Kara
2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara
2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara
2011-11-16 11:44 ` Wu Fengguang
2011-11-16 12:54 ` Jan Kara
2011-11-16 13:11 ` Wu Fengguang
2011-11-22 22:28 ` Andrew Morton
2011-11-23 9:05 ` Wu Fengguang
2011-11-23 9:50 ` Andrew Morton
2011-11-23 13:08 ` Jan Kara
2011-11-23 13:27 ` Wu Fengguang
2011-11-23 15:06 ` Theodore Tso
2011-11-28 3:08 ` Wu Fengguang
2011-11-29 14:16 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).