* [PATCH 0/2] Make task doing heavy writing killable @ 2011-11-14 11:10 Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara ` (2 more replies) 0 siblings, 3 replies; 39+ messages in thread From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw) To: linux-fsdevel Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig Hello, these two patches aim at making task waiting in balance_dirty_pages() killable. This is desirable because otherwise if filesystem stops accepting writes (e.g. if device has been removed or other serious error condidion) we have a task stuck in D state forever. I'm not sure who should merge these two patches... Al, Fengguang? Honza ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara @ 2011-11-14 11:10 ` Jan Kara 2011-11-14 12:12 ` Wu Fengguang 2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang 2 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw) To: linux-fsdevel Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig, Jan Kara There is no reason why task in balance_dirty_pages() shouldn't be killable and it helps in recovering from some error conditions (like when filesystem goes in error state and cannot accept writeback anymore but we still want to kill processes using it to be able to unmount it). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/page-writeback.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0360d1b..e83c286 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1133,8 +1133,10 @@ pause: pages_dirtied, pause, start_time); - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(TASK_KILLABLE); io_schedule_timeout(pause); + if (fatal_signal_pending(current)) + break; dirty_thresh = hard_dirty_limit(dirty_thresh); /* -- 1.7.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-14 12:12 ` Wu Fengguang 2011-11-14 12:37 ` Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Wu Fengguang @ 2011-11-14 12:12 UTC (permalink / raw) To: Jan Kara Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote: > There is no reason why task in balance_dirty_pages() shouldn't be killable > and it helps in recovering from some error conditions (like when filesystem > goes in error state and cannot accept writeback anymore but we still want to > kill processes using it to be able to unmount it). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/page-writeback.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 0360d1b..e83c286 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1133,8 +1133,10 @@ pause: > pages_dirtied, > pause, > start_time); > - __set_current_state(TASK_UNINTERRUPTIBLE); > + __set_current_state(TASK_KILLABLE); Dumb question: will the task continue to show up as 'D' in ps? > io_schedule_timeout(pause); > + if (fatal_signal_pending(current)) > + break; I'd like to move that several lines below, to the end of the loop, so that the condition is not tested at all in normal executions. > dirty_thresh = hard_dirty_limit(dirty_thresh); > /* > -- > 1.7.1 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 12:12 ` Wu Fengguang @ 2011-11-14 12:37 ` Jan Kara 0 siblings, 0 replies; 39+ messages in thread From: Jan Kara @ 2011-11-14 12:37 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon 14-11-11 20:12:45, Wu Fengguang wrote: > On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote: > > There is no reason why task in balance_dirty_pages() shouldn't be killable > > and it helps in recovering from some error conditions (like when filesystem > > goes in error state and cannot accept writeback anymore but we still want to > > kill processes using it to be able to unmount it). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/page-writeback.c | 4 +++- > > 1 files changed, 3 insertions(+), 1 deletions(-) > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index 0360d1b..e83c286 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -1133,8 +1133,10 @@ pause: > > pages_dirtied, > > pause, > > start_time); > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > + __set_current_state(TASK_KILLABLE); > > Dumb question: will the task continue to show up as 'D' in ps? Yes, it will. The only difference is that the task will be woken when SIGKILL arrives. > > io_schedule_timeout(pause); > > + if (fatal_signal_pending(current)) > > + break; > > I'd like to move that several lines below, to the end of the loop, > so that the condition is not tested at all in normal executions. OK, will do that. > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > /* Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-14 11:10 ` Jan Kara 2011-11-14 12:12 ` Matthew Wilcox 2011-11-14 12:15 ` Wu Fengguang 2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang 2 siblings, 2 replies; 39+ messages in thread From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw) To: linux-fsdevel Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig, Jan Kara Currently write(2) to a file is not interruptible by a signal. Sometimes this is desirable (e.g. when you want to quickly kill a process hogging your disk or when some process gets blocked in balance_dirty_pages() indefinitely due to a filesystem being in an error condition). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/filemap.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index c0018f2..6b01d2f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file, iov_iter_count(i)); again: + if (signal_pending(current)) { + status = -EINTR; + break; + } /* * Bring in the user page that we will copy from _first_. -- 1.7.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara @ 2011-11-14 12:12 ` Matthew Wilcox 2011-11-14 12:15 ` Wu Fengguang 1 sibling, 0 replies; 39+ messages in thread From: Matthew Wilcox @ 2011-11-14 12:12 UTC (permalink / raw) To: Jan Kara Cc: linux-fsdevel, Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig On Mon, Nov 14, 2011 at 12:10:30PM +0100, Jan Kara wrote: > again: > + if (signal_pending(current)) { Uhh ... surely 'fatal_signal_pending', no? > + status = -EINTR; > + break; > + } > > /* > * Bring in the user page that we will copy from _first_. > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-14 12:12 ` Matthew Wilcox @ 2011-11-14 12:15 ` Wu Fengguang 2011-11-14 12:34 ` Jan Kara 1 sibling, 1 reply; 39+ messages in thread From: Wu Fengguang @ 2011-11-14 12:15 UTC (permalink / raw) To: Jan Kara Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file, > iov_iter_count(i)); > > again: > + if (signal_pending(current)) { signal_pending looks more useful than fatal_signal_pending in that it covers normal signals too. However it's exactly the broader coverage that makes it an interface change -- will this possibly break casually written applications? > + status = -EINTR; > + break; > + } ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 12:15 ` Wu Fengguang @ 2011-11-14 12:34 ` Jan Kara 2011-11-14 14:16 ` Matthew Wilcox 0 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-14 12:34 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon 14-11-11 20:15:56, Wu Fengguang wrote: > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file, > > iov_iter_count(i)); > > > > again: > > + if (signal_pending(current)) { > > signal_pending looks more useful than fatal_signal_pending in that it > covers normal signals too. However it's exactly the broader coverage > that makes it an interface change -- will this possibly break casually > written applications? Yeah, this is upto discussion. Historically, write() (or any other system call) could have returned EINTR. In fact, write() to a socket can return EINTR even now. But you are right that we didn't return EINTR from write() to a regular file. So if you prefer to never return EINTR from a write to a regular file, I can change the check since I'm also slightly worried that some badly written app can notice. Honza > > > + status = -EINTR; > > + break; > > + } -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 12:34 ` Jan Kara @ 2011-11-14 14:16 ` Matthew Wilcox 2011-11-14 15:30 ` Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Matthew Wilcox @ 2011-11-14 14:16 UTC (permalink / raw) To: Jan Kara Cc: Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote: > On Mon 14-11-11 20:15:56, Wu Fengguang wrote: > > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file, > > > iov_iter_count(i)); > > > > > > again: > > > + if (signal_pending(current)) { > > > > signal_pending looks more useful than fatal_signal_pending in that it > > covers normal signals too. However it's exactly the broader coverage > > that makes it an interface change -- will this possibly break casually > > written applications? > Yeah, this is upto discussion. Historically, write() (or any other system > call) could have returned EINTR. In fact, write() to a socket can return > EINTR even now. But you are right that we didn't return EINTR from write() > to a regular file. So if you prefer to never return EINTR from a write to a > regular file, I can change the check since I'm also slightly worried that > some badly written app can notice. No, this is not up for discussion. You can't return short writes (or reads). This is why the 'fatal_signal_pending' API exists -- if the signal is fatal, the task is never returned to, so its bug (not checking the return from read/write) is not exposed. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 14:16 ` Matthew Wilcox @ 2011-11-14 15:30 ` Jan Kara 2011-11-14 18:44 ` Jeremy Allison 0 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-14 15:30 UTC (permalink / raw) To: Matthew Wilcox Cc: Jan Kara, Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon 14-11-11 07:16:26, Matthew Wilcox wrote: > On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote: > > On Mon 14-11-11 20:15:56, Wu Fengguang wrote: > > > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file, > > > > iov_iter_count(i)); > > > > > > > > again: > > > > + if (signal_pending(current)) { > > > > > > signal_pending looks more useful than fatal_signal_pending in that it > > > covers normal signals too. However it's exactly the broader coverage > > > that makes it an interface change -- will this possibly break casually > > > written applications? > > Yeah, this is upto discussion. Historically, write() (or any other system > > call) could have returned EINTR. In fact, write() to a socket can return > > EINTR even now. But you are right that we didn't return EINTR from write() > > to a regular file. So if you prefer to never return EINTR from a write to a > > regular file, I can change the check since I'm also slightly worried that > > some badly written app can notice. > > No, this is not up for discussion. You can't return short writes (or > reads). This is why the 'fatal_signal_pending' API exists -- if the > signal is fatal, the task is never returned to, so its bug (not checking > the return from read/write) is not exposed. By "can't return" you mean userspace need not be expecting it so we shouldn't break it or is there some standard which forbids it? Just curious... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 15:30 ` Jan Kara @ 2011-11-14 18:44 ` Jeremy Allison 0 siblings, 0 replies; 39+ messages in thread From: Jeremy Allison @ 2011-11-14 18:44 UTC (permalink / raw) To: Jan Kara Cc: Matthew Wilcox, Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon, Nov 14, 2011 at 04:30:47PM +0100, Jan Kara wrote: > On Mon 14-11-11 07:16:26, Matthew Wilcox wrote: > > On Mon, Nov 14, 2011 at 01:34:46PM +0100, Jan Kara wrote: > > > On Mon 14-11-11 20:15:56, Wu Fengguang wrote: > > > > > @@ -2407,6 +2407,10 @@ static ssize_t generic_perform_write(struct file *file, > > > > > iov_iter_count(i)); > > > > > > > > > > again: > > > > > + if (signal_pending(current)) { > > > > > > > > signal_pending looks more useful than fatal_signal_pending in that it > > > > covers normal signals too. However it's exactly the broader coverage > > > > that makes it an interface change -- will this possibly break casually > > > > written applications? > > > Yeah, this is upto discussion. Historically, write() (or any other system > > > call) could have returned EINTR. In fact, write() to a socket can return > > > EINTR even now. But you are right that we didn't return EINTR from write() > > > to a regular file. So if you prefer to never return EINTR from a write to a > > > regular file, I can change the check since I'm also slightly worried that > > > some badly written app can notice. > > > > No, this is not up for discussion. You can't return short writes (or > > reads). This is why the 'fatal_signal_pending' API exists -- if the > > signal is fatal, the task is never returned to, so its bug (not checking > > the return from read/write) is not exposed. > By "can't return" you mean userspace need not be expecting it so we > shouldn't break it or is there some standard which forbids it? Just > curious... This *WILL* break userspace code if you return short writes on a filesystem fd. Guarenteed. I originally wrote code in Samba to take care of it back before I learned the difference between "slow" and "fast" interruptable system calls (see W.R.Stevens for details). Don't return short writes or reads on a filesystem fd. Jeremy. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara 2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara @ 2011-11-14 11:59 ` Wu Fengguang 2011-11-14 12:05 ` Christoph Hellwig 2011-11-14 12:12 ` Jan Kara 2 siblings, 2 replies; 39+ messages in thread From: Wu Fengguang @ 2011-11-14 11:59 UTC (permalink / raw) To: Jan Kara Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig [-- Attachment #1: Type: text/plain, Size: 955 bytes --] On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote: > > Hello, > > these two patches aim at making task waiting in balance_dirty_pages() > killable. This is desirable because otherwise if filesystem stops accepting > writes (e.g. if device has been removed or other serious error condidion) we > have a task stuck in D state forever. Agreed totally. I myself has run into such conditions and get very annoyed not being able to kill the hard throttled tasks -- they just stuck there for ever if the error condition does not change. > I'm not sure who should merge these two patches... Al, Fengguang? I'd like to do it -- otherwise there will obviously be merge conflicts. Actually I also queued a patch to do this (attached). Your patches do better on TASK_KILLABLE and the use of signal_pending() in write routines, while mine goes further to add the break to various filesystems. How about combining them together? Thanks, Fengguang [-- Attachment #2: writeback-break-on-signal-pending.patch --] [-- Type: text/x-diff, Size: 4638 bytes --] Subject: writeback: quit throttling when fatal signal pending Date: Wed Sep 08 17:40:22 CST 2010 This allows quick response to Ctrl-C etc. for impatient users. It's necessary to abort the generic_perform_write() and other filesystem write loops too, to avoid large write + SIGKILL combination exceeding the dirty limit and possibly strange OOM. It mainly helps the rare bdi/global dirty exceeded cases. In the normal case of not exceeded, it will quit the loop anyway. if (fatal_signal_pending(current) && nr_reclaimable + nr_writeback <= dirty_thresh) break; Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> cc: Chris Mason <chris.mason@oracle.com> cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> --- fs/btrfs/file.c | 4 ++++ fs/btrfs/ioctl.c | 4 ++++ fs/btrfs/relocation.c | 4 ++++ fs/buffer.c | 4 ++++ fs/ntfs/attrib.c | 2 ++ fs/ntfs/file.c | 6 ++++++ mm/filemap.c | 3 ++- mm/page-writeback.c | 2 ++ 8 files changed, 28 insertions(+), 1 deletion(-) --- linux-next.orig/mm/page-writeback.c 2011-11-13 20:37:11.000000000 +0800 +++ linux-next/mm/page-writeback.c 2011-11-13 20:37:57.000000000 +0800 @@ -1185,6 +1185,8 @@ pause: */ if (task_ratelimit) break; + if (fatal_signal_pending(current)) + break; } if (!dirty_exceeded && bdi->dirty_exceeded) --- linux-next.orig/mm/filemap.c 2011-11-13 20:03:23.000000000 +0800 +++ linux-next/mm/filemap.c 2011-11-13 20:37:17.000000000 +0800 @@ -2463,7 +2463,8 @@ again: written += copied; balance_dirty_pages_ratelimited(mapping); - + if (fatal_signal_pending(current)) + break; } while (iov_iter_count(i)); return written ? written : status; --- linux-next.orig/fs/btrfs/file.c 2011-11-13 20:37:11.000000000 +0800 +++ linux-next/fs/btrfs/file.c 2011-11-13 20:37:17.000000000 +0800 @@ -1274,6 +1274,10 @@ static noinline ssize_t __btrfs_buffered dirty_pages); if (dirty_pages < (root->leafsize >> PAGE_CACHE_SHIFT) + 1) btrfs_btree_balance_dirty(root, 1); + if (fatal_signal_pending(current)) { + ret = -EINTR; + break; + } btrfs_throttle(root); pos += copied; --- linux-next.orig/fs/btrfs/ioctl.c 2011-11-13 20:03:23.000000000 +0800 +++ linux-next/fs/btrfs/ioctl.c 2011-11-13 20:37:17.000000000 +0800 @@ -1115,6 +1115,10 @@ int btrfs_defrag_file(struct inode *inod defrag_count += ret; balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret); + if (fatal_signal_pending(current)) { + ret = -EINTR; + goto out_ra; + } if (newer_than) { if (newer_off == (u64)-1) --- linux-next.orig/fs/btrfs/relocation.c 2011-11-13 20:03:23.000000000 +0800 +++ linux-next/fs/btrfs/relocation.c 2011-11-13 20:37:17.000000000 +0800 @@ -3009,6 +3009,10 @@ static int relocate_file_extent_cluster( index++; balance_dirty_pages_ratelimited(inode->i_mapping); + if (fatal_signal_pending(current)) { + ret = -EINTR; + break; + } btrfs_throttle(BTRFS_I(inode)->root); } WARN_ON(nr != cluster->nr); --- linux-next.orig/fs/buffer.c 2011-11-13 20:03:23.000000000 +0800 +++ linux-next/fs/buffer.c 2011-11-13 20:37:17.000000000 +0800 @@ -2255,6 +2255,10 @@ static int cont_expand_zero(struct file err = 0; balance_dirty_pages_ratelimited(mapping); + if (fatal_signal_pending(current)) { + err = -EINTR; + goto out; + } } /* page covers the boundary, find the boundary offset */ --- linux-next.orig/fs/ntfs/attrib.c 2011-11-13 20:03:23.000000000 +0800 +++ linux-next/fs/ntfs/attrib.c 2011-11-13 20:37:17.000000000 +0800 @@ -2588,6 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const unlock_page(page); page_cache_release(page); balance_dirty_pages_ratelimited(mapping); + if (fatal_signal_pending(current)) + return -EINTR; cond_resched(); } /* If there is a last partial page, need to do it the slow way. */ --- linux-next.orig/fs/ntfs/file.c 2011-11-13 20:03:23.000000000 +0800 +++ linux-next/fs/ntfs/file.c 2011-11-13 20:37:17.000000000 +0800 @@ -278,6 +278,10 @@ do_non_resident_extend: * files. */ balance_dirty_pages_ratelimited(mapping); + if (fatal_signal_pending(current)) { + err = -EINTR; + goto init_err_out; + } cond_resched(); } while (++index < end_index); read_lock_irqsave(&ni->size_lock, flags); @@ -2054,6 +2058,8 @@ static ssize_t ntfs_file_buffered_write( if (unlikely(status)) break; balance_dirty_pages_ratelimited(mapping); + if (fatal_signal_pending(current)) + break; cond_resched(); } while (count); err_out: ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang @ 2011-11-14 12:05 ` Christoph Hellwig 2011-11-14 12:24 ` Jan Kara 2011-11-14 12:29 ` Wu Fengguang 2011-11-14 12:12 ` Jan Kara 1 sibling, 2 replies; 39+ messages in thread From: Christoph Hellwig @ 2011-11-14 12:05 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon, Nov 14, 2011 at 07:59:12PM +0800, Wu Fengguang wrote: > On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote: > > > > Hello, > > > > these two patches aim at making task waiting in balance_dirty_pages() > > killable. This is desirable because otherwise if filesystem stops accepting > > writes (e.g. if device has been removed or other serious error condidion) we > > have a task stuck in D state forever. > > Agreed totally. I myself has run into such conditions and get very > annoyed not being able to kill the hard throttled tasks -- they just > stuck there for ever if the error condition does not change. > > > I'm not sure who should merge these two patches... Al, Fengguang? > > I'd like to do it -- otherwise there will obviously be merge conflicts. > > Actually I also queued a patch to do this (attached). Your patches do > better on TASK_KILLABLE and the use of signal_pending() in write > routines, while mine goes further to add the break to various > filesystems. How about combining them together? Can you make balance_dirty_pages(_ratelimited) return an error instead of opencoding the fatal signal check everywhere? That would make the interface a bit more obvious. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 12:05 ` Christoph Hellwig @ 2011-11-14 12:24 ` Jan Kara 2011-11-14 12:29 ` Wu Fengguang 1 sibling, 0 replies; 39+ messages in thread From: Jan Kara @ 2011-11-14 12:24 UTC (permalink / raw) To: Christoph Hellwig Cc: Wu Fengguang, Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton On Mon 14-11-11 07:05:46, Christoph Hellwig wrote: > On Mon, Nov 14, 2011 at 07:59:12PM +0800, Wu Fengguang wrote: > > On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote: > > > > > > Hello, > > > > > > these two patches aim at making task waiting in balance_dirty_pages() > > > killable. This is desirable because otherwise if filesystem stops accepting > > > writes (e.g. if device has been removed or other serious error condidion) we > > > have a task stuck in D state forever. > > > > Agreed totally. I myself has run into such conditions and get very > > annoyed not being able to kill the hard throttled tasks -- they just > > stuck there for ever if the error condition does not change. > > > > > I'm not sure who should merge these two patches... Al, Fengguang? > > > > I'd like to do it -- otherwise there will obviously be merge conflicts. > > > > Actually I also queued a patch to do this (attached). Your patches do > > better on TASK_KILLABLE and the use of signal_pending() in write > > routines, while mine goes further to add the break to various > > filesystems. How about combining them together? > > Can you make balance_dirty_pages(_ratelimited) return an error instead > of opencoding the fatal signal check everywhere? That would make the > interface a bit more obvious. We can do this. It's just that signal_pending() check e.g. in generic_perform_write() has a sense even when balance_dirty_pages() will be returning error because it also catches other cases... Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 12:05 ` Christoph Hellwig 2011-11-14 12:24 ` Jan Kara @ 2011-11-14 12:29 ` Wu Fengguang 2011-11-14 12:41 ` Christoph Hellwig 1 sibling, 1 reply; 39+ messages in thread From: Wu Fengguang @ 2011-11-14 12:29 UTC (permalink / raw) To: Christoph Hellwig Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton > Can you make balance_dirty_pages(_ratelimited) return an error instead > of opencoding the fatal signal che Hmm that means a bigger change to the prototype of the _exported_ balance_dirty_pages_ratelimited_nr() which impacts out of tree FS... ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 12:29 ` Wu Fengguang @ 2011-11-14 12:41 ` Christoph Hellwig 2011-11-14 13:01 ` Wu Fengguang 0 siblings, 1 reply; 39+ messages in thread From: Christoph Hellwig @ 2011-11-14 12:41 UTC (permalink / raw) To: Wu Fengguang Cc: Christoph Hellwig, Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote: > > Can you make balance_dirty_pages(_ratelimited) return an error instead > > of opencoding the fatal signal che > > Hmm that means a bigger change to the prototype of the _exported_ > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS... So what? out of tree fses have always been entirely supported on their own, and not a matter for keeping progress back. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 12:41 ` Christoph Hellwig @ 2011-11-14 13:01 ` Wu Fengguang 2011-11-14 15:28 ` Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Wu Fengguang @ 2011-11-14 13:01 UTC (permalink / raw) To: Christoph Hellwig Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote: > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote: > > > Can you make balance_dirty_pages(_ratelimited) return an error instead > > > of opencoding the fatal signal che > > > > Hmm that means a bigger change to the prototype of the _exported_ > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS... > > So what? out of tree fses have always been entirely supported on their > own, and not a matter for keeping progress back. OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS parts. Thanks, Fengguang ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 13:01 ` Wu Fengguang @ 2011-11-14 15:28 ` Jan Kara 2011-11-14 15:32 ` Christoph Hellwig 0 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-14 15:28 UTC (permalink / raw) To: Wu Fengguang Cc: Christoph Hellwig, Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton On Mon 14-11-11 21:01:53, Wu Fengguang wrote: > On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote: > > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote: > > > > Can you make balance_dirty_pages(_ratelimited) return an error instead > > > > of opencoding the fatal signal che > > > > > > Hmm that means a bigger change to the prototype of the _exported_ > > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS... > > > > So what? out of tree fses have always been entirely supported on their > > own, and not a matter for keeping progress back. > > OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS > parts. Sure, I'll update my patch to balance_dirty_pages(). Should I also update my patch to generic_perform_write() or does that fall under "various FS parts" cathegory? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 15:28 ` Jan Kara @ 2011-11-14 15:32 ` Christoph Hellwig 2011-11-14 16:19 ` Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Christoph Hellwig @ 2011-11-14 15:32 UTC (permalink / raw) To: Jan Kara Cc: Wu Fengguang, Christoph Hellwig, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton On Mon, Nov 14, 2011 at 04:28:32PM +0100, Jan Kara wrote: > On Mon 14-11-11 21:01:53, Wu Fengguang wrote: > > On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote: > > > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote: > > > > > Can you make balance_dirty_pages(_ratelimited) return an error instead > > > > > of opencoding the fatal signal che > > > > > > > > Hmm that means a bigger change to the prototype of the _exported_ > > > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS... > > > > > > So what? out of tree fses have always been entirely supported on their > > > own, and not a matter for keeping progress back. > > > > OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS > > parts. > Sure, I'll update my patch to balance_dirty_pages(). Should I also update > my patch to generic_perform_write() or does that fall under "various FS > parts" cathegory? I'd be much happier if all bits go in together. IMHO this is a change worthwhile for 3.2 still, so we should probably try to get it in ASAP. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 15:32 ` Christoph Hellwig @ 2011-11-14 16:19 ` Jan Kara 0 siblings, 0 replies; 39+ messages in thread From: Jan Kara @ 2011-11-14 16:19 UTC (permalink / raw) To: Christoph Hellwig Cc: Jan Kara, Wu Fengguang, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton On Mon 14-11-11 10:32:26, Christoph Hellwig wrote: > On Mon, Nov 14, 2011 at 04:28:32PM +0100, Jan Kara wrote: > > On Mon 14-11-11 21:01:53, Wu Fengguang wrote: > > > On Mon, Nov 14, 2011 at 08:41:45PM +0800, Christoph Hellwig wrote: > > > > On Mon, Nov 14, 2011 at 08:29:32PM +0800, Wu Fengguang wrote: > > > > > > Can you make balance_dirty_pages(_ratelimited) return an error instead > > > > > > of opencoding the fatal signal che > > > > > > > > > > Hmm that means a bigger change to the prototype of the _exported_ > > > > > balance_dirty_pages_ratelimited_nr() which impacts out of tree FS... > > > > > > > > So what? out of tree fses have always been entirely supported on their > > > > own, and not a matter for keeping progress back. > > > > > > OK. Jan, would you do the mm/*.c part in v2? I'll do the various FS > > > parts. > > Sure, I'll update my patch to balance_dirty_pages(). Should I also update > > my patch to generic_perform_write() or does that fall under "various FS > > parts" cathegory? > > I'd be much happier if all bits go in together. IMHO this is a change > worthwhile for 3.2 still, so we should probably try to get it in ASAP. OK, I've resent my patches with requested changes. I think those two patches make sense on it's own and fix the biggest single pain point. Fengguang can add other fs bits to these two patches (although personally I'd be happier to have respective fs maintainers acks on them before pushing). Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 0/2] Make task doing heavy writing killable 2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang 2011-11-14 12:05 ` Christoph Hellwig @ 2011-11-14 12:12 ` Jan Kara 1 sibling, 0 replies; 39+ messages in thread From: Jan Kara @ 2011-11-14 12:12 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon 14-11-11 19:59:12, Wu Fengguang wrote: > On Mon, Nov 14, 2011 at 07:10:28PM +0800, Jan Kara wrote: > > > > Hello, > > > > these two patches aim at making task waiting in balance_dirty_pages() > > killable. This is desirable because otherwise if filesystem stops accepting > > writes (e.g. if device has been removed or other serious error condidion) we > > have a task stuck in D state forever. > > Agreed totally. I myself has run into such conditions and get very > annoyed not being able to kill the hard throttled tasks -- they just > stuck there for ever if the error condition does not change. > > > I'm not sure who should merge these two patches... Al, Fengguang? > > I'd like to do it -- otherwise there will obviously be merge conflicts. Good. > Actually I also queued a patch to do this (attached). Your patches do > better on TASK_KILLABLE and the use of signal_pending() in write > routines, while mine goes further to add the break to various > filesystems. How about combining them together? > > Subject: writeback: quit throttling when fatal signal pending > Date: Wed Sep 08 17:40:22 CST 2010 > > This allows quick response to Ctrl-C etc. for impatient users. > > It's necessary to abort the generic_perform_write() and other filesystem > write loops too, to avoid large write + SIGKILL combination exceeding > the dirty limit and possibly strange OOM. > > It mainly helps the rare bdi/global dirty exceeded cases. > In the normal case of not exceeded, it will quit the loop anyway. > > if (fatal_signal_pending(current) && > nr_reclaimable + nr_writeback <= dirty_thresh) > break; > > Reviewed-by: Neil Brown <neilb@suse.de> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > cc: Chris Mason <chris.mason@oracle.com> > cc: Anton Altaparmakov <aia21@cantab.net> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> > --- > fs/btrfs/file.c | 4 ++++ > fs/btrfs/ioctl.c | 4 ++++ > fs/btrfs/relocation.c | 4 ++++ > fs/buffer.c | 4 ++++ > fs/ntfs/attrib.c | 2 ++ > fs/ntfs/file.c | 6 ++++++ > mm/filemap.c | 3 ++- > mm/page-writeback.c | 2 ++ > 8 files changed, 28 insertions(+), 1 deletion(-) > > --- linux-next.orig/mm/page-writeback.c 2011-11-13 20:37:11.000000000 +0800 > +++ linux-next/mm/page-writeback.c 2011-11-13 20:37:57.000000000 +0800 > @@ -1185,6 +1185,8 @@ pause: > */ > if (task_ratelimit) > break; > + if (fatal_signal_pending(current)) > + break; > } > > if (!dirty_exceeded && bdi->dirty_exceeded) > --- linux-next.orig/mm/filemap.c 2011-11-13 20:03:23.000000000 +0800 > +++ linux-next/mm/filemap.c 2011-11-13 20:37:17.000000000 +0800 > @@ -2463,7 +2463,8 @@ again: > written += copied; > > balance_dirty_pages_ratelimited(mapping); > - > + if (fatal_signal_pending(current)) > + break; > } while (iov_iter_count(i)); > > return written ? written : status; Above two hunks are already part of my patches (effectively). The hunks below look good but I guess they should be split to per-filesystem patches and sent to filesystem maintainers for merging. The changes are independent so there's no reason to do them in one big patch or skip fs maintainers... Honza > --- linux-next.orig/fs/btrfs/file.c 2011-11-13 20:37:11.000000000 +0800 > +++ linux-next/fs/btrfs/file.c 2011-11-13 20:37:17.000000000 +0800 > @@ -1274,6 +1274,10 @@ static noinline ssize_t __btrfs_buffered > dirty_pages); > if (dirty_pages < (root->leafsize >> PAGE_CACHE_SHIFT) + 1) > btrfs_btree_balance_dirty(root, 1); > + if (fatal_signal_pending(current)) { > + ret = -EINTR; > + break; > + } > btrfs_throttle(root); > > pos += copied; > --- linux-next.orig/fs/btrfs/ioctl.c 2011-11-13 20:03:23.000000000 +0800 > +++ linux-next/fs/btrfs/ioctl.c 2011-11-13 20:37:17.000000000 +0800 > @@ -1115,6 +1115,10 @@ int btrfs_defrag_file(struct inode *inod > > defrag_count += ret; > balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret); > + if (fatal_signal_pending(current)) { > + ret = -EINTR; > + goto out_ra; > + } > > if (newer_than) { > if (newer_off == (u64)-1) > --- linux-next.orig/fs/btrfs/relocation.c 2011-11-13 20:03:23.000000000 +0800 > +++ linux-next/fs/btrfs/relocation.c 2011-11-13 20:37:17.000000000 +0800 > @@ -3009,6 +3009,10 @@ static int relocate_file_extent_cluster( > > index++; > balance_dirty_pages_ratelimited(inode->i_mapping); > + if (fatal_signal_pending(current)) { > + ret = -EINTR; > + break; > + } > btrfs_throttle(BTRFS_I(inode)->root); > } > WARN_ON(nr != cluster->nr); > --- linux-next.orig/fs/buffer.c 2011-11-13 20:03:23.000000000 +0800 > +++ linux-next/fs/buffer.c 2011-11-13 20:37:17.000000000 +0800 > @@ -2255,6 +2255,10 @@ static int cont_expand_zero(struct file > err = 0; > > balance_dirty_pages_ratelimited(mapping); > + if (fatal_signal_pending(current)) { > + err = -EINTR; > + goto out; > + } > } > > /* page covers the boundary, find the boundary offset */ > --- linux-next.orig/fs/ntfs/attrib.c 2011-11-13 20:03:23.000000000 +0800 > +++ linux-next/fs/ntfs/attrib.c 2011-11-13 20:37:17.000000000 +0800 > @@ -2588,6 +2588,8 @@ int ntfs_attr_set(ntfs_inode *ni, const > unlock_page(page); > page_cache_release(page); > balance_dirty_pages_ratelimited(mapping); > + if (fatal_signal_pending(current)) > + return -EINTR; > cond_resched(); > } > /* If there is a last partial page, need to do it the slow way. */ > --- linux-next.orig/fs/ntfs/file.c 2011-11-13 20:03:23.000000000 +0800 > +++ linux-next/fs/ntfs/file.c 2011-11-13 20:37:17.000000000 +0800 > @@ -278,6 +278,10 @@ do_non_resident_extend: > * files. > */ > balance_dirty_pages_ratelimited(mapping); > + if (fatal_signal_pending(current)) { > + err = -EINTR; > + goto init_err_out; > + } > cond_resched(); > } while (++index < end_index); > read_lock_irqsave(&ni->size_lock, flags); > @@ -2054,6 +2058,8 @@ static ssize_t ntfs_file_buffered_write( > if (unlikely(status)) > break; > balance_dirty_pages_ratelimited(mapping); > + if (fatal_signal_pending(current)) > + break; > cond_resched(); > } while (count); > err_out: -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 0/2 v2] Make task in balance_dirty_pages() killable @ 2011-11-14 16:15 Jan Kara 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw) To: Wu Fengguang; +Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel Hello, here is a second iteration of the patches. Changes since v1: * slightly moved the check in balance_dirty_pages() as Fengguang requested * made balance_dirty_pages() return EINTR if fatal signal was detected * changed check for signal to check for fatal signal in generic_perform_write() to avoid unexpected results for userspace applications. Honza ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-14 16:15 ` Jan Kara 2011-11-14 16:26 ` Christoph Hellwig 2011-11-14 22:19 ` Andrew Morton 0 siblings, 2 replies; 39+ messages in thread From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara Currently write(2) to a file is not interruptible by a signal. Sometimes this is desirable (e.g. when you want to quickly kill a process hogging your disk or when some process gets blocked in balance_dirty_pages() indefinitely due to a filesystem being in an error condition). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/filemap.c | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index c0018f2..166b30e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, iov_iter_count(i)); again: - /* * Bring in the user page that we will copy from _first_. * Otherwise there's a nasty deadlock on copying from the @@ -2463,7 +2462,15 @@ again: written += copied; balance_dirty_pages_ratelimited(mapping); - + /* + * We check the signal independently of balance_dirty_pages() + * because we need not wait and check for signal there although + * this loop could have taken significant amount of time... + */ + if (fatal_signal_pending(current)) { + status = -EINTR; + break; + } } while (iov_iter_count(i)); return written ? written : status; -- 1.7.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara @ 2011-11-14 16:26 ` Christoph Hellwig 2011-11-14 16:46 ` Jan Kara 2011-11-14 22:19 ` Andrew Morton 1 sibling, 1 reply; 39+ messages in thread From: Christoph Hellwig @ 2011-11-14 16:26 UTC (permalink / raw) To: Jan Kara Cc: Wu Fengguang, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel On Mon, Nov 14, 2011 at 05:15:24PM +0100, Jan Kara wrote: > Currently write(2) to a file is not interruptible by a signal. Sometimes this > is desirable (e.g. when you want to quickly kill a process hogging your disk or > when some process gets blocked in balance_dirty_pages() indefinitely due to a > filesystem being in an error condition). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/filemap.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index c0018f2..166b30e 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > iov_iter_count(i)); > > again: > - > /* > * Bring in the user page that we will copy from _first_. > * Otherwise there's a nasty deadlock on copying from the > @@ -2463,7 +2462,15 @@ again: > written += copied; > > balance_dirty_pages_ratelimited(mapping); > - > + /* > + * We check the signal independently of balance_dirty_pages() > + * because we need not wait and check for signal there although > + * this loop could have taken significant amount of time... > + */ > + if (fatal_signal_pending(current)) { > + status = -EINTR; > + break; > + } Hmm. If we need to check again every time adding the return value to balance_dirty_pages is rather pointless. I have a bit of a problem parsing the comment - does it try to say that we might spend too much time after the fatal_signal_pending in the balance_dirty_pages code so that we have to check it again? Why not repeat the check at the end of balance_dirty_pages_ratelimited and thus avoid having to duplicate the thing in all callers? ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:26 ` Christoph Hellwig @ 2011-11-14 16:46 ` Jan Kara 2011-11-14 20:13 ` Christoph Hellwig 0 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-14 16:46 UTC (permalink / raw) To: Christoph Hellwig Cc: Jan Kara, Wu Fengguang, Andrew Morton, Al Viro, linux-fsdevel On Mon 14-11-11 11:26:00, Christoph Hellwig wrote: > On Mon, Nov 14, 2011 at 05:15:24PM +0100, Jan Kara wrote: > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > filesystem being in an error condition). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/filemap.c | 11 +++++++++-- > > 1 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index c0018f2..166b30e 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > iov_iter_count(i)); > > > > again: > > - > > /* > > * Bring in the user page that we will copy from _first_. > > * Otherwise there's a nasty deadlock on copying from the > > @@ -2463,7 +2462,15 @@ again: > > written += copied; > > > > balance_dirty_pages_ratelimited(mapping); > > - > > + /* > > + * We check the signal independently of balance_dirty_pages() > > + * because we need not wait and check for signal there although > > + * this loop could have taken significant amount of time... > > + */ > > + if (fatal_signal_pending(current)) { > > + status = -EINTR; > > + break; > > + } > > Hmm. If we need to check again every time adding the return value to > balance_dirty_pages is rather pointless. > > I have a bit of a problem parsing the comment - does it try to say that > we might spend too much time after the fatal_signal_pending in the > balance_dirty_pages code so that we have to check it again? Why not > repeat the check at the end of balance_dirty_pages_ratelimited and thus > avoid having to duplicate the thing in all callers? The point I was trying to get across is that all: iov_iter_fault_in_readable() ->write_begin() ... ->write_end() can take a rather long time. Considering that balance_dirty_pages_ratelimited() needn't call balance_dirty_pages() or balance_dirty_pages() can just return without doing anything because there aren't many dirty pages, the end result is that balance_dirty_pages_ratelimited() won't return EINTR for rather long time although there is a signal pending. That's why I want to check for signal pending directly in the loop in generic_perform_write(). There can be loops where the only blocking point is in balance_dirty_pages() and then using the return value makes sense but I think such loops would be in minority so maybe the return value doesn't make much sense after all. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:46 ` Jan Kara @ 2011-11-14 20:13 ` Christoph Hellwig 0 siblings, 0 replies; 39+ messages in thread From: Christoph Hellwig @ 2011-11-14 20:13 UTC (permalink / raw) To: Jan Kara Cc: Christoph Hellwig, Wu Fengguang, Andrew Morton, Al Viro, linux-fsdevel On Mon, Nov 14, 2011 at 05:46:47PM +0100, Jan Kara wrote: > There can be loops where the only blocking point is in > balance_dirty_pages() and then using the return value makes sense but > I think such loops would be in minority so maybe the return value doesn't > make much sense after all. Yes, maybe the return value isn't that smart. Sorry for all the noise. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-14 16:26 ` Christoph Hellwig @ 2011-11-14 22:19 ` Andrew Morton 2011-11-15 11:23 ` Jan Kara 1 sibling, 1 reply; 39+ messages in thread From: Andrew Morton @ 2011-11-14 22:19 UTC (permalink / raw) To: Jan Kara; +Cc: Wu Fengguang, Christoph Hellwig, Al Viro, linux-fsdevel On Mon, 14 Nov 2011 17:15:24 +0100 Jan Kara <jack@suse.cz> wrote: > Currently write(2) to a file is not interruptible by a signal. Sometimes this > is desirable (e.g. when you want to quickly kill a process hogging your disk or > when some process gets blocked in balance_dirty_pages() indefinitely due to a > filesystem being in an error condition). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/filemap.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index c0018f2..166b30e 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > iov_iter_count(i)); > > again: > - > /* > * Bring in the user page that we will copy from _first_. > * Otherwise there's a nasty deadlock on copying from the > @@ -2463,7 +2462,15 @@ again: > written += copied; > > balance_dirty_pages_ratelimited(mapping); > - > + /* > + * We check the signal independently of balance_dirty_pages() > + * because we need not wait and check for signal there although > + * this loop could have taken significant amount of time... > + */ > + if (fatal_signal_pending(current)) { > + status = -EINTR; > + break; > + } > } while (iov_iter_count(i)); > > return written ? written : status; Will this permit the interruption of things like fsync() or sync()? If so, considerable pondering is needed. Also I worry about stuff like the use of buffered write to finish off loose ends in direct-IO writing. Sometimes these writes MUST complete, to prevent exposing unwritten disk blocks to a subsequent read. Will a well-timed ^C break this? If "no" then does this change introduce risk that we'll later accidentally introduce such a security hole? ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 22:19 ` Andrew Morton @ 2011-11-15 11:23 ` Jan Kara 0 siblings, 0 replies; 39+ messages in thread From: Jan Kara @ 2011-11-15 11:23 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Wu Fengguang, Christoph Hellwig, Al Viro, linux-fsdevel On Mon 14-11-11 14:19:54, Andrew Morton wrote: > On Mon, 14 Nov 2011 17:15:24 +0100 > Jan Kara <jack@suse.cz> wrote: > > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > filesystem being in an error condition). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/filemap.c | 11 +++++++++-- > > 1 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index c0018f2..166b30e 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > iov_iter_count(i)); > > > > again: > > - > > /* > > * Bring in the user page that we will copy from _first_. > > * Otherwise there's a nasty deadlock on copying from the > > @@ -2463,7 +2462,15 @@ again: > > written += copied; > > > > balance_dirty_pages_ratelimited(mapping); > > - > > + /* > > + * We check the signal independently of balance_dirty_pages() > > + * because we need not wait and check for signal there although > > + * this loop could have taken significant amount of time... > > + */ > > + if (fatal_signal_pending(current)) { > > + status = -EINTR; > > + break; > > + } > > } while (iov_iter_count(i)); > > > > return written ? written : status; > > Will this permit the interruption of things like fsync() or sync()? If > so, considerable pondering is needed. No, fsync() or sync() are not influenced. > Also I worry about stuff like the use of buffered write to finish off > loose ends in direct-IO writing. Sometimes these writes MUST complete, > to prevent exposing unwritten disk blocks to a subsequent read. Will a > well-timed ^C break this? If "no" then does this change introduce risk > that we'll later accidentally introduce such a security hole? That's a good question! Well timed SIGKILL can stop buffered write completing bits of direct IO write. But looking into the code I fail to see where we rely on buffered write being completed. I find it somewhere on a boundary of a filesystem bug to return from direct IO with a potential of stale data exposure... For writes beyond i_size, we increase i_size only by the amount which was successfully written so there is no risk of exposure. Otherwise, direct IO would have to fill a hole to allocate a block - ext? filesystems, reiserfs, ocfs2, and other simple filesystems avoid doing that. XFS fills a hole but takes care to init everything. So I don't think anyone really expects buffered write to cover his back. To sum up the only result of interrupted buffered write after direct IO write seems to be that some other application could see only part of the write completed. That is possible in case of multipage buffered writes as well and e.g. with ext3/4 such situation is possible even without these patches (due to conditions like ENOSPC or due to a system crash - admittedly more serious conditions than someone sending SIGKILL). Thanks for your comments. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 0/2 v3] Make task in balance_dirty_pages() killable @ 2011-11-16 11:12 Jan Kara 2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw) To: Wu Fengguang; +Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel Hello, since the discussion has calmed down, here is a third iteration of the patches. Fengguang, can you put them into your tree and merge them with Linus? Thanks. Changes since v2: * removed return value of balance_dirty_pages() since it didn't seem that useful after all. * kept fatal_signal_pending() check in generic_perform_write() since I don't think possible partial writes when app receives SIGKILL are a problem. Changes since v1: * slightly moved the check in balance_dirty_pages() as Fengguang requested * made balance_dirty_pages() return EINTR if fatal signal was detected * changed check for signal to check for fatal signal in generic_perform_write() to avoid unexpected results for userspace applications. Honza ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-16 11:12 ` Jan Kara 2011-11-16 11:44 ` Wu Fengguang 0 siblings, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara Currently write(2) to a file is not interruptible by a signal. Sometimes this is desirable (e.g. when you want to quickly kill a process hogging your disk or when some process gets blocked in balance_dirty_pages() indefinitely due to a filesystem being in an error condition). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/filemap.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index c0018f2..c106d3b 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, iov_iter_count(i)); again: - /* * Bring in the user page that we will copy from _first_. * Otherwise there's a nasty deadlock on copying from the @@ -2463,7 +2462,10 @@ again: written += copied; balance_dirty_pages_ratelimited(mapping); - + if (fatal_signal_pending(current)) { + status = -EINTR; + break; + } } while (iov_iter_count(i)); return written ? written : status; -- 1.7.1 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara @ 2011-11-16 11:44 ` Wu Fengguang 2011-11-16 12:54 ` Jan Kara 2011-11-22 22:28 ` Andrew Morton 0 siblings, 2 replies; 39+ messages in thread From: Wu Fengguang @ 2011-11-16 11:44 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org Jan, Due to the (very low) possibility of data loss by partial writes, IMHO it would safer to test this patch in linux-next until next merge window, would you agree? Pushing the first patch will address the main problem, anyway. Thanks, Fengguang On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote: > Currently write(2) to a file is not interruptible by a signal. Sometimes this > is desirable (e.g. when you want to quickly kill a process hogging your disk or > when some process gets blocked in balance_dirty_pages() indefinitely due to a > filesystem being in an error condition). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/filemap.c | 6 ++++-- > 1 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index c0018f2..c106d3b 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > iov_iter_count(i)); > > again: > - > /* > * Bring in the user page that we will copy from _first_. > * Otherwise there's a nasty deadlock on copying from the > @@ -2463,7 +2462,10 @@ again: > written += copied; > > balance_dirty_pages_ratelimited(mapping); > - > + if (fatal_signal_pending(current)) { > + status = -EINTR; > + break; > + } > } while (iov_iter_count(i)); > > return written ? written : status; > -- > 1.7.1 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:44 ` Wu Fengguang @ 2011-11-16 12:54 ` Jan Kara 2011-11-16 13:11 ` Wu Fengguang 2011-11-22 22:28 ` Andrew Morton 1 sibling, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-16 12:54 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed 16-11-11 19:44:21, Wu Fengguang wrote: > Jan, > > Due to the (very low) possibility of data loss by partial writes, IMHO > it would safer to test this patch in linux-next until next merge window, > would you agree? Fine with me. Thanks. > Pushing the first patch will address the main problem, anyway. Hopefully, yes. Honza > On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote: > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > filesystem being in an error condition). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/filemap.c | 6 ++++-- > > 1 files changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index c0018f2..c106d3b 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > iov_iter_count(i)); > > > > again: > > - > > /* > > * Bring in the user page that we will copy from _first_. > > * Otherwise there's a nasty deadlock on copying from the > > @@ -2463,7 +2462,10 @@ again: > > written += copied; > > > > balance_dirty_pages_ratelimited(mapping); > > - > > + if (fatal_signal_pending(current)) { > > + status = -EINTR; > > + break; > > + } > > } while (iov_iter_count(i)); > > > > return written ? written : status; > > -- > > 1.7.1 -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 12:54 ` Jan Kara @ 2011-11-16 13:11 ` Wu Fengguang 0 siblings, 0 replies; 39+ messages in thread From: Wu Fengguang @ 2011-11-16 13:11 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio On Wed, Nov 16, 2011 at 08:54:45PM +0800, Jan Kara wrote: > On Wed 16-11-11 19:44:21, Wu Fengguang wrote: > > Jan, > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > it would safer to test this patch in linux-next until next merge window, > > would you agree? > Fine with me. Thanks. Great. The two patches are now in: http://git.kernel.org/?p=linux/kernel/git/wfg/linux.git;a=shortlog;h=refs/heads/writeback-for-next When applying the patch I changed the title "... by a signal" to "... by SIGKILL" to reflect the updated patch content. Hopefully this is also what's in your mind. > > Pushing the first patch will address the main problem, anyway. > Hopefully, yes. I looked over Kazuya's posts in linux-ext4 and think the first patch alone does have good chance address the problem of busy looping in balance_dirty_pages() due to dirty pages never drop under fs error conditions. Thanks, Fengguang > > On Wed, Nov 16, 2011 at 07:12:15PM +0800, Jan Kara wrote: > > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > > filesystem being in an error condition). > > > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > > Signed-off-by: Jan Kara <jack@suse.cz> > > > --- > > > mm/filemap.c | 6 ++++-- > > > 1 files changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index c0018f2..c106d3b 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > > iov_iter_count(i)); > > > > > > again: > > > - > > > /* > > > * Bring in the user page that we will copy from _first_. > > > * Otherwise there's a nasty deadlock on copying from the > > > @@ -2463,7 +2462,10 @@ again: > > > written += copied; > > > > > > balance_dirty_pages_ratelimited(mapping); > > > - > > > + if (fatal_signal_pending(current)) { > > > + status = -EINTR; > > > + break; > > > + } > > > } while (iov_iter_count(i)); > > > > > > return written ? written : status; > > > -- > > > 1.7.1 > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-16 11:44 ` Wu Fengguang 2011-11-16 12:54 ` Jan Kara @ 2011-11-22 22:28 ` Andrew Morton 2011-11-23 9:05 ` Wu Fengguang 1 sibling, 1 reply; 39+ messages in thread From: Andrew Morton @ 2011-11-22 22:28 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, 16 Nov 2011 19:44:21 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote: > Due to the (very low) possibility of data loss by partial writes, IMHO > it would safer to test this patch in linux-next until next merge window, Any such bugs will not be discovered in linux-next testing. The only way to find these things in a reasonable period of time is to go in and find them. For example, intensive fsx-linux testing with concurrent heavy memory pressure on various filesystems with various block sizes. And of course concurrent signalling. If you're talking about O_DIRECT then iirc I hacked support for that into fsx-linux. I think. Anyway, what _are_ the scenarios in which we think data can be lost? ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-22 22:28 ` Andrew Morton @ 2011-11-23 9:05 ` Wu Fengguang 2011-11-23 9:50 ` Andrew Morton 2011-11-23 13:08 ` Jan Kara 0 siblings, 2 replies; 39+ messages in thread From: Wu Fengguang @ 2011-11-23 9:05 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > On Wed, 16 Nov 2011 19:44:21 +0800 > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > it would safer to test this patch in linux-next until next merge window, > > Any such bugs will not be discovered in linux-next testing. Yup, I'm afraid. > The only way to find these things in a reasonable period of time is to > go in and find them. For example, intensive fsx-linux testing with > concurrent heavy memory pressure on various filesystems with various > block sizes. And of course concurrent signalling. If you're talking > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > think. How are we going to measure the success/failure? Check if it eventually resulted in filesystem corruption or whatever? When received SIGKILL, fsx-linux itself will just die. > Anyway, what _are_ the scenarios in which we think data can be lost? It's the vision that there may be partial writes on SIGKILL. Before patch, the write will either succeed as a whole or not started at all, depending on the timing of write/SIGKILL. This is kind of atomic operation. However now the write can be half done. If the application really cares about atomic behavior, it can do create-and-rename. However there are always the possibility of broken applications. Maybe this is not that big problem as SIGKILL is considered be to destructive already. Thanks, Fengguang ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 9:05 ` Wu Fengguang @ 2011-11-23 9:50 ` Andrew Morton 2011-11-23 13:08 ` Jan Kara 1 sibling, 0 replies; 39+ messages in thread From: Andrew Morton @ 2011-11-23 9:50 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed, 23 Nov 2011 17:05:33 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote: > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > > On Wed, 16 Nov 2011 19:44:21 +0800 > > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > > it would safer to test this patch in linux-next until next merge window, > > > > Any such bugs will not be discovered in linux-next testing. > > Yup, I'm afraid. > > > The only way to find these things in a reasonable period of time is to > > go in and find them. For example, intensive fsx-linux testing with > > concurrent heavy memory pressure on various filesystems with various > > block sizes. And of course concurrent signalling. If you're talking > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > > think. > > How are we going to measure the success/failure? Check if it > eventually resulted in filesystem corruption or whatever? yup. > When received SIGKILL, fsx-linux itself will just die. Well, there are ways of simulating its effect. For example, bale out of the write() every seventh time if current->comm=="fsx-linux". Or set a rearming timer which triggers a baled-out write. You'll work it out ;) > > Anyway, what _are_ the scenarios in which we think data can be lost? > > It's the vision that there may be partial writes on SIGKILL. Before > patch, the write will either succeed as a whole or not started at > all, depending on the timing of write/SIGKILL. This is kind of atomic > operation. However now the write can be half done. > > If the application really cares about atomic behavior, it can do > create-and-rename. However there are always the possibility of broken > applications. > > Maybe this is not that big problem as SIGKILL is considered be to > destructive already. Yeah, I have dim dark memories that there are subtle problems with interrupting write(). Linus might remember. Others might remember too, but we're only talking to fsdevel which nobody reads :( ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 9:05 ` Wu Fengguang 2011-11-23 9:50 ` Andrew Morton @ 2011-11-23 13:08 ` Jan Kara 2011-11-23 13:27 ` Wu Fengguang 1 sibling, 1 reply; 39+ messages in thread From: Jan Kara @ 2011-11-23 13:08 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Wed 23-11-11 17:05:33, Wu Fengguang wrote: > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > > On Wed, 16 Nov 2011 19:44:21 +0800 > > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > > it would safer to test this patch in linux-next until next merge window, > > > > Any such bugs will not be discovered in linux-next testing. > > Yup, I'm afraid. > > > The only way to find these things in a reasonable period of time is to > > go in and find them. For example, intensive fsx-linux testing with > > concurrent heavy memory pressure on various filesystems with various > > block sizes. And of course concurrent signalling. If you're talking > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > > think. > > How are we going to measure the success/failure? Check if it > eventually resulted in filesystem corruption or whatever? There are a few different questions: 1) Checking for filesystem corruption via fsck - I find such corruption caused by stopping write early extremely unlikely. 2) Checking that we do not expose uninitialized data after a partial (possibly DIRECT_IO) write - I did not find a place where that could happen but this would be worth testing. I think I can write a test for this if people are afraid of data exposure problems. 3) Is it acceptable for write(2) to be interrupted by SIGKILL in the middle? That obviously does happen with my patches so there's no reason to test that. The question is whether someone cares or not and that can be tested only by reality check :). Since the signal is SIGKILL, the process itself cannot notice the interrupted write but someone else can. But as I already said earlier, partial writes can already be observed when the machine crashes, filesystem is close to ENOSPC or so. Arguably these are more severe error conditions than application catching SIGKILL so my patch lowers the bar for observing partial writes. But I wouldn't like to throw away a sensible thing - allow SIGKILL to interrupt a system call - just because of fear of possibility some broken app could rely on this. Sure if the reality check shows there are such broken apps and users who care enough to report, then I have nothing against biting the bullet and reverting the change... Opinions? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 13:08 ` Jan Kara @ 2011-11-23 13:27 ` Wu Fengguang 2011-11-23 15:06 ` Theodore Tso 0 siblings, 1 reply; 39+ messages in thread From: Wu Fengguang @ 2011-11-23 13:27 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Theodore Ts'o On Wed, Nov 23, 2011 at 09:08:03PM +0800, Jan Kara wrote: > On Wed 23-11-11 17:05:33, Wu Fengguang wrote: > > On Wed, Nov 23, 2011 at 06:28:05AM +0800, Andrew Morton wrote: > > > On Wed, 16 Nov 2011 19:44:21 +0800 > > > Wu Fengguang <fengguang.wu@intel.com> wrote: > > > > > > > Due to the (very low) possibility of data loss by partial writes, IMHO > > > > it would safer to test this patch in linux-next until next merge window, > > > > > > Any such bugs will not be discovered in linux-next testing. > > > > Yup, I'm afraid. > > > > > The only way to find these things in a reasonable period of time is to > > > go in and find them. For example, intensive fsx-linux testing with > > > concurrent heavy memory pressure on various filesystems with various > > > block sizes. And of course concurrent signalling. If you're talking > > > about O_DIRECT then iirc I hacked support for that into fsx-linux. I > > > think. > > > > How are we going to measure the success/failure? Check if it > > eventually resulted in filesystem corruption or whatever? > There are a few different questions: > 1) Checking for filesystem corruption via fsck - I find such corruption > caused by stopping write early extremely unlikely. Agreed. > 2) Checking that we do not expose uninitialized data after a partial > (possibly DIRECT_IO) write - I did not find a place where that could happen > but this would be worth testing. I think I can write a test for this if > people are afraid of data exposure problems. Do we already have such kind of tests in xfstests? If not, it sounds like a good gap to fill :-) > 3) Is it acceptable for write(2) to be interrupted by SIGKILL in the > middle? That obviously does happen with my patches so there's no reason > to test that. The question is whether someone cares or not and that can be > tested only by reality check :). Since the signal is SIGKILL, the process > itself cannot notice the interrupted write but someone else can. But as I > already said earlier, partial writes can already be observed when the > machine crashes, filesystem is close to ENOSPC or so. Arguably these are > more severe error conditions than application catching SIGKILL so my > patch lowers the bar for observing partial writes. But I wouldn't like to > throw away a sensible thing - allow SIGKILL to interrupt a system call - > just because of fear of possibility some broken app could rely on this. > Sure if the reality check shows there are such broken apps and users who > care enough to report, then I have nothing against biting the bullet > and reverting the change... Opinions? Reading Ted's information feed, I tend to disregard the partial write issue: since the "broken" applications will already fail and get punished in various other cases, I don't care adding one more penalty case to them :-P Thanks, Fengguang ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 13:27 ` Wu Fengguang @ 2011-11-23 15:06 ` Theodore Tso 2011-11-28 3:08 ` Wu Fengguang 0 siblings, 1 reply; 39+ messages in thread From: Theodore Tso @ 2011-11-23 15:06 UTC (permalink / raw) To: Wu Fengguang Cc: Theodore Tso, Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > Reading Ted's information feed, I tend to disregard the partial write > issue: since the "broken" applications will already fail and get > punished in various other cases, I don't care adding one more penalty > case to them :-P Just wait until you have a bunch of rabid application programmers, questioning your judgement, your morality, and even your paternity. :-) -- Ted ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-23 15:06 ` Theodore Tso @ 2011-11-28 3:08 ` Wu Fengguang 2011-11-29 14:16 ` Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Wu Fengguang @ 2011-11-28 3:08 UTC (permalink / raw) To: Theodore Tso Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov, Matthew Wilcox On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote: > > On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > > > > Reading Ted's information feed, I tend to disregard the partial write > > issue: since the "broken" applications will already fail and get > > punished in various other cases, I don't care adding one more penalty > > case to them :-P > > Just wait until you have a bunch of rabid application programmers, > questioning your judgement, your morality, and even your paternity. > :-) Ah OK, that sounds frightening. Hmm, till now every one have acknowledged the possibility of data corruption, only that people have different perceptions of the severeness. Let's rethink things this way: "Is it a _worthwhile_ risk at all?" I'm afraid not. Considering the origin of this patch [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages http://www.spinics.net/lists/linux-ext4/msg28464.html I *think* Jan's first patch is already enough for fixing the bug. IWO the patch we worried/discussed so much is really an optional one. I would imagine the easy and safe solution is to just drop it. Any objections? Thanks, Fengguang ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-28 3:08 ` Wu Fengguang @ 2011-11-29 14:16 ` Jan Kara 0 siblings, 0 replies; 39+ messages in thread From: Jan Kara @ 2011-11-29 14:16 UTC (permalink / raw) To: Wu Fengguang Cc: Theodore Tso, Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Kazuya Mio, Dmitry Monakhov, Matthew Wilcox On Mon 28-11-11 11:08:42, Wu Fengguang wrote: > On Wed, Nov 23, 2011 at 11:06:56PM +0800, Theodore Ts'o wrote: > > > > On Nov 23, 2011, at 8:27 AM, Wu Fengguang wrote: > > > > > > > > Reading Ted's information feed, I tend to disregard the partial write > > > issue: since the "broken" applications will already fail and get > > > punished in various other cases, I don't care adding one more penalty > > > case to them :-P > > > > Just wait until you have a bunch of rabid application programmers, > > questioning your judgement, your morality, and even your paternity. > > :-) > > Ah OK, that sounds frightening. Hmm, till now every one have > acknowledged the possibility of data corruption, only that > people have different perceptions of the severeness. > > Let's rethink things this way: "Is it a _worthwhile_ risk at all?" > I'm afraid not. Considering the origin of this patch > > [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages > http://www.spinics.net/lists/linux-ext4/msg28464.html > > I *think* Jan's first patch is already enough for fixing the bug. IWO > the patch we worried/discussed so much is really an optional one. I > would imagine the easy and safe solution is to just drop it. Any > objections? I still think aborting write is a cleaner solution. But I don't want to argue to death about this so if you still don't think it's a good idea, I'll respect your decision. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2011-11-29 14:16 UTC | newest] Thread overview: 39+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara 2011-11-14 12:12 ` Wu Fengguang 2011-11-14 12:37 ` Jan Kara 2011-11-14 11:10 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-14 12:12 ` Matthew Wilcox 2011-11-14 12:15 ` Wu Fengguang 2011-11-14 12:34 ` Jan Kara 2011-11-14 14:16 ` Matthew Wilcox 2011-11-14 15:30 ` Jan Kara 2011-11-14 18:44 ` Jeremy Allison 2011-11-14 11:59 ` [PATCH 0/2] Make task doing heavy writing killable Wu Fengguang 2011-11-14 12:05 ` Christoph Hellwig 2011-11-14 12:24 ` Jan Kara 2011-11-14 12:29 ` Wu Fengguang 2011-11-14 12:41 ` Christoph Hellwig 2011-11-14 13:01 ` Wu Fengguang 2011-11-14 15:28 ` Jan Kara 2011-11-14 15:32 ` Christoph Hellwig 2011-11-14 16:19 ` Jan Kara 2011-11-14 12:12 ` Jan Kara -- strict thread matches above, loose matches on Subject: below -- 2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-14 16:26 ` Christoph Hellwig 2011-11-14 16:46 ` Jan Kara 2011-11-14 20:13 ` Christoph Hellwig 2011-11-14 22:19 ` Andrew Morton 2011-11-15 11:23 ` Jan Kara 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara 2011-11-16 11:12 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-16 11:44 ` Wu Fengguang 2011-11-16 12:54 ` Jan Kara 2011-11-16 13:11 ` Wu Fengguang 2011-11-22 22:28 ` Andrew Morton 2011-11-23 9:05 ` Wu Fengguang 2011-11-23 9:50 ` Andrew Morton 2011-11-23 13:08 ` Jan Kara 2011-11-23 13:27 ` Wu Fengguang 2011-11-23 15:06 ` Theodore Tso 2011-11-28 3:08 ` Wu Fengguang 2011-11-29 14:16 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).