* [PATCH 0/2 v2] Make task in balance_dirty_pages() killable @ 2011-11-14 16:15 Jan Kara 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 0 siblings, 2 replies; 19+ messages in thread From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw) To: Wu Fengguang; +Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel Hello, here is a second iteration of the patches. Changes since v1: * slightly moved the check in balance_dirty_pages() as Fengguang requested * made balance_dirty_pages() return EINTR if fatal signal was detected * changed check for signal to check for fatal signal in generic_perform_write() to avoid unexpected results for userspace applications. Honza ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-14 16:15 ` Jan Kara 2011-11-14 16:23 ` Christoph Hellwig 2011-11-15 11:48 ` Wu Fengguang 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 1 sibling, 2 replies; 19+ messages in thread From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara There is no reason why task in balance_dirty_pages() shouldn't be killable and it helps in recovering from some error conditions (like when filesystem goes in error state and cannot accept writeback anymore but we still want to kill processes using it to be able to unmount it). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- include/linux/writeback.h | 8 ++++---- mm/page-writeback.c | 27 +++++++++++++++++++-------- 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index a378c29..eb3ecca 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -170,13 +170,13 @@ void __bdi_update_bandwidth(struct backing_dev_info *bdi, unsigned long start_time); void page_writeback_init(void); -void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, - unsigned long nr_pages_dirtied); +int balance_dirty_pages_ratelimited_nr(struct address_space *mapping, + unsigned long nr_pages_dirtied); -static inline void +static inline int balance_dirty_pages_ratelimited(struct address_space *mapping) { - balance_dirty_pages_ratelimited_nr(mapping, 1); + return balance_dirty_pages_ratelimited_nr(mapping, 1); } typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc, diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0360d1b..380279c 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1000,8 +1000,11 @@ static unsigned long bdi_max_pause(struct backing_dev_info *bdi, * the caller to wait once crossing the (background_thresh + dirty_thresh) / 2. * If we're over `background_thresh' then the writeback threads are woken to * perform some writeout. + * + * We return 0 if nothing special happened, EINTR if our wait has been + * interrupted by a fatal signal. */ -static void balance_dirty_pages(struct address_space *mapping, +static int balance_dirty_pages(struct address_space *mapping, unsigned long pages_dirtied) { unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, unsigned long pos_ratio; struct backing_dev_info *bdi = mapping->backing_dev_info; unsigned long start_time = jiffies; + int err = 0; for (;;) { /* @@ -1133,7 +1137,7 @@ pause: pages_dirtied, pause, start_time); - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(TASK_KILLABLE); io_schedule_timeout(pause); dirty_thresh = hard_dirty_limit(dirty_thresh); @@ -1145,6 +1149,11 @@ pause: */ if (nr_dirty < dirty_thresh) break; + + if (fatal_signal_pending(current)) { + err = -EINTR; + break; + } } if (!dirty_exceeded && bdi->dirty_exceeded) @@ -1168,7 +1177,7 @@ pause: } if (writeback_in_progress(bdi)) - return; + return err; /* * In laptop mode, we wait until hitting the higher threshold before @@ -1179,10 +1188,11 @@ pause: * background_thresh, to keep the amount of dirty memory low. */ if (laptop_mode) - return; + return err; if (nr_reclaimable > background_thresh) bdi_start_background_writeback(bdi); + return err; } void set_page_dirty_balance(struct page *page, int page_mkwrite) @@ -1211,15 +1221,15 @@ static DEFINE_PER_CPU(int, bdp_ratelimits); * limit we decrease the ratelimiting by a lot, to prevent individual processes * from overshooting the limit by (ratelimit_pages) each. */ -void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, - unsigned long nr_pages_dirtied) +int balance_dirty_pages_ratelimited_nr(struct address_space *mapping, + unsigned long nr_pages_dirtied) { struct backing_dev_info *bdi = mapping->backing_dev_info; int ratelimit; int *p; if (!bdi_cap_account_dirty(bdi)) - return; + return 0; ratelimit = current->nr_dirtied_pause; if (bdi->dirty_exceeded) @@ -1247,7 +1257,8 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping, preempt_enable(); if (unlikely(current->nr_dirtied >= ratelimit)) - balance_dirty_pages(mapping, current->nr_dirtied); + return balance_dirty_pages(mapping, current->nr_dirtied); + return 0; } EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr); -- 1.7.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara @ 2011-11-14 16:23 ` Christoph Hellwig 2011-11-15 11:48 ` Wu Fengguang 1 sibling, 0 replies; 19+ messages in thread From: Christoph Hellwig @ 2011-11-14 16:23 UTC (permalink / raw) To: Jan Kara Cc: Wu Fengguang, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel On Mon, Nov 14, 2011 at 05:15:23PM +0100, Jan Kara wrote: > There is no reason why task in balance_dirty_pages() shouldn't be killable > and it helps in recovering from some error conditions (like when filesystem > goes in error state and cannot accept writeback anymore but we still want to > kill processes using it to be able to unmount it). Sorry for noticing this so late, but what about killing the balance_dirty_pages_ratelimited wrapper, and only exporting a version that takes a number of pages now that we have to change the signature anyway? I'd recommend to call the new one simply balance_dirty_pages_ratelimited, that will give a compile breakge to anyone using one of the old versions and thus cause them to actually adapt and hopefull check the return value. (and not for this series but soon later - we probably really should call into balance_dirty_pages in larger batch sizes in most places). ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara 2011-11-14 16:23 ` Christoph Hellwig @ 2011-11-15 11:48 ` Wu Fengguang 2011-11-15 13:41 ` Jan Kara 1 sibling, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2011-11-15 11:48 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org > +static int balance_dirty_pages(struct address_space *mapping, > unsigned long pages_dirtied) > { > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > unsigned long pos_ratio; > struct backing_dev_info *bdi = mapping->backing_dev_info; > unsigned long start_time = jiffies; > + int err = 0; > > for (;;) { > /* > @@ -1133,7 +1137,7 @@ pause: > pages_dirtied, > pause, > start_time); > - __set_current_state(TASK_UNINTERRUPTIBLE); > + __set_current_state(TASK_KILLABLE); > io_schedule_timeout(pause); > > dirty_thresh = hard_dirty_limit(dirty_thresh); > @@ -1145,6 +1149,11 @@ pause: > */ > if (nr_dirty < dirty_thresh) > break; > + > + if (fatal_signal_pending(current)) { > + err = -EINTR; > + break; > + } > } The other alternative is to raise the limit on fatal_signal_pending: if (fatal_signal_pending(current) && nr_dirty < dirty_thresh + dirty_thresh / 2) break; That should work well enough in practice and avoids touching the fs code. Thanks, Fengguang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-15 11:48 ` Wu Fengguang @ 2011-11-15 13:41 ` Jan Kara 2011-11-15 14:15 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2011-11-15 13:41 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Tue 15-11-11 19:48:44, Wu Fengguang wrote: > > +static int balance_dirty_pages(struct address_space *mapping, > > unsigned long pages_dirtied) > > { > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > > unsigned long pos_ratio; > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > unsigned long start_time = jiffies; > > + int err = 0; > > > > for (;;) { > > /* > > @@ -1133,7 +1137,7 @@ pause: > > pages_dirtied, > > pause, > > start_time); > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > + __set_current_state(TASK_KILLABLE); > > io_schedule_timeout(pause); > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > @@ -1145,6 +1149,11 @@ pause: > > */ > > if (nr_dirty < dirty_thresh) > > break; > > + > > + if (fatal_signal_pending(current)) { > > + err = -EINTR; > > + break; > > + } > > } > > The other alternative is to raise the limit on fatal_signal_pending: > > if (fatal_signal_pending(current) && > nr_dirty < dirty_thresh + dirty_thresh / 2) > break; > > That should work well enough in practice and avoids touching the fs code. Sorry, but I fail to see what would this bring us... Can you elaborate a bit please? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-15 13:41 ` Jan Kara @ 2011-11-15 14:15 ` Wu Fengguang 2011-11-15 14:44 ` Jan Kara 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2011-11-15 14:15 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Tue, Nov 15, 2011 at 09:41:27PM +0800, Jan Kara wrote: > On Tue 15-11-11 19:48:44, Wu Fengguang wrote: > > > +static int balance_dirty_pages(struct address_space *mapping, > > > unsigned long pages_dirtied) > > > { > > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > > > unsigned long pos_ratio; > > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > > unsigned long start_time = jiffies; > > > + int err = 0; > > > > > > for (;;) { > > > /* > > > @@ -1133,7 +1137,7 @@ pause: > > > pages_dirtied, > > > pause, > > > start_time); > > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > > + __set_current_state(TASK_KILLABLE); > > > io_schedule_timeout(pause); > > > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > > @@ -1145,6 +1149,11 @@ pause: > > > */ > > > if (nr_dirty < dirty_thresh) > > > break; > > > + > > > + if (fatal_signal_pending(current)) { > > > + err = -EINTR; > > > + break; > > > + } > > > } > > > > The other alternative is to raise the limit on fatal_signal_pending: > > > > if (fatal_signal_pending(current) && > > nr_dirty < dirty_thresh + dirty_thresh / 2) > > break; > > > > That should work well enough in practice and avoids touching the fs code. > Sorry, but I fail to see what would this bring us... Can you elaborate a > bit please? It's not bringing us something, but allows us to get rid of patch 2 as well as adding fatal_signal_pending() tests to all the other fs. So no more worries about partial writes :) Sure it leaves ->write_begin() blocks unhandled, however that should be much less a problem than the blocks inside balance_dirty_pages(). Thanks, Fengguang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-15 14:15 ` Wu Fengguang @ 2011-11-15 14:44 ` Jan Kara 0 siblings, 0 replies; 19+ messages in thread From: Jan Kara @ 2011-11-15 14:44 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org On Tue 15-11-11 22:15:28, Wu Fengguang wrote: > On Tue, Nov 15, 2011 at 09:41:27PM +0800, Jan Kara wrote: > > On Tue 15-11-11 19:48:44, Wu Fengguang wrote: > > > > +static int balance_dirty_pages(struct address_space *mapping, > > > > unsigned long pages_dirtied) > > > > { > > > > unsigned long nr_reclaimable; /* = file_dirty + unstable_nfs */ > > > > @@ -1020,6 +1023,7 @@ static void balance_dirty_pages(struct address_space *mapping, > > > > unsigned long pos_ratio; > > > > struct backing_dev_info *bdi = mapping->backing_dev_info; > > > > unsigned long start_time = jiffies; > > > > + int err = 0; > > > > > > > > for (;;) { > > > > /* > > > > @@ -1133,7 +1137,7 @@ pause: > > > > pages_dirtied, > > > > pause, > > > > start_time); > > > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > > > + __set_current_state(TASK_KILLABLE); > > > > io_schedule_timeout(pause); > > > > > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > > > @@ -1145,6 +1149,11 @@ pause: > > > > */ > > > > if (nr_dirty < dirty_thresh) > > > > break; > > > > + > > > > + if (fatal_signal_pending(current)) { > > > > + err = -EINTR; > > > > + break; > > > > + } > > > > } > > > > > > The other alternative is to raise the limit on fatal_signal_pending: > > > > > > if (fatal_signal_pending(current) && > > > nr_dirty < dirty_thresh + dirty_thresh / 2) > > > break; > > > > > > That should work well enough in practice and avoids touching the fs code. > > Sorry, but I fail to see what would this bring us... Can you elaborate a > > bit please? > > It's not bringing us something, but allows us to get rid of patch 2 > as well as adding fatal_signal_pending() tests to all the other fs. > > So no more worries about partial writes :) > > Sure it leaves ->write_begin() blocks unhandled, however that should > be much less a problem than the blocks inside balance_dirty_pages(). Ah, OK, I see. Frankly, I'd rather keep the loop break in generic_perform_write() and have the code straightforward. I don't see any security (data exposure) problem there and if someone complains on the grounds that the behavior has changed, we can revert the change in the worst case. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara @ 2011-11-14 16:15 ` Jan Kara 2011-11-14 16:26 ` Christoph Hellwig 2011-11-14 22:19 ` Andrew Morton 1 sibling, 2 replies; 19+ messages in thread From: Jan Kara @ 2011-11-14 16:15 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara Currently write(2) to a file is not interruptible by a signal. Sometimes this is desirable (e.g. when you want to quickly kill a process hogging your disk or when some process gets blocked in balance_dirty_pages() indefinitely due to a filesystem being in an error condition). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/filemap.c | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index c0018f2..166b30e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, iov_iter_count(i)); again: - /* * Bring in the user page that we will copy from _first_. * Otherwise there's a nasty deadlock on copying from the @@ -2463,7 +2462,15 @@ again: written += copied; balance_dirty_pages_ratelimited(mapping); - + /* + * We check the signal independently of balance_dirty_pages() + * because we need not wait and check for signal there although + * this loop could have taken significant amount of time... + */ + if (fatal_signal_pending(current)) { + status = -EINTR; + break; + } } while (iov_iter_count(i)); return written ? written : status; -- 1.7.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara @ 2011-11-14 16:26 ` Christoph Hellwig 2011-11-14 16:46 ` Jan Kara 2011-11-14 22:19 ` Andrew Morton 1 sibling, 1 reply; 19+ messages in thread From: Christoph Hellwig @ 2011-11-14 16:26 UTC (permalink / raw) To: Jan Kara Cc: Wu Fengguang, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel On Mon, Nov 14, 2011 at 05:15:24PM +0100, Jan Kara wrote: > Currently write(2) to a file is not interruptible by a signal. Sometimes this > is desirable (e.g. when you want to quickly kill a process hogging your disk or > when some process gets blocked in balance_dirty_pages() indefinitely due to a > filesystem being in an error condition). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/filemap.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index c0018f2..166b30e 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > iov_iter_count(i)); > > again: > - > /* > * Bring in the user page that we will copy from _first_. > * Otherwise there's a nasty deadlock on copying from the > @@ -2463,7 +2462,15 @@ again: > written += copied; > > balance_dirty_pages_ratelimited(mapping); > - > + /* > + * We check the signal independently of balance_dirty_pages() > + * because we need not wait and check for signal there although > + * this loop could have taken significant amount of time... > + */ > + if (fatal_signal_pending(current)) { > + status = -EINTR; > + break; > + } Hmm. If we need to check again every time adding the return value to balance_dirty_pages is rather pointless. I have a bit of a problem parsing the comment - does it try to say that we might spend too much time after the fatal_signal_pending in the balance_dirty_pages code so that we have to check it again? Why not repeat the check at the end of balance_dirty_pages_ratelimited and thus avoid having to duplicate the thing in all callers? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:26 ` Christoph Hellwig @ 2011-11-14 16:46 ` Jan Kara 2011-11-14 20:13 ` Christoph Hellwig 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2011-11-14 16:46 UTC (permalink / raw) To: Christoph Hellwig Cc: Jan Kara, Wu Fengguang, Andrew Morton, Al Viro, linux-fsdevel On Mon 14-11-11 11:26:00, Christoph Hellwig wrote: > On Mon, Nov 14, 2011 at 05:15:24PM +0100, Jan Kara wrote: > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > filesystem being in an error condition). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/filemap.c | 11 +++++++++-- > > 1 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index c0018f2..166b30e 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > iov_iter_count(i)); > > > > again: > > - > > /* > > * Bring in the user page that we will copy from _first_. > > * Otherwise there's a nasty deadlock on copying from the > > @@ -2463,7 +2462,15 @@ again: > > written += copied; > > > > balance_dirty_pages_ratelimited(mapping); > > - > > + /* > > + * We check the signal independently of balance_dirty_pages() > > + * because we need not wait and check for signal there although > > + * this loop could have taken significant amount of time... > > + */ > > + if (fatal_signal_pending(current)) { > > + status = -EINTR; > > + break; > > + } > > Hmm. If we need to check again every time adding the return value to > balance_dirty_pages is rather pointless. > > I have a bit of a problem parsing the comment - does it try to say that > we might spend too much time after the fatal_signal_pending in the > balance_dirty_pages code so that we have to check it again? Why not > repeat the check at the end of balance_dirty_pages_ratelimited and thus > avoid having to duplicate the thing in all callers? The point I was trying to get across is that all: iov_iter_fault_in_readable() ->write_begin() ... ->write_end() can take a rather long time. Considering that balance_dirty_pages_ratelimited() needn't call balance_dirty_pages() or balance_dirty_pages() can just return without doing anything because there aren't many dirty pages, the end result is that balance_dirty_pages_ratelimited() won't return EINTR for rather long time although there is a signal pending. That's why I want to check for signal pending directly in the loop in generic_perform_write(). There can be loops where the only blocking point is in balance_dirty_pages() and then using the return value makes sense but I think such loops would be in minority so maybe the return value doesn't make much sense after all. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:46 ` Jan Kara @ 2011-11-14 20:13 ` Christoph Hellwig 0 siblings, 0 replies; 19+ messages in thread From: Christoph Hellwig @ 2011-11-14 20:13 UTC (permalink / raw) To: Jan Kara Cc: Christoph Hellwig, Wu Fengguang, Andrew Morton, Al Viro, linux-fsdevel On Mon, Nov 14, 2011 at 05:46:47PM +0100, Jan Kara wrote: > There can be loops where the only blocking point is in > balance_dirty_pages() and then using the return value makes sense but > I think such loops would be in minority so maybe the return value doesn't > make much sense after all. Yes, maybe the return value isn't that smart. Sorry for all the noise. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-14 16:26 ` Christoph Hellwig @ 2011-11-14 22:19 ` Andrew Morton 2011-11-15 11:23 ` Jan Kara 1 sibling, 1 reply; 19+ messages in thread From: Andrew Morton @ 2011-11-14 22:19 UTC (permalink / raw) To: Jan Kara; +Cc: Wu Fengguang, Christoph Hellwig, Al Viro, linux-fsdevel On Mon, 14 Nov 2011 17:15:24 +0100 Jan Kara <jack@suse.cz> wrote: > Currently write(2) to a file is not interruptible by a signal. Sometimes this > is desirable (e.g. when you want to quickly kill a process hogging your disk or > when some process gets blocked in balance_dirty_pages() indefinitely due to a > filesystem being in an error condition). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/filemap.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/mm/filemap.c b/mm/filemap.c > index c0018f2..166b30e 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > iov_iter_count(i)); > > again: > - > /* > * Bring in the user page that we will copy from _first_. > * Otherwise there's a nasty deadlock on copying from the > @@ -2463,7 +2462,15 @@ again: > written += copied; > > balance_dirty_pages_ratelimited(mapping); > - > + /* > + * We check the signal independently of balance_dirty_pages() > + * because we need not wait and check for signal there although > + * this loop could have taken significant amount of time... > + */ > + if (fatal_signal_pending(current)) { > + status = -EINTR; > + break; > + } > } while (iov_iter_count(i)); > > return written ? written : status; Will this permit the interruption of things like fsync() or sync()? If so, considerable pondering is needed. Also I worry about stuff like the use of buffered write to finish off loose ends in direct-IO writing. Sometimes these writes MUST complete, to prevent exposing unwritten disk blocks to a subsequent read. Will a well-timed ^C break this? If "no" then does this change introduce risk that we'll later accidentally introduce such a security hole? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2] fs: Make write(2) interruptible by a signal 2011-11-14 22:19 ` Andrew Morton @ 2011-11-15 11:23 ` Jan Kara 0 siblings, 0 replies; 19+ messages in thread From: Jan Kara @ 2011-11-15 11:23 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Wu Fengguang, Christoph Hellwig, Al Viro, linux-fsdevel On Mon 14-11-11 14:19:54, Andrew Morton wrote: > On Mon, 14 Nov 2011 17:15:24 +0100 > Jan Kara <jack@suse.cz> wrote: > > > Currently write(2) to a file is not interruptible by a signal. Sometimes this > > is desirable (e.g. when you want to quickly kill a process hogging your disk or > > when some process gets blocked in balance_dirty_pages() indefinitely due to a > > filesystem being in an error condition). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/filemap.c | 11 +++++++++-- > > 1 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > index c0018f2..166b30e 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file, > > iov_iter_count(i)); > > > > again: > > - > > /* > > * Bring in the user page that we will copy from _first_. > > * Otherwise there's a nasty deadlock on copying from the > > @@ -2463,7 +2462,15 @@ again: > > written += copied; > > > > balance_dirty_pages_ratelimited(mapping); > > - > > + /* > > + * We check the signal independently of balance_dirty_pages() > > + * because we need not wait and check for signal there although > > + * this loop could have taken significant amount of time... > > + */ > > + if (fatal_signal_pending(current)) { > > + status = -EINTR; > > + break; > > + } > > } while (iov_iter_count(i)); > > > > return written ? written : status; > > Will this permit the interruption of things like fsync() or sync()? If > so, considerable pondering is needed. No, fsync() or sync() are not influenced. > Also I worry about stuff like the use of buffered write to finish off > loose ends in direct-IO writing. Sometimes these writes MUST complete, > to prevent exposing unwritten disk blocks to a subsequent read. Will a > well-timed ^C break this? If "no" then does this change introduce risk > that we'll later accidentally introduce such a security hole? That's a good question! Well timed SIGKILL can stop buffered write completing bits of direct IO write. But looking into the code I fail to see where we rely on buffered write being completed. I find it somewhere on a boundary of a filesystem bug to return from direct IO with a potential of stale data exposure... For writes beyond i_size, we increase i_size only by the amount which was successfully written so there is no risk of exposure. Otherwise, direct IO would have to fill a hole to allocate a block - ext? filesystems, reiserfs, ocfs2, and other simple filesystems avoid doing that. XFS fills a hole but takes care to init everything. So I don't think anyone really expects buffered write to cover his back. To sum up the only result of interrupted buffered write after direct IO write seems to be that some other application could see only part of the write completed. That is possible in case of multipage buffered writes as well and e.g. with ext3/4 such situation is possible even without these patches (due to conditions like ENOSPC or due to a system crash - admittedly more serious conditions than someone sending SIGKILL). Thanks for your comments. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 0/2 v3] Make task in balance_dirty_pages() killable @ 2011-11-16 11:12 Jan Kara 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw) To: Wu Fengguang; +Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel Hello, since the discussion has calmed down, here is a third iteration of the patches. Fengguang, can you put them into your tree and merge them with Linus? Thanks. Changes since v2: * removed return value of balance_dirty_pages() since it didn't seem that useful after all. * kept fatal_signal_pending() check in generic_perform_write() since I don't think possible partial writes when app receives SIGKILL are a problem. Changes since v1: * slightly moved the check in balance_dirty_pages() as Fengguang requested * made balance_dirty_pages() return EINTR if fatal signal was detected * changed check for signal to check for fatal signal in generic_perform_write() to avoid unexpected results for userspace applications. Honza ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-16 11:12 ` Jan Kara 2011-11-16 11:28 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2011-11-16 11:12 UTC (permalink / raw) To: Wu Fengguang Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel, Jan Kara There is no reason why task in balance_dirty_pages() shouldn't be killable and it helps in recovering from some error conditions (like when filesystem goes in error state and cannot accept writeback anymore but we still want to kill processes using it to be able to unmount it). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/page-writeback.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0360d1b..39f5805 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1133,7 +1133,7 @@ pause: pages_dirtied, pause, start_time); - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(TASK_KILLABLE); io_schedule_timeout(pause); dirty_thresh = hard_dirty_limit(dirty_thresh); @@ -1145,6 +1145,9 @@ pause: */ if (nr_dirty < dirty_thresh) break; + + if (fatal_signal_pending(current)) + break; } if (!dirty_exceeded && bdi->dirty_exceeded) -- 1.7.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara @ 2011-11-16 11:28 ` Wu Fengguang 2011-11-16 12:58 ` Jan Kara 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2011-11-16 11:28 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Neil Brown, KOSAKI Motohiro, Anton Altaparmakov On Wed, Nov 16, 2011 at 07:12:14PM +0800, Jan Kara wrote: > There is no reason why task in balance_dirty_pages() shouldn't be killable > and it helps in recovering from some error conditions (like when filesystem > goes in error state and cannot accept writeback anymore but we still want to > kill processes using it to be able to unmount it). Here I'd like to add a paragraph from my original patch: The follow up patches will further abort the generic_perform_write() and other filesystem write loops, to avoid large write + SIGKILL combination exceeding the dirty limit and possibly strange OOM. Thanks, Fengguang > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/page-writeback.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 0360d1b..39f5805 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1133,7 +1133,7 @@ pause: > pages_dirtied, > pause, > start_time); > - __set_current_state(TASK_UNINTERRUPTIBLE); > + __set_current_state(TASK_KILLABLE); > io_schedule_timeout(pause); > > dirty_thresh = hard_dirty_limit(dirty_thresh); > @@ -1145,6 +1145,9 @@ pause: > */ > if (nr_dirty < dirty_thresh) > break; > + > + if (fatal_signal_pending(current)) > + break; > } > > if (!dirty_exceeded && bdi->dirty_exceeded) > -- > 1.7.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-16 11:28 ` Wu Fengguang @ 2011-11-16 12:58 ` Jan Kara 0 siblings, 0 replies; 19+ messages in thread From: Jan Kara @ 2011-11-16 12:58 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, Andrew Morton, Christoph Hellwig, Al Viro, linux-fsdevel@vger.kernel.org, Neil Brown, KOSAKI Motohiro, Anton Altaparmakov On Wed 16-11-11 19:28:19, Wu Fengguang wrote: > On Wed, Nov 16, 2011 at 07:12:14PM +0800, Jan Kara wrote: > > There is no reason why task in balance_dirty_pages() shouldn't be killable > > and it helps in recovering from some error conditions (like when filesystem > > goes in error state and cannot accept writeback anymore but we still want to > > kill processes using it to be able to unmount it). > > Here I'd like to add a paragraph from my original patch: > > The follow up patches will further abort the generic_perform_write() and > other filesystem write loops, to avoid large write + SIGKILL combination > exceeding the dirty limit and possibly strange OOM. Sure, feel free to add the paragraph to the changelog. Honza > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/page-writeback.c | 5 ++++- > > 1 files changed, 4 insertions(+), 1 deletions(-) > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index 0360d1b..39f5805 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -1133,7 +1133,7 @@ pause: > > pages_dirtied, > > pause, > > start_time); > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > + __set_current_state(TASK_KILLABLE); > > io_schedule_timeout(pause); > > > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > @@ -1145,6 +1145,9 @@ pause: > > */ > > if (nr_dirty < dirty_thresh) > > break; > > + > > + if (fatal_signal_pending(current)) > > + break; > > } > > > > if (!dirty_exceeded && bdi->dirty_exceeded) > > -- > > 1.7.1 -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 0/2] Make task doing heavy writing killable @ 2011-11-14 11:10 Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw) To: linux-fsdevel Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig Hello, these two patches aim at making task waiting in balance_dirty_pages() killable. This is desirable because otherwise if filesystem stops accepting writes (e.g. if device has been removed or other serious error condidion) we have a task stuck in D state forever. I'm not sure who should merge these two patches... Al, Fengguang? Honza ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara @ 2011-11-14 11:10 ` Jan Kara 2011-11-14 12:12 ` Wu Fengguang 0 siblings, 1 reply; 19+ messages in thread From: Jan Kara @ 2011-11-14 11:10 UTC (permalink / raw) To: linux-fsdevel Cc: Wu Fengguang, Al Viro, k-mio, Andrew Morton, Christoph Hellwig, Jan Kara There is no reason why task in balance_dirty_pages() shouldn't be killable and it helps in recovering from some error conditions (like when filesystem goes in error state and cannot accept writeback anymore but we still want to kill processes using it to be able to unmount it). Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Jan Kara <jack@suse.cz> --- mm/page-writeback.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 0360d1b..e83c286 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1133,8 +1133,10 @@ pause: pages_dirtied, pause, start_time); - __set_current_state(TASK_UNINTERRUPTIBLE); + __set_current_state(TASK_KILLABLE); io_schedule_timeout(pause); + if (fatal_signal_pending(current)) + break; dirty_thresh = hard_dirty_limit(dirty_thresh); /* -- 1.7.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara @ 2011-11-14 12:12 ` Wu Fengguang 2011-11-14 12:37 ` Jan Kara 0 siblings, 1 reply; 19+ messages in thread From: Wu Fengguang @ 2011-11-14 12:12 UTC (permalink / raw) To: Jan Kara Cc: linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote: > There is no reason why task in balance_dirty_pages() shouldn't be killable > and it helps in recovering from some error conditions (like when filesystem > goes in error state and cannot accept writeback anymore but we still want to > kill processes using it to be able to unmount it). > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > mm/page-writeback.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 0360d1b..e83c286 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1133,8 +1133,10 @@ pause: > pages_dirtied, > pause, > start_time); > - __set_current_state(TASK_UNINTERRUPTIBLE); > + __set_current_state(TASK_KILLABLE); Dumb question: will the task continue to show up as 'D' in ps? > io_schedule_timeout(pause); > + if (fatal_signal_pending(current)) > + break; I'd like to move that several lines below, to the end of the loop, so that the condition is not tested at all in normal executions. > dirty_thresh = hard_dirty_limit(dirty_thresh); > /* > -- > 1.7.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm: Make task in balance_dirty_pages() killable 2011-11-14 12:12 ` Wu Fengguang @ 2011-11-14 12:37 ` Jan Kara 0 siblings, 0 replies; 19+ messages in thread From: Jan Kara @ 2011-11-14 12:37 UTC (permalink / raw) To: Wu Fengguang Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Al Viro, k-mio@sx.jp.nec.com, Andrew Morton, Christoph Hellwig On Mon 14-11-11 20:12:45, Wu Fengguang wrote: > On Mon, Nov 14, 2011 at 07:10:29PM +0800, Jan Kara wrote: > > There is no reason why task in balance_dirty_pages() shouldn't be killable > > and it helps in recovering from some error conditions (like when filesystem > > goes in error state and cannot accept writeback anymore but we still want to > > kill processes using it to be able to unmount it). > > > > Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > mm/page-writeback.c | 4 +++- > > 1 files changed, 3 insertions(+), 1 deletions(-) > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > > index 0360d1b..e83c286 100644 > > --- a/mm/page-writeback.c > > +++ b/mm/page-writeback.c > > @@ -1133,8 +1133,10 @@ pause: > > pages_dirtied, > > pause, > > start_time); > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > + __set_current_state(TASK_KILLABLE); > > Dumb question: will the task continue to show up as 'D' in ps? Yes, it will. The only difference is that the task will be woken when SIGKILL arrives. > > io_schedule_timeout(pause); > > + if (fatal_signal_pending(current)) > > + break; > > I'd like to move that several lines below, to the end of the loop, > so that the condition is not tested at all in normal executions. OK, will do that. > > dirty_thresh = hard_dirty_limit(dirty_thresh); > > /* Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2011-11-16 12:58 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-14 16:15 [PATCH 0/2 v2] Make task in balance_dirty_pages() killable Jan Kara 2011-11-14 16:15 ` [PATCH 1/2] mm: " Jan Kara 2011-11-14 16:23 ` Christoph Hellwig 2011-11-15 11:48 ` Wu Fengguang 2011-11-15 13:41 ` Jan Kara 2011-11-15 14:15 ` Wu Fengguang 2011-11-15 14:44 ` Jan Kara 2011-11-14 16:15 ` [PATCH 2/2] fs: Make write(2) interruptible by a signal Jan Kara 2011-11-14 16:26 ` Christoph Hellwig 2011-11-14 16:46 ` Jan Kara 2011-11-14 20:13 ` Christoph Hellwig 2011-11-14 22:19 ` Andrew Morton 2011-11-15 11:23 ` Jan Kara -- strict thread matches above, loose matches on Subject: below -- 2011-11-16 11:12 [PATCH 0/2 v3] Make task in balance_dirty_pages() killable Jan Kara 2011-11-16 11:12 ` [PATCH 1/2] mm: " Jan Kara 2011-11-16 11:28 ` Wu Fengguang 2011-11-16 12:58 ` Jan Kara 2011-11-14 11:10 [PATCH 0/2] Make task doing heavy writing killable Jan Kara 2011-11-14 11:10 ` [PATCH 1/2] mm: Make task in balance_dirty_pages() killable Jan Kara 2011-11-14 12:12 ` Wu Fengguang 2011-11-14 12:37 ` Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).