[PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks
@ 2010-08-05 18:53 Jan Kara
  2010-08-05 18:53 ` [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread Jan Kara
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jan Kara @ 2010-08-05 18:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, hch, jaxboe, Wu Fengguang

  Hi,

  attached is a second version of my patch set to modify writeback code
so that we avoid livelocks of sync(1) during usual load (such as copying
trees etc.). I have just updated changelogs to reflect my discussions
with Christoph since the last version.
  Christoph, are you OK with the patches after more explanation?
Andrew, could you merge the patches if Christoph does not object? Thanks.

								Honza

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-05 18:53 [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Jan Kara
@ 2010-08-05 18:53 ` Jan Kara
  2010-08-05 19:38   ` Jens Axboe
  2010-08-05 23:45   ` Andrew Morton
  2010-08-05 18:53 ` [PATCH 2/3 v2] mm: Fix writeback_in_progress() Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2010-08-05 18:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, hch, jaxboe, Wu Fengguang, Jan Kara

Background writeback and kupdate-style writeback are easily livelockable (from
a definition of their target). This is inconvenient because it can make sync(1)
stall forever waiting on its queued work to be finished.  Generally, if someone
has a particular requirement for writeback he needs, it makes sense to give it
preference over a generic background dirty page cleaning. As soon as that work
is done, flusher thread will return back to background cleaning if it is
needed. So lets just interrupt background and kupdate writeback if there is
some other work to do to fix the livelocking problem.

CC: hch@infradead.org
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d5be169..542471e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
 			break;
 
 		/*
+		 * Background writeout and kupdate-style writeback are
+		 * easily livelockable. Stop them if there is other work
+		 * to do so that e.g. sync can proceed.
+		 */
+		if ((work->for_background || work->for_kupdate) &&
+		    !list_empty(&wb->bdi->work_list))
+			break;
+		/*
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work  queued for the thread
  2010-08-05 18:53 ` [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread Jan Kara
@ 2010-08-05 19:38   ` Jens Axboe
  2010-08-05 23:45   ` Andrew Morton
  1 sibling, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2010-08-05 19:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-fsdevel@vger.kernel.org, hch@infradead.org,
	Wu Fengguang

On 08/05/2010 08:53 PM, Jan Kara wrote:
> Background writeback and kupdate-style writeback are easily livelockable (from
> a definition of their target). This is inconvenient because it can make sync(1)
> stall forever waiting on its queued work to be finished.  Generally, if someone
> has a particular requirement for writeback he needs, it makes sense to give it
> preference over a generic background dirty page cleaning. As soon as that work
> is done, flusher thread will return back to background cleaning if it is
> needed. So lets just interrupt background and kupdate writeback if there is
> some other work to do to fix the livelocking problem.

Looks good to me.

Acked-by: Jens Axboe <jaxboe@fusionio.com>

-- 
Jens Axboe

Confidentiality Notice: This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-05 18:53 ` [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread Jan Kara
  2010-08-05 19:38   ` Jens Axboe
@ 2010-08-05 23:45   ` Andrew Morton
  2010-08-07 16:04     ` Wu Fengguang
                       ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Andrew Morton @ 2010-08-05 23:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu,  5 Aug 2010 20:53:17 +0200
Jan Kara <jack@suse.cz> wrote:

> Background writeback and kupdate-style writeback

What the heck's the difference between "Background writeback" and
"kupdate-style" writeback?  afacit "background" means "not due to
kupdate, but due to a vmscan poke or something like that".  But the
terms aren't defined anywhere and the wb_writeback_work fields are
uncommented and the functions are undocumented and no wonder we keep
making such a mess of this code.

> are easily livelockable (from
> a definition of their target).

Please fully describe the livelock scenario(s).

> This is inconvenient because it can make sync(1)
> stall forever waiting on its queued work to be finished.

And please fully describe the reason for the stall of sync(1).


Because if these things _are_ described then others are in a better
position to review your proposed fix and they are in a better position
to propose alternative fixes, no?

>  Generally, if someone
> has a particular requirement for writeback he needs, it makes sense to give it
> preference over a generic background dirty page cleaning. As soon as that work
> is done, flusher thread will return back to background cleaning if it is
> needed. So lets just interrupt background and kupdate writeback if there is
> some other work to do to fix the livelocking problem.
> 
> CC: hch@infradead.org
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/fs-writeback.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index d5be169..542471e 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
>  			break;
>  
>  		/*
> +		 * Background writeout and kupdate-style writeback are
> +		 * easily livelockable. Stop them if there is other work
> +		 * to do so that e.g. sync can proceed.
> +		 */
> +		if ((work->for_background || work->for_kupdate) &&
> +		    !list_empty(&wb->bdi->work_list))
> +			break;
> +		/*

So what happens if an application sits in a loop doing write&fsync to a
file?  The vm's call for help gets ignored and your data doesn't get
written back for three days??

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-05 23:45   ` Andrew Morton
@ 2010-08-07 16:04     ` Wu Fengguang
  2010-08-08  2:43     ` Jan Kara
  2010-08-08  4:12     ` Dave Chinner
  2 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-08-07 16:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, linux-fsdevel@vger.kernel.org, hch@infradead.org,
	jaxboe@fusionio.com

On Fri, Aug 06, 2010 at 07:45:35AM +0800, Andrew Morton wrote:
> On Thu,  5 Aug 2010 20:53:17 +0200
> Jan Kara <jack@suse.cz> wrote:
> 
> > Background writeback and kupdate-style writeback
> 
> What the heck's the difference between "Background writeback" and
> "kupdate-style" writeback?  afacit "background" means "not due to
> kupdate, but due to a vmscan poke or something like that".  But the
> terms aren't defined anywhere and the wb_writeback_work fields are
> uncommented and the functions are undocumented and no wonder we keep
> making such a mess of this code.

I happen to describe these terms in another email :)

...there are always four main writeback goals/semantics:
- periodic      stop when all 30s-old inodes are written
- background    stop when background threshold is reached
- nr_pages      stop when nr_pages written (or when all clean)
- sync          stop when all older-than-sync-time inodes/pages are written

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-05 23:45   ` Andrew Morton
  2010-08-07 16:04     ` Wu Fengguang
@ 2010-08-08  2:43     ` Jan Kara
  2010-08-08  3:10       ` Wu Fengguang
  2010-08-08  4:12     ` Dave Chinner
  2 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2010-08-08  2:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu 05-08-10 16:45:35, Andrew Morton wrote:
> On Thu,  5 Aug 2010 20:53:17 +0200
> Jan Kara <jack@suse.cz> wrote:
> > Background writeback and kupdate-style writeback
> 
> What the heck's the difference between "Background writeback" and
> "kupdate-style" writeback?  afacit "background" means "not due to
> kupdate, but due to a vmscan poke or something like that".  But the
> terms aren't defined anywhere and the wb_writeback_work fields are
> uncommented and the functions are undocumented and no wonder we keep
> making such a mess of this code.
  By "background" I mean the writeback we do when dirty_background_ratio is
exceeded. By "kupdate" I mean the writeback we do to write out inodes older
than dirty_expire_centisecs.

> > are easily livelockable (from
> > a definition of their target).
> 
> Please fully describe the livelock scenario(s).
> 
> > This is inconvenient because it can make sync(1)
> > stall forever waiting on its queued work to be finished.
> 
> And please fully describe the reason for the stall of sync(1).
  Ok, will do.

> Because if these things _are_ described then others are in a better
> position to review your proposed fix and they are in a better position
> to propose alternative fixes, no?
> 
> >  Generally, if someone
> > has a particular requirement for writeback he needs, it makes sense to give it
> > preference over a generic background dirty page cleaning. As soon as that work
> > is done, flusher thread will return back to background cleaning if it is
> > needed. So lets just interrupt background and kupdate writeback if there is
> > some other work to do to fix the livelocking problem.
> > 
> > CC: hch@infradead.org
> > Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/fs-writeback.c |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index d5be169..542471e 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> >  			break;
> >  
> >  		/*
> > +		 * Background writeout and kupdate-style writeback are
> > +		 * easily livelockable. Stop them if there is other work
> > +		 * to do so that e.g. sync can proceed.
> > +		 */
> > +		if ((work->for_background || work->for_kupdate) &&
> > +		    !list_empty(&wb->bdi->work_list))
> > +			break;
> > +		/*
> 
> So what happens if an application sits in a loop doing write&fsync to a
> file?  The vm's call for help gets ignored and your data doesn't get
> written back for three days??
  write & fsync wouldn't influece this because fsync() doesn't queue any
work for flusher thread (all the IO is done on behalf of the process doing
fsync()). If someone would be doing:
  while (1) sync();
Then this would make bdi-flusher thread ignore any VM's requests. But we
won't have much dirty data in this case anyway.
  The subtle thing here is that noone actually ever calls flusher thread to
do less work than it does when doing "kupdate" or "background" writeback as
defined above. But if we grow some calls to flusher thread for just a
limited amount of pages in future, then your are right it could be a
problem especially if flusher thread could be flooded with such requests.
I'll add above to the changelog as well.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-08  2:43     ` Jan Kara
@ 2010-08-08  3:10       ` Wu Fengguang
  0 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-08-08  3:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-fsdevel@vger.kernel.org, hch@infradead.org,
	jaxboe@fusionio.com

> > > @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> > >  			break;
> > >  
> > >  		/*
> > > +		 * Background writeout and kupdate-style writeback are
> > > +		 * easily livelockable. Stop them if there is other work
> > > +		 * to do so that e.g. sync can proceed.
> > > +		 */
> > > +		if ((work->for_background || work->for_kupdate) &&
> > > +		    !list_empty(&wb->bdi->work_list))
> > > +			break;
> > > +		/*
> > 
> > So what happens if an application sits in a loop doing write&fsync to a
> > file?  The vm's call for help gets ignored and your data doesn't get
> > written back for three days??
>   write & fsync wouldn't influece this because fsync() doesn't queue any
> work for flusher thread (all the IO is done on behalf of the process doing
> fsync()).

Right. The fsync functions will call into __filemap_fdatawrite_range()
to start writeback directly, instead of relaying to the flusher threads.

> If someone would be doing:
>   while (1) sync();
> Then this would make bdi-flusher thread ignore any VM's requests.

Yes, at least for now.

> But we won't have much dirty data in this case anyway.

With a heavy dirtier, it's still possible to maintain 20% dirty pages
when we are busy sync()ing.

It helps to knock down the dirty limit in this case.

>   The subtle thing here is that noone actually ever calls flusher thread to
> do less work than it does when doing "kupdate" or "background" writeback as
> defined above.

Yes.

> But if we grow some calls to flusher thread for just a
> limited amount of pages in future, then your are right it could be a
> problem especially if flusher thread could be flooded with such requests.

There's already such an interface? With nr_pages and !for_background,
such as wakeup_flusher_threads(total_scanned) in the direct reclaim
path. I suspect such requests will be piling up on memory pressure,
each request will do one kzalloc() -- it's memory allocation at vmscan
time!

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-05 23:45   ` Andrew Morton
  2010-08-07 16:04     ` Wu Fengguang
  2010-08-08  2:43     ` Jan Kara
@ 2010-08-08  4:12     ` Dave Chinner
  2010-08-08  7:29       ` Wu Fengguang
  2 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2010-08-08  4:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu, Aug 05, 2010 at 04:45:35PM -0700, Andrew Morton wrote:
> On Thu,  5 Aug 2010 20:53:17 +0200
> Jan Kara <jack@suse.cz> wrote:
> > ---
> >  fs/fs-writeback.c |    8 ++++++++
> >  1 files changed, 8 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index d5be169..542471e 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> >  			break;
> >  
> >  		/*
> > +		 * Background writeout and kupdate-style writeback are
> > +		 * easily livelockable. Stop them if there is other work
> > +		 * to do so that e.g. sync can proceed.
> > +		 */
> > +		if ((work->for_background || work->for_kupdate) &&
> > +		    !list_empty(&wb->bdi->work_list))
> > +			break;
> > +		/*
> 
> So what happens if an application sits in a loop doing write&fsync to a
> file?  The vm's call for help gets ignored and your data doesn't get
> written back for three days??

To avoid the possibility of any such occurrence, perhaps requeuing
the work rather than cancelling it would be better? i.e. stop, put
it behind whatever work just came in and so when the new work
completes, we restart the background/expiry based writeback?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-08  4:12     ` Dave Chinner
@ 2010-08-08  7:29       ` Wu Fengguang
  2010-08-08 11:07         ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2010-08-08  7:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, Jan Kara, linux-fsdevel@vger.kernel.org,
	hch@infradead.org, jaxboe@fusionio.com

On Sun, Aug 08, 2010 at 12:12:30PM +0800, Dave Chinner wrote:
> On Thu, Aug 05, 2010 at 04:45:35PM -0700, Andrew Morton wrote:
> > On Thu,  5 Aug 2010 20:53:17 +0200
> > Jan Kara <jack@suse.cz> wrote:
> > > ---
> > >  fs/fs-writeback.c |    8 ++++++++
> > >  1 files changed, 8 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index d5be169..542471e 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> > >  			break;
> > >  
> > >  		/*
> > > +		 * Background writeout and kupdate-style writeback are
> > > +		 * easily livelockable. Stop them if there is other work
> > > +		 * to do so that e.g. sync can proceed.
> > > +		 */
> > > +		if ((work->for_background || work->for_kupdate) &&
> > > +		    !list_empty(&wb->bdi->work_list))
> > > +			break;
> > > +		/*
> > 
> > So what happens if an application sits in a loop doing write&fsync to a
> > file?  The vm's call for help gets ignored and your data doesn't get
> > written back for three days??
> 
> To avoid the possibility of any such occurrence, perhaps requeuing
> the work rather than cancelling it would be better? i.e. stop, put
> it behind whatever work just came in and so when the new work
> completes, we restart the background/expiry based writeback?

That would be better. Ie. add a flag BDI_background_writeback to
indicate the background work should be started after finishing with
the queued works. bdi_start_background_writeback() can be modified to
simply set the bit and wake up the flusher thread.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-08  7:29       ` Wu Fengguang
@ 2010-08-08 11:07         ` Jens Axboe
  2010-08-08 13:59           ` Jan Kara
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2010-08-08 11:07 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Dave Chinner, Andrew Morton, Jan Kara,
	linux-fsdevel@vger.kernel.org, hch@infradead.org

On 08/08/2010 03:29 AM, Wu Fengguang wrote:
> On Sun, Aug 08, 2010 at 12:12:30PM +0800, Dave Chinner wrote:
>> On Thu, Aug 05, 2010 at 04:45:35PM -0700, Andrew Morton wrote:
>>> On Thu,  5 Aug 2010 20:53:17 +0200
>>> Jan Kara <jack@suse.cz> wrote:
>>>> ---
>>>>  fs/fs-writeback.c |    8 ++++++++
>>>>  1 files changed, 8 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>>>> index d5be169..542471e 100644
>>>> --- a/fs/fs-writeback.c
>>>> +++ b/fs/fs-writeback.c
>>>> @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
>>>>  			break;
>>>>  
>>>>  		/*
>>>> +		 * Background writeout and kupdate-style writeback are
>>>> +		 * easily livelockable. Stop them if there is other work
>>>> +		 * to do so that e.g. sync can proceed.
>>>> +		 */
>>>> +		if ((work->for_background || work->for_kupdate) &&
>>>> +		    !list_empty(&wb->bdi->work_list))
>>>> +			break;
>>>> +		/*
>>>
>>> So what happens if an application sits in a loop doing write&fsync to a
>>> file?  The vm's call for help gets ignored and your data doesn't get
>>> written back for three days??
>>
>> To avoid the possibility of any such occurrence, perhaps requeuing
>> the work rather than cancelling it would be better? i.e. stop, put
>> it behind whatever work just came in and so when the new work
>> completes, we restart the background/expiry based writeback?
> 
> That would be better. Ie. add a flag BDI_background_writeback to
> indicate the background work should be started after finishing with
> the queued works. bdi_start_background_writeback() can be modified to
> simply set the bit and wake up the flusher thread.

I think that is better, we ideally want background writeout to
interleave smoothly with any other write activity. But why not just
mark the 'background writeout was interrupted' bit here when breaking,
and have the thread check-clear that when finishing some other piece
of work (or when going idle)?

-- 
Jens Axboe


Confidentiality Notice: This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-08 11:07         ` Jens Axboe
@ 2010-08-08 13:59           ` Jan Kara
  2010-08-08 22:55             ` Wu Fengguang
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2010-08-08 13:59 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Wu Fengguang, Dave Chinner, Andrew Morton, Jan Kara,
	linux-fsdevel@vger.kernel.org, hch@infradead.org

On Sun 08-08-10 07:07:26, Jens Axboe wrote:
> On 08/08/2010 03:29 AM, Wu Fengguang wrote:
> > On Sun, Aug 08, 2010 at 12:12:30PM +0800, Dave Chinner wrote:
> >> On Thu, Aug 05, 2010 at 04:45:35PM -0700, Andrew Morton wrote:
> >>> On Thu,  5 Aug 2010 20:53:17 +0200
> >>> Jan Kara <jack@suse.cz> wrote:
> >>>> ---
> >>>>  fs/fs-writeback.c |    8 ++++++++
> >>>>  1 files changed, 8 insertions(+), 0 deletions(-)
> >>>>
> >>>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >>>> index d5be169..542471e 100644
> >>>> --- a/fs/fs-writeback.c
> >>>> +++ b/fs/fs-writeback.c
> >>>> @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> >>>>  			break;
> >>>>  
> >>>>  		/*
> >>>> +		 * Background writeout and kupdate-style writeback are
> >>>> +		 * easily livelockable. Stop them if there is other work
> >>>> +		 * to do so that e.g. sync can proceed.
> >>>> +		 */
> >>>> +		if ((work->for_background || work->for_kupdate) &&
> >>>> +		    !list_empty(&wb->bdi->work_list))
> >>>> +			break;
> >>>> +		/*
> >>>
> >>> So what happens if an application sits in a loop doing write&fsync to a
> >>> file?  The vm's call for help gets ignored and your data doesn't get
> >>> written back for three days??
> >>
> >> To avoid the possibility of any such occurrence, perhaps requeuing
> >> the work rather than cancelling it would be better? i.e. stop, put
> >> it behind whatever work just came in and so when the new work
> >> completes, we restart the background/expiry based writeback?
> > 
> > That would be better. Ie. add a flag BDI_background_writeback to
> > indicate the background work should be started after finishing with
> > the queued works. bdi_start_background_writeback() can be modified to
> > simply set the bit and wake up the flusher thread.
> 
> I think that is better, we ideally want background writeout to
> interleave smoothly with any other write activity. But why not just
> mark the 'background writeout was interrupted' bit here when breaking,
> and have the thread check-clear that when finishing some other piece
> of work (or when going idle)?
  OK, although what would probably work as well would be to extend
wb_check_old_data_flush() to also start background writeback (not only
kupdate) if needed. Which probably makes sense regardless of any
interruption of background writeback we might do... Maybe let's talk about
this in some of the writeback sessions.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread
  2010-08-08 13:59           ` Jan Kara
@ 2010-08-08 22:55             ` Wu Fengguang
  0 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-08-08 22:55 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Dave Chinner, Andrew Morton,
	linux-fsdevel@vger.kernel.org, hch@infradead.org

On Sun, Aug 08, 2010 at 09:59:18PM +0800, Jan Kara wrote:
> On Sun 08-08-10 07:07:26, Jens Axboe wrote:
> > On 08/08/2010 03:29 AM, Wu Fengguang wrote:
> > > On Sun, Aug 08, 2010 at 12:12:30PM +0800, Dave Chinner wrote:
> > >> On Thu, Aug 05, 2010 at 04:45:35PM -0700, Andrew Morton wrote:
> > >>> On Thu,  5 Aug 2010 20:53:17 +0200
> > >>> Jan Kara <jack@suse.cz> wrote:
> > >>>> ---
> > >>>>  fs/fs-writeback.c |    8 ++++++++
> > >>>>  1 files changed, 8 insertions(+), 0 deletions(-)
> > >>>>
> > >>>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > >>>> index d5be169..542471e 100644
> > >>>> --- a/fs/fs-writeback.c
> > >>>> +++ b/fs/fs-writeback.c
> > >>>> @@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
> > >>>>  			break;
> > >>>>  
> > >>>>  		/*
> > >>>> +		 * Background writeout and kupdate-style writeback are
> > >>>> +		 * easily livelockable. Stop them if there is other work
> > >>>> +		 * to do so that e.g. sync can proceed.
> > >>>> +		 */
> > >>>> +		if ((work->for_background || work->for_kupdate) &&
> > >>>> +		    !list_empty(&wb->bdi->work_list))
> > >>>> +			break;
> > >>>> +		/*
> > >>>
> > >>> So what happens if an application sits in a loop doing write&fsync to a
> > >>> file?  The vm's call for help gets ignored and your data doesn't get
> > >>> written back for three days??
> > >>
> > >> To avoid the possibility of any such occurrence, perhaps requeuing
> > >> the work rather than cancelling it would be better? i.e. stop, put
> > >> it behind whatever work just came in and so when the new work
> > >> completes, we restart the background/expiry based writeback?
> > > 
> > > That would be better. Ie. add a flag BDI_background_writeback to
> > > indicate the background work should be started after finishing with
> > > the queued works. bdi_start_background_writeback() can be modified to
> > > simply set the bit and wake up the flusher thread.
> > 
> > I think that is better, we ideally want background writeout to
> > interleave smoothly with any other write activity. But why not just
> > mark the 'background writeout was interrupted' bit here when breaking,
> > and have the thread check-clear that when finishing some other piece
> > of work (or when going idle)?
>   OK, although what would probably work as well would be to extend
> wb_check_old_data_flush() to also start background writeback (not only
> kupdate) if needed. Which probably makes sense regardless of any
> interruption of background writeback we might do... Maybe let's talk about
> this in some of the writeback sessions.

Good idea.  Then bdi_start_background_writeback() can simply wake it
up. I'd vote for this way before hitting more requirements possibility
in future.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/3 v2] mm: Fix writeback_in_progress()
  2010-08-05 18:53 [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Jan Kara
  2010-08-05 18:53 ` [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread Jan Kara
@ 2010-08-05 18:53 ` Jan Kara
  2010-08-05 19:37   ` Jens Axboe
                     ` (2 more replies)
  2010-08-05 18:53 ` [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round Jan Kara
  2010-08-06 12:23 ` [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Christoph Hellwig
  3 siblings, 3 replies; 22+ messages in thread
From: Jan Kara @ 2010-08-05 18:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, hch, jaxboe, Wu Fengguang, Jan Kara

Commit 83ba7b071f30f7c01f72518ad72d5cd203c27502 broke writeback_in_progress()
as in that commit we started to remove work items from the list at the
moment we start working on them and not at the moment they are finished.
Thus if the flusher thread was doing some work but there was no other
work queued, writeback_in_progress() returned false. This could in
particular cause unnecessary queueing of background writeback from
balance_dirty_pages() or writeout work from writeback_sb_if_idle().

This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.

NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going
to submit* is already queued. But this information isn't that simple
to provide so let's fix writeback_in_progress() for the time being.

CC: hch@infradead.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c           |    4 +++-
 include/linux/backing-dev.h |    1 +
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 542471e..6bdc924 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -59,7 +59,7 @@ struct wb_writeback_work {
  */
 int writeback_in_progress(struct backing_dev_info *bdi)
 {
-	return !list_empty(&bdi->work_list);
+	return test_bit(BDI_writeback_running, &bdi->state);
 }
 
 static void bdi_queue_work(struct backing_dev_info *bdi,
@@ -751,6 +751,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 	struct wb_writeback_work *work;
 	long wrote = 0;
 
+	set_bit(BDI_writeback_running, &wb->bdi->state);
 	while ((work = get_next_work_item(bdi, wb)) != NULL) {
 		/*
 		 * Override sync mode, in case we must wait for completion
@@ -775,6 +776,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
+	clear_bit(BDI_writeback_running, &wb->bdi->state);
 
 	return wrote;
 }
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index e9aec0d..12ba2cd 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -31,6 +31,7 @@ enum bdi_state {
 	BDI_async_congested,	/* The async (write) queue is getting full */
 	BDI_sync_congested,	/* The sync queue is getting full */
 	BDI_registered,		/* bdi_register() was done */
+	BDI_writeback_running,	/* Writeback is in progress */
 	BDI_unused,		/* Available bits start here */
 };
 
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3 v2] mm: Fix writeback_in_progress()
  2010-08-05 18:53 ` [PATCH 2/3 v2] mm: Fix writeback_in_progress() Jan Kara
@ 2010-08-05 19:37   ` Jens Axboe
  2010-08-05 23:06   ` Wu Fengguang
  2010-08-06  0:10   ` Andrew Morton
  2 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2010-08-05 19:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-fsdevel@vger.kernel.org, hch@infradead.org,
	Wu Fengguang

On 08/05/2010 08:53 PM, Jan Kara wrote:
> Commit 83ba7b071f30f7c01f72518ad72d5cd203c27502 broke writeback_in_progress()
> as in that commit we started to remove work items from the list at the
> moment we start working on them and not at the moment they are finished.
> Thus if the flusher thread was doing some work but there was no other
> work queued, writeback_in_progress() returned false. This could in
> particular cause unnecessary queueing of background writeback from
> balance_dirty_pages() or writeout work from writeback_sb_if_idle().
> 
> This patch fixes the problem by introducing a bit in the bdi state which
> indicates that the flusher thread is processing some work and uses this
> bit for writeback_in_progress() test.
> 
> NOTE: Both callsites of writeback_in_progress() (namely,
> writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
> need a different information than what writeback_in_progress() provides.
> They would need to know whether *the kind of writeback they are going
> to submit* is already queued. But this information isn't that simple
> to provide so let's fix writeback_in_progress() for the time being.

I left this deliberatly vague, as there's no good way to answer the
question that the caller wants to know about.

Alternatively, augment the list_empty() check with a task running
check. I guess that would require the wb lock, so your bit set/check
is the best solution I think.

Acked-by: Jens Axboe <jaxboe@fusionio.com>

-- 
Jens Axboe

Confidentiality Notice: This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3 v2] mm: Fix writeback_in_progress()
  2010-08-05 18:53 ` [PATCH 2/3 v2] mm: Fix writeback_in_progress() Jan Kara
  2010-08-05 19:37   ` Jens Axboe
@ 2010-08-05 23:06   ` Wu Fengguang
  2010-08-06  0:10   ` Andrew Morton
  2 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-08-05 23:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-fsdevel@vger.kernel.org, hch@infradead.org,
	jaxboe@fusionio.com

Jan,

On Fri, Aug 06, 2010 at 02:53:18AM +0800, Jan Kara wrote:
> Commit 83ba7b071f30f7c01f72518ad72d5cd203c27502 broke writeback_in_progress()
> as in that commit we started to remove work items from the list at the
> moment we start working on them and not at the moment they are finished.
> Thus if the flusher thread was doing some work but there was no other
> work queued, writeback_in_progress() returned false. This could in
> particular cause unnecessary queueing of background writeback from
> balance_dirty_pages() or writeout work from writeback_sb_if_idle().

s/writeback_sb_if_idle/writeback_inodes_sb_if_idle

> This patch fixes the problem by introducing a bit in the bdi state which
> indicates that the flusher thread is processing some work and uses this
> bit for writeback_in_progress() test.

There is a possible small downside. Imagine this scenario:
- kupdate is running while dirty pages go up to 19%
- the dirtier stopped
- kupdate quits
- dirty pages remain at 18%, over background threshold

It looks not a big problem except for surprising some users (like me :).

The old code might also stuck in 18% dirty:
- there are two works, one queued and another running
- during the time dirty pages go up to 19%
- the dirtier stopped
- the two works quit
- dirty pages remain at 18%, over background threshold

> NOTE: Both callsites of writeback_in_progress() (namely,
> writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
> need a different information than what writeback_in_progress() provides.
> They would need to know whether *the kind of writeback they are going
> to submit* is already queued. But this information isn't that simple
> to provide so let's fix writeback_in_progress() for the time being.

I have an outdated patch that introduced can_submit_background_writeback()
in addition to the problematic writeback_in_progress(). It
deliberately enqueue a new background job even when there is a running
one. Because when the balance_dirty_pages() is converted to wait on
the flusher threads, it must guarantee that someone will be there
working to wake it up. And the running background job may right in the
progress of exiting..

Thanks,
Fengguang
---

writeback: only allow two background writeback works

balance_dirty_pages() need a reliable way to ensure some background work
is scheduled for running. We cannot simply run bdi_start_writeback()
because that would queue up huge number of works (which takes memory and
time).

CC: Jens Axboe <jens.axboe@oracle.com> 
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c           |   14 ++------------
 include/linux/backing-dev.h |   26 +++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 13 deletions(-)

--- linux.orig/fs/fs-writeback.c	2009-11-06 09:52:24.000000000 +0800
+++ linux/fs/fs-writeback.c	2009-11-06 09:52:25.000000000 +0800
@@ -82,18 +82,6 @@ static inline void bdi_work_init(struct 
 	work->state = WS_USED;
 }
 
-/**
- * writeback_in_progress - determine whether there is writeback in progress
- * @bdi: the device's backing_dev_info structure.
- *
- * Determine whether there is writeback waiting to be handled against a
- * backing device.
- */
-int writeback_in_progress(struct backing_dev_info *bdi)
-{
-	return !list_empty(&bdi->work_list);
-}
-
 static void bdi_work_clear(struct bdi_work *work)
 {
 	clear_bit(WS_USED_B, &work->state);
@@ -144,6 +132,8 @@ static void wb_clear_pending(struct bdi_
 
 		spin_lock(&bdi->wb_lock);
 		list_del_rcu(&work->list);
+		if (work->args.for_background)
+			clear_bit(WB_FLAG_BACKGROUND_WORK, &bdi->wb_mask);
 		spin_unlock(&bdi->wb_lock);
 
 		wb_work_complete(work);
--- linux.orig/include/linux/backing-dev.h	2009-11-06 09:52:25.000000000 +0800
+++ linux/include/linux/backing-dev.h	2009-11-06 09:52:25.000000000 +0800
@@ -94,6 +94,11 @@ struct backing_dev_info {
 #endif
 };
 
+/*
+ * background work queued, set to avoid redundant background works
+ */
+#define WB_FLAG_BACKGROUND_WORK		30
+
 int bdi_init(struct backing_dev_info *bdi);
 void bdi_destroy(struct backing_dev_info *bdi);
 
@@ -248,7 +253,26 @@ int bdi_set_max_ratio(struct backing_dev
 extern struct backing_dev_info default_backing_dev_info;
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page);
 
-int writeback_in_progress(struct backing_dev_info *bdi);
+/**
+ * writeback_in_progress - determine whether there is writeback in progress
+ * @bdi: the device's backing_dev_info structure.
+ *
+ * Determine whether there is writeback waiting to be handled against a
+ * backing device.
+ */
+static inline int writeback_in_progress(struct backing_dev_info *bdi)
+{
+	return !list_empty(&bdi->work_list);
+}
+
+/*
+ * Helper to limit # of background writeback works in circulation to 2.
+ * (one running and another queued)
+ */
+static inline int can_submit_background_writeback(struct backing_dev_info *bdi)
+{
+       return !test_and_set_bit(WB_FLAG_BACKGROUND_WORK, &bdi->wb_mask);
+}
 
 static inline int bdi_congested(struct backing_dev_info *bdi, int bdi_bits)
 {

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3 v2] mm: Fix writeback_in_progress()
  2010-08-05 18:53 ` [PATCH 2/3 v2] mm: Fix writeback_in_progress() Jan Kara
  2010-08-05 19:37   ` Jens Axboe
  2010-08-05 23:06   ` Wu Fengguang
@ 2010-08-06  0:10   ` Andrew Morton
  2010-08-08  2:25     ` Jan Kara
  2 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2010-08-06  0:10 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu,  5 Aug 2010 20:53:18 +0200
Jan Kara <jack@suse.cz> wrote:

> Commit 83ba7b071f30f7c01f72518ad72d5cd203c27502 broke writeback_in_progress()
> as in that commit we started to remove work items from the list at the
> moment we start working on them and not at the moment they are finished.
> Thus if the flusher thread was doing some work but there was no other
> work queued, writeback_in_progress() returned false. This could in
> particular cause unnecessary queueing of background writeback from
> balance_dirty_pages() or writeout work from writeback_sb_if_idle().
> 
> This patch fixes the problem by introducing a bit in the bdi state which
> indicates that the flusher thread is processing some work and uses this
> bit for writeback_in_progress() test.
> 
> NOTE: Both callsites of writeback_in_progress() (namely,
> writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
> need a different information than what writeback_in_progress() provides.
> They would need to know whether *the kind of writeback they are going
> to submit* is already queued. But this information isn't that simple
> to provide so let's fix writeback_in_progress() for the time being.
> 

Patch looks reasonable, but..  What effect does it have?

writeback_inodes_sb_if_idle() is some ext4 delalloc hack.  Shudder, no
comment.

The writeback_in_progess() test in balance_dirty_pages() is at least
eight years old.  I got bored digging back through the git record
working out why I added it (that line's been changed multiple times and
it's a pita tracking back through those things).  

I suspect it was there to say "don't bother poking pdflush if it's
already doing something".  But perhaps that logic got broken by
subsequent mauling.  Or was never right.  For example, if the kupdate
function is writing back an old inode, do we really want that to
prevent balance_dirty_pages()'s attempt to cure a dirty-memory-exceeded
situation?  Don't think so.

So I dunno.  I suspect a better patch would be "remove
writeback_in_progess()".  But first one should find the
patch which added the test to balance_dirty_pages() and
see if it was well changelogged.

But this patch's changelog doesn't make me confident that the
end-user-visible effects of this change are fully understood?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3 v2] mm: Fix writeback_in_progress()
  2010-08-06  0:10   ` Andrew Morton
@ 2010-08-08  2:25     ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2010-08-08  2:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu 05-08-10 17:10:25, Andrew Morton wrote:
> On Thu,  5 Aug 2010 20:53:18 +0200
> Jan Kara <jack@suse.cz> wrote:
> > Commit 83ba7b071f30f7c01f72518ad72d5cd203c27502 broke writeback_in_progress()
> > as in that commit we started to remove work items from the list at the
> > moment we start working on them and not at the moment they are finished.
> > Thus if the flusher thread was doing some work but there was no other
> > work queued, writeback_in_progress() returned false. This could in
> > particular cause unnecessary queueing of background writeback from
> > balance_dirty_pages() or writeout work from writeback_sb_if_idle().
> > 
> > This patch fixes the problem by introducing a bit in the bdi state which
> > indicates that the flusher thread is processing some work and uses this
> > bit for writeback_in_progress() test.
> > 
> > NOTE: Both callsites of writeback_in_progress() (namely,
> > writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
> > need a different information than what writeback_in_progress() provides.
> > They would need to know whether *the kind of writeback they are going
> > to submit* is already queued. But this information isn't that simple
> > to provide so let's fix writeback_in_progress() for the time being.
> > 
> 
> Patch looks reasonable, but..  What effect does it have?
> 
> writeback_inodes_sb_if_idle() is some ext4 delalloc hack.  Shudder, no
> comment.
> 
> The writeback_in_progess() test in balance_dirty_pages() is at least
> eight years old.  I got bored digging back through the git record
> working out why I added it (that line's been changed multiple times and
> it's a pita tracking back through those things).  
> 
> I suspect it was there to say "don't bother poking pdflush if it's
> already doing something".  But perhaps that logic got broken by
> subsequent mauling.  Or was never right.  For example, if the kupdate
  Yeah, it seems it never was quite right. I dug through BK history and
the logic seems to be introduced by commit
faa74c6f880b4a28d916a4c3b15594bbb93b57c0 sometime aroung *2.5.16* and at
that time it seems it already had this issue.

> function is writing back an old inode, do we really want that to
> prevent balance_dirty_pages()'s attempt to cure a dirty-memory-exceeded
> situation?  Don't think so.
  I think it would be good if balance_dirty_pages() made sure that at least
one backround writeback work is either being processed or in a queue. We
cannot just unconditionally queue a background writeback because that would
make queue really long full of requests for background writeback when the
device gets congested.

> So I dunno.  I suspect a better patch would be "remove
> writeback_in_progess()".  But first one should find the
> patch which added the test to balance_dirty_pages() and
> see if it was well changelogged.
  I definitely agree that we want to get rid of writeback_in_progress() as
soon as someone writes a patch to do it ;).

> But this patch's changelog doesn't make me confident that the
> end-user-visible effects of this change are fully understood?
  I just noticed that the function does something else than what its comment
suggests and that the change was unintentionally caused by commit
83ba7b071f30f7c01f72518ad72d5cd203c27502 short time ago. So I just figured
it is worth to restore the behavior to the one before the cleanup... About
user visible effects: It's hard for me to say if there are any. We may do
more writeback with buggy writeback_in_progress() which is sometimes
desirable (e.g. when flusher thread is just doing writeback of old inodes
and we want full background cleaning) and sometimes it needen't be. But
most of the time I'd say you just won't notice (at least in my testing I
was able to notice the difference only because of writeback code tracing).

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round
  2010-08-05 18:53 [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Jan Kara
  2010-08-05 18:53 ` [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread Jan Kara
  2010-08-05 18:53 ` [PATCH 2/3 v2] mm: Fix writeback_in_progress() Jan Kara
@ 2010-08-05 18:53 ` Jan Kara
  2010-08-05 19:38   ` Jens Axboe
  2010-08-06  0:21   ` Andrew Morton
  2010-08-06 12:23 ` [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Christoph Hellwig
  3 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2010-08-05 18:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, hch, jaxboe, Wu Fengguang, Jan Kara

WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write
out some huge inode for too long while starving writeout of other inodes. To
avoid livelocks, we record time we started writeback in wbc->wb_start and do
not write out inodes which were dirtied after this time. But currently,
writeback_inodes_wb() resets wb_start each time it is called thus effectively
invalidating this logic and making any WB_SYNC_NONE writeback prone to
livelocks.

This patch makes sure wb_start is set only once when we start writeback.

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6bdc924..aa59394 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -530,7 +530,8 @@ void writeback_inodes_wb(struct bdi_writeback *wb,
 {
 	int ret = 0;
 
-	wbc->wb_start = jiffies; /* livelock avoidance */
+	if (!wbc->wb_start)
+		wbc->wb_start = jiffies; /* livelock avoidance */
 	spin_lock(&inode_lock);
 	if (!wbc->for_kupdate || list_empty(&wb->b_io))
 		queue_io(wb, wbc->older_than_this);
@@ -559,7 +560,6 @@ static void __writeback_inodes_sb(struct super_block *sb,
 {
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
-	wbc->wb_start = jiffies; /* livelock avoidance */
 	spin_lock(&inode_lock);
 	if (!wbc->for_kupdate || list_empty(&wb->b_io))
 		queue_io(wb, wbc->older_than_this);
@@ -625,6 +625,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		wbc.range_end = LLONG_MAX;
 	}
 
+	wbc.wb_start = jiffies; /* livelock avoidance */
 	for (;;) {
 		/*
 		 * Stop writeback when nr_pages has been consumed
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round
  2010-08-05 18:53 ` [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round Jan Kara
@ 2010-08-05 19:38   ` Jens Axboe
  2010-08-06  0:21   ` Andrew Morton
  1 sibling, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2010-08-05 19:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-fsdevel@vger.kernel.org, hch@infradead.org,
	Wu Fengguang

On 08/05/2010 08:53 PM, Jan Kara wrote:
> WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write
> out some huge inode for too long while starving writeout of other inodes. To
> avoid livelocks, we record time we started writeback in wbc->wb_start and do
> not write out inodes which were dirtied after this time. But currently,
> writeback_inodes_wb() resets wb_start each time it is called thus effectively
> invalidating this logic and making any WB_SYNC_NONE writeback prone to
> livelocks.
> 
> This patch makes sure wb_start is set only once when we start writeback.

Acked-by: Jens Axboe <jaxboe@fusionio.com>

-- 
Jens Axboe

Confidentiality Notice: This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round
  2010-08-05 18:53 ` [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round Jan Kara
  2010-08-05 19:38   ` Jens Axboe
@ 2010-08-06  0:21   ` Andrew Morton
  2010-08-07 22:45     ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2010-08-06  0:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu,  5 Aug 2010 20:53:19 +0200
Jan Kara <jack@suse.cz> wrote:

> WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write
> out some huge inode for too long while starving writeout of other inodes. To
> avoid livelocks, we record time we started writeback in wbc->wb_start and do
> not write out inodes which were dirtied after this time. But currently,
> writeback_inodes_wb() resets wb_start each time it is called thus effectively
> invalidating this logic and making any WB_SYNC_NONE writeback prone to
> livelocks.

data point: that was eight-year-old code that we broke.  Five months ago.

How did you discover this?  You have test cases which cover this
scenario and things locked up?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round
  2010-08-06  0:21   ` Andrew Morton
@ 2010-08-07 22:45     ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2010-08-07 22:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu 05-08-10 17:21:07, Andrew Morton wrote:
> On Thu,  5 Aug 2010 20:53:19 +0200
> Jan Kara <jack@suse.cz> wrote:
> 
> > WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write
> > out some huge inode for too long while starving writeout of other inodes. To
> > avoid livelocks, we record time we started writeback in wbc->wb_start and do
> > not write out inodes which were dirtied after this time. But currently,
> > writeback_inodes_wb() resets wb_start each time it is called thus effectively
> > invalidating this logic and making any WB_SYNC_NONE writeback prone to
> > livelocks.
> 
> data point: that was eight-year-old code that we broke.  Five months ago.
> 
> How did you discover this?  You have test cases which cover this
> scenario and things locked up?
  There was a bug report where someone complained sync(1) takes much longer
than it used to. So I had look at the code and noticed this bug (and also
the problem that we get stuck doing background writeback while the sync
work is waiting).

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks
  2010-08-05 18:53 [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Jan Kara
                   ` (2 preceding siblings ...)
  2010-08-05 18:53 ` [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round Jan Kara
@ 2010-08-06 12:23 ` Christoph Hellwig
  3 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2010-08-06 12:23 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, linux-fsdevel, hch, jaxboe, Wu Fengguang

On Thu, Aug 05, 2010 at 08:53:16PM +0200, Jan Kara wrote:
> 
>   Hi,
> 
>   attached is a second version of my patch set to modify writeback code
> so that we avoid livelocks of sync(1) during usual load (such as copying
> trees etc.). I have just updated changelogs to reflect my discussions
> with Christoph since the last version.
>   Christoph, are you OK with the patches after more explanation?
> Andrew, could you merge the patches if Christoph does not object? Thanks.

I think they are fine for now (.36 / stable), but I think we need to
revisit the various things I've mentioned sooner or later.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-08-08 22:56 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-05 18:53 [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Jan Kara
2010-08-05 18:53 ` [PATCH 1/3 v2] mm: Stop background writeback if there is other work queued for the thread Jan Kara
2010-08-05 19:38   ` Jens Axboe
2010-08-05 23:45   ` Andrew Morton
2010-08-07 16:04     ` Wu Fengguang
2010-08-08  2:43     ` Jan Kara
2010-08-08  3:10       ` Wu Fengguang
2010-08-08  4:12     ` Dave Chinner
2010-08-08  7:29       ` Wu Fengguang
2010-08-08 11:07         ` Jens Axboe
2010-08-08 13:59           ` Jan Kara
2010-08-08 22:55             ` Wu Fengguang
2010-08-05 18:53 ` [PATCH 2/3 v2] mm: Fix writeback_in_progress() Jan Kara
2010-08-05 19:37   ` Jens Axboe
2010-08-05 23:06   ` Wu Fengguang
2010-08-06  0:10   ` Andrew Morton
2010-08-08  2:25     ` Jan Kara
2010-08-05 18:53 ` [PATCH 3/3 v2] mm: Avoid resetting wb_start after each writeback round Jan Kara
2010-08-05 19:38   ` Jens Axboe
2010-08-06  0:21   ` Andrew Morton
2010-08-07 22:45     ` Jan Kara
2010-08-06 12:23 ` [PATCH 0/3 v2] Three writeback fixes to stop sync(1) livelocks Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).