public inbox for linux-kernel@vger.kernel.org
* kupdate weirdness
@ 2007-08-01 20:45 Miklos Szeredi
  2007-08-01 21:14 ` Andrew Morton
  2007-08-02  1:53 ` kupdate weirdness David Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Miklos Szeredi @ 2007-08-01 20:45 UTC
  To: linux-kernel; +Cc: akpm, linux-fsdevel

The following strange behavior can be observed:

1. large file is written
2. after 30 seconds, nr_dirty goes down by 1024
3. then for some time (< 30 sec) nothing happens (disk idle)
4. then nr_dirty again goes down by 1024
5. repeat from 3. until whole file is written

So basically a 4Mbyte chunk of the file is written every 30 seconds.
I'm quite sure this is not the intended behavior.
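
The 4Mbyte figure is just 1024 pages of 4 KiB each. A minimal Python
sketch of that arithmetic, assuming 4 KiB pages and input in the
"name value" format of /proc/vmstat. The sample text and the helper
names parse_nr_dirty() and pages_to_mib() are made up for illustration:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_nr_dirty(vmstat_text):
    """Return the nr_dirty value from text in /proc/vmstat format."""
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name == "nr_dirty":
            return int(value)
    raise KeyError("nr_dirty not found")


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


# Illustrative samples only, not captured from a real system.
sample_before = "nr_free_pages 1000\nnr_dirty 5120\nnr_writeback 0\n"
sample_after = "nr_free_pages 1000\nnr_dirty 4096\nnr_writeback 0\n"

drop = parse_nr_dirty(sample_before) - parse_nr_dirty(sample_after)
print(drop, "pages =", pages_to_mib(drop), "MiB")
```

On a live system the same parser could be fed the contents of
/proc/vmstat once per second to watch the steps.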

The reason seems to be that __sync_single_inode() will move the
partially written inode from s_io onto s_dirty, and sync_sb_inodes()
will not splice it back onto s_io until the rest of the inodes on s_io
have been processed.

Since there will probably be a recently dirtied inode on s_io, this
will take some time, but always less than 30 sec.
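
The queueing described above can be sketched with a toy userspace model
in Python. This is not the kernel's code: the two deques standing in for
s_io and s_dirty, the 5 second pass interval, the 30 second expiry, the
1024-page chunk, and the steady stream of one-page "small" inodes are
all assumptions chosen to mimic the observed behavior.

```python
from collections import deque

EXPIRE = 30    # assumed: seconds before a dirty inode becomes eligible
INTERVAL = 5   # assumed: seconds between periodic writeback passes
CHUNK = 1024   # assumed: pages written per visit to one inode


def simulate(big_pages, small_every, end_time):
    """Return [(time, pages)] for every write of the big inode."""
    s_io = deque()      # left end = oldest, processed first
    s_dirty = deque()
    big = {"dirtied": 0, "pages": big_pages}
    s_dirty.append(big)
    writes = []
    for t in range(0, end_time + 1, INTERVAL):
        # A one-page inode is dirtied every `small_every` seconds.
        if t > 0 and t % small_every == 0:
            s_dirty.append({"dirtied": t, "pages": 1})
        while True:
            # s_dirty is only spliced onto s_io once s_io is empty.
            if not s_io:
                s_io, s_dirty = s_dirty, deque()
            # Stop at the first inode that has not expired yet.
            if not s_io or t - s_io[0]["dirtied"] < EXPIRE:
                break
            inode = s_io.popleft()
            n = min(CHUNK, inode["pages"])
            inode["pages"] -= n
            if inode is big:
                writes.append((t, n))
            if inode["pages"]:
                # Partially written: back onto s_dirty, at the old end.
                s_dirty.appendleft(inode)
    return writes


print(simulate(5120, 10, 120))
```

In this model, a 5120-page file with a small inode dirtied every 10
seconds gets two chunks at t=30 (the big inode starts out alone on
s_io), then one 1024-page chunk at t=60, 90 and 120. With no small
inodes at all (for example small_every=1000) the whole file goes out at
t=30.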

I don't know what's the easiest solution.

Any ideas?

Miklos


* Re: kupdate weirdness
  2007-08-01 20:45 kupdate weirdness Miklos Szeredi
@ 2007-08-01 21:14 ` Andrew Morton
  2007-08-02 15:52   ` Miklos Szeredi
  2007-08-02  1:53 ` kupdate weirdness David Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-08-01 21:14 UTC
  To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel

On Wed, 01 Aug 2007 22:45:16 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> The following strange behavior can be observed:
> 
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
> 
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
> 
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> will not splice it back onto s_io until the rest of the inodes on s_io
> have been processed.

It does all sorts of weird crap.

> Since there will probably be a recently dirtied inode on s_io, this
> will take some time, but always less than 30 sec.
> 
> I don't know what's the easiest solution.
> 
> Any ideas?

Try 2.6.23-rc1-mm2.


* Re: kupdate weirdness
  2007-08-01 20:45 kupdate weirdness Miklos Szeredi
  2007-08-01 21:14 ` Andrew Morton
@ 2007-08-02  1:53 ` David Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: David Chinner @ 2007-08-02  1:53 UTC
  To: Miklos Szeredi; +Cc: linux-kernel, akpm, linux-fsdevel

On Wed, Aug 01, 2007 at 10:45:16PM +0200, Miklos Szeredi wrote:
> The following strange behavior can be observed:
> 
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
> 
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
> 
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> will not splice it back onto s_io until the rest of the inodes on s_io
> have been processed.

It's been doing this for a long time.

http://marc.info/?l=linux-kernel&m=113919849421679&w=2

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: kupdate weirdness
  2007-08-01 21:14 ` Andrew Morton
@ 2007-08-02 15:52   ` Miklos Szeredi
  2007-08-02 19:18     ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2007-08-02 15:52 UTC
  To: akpm; +Cc: miklos, linux-kernel, linux-fsdevel

> > The following strange behavior can be observed:
> > 
> > 1. large file is written
> > 2. after 30 seconds, nr_dirty goes down by 1024
> > 3. then for some time (< 30 sec) nothing happens (disk idle)
> > 4. then nr_dirty again goes down by 1024
> > 5. repeat from 3. until whole file is written
> > 
> > So basically a 4Mbyte chunk of the file is written every 30 seconds.
> > I'm quite sure this is not the intended behavior.
> > 
> > The reason seems to be that __sync_single_inode() will move the
> > partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> > will not splice it back onto s_io until the rest of the inodes on s_io
> > have been processed.
> 
> It does all sorts of weird crap.
> 
> > Since there will probably be a recently dirtied inode on s_io, this
> > will take some time, but always less than 30 sec.
> > 
> > I don't know what's the easiest solution.
> > 
> > Any ideas?
> 
> Try 2.6.23-rc1-mm2.

Much better, but still not perfect.

Now it writes out 1024 pages after 30 seconds and then the rest after
another 30s.

If my analysis is correct, this is because when it first gets onto
s_io other inodes will get there too (with dirtying times up to 30s
later), and the contents of s_more_io won't be recycled until the
current contents of s_io are processed.

Maybe this is OK, the previous weird stuff didn't seem to bother a lot
of people either.

Miklos


* Re: kupdate weirdness
  2007-08-02 15:52   ` Miklos Szeredi
@ 2007-08-02 19:18     ` Andrew Morton
  2007-08-02 19:35       ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-08-02 19:18 UTC
  To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, Ken Chen

On Thu, 02 Aug 2007 17:52:39 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> > > The following strange behavior can be observed:
> > > 
> > > 1. large file is written
> > > 2. after 30 seconds, nr_dirty goes down by 1024
> > > 3. then for some time (< 30 sec) nothing happens (disk idle)
> > > 4. then nr_dirty again goes down by 1024
> > > 5. repeat from 3. until whole file is written
> > > 
> > > So basically a 4Mbyte chunk of the file is written every 30 seconds.
> > > I'm quite sure this is not the intended behavior.
> > > 
> > > The reason seems to be that __sync_single_inode() will move the
> > > partially written inode from s_io onto s_dirty, and sync_sb_inodes()
> > > will not splice it back onto s_io until the rest of the inodes on s_io
> > > have been processed.
> > 
> > It does all sorts of weird crap.
> > 
> > > Since there will probably be a recently dirtied inode on s_io, this
> > > will take some time, but always less than 30 sec.
> > > 
> > > I don't know what's the easiest solution.
> > > 
> > > Any ideas?
> > 
> > Try 2.6.23-rc1-mm2.
> 
> Much better, but still not perfect.

I've kinda lost track of the status of all these patches.  I _think_ Ken
has identified a remaining problem even after his
writeback-fix-periodic-superblock-dirty-inode-flushing.patch, but maybe I
misremember.

Ken, can you remind us of the status there, please?

> Now it writes out 1024 pages after 30 seconds and then the rest after
> another 30s.

Bah.

> If my analysis is correct, this is because when it first gets onto
> s_io other inodes will get there too (with dirtying times up to 30s
> later), and the contents of s_more_io won't be recycled until the
> current contents of s_io are processed.
> 
> Maybe this is OK, the previous weird stuff didn't seem to bother a lot
> of people either.

There were heaps of problems in there and it is surprising how few people
were hitting them.  Ordered-mode journalling filesystems will fix it all up
behind the scenes, of course.

I just have a bad feeling about that code - list_heads are the wrong data
structure and it all needs to be ripped and redone using some indexable
data structure.  There has been desultory discussion, but nothing's
happening and nothing will happen in the medium term, so we need to keep
on whapping bandaids on it.
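
One possible reading of "indexable" is a structure keyed on the time an
inode was dirtied, so the oldest inode is always found first no matter
how often it is requeued. The Python sketch below is a toy userspace
model under that assumption, using a min-heap. It is not a design
proposed in this thread, and the 5 second pass interval, 30 second
expiry, 1024-page chunk and stream of one-page inodes are illustrative
assumptions.

```python
import heapq

EXPIRE = 30    # assumed: seconds before a dirty inode becomes eligible
INTERVAL = 5   # assumed: seconds between periodic writeback passes
CHUNK = 1024   # assumed: pages written per visit to one inode


def simulate_indexed(big_pages, small_every, end_time):
    """Return [(time, pages)] for every write of the big inode."""
    heap = []   # entries: (dirtied_when, sequence_number, inode)
    seq = 0     # tie-breaker so inode dicts are never compared
    big = {"pages": big_pages}
    heapq.heappush(heap, (0, seq, big))
    writes = []
    for t in range(0, end_time + 1, INTERVAL):
        # A one-page inode is dirtied every `small_every` seconds.
        if t > 0 and t % small_every == 0:
            seq += 1
            heapq.heappush(heap, (t, seq, {"pages": 1}))
        # Always service the oldest expired inode first.
        while heap and t - heap[0][0] >= EXPIRE:
            dirtied, _, inode = heapq.heappop(heap)
            n = min(CHUNK, inode["pages"])
            inode["pages"] -= n
            if inode is big:
                writes.append((t, n))
            if inode["pages"]:
                # Requeue under the original dirtied time, so it stays
                # ahead of younger inodes.
                seq += 1
                heapq.heappush(heap, (dirtied, seq, inode))
    return writes


print(simulate_indexed(5120, 10, 120))
```

In this model a partially written inode keeps its place at the front,
so a 5120-page file is written in five back-to-back chunks at t=30 even
while younger inodes keep arriving.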



* Re: kupdate weirdness
  2007-08-02 19:18     ` Andrew Morton
@ 2007-08-02 19:35       ` Miklos Szeredi
       [not found]         ` <1186091062.11797.34.camel@lappy>
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2007-08-02 19:35 UTC
  To: akpm; +Cc: miklos, linux-kernel, linux-fsdevel, kenchen, peterz

> There were heaps of problems in there and it is surprising how few people
> were hitting them.  Ordered-mode journalling filesystems will fix it all up
> behind the scenes, of course.
> 
> I just have a bad feeling about that code - list_heads are the wrong data
> structure and it all needs to be ripped and redone using some indexable
> data structure.  There has been desultory discussion, but nothing's
> happening and nothing will happen in the medium term, so we need to keep
> on whapping bandaids on it.

The reason why I'm looking at that code is because of those
balance_dirty_pages() deadlocks.  I'm not perfectly happy with the
per-bdi-per-cpu counters Peter's patch is introducing.

I was wondering if we can count the number of writeback pages through
the radix tree, just like we do for dirty pages?

All that would be needed is to keep the under-writeback inodes on some
list as well.

But I realize that this introduces its own problems as well...

Miklos


* per bdi dirty balancing (was Re: kupdate weirdness)
       [not found]         ` <1186091062.11797.34.camel@lappy>
@ 2007-08-03  6:43           ` Miklos Szeredi
  2007-08-03  7:15             ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2007-08-03  6:43 UTC
  To: peterz; +Cc: linux-kernel, akpm

(cc restored)

> > > There were heaps of problems in there and it is surprising how few people
> > > were hitting them.  Ordered-mode journalling filesystems will fix it all up
> > > behind the scenes, of course.
> > > 
> > > I just have a bad feeling about that code - list_heads are the wrong data
> > > structure and it all needs to be ripped and redone using some indexable
> > > data structure.  There has been desultory discussion, but nothing's
> > > happening and nothing will happen in the medium term, so we need to keep
> > > on whapping bandaids on it.
> > 
> > The reason why I'm looking at that code is because of those
> > balance_dirty_pages() deadlocks.  I'm not perfectly happy with the
> > per-bdi-per-cpu counters Peter's patch is introducing.
> 
> What is your biggest concern regarding them?

Complexity.  I've started to review the patches, and they are just too
damn complex.

For example introducing the backing_dev_info initializer and
destructor adds potential bugs if we forget to add them somewhere.

Now maybe this is unavoidable.  I'm just trying to look for a solution
involving fewer uncertainties and complexities.

My plan is to extract the minimal set of features from your patchset,
that solves the dirty balancing deadlocks and submit them as quickly
as possible.

After that we can look at trying to solve the more ambitious problem
of the slow vs. fast devices in a way that not only you can understand ;)

How's that?

Miklos


* Re: per bdi dirty balancing (was Re: kupdate weirdness)
  2007-08-03  6:43           ` per bdi dirty balancing (was Re: kupdate weirdness) Miklos Szeredi
@ 2007-08-03  7:15             ` Peter Zijlstra
  2007-08-03  7:41               ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2007-08-03  7:15 UTC
  To: Miklos Szeredi; +Cc: linux-kernel, akpm

On Fri, 2007-08-03 at 08:43 +0200, Miklos Szeredi wrote:
> (cc restored)
> 
> > > > There were heaps of problems in there and it is surprising how few people
> > > > were hitting them.  Ordered-mode journalling filesystems will fix it all up
> > > > behind the scenes, of course.
> > > > 
> > > > I just have a bad feeling about that code - list_heads are the wrong data
> > > > structure and it all needs to be ripped and redone using some indexable
> > > > data structure.  There has been desultory discussion, but nothing's
> > > > happening and nothing will happen in the medium term, so we need to keep
> > > > on whapping bandaids on it.
> > > 
> > > The reason why I'm looking at that code is because of those
> > > balance_dirty_pages() deadlocks.  I'm not perfectly happy with the
> > > per-bdi-per-cpu counters Peter's patch is introducing.
> > 
> > What is your biggest concern regarding them?
> 
> Complexity.  I've started to review the patches, and they are just too
> damn complex.
> 
> For example introducing the backing_dev_info initializer and
> destructor adds potential bugs if we forget to add them somewhere.

yeah, that was/is a pain.

> Now maybe this is unavoidable.  I'm just trying to look for a solution
> involving fewer uncertainties and complexities.
> 
> My plan is to extract the minimal set of features from your patchset,
> that solves the dirty balancing deadlocks and submit them as quickly
> as possible.

I had hoped to post a new version yesterday, but let's hope for today.

> After that we can look at trying to solve the more ambitious problem
> of the slow vs. fast devices in a way that not only you can understand ;)

Drat, and here I thought all that documentation in the proportions lib
would have solved that :-(




* Re: per bdi dirty balancing (was Re: kupdate weirdness)
  2007-08-03  7:15             ` Peter Zijlstra
@ 2007-08-03  7:41               ` Miklos Szeredi
  0 siblings, 0 replies; 9+ messages in thread
From: Miklos Szeredi @ 2007-08-03  7:41 UTC
  To: peterz; +Cc: miklos, linux-kernel, akpm

> > My plan is to extract the minimal set of features from your patchset,
> > that solves the dirty balancing deadlocks and submit them as quickly
> > as possible.
> 
> I had hoped to post a new version yesterday, but let's hope for today.

Would be cool.

> > After that we can look at trying to solve the more ambitious problem
> > of the slow vs. fast devices in a way that not only you can understand ;)
> 
> Drat, and here I thought all that documentation in the proportions lib
> would have solved that :-(

Well, I didn't get that far, and only had a glimpse of the proportions
lib.  But my hunch is that there's still lots of room for simplification.

Miklos

