public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: inode->i_dirty_buffers redundant ?
  2001-01-24  9:55 V Ganesh
@ 2001-01-24  9:09 ` Marcelo Tosatti
  2001-01-24 11:26 ` Stephen C. Tweedie
  1 sibling, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2001-01-24  9:09 UTC (permalink / raw)
  To: V Ganesh; +Cc: linux-kernel, sct


On Wed, 24 Jan 2001, V Ganesh wrote:

> now that we have inode->i_mapping->dirty_pages, what do we need
> inode->i_dirty_buffers for ? I understand the latter was added for the O_SYNC
> changes before dirty_pages came into the picture. but now both seem to be
> doing more or less the same thing.

inode->i_mapping->dirty_pages contains only mapped data pages of the
inode, not metadata.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* inode->i_dirty_buffers redundant ?
@ 2001-01-24  9:55 V Ganesh
  2001-01-24  9:09 ` Marcelo Tosatti
  2001-01-24 11:26 ` Stephen C. Tweedie
  0 siblings, 2 replies; 12+ messages in thread
From: V Ganesh @ 2001-01-24  9:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: sct

now that we have inode->i_mapping->dirty_pages, what do we need
inode->i_dirty_buffers for ? I understand the latter was added for the O_SYNC
changes before dirty_pages came into the picture. but now both seem to be
doing more or less the same thing.

ganesh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-24  9:55 V Ganesh
  2001-01-24  9:09 ` Marcelo Tosatti
@ 2001-01-24 11:26 ` Stephen C. Tweedie
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen C. Tweedie @ 2001-01-24 11:26 UTC (permalink / raw)
  To: V Ganesh; +Cc: linux-kernel, sct

Hi,

On Wed, Jan 24, 2001 at 03:25:16PM +0530, V Ganesh wrote:
> now that we have inode->i_mapping->dirty_pages, what do we need
> inode->i_dirty_buffers for ?

Metadata.  Specifically, directory contents and indirection blocks.

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
@ 2001-01-25 10:47 V Ganesh
  2001-01-25 16:44 ` Stephen C. Tweedie
  0 siblings, 1 reply; 12+ messages in thread
From: V Ganesh @ 2001-01-25 10:47 UTC (permalink / raw)
  To: linux-kernel, marcelo, sct

Stephen C. Tweedie <sct@redhat.com> wrote:
: Hi,

: On Wed, Jan 24, 2001 at 03:25:16PM +0530, V Ganesh wrote:
:> now that we have inode->i_mapping->dirty_pages, what do we need
:> inode->i_dirty_buffers for ?

: Metadata.  Specifically, directory contents and indirection blocks.

: --Stephen

ah, mark_buffer_dirty_inode(). thanks.
so if I understand it,
1. read() and mmap read faults will put the page in i_mapping->clean_pages
2. mmaped writes will (eventually, from msync or unmap or swapout) put the
   page in i_mapping->dirty_pages
3. write() will put pages into i_dirty_buffers (__block_commit_write() calls
   buffer_insert_inode_queue()).

so i_dirty_buffers contains buffer_heads of pages coming from write() as
well as metadata buffers from mark_buffer_dirty_inode(). a dirty MAP_SHARED
page which has been write()n to will potentially exist in both lists.
won't doing a set_dirty_page() instead of buffer_insert_inode_queue() in
__block_commit_write() make things much simpler ? then we'd have i_dirty_buffers
having _only_ metadata, and all data pages in the i_mapping->*_pages lists.

ganesh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-25 10:47 inode->i_dirty_buffers redundant ? V Ganesh
@ 2001-01-25 16:44 ` Stephen C. Tweedie
  2001-01-25 20:05   ` Daniel Phillips
  2001-01-25 21:11   ` Marcelo Tosatti
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen C. Tweedie @ 2001-01-25 16:44 UTC (permalink / raw)
  To: V Ganesh; +Cc: linux-kernel, marcelo, sct

Hi,

On Thu, Jan 25, 2001 at 04:17:30PM +0530, V Ganesh wrote:

> so i_dirty_buffers contains buffer_heads of pages coming from write() as
> well as metadata buffers from mark_buffer_dirty_inode(). a dirty MAP_SHARED
> page which has been write()n to will potentially exist in both lists.
> won't doing a set_dirty_page() instead of buffer_insert_inode_queue() in
> __block_commit_write() make things much simpler ? then we'd have i_dirty_buffers
> having _only_ metadata, and all data pages in the i_mapping->*_pages lists.

That would only complicate things: it would mean we'd have to scan
both lists on fsync instead of just the one, for example.  There are a
number of places where we need buffer lists for dirty data anyway,
such as for bdflush's background sync to disk.  We also maintain the
per-page buffer lists as caches of the virtual-to-physical mapping to
avoid redundant bmap()ping.  So, removing the buffer_heads which alias
the page cache data isn't an option.  Given that, it's as well to keep
all the inode's dirty buffers in the one place.

Cheers,
 Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-25 16:44 ` Stephen C. Tweedie
@ 2001-01-25 20:05   ` Daniel Phillips
  2001-01-25 21:30     ` Marcelo Tosatti
  2001-01-26 11:35     ` Stephen C. Tweedie
  2001-01-25 21:11   ` Marcelo Tosatti
  1 sibling, 2 replies; 12+ messages in thread
From: Daniel Phillips @ 2001-01-25 20:05 UTC (permalink / raw)
  To: Stephen C. Tweedie, linux-kernel

"Stephen C. Tweedie" wrote:
> We also maintain the 
> per-page buffer lists as caches of the virtual-to-physical mapping to
> avoid redundant bmap()ping.

Could you clarify that one, please?

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-25 16:44 ` Stephen C. Tweedie
  2001-01-25 20:05   ` Daniel Phillips
@ 2001-01-25 21:11   ` Marcelo Tosatti
  2001-01-26 17:34     ` Stephen C. Tweedie
  1 sibling, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2001-01-25 21:11 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: V Ganesh, lkml, Steve Lord


On Thu, 25 Jan 2001, Stephen C. Tweedie wrote:

> Hi,
> 
> On Thu, Jan 25, 2001 at 04:17:30PM +0530, V Ganesh wrote:
> 
> > so i_dirty_buffers contains buffer_heads of pages coming from write() as
> > well as metadata buffers from mark_buffer_dirty_inode(). a dirty MAP_SHARED
> > page which has been write()n to will potentially exist in both lists.
> > won't doing a set_dirty_page() instead of buffer_insert_inode_queue() in
> > __block_commit_write() make things much simpler ? then we'd have i_dirty_buffers
> > having _only_ metadata, and all data pages in the i_mapping->*_pages lists.
> 
> That would only complicate things: it would mean we'd have to scan
> both lists on fsync instead of just the one, for example.  There are a
> number of places where we need buffer lists for dirty data anyway,
> such as for bdflush's background sync to disk.  We also maintain the
> per-page buffer lists as caches of the virtual-to-physical mapping to
> avoid redundant bmap()ping.  

Btw, 

We probably want another kind of "IO buffer" abstraction for 2.5 which can
support buffer's bigger than PAGE_SIZE. 

Do you have any thoughts on that, Stephen? 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-25 20:05   ` Daniel Phillips
@ 2001-01-25 21:30     ` Marcelo Tosatti
  2001-01-26 11:35     ` Stephen C. Tweedie
  1 sibling, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2001-01-25 21:30 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Stephen C. Tweedie, linux-kernel


On Thu, 25 Jan 2001, Daniel Phillips wrote:

> "Stephen C. Tweedie" wrote:
> > We also maintain the 
> > per-page buffer lists as caches of the virtual-to-physical mapping to
> > avoid redundant bmap()ping.
> 
> Could you clarify that one, please?

Daniel, 

With "physical mapping" Stephen means on-disk block number.

If the buffer(s) for a page are mapped with valid information (ie
BH_Mapped) we avoid calling get_block(). 

See?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
@ 2001-01-26  9:13 V Ganesh
  0 siblings, 0 replies; 12+ messages in thread
From: V Ganesh @ 2001-01-26  9:13 UTC (permalink / raw)
  To: linux-kernel

Stephen C. Tweedie <sct@redhat.com> wrote:

: That would only complicate things: it would mean we'd have to scan
: both lists on fsync instead of just the one, for example.  There are a

we already do; filemap_fdatasync() is called first in sys_fsync(), though
it usually doesn't have much work I guess.

: number of places where we need buffer lists for dirty data anyway,
: such as for bdflush's background sync to disk.  We also maintain the
: per-page buffer lists as caches of the virtual-to-physical mapping to
: avoid redundant bmap()ping.  So, removing the buffer_heads which alias
: the page cache data isn't an option.  Given that, it's as well to keep
: all the inode's dirty buffers in the one place.

keeping dirty pages in the address space list doesn't preclude any of the
above. the pages could still have buffer_heads attached to them, and
these would cache the block location and be a part of the dirty buffer
list used by bdflush.
I guess both approaches would be roughly the same from a performance
point of view. I feel that keeping all data pages in the address space is more
elegant from a design point of view, but that's quite subjective, of course.

ganesh

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-25 20:05   ` Daniel Phillips
  2001-01-25 21:30     ` Marcelo Tosatti
@ 2001-01-26 11:35     ` Stephen C. Tweedie
  2001-01-26 19:05       ` Daniel Phillips
  1 sibling, 1 reply; 12+ messages in thread
From: Stephen C. Tweedie @ 2001-01-26 11:35 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Stephen C. Tweedie, linux-kernel

Hi,

On Thu, Jan 25, 2001 at 09:05:54PM +0100, Daniel Phillips wrote:
> "Stephen C. Tweedie" wrote:
> > We also maintain the 
> > per-page buffer lists as caches of the virtual-to-physical mapping to
> > avoid redundant bmap()ping.
> 
> Could you clarify that one, please?

The buffer contains a physical label for the block's location on disk.
The page cache is indexed purely by logical location, so doing IO
to/from the page cache requires us to lookup the physical locations of
each block within the page.

Caching the buffer_heads for page cache pages means that once those
lookups are done once, further IO on the same page can bypass the
lookup and go straight to disk.

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-25 21:11   ` Marcelo Tosatti
@ 2001-01-26 17:34     ` Stephen C. Tweedie
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen C. Tweedie @ 2001-01-26 17:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Stephen C. Tweedie, V Ganesh, lkml, Steve Lord

Hi,

On Thu, Jan 25, 2001 at 07:11:01PM -0200, Marcelo Tosatti wrote:
> 
> We probably want another kind of "IO buffer" abstraction for 2.5 which can
> support buffer's bigger than PAGE_SIZE. 
> 
> Do you have any thoughts on that, Stephen? 

XFS is already doing this, with pagebufs being used in their cache,
and kiobufs used for the IO submission.

--Stephen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: inode->i_dirty_buffers redundant ?
  2001-01-26 11:35     ` Stephen C. Tweedie
@ 2001-01-26 19:05       ` Daniel Phillips
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Phillips @ 2001-01-26 19:05 UTC (permalink / raw)
  To: Stephen C. Tweedie, linux-kernel

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Thu, Jan 25, 2001 at 09:05:54PM +0100, Daniel Phillips wrote:
> > "Stephen C. Tweedie" wrote:
> > > We also maintain the
> > > per-page buffer lists as caches of the virtual-to-physical mapping to
> > > avoid redundant bmap()ping.
> >
> > Could you clarify that one, please?
> 
> The buffer contains a physical label for the block's location on disk.
> The page cache is indexed purely by logical location, so doing IO
> to/from the page cache requires us to lookup the physical locations of
> each block within the page.
> 
> Caching the buffer_heads for page cache pages means that once those
> lookups are done once, further IO on the same page can bypass the
> lookup and go straight to disk.

OK, this was just a terminology problem - I normally say 'filesystem
block' or just 'block' where you said 'physical', and if you say
'physical' to me, I'll hear 'physical memory address' :-/  I was also
confused by 'bmap', which to me means the bad old way of
generic_reading, and to you means 'mapping a buffer to a disk
location'.  Um.  Which I would express as 'mapping a buffer to a block'.

Leaving that confusing behind, I am now working out an approach to
mapping an inode's index blocks into the page cache.  Probably, I am
reinventing the wheel, but here it is.  I plan to create another
address_space mapping in the inode's ext2-private area, called
i_index_mapping.  This maps all possible index blocks onto the
i_index_mapping, IOW, for each possible data block we can directly
compute the index into the i_index_mapping at which we will search for
the datablock's parent index block.  Normally we will find the index
block there (because it was read recently) and not have to search down
through its parents as we must with the buffer cache approach.  Perhaps
then the inode->dirty_buffers list can go away.

Obviously the impact on ext2_get_block is fairly major, but I think the
end result will be simpler than what we have now.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2001-01-26 19:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-01-25 10:47 inode->i_dirty_buffers redundant ? V Ganesh
2001-01-25 16:44 ` Stephen C. Tweedie
2001-01-25 20:05   ` Daniel Phillips
2001-01-25 21:30     ` Marcelo Tosatti
2001-01-26 11:35     ` Stephen C. Tweedie
2001-01-26 19:05       ` Daniel Phillips
2001-01-25 21:11   ` Marcelo Tosatti
2001-01-26 17:34     ` Stephen C. Tweedie
  -- strict thread matches above, loose matches on Subject: below --
2001-01-26  9:13 V Ganesh
2001-01-24  9:55 V Ganesh
2001-01-24  9:09 ` Marcelo Tosatti
2001-01-24 11:26 ` Stephen C. Tweedie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox