* Linux page cache issue?
@ 2007-03-28 6:45 Xin Zhao
2007-03-28 7:35 ` junjie cai
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Xin Zhao @ 2007-03-28 6:45 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, Xin Zhao
Hi,

If a Linux process opens and reads a file A and then closes it, will
Linux keep file A's data in the cache for a while in case another
process opens and reads the same file shortly afterwards? I believe
that is what I have heard.

But after I dug into the kernel code, I am confused.

When a process closes file A, iput() will be called, which in turn
calls the following two functions:
iput_final()->generic_drop_inode()

But from the calling chain below, we can see that closing a file
eventually evicts and frees all cached pages. The pages are actually
freed in truncate_complete_page(). This seems to imply that Linux has
to re-read the same data from disk even if another process B reads the
same file right after process A closes it. That does not make sense to
me.

/*** calling chain ***/
generic_delete_inode()/generic_forget_inode()->
truncate_inode_pages()->truncate_inode_pages_range()->
truncate_complete_page()->remove_from_page_cache()->
__remove_from_page_cache()->radix_tree_delete()

Am I missing something? Can someone please provide some advice?

Thanks a lot
-x
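The question above can also be probed from user space with mincore(2),
which reports whether a mapped page is resident in the page cache. A
minimal sketch (the file path used below is arbitrary, and the 4096-byte
page size is an assumption that holds on common Linux configurations):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the first page of 'path' is resident in the page cache,
 * 0 if not, -1 on error.  Assumes a 4 KiB page size. */
int first_page_cached(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	void *map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);			/* the mapping keeps the pages referenced */
	if (map == MAP_FAILED)
		return -1;
	unsigned char vec;
	int rc = mincore(map, 4096, &vec);
	munmap(map, 4096);
	if (rc < 0)
		return -1;
	return vec & 1;			/* bit 0: page resident in core */
}
```

Writing a file, closing it, and then calling this helper typically
reports the page as resident, which is the caching behaviour being asked
about: the just-written page stays in the page cache after close.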
* Re: Linux page cache issue?
2007-03-28 6:45 Linux page cache issue? Xin Zhao
@ 2007-03-28 7:35 ` junjie cai
2007-03-28 7:38 ` Matthias Kaehlcke
2007-03-28 14:10 ` Dave Kleikamp
2 siblings, 0 replies; 9+ messages in thread
From: junjie cai @ 2007-03-28 7:35 UTC (permalink / raw)
To: Xin Zhao; +Cc: linux-kernel, linux-fsdevel
Hi,
generic_forget_inode() doesn't call truncate_inode_pages() if the FS is
still active; see line 1049 below:
1040 static void generic_forget_inode(struct inode *inode)
1041 {
1042 	struct super_block *sb = inode->i_sb;
1043
1044 	if (!hlist_unhashed(&inode->i_hash)) {
1045 		if (!(inode->i_state & (I_DIRTY|I_LOCK)))
1046 			list_move(&inode->i_list, &inode_unused);
1047 		inodes_stat.nr_unused++;
1048 		spin_unlock(&inode_lock);
1049 		if (!sb || (sb->s_flags & MS_ACTIVE))
1050 			return;
1051 		write_inode_now(inode, 1);
1052 		spin_lock(&inode_lock);
1053 		inodes_stat.nr_unused--;
1054 		hlist_del_init(&inode->i_hash);
1055 	}
1056 	list_del_init(&inode->i_list);
1057 	list_del_init(&inode->i_sb_list);
1058 	inode->i_state |= I_FREEING;
1059 	inodes_stat.nr_inodes--;
1060 	spin_unlock(&inode_lock);
1061 	if (inode->i_data.nrpages)
1062 		truncate_inode_pages(&inode->i_data, 0);
1063 	clear_inode(inode);
1064 	destroy_inode(inode);
1065 }
On 3/28/07, Xin Zhao <uszhaoxin@gmail.com> wrote:
> Hi,
>
> If a Linux process opens and reads a file A, then it closes the file.
> Will Linux keep the file A's data in cache for a while in case another
> process opens and reads the same in a short time? I think that is what
> I heard before.
>
> But after I digged into the kernel code, I am confused.
>
> When a process closes the file A, iput() will be called, which in turn
> calls the follows two functions:
> iput_final()->generic_drop_inode()
>
> But from the following calling chain, we can see that file close will
> eventually lead to evict and free all cached pages. Actually in
> truncate_complete_page(), the pages will be freed. This seems to
> imply that Linux has to re-read the same data from disk even if
> another process B read the same file right after process A closes the
> file. That does not make sense to me.
>
> /***calling chain ***/
> generic_delete_inode/generic_forget_inode()->
> truncate_inode_pages()->truncate_inode_pages_range()->
> truncate_complete_page()->remove_from_page_cache()->
> __remove_from_page_cache()->radix_tree_delete()
>
> Am I missing something? Can someone please provide some advise?
>
> Thanks a lot
> -x
>
* Re: Linux page cache issue?
2007-03-28 6:45 Linux page cache issue? Xin Zhao
2007-03-28 7:35 ` junjie cai
@ 2007-03-28 7:38 ` Matthias Kaehlcke
2007-03-28 14:10 ` Dave Kleikamp
2 siblings, 0 replies; 9+ messages in thread
From: Matthias Kaehlcke @ 2007-03-28 7:38 UTC (permalink / raw)
To: Xin Zhao; +Cc: linux-kernel, linux-fsdevel
According to the chapter "Linux Kernel Overview" of the
kernelhacking-HOWTO, the page cache holds pages associated with *open*
files:
The Page Cache
The page cache is made up of pages, each of which refers to a 4kB
portion of data associated with an open file. The data contained in a
page may come from several disk blocks, which may or may not be
physical neighbours on the disk. The page cache is largely used to
interface the requirements of the memory management subsystem (which
uses fixed, 4kB pages) to the VFS subsystem (which uses different size
blocks for different devices).
The page cache has two important data structures, a page hash table
and an inode queue. The page hash table is used to quickly find the
page descriptor of the page holding data associated with an inode and
offset within a file. The inode queue contains lists of page
descriptors relating to open files.
http://www.kernelhacking.org/docs/kernelhacking-HOWTO/indexs03.html
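As a rough user-space analogue of the (inode, offset) hash lookup
described above, one might sketch it like this (a toy model with made-up
names, not the kernel's actual structures):

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SLOTS 256

/* Toy page descriptor: identified by owning inode and page offset. */
struct page_desc {
	unsigned long ino;		/* owning inode number */
	unsigned long index;		/* page offset within the file */
	struct page_desc *next;		/* hash-chain link */
};

static struct page_desc *page_hash[HASH_SLOTS];

static size_t hash(unsigned long ino, unsigned long index)
{
	/* mix inode number and offset into a slot index */
	return (ino ^ (index * 2654435761UL)) % HASH_SLOTS;
}

/* Find the cached page for (ino, index), or NULL if not cached. */
struct page_desc *find_page(unsigned long ino, unsigned long index)
{
	struct page_desc *p = page_hash[hash(ino, index)];
	while (p && (p->ino != ino || p->index != index))
		p = p->next;
	return p;
}

/* Insert a page descriptor at the head of its hash chain. */
void add_page(struct page_desc *p)
{
	size_t h = hash(p->ino, p->index);
	p->next = page_hash[h];
	page_hash[h] = p;
}
```

The point of the structure is that a cached page can only be found via
the inode it belongs to, which is exactly what the thread goes on to
discuss for files that share data blocks.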
m.
El Wed, Mar 28, 2007 at 02:45:23AM -0400 Xin Zhao ha dit:
> Hi,
>
> If a Linux process opens and reads a file A, then it closes the file.
> Will Linux keep the file A's data in cache for a while in case another
> process opens and reads the same in a short time? I think that is what
> I heard before.
>
> But after I digged into the kernel code, I am confused.
>
> When a process closes the file A, iput() will be called, which in turn
> calls the follows two functions:
> iput_final()->generic_drop_inode()
>
> But from the following calling chain, we can see that file close will
> eventually lead to evict and free all cached pages. Actually in
> truncate_complete_page(), the pages will be freed. This seems to
> imply that Linux has to re-read the same data from disk even if
> another process B read the same file right after process A closes the
> file. That does not make sense to me.
>
> /***calling chain ***/
> generic_delete_inode/generic_forget_inode()->
> truncate_inode_pages()->truncate_inode_pages_range()->
> truncate_complete_page()->remove_from_page_cache()->
> __remove_from_page_cache()->radix_tree_delete()
>
> Am I missing something? Can someone please provide some advise?
>
> Thanks a lot
> -x
--
For to be free is not merely to cast off
one's chains, but to live in a way that
respects and enhances the freedom of others
                                  (Nelson Mandela)

using free software / Debian GNU/Linux | http://debian.org

gpg --keyserver pgp.mit.edu --recv-keys 47D8E5D4
* Re: Linux page cache issue?
2007-03-28 6:45 Linux page cache issue? Xin Zhao
2007-03-28 7:35 ` junjie cai
2007-03-28 7:38 ` Matthias Kaehlcke
@ 2007-03-28 14:10 ` Dave Kleikamp
2007-03-28 15:39 ` Xin Zhao
2 siblings, 1 reply; 9+ messages in thread
From: Dave Kleikamp @ 2007-03-28 14:10 UTC (permalink / raw)
To: Xin Zhao; +Cc: linux-kernel, linux-fsdevel
On Wed, 2007-03-28 at 02:45 -0400, Xin Zhao wrote:
> Hi,
>
> If a Linux process opens and reads a file A, then it closes the file.
> Will Linux keep the file A's data in cache for a while in case another
> process opens and reads the same in a short time? I think that is what
> I heard before.
Yes.
> But after I digged into the kernel code, I am confused.
>
> When a process closes the file A, iput() will be called, which in turn
> calls the follows two functions:
> iput_final()->generic_drop_inode()
A comment from the top of fs/dcache.c:
/*
* Notes on the allocation strategy:
*
* The dcache is a master of the icache - whenever a dcache entry
* exists, the inode will always exist. "iput()" is done either when
* the dcache entry is deleted or garbage collected.
*/
Basically, as long as a dentry is present, iput_final() won't be called
on the inode.
> But from the following calling chain, we can see that file close will
> eventually lead to evict and free all cached pages. Actually in
> truncate_complete_page(), the pages will be freed. This seems to
> imply that Linux has to re-read the same data from disk even if
> another process B read the same file right after process A closes the
> file. That does not make sense to me.
>
> /***calling chain ***/
> generic_delete_inode/generic_forget_inode()->
> truncate_inode_pages()->truncate_inode_pages_range()->
> truncate_complete_page()->remove_from_page_cache()->
> __remove_from_page_cache()->radix_tree_delete()
>
> Am I missing something? Can someone please provide some advise?
>
> Thanks a lot
> -x
Shaggy
--
David Kleikamp
IBM Linux Technology Center
* Re: Linux page cache issue?
2007-03-28 14:10 ` Dave Kleikamp
@ 2007-03-28 15:39 ` Xin Zhao
[not found] ` <alpine.DEB.0.83.0703281157010.2527@sigma.j-a-k-j.com>
2007-03-29 9:27 ` Jan Kara
0 siblings, 2 replies; 9+ messages in thread
From: Xin Zhao @ 2007-03-28 15:39 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: linux-kernel, linux-fsdevel
Thanks a lot, folks! Your reply addressed my concern.

Now let me explain the problem that led me to explore Linux disk cache
management. It comes from my project: in a file system I am working on,
two files may have different inodes but share the same data blocks. Of
course, additional block-level reference counting and copy-on-write
mechanisms are needed to prevent operations on one file from disrupting
the other. But the point is, the two files share the same data blocks.

I hope that consecutive reads of the two files can benefit from the
disk cache, since they share the same data blocks. But I noticed that
Linux splits the disk cache into many small parts and associates a
file's data with its mapping object: Linux determines whether a data
page is cached by looking up the file's mapping radix tree, so this is
a per-file radix tree. This design obviously makes each tree smaller
and faster to look up, but it eliminates the possibility of sharing the
disk cache between two files. For example, suppose a process reads
file 2 right after file 1 (where both files share the same set of data
blocks). Even if the data blocks are already loaded in memory, they can
only be located via file 1's mapping object. When Linux reads file 2,
it still thinks the data is not present in memory, so the process has
to load the data from disk again.

Would it make sense to build a per-device radix tree indexed by (dev,
sect_no)? The loaded data pages would still be associated with the
per-file radix tree in the file's mapping object, but they would also
be associated with the per-device radix tree. When looking up cached
pages, Linux could first check the per-file radix tree, and check the
per-device radix tree only if it fails to find a cached page in the
per-file tree. The lookup of the per-device radix tree may incur some
overhead, but compared to a slow disk access, looking up an in-memory
radix tree is much cheaper and should be trivial, I guess.

Any thoughts about this?
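A user-space toy of the proposed two-level lookup might look like the
following (all names are made up for illustration; the kernel's real
structures and the proposed radix trees are simplified here to flat
hash tables):

```c
#include <assert.h>
#include <stddef.h>

#define FILE_SLOTS 64
#define DEV_SLOTS  1024

struct cached_page { int dev; long sector; char *data; };

/* per-file table indexed by page offset within the file,
 * per-device table indexed by a hash of (dev, sector) */
struct cached_page *file_tbl[FILE_SLOTS];
struct cached_page *dev_tbl[DEV_SLOTS];

static size_t dev_hash(int dev, long sector)
{
	return ((size_t)dev * 31 + (size_t)sector) % DEV_SLOTS;
}

/* Two-level lookup: per-file cache first, then the per-device
 * fallback that can hit pages loaded through a *different* file
 * sharing the same blocks. */
struct cached_page *lookup(long pgoff, int dev, long sector)
{
	struct cached_page *p = file_tbl[pgoff % FILE_SLOTS];
	if (p && p->dev == dev && p->sector == sector)
		return p;			/* fast per-file path */
	p = dev_tbl[dev_hash(dev, sector)];	/* shared-block fallback */
	if (p && p->dev == dev && p->sector == sector)
		return p;
	return NULL;				/* not cached: go to disk */
}
```

The extra cost of the scheme is exactly the fallback probe on a
per-file miss, which is the overhead the follow-up messages debate.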
Thanks,
-x
On 3/28/07, Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
> On Wed, 2007-03-28 at 02:45 -0400, Xin Zhao wrote:
> > Hi,
> >
> > If a Linux process opens and reads a file A, then it closes the file.
> > Will Linux keep the file A's data in cache for a while in case another
> > process opens and reads the same in a short time? I think that is what
> > I heard before.
>
> Yes.
>
> > But after I digged into the kernel code, I am confused.
> >
> > When a process closes the file A, iput() will be called, which in turn
> > calls the follows two functions:
> > iput_final()->generic_drop_inode()
>
> A comment from the top of fs/dcache.c:
>
> /*
> * Notes on the allocation strategy:
> *
> * The dcache is a master of the icache - whenever a dcache entry
> * exists, the inode will always exist. "iput()" is done either when
> * the dcache entry is deleted or garbage collected.
> */
>
> Basically, as long a a dentry is present, iput_final won't be called on
> the inode.
>
> > But from the following calling chain, we can see that file close will
> > eventually lead to evict and free all cached pages. Actually in
> > truncate_complete_page(), the pages will be freed. This seems to
> > imply that Linux has to re-read the same data from disk even if
> > another process B read the same file right after process A closes the
> > file. That does not make sense to me.
> >
> > /***calling chain ***/
> > generic_delete_inode/generic_forget_inode()->
> > truncate_inode_pages()->truncate_inode_pages_range()->
> > truncate_complete_page()->remove_from_page_cache()->
> > __remove_from_page_cache()->radix_tree_delete()
> >
> > Am I missing something? Can someone please provide some advise?
> >
> > Thanks a lot
> > -x
>
> Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
>
>
* Re: Linux page cache issue?
[not found] ` <alpine.DEB.0.83.0703281157010.2527@sigma.j-a-k-j.com>
@ 2007-03-28 16:15 ` Xin Zhao
0 siblings, 0 replies; 9+ messages in thread
From: Xin Zhao @ 2007-03-28 16:15 UTC (permalink / raw)
To: John Anthony Kazos Jr., linux-fsdevel, linux-kernel
You are right. If the device is very big, the radix tree could be huge
as well, and maybe the lookup is not that cheap. But the per-device
tree can be optimized too. A simple way I can immediately imagine is to
evenly split a device into N parts by sector number and maintain a
radix tree for each part. Let's do the math. Suppose I have a 32G
partition (2^35 bytes) and each data block is 4K bytes (2^12), so the
partition has 2^23 blocks. I divide the blocks into 4096 (2^12) groups,
so each group will only have 2^11 blocks. With a radix tree, the
average lookup in each tree would take log(2^11) steps, that is, 11
in-memory tree traversal steps to locate a page. This cost seems
acceptable, although I haven't actually measured it. As for the memory
used to maintain the radix trees, I believe it is trivial considering
the memory size of modern computers.
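The arithmetic above can be checked with a small helper (a sketch; the
partition, block, and group sizes are the ones from the example):

```c
#include <assert.h>

/* Blocks per group = partition size / block size / group count;
 * the lookup depth of a balanced tree over them is log2 of that. */
static unsigned int lookup_steps(unsigned long part_bytes,
				 unsigned long block_bytes,
				 unsigned long groups)
{
	unsigned long blocks_per_group = part_bytes / block_bytes / groups;
	unsigned int steps = 0;

	while (blocks_per_group > 1) {	/* integer log2 */
		blocks_per_group >>= 1;
		steps++;
	}
	return steps;
}
```

For a 2^35-byte partition, 2^12-byte blocks, and 2^12 groups this gives
the 11 steps claimed in the message.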
Xin
On 3/28/07, John Anthony Kazos Jr. <jakj@j-a-k-j.com> wrote:
> > The lookup of the per-device radix tree may incur some overhead. But
> > compared to the slow disk access, looking up an in-memory radix tree is
> > much cheaper and should be trivial, I guess.
>
> I would consider whether or not it really is trivial. You'd have to think
> hard about just how much of your filesystem is going to be sharing data
> blocks. If you fail to find in the per-file tree, then fail to find in the
> per-device tree, then still have to read the block from the device, and
> this is happening too often, then the additional overhead of the
> per-device tree check for non-cached items may end up cancelling the
> savings for cached items.
>
* Re: Linux page cache issue?
2007-03-28 15:39 ` Xin Zhao
[not found] ` <alpine.DEB.0.83.0703281157010.2527@sigma.j-a-k-j.com>
@ 2007-03-29 9:27 ` Jan Kara
2007-03-29 14:41 ` Xin Zhao
1 sibling, 1 reply; 9+ messages in thread
From: Jan Kara @ 2007-03-29 9:27 UTC (permalink / raw)
To: Xin Zhao; +Cc: Dave Kleikamp, linux-kernel, linux-fsdevel
Hello,
> Now I want to explain the problem that leads me to explore the Linux
> disk cache management. This is actually from my project. In a file
> system I am working on, two files may have different inodes, but share
> the same data blocks. Of course additional block-level reference
> counting and copy-on-write mechanisms are needed to prevent operations
> on one file from disrupting the other file. But the point is, the two
> files share the same data blocks.
>
> I hope that consequential reads to the two files can benefit from disk
> cache, since they have the same data blocks. But I noticed that Linux
> splits disk buffer cache into many small parts and associate a file's
> data with its mapping object. Linux determines whether a data page is
> cached or not by lookup the file's mapping radix tree. So this is a
> per-file radix tree. This design obviously makes each tree smaller and
> faster to look up. But this design eliminates the possibility of
> sharing disk cache across two files. For example, if a process reads
> file 2 right after file 1 (both file 1 and 2 share the same data block
> set). Even if the data blocks are already loaded in memory, but they
> can only be located via file 1's mapping object. When Linux reads file
> 2, it still think the data is not present in memory. So the process
> still needs to load the data from disk again.
Actually, there is one inode - the device inode - whose mapping can
contain all the blocks of the filesystem. That is basically the radix
tree you are looking for; ext3, for example, uses it for accessing its
metadata (indirect blocks etc.). But you have to be really careful to
avoid aliasing issues and such when you'd like to map copies of those
pages into the mappings of several different inodes (BTW, the ext3cow
filesystem may be interesting for you: www.ext3cow.com).
Honza
> On 3/28/07, Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
> >On Wed, 2007-03-28 at 02:45 -0400, Xin Zhao wrote:
> >> Hi,
> >>
> >> If a Linux process opens and reads a file A, then it closes the file.
> >> Will Linux keep the file A's data in cache for a while in case another
> >> process opens and reads the same in a short time? I think that is what
> >> I heard before.
> >
> >Yes.
> >
> >> But after I digged into the kernel code, I am confused.
> >>
> >> When a process closes the file A, iput() will be called, which in turn
> >> calls the follows two functions:
> >> iput_final()->generic_drop_inode()
> >
> >A comment from the top of fs/dcache.c:
> >
> >/*
> > * Notes on the allocation strategy:
> > *
> > * The dcache is a master of the icache - whenever a dcache entry
> > * exists, the inode will always exist. "iput()" is done either when
> > * the dcache entry is deleted or garbage collected.
> > */
> >
> >Basically, as long a a dentry is present, iput_final won't be called on
> >the inode.
> >
> >> But from the following calling chain, we can see that file close will
> >> eventually lead to evict and free all cached pages. Actually in
> >> truncate_complete_page(), the pages will be freed. This seems to
> >> imply that Linux has to re-read the same data from disk even if
> >> another process B read the same file right after process A closes the
> >> file. That does not make sense to me.
> >>
> >> /***calling chain ***/
> >> generic_delete_inode/generic_forget_inode()->
> >> truncate_inode_pages()->truncate_inode_pages_range()->
> >> truncate_complete_page()->remove_from_page_cache()->
> >> __remove_from_page_cache()->radix_tree_delete()
> >>
> >> Am I missing something? Can someone please provide some advise?
> >>
> >> Thanks a lot
> >> -x
> >
> >Shaggy
> >--
> >David Kleikamp
> >IBM Linux Technology Center
> >
> >
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: Linux page cache issue?
2007-03-29 9:27 ` Jan Kara
@ 2007-03-29 14:41 ` Xin Zhao
2007-04-02 12:51 ` Jan Kara
0 siblings, 1 reply; 9+ messages in thread
From: Xin Zhao @ 2007-03-29 14:41 UTC (permalink / raw)
To: Jan Kara; +Cc: Dave Kleikamp, linux-kernel, linux-fsdevel
Hi Jan,
Many thanks for your kind reply.
I know we could use the device inode's radix tree to achieve the same
goal. The only downsides could be: first, by default, Linux will not
add data pages to that radix tree - only when a file is opened with
O_DIRECT will the data pages be put into the device's radix tree.
Moreover, if the partition is big, I am not sure whether the lookup
overhead is an issue, so it might need some optimization.

Can you elaborate on the aliasing issues mentioned in your email? I do
have some mechanisms to handle the following situation: suppose two
files share the same data blocks, and two processes open the two files
separately. If one process writes to one file, the other file will be
affected. Is this the aliasing issue you referred to?
Thanks,
xin
On 3/29/07, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> > Now I want to explain the problem that leads me to explore the Linux
> > disk cache management. This is actually from my project. In a file
> > system I am working on, two files may have different inodes, but share
> > the same data blocks. Of course additional block-level reference
> > counting and copy-on-write mechanisms are needed to prevent operations
> > on one file from disrupting the other file. But the point is, the two
> > files share the same data blocks.
> >
> > I hope that consequential reads to the two files can benefit from disk
> > cache, since they have the same data blocks. But I noticed that Linux
> > splits disk buffer cache into many small parts and associate a file's
> > data with its mapping object. Linux determines whether a data page is
> > cached or not by lookup the file's mapping radix tree. So this is a
> > per-file radix tree. This design obviously makes each tree smaller and
> > faster to look up. But this design eliminates the possibility of
> > sharing disk cache across two files. For example, if a process reads
> > file 2 right after file 1 (both file 1 and 2 share the same data block
> > set). Even if the data blocks are already loaded in memory, but they
> > can only be located via file 1's mapping object. When Linux reads file
> > 2, it still think the data is not present in memory. So the process
> > still needs to load the data from disk again.
> Actually, there is one inode - the device inode - whose mapping can
> contain all the blocks of the filesystem. That is basically the radix
> tree you are looking for. ext3 for example uses it for accessing its
> metadata (indirect blocks etc.). But you have to be really careful to
> avoid aliasing issues and such when you'd like to map copies of those
> pages into mappings of several different inodes (BTW ext3cow filesystem
> may be interesting for you www.ext3cow.com).
>
> Honza
>
> > On 3/28/07, Dave Kleikamp <shaggy@linux.vnet.ibm.com> wrote:
> > >On Wed, 2007-03-28 at 02:45 -0400, Xin Zhao wrote:
> > >> Hi,
> > >>
> > >> If a Linux process opens and reads a file A, then it closes the file.
> > >> Will Linux keep the file A's data in cache for a while in case another
> > >> process opens and reads the same in a short time? I think that is what
> > >> I heard before.
> > >
> > >Yes.
> > >
> > >> But after I digged into the kernel code, I am confused.
> > >>
> > >> When a process closes the file A, iput() will be called, which in turn
> > >> calls the follows two functions:
> > >> iput_final()->generic_drop_inode()
> > >
> > >A comment from the top of fs/dcache.c:
> > >
> > >/*
> > > * Notes on the allocation strategy:
> > > *
> > > * The dcache is a master of the icache - whenever a dcache entry
> > > * exists, the inode will always exist. "iput()" is done either when
> > > * the dcache entry is deleted or garbage collected.
> > > */
> > >
> > >Basically, as long a a dentry is present, iput_final won't be called on
> > >the inode.
> > >
> > >> But from the following calling chain, we can see that file close will
> > >> eventually lead to evict and free all cached pages. Actually in
> > >> truncate_complete_page(), the pages will be freed. This seems to
> > >> imply that Linux has to re-read the same data from disk even if
> > >> another process B read the same file right after process A closes the
> > >> file. That does not make sense to me.
> > >>
> > >> /***calling chain ***/
> > >> generic_delete_inode/generic_forget_inode()->
> > >> truncate_inode_pages()->truncate_inode_pages_range()->
> > >> truncate_complete_page()->remove_from_page_cache()->
> > >> __remove_from_page_cache()->radix_tree_delete()
> > >>
> > >> Am I missing something? Can someone please provide some advise?
> > >>
> > >> Thanks a lot
> > >> -x
> > >
> > >Shaggy
> > >--
> > >David Kleikamp
> > >IBM Linux Technology Center
> > >
> > >
> --
> Jan Kara <jack@suse.cz>
> SuSE CR Labs
>
* Re: Linux page cache issue?
2007-03-29 14:41 ` Xin Zhao
@ 2007-04-02 12:51 ` Jan Kara
0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2007-04-02 12:51 UTC (permalink / raw)
To: Xin Zhao; +Cc: Dave Kleikamp, linux-kernel, linux-fsdevel
Hi Xin,
On Thu 29-03-07 10:41:01, Xin Zhao wrote:
> I know we can use device inode's radix tree to achieve the same goal.
> The only downside could be: First, by default, Linux will not add the
> data pages into that radix tree. Only when a file is opened in
Right.
> O_DIRECT, the data pages will be put into dev's radix tree. Moreover,
If you use O_DIRECT, I don't think the data will end up in any radix
tree - ideally, it goes directly to disk in that case.
> if the partition is big, I am not sure whether the lookup overhead is
> an issue. So it might need some optimization.
Maybe, but I'd not say so as my first guess.
> Can you elaborate more about the aliasing issues mentioned in your
> email? I do have some mechanisms to handle the following situation:
> suppose two files share same data blocks. Now two processes open the
> two files separately. If one process writes a file, the other file
> will be affected. Is this the aliasing issue you referred to?
Yes, this is exactly what I meant. Note that these problems are not
only about writes but also about truncate and such...
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
Thread overview: 9+ messages
2007-03-28 6:45 Linux page cache issue? Xin Zhao
2007-03-28 7:35 ` junjie cai
2007-03-28 7:38 ` Matthias Kaehlcke
2007-03-28 14:10 ` Dave Kleikamp
2007-03-28 15:39 ` Xin Zhao
[not found] ` <alpine.DEB.0.83.0703281157010.2527@sigma.j-a-k-j.com>
2007-03-28 16:15 ` Xin Zhao
2007-03-29 9:27 ` Jan Kara
2007-03-29 14:41 ` Xin Zhao
2007-04-02 12:51 ` Jan Kara