* fs block size and PAGE_CACHE_SIZE
@ 2003-05-06 17:42 David Chow
2003-05-06 19:34 ` Szakacsits Szabolcs
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: David Chow @ 2003-05-06 17:42 UTC (permalink / raw)
To: linux-fsdevel
Hi all,
I've got a question about the relationship between a filesystem's block size (the arbitrary internal fs block size) and the system's PAGE_CACHE_SIZE. The most common fs block size in Linux is 4k, because it allows an easy fs implementation that directly matches PAGE_CACHE_SIZE. As you know, newer CPUs such as IA-64 can have a page size of 16k and above. I am wondering whether it is possible to implement an fs with a 16k block size on a 4k system, so that the fs can be used on both IA-32 and IA-64 systems. In other words, is it possible (or has someone already done it) to implement a file system that has a 16k block size on a 4k paging system?
First, consider a readpage() aop on a non-aligned page number (in the middle of a block): how could I efficiently populate one 16k block into the page cache (4 pages) in one readpage() op? The fs driver cannot really create a mapped page or tell whether a page (page number) is in the page cache or not. As far as I know, one way of doing this is to use the read_cache_page() callback and tell whether the page is in the page cache by knowing whether the callback is triggered. If I simply call read_cache_page() on the aligned page from the readpage() aop of an unaligned page, then even if the aligned page is up_to_date afterwards, how would it be possible for readpage() to return a (guaranteed) up_to_date page for the unaligned pages? I thought of locking the page, but the problem is that readpage() is called with the page locked, and calling read_cache_page() on a cached aligned page will not trigger an fs read (a page cache hit on the aligned page doesn't mean the unaligned page is up_to_date).
My fs implementation (a compression file system) would be very inefficient when reading from the middle of a block: the overhead of reading would be 4 times higher on systems with a small page cache size. I would be pleased to hear any suggestions on such implementations. Thanks.
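To make the "aligned" and "unaligned" page terminology above concrete, here is a minimal userspace sketch of the index arithmetic for a block that spans several pages. The helper names are hypothetical and are not kernel APIs; the sketch only assumes that the block size is a whole multiple of the page size.

```c
#include <assert.h>

/* Hypothetical helper: number of page-cache pages covered by one fs block. */
static unsigned long pages_per_block(unsigned long block_size,
                                     unsigned long page_size)
{
    assert(page_size > 0 && block_size >= page_size);
    assert(block_size % page_size == 0); /* block must be a multiple of page */
    return block_size / page_size;
}

/* Index of the fs block that contains the given page index. */
static unsigned long block_of_page(unsigned long page_index, unsigned long ppb)
{
    assert(ppb > 0);
    return page_index / ppb;
}

/* Index of the first ("aligned") page of the block containing page_index. */
static unsigned long first_page_of_block(unsigned long page_index,
                                         unsigned long ppb)
{
    assert(ppb > 0);
    return page_index - (page_index % ppb);
}

/* Position of the page inside its block; 0 means the page is aligned. */
static unsigned long page_offset_in_block(unsigned long page_index,
                                          unsigned long ppb)
{
    assert(ppb > 0);
    return page_index % ppb;
}
```

With 16k blocks and 4k pages, pages_per_block() gives 4, so page 6 lies in block 1, whose aligned page is page 4, and page 6 sits at offset 2 inside that block.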
regards,
David Chow
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fs block size and PAGE_CACHE_SIZE
2003-05-06 17:42 fs block size and PAGE_CACHE_SIZE David Chow
@ 2003-05-06 19:34 ` Szakacsits Szabolcs
2003-05-06 21:08 ` Bryan Henderson
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Szakacsits Szabolcs @ 2003-05-06 19:34 UTC (permalink / raw)
To: David Chow; +Cc: linux-fsdevel
On Wed, 7 May 2003, David Chow wrote:
> size of 16k and above, I am wondering is it possible to implement an fs
> that has 16k block size on a 4k system which this fs can be used both
> on IA-32 and IA-64 systems. In other words, is it possible (or someone
> already done) to implement a file system that has 16k block size on a
> 4k paging system.
The new NTFS driver, in 2.5 or as a patch to 2.4.20, supports all
power of 2 block sizes between 512 bytes and 64kB.
Szaka
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fs block size and PAGE_CACHE_SIZE
2003-05-06 17:42 fs block size and PAGE_CACHE_SIZE David Chow
2003-05-06 19:34 ` Szakacsits Szabolcs
@ 2003-05-06 21:08 ` Bryan Henderson
2003-05-07 0:59 ` Phillip Lougher
2003-05-12 1:59 ` David Chow
2003-05-06 21:34 ` Trond Myklebust
2003-05-07 0:48 ` Phillip Lougher
3 siblings, 2 replies; 13+ messages in thread
From: Bryan Henderson @ 2003-05-06 21:08 UTC (permalink / raw)
To: David Chow; +Cc: linux-fsdevel
I don't know why there would be any issue having page size != block size,
as long as one is a multiple of the other. Maybe you have a particular
issue in mind?
I know one area where having a block larger than a page is a pain: When
you allocate a new block in order to write just one page, you have to
separately initialize the rest of the block.
>how could I efficiently populate one 16k block to page cache(4 pages) at one
>readpage() op?
Why would you want to? If someone wants to access those other 3 pages,
they'll have page faults of their own.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fs block size and PAGE_CACHE_SIZE
2003-05-06 21:08 ` Bryan Henderson
@ 2003-05-07 0:59 ` Phillip Lougher
2003-05-12 1:59 ` David Chow
1 sibling, 0 replies; 13+ messages in thread
From: Phillip Lougher @ 2003-05-07 0:59 UTC (permalink / raw)
To: Bryan Henderson; +Cc: David Chow, linux-fsdevel
Bryan Henderson wrote:
>>how could I efficiently populate one 16k block to page cache(4 pages) at one
>>readpage() op?
>
>Why would you want to? If someone wants to access those other 3 pages,
>they'll have page faults of their own.
In an uncompressed filesystem this is not a problem, because the
additional page faults (read_page) will read the pages out of the block
cache. However, for a compressed filesystem, the 16K block would
have to be (re-)decompressed every time, because the block is compressed
in the block cache. It is much more efficient to push the extra
decompressed pages into the page cache.
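The cost difference described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `toy_*` names, the 4-pages-per-block ratio and the two-block file are hypothetical, and the "page cache" is just an array of uptodate flags, not the kernel's data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGES_PER_BLOCK 4 /* hypothetical: 16K block / 4K page */
#define NPAGES 8          /* toy file made of two blocks */

struct toy_cache {
    bool uptodate[NPAGES]; /* stand-in for the per-page uptodate flag */
    int decompressions;    /* how many times a whole block was decompressed */
};

static void toy_cache_init(struct toy_cache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Stand-in for decompressing one whole block; it only counts the work. */
static void toy_decompress_block(struct toy_cache *c, int block)
{
    assert(block >= 0 && block < NPAGES / PAGES_PER_BLOCK);
    c->decompressions++;
}

/* Read one page. With push_siblings set, every page of the decompressed
 * block is marked uptodate; otherwise only the requested page is. */
static void toy_readpage(struct toy_cache *c, int page, bool push_siblings)
{
    assert(page >= 0 && page < NPAGES);
    if (c->uptodate[page])
        return; /* cache hit: no decompression needed */

    int block = page / PAGES_PER_BLOCK;
    toy_decompress_block(c, block);

    if (push_siblings) {
        int first = block * PAGES_PER_BLOCK;
        for (int i = first; i < first + PAGES_PER_BLOCK; i++)
            c->uptodate[i] = true;
    } else {
        c->uptodate[page] = true;
    }
}

/* Read every page of the toy file once, in order, and return the number
 * of block decompressions that were needed. */
static int toy_read_all(bool push_siblings)
{
    struct toy_cache c;
    toy_cache_init(&c);
    for (int p = 0; p < NPAGES; p++)
        toy_readpage(&c, p, push_siblings);
    return c.decompressions;
}
```

In this model, reading the eight pages one by one costs eight block decompressions when only the requested page is filled, and two when the sibling pages are pushed as well.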
Regards
Phillip Lougher
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fs block size and PAGE_CACHE_SIZE
2003-05-06 21:08 ` Bryan Henderson
2003-05-07 0:59 ` Phillip Lougher
@ 2003-05-12 1:59 ` David Chow
1 sibling, 0 replies; 13+ messages in thread
From: David Chow @ 2003-05-12 1:59 UTC (permalink / raw)
To: Bryan Henderson; +Cc: linux-fsdevel
Bryan Henderson wrote:
>I don't know why there would be any issue having page size != block size,
>as long as one is a multiple of the other. Maybe you have a particular
>issue in mind?
>
>I know one area where having a block larger than a page is a pain: When
>you allocate a new block in order to write just one page, you have to
>separately initialize the rest of the block.
>
>>how could I efficiently populate one 16k block to page cache(4 pages) at one
>>readpage() op?
>
>Why would you want to? If someone wants to access those other 3 pages,
>they'll have page faults of their own.
Yes, for uncompressed file systems it doesn't really matter. But if I am
writing a file system that supports compression with 16k blocks, I have to
do 3 extra pages' (12k) worth of (de)compression work for just one 4k page
of data, since there is no way for me to read from the middle of a block.
If the extra 12k of data read from the block isn't put into the page cache
and marked up-to-date, that work is wasted. That's why.
regards,
David Chow
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fs block size and PAGE_CACHE_SIZE
2003-05-06 17:42 fs block size and PAGE_CACHE_SIZE David Chow
2003-05-06 19:34 ` Szakacsits Szabolcs
2003-05-06 21:08 ` Bryan Henderson
@ 2003-05-06 21:34 ` Trond Myklebust
[not found] ` <3EBE85E8.50906@shaolinmicro.com>
2003-05-07 0:48 ` Phillip Lougher
3 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2003-05-06 21:34 UTC (permalink / raw)
To: David Chow; +Cc: linux-fsdevel
>>>>> " " == David Chow <davidchow@shaolinmicro.com> writes:
> In other words, is it possible (or someone already done) to
> implement a file system that has 16k block size on a 4k paging
> system.
The NFS client does that. In 2.4.x, I had to invent a custom system
for coalescing the pages, but in 2.5.x, Andrew added the readpages()
op, and the ability to control page readahead via the backing_dev_info
struct.
Cheers,
Trond
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fs block size and PAGE_CACHE_SIZE
2003-05-06 17:42 fs block size and PAGE_CACHE_SIZE David Chow
` (2 preceding siblings ...)
2003-05-06 21:34 ` Trond Myklebust
@ 2003-05-07 0:48 ` Phillip Lougher
2003-05-11 17:12 ` David Chow
3 siblings, 1 reply; 13+ messages in thread
From: Phillip Lougher @ 2003-05-07 0:48 UTC (permalink / raw)
To: David Chow; +Cc: linux-fsdevel
David Chow wrote:
>Hi all,
> I am wondering is it possible to implement an fs that has 16k block size on a 4k system which this fs can be used both on IA-32 and IA-64 systems. In other words, is it possible (or someone already done) to implement a file system that has 16k block size on a 4k paging system.
>
Both zisofs and squashfs (http://squashfs.sourceforge.net) use larger
than 4K blocks. In the case of squashfs, it can use blocks up to 32K.
Both of these are compressed read-only filesystems.
> Firstly, I think of working on a readpage() aop on a non aligned page no (in the middle of a block), how could I efficiently populate one 16k block to page cache(4 pages) at one readpage() op? Since the fs driver could not really create a mapped page or telling whether a page (page no) is in the page cache or not. From my knowledge, one way of doing this is to use the read_cache_page() callback to tell whether the page is in page cache by knowing whether the callback is triggered. If I simply call read_cache_page() to aligned page in readpage() aop of unaligned pages, even the aligned page is up_to_date afterwards, how would it be possible for the readpage() process to return an up_to_date page (guaranteed) for unaligned pages? I thought of locking the page, but the problem is that readpage() is called with the page locked, and call read_cache_page() on a cached aligned page will not trigger an fs read (page cache hit of aligned page doesn't mean an up_to_date for unaligned page). Since my fs implementation would be very inefficient if doing a reading in the middle of a block (compression file systems) or the overhead of reading will be 4 times higher on small page cache size systems. I am pleased to hear any suggestions on such implementations. Thanks.
>
Squashfs in the read_page routine pushes the extra pages (i.e. 3 extra
4K pages in the case of a 16K block) into the page cache using
grab_page_nowait(), which returns a page with the appropriate mapping to
be filled and marked uptodate. To avoid race conditions this routine
will not sleep if the page is already locked.
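The "fill the siblings, but never sleep on a locked page" idea can be sketched in userspace as follows. This is an illustration under stated assumptions, not Squashfs code: the `toy_*` names are hypothetical, page locks are modelled as plain flags in a single-threaded program, and the try-lock helper only loosely mimics the non-sleeping behaviour described for grab_page_nowait() (the mainline helper with that behaviour is, to my knowledge, named grab_cache_page_nowait()).

```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_BLOCK 4 /* hypothetical: 16K block / 4K page */

struct toy_page {
    bool locked;   /* stand-in for the page lock */
    bool uptodate; /* stand-in for the uptodate flag */
};

/* Non-sleeping grab: succeeds only if nobody holds the page lock. */
static bool toy_trylock_page(struct toy_page *pg)
{
    if (pg->locked)
        return false; /* a sleeping variant would wait here instead */
    pg->locked = true;
    return true;
}

static void toy_unlock_page(struct toy_page *pg)
{
    assert(pg->locked);
    pg->locked = false;
}

/* Fill the requested page (already locked by the caller, as readpage()
 * is) and opportunistically fill its siblings in the same block.
 * Returns the number of pages that were newly marked uptodate. */
static int toy_fill_block(struct toy_page pages[PAGES_PER_BLOCK], int requested)
{
    assert(requested >= 0 && requested < PAGES_PER_BLOCK);
    assert(pages[requested].locked);

    int filled = 0;
    for (int i = 0; i < PAGES_PER_BLOCK; i++) {
        if (i == requested) {
            pages[i].uptodate = true;
            toy_unlock_page(&pages[i]);
            filled++;
        } else if (toy_trylock_page(&pages[i])) {
            if (!pages[i].uptodate) {
                pages[i].uptodate = true;
                filled++;
            }
            toy_unlock_page(&pages[i]);
        }
        /* else: another reader holds this page; skip it rather than sleep */
    }
    return filled;
}
```

If page 1 is the requested page and page 2 is locked by another reader, the call fills pages 0, 1 and 3 and leaves page 2 to whoever holds its lock.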
Using read_cache_page() is dangerous, because it sleeps if the page you
are trying to fill is already locked. When writing squashfs I hit the
problem where read_page was being called almost simultaneously for
different pages which were in the same (32K) block. The first
invocation read the 32K block and locked it, then tried to push the
pages into the page cache using read_cache_page(). However,
read_cache_page slept when it tried to push the page that the second
invocation had locked. The second invocation itself was sleeping
holding that lock, whilst waiting for the overall 32K block lock.
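The deadlock described above is a two-party wait cycle, which a small userspace model can make explicit. The sketch below is an assumption-laden illustration: the reader and lock names are hypothetical, each lock has at most one owner, and each reader sleeps on at most one lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_ONE (-1)

enum { LOCK_BLOCK, LOCK_PAGE, NLOCKS };   /* 32K block lock, one page lock */
enum { READER_A, READER_B, NREADERS };    /* first and second invocation */

struct toy_state {
    int owner[NLOCKS];       /* reader holding each lock, or NO_ONE */
    int waits_for[NREADERS]; /* lock each reader sleeps on, or NO_ONE */
};

/* Follow the chain "reader -> lock it waits for -> owner of that lock".
 * Returns true if the chain leads back to the starting reader, i.e. the
 * reader is part of a wait cycle and can never be woken up. */
static bool toy_deadlocked(const struct toy_state *s, int start)
{
    int r = start;
    for (int steps = 0; steps < NREADERS; steps++) {
        int lock = s->waits_for[r];
        if (lock == NO_ONE)
            return false; /* this reader is runnable */
        r = s->owner[lock];
        if (r == NO_ONE)
            return false; /* the lock is free */
        if (r == start)
            return true;  /* cycle closed */
    }
    return false;
}

/* Reader A holds the block lock; reader B holds a page lock and sleeps
 * on the block lock. If A uses a sleeping call on B's page it waits for
 * LOCK_PAGE; if it uses a non-sleeping call it skips the page instead. */
static struct toy_state toy_scenario(bool first_reader_sleeps)
{
    struct toy_state s;
    s.owner[LOCK_BLOCK] = READER_A;
    s.owner[LOCK_PAGE] = READER_B;
    s.waits_for[READER_A] = first_reader_sleeps ? LOCK_PAGE : NO_ONE;
    s.waits_for[READER_B] = LOCK_BLOCK;
    return s;
}
```

In the sleeping scenario both readers are reported as deadlocked; in the non-sleeping scenario reader A stays runnable, so it can finish and release the block lock that reader B is waiting for.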
Regards
Phillip Lougher
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: fs block size and PAGE_CACHE_SIZE
2003-05-07 0:48 ` Phillip Lougher
@ 2003-05-11 17:12 ` David Chow
0 siblings, 0 replies; 13+ messages in thread
From: David Chow @ 2003-05-11 17:12 UTC (permalink / raw)
To: Phillip Lougher; +Cc: linux-fsdevel
Phillip Lougher wrote:
> David Chow wrote:
>
>> Hi all,
>> I am wondering is it possible to implement an fs that has 16k block
>> size on a 4k system which this fs can be used both on IA-32 and IA-64
>> systems. In other words, is it possible (or someone already done) to
>> implement a file system that has 16k block size on a 4k paging system.
>>
>
> Both zisofs and squashfs (http://squashfs.sourceforge.net) use larger
> than 4K blocks. In the case of squashfs, it can use blocks up to
> 32K. Both of these are compressed read-only filesystems.
>
>> Firstly, I think of working on a readpage() aop on a non aligned page
>> no (in the middle of a block), how could I efficiently populate one 16k
>> block to page cache(4 pages) at one readpage() op? Since the fs
>> driver could not really create a mapped page or telling whether a
>> page (page no) is in the page cache or not. From my knowledge, one
>> way of doing this is to use the read_cache_page() callback to tell
>> whether the page is in page cache by knowing whether the callback is
>> triggered. If I simply call read_cache_page() to aligned page in
>> readpage() aop of unaligned pages, even the aligned page is
>> up_to_date afterwards, how would it be possible for the readpage()
>> process to return an up_to_date page (guaranteed) for unaligned
>> pages? I thought of locking the page, but the problem is that
>> readpage() is called with the page locked, and call read_cache_page()
>> on a cached aligned page will not trigger an fs read (page cache hit
>> of aligned page doesn't mean an up_to_date for unaligned page). Since
>> my fs implementation would be very inefficient if doing a reading in
>> the middle of a block (compression file systems) or the overhead of
>> reading will be 4 times higher on small page cache size systems. I am
>> pleased to hear any suggestions on such implementations. Thanks.
>
>>
>
> Squashfs in the read_page routine pushes the extra pages (i.e. 3
> extra 4K pages in the case of a 16K block) into the page cache using
> grab_page_nowait(), which returns a page with the appropriate mapping
> to be filled and marked uptodate. To avoid race conditions this
> routine will not sleep if the page is already locked.
The sleep-or-no-sleep distinction only works on uniprocessor systems with
a non-preemptive kernel. Since the file system is supposed to work on SMP
systems, sleeping or not doesn't really help. As you know, cases of
simultaneous reads to the same block are what really give me a headache.
>
> Using read_cache_page() is dangerous, because it sleeps if the page
> you are trying to fill is already locked. When writing squashfs I hit
> the problem where read_page was being called almost simultaneously for
> different pages which were in the same (32K) block. The first
> invocation read the 32K block and locked it, then tried to push the
> pages into the page cache using read_cache_page(). However,
> read_cache_page slept when it tried to push the page that the second
> invocation had locked. The second invocation itself was sleeping
> holding that lock, whilst waiting for the overall 32K block lock.
Thanks for your pointers.
regards,
David Chow
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2003-05-12 17:33 UTC | newest]
Thread overview: 13+ messages
2003-05-06 17:42 fs block size and PAGE_CACHE_SIZE David Chow
2003-05-06 19:34 ` Szakacsits Szabolcs
2003-05-06 21:08 ` Bryan Henderson
2003-05-07 0:59 ` Phillip Lougher
2003-05-12 1:59 ` David Chow
2003-05-06 21:34 ` Trond Myklebust
[not found] ` <3EBE85E8.50906@shaolinmicro.com>
2003-05-12 0:31 ` Trond Myklebust
2003-05-12 2:07 ` David Chow
2003-05-12 10:32 ` Anton Altaparmakov
2003-05-12 11:33 ` Phillip Lougher
2003-05-12 17:46 ` David Chow
2003-05-07 0:48 ` Phillip Lougher
2003-05-11 17:12 ` David Chow