* Understanding UBIFS flash overhead
@ 2008-10-10 21:00 Deepak Saxena
2008-10-13 6:48 ` Artem Bityutskiy
2008-10-13 7:45 ` Artem Bityutskiy
0 siblings, 2 replies; 6+ messages in thread
From: Deepak Saxena @ 2008-10-10 21:00 UTC (permalink / raw)
To: Artem Bityutskiy; +Cc: linux-mtd
Artem,
I'm working on getting UBIFS going on the OLPC XO and am trying to
understand what I am seeing in regards to file system size.
I have partitioned the 1GiB mtd device into a 32MiB JFFS boot partition
for OFW, a 128KiB partition to hold the Redboot partition table,
and the remainder (991.625 MiB) for use by UBI.
The NAND device has 128KiB EBs with 2KiB pages and we are running w/o
sub-page writes. Plugging this into the overhead calculation, we get:
SP  PEB Size        128KiB
SL  LEB Size        128KiB - 2 * 2KiB = 124KiB
P   Total PEBs      991.625MiB / 128KiB = 7933
B   Reserved PEBs   79 (1%)
O   Overhead        SP - SL = 4KiB
UBI Overhead = (B + 4) * SP + O * (P - B - 4)
             = (79 + 4) * 128KiB + 4KiB * (7933 - 79 - 4)
             = 42024 KiB
             = 328.3125 PEBs (round up to 329)
This leaves us with 7604 PEBs or 973312KiB available for user data.
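The same calculation as a short Python script, with names following the table above (the 42024 KiB total converts to 328.3125 PEBs, which rounds up to 329):

```python
import math

# All sizes are in KiB; the names follow the table above.
SP = 128                 # PEB size
SL = SP - 2 * 2          # LEB size: PEB minus two 2 KiB pages = 124
P = 7933                 # total PEBs in the UBI partition
B = 79                   # PEBs reserved for bad-block handling (~1%)
O = SP - SL              # per-PEB overhead, 4 KiB

overhead = (B + 4) * SP + O * (P - B - 4)
print(overhead)                   # 42024 KiB
print(math.ceil(overhead / SP))   # 329 PEBs, leaving 7933 - 329 = 7604
```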
At boot up, I see:
UBIFS: file system size: 995237888 bytes (971912 KiB, 949 MiB,
UBIFS: journal size: 9023488 bytes (8812 KiB, 8 MiB, 72 LEBs)
'df' returns:
Filesystem Size Used Avail Use% Mounted on
mtd0 822M 242M 581M 30% /
I expect some overhead, but I'm really wondering where over 100MiB of
space went!
This is 2.6.25 with the UBI 2.6.25 tree merged in. Does that tree have
any bugfixes from 2.6.26+ backported?
~Deepak
--
_____ __o Deepak Saxena - Living CarFree and CareFree (o>
------ -\<, "When I see an adult on a bicycle, I do not despair //\
----- ( )/ ( ) for the future of the human race." -H.G Wells V_/_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Re: Understanding UBIFS flash overhead
From: Artem Bityutskiy @ 2008-10-13 6:48 UTC (permalink / raw)
To: dsaxena; +Cc: linux-mtd
On Fri, 2008-10-10 at 14:00 -0700, Deepak Saxena wrote:
> I'm working on getting UBIFS going on the OLPC XO and am trying to
> understand what I am seeing in regards to file system size.
Nice. I think UBIFS is quite suitable for OLPC and should improve its
boot time and, presumably, performance.
> I have partitioned the 1GiB mtd device into a 32MiB JFFS boot partition
> for OFW, a 128KiB partition to hold the Redboot partition table,
> and the remainder (991.625 MiB) for use by UBI.
OK.
> The NAND device has 128KiB EBs with 2KiB pages and we are running w/o
> sub-page writes.
Right. Last time I looked at this I found out that your NAND supports
sub-pages, but your Marvell controller does not, unfortunately.
> Plugging this into the overhead calculation, we get:
>
> SP  PEB Size        128KiB
> SL  LEB Size        128KiB - 2 * 2KiB = 124KiB
> P   Total PEBs      991.625MiB / 128KiB = 7933
> B   Reserved PEBs   79 (1%)
> O   Overhead        SP - SL = 4KiB
>
>
> UBI Overhead = (B + 4) * SP + O * (P - B - 4)
>              = (79 + 4) * 128KiB + 4KiB * (7933 - 79 - 4)
>              = 42024 KiB
>              = 328.3125 PEBs (round up to 329)
>
> This leaves us with 7604 PEBs or 973312KiB available for user data.
OK.
> At boot up, I see:
>
> UBIFS: file system size: 995237888 bytes (971912 KiB, 949 MiB,
> UBIFS: journal size: 9023488 bytes (8812 KiB, 8 MiB, 72 LEBs)
Note, this FS size (949 MiB) is the size which UBIFS will use for its
"main area". The main area includes all the FS data, the index, and the
journal. So this does not mean you'll have 949MiB for your FS data;
you'll have slightly less.
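As a cross-check of that figure (my own arithmetic, not from the log): the reported byte count is exactly 971912 KiB and divides evenly by the 124 KiB LEB size used earlier in the thread:

```python
fs_bytes = 995237888      # "file system size" from the UBIFS boot message
leb = 124 * 1024          # LEB size: 128 KiB PEB minus two 2 KiB pages

print(fs_bytes // 1024)   # 971912 KiB, matching the log line
# divides with no remainder, suggesting a main area of 7838 whole LEBs
print(fs_bytes % leb, fs_bytes // leb)
```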
> 'df' returns:
>
> Filesystem Size Used Avail Use% Mounted on
> mtd0 822M 242M 581M 30% /
>
> I expect some overhead, but I'm really wondering where over 100MiB of
> space went!
>
> This is 2.6.25 with UBI 2.6.25 tree merged in. Does that tree have
> any bugfixes from 2.6.26+ backported?
I have several comments.
1. We've improved df reporting, but we had not updated the back-port
trees for a long time, so the improvements were not there. I've just
updated all the back-ports and they contain all the recent patches we
consider stable. They are basically identical to ubifs-2.6.git now.
Please update, and things will become better. However, do not expect df
to tell you precise information anyway. See below.
2. Let me first describe the JFFS2 experience we have had. In statfs()
(and thus, df) JFFS2 reports physical flash space. There are some
challenges in calculating this space, and JFFS2 often lies in df. For
example, it may say it has 20MiB free, but if you start writing to it,
you'll only be able to write a 16MiB file. Our user-space developers
complained about this several times. So in UBIFS we decided to report
_worst-case_ flash space, because we thought it is better to report less
but be honest.
Please, read here for information about UBIFS flash space prediction
challenges: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_spaceacc
This should shed some light.
In short, I'll conclude with a few items here:
* In most cases UBIFS reports _less_ space than it really has.
This is because it reports worst-case figures, and the worst-case
scenario happens very rarely. Just try to write a file and see.
* It is very difficult to report precise flash space. This was an
issue in JFFS2, but it is even more of an issue in UBIFS because
of write-back.
--
Best regards,
Artem Bityutskiy (Битюцкий Артём)
* Re: Understanding UBIFS flash overhead
From: Artem Bityutskiy @ 2008-10-13 7:45 UTC (permalink / raw)
To: dsaxena; +Cc: linux-mtd
On Fri, 2008-10-10 at 14:00 -0700, Deepak Saxena wrote:
> I'm working on getting UBIFS going on the OLPC XO and am trying to
> understand what I am seeing in regards to file system size.
BTW, consider trying the new "bulk_read" mount option, which may improve
file read speed. The "no_chk_data_crc" option might be considered as well.
We'd be interested in getting some figures from you to see whether
"bulk_read" improves things for you. Please read here:
http://marc.info/?l=linux-kernel&m=122276053307842&w=2
--
Best regards,
Artem Bityutskiy (Битюцкий Артём)
* Re: Understanding UBIFS flash overhead
From: Deepak Saxena @ 2008-10-14 22:56 UTC (permalink / raw)
To: Artem Bityutskiy; +Cc: linux-mtd
On Oct 13 2008, at 09:48, Artem Bityutskiy was caught saying:
> I have several comments.
>
> 1. We've improved df reporting, but we have not updated the back-port
> trees for long time, so the improvements were not there. I've just
> updated all back-ports and they contain all the recent patches we
> consider stable. It is basically identical to ubifs-2.6.git now. Please,
> update, things will become better. However, do not expect df to tell you
> precise information anyway. See below.
Artem,
Thanks for updating the backport trees.
I pulled these and we go from an 822MiB filesystem to an 878MiB filesystem
out of a 949MiB device. This is definitely an improvement, but still means
71MiB is being used for the journal (8MiB default in my test) and for
the index (or is not being properly accounted for).
> 2. Let me first describe JFFS2 experience we have had. In statfs() (and
> thus, df) JFFS2 reports physical flash space. There are some challenges
> in calculating this space, and JFFS2 often lies in df. For example, it
> may say it has 20MiB free, but if you start writing on it, you'll be
> able to write only 16MiB file. Our user-space developers complained
> about this several times. So in UBIFS we decided to report _worst-case_
> flash space because we thought it is better not to tell less, but to be
> honest.
>
> Please, read here for information about UBIFS flash space prediction
> challenges: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_spaceacc
> This should shed some light.
Thanks. I've read the docs, FAQs, and white paper over, and my understanding
is that this is referring to free space reporting. I think we can live with
not-perfectly-accurate numbers on this end if our applications fail nicely.
The fact that we're losing ~8% of space from the start is an issue for
us b/c we are already running into issues with kids filling the systems
up quickly, so every page we can save is important. We'll have to do some
performance analysis on tweaking the journal size, but I'm wondering what
else is configurable (or could be made configurable via changes) to
decrease this? I notice there is an option to mkfs.ubifs to change the
index fanout; I'll read the code to understand this and see how it
impacts the fs size.
Does the reported filesystem size change dynamically w.r.t. write-back
and compression assumptions, or is it completely based on the static
overhead of the journal and index?
Thanks again for the help,
~Deepak
--
_____ __o Deepak Saxena - Living CarFree and CareFree (o>
------ -\<, "When I see an adult on a bicycle, I do not despair //\
----- ( )/ ( ) for the future of the human race." -H.G Wells V_/_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Re: Understanding UBIFS flash overhead
From: Deepak Saxena @ 2008-10-14 22:56 UTC (permalink / raw)
To: Artem Bityutskiy; +Cc: linux-mtd
On Oct 13 2008, at 10:45, Artem Bityutskiy was caught saying:
> On Fri, 2008-10-10 at 14:00 -0700, Deepak Saxena wrote:
> > I'm working on getting UBIFS going on the OLPC XO and am trying to
> > understand what I am seeing in regards to file system size.
>
> BTW, consider trying the new "bulk_read" mount option, which may improve
> file read speed. The "no_chk_data_crc" option might be considered as well.
>
> We'd be interested in getting some figures from you to see whether
> "bulk_read" improves things for you. Please read here:
> http://marc.info/?l=linux-kernel&m=122276053307842&w=2
Thanks, will do.
--
_____ __o Deepak Saxena - Living CarFree and CareFree (o>
------ -\<, "When I see an adult on a bicycle, I do not despair //\
----- ( )/ ( ) for the future of the human race." -H.G Wells V_/_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Re: Understanding UBIFS flash overhead
From: Artem Bityutskiy @ 2008-10-15 12:59 UTC (permalink / raw)
To: dsaxena; +Cc: linux-mtd
On Tue, 2008-10-14 at 15:56 -0700, Deepak Saxena wrote:
> I pulled these and we go from an 822MiB filesystem to an 878MiB filesystem
> out of a 949MiB device. This is definitely an improvement, but still means
> 71MiB is being used for the journal (8MiB default in my test) and for
> the index (or is not being properly accounted for).
1. First of all, I'd like to comment on the "71MiB is being used for the
journal (8MiB default in my test)" phrase, just to clarify things.
The size of the journal does not really affect the available space.
I've just created this FAQ entry to elaborate on this:
http://www.linux-mtd.infradead.org/faq/ubifs.html#L_smaller_jrn
2. Could I please ask you to actually fill the file-system with a huge
incompressible file and send us the size of the file you were able to
create? Something like:
dd if=/dev/urandom of=/mnt/ubifs/file bs=4096
ls -l /mnt/ubifs/file
You should probably be able to create a file larger than 878MiB.
Let's see what the _real_ amount of free space is, because df lies
a little anyway.
> Thanks. I've read the docs, FAQs, and white paper over, and my understanding
> is that this is referring to free space reporting. I think we can live with
> not-perfectly-accurate numbers on this end if our applications fail nicely.
>
> The fact that we're losing ~8% of space from the start is an issue for
> us b/c we are already running into issues with kids filling the systems
> up quickly, so every page we can save is important. We'll have to do some
> performance analysis on tweaking the journal size, but I'm wondering what
> else is configurable (or could be made configurable via changes) to
> decrease this?
As I said earlier, we do not expect the journal size to affect the
available space much.
> I notice there is an option to mkfs.ubifs to change the
> index fanout and I'll read the code to understand this and see how it
> impacts the fs size.
I expect a larger index fanout would save some space, but the effect
should really be small. Also, we did not test the FS extensively with
non-default fanouts, although we ran _some_ tests with various
non-default fanouts and UBIFS looked OK. I think the maximum tested
fanout was 128, while the default is 8.
> Does the reported filesystem size change dynamically w.r.t w/b and
> compression assumptions or is it completely based on static overhead
> of journal and index?
The reported space does change dynamically w.r.t. write-back. The less
dirty data you have, the more precise the calculation is. To get the most
precise 'df' output, call 'sync'. It not only flushes all dirty data,
but also commits, which makes the calculation more precise, because
UBIFS knows the _exact_ indexing tree size after a commit. I mean, if you
have data in the journal, it is not indexed, and the precise index size is
unknown. UBIFS would have to actually _do_ the commit to know the precise
index size. This is not a fundamental thing, it is just an implementation
issue. We just found it much more difficult to implement things differently.
Let me tell you some more details which may be useful to know. In UBIFS
we have the index, which has size X. And we reserve 2*X more flash space
to guarantee that we can always commit. I mean, the index takes X bytes,
and we reserve 2*X bytes more. Well, things are rounded to the LEB size,
but this does not matter much. We had a discussion with Adrian today, and
we think that in general we may try to improve things and reserve X bytes
instead of 2*X bytes, but it is difficult to do. So we would like to
know the index size in your case, to understand if it is really worth it.
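A toy model of that reservation, assuming the simple X-plus-2*X rule described above (the real accounting in UBIFS is more involved, so treat this only as an illustration):

```python
LEB = 124 * 1024   # usable LEB size in this thread's setup

def lebs(nbytes, leb=LEB):
    # round a byte count up to a whole number of LEBs
    return -(-nbytes // leb)

def unavailable(index_size, leb=LEB):
    # the index itself (X LEBs) plus the 2*X reserved for committing
    return 3 * lebs(index_size, leb) * leb

# e.g. a 20 MiB index would pin roughly 60 MiB of flash in total
print(unavailable(20 * 1024 * 1024) / (1024 * 1024))
```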
To provide the index size, you should print the "c->old_idx_sz"
variable. Fill your FS, run sync, and then get its value. I think adding
a printk to the 'ubifs_calc_min_idx_lebs()' function should work. This
function is called on 'df'. So you do sync, run df, and look at dmesg.
I was going to add debugfs support to UBIFS and expose important
variables like this via debugfs later.
But there is another possibility to save ~15.5MiB of flash. To do this
we should improve UBI and teach it to store both headers in the first
NAND page. Since you do not have sub-pages, we could use the available
OOB bytes of the first page. This would save 2048 bytes per eraseblock,
which is about ~15.5MiB. Could you please give us information about how
many OOB bytes are available on the OLPC NAND? You should basically
look at 'struct nand_ecclayout' to find this out. There is an
ECCGETLAYOUT MTD device ioctl. I have never used it, but it should work.
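The expected saving is simple to check against the PEB count from earlier in the thread:

```python
pebs = 7933    # total PEBs in the UBI partition, from the overhead calculation
page = 2048    # NAND page size in bytes; one page per PEB would be freed
saved = pebs * page
print(saved / (1024 * 1024))   # about 15.49 MiB, i.e. the "~15.5MiB" above
```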
--
Best regards,
Artem Bityutskiy (Битюцкий Артём)