* Duplication of dirent names in JFFS2 summary
@ 2006-05-19 0:44 David Woodhouse
2006-05-19 1:01 ` David Woodhouse
` (3 more replies)
0 siblings, 4 replies; 28+ messages in thread
From: David Woodhouse @ 2006-05-19 0:44 UTC (permalink / raw)
To: Ferenc Havasi; +Cc: linux-mtd
It's a little unfortunate that we have to actually keep a second copy of
the name in the summary. Is there any way we could avoid it?
One possibility might be to refrain from storing the name in the dirent
node itself -- store it _only_ in the summary. This could happen only
during garbage collection -- when we GC a dirent node, we could write it
out _without_ its name. In fact, we could possibly refrain from writing
the GC'd dirent to the log at all -- put it _only_ in the summary.
That does break backwards compatibility though, which isn't ideal.
Another possibility might be to omit the full name from the summary, but
to store only the hash (or name_crc) instead. We hardly ever actually
need the full name -- it's only used if there's a hash collision, in
jffs2_add_fd_to_list().
Further thoughts...?
Do we have any current data on how much space is taken by summary nodes
on typical file systems, and how much of that is names?
--
dwmw2
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 0:44 Duplication of dirent names in JFFS2 summary David Woodhouse @ 2006-05-19 1:01 ` David Woodhouse 2006-05-19 11:53 ` David Woodhouse 2006-05-19 6:05 ` Jörn Engel ` (2 subsequent siblings) 3 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 1:01 UTC (permalink / raw) To: Ferenc Havasi; +Cc: linux-mtd While I'm looking at how to reduce the size of the summary... would we also gain by dropping the 'totlen' field and assuming that each node's length can be found by subtracting its offset from the offset on the next node on the flash? (Much as I plan to do to reduce memory usage by shrinking the struct jffs2_raw_node_ref.) That'd mean we have to have a summary representation of 'dirty' space, but there should be relatively few of those in the eraseblocks we've only just finished writing out, so it should be a gain overall. We don't actually need 'type' in the jffs2_sum_dirent_flash either, do we? Depending on the number of crc32 collisions we get and how expensive it is to read the names from the flash, we might not even need the nsize field. And would we benefit by using a variable-length encoding of the integers in the summary entries? We're far from being CPU-bound when we eat these, surely? ... which in turn means we should keep storing totlen and drop _offset_, since we can infer one from the other and the length can always be stored in fewer bits... -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 1:01 ` David Woodhouse @ 2006-05-19 11:53 ` David Woodhouse 2006-05-19 11:58 ` Artem B. Bityutskiy 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 11:53 UTC (permalink / raw) To: Ferenc Havasi; +Cc: linux-mtd On Fri, 2006-05-19 at 02:01 +0100, David Woodhouse wrote: > And would we benefit by using a variable-length encoding of the integers > in the summary entries? We're far from being CPU-bound when we eat > these, surely? Thinking of something vaguely like UTF-8, but more optimal. Each byte would convey seven bits of information, and the top bit would indicate whether there's a further byte to be added. So a number from 0-127 would be stored as-is, while numbers from 128-16383 are stored with seven bits in each of two bytes, with the top bit of the first set and the top bit of the second clear. This means we take only two bytes for the common node lengths -- one byte for nodes up to 508 bytes (obviously we can shift the length >>2 before storing it since it's always a multiple of 4). Or should we be more ambitious and actually go for bit-packing instead of just bytes? unsigned char *encode_int(unsigned char *p, uint32_t val) { switch (val) { case (1<<28) ... 0xffffffff: *p++ = 0x80 | ((val >> 28) & 0x7f); case (1<<21) ... (1<<28)-1: *p++ = 0x80 | ((val >> 21) & 0x7f); case (1<<14) ... (1<<21)-1: *p++ = 0x80 | ((val >> 14) & 0x7f); case (1<<7) ... (1<<14)-1: *p++ = 0x80 | ((val >> 7) & 0x7f); default: *p++ = val & 0x7f; } return p; } unsigned char *decode_int(unsigned char *p, uint32_t *val) { uint32_t ret = 0; unsigned char c; while ((c = *p++) & 0x80) { ret |= c & 0x7f; ret <<= 7; } ret |= c; *val = ret; return p; } -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 11:53 ` David Woodhouse @ 2006-05-19 11:58 ` Artem B. Bityutskiy 2006-05-19 12:11 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 11:58 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 12:53 +0100, David Woodhouse wrote: > On Fri, 2006-05-19 at 02:01 +0100, David Woodhouse wrote: > > And would we benefit by using a variable-length encoding of the integers > > in the summary entries? We're far from being CPU-bound when we eat > > these, surely? > > Thinking of something vaguely like UTF-8, but more optimal. Each byte > would convey seven bits of information, and the top bit would indicate > whether there's a further byte to be added. Why do you think it really worth it ? -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 11:58 ` Artem B. Bityutskiy @ 2006-05-19 12:11 ` David Woodhouse 2006-05-19 12:14 ` Artem B. Bityutskiy 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 12:11 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 15:58 +0400, Artem B. Bityutskiy wrote: > > Thinking of something vaguely like UTF-8, but more optimal. Each byte > > would convey seven bits of information, and the top bit would indicate > > whether there's a further byte to be added. > > Why do you think it really worth it ? If we can halve the amount of space taken on the flash by the summary nodes, I think that's a worthwhile aim. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 12:11 ` David Woodhouse @ 2006-05-19 12:14 ` Artem B. Bityutskiy 2006-05-19 12:31 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 12:14 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 13:11 +0100, David Woodhouse wrote: > If we can halve the amount of space taken on the flash by the summary > nodes, I think that's a worthwhile aim. > If halve, OK, but why do you think this encoding will halve it? -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 12:14 ` Artem B. Bityutskiy @ 2006-05-19 12:31 ` David Woodhouse 2006-05-19 14:34 ` Artem B. Bityutskiy 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 12:31 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 16:14 +0400, Artem B. Bityutskiy wrote: > On Fri, 2006-05-19 at 13:11 +0100, David Woodhouse wrote: > > If we can halve the amount of space taken on the flash by the summary > > nodes, I think that's a worthwhile aim. > > > If halve, OK, but why do you think this encoding will halve it? I think overall we can about halve the size of the summary nodes. A variable-length encoding would be part of that. struct jffs2_sum_inode_flash { jint16_t nodetype; /* node type */ jint32_t inode; /* inode number */ jint32_t version; /* inode version */ jint32_t offset; /* offset on jeb */ jint32_t totlen; /* record length */ } __attribute__((packed)); The nodetype can be a single byte. Inode number is going to start low, and will take only two bytes if it's less than 16364 -- three bytes up to 2097152. Storing it as an index into a table of 'inodes affected by this summary' might also be worthwhile, since we'll often have many nodes which belong to the same inode. Version is also going to start low -- and we can also avoid storing it for the second and subsequent nodes belonging to any given inode. We _know_ it only counts up by one at a time. Offset can be entirely redundant if we represent dirty space with a separate summary entry -- there shouldn't be much dirty space in the eraseblock when we've only just finished writing it anyway. Totlen is going to be small. So yes, for the 'struct jffs2_sum_inode_flash' I think we can halve it. Much of the same goes for the 'struct jffs2_sum_dirent_flash', and we can also drop the name from that too. So yes, I suspect we can get close to half the size of that too. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 12:31 ` David Woodhouse @ 2006-05-19 14:34 ` Artem B. Bityutskiy 2006-05-19 14:59 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 14:34 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 13:31 +0100, David Woodhouse wrote: > Inode number is going to start low, and will take only two bytes if it's > less than 16364 -- three bytes up to 2097152. Storing it as an index > into a table of 'inodes affected by this summary' might also be > worthwhile, since we'll often have many nodes which belong to the same > inode. > > Version is also going to start low -- and we can also avoid storing it > for the second and subsequent nodes belonging to any given inode. We > _know_ it only counts up by one at a time. I'd not take seriously a compression which stably degrades with the flow of time. So, I'd no count version and inode. And I have a feeling that the gain will not worth the effort, new bugs, new complexities... -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 14:34 ` Artem B. Bityutskiy @ 2006-05-19 14:59 ` David Woodhouse 2006-05-19 16:41 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 14:59 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 18:34 +0400, Artem B. Bityutskiy wrote: > I'd not take seriously a compression which stably degrades with the > flow of time. So, I'd no count version and inode. And I have a feeling > that the gain will not worth the effort, new bugs, new complexities... Do we have current figures on how much space the summary nodes actually take? What I'd actually do, rather than storing inode number and version just like that, is have a table stored in the summary, containing tuples of inode number, and the first version number of that inode which is used. The actual node entries in the summary would then just contain an index into that table for the inode number, and an offset for the version. So if we have an eraseblock which contains the following nodes... { ino# 143454, version# 2341 } { ino# 143454, version# 2342 } { ino# 143454, version# 2343 } { ino# 209343, version# 342 } { ino# 143454, version# 2344 } { ino# 209343, version# 343 } ... that would be represented as a table containing [0]: { ino #143454, first version# 2341 } [1]: { ino #209343, first version# 342 } ... followed by actual summary entries which just refer to that table, having { index into inode table, offset from first version } tuples... { 0, 0 } { 0, 1 } { 0, 2 } { 1, 0 } { 0, 3 } { 1, 1 } And in that case, the numbers _would_ compress reliably, even on old file systems. I wonder if using something like rtime would work for compressing summary blocks, rather than a specific integer encoding... -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 14:59 ` David Woodhouse @ 2006-05-19 16:41 ` David Woodhouse 2006-05-19 16:43 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 16:41 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 15:59 +0100, David Woodhouse wrote: > Do we have current figures on how much space the summary nodes > actually take? On the OLPC board -- 512MiB NAND in 4096 * 128KiB eraseblocks. I have it about 22% full, with copies of /lib, /bin, /etc, /boot and /usr/bin from a Fedora system -- along with a few nodes in /dev because I was testing the new device node support. 795 eraseblocks have a summary. The average size of the summary is 2458 bytes -- about ~1.8% of the eraseblock size. The minimum was 616, maximum 24608 bytes. The average size of the _names_ in the summary is 92 bytes, which is ~0.07% of the eraseblock size. Minimum 0, Maximum 3070. Total space taken by summaries if all 4096 blocks of the file system were contained summaries at these average sizes would be 9.6 MiB, of which 370KiB or so would be names. So we might not actually gain _much_ by removing the duplicate names -- they really aren't a significant contribution to the size of the summary nodes. It isn't _hard_ to remove them though, and it may well make a lot of sense not to read them even in the !SUMMARY case too. If we can remove the 'offset' field from the summary entries and compress the numbers either by an encoding such as the one I was talking about and an inode table, or by just using a general-purpose compression on the summary node before writing it, then I think we probably ought to get the overhead down to 1% or so from the current 1.8% -- that's probably worth doing. Just using rubin compression on the summary nodes would save 35% (thus they'd be 1.25% of total size) -- by reducing the average to 1708 bytes per eraseblock. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 16:41 ` David Woodhouse @ 2006-05-19 16:43 ` David Woodhouse 0 siblings, 0 replies; 28+ messages in thread From: David Woodhouse @ 2006-05-19 16:43 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 17:41 +0100, David Woodhouse wrote: > Just using rubin compression on the summary nodes would save 35% (thus > they'd be 1.25% of total size) -- by reducing the average to 1708 > bytes per eraseblock. Sorry, that should be 30%. 1708 is 69.5% of 2457. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 0:44 Duplication of dirent names in JFFS2 summary David Woodhouse 2006-05-19 1:01 ` David Woodhouse @ 2006-05-19 6:05 ` Jörn Engel 2006-05-19 10:08 ` David Woodhouse 2006-05-19 11:32 ` Artem B. Bityutskiy 2006-05-19 11:57 ` Artem B. Bityutskiy 3 siblings, 1 reply; 28+ messages in thread From: Jörn Engel @ 2006-05-19 6:05 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 19 May 2006 01:44:24 +0100, David Woodhouse wrote: > > It's a little unfortunate that we have to actually keep a second copy of > the name in the summary. Is there any way we could avoid it? > > One possibility might be to refrain from storing the name in the dirent > node itself -- store it _only_ in the summary. This could happen only > during garbage collection -- when we GC a dirent node, we could write it > out _without_ its name. In fact, we could possibly refrain from writing > the GC'd dirent to the log at all -- put it _only_ in the summary. > > That does break backwards compatibility though, which isn't ideal. Worse, what happens if the power fails? Do we block creat/mkdir/mknod until summary is written? Could be a long time. Or do we return to userspace and claim we've written everything while in fact we haven't? I'd pick the other option. Jörn -- The story so far: In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. -- Douglas Adams ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 6:05 ` Jörn Engel @ 2006-05-19 10:08 ` David Woodhouse 0 siblings, 0 replies; 28+ messages in thread From: David Woodhouse @ 2006-05-19 10:08 UTC (permalink / raw) To: Jörn Engel; +Cc: linux-mtd On Fri, 2006-05-19 at 08:05 +0200, Jörn Engel wrote: > Worse, what happens if the power fails? Do we block creat/mkdir/mknod > until summary is written? Could be a long time. Or do we return to > userspace and claim we've written everything while in fact we haven't? That's why I said it could happen only during GC. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 0:44 Duplication of dirent names in JFFS2 summary David Woodhouse 2006-05-19 1:01 ` David Woodhouse 2006-05-19 6:05 ` Jörn Engel @ 2006-05-19 11:32 ` Artem B. Bityutskiy 2006-05-19 11:57 ` Artem B. Bityutskiy 3 siblings, 0 replies; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 11:32 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd David Woodhouse wrote: > It's a little unfortunate that we have to actually keep a second copy of > the name in the summary. Is there any way we could avoid it? > > One possibility might be to refrain from storing the name in the dirent > node itself -- store it _only_ in the summary. This could happen only > during garbage collection -- when we GC a dirent node, we could write it > out _without_ its name. In fact, we could possibly refrain from writing > the GC'd dirent to the log at all -- put it _only_ in the summary. > > That does break backwards compatibility though, which isn't ideal. And it's just bad: o it's hacky; o it'll introduce many if()s in code and make it even more copmplecated; o finally, it'll interfer robustness: if the summary node get's corrupted, we may loose a lot - not just one node; I believe that this approach has to be vetoed. > Another possibility might be to omit the full name from the summary, but > to store only the hash (or name_crc) instead. We hardly ever actually > need the full name -- it's only used if there's a hash collision, in > jffs2_add_fd_to_list(). Good idea. -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 0:44 Duplication of dirent names in JFFS2 summary David Woodhouse ` (2 preceding siblings ...) 2006-05-19 11:32 ` Artem B. Bityutskiy @ 2006-05-19 11:57 ` Artem B. Bityutskiy 2006-05-19 12:05 ` Artem B. Bityutskiy 3 siblings, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 11:57 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 01:44 +0100, David Woodhouse wrote: > Another possibility might be to omit the full name from the summary, but > to store only the hash (or name_crc) instead. We hardly ever actually > need the full name -- it's only used if there's a hash collision, in > jffs2_add_fd_to_list(). About reading names during scanning. The only reason why we read direntry names during scanning is to find out which directory entries were deleted. There is no other reason to read names. Let's think if we may avoid reading direntry names at all. Yes, it is possible I think - we just have to postpone this for later. But I don't see a simple way do do this. Ideas? -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 11:57 ` Artem B. Bityutskiy @ 2006-05-19 12:05 ` Artem B. Bityutskiy 2006-05-19 12:23 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 12:05 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 15:57 +0400, Artem B. Bityutskiy wrote: > About reading names during scanning. > > The only reason why we read direntry names during scanning is to find > out which directory entries were deleted. There is no other reason to > read names. > > Let's think if we may avoid reading direntry names at all. Yes, it is > possible I think - we just have to postpone this for later. But I don't > see a simple way do do this. Ideas? > I mean, we don't have to know the right inode refcount values at the scanning time, may we calculate it at the readinode() time? -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 12:05 ` Artem B. Bityutskiy @ 2006-05-19 12:23 ` David Woodhouse 2006-05-19 14:17 ` Artem B. Bityutskiy 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 12:23 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 16:05 +0400, Artem B. Bityutskiy wrote: > I mean, we don't have to know the right inode refcount values at the > scanning time, may we calculate it at the readinode() time? Stop. Look at where we use the _name_. Hint: It's only in jffs2_add_fd_to_list(). Observe that we almost never need it, and we can actually fetch it from the flash only when we have to. Ponder whether that's going to be an improvement overall. It means the summary shrinks, so takes less space and is faster to read because it doesn't contain any names. The down side is that _sometimes_ when there's a crc32 collision, you have to actually read the name for some nodes from the flash after reading the summary. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 12:23 ` David Woodhouse @ 2006-05-19 14:17 ` Artem B. Bityutskiy 2006-05-19 14:50 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 14:17 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 13:23 +0100, David Woodhouse wrote: > On Fri, 2006-05-19 at 16:05 +0400, Artem B. Bityutskiy wrote: > > I mean, we don't have to know the right inode refcount values at the > > scanning time, may we calculate it at the readinode() time? > > Stop. Look at where we use the _name_. Hint: It's only in > jffs2_add_fd_to_list(). True. > Observe that we almost never need it, and we can > actually fetch it from the flash only when we have to. Let's talk not in 'summary' context. Suppose there is no summary. I'm talking about how to avoid reading names (and CRC-checking) in this case. WIn the current implementation, we have to read names. At the source code-level language, because we need nhash at jffs2_add_fd_to_list(). But I'll repeat my statement. The only fundamental reason why we need names at the scanning time is to provide correct nlink of inodes. This is because the way how JFFS2 unlinks: it writes direntry with the same name but with target ino=0. So, if we'd not read names at the scanning time at all, we'd have no possibility to calculate correct nlinks. They'd be greater or equivalent to the correct values. And some dead inodes would no go at scan time. And my idea: so what? Nothing really bad in this. We'll adjust them later. > Ponder whether that's going to be an improvement overall. It means the > summary shrinks, so takes less space and is faster to read because it > doesn't contain any names. The down side is that _sometimes_ when > there's a crc32 collision, you have to actually read the name for some > nodes from the flash after reading the summary. So, we can skip reading names. But then we'll have incorrect nlinks after scanning. And what I'm saying is that it is possible to refine nlinks at readinode() time. -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 14:17 ` Artem B. Bityutskiy @ 2006-05-19 14:50 ` David Woodhouse 2006-05-19 15:07 ` Artem B. Bityutskiy 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 14:50 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 18:17 +0400, Artem B. Bityutskiy wrote: > Let's talk not in 'summary' context. Suppose there is no summary. I'm > talking about how to avoid reading names (and CRC-checking) in this > case. OK. > In the current implementation, we have to read names. At the source > code-level language, because we need nhash at jffs2_add_fd_to_list(). We could use the name_crc for that instead. We don't actually need the name itself. It just has to be a hash of the name -- the Linux nhash just happened to be a good, cheap one. > But I'll repeat my statement. The only fundamental reason why we need > names at the scanning time is to provide correct nlink of inodes. This > is because the way how JFFS2 unlinks: it writes direntry with the same > name but with target ino=0. We still only need the name _if_ the crc32 and length match a previous dirent in the same directory. For ~99% of dirents, we _don't_ need the name at scan time at all. We only actually need to look at the name in the case where there are two or more dirents which have the same pino, crc32 (or nhash) and length. Let us assume that genuine hash collisions are rare enough to be ignored, statistically. So on NOR flash without summary, where we actually mark the old nodes obsolete, you'll almost _never_ actually have to use the real name in jffs2_add_fd_to_list(). You might as well not read it during the scan. On NAND flash, or with summary, we don't mark old nodes obsolete. So it'll happen a bit more often -- specifically, in two cases: First, there's the case where we've garbage-collected a dirent node but haven't _yet_ deleted the original. But deleting the original is the whole _point_ in GC, so that's not a common case. Second, there's the case where we've unlinked a dirent node but the original hasn't yet been actually erased. That may be a _little_ more common, but I don't think it'll be _so_ common. So I still think that the benefit we get from dropping the full name from all the rest of the dirents in the summary (or just not reading it at scan time, in the !SUMMARY version) is going to be _more_ than the slowdown caused by occasionally having to go back to the flash and read the name when there's a collision. If the time taken by reading names later really _is_ significant, we could perhaps reduce that further by including "obsoleted version number" in the summary for dirent nodes. > So, if we'd not read names at the scanning time at all, we'd have no > possibility to calculate correct nlinks. Not so. We could calculate correct nlinks in the majority of cases. We really don't need the full name very often. > They'd be greater or equivalent to the correct values. And some dead > inodes would no go at scan time. And my idea: so what? Nothing really > bad in this. We'll adjust them later. But nlink is visible to userspace. We'd be giving incorrect values to userspace until we've finished the scan. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 14:50 ` David Woodhouse @ 2006-05-19 15:07 ` Artem B. Bityutskiy 2006-05-19 15:26 ` Artem B. Bityutskiy 2006-05-19 15:37 ` David Woodhouse 0 siblings, 2 replies; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 15:07 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 15:50 +0100, David Woodhouse wrote: > We could use the name_crc for that instead. We don't actually need the > name itself. It just has to be a hash of the name -- the Linux nhash > just happened to be a good, cheap one. Fair enough, indeed. Using CRC on scan, then nhash() later. > We still only need the name _if_ the crc32 and length match a previous > dirent in the same directory. For ~99% of dirents, we _don't_ need the > name at scan time at all. Ok, fair. > We only actually need to look at the name in the case where there are > two or more dirents which have the same pino, crc32 (or nhash) and > length. Fair. > Let us assume that genuine hash collisions are rare enough to be > ignored, statistically. So on NOR flash without summary, where we > actually mark the old nodes obsolete, you'll almost _never_ actually > have to use the real name in jffs2_add_fd_to_list(). You might as well > not read it during the scan. Yes. > On NAND flash, or with summary, we don't mark old nodes obsolete. So > it'll happen a bit more often -- specifically, in two cases: > > First, there's the case where we've garbage-collected a dirent node but > haven't _yet_ deleted the original. But deleting the original is the > whole _point_ in GC, so that's not a common case. Sorry, -ENOPARSE on "But deleting the original is the whole _point_ in GC, so that's not a common case." > Second, there's the case where we've unlinked a dirent node but the > original hasn't yet been actually erased. That may be a _little_ more > common, but I don't think it'll be _so_ common. Ok. > So I still think that the benefit we get from dropping the full name > from all the rest of the dirents in the summary (or just not reading it > at scan time, in the !SUMMARY version) is going to be _more_ than the > slowdown caused by occasionally having to go back to the flash and read > the name when there's a collision. I assume so too. > If the time taken by reading names later really _is_ significant, we > could perhaps reduce that further by including "obsoleted version > number" in the summary for dirent nodes. Probably. > > So, if we'd not read names at the scanning time at all, we'd have no > > possibility to calculate correct nlinks. > > Not so. We could calculate correct nlinks in the majority of cases. We > really don't need the full name very often. Majority is a bad word here. > > They'd be greater or equivalent to the correct values. And some dead > > inodes would no go at scan time. And my idea: so what? Nothing really > > bad in this. We'll adjust them later. > > But nlink is visible to userspace. We'd be giving incorrect values to > userspace until we've finished the scan. 1. During scan nobody can access JFFS2, so "giving incorrect values to userspace until we've finished the scan" is wrong. 2. Before accessing *any* inode, read_inode() is called. So what I'm saying is to calculate *correct* nlink at the read_inode() time. So, we're still talking about different things. Probably I'm expressing my thought horribly. You are talking about how to use hashes instead of names. What is common, what is not, what do do in case of collisions. I'm talking about how to avoid looking at names at all. At hashes too. See the difference? No collisions. I'm repeating for the 3rd time my statement: names (or their hashes) are only needed to calculate correct nlinks. We don't habe to calculate correct nlinks at the *scan* time, we may do try to do this at the *read_inode()* time. See what I mean? -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 15:07 ` Artem B. Bityutskiy @ 2006-05-19 15:26 ` Artem B. Bityutskiy 2006-05-19 15:33 ` David Woodhouse 2006-05-19 15:37 ` David Woodhouse 1 sibling, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 15:26 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd It;s getting late, I'm doing too many typos. Retype. On Fri, 2006-05-19 at 19:07 +0400, Artem B. Bityutskiy wrote: > 1. During scan nobody can access JFFS2, so "giving incorrect values to > userspace until we've finished the scan" is wrong. > 2. Before accessing *any* inode, read_inode() is called. So what I'm > saying is to calculate *correct* nlink at the read_inode() time. > > So, we're still talking about different things. Probably I'm expressing > my thought horribly. > > You are talking about how to use hashes instead of names. What is > common, what is not, what do do in case of collisions. > > I'm talking about how to avoid looking at names at all. At hashes too. > See the difference? No collisions. > > I'm repeating for the 3rd time my statement: names (or their hashes) are > only needed to calculate correct nlinks. We don't habe to calculate > correct nlinks at the *scan* time, we may do try to do this at the > *read_inode()* time. See what I mean? > 1. During scan nobody can access JFFS2, so "giving incorrect values to userspace until we've finished the scan" is wrong. 2. Before accessing *any* inode, read_inode() is called. So what I'm saying is to calculate *correct* nlink at the read_inode() time. So, we're still talking about different things. Probably I'm expressing my thoughts horribly. You are talking about how to use hashes instead of names. What is common, what is not, what to do in case of collisions. I'm talking about how to avoid looking at names at all. At hashes too. See the difference? No collisions. I'm repeating for the 3rd time my statement: names (or their hashes) are only needed to calculate correct nlinks. We don't have to calculate correct nlinks at the *scan* time, we may try to do this at the *read_inode()* time. See what I mean? -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 15:26 ` Artem B. Bityutskiy @ 2006-05-19 15:33 ` David Woodhouse 2006-05-19 15:38 ` Artem B. Bityutskiy 0 siblings, 1 reply; 28+ messages in thread From: David Woodhouse @ 2006-05-19 15:33 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 19:26 +0400, Artem B. Bityutskiy wrote: > 1. During scan nobody can access JFFS2, so "giving incorrect values to > userspace until we've finished the scan" is wrong. I shouldn't have said 'scan'. I should have said 'until the GC thread has finished checking nodes'. > 2. Before accessing *any* inode, read_inode() is called. So what I'm > saying is to calculate *correct* nlink at the read_inode() time. You can't do that. You don't have correct nlink until you've checked all the dirents in _all_ the directories on the file system. > I'm repeating for the 3rd time my statement: names (or their hashes) are > only needed to calculate correct nlinks. We don't have to calculate > correct nlinks at the *scan* time, we may try to do this at the > *read_inode()* time. See what I mean? I know what you mean, and it's _still_ not correct. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 15:33 ` David Woodhouse @ 2006-05-19 15:38 ` Artem B. Bityutskiy 2006-05-19 15:43 ` David Woodhouse 0 siblings, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 15:38 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 16:33 +0100, David Woodhouse wrote: > I shouldn't have said 'scan'. I should have said 'until the GC thread > has finished checking nodes'. > > > 2. Before accessing *any* inode, read_inode() is called. So what I'm > > saying is to calculate *correct* nlink at the read_inode() time. > > You can't do that. You don't have correct nlink until you've checked all > the dirents in _all_ the directories on the file system. Err... I don't understand. Suppose you've scanned your FS and mounted it. Then the 1st call is read_inode("/"). Here you calculate nlink of all direntries od "/". The 2nd, say, read_inode("/home"). Here you calculate nlink of all direntries of "/home". And so forth. I don't see the problem. -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 15:38 ` Artem B. Bityutskiy @ 2006-05-19 15:43 ` David Woodhouse 2006-05-19 15:46 ` Artem B. Bityutskiy 2006-05-20 8:38 ` Artem B. Bityutskiy 0 siblings, 2 replies; 28+ messages in thread From: David Woodhouse @ 2006-05-19 15:43 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 19:38 +0400, Artem B. Bityutskiy wrote: > Err... I don't understand. Suppose you've scanned your FS and mounted > it. > > Then the 1st call is read_inode("/"). Here you calculate nlink of all > direntries od "/". > > The 2nd, say, read_inode("/home"). Here you calculate nlink of all > direntries of "/home". > > And so forth. I don't see the problem. That's because you haven't actually thought it through. Q: What affects nlink? If there's a file 'foo' in the '/' directory, which is inode #365 -- where do you look for dirents which might affect the nlink of that inode #365? -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 15:43 ` David Woodhouse @ 2006-05-19 15:46 ` Artem B. Bityutskiy 2006-05-20 8:38 ` Artem B. Bityutskiy 1 sibling, 0 replies; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-19 15:46 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 16:43 +0100, David Woodhouse wrote: > Q: What affects nlink? If there's a file 'foo' in the '/' directory, > which is inode #365 -- where do you look for dirents which might affect > the nlink of that inode #365? > OK, got it. Thanks. -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 15:43 ` David Woodhouse 2006-05-19 15:46 ` Artem B. Bityutskiy @ 2006-05-20 8:38 ` Artem B. Bityutskiy 2006-05-20 9:08 ` Artem B. Bityutskiy 1 sibling, 1 reply; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-20 8:38 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Fri, 2006-05-19 at 16:43 +0100, David Woodhouse wrote: > Q: What affects nlink? If there's a file 'foo' in the '/' directory, > which is inode #365 -- where do you look for dirents which might affect > the nlink of that inode #365? One more idea that came to my mind today: your approach with nhashes seems good, but it has one drawback - collisions. They may be rare, but we'll anyway deal with them - more code, more testing. How about to introduce "direntry IDs"? Just 32 bit values, constantly incremented, per-inode. They'd act instead of names. We'd not need names at all, as those IDs would play their role. Advantages of fixed-width IDs vs variable length names are obvious. I suspect it'd simplify and improve many things, but I'm not sure we could make it compatible. -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-20 8:38 ` Artem B. Bityutskiy @ 2006-05-20 9:08 ` Artem B. Bityutskiy 0 siblings, 0 replies; 28+ messages in thread From: Artem B. Bityutskiy @ 2006-05-20 9:08 UTC (permalink / raw) To: David Woodhouse; +Cc: linux-mtd On Sat, 2006-05-20 at 12:38 +0400, Artem B. Bityutskiy wrote: > How about to introduce "direntry IDs"? Just 32 bit values, constantly > incremented, per-inode. They'd act instead of names. We'd not need names > at all, as those IDs would play their role. Advantages of fixed-width > IDs vs variable length names are obvious. > Oh, this won't work :-( -- Best Regards, Artem B. Bityutskiy, St.-Petersburg, Russia. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Duplication of dirent names in JFFS2 summary 2006-05-19 15:07 ` Artem B. Bityutskiy 2006-05-19 15:26 ` Artem B. Bityutskiy @ 2006-05-19 15:37 ` David Woodhouse 1 sibling, 0 replies; 28+ messages in thread From: David Woodhouse @ 2006-05-19 15:37 UTC (permalink / raw) To: dedekind; +Cc: linux-mtd On Fri, 2006-05-19 at 19:07 +0400, Artem B. Bityutskiy wrote: > > First, there's the case where we've garbage-collected a dirent node but > > haven't _yet_ deleted the original. But deleting the original is the > > whole _point_ in GC, so that's not a common case. > Sorry, -ENOPARSE on "But deleting the original is the whole _point_ in > GC, so that's not a common case." Q: Why do we write new nodes out during GC? A: So that we can delete the original. Q: What is the point in GC? A: The point in GC is that we can delete the original nodes. Better now? My point is that when we've written out a colliding node during garbage collection, that is because we intend to delete the original node real soon now -- that deletion is what the GC is _for_. That's the whole _point_ in the GC. So we needn't worry too much about these collisions -- they'll be short-lived and hence uncommon. -- dwmw2 ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2006-05-20 9:09 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-05-19 0:44 Duplication of dirent names in JFFS2 summary David Woodhouse 2006-05-19 1:01 ` David Woodhouse 2006-05-19 11:53 ` David Woodhouse 2006-05-19 11:58 ` Artem B. Bityutskiy 2006-05-19 12:11 ` David Woodhouse 2006-05-19 12:14 ` Artem B. Bityutskiy 2006-05-19 12:31 ` David Woodhouse 2006-05-19 14:34 ` Artem B. Bityutskiy 2006-05-19 14:59 ` David Woodhouse 2006-05-19 16:41 ` David Woodhouse 2006-05-19 16:43 ` David Woodhouse 2006-05-19 6:05 ` Jörn Engel 2006-05-19 10:08 ` David Woodhouse 2006-05-19 11:32 ` Artem B. Bityutskiy 2006-05-19 11:57 ` Artem B. Bityutskiy 2006-05-19 12:05 ` Artem B. Bityutskiy 2006-05-19 12:23 ` David Woodhouse 2006-05-19 14:17 ` Artem B. Bityutskiy 2006-05-19 14:50 ` David Woodhouse 2006-05-19 15:07 ` Artem B. Bityutskiy 2006-05-19 15:26 ` Artem B. Bityutskiy 2006-05-19 15:33 ` David Woodhouse 2006-05-19 15:38 ` Artem B. Bityutskiy 2006-05-19 15:43 ` David Woodhouse 2006-05-19 15:46 ` Artem B. Bityutskiy 2006-05-20 8:38 ` Artem B. Bityutskiy 2006-05-20 9:08 ` Artem B. Bityutskiy 2006-05-19 15:37 ` David Woodhouse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox