JFFS2 mount time

public inbox for linux-mtd@lists.infradead.org

From: Ferenc Havasi @ 2004-10-20 14:26 UTC
  To: linux-mtd, jffs-dev, dwmw2

[-- Attachment #1: Type: text/plain, Size: 1792 bytes --]

Dear All,

Here is the latest version of our mount time improvement.

How to use it:
- apply this patch to the latest version of MTD
- compile sumtool (run make in mtd/util)
- build your JFFS2 image as before (already created images can be used 
as well)
- run sumtool to insert the summary information, for example:
   ./sumtool -i original.jffs2 -o new.jffs2 -e128KiB
- recompile your kernel with "JFFS2 inode summary support" enabled

Jarkko made a measurement on a real NAND device: his JFFS2 image was 
120819928 bytes (115M); after running sumtool the new image was 
123338752 bytes (117M), about 2.1% larger.

With the original image the mount time was 55 sec; with the new image it 
is only 8.5 sec, roughly 6.5 times faster.

It works much like our previous improvement: it stores special 
information at the end of the erase blocks, and at mount time, if this 
information is present, scanning the erase block is unnecessary.
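
As a rough illustration of that mount-time check, here is a minimal 
sketch (not the code from the patch). It assumes a little-endian image 
held in memory; find_summary_offset() and le32_at() are hypothetical 
helper names, while the 8-byte trailer layout and the JFFS2_SUM_MAGIC 
value are taken from the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JFFS2_SUM_MAGIC 0x02851885u	/* value from the jffs2.h hunk of the patch */

/* Decode a 32-bit value stored little-endian (assumption: LE image). */
static uint32_t le32_at(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return the offset of the summary node inside the
 * erase block, or -1 if the trailing marker is missing or implausible.
 */
static long find_summary_offset(const uint8_t *block, size_t block_size)
{
	uint32_t offset, magic;

	assert(block != NULL && block_size >= 8);

	/* The last 8 bytes of a summarised block hold { offset, magic }. */
	offset = le32_at(block + block_size - 8);
	magic = le32_at(block + block_size - 4);

	if (magic != JFFS2_SUM_MAGIC)
		return -1;	/* no summary in this block */
	if (offset >= block_size - 8)
		return -1;	/* offset would point into the marker itself */
	return (long)offset;
}
```

If the helper returns -1, the block would simply be scanned node by 
node, as before.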

New things compared to our previous improvement:
- it was fully rewritten
- we separated the user space tool (sumtool) from mkfs
- sumtool now not only inserts the summary information but also does 
some node reordering. There will be two kinds of erase blocks: the 
"first type" will contain only jffs2_raw_inodes, and all other nodes 
(jffs2_raw_dirent) will be stored in the "second type". It generates a 
summary at the end of every "first type" erase block. (The "second type" 
will be scanned as before, because all of the information in 
jffs2_raw_dirent is needed at mount time.)
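
The routing rule and the space reserved for the summary can be sketched 
roughly as follows. This is only an illustration under assumptions: the 
names block_kind, target_block(), summary_space_needed() and 
inode_fits() are made up here, the node type value is the usual one 
from jffs2.h, and the sizes follow the packed structures in the 
attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JFFS2_NODETYPE_INODE 0xE002u	/* INCOMPAT | NODE_ACCURATE | 2 */

/* "First type" blocks hold only inodes, "second type" everything else. */
enum block_kind { BLOCK_INODES, BLOCK_OTHERS };

enum {
	SUM_NODE_HDR_SIZE = 26,	/* packed struct jffs2_inode_sum_node */
	SUM_RECORD_SIZE = 16,	/* inode, version, offset, totlen */
	SUM_MARKER_SIZE = 8	/* trailing { offset, magic } pair */
};

static enum block_kind target_block(uint16_t nodetype)
{
	return nodetype == JFFS2_NODETYPE_INODE ? BLOCK_INODES : BLOCK_OTHERS;
}

/* Summary size if one more record were added, rounded up to 4 bytes. */
static size_t summary_space_needed(size_t records)
{
	size_t n = (records + 1) * SUM_RECORD_SIZE
		 + SUM_NODE_HDR_SIZE + SUM_MARKER_SIZE;

	return (n + 3) & ~(size_t)3;
}

/* Nonzero if a node of node_len bytes still fits at write offset ofs. */
static int inode_fits(size_t block_size, size_t ofs,
		      size_t records, size_t node_len)
{
	size_t reserve = summary_space_needed(records);

	assert(block_size >= reserve);
	return ofs + node_len <= block_size - reserve;
}
```

When inode_fits() says no, the summary would be written out and a new 
"first type" block started.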

Certainly all of these things are optional (as you can see above, you 
have to select it in the kernel config). The JFFS2 image produced by 
sumtool is also usable with previous kernels, because the summary node 
is JFFS2_FEATURE_RWCOMPAT_DELETE.
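
To show why that flag keeps old kernels happy, here is a small sketch 
of how the compatibility bits of an unknown node type can be 
interpreted. The mask and feature values are the ones from jffs2.h; the 
enum and the function action_for_unknown() are hypothetical names used 
only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define JFFS2_COMPAT_MASK		0xc000u
#define JFFS2_NODE_ACCURATE		0x2000u
#define JFFS2_FEATURE_INCOMPAT		0xc000u
#define JFFS2_FEATURE_ROCOMPAT		0x8000u
#define JFFS2_FEATURE_RWCOMPAT_COPY	0x4000u
#define JFFS2_FEATURE_RWCOMPAT_DELETE	0x0000u

#define JFFS2_NODETYPE_INODE_SUM \
	(JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 6)

/* What a kernel that does not know the node type would do with it. */
enum unknown_node_action {
	REFUSE_MOUNT,		/* INCOMPAT */
	MOUNT_READ_ONLY,	/* ROCOMPAT */
	KEEP_AND_COPY,		/* RWCOMPAT_COPY: preserved by garbage collection */
	SKIP_AND_DELETE		/* RWCOMPAT_DELETE: ignored, may be discarded */
};

static enum unknown_node_action action_for_unknown(uint16_t nodetype)
{
	switch (nodetype & JFFS2_COMPAT_MASK) {
	case JFFS2_FEATURE_INCOMPAT:
		return REFUSE_MOUNT;
	case JFFS2_FEATURE_ROCOMPAT:
		return MOUNT_READ_ONLY;
	case JFFS2_FEATURE_RWCOMPAT_COPY:
		return KEEP_AND_COPY;
	case JFFS2_FEATURE_RWCOMPAT_DELETE:
		return SKIP_AND_DELETE;
	default:
		/* Unreachable: the mask leaves only the four values above. */
		assert(0);
		return REFUSE_MOUNT;
	}
}
```

With these values the summary node type works out to 0x2006, so a 
kernel without summary support would fall into the SKIP_AND_DELETE case.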

I think it can be useful not only for us. David, may I commit it to 
CVS?

Regards,
Ferenc

[-- Attachment #2: jffs2-summary.patch --]
[-- Type: text/x-patch, Size: 36792 bytes --]

diff --unified --recursive --new-file mtd2/fs/Kconfig mtd/fs/Kconfig
--- mtd2/fs/Kconfig	2004-07-16 17:20:59.000000000 +0200
+++ mtd/fs/Kconfig	2004-10-20 15:10:42.000000000 +0200
@@ -68,6 +68,19 @@
 	  Say 'N' unless you have NAND flash and you are willing to test and
 	  develop JFFS2 support for it.
 
+config JFFS2_FS_SUMMARY
+	bool "JFFS2 inode summary support (EXPERIMENTAL)" 
+	depends on JFFS2_FS
+	default n
+        help
+          This feature makes it possible to use inode summary information
+          for faster filesystem mount - especially on NAND.
+
+          The summary information can be inserted into a filesystem image 
+          by the utility 'sumtool'.
+
+	  If unsure, say 'N'.
+          
 config JFFS2_COMPRESSION_OPTIONS
 	bool "Advanced compression options for JFFS2"
 	default n
diff --unified --recursive --new-file mtd2/fs/jffs2/scan.c mtd/fs/jffs2/scan.c
--- mtd2/fs/jffs2/scan.c	2004-09-12 11:56:13.000000000 +0200
+++ mtd/fs/jffs2/scan.c	2004-10-20 15:44:25.000000000 +0200
@@ -4,6 +4,8 @@
  * Copyright (C) 2001-2003 Red Hat, Inc.
  *
  * Created by David Woodhouse <dwmw2@redhat.com>
+ * Inode summary support by Zoltan Sogor, Ferenc Havasi, Patrik Kluba
+ *                          University of Szeged, Hungary
  *
  * For licensing information, see the file 'LICENCE' in this directory.
  *
@@ -58,6 +60,11 @@
 static int jffs2_scan_dirent_node(struct jffs2_sb_info *c, struct jffs2_eraseblock *jeb,
 				 struct jffs2_raw_dirent *rd, uint32_t ofs);
 
+
+#ifdef CONFIG_JFFS2_FS_SUMMARY
+static struct jffs2_inode_cache *jffs2_scan_make_ino_cache(struct jffs2_sb_info *c, uint32_t ino);
+#endif
+
 #define BLK_STATE_ALLFF		0
 #define BLK_STATE_CLEAN		1
 #define BLK_STATE_PARTDIRTY	2
@@ -292,6 +299,25 @@
 #ifdef CONFIG_JFFS2_FS_NAND
 	int cleanmarkerfound = 0;
 #endif
+#ifdef CONFIG_JFFS2_FS_SUMMARY
+	struct jffs2_raw_node_ref *raw;
+	struct jffs2_raw_node_ref *cache_ref;
+	struct jffs2_inode_cache *ic;
+		
+	typedef struct sum_marker {
+		jint32_t offset;
+		jint32_t magic;
+	} sum_marker;
+	
+	sum_marker *sm;	
+	int i;
+	int sumsize;
+	uint32_t ino;
+	uint32_t crc;
+	struct jffs2_inode_sum_node *summary;
+	struct jffs2_inode_sum_record *sum_rec;
+	int bad_sum = 0;
+#endif
 
 	ofs = jeb->offset;
 	prevofs = jeb->offset - 1;
@@ -314,10 +340,217 @@
 		}
 	}
 #endif
+
+#ifdef CONFIG_JFFS2_FS_SUMMARY
+	/* Looking for summary marker */
+	sm = (sum_marker *)kmalloc(sizeof(*sm), GFP_KERNEL);
+	if (!sm) {
+	        return -ENOMEM;
+	}
+	
+	err = jffs2_fill_scan_buf(c, (unsigned char *) sm, jeb->offset + c->sector_size - 8, 8);
+	
+	if (err) {
+	        return err;
+	}
+	
+	if (je32_to_cpu(sm->magic) == JFFS2_SUM_MAGIC) {
+		ofs = je32_to_cpu(sm->offset);
+		sumsize = c->sector_size - ofs;
+		ofs += jeb->offset;
+		
+		D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Inode summary information found at 0x%x (%d bytes)\n", ofs, sumsize));
+		
+		summary = (struct jffs2_inode_sum_node *) kmalloc(sumsize, GFP_KERNEL);
+			
+		if (!summary) {
+				kfree(sm);
+				return -ENOMEM;
+		}
+		
+		err = jffs2_fill_scan_buf(c, (unsigned char *)summary, ofs, sumsize);
+		
+		if (err) {
+				kfree(sm);
+				kfree(summary);
+				return err;
+		}
+  
+		/* OK, now check for node validity and CRC */
+		crcnode.magic = cpu_to_je16(JFFS2_MAGIC_BITMASK);
+		crcnode.nodetype = cpu_to_je16(JFFS2_NODETYPE_INODE_SUM);
+		crcnode.totlen = summary->totlen;
+		hdr_crc = crc32(0, &crcnode, sizeof(crcnode)-4);
+		
+		if (je32_to_cpu(summary->hdr_crc) != hdr_crc) {
+				D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Summary node header is corrupt (bad CRC or no summary at all)\n"));
+				bad_sum = 1;
+		}
+		
+		if ((!bad_sum) && (je32_to_cpu(summary->totlen) != sumsize)) {
+				D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Summary node is corrupt (wrong erasesize?)\n"));
+				bad_sum = 1;
+		}
+		
+		crc = crc32(0, summary, sizeof(struct jffs2_inode_sum_node)-8);
+			
+		if ((!bad_sum) && (je32_to_cpu(summary->node_crc) != crc)) {
+				D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Summary node is corrupt (bad CRC)\n"));
+				bad_sum = 1;
+		}
+		
+		sum_rec = (struct jffs2_inode_sum_record *) &(summary->sum[0]);
+		crc = crc32(0, sum_rec, sumsize - sizeof(struct jffs2_inode_sum_node));
+  
+		if ((!bad_sum) && (je32_to_cpu(summary->sum_crc) != crc)) {
+				D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Summary node data is corrupt (bad CRC)\n"));
+				bad_sum = 1;
+		}
+		
+		if (!bad_sum) {
+			
+			if ( je32_to_cpu(summary->cln_mkr) ){
+				
+				D1(printk(KERN_DEBUG "Summary : CLEANMARKER node \n"));
+				
+				if (je32_to_cpu(summary->cln_mkr) != c->cleanmarker_size) {
+					printk(KERN_DEBUG "CLEANMARKER node has totlen 0x%x != normal 0x%x\n", 
+					   je32_to_cpu(summary->cln_mkr), c->cleanmarker_size);
+					UNCHECKED_SPACE( PAD(je32_to_cpu(summary->cln_mkr)) );
+				} 
+				else if (jeb->first_node) {
+					printk(KERN_DEBUG "CLEANMARKER node not first node in block (0x%08x)\n", jeb->offset);
+					UNCHECKED_SPACE( PAD(je32_to_cpu(summary->cln_mkr)) );
+				} 
+				else {
+					struct jffs2_raw_node_ref *marker_ref = jffs2_alloc_raw_node_ref();
+						
+					if (!marker_ref) {
+						printk(KERN_NOTICE "Failed to allocate node ref for clean marker\n");
+						return -ENOMEM;
+					}
+					
+					marker_ref->next_in_ino = NULL;
+					marker_ref->next_phys = NULL;
+					marker_ref->flash_offset = jeb->offset | REF_NORMAL;
+					marker_ref->__totlen = je32_to_cpu(summary->cln_mkr);
+					jeb->first_node = jeb->last_node = marker_ref;
+				
+					USED_SPACE( PAD(je32_to_cpu(summary->cln_mkr)) );
+									
+				}
+			}
+			
+			for(i = 0; i < je16_to_cpu(summary->sum_num); i++) {
+				
+				D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Processing summary information %d\n", i));
+				
+				//JFFS2_NODETYPE_INODE:
+				ino = je32_to_cpu(sum_rec->inode);
+				D1(printk(KERN_DEBUG "jffs2_scan_eraseblock(): Inode at 0x%08x\n", jeb->offset + je32_to_cpu(sum_rec->offset)));
+				raw = jffs2_alloc_raw_node_ref();
+				if (!raw) {
+						printk(KERN_NOTICE "jffs2_scan_eraseblock(): allocation of node reference failed\n");
+						kfree(sm);
+						kfree(summary);
+						return -ENOMEM;
+				}
+
+				ic = jffs2_get_ino_cache(c, ino);
+				if (!ic) {
+						ic = jffs2_scan_make_ino_cache(c, ino);
+						if (!ic) {
+								printk(KERN_NOTICE "jffs2_scan_eraseblock(): scan_make_ino_cache failed\n");
+								jffs2_free_raw_node_ref(raw);
+							kfree(sm);
+							kfree(summary);
+							return -ENOMEM;
+						}
+				}
+
+				raw->flash_offset = (jeb->offset + je32_to_cpu(sum_rec->offset)) | REF_UNCHECKED;
+				raw->__totlen = PAD(je32_to_cpu(sum_rec->totlen));
+				raw->next_phys = NULL;
+				raw->next_in_ino = ic->nodes;
+
+				ic->nodes = raw;
+				if (!jeb->first_node)
+						jeb->first_node = raw;
+				if (jeb->last_node)
+						jeb->last_node->next_phys = raw;
+				jeb->last_node = raw;
+
+				/* do we need this? this requires storing another 4 bytes per record in the cache or an expensive reading */
+				pseudo_random += je32_to_cpu(sum_rec->version);
+
+				UNCHECKED_SPACE(PAD(je32_to_cpu(sum_rec->totlen)));
+				
+				sum_rec++;
+			}
+			
+			kfree(sm);
+			kfree(summary);
+
+			/* for ACCT_PARANOIA_CHECK */
+			cache_ref = jffs2_alloc_raw_node_ref();
+			
+			if (!cache_ref) {
+				printk(KERN_NOTICE "Failed to allocate node ref for cache\n");
+				return -ENOMEM;
+			}
+			
+			cache_ref->next_in_ino = NULL;
+			cache_ref->next_phys = NULL;
+			cache_ref->flash_offset = ofs | REF_NORMAL;
+			cache_ref->__totlen = sumsize;
+			
+			if (!jeb->first_node)
+				jeb->first_node = cache_ref;
+			if (jeb->last_node)
+				jeb->last_node->next_phys = cache_ref;
+			jeb->last_node = cache_ref;
+			
+			USED_SPACE(sumsize);
+			
+			/* somebody check this and all of space accounting in summary support */
+	
+			if ((jeb->used_size + jeb->unchecked_size) == PAD(c->cleanmarker_size) && !jeb->dirty_size 
+				&& (!jeb->first_node || !jeb->first_node->next_in_ino) ) { 
+					return BLK_STATE_CLEANMARKER; 
+				}		
+			/* move blocks with max 4 byte dirty space to cleanlist */	
+			else if (!ISDIRTY(c->sector_size - (jeb->used_size + jeb->unchecked_size))) {
+				c->dirty_size -= jeb->dirty_size;
+				c->wasted_size += jeb->dirty_size; 
+				jeb->wasted_size += jeb->dirty_size;
+				jeb->dirty_size = 0;
+				return BLK_STATE_CLEAN;
+			}
+			else if (jeb->used_size || jeb->unchecked_size) { 
+					return BLK_STATE_PARTDIRTY; 
+			}
+			else { 
+					return BLK_STATE_ALLDIRTY; 
+			}
+		}   
+	}
+	D1(printk(KERN_DEBUG "Summary end\n"));
+	
+	ofs = jeb->offset;
+	prevofs = jeb->offset - 1;
+
+#endif
+
 	buf_ofs = jeb->offset;
 
 	if (!buf_size) {
 		buf_len = c->sector_size;
+#ifdef CONFIG_JFFS2_FS_SUMMARY
+		/* must reread because of summary test */
+		err = jffs2_fill_scan_buf(c, buf, buf_ofs, buf_len);
+		if (err)
+			return err;
+#endif
 	} else {
 		buf_len = EMPTY_SCAN_SIZE;
 		err = jffs2_fill_scan_buf(c, buf, buf_ofs, buf_len);
diff --unified --recursive --new-file mtd2/include/linux/jffs2.h mtd/include/linux/jffs2.h
--- mtd2/include/linux/jffs2.h	2004-05-25 13:31:55.000000000 +0200
+++ mtd/include/linux/jffs2.h	2004-10-20 14:53:52.000000000 +0200
@@ -28,6 +28,9 @@
 #define JFFS2_EMPTY_BITMASK 0xffff
 #define JFFS2_DIRTY_BITMASK 0x0000
 
+/* Summary node MAGIC marker */
+#define JFFS2_SUM_MAGIC	0x02851885 
+
 /* We only allow a single char for length, and 0xFF is empty flash so
    we don't want it confused with a real length. Hence max 254.
 */
@@ -61,6 +64,7 @@
 #define JFFS2_NODETYPE_INODE (JFFS2_FEATURE_INCOMPAT | JFFS2_NODE_ACCURATE | 2)
 #define JFFS2_NODETYPE_CLEANMARKER (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 3)
 #define JFFS2_NODETYPE_PADDING (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 4)
+#define JFFS2_NODETYPE_INODE_SUM (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 6)
 
 // Maybe later...
 //#define JFFS2_NODETYPE_CHECKPOINT (JFFS2_FEATURE_RWCOMPAT_DELETE | JFFS2_NODE_ACCURATE | 3)
@@ -148,10 +152,31 @@
 	uint8_t data[0];
 } __attribute__((packed));
 
+struct jffs2_inode_sum_node{
+    jint16_t magic;
+	jint16_t nodetype; /* = JFFS2_NODETYPE_INODE_SUM */
+	jint32_t totlen;
+	jint32_t hdr_crc;
+	jint16_t sum_num;	/* number of sum entries*/
+	jint32_t cln_mkr;	/* clean marker size, 0 = no cleanmarker */
+	jint32_t sum_crc;	/* summary information crc */
+	jint32_t node_crc; 	/* node crc */
+	jint32_t sum[0]; 	/* inode summary info */
+} __attribute__((packed));
+
+struct jffs2_inode_sum_record{
+	jint32_t inode;
+	jint32_t version;
+	jint32_t offset;
+	jint32_t totlen; 
+} __attribute__((packed));
+
+
 union jffs2_node_union {
 	struct jffs2_raw_inode i;
 	struct jffs2_raw_dirent d;
 	struct jffs2_unknown_node u;
+	struct jffs2_inode_sum_node s;
 };
 
 #endif /* __LINUX_JFFS2_H__ */
diff --unified --recursive --new-file mtd2/util/Makefile mtd/util/Makefile
--- mtd2/util/Makefile	2004-07-13 19:49:43.000000000 +0200
+++ mtd/util/Makefile	2004-10-19 15:11:02.000000000 +0200
@@ -13,7 +13,7 @@
 TARGETS = ftl_format flash_erase flash_eraseall nanddump doc_loadbios \
 	mkfs.jffs ftl_check mkfs.jffs2 flash_lock flash_unlock \
 	flash_info mtd_debug flashcp nandwrite jffs2dump \
-	nftldump nftl_format docfdisk #jffs2reader
+	nftldump nftl_format docfdisk sumtool #jffs2reader
 
 SYMLINKS = compr_lzari.c compr_lzo.c
 
@@ -48,6 +48,9 @@
 jffs2dump: jffs2dump.o crc32.o
 	$(CC) $(LDFLAGS) -o $@ $^
 
+sumtool: sumtool.o crc32.o
+	$(CC) $(LDFLAGS) -o $@ $^
+
 install: ${TARGETS}
 	mkdir -p ${DESTDIR}/${SBINDIR}
 	install -m0755 -oroot -groot ${TARGETS} ${DESTDIR}/${SBINDIR}/
diff --unified --recursive --new-file mtd2/util/jffs2dump.c mtd/util/jffs2dump.c
--- mtd2/util/jffs2dump.c	2004-06-19 00:11:48.000000000 +0200
+++ mtd/util/jffs2dump.c	2004-10-20 14:55:28.000000000 +0200
@@ -3,7 +3,7 @@
  *
  *  Copyright (C) 2003 Thomas Gleixner (tglx@linutronix.de)
  *
- * $Id: jffs2dump.c,v 1.6 2004/06/18 22:11:48 gleixner Exp $
+ * $Id: jffs2dump.c,v 1.1 2004/10/19 07:19:55 weth Exp $
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -277,7 +277,63 @@
 
 			p += PAD(je32_to_cpu (node->d.totlen));						
 			break;
-	
+
+		case JFFS2_NODETYPE_INODE_SUM:{
+			
+			int i;
+			jint32_t *offset,*magic;
+			
+			printf ("%8s Inode Sum  node at 0x%08x, totlen 0x%08x, sum_num  %5d, cleanmarker size %5d\n",
+					obsolete ? "Obsolete" : "",
+					p - data,
+					je32_to_cpu (node->s.totlen),
+					je16_to_cpu (node->s.sum_num),
+					je32_to_cpu (node->s.cln_mkr));
+
+			crc = crc32 (0, node, sizeof (struct jffs2_inode_sum_node) - 8);
+			if (crc != je32_to_cpu (node->s.node_crc)) {
+				printf ("Wrong node_crc at  0x%08x, 0x%08x instead of 0x%08x\n", p - data, je32_to_cpu (node->s.node_crc), crc);
+				p += PAD(je32_to_cpu (node->s.totlen));
+				dirty += PAD(je32_to_cpu (node->s.totlen));;
+				continue;
+			}
+			
+			crc = crc32(0, p + sizeof (struct jffs2_inode_sum_node),  je32_to_cpu (node->s.totlen) - sizeof(struct jffs2_inode_sum_node));
+			if (crc != je32_to_cpu(node->s.sum_crc)) {
+				printf ("Wrong data_crc at  0x%08x, 0x%08x instead of 0x%08x\n", p - data, je32_to_cpu (node->s.sum_crc), crc);
+				p += PAD(je32_to_cpu (node->s.totlen));
+				dirty += PAD(je32_to_cpu (node->s.totlen));;
+				continue;
+			}
+
+			if(verbose){
+				for(i = 0; i < je16_to_cpu (node->s.sum_num); i++){
+					struct jffs2_inode_sum_record *sp;
+					sp = (struct jffs2_inode_sum_record *) (p + sizeof (struct jffs2_inode_sum_node));
+						
+					printf ("%14s #ino  %5d,  version %5d, offset %8d, totlen 0x%08x\n",
+						"",
+						je32_to_cpu (sp[i].inode),
+						je32_to_cpu (sp[i].version),
+						je32_to_cpu (sp[i].offset), 
+						je32_to_cpu (sp[i].totlen));
+					
+				}
+				
+				offset = (jint32_t *)((char *)p + je32_to_cpu(node->s.totlen) - 8);
+				magic = (jint32_t *)((char *)p + je32_to_cpu(node->s.totlen) - 4);
+				
+				printf("%14s Sum Node Offset  0x%08x,  Magic 0x%08x\n",
+					"",
+					je32_to_cpu(*offset),
+					je32_to_cpu(*magic));
+			}
+			
+			p += PAD(je32_to_cpu (node->s.totlen));
+			break;
+			
+		}
+			
 		case JFFS2_NODETYPE_CLEANMARKER:
 			if (verbose) {
 				printf ("%8s Cleanmarker     at 0x%08x, totlen 0x%08x\n", 
@@ -418,9 +474,9 @@
 
 			write (fd, &newnode, sizeof (struct jffs2_raw_dirent));
 			write (fd, p + sizeof (struct jffs2_raw_dirent), PAD (je32_to_cpu (node->d.totlen) -  sizeof (struct jffs2_raw_dirent)));
-			p += PAD(je32_to_cpu (node->d.totlen));						
+			p += PAD(je32_to_cpu (node->d.totlen));
 			break;
-	
+	                        
 		case JFFS2_NODETYPE_CLEANMARKER:
 		case JFFS2_NODETYPE_PADDING:
 			newnode.u.magic = cnv_e16 (node->u.magic);
diff --unified --recursive --new-file mtd2/util/sumtool.c mtd/util/sumtool.c
--- mtd2/util/sumtool.c	1970-01-01 01:00:00.000000000 +0100
+++ mtd/util/sumtool.c	2004-10-20 11:56:08.000000000 +0200
@@ -0,0 +1,800 @@
+/*
+ *  sumtool.c
+ *
+ *  Copyright (C) 2004 Zoltan Sogor <weth@inf.u-szeged.hu>,
+ *                     Ferenc Havasi <havasi@inf.u-szeged.hu>,
+ *                     Patrik Kluba <pajko@halom.u-szeged.hu>,
+ *                     University of Szeged, Hungary
+ *
+ * $Id: sumtool.c,v 1.2 2004/10/20 09:56:08 hafy Exp $
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * Overview:
+ *   This is a utility to reorder nodes and insert inode summary information
+ *   into a JFFS2 image for faster mount time - especially on NAND.
+ *
+ */
+
+#include <errno.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/param.h>
+#include <asm/types.h>
+#include <dirent.h>
+#include <mtd/jffs2-user.h>
+#include <endian.h>
+#include <byteswap.h>
+#include <getopt.h>
+#include "crc32.h"
+
+#define PAD(x) (((x)+3)&~3)
+
+#define SINODE 0	/* Inode Type*/
+#define SONODE 1  	/* Other type*/
+
+static const char *const app_name = "sumtool";
+
+typedef struct sum_storage {
+	jint32_t inode;
+	jint32_t version;
+	jint32_t offset;
+	jint32_t totlen;
+	struct sum_storage *next;
+} sum_storage;
+
+static sum_storage *sum_collected = NULL;	/* summary info list */
+static int sum_records = 0;					/* number of summary records */
+
+
+static int verbose = 0;
+static int add_cleanmarkers = 1;			/* add cleanmarker to output */
+static int use_input_cleanmarker_size = 1;	/* use input file's cleanmarker size (default) */
+static int found_cleanmarkers = 0;			/* cleanmarker found in input file */
+static struct jffs2_unknown_node cleanmarker;
+static int cleanmarker_size = sizeof(cleanmarker);
+static const char *short_options = "o:i:e:hvVblnc:";
+static int erase_block_size = 65536;
+static int target_endian = __BYTE_ORDER;
+static int out_fd = -1;
+static int in_fd = -1;
+
+static uint8_t *inode_buffer = NULL; 	/* buffer for inodes */
+static unsigned int ino_ofs = 0;	 	/* inode buffer offset */
+
+static uint8_t *dirent_buffer = NULL;	/* buffer for directory entries and other (symlink, spec. device files, etc.)*/ 
+static unsigned int dent_ofs = 0;		/* directory entry buffer offset */
+
+static uint8_t *file_buffer = NULL;		/* file buffer contains the actual erase block*/
+static unsigned int file_ofs = 0;		/* position in the buffer */
+
+static struct option long_options[] = {
+	{"output", 1, NULL, 'o'},
+	{"input", 1, NULL, 'i'},
+	{"eraseblock", 1, NULL, 'e'},
+	{"help", 0, NULL, 'h'},
+	{"verbose", 0, NULL, 'v'},
+	{"version", 0, NULL, 'V'},
+	{"bigendian", 0, NULL, 'b'},
+	{"littleendian", 0, NULL, 'l'},	
+	{"no-cleanmarkers", 0, NULL, 'n'},
+	{"cleanmarker", 1, NULL, 'c'},
+	{NULL, 0, NULL, 0}
+};
+
+static char *helptext =
+	"Usage: sumtool [OPTIONS] -i inputfile -o outputfile\n"
+	"Convert the input JFFS2 file to a SUM-ed JFFS2 file\n\n"
+	"Options:\n"
+	"  -e, --eraseblock=SIZE     Use erase block size SIZE (default: 64KiB)\n"
+	"                            (usually 16KiB on NAND)\n"
+	"  -c, --cleanmarker=SIZE    Size of cleanmarker (default 12).\n"
+	"                            (usually 16 bytes on NAND, and will be set to\n"
+	"                            this value if left at the default 12). Will be\n"
+	"                            stored in OOB after each physical page composing\n"
+	"                            a physical erase block.\n"
+	"  -n, --no-cleanmarkers     Don't add a cleanmarker to every eraseblock\n"
+	"  -o, --output=FILE         Output to FILE \n"
+	"  -i, --input=FILE          Input from FILE \n"
+	"  -b, --bigendian	     Image is big endian\n"
+	"  -l  --littleendian        Image is little endian\n"
+	"  -h, --help                Display this help text\n"
+	"  -v, --verbose             Verbose operation\n"
+	"  -V, --version             Display version information\n\n";
+
+
+static char *revtext = "$Revision: 1.2 $";
+
+static unsigned char ffbuf[16] = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
+};
+
+static void verror_msg(const char *s, va_list p) {
+	fflush(stdout);
+	fprintf(stderr, "%s: ", app_name);
+	vfprintf(stderr, s, p);
+}
+
+static void error_msg_and_die(const char *s, ...) {
+	va_list p;
+
+	va_start(p, s);
+	verror_msg(s, p);
+	va_end(p);
+	putc('\n', stderr);
+	exit(EXIT_FAILURE);
+}
+
+static void vperror_msg(const char *s, va_list p) {
+	int err = errno;
+
+	if (s == 0)
+		s = "";
+	verror_msg(s, p);
+	if (*s)
+		s = ": ";
+	fprintf(stderr, "%s%s\n", s, strerror(err));
+}
+
+static void perror_msg_and_die(const char *s, ...) {
+	va_list p;
+
+	va_start(p, s);
+	vperror_msg(s, p);
+	va_end(p);
+	exit(EXIT_FAILURE);
+}
+
+
+
+static void full_write(void *target_buff, const void *buf, int len, int nd);
+
+void setup_cleanmarker() {
+
+	cleanmarker.magic    = cpu_to_je16(JFFS2_MAGIC_BITMASK);
+	cleanmarker.nodetype = cpu_to_je16(JFFS2_NODETYPE_CLEANMARKER);
+	cleanmarker.totlen   = cpu_to_je32(cleanmarker_size);
+	cleanmarker.hdr_crc  = cpu_to_je32(crc32(0, &cleanmarker, sizeof(struct jffs2_unknown_node)-4));
+}
+
+void process_options (int argc, char **argv){
+	int opt,c;
+	
+	while ((opt = getopt_long(argc, argv, short_options, long_options, &c)) >= 0) 
+	{
+		switch (opt) 
+		{
+			case 'o':
+				if (out_fd != -1) {
+					error_msg_and_die("output filename specified more than once");
+				}
+				out_fd = open(optarg, O_CREAT | O_TRUNC | O_RDWR, 0644);
+				if (out_fd == -1) {
+					perror_msg_and_die("open output file");
+				}
+				break;
+				
+			case 'i':
+				if (in_fd != -1) {
+					error_msg_and_die("input filename specified more than once");
+				}
+				in_fd = open(optarg, O_RDONLY);
+				if (in_fd == -1) {
+					perror_msg_and_die("open input file");
+				}
+				break;
+			case 'b':
+				target_endian = __BIG_ENDIAN;
+				break;
+			case 'l':
+				target_endian = __LITTLE_ENDIAN;
+				break;	
+			case 'h':
+			case '?':
+				error_msg_and_die(helptext);
+	
+			case 'v':
+				verbose = 1;
+				break;
+	
+			case 'V':
+				error_msg_and_die("revision %.*s\n",
+						(int) strlen(revtext) - 13, revtext + 11);
+	
+			case 'e': {
+				char *next;
+				unsigned units = 0;
+				erase_block_size = strtol(optarg, &next, 0);
+				if (!erase_block_size)
+					error_msg_and_die("Unrecognisable erase size\n");
+	
+				if (*next) {
+					if (!strcmp(next, "KiB")) {
+						units = 1024;
+					} else if (!strcmp(next, "MiB")) {
+						units = 1024 * 1024;
+					} else {
+						error_msg_and_die("Unknown units in erasesize\n");
+					}
+				} else {
+					if (erase_block_size < 0x1000)
+						units = 1024;
+					else
+						units = 1;
+				}
+				erase_block_size *= units;
+	
+				/* If it's less than 8KiB, they're not allowed */
+				if (erase_block_size < 0x2000) {
+					fprintf(stderr, "Erase size 0x%x too small. Increasing to 8KiB minimum\n",
+						erase_block_size);
+					erase_block_size = 0x2000;
+				}
+				break;
+			}
+
+			case 'n':
+				add_cleanmarkers = 0;
+				break;
+			case 'c':
+				cleanmarker_size = strtol(optarg, NULL, 0);
+			
+				if (cleanmarker_size < sizeof(cleanmarker)) {
+					error_msg_and_die("cleanmarker size must be >= 12");
+				}
+				if (cleanmarker_size >= erase_block_size) {
+					error_msg_and_die("cleanmarker size must be < eraseblock size");
+				}
+				
+				use_input_cleanmarker_size = 0;
+				found_cleanmarkers = 1;
+				setup_cleanmarker();
+				
+				break;
+			
+		}
+	}
+}
+
+
+void init_buffers() {
+	
+	inode_buffer = malloc(erase_block_size);
+	
+	if (!inode_buffer) {
+		perror("out of memory");
+		close (in_fd);
+		close (out_fd);
+		exit(1);
+	}
+		
+	dirent_buffer = malloc(erase_block_size);
+	
+	if (!dirent_buffer) {
+		perror("out of memory");
+		close (in_fd);
+		close (out_fd);
+		exit(1);
+	}
+		
+	file_buffer = malloc(erase_block_size);
+	
+	if (!file_buffer) {
+		perror("out of memory");
+		close (in_fd);
+		close (out_fd);
+		exit(1);
+	}
+}
+
+void clean_buffers() {
+	
+	if (inode_buffer) 
+		free(inode_buffer);
+	if (dirent_buffer)
+		free(dirent_buffer);
+	if (file_buffer)
+		free(file_buffer);
+}
+
+int load_next_block() {
+	
+	int ret;
+	ret = read(in_fd, file_buffer, erase_block_size);
+	file_ofs = 0;
+	
+	if(verbose)
+		printf("Load next block : %d bytes read\n",ret);
+	
+	return ret;
+}
+
+void write_buff_to_file(int nd) {
+	
+	int ret;
+	int len = erase_block_size;
+	uint8_t *buf = NULL;
+	
+	if (!nd) {
+		buf = inode_buffer;
+		while (len > 0) {
+			ret = write(out_fd, buf, len);
+	
+			if (ret < 0)
+				perror_msg_and_die("write");
+	
+			if (ret == 0)
+				perror_msg_and_die("write returned zero");
+	
+			len -= ret;
+			buf += ret;
+		}
+		ino_ofs = 0;
+	}
+	else {
+		buf = dirent_buffer;
+		while (len > 0) {
+			ret = write(out_fd, buf, len);
+	
+			if (ret < 0)
+				perror_msg_and_die("write");
+	
+			if (ret == 0)
+				perror_msg_and_die("write returned zero");
+	
+			len -= ret;
+			buf += ret;
+		}
+		dent_ofs = 0;
+	}
+}
+
+void dump_sum_records() {
+	
+    struct jffs2_inode_sum_node isum;
+    struct sum_storage *temp;
+	jint32_t offset;
+	jint32_t *wpage;
+	int datasize;
+	int infosize;
+	int padsize;
+	jint32_t magic = cpu_to_je32(JFFS2_SUM_MAGIC);
+	
+	if (!sum_records) 
+		return; 
+	
+	datasize = sum_records * sizeof(struct jffs2_inode_sum_record) + 8;
+	infosize = sizeof(struct jffs2_inode_sum_node) + datasize;
+	padsize = erase_block_size - ino_ofs - infosize;
+	infosize += padsize; datasize += padsize;
+	offset = cpu_to_je32(ino_ofs);
+	jint32_t *tpage = (jint32_t *) malloc(datasize);
+	
+	if(!tpage)
+		error_msg_and_die("Can't allocate memory to dump summary information!\n");
+	
+	memset(tpage, 0xff, datasize);
+	memset(&isum, 0, sizeof(isum));
+	
+	isum.magic = cpu_to_je16(JFFS2_MAGIC_BITMASK);
+	isum.nodetype = cpu_to_je16(JFFS2_NODETYPE_INODE_SUM);
+	isum.totlen = cpu_to_je32(infosize);
+	isum.hdr_crc = cpu_to_je32(crc32(0, &isum, sizeof(struct jffs2_unknown_node) - 4));
+		
+	if (add_cleanmarkers && found_cleanmarkers) {
+		isum.cln_mkr = cpu_to_je32(cleanmarker_size);	
+	}
+	else{
+		isum.cln_mkr = cpu_to_je32(0);
+	}
+	
+	isum.sum_num = cpu_to_je16(sum_records);
+	wpage = tpage;
+	
+	while (sum_records) {
+		*(wpage++) = sum_collected->inode;
+		*(wpage++) = sum_collected->version;
+		*(wpage++) = sum_collected->offset;
+		*(wpage++) = sum_collected->totlen;
+		temp = sum_collected;
+		sum_collected = sum_collected->next;
+		free(temp);
+		sum_records--;
+	}
+	
+	wpage = (jint32_t *)((char *)wpage + padsize);	/* a cast is not an lvalue in standard C */
+	*(wpage++) = offset;
+	*(wpage++) = magic;
+	isum.sum_crc = cpu_to_je32(crc32(0, tpage, datasize));
+	isum.node_crc = cpu_to_je32(crc32(0, &isum, sizeof(isum) - 8));
+	
+	full_write(inode_buffer + ino_ofs, &isum, sizeof(isum), SINODE);
+	full_write(inode_buffer + ino_ofs, tpage, datasize, SINODE);
+	
+	free(tpage);
+}
+
+static void full_write(void *target_buff, const void *buf, int len, int nd) {
+	memcpy(target_buff, buf, len);
+	
+	if (!nd)
+		ino_ofs += len;
+	else 
+		dent_ofs += len;
+}
+
+static void pad(int req, int nd) {
+	if (!nd) {
+		
+		while (req) {
+			if (req > sizeof(ffbuf)) {
+				full_write(inode_buffer + ino_ofs, ffbuf, sizeof(ffbuf), nd);
+				req -= sizeof(ffbuf);
+			} else {
+				full_write(inode_buffer + ino_ofs, ffbuf, req, nd);
+				req = 0;
+			}
+		}
+	} 
+	else {
+		while (req) {
+			if (req > sizeof(ffbuf)) {
+				full_write(dirent_buffer + dent_ofs, ffbuf, sizeof(ffbuf), nd);
+				req -= sizeof(ffbuf);
+			} 
+			else {
+				full_write(dirent_buffer + dent_ofs, ffbuf, req, nd);
+				req = 0;
+			}
+		}
+	}
+}
+
+static inline void padword(int nd) {
+	
+	if (!nd){
+		if (ino_ofs % 4) {
+			full_write(inode_buffer + ino_ofs, ffbuf, 4 - (ino_ofs % 4), nd);
+		}
+	} 
+	else {
+		if (dent_ofs % 4) {
+			full_write(dirent_buffer + dent_ofs, ffbuf, 4 - (dent_ofs % 4), nd);
+		}
+	}
+}
+
+static inline void pad_block_if_less_than(int req, int nd) {
+    if (!nd) {
+	    int datasize = ((sum_records + 1) * sizeof(struct jffs2_inode_sum_record)) + sizeof(struct jffs2_inode_sum_node) + 8;
+	    datasize += (4 - (datasize % 4)) % 4;
+	    if (ino_ofs + req > erase_block_size - datasize) {
+	        dump_sum_records();
+			write_buff_to_file(nd);
+	    }
+		
+		if (add_cleanmarkers && found_cleanmarkers) {
+			if (!ino_ofs) {
+				full_write(inode_buffer, &cleanmarker, sizeof(cleanmarker), nd);
+				pad(cleanmarker_size - sizeof(cleanmarker), nd);
+				padword(nd);
+			}
+		}
+			
+    }
+	else {
+	    if (dent_ofs + req > erase_block_size)  {
+	        pad(erase_block_size - dent_ofs, nd);
+			write_buff_to_file(nd);
+	    }
+		
+		if (add_cleanmarkers && found_cleanmarkers) {
+			if (!dent_ofs) {
+				full_write(dirent_buffer, &cleanmarker, sizeof(cleanmarker), nd);
+				pad(cleanmarker_size - sizeof(cleanmarker), nd);
+				padword(nd);
+			}
+    	}
+	}	
+}
+
+void flush_buffers() {
+	
+	if ((add_cleanmarkers == 1) && (found_cleanmarkers == 1)) { /* CLEANMARKER */
+		if (ino_ofs != cleanmarker_size) {	/* INODE BUFFER */
+			
+		    int datasize = ((sum_records + 1) * sizeof(struct jffs2_inode_sum_record)) + sizeof(struct jffs2_inode_sum_node) + 8;
+		    datasize += (4 - (datasize % 4)) % 4;
+			
+			/* If we have a full inode buffer, then write out inode and summary data  */
+		    if (ino_ofs + sizeof(struct jffs2_raw_inode) + JFFS2_MIN_DATA_LEN > erase_block_size - datasize) {
+		        dump_sum_records();
+				write_buff_to_file(SINODE);
+		    }
+			/* else just write out inode data */
+			else{
+				pad(erase_block_size - ino_ofs, SINODE);
+				write_buff_to_file(SINODE);
+			}
+		}
+		
+		if (dent_ofs != cleanmarker_size) { /* DIRENT AND OTHERS BUFFER */
+			pad(erase_block_size - dent_ofs, SONODE);
+			write_buff_to_file(SONODE);
+		}
+		
+	}
+	else { /* NO CLEANMARKER */
+		if (ino_ofs != 0) { /* INODE BUFFER */
+			
+		    int datasize = ((sum_records + 1) * sizeof(struct jffs2_inode_sum_record)) + sizeof(struct jffs2_inode_sum_node) + 8;
+		    datasize += (4 - (datasize % 4)) % 4;
+			
+			/* If we have a full inode buffer, then write out inode and summary data */
+		    if (ino_ofs + sizeof(struct jffs2_raw_inode) + JFFS2_MIN_DATA_LEN > erase_block_size - datasize) {
+		        dump_sum_records();
+				write_buff_to_file(SINODE);
+		    }
+			/* Else just write out inode data */
+			else{
+				pad(erase_block_size - ino_ofs, SINODE);
+				write_buff_to_file(SINODE);
+			}
+		}
+		
+		if (dent_ofs != 0) { /* DIRENT AND OTHER BUFFER */
+			pad(erase_block_size - dent_ofs, SONODE);
+			write_buff_to_file(SONODE);
+		}
+	}
+}
+
+
+void write_dirent_to_buff(union jffs2_node_union *node) {
+	
+	pad_block_if_less_than(je32_to_cpu (node->d.totlen), SONODE);
+	full_write(dirent_buffer + dent_ofs, &(node->d), je32_to_cpu (node->d.totlen), SONODE);
+	padword(SONODE);	
+}
+
+void add_sum_entry(union jffs2_node_union *node) {
+	
+	sum_storage *walk;
+	sum_storage *temp = (sum_storage *) malloc(sizeof(sum_storage));
+	
+	if (!temp)
+		error_msg_and_die("Can't allocate memory for summary information!\n");
+	
+	temp->inode = node->i.ino;
+	temp->version = node->i.version;
+	temp->offset = cpu_to_je32(ino_ofs); 
+	temp->totlen = node->i.totlen;
+	temp->next = NULL;
+	
+	if (!sum_collected) {
+	    sum_collected = temp;
+	} 
+	else {
+	    walk = sum_collected;
+		
+	    while (walk->next) {
+			walk = walk->next;
+	    }
+		walk->next = temp;
+	}
+	sum_records++;
+}
+
+void write_inode_to_buff(union jffs2_node_union *node) {
+	
+	pad_block_if_less_than(je32_to_cpu (node->i.totlen), SINODE);  
+	add_sum_entry(node);	/* Add inode summary entry to summary list */
+	full_write(inode_buffer + ino_ofs, &(node->i), je32_to_cpu (node->i.totlen), SINODE);	/* Write out the inode to inode_buffer */
+	padword(SINODE);
+	
+}
+
+
+void create_summed_image(int inp_size) {
+	uint8_t		*p = file_buffer;
+	union jffs2_node_union 	*node;
+	uint32_t	crc;
+	uint16_t	type;
+	int		bitchbitmask = 0;
+	int		obsolete;
+	
+	char	name[256];
+	
+	while ( p < (file_buffer + inp_size)) {
+		
+		node = (union jffs2_node_union*) p;
+		
+		/* Skip empty space */
+		if (je16_to_cpu (node->u.magic) == 0xFFFF && je16_to_cpu (node->u.nodetype) == 0xFFFF) {
+			p += 4;
+			continue;
+		}
+		
+		if (je16_to_cpu (node->u.magic) != JFFS2_MAGIC_BITMASK)	{
+			if (!bitchbitmask++)
+    			    printf ("Wrong bitmask  at  0x%08x, 0x%04x\n", p - file_buffer, je16_to_cpu (node->u.magic));
+			p += 4;
+			continue;
+		}
+		
+		bitchbitmask = 0;
+		
+		type = je16_to_cpu(node->u.nodetype);
+		if ((type & JFFS2_NODE_ACCURATE) != JFFS2_NODE_ACCURATE) {
+			obsolete = 1;
+			type |= JFFS2_NODE_ACCURATE;
+		} else
+			obsolete = 0;
+		
+		node->u.nodetype = cpu_to_je16(type);
+	    
+		crc = crc32 (0, node, sizeof (struct jffs2_unknown_node) - 4);
+		if (crc != je32_to_cpu (node->u.hdr_crc)) {
+			printf ("Wrong hdr_crc  at  0x%08x, 0x%08x instead of 0x%08x\n", p - file_buffer, je32_to_cpu (node->u.hdr_crc), crc);
+			p += 4;
+			continue;
+		}
+		
+		switch(je16_to_cpu(node->u.nodetype)) {
+		
+			case JFFS2_NODETYPE_INODE:
+				if(verbose)
+					printf ("%8s Inode      node at 0x%08x, totlen 0x%08x, #ino  %5d, version %5d, isize %8d, csize %8d, dsize %8d, offset %8d\n",
+						obsolete ? "Obsolete" : "",
+						p - file_buffer, je32_to_cpu (node->i.totlen), je32_to_cpu (node->i.ino),
+						je32_to_cpu ( node->i.version), je32_to_cpu (node->i.isize), 
+						je32_to_cpu (node->i.csize), je32_to_cpu (node->i.dsize), je32_to_cpu (node->i.offset));
+	
+				crc = crc32 (0, node, sizeof (struct jffs2_raw_inode) - 8);
+				if (crc != je32_to_cpu (node->i.node_crc)) {
+					printf ("Wrong node_crc at  0x%08x, 0x%08x instead of 0x%08x\n", p - file_buffer, je32_to_cpu (node->i.node_crc), crc);
+					p += PAD(je32_to_cpu (node->i.totlen));
+					continue;
+				}
+				
+				crc = crc32(0, p + sizeof (struct jffs2_raw_inode), je32_to_cpu(node->i.csize));
+				if (crc != je32_to_cpu(node->i.data_crc)) {
+					printf ("Wrong data_crc at  0x%08x, 0x%08x instead of 0x%08x\n", p - file_buffer, je32_to_cpu (node->i.data_crc), crc);
+					p += PAD(je32_to_cpu (node->i.totlen));
+					continue;
+				}
+				
+				write_inode_to_buff(node);
+				
+				p += PAD(je32_to_cpu (node->i.totlen));
+				break;
+				
+			case JFFS2_NODETYPE_DIRENT:
+				memcpy (name, node->d.name, node->d.nsize);
+				name [node->d.nsize] = 0x0;
+			
+				if(verbose)
+					printf ("%8s Dirent     node at 0x%08x, totlen 0x%08x, #pino %5d, version %5d, #ino  %8d, nsize %8d, name %s\n",
+						obsolete ? "Obsolete" : "",
+						p - file_buffer, je32_to_cpu (node->d.totlen), je32_to_cpu (node->d.pino),
+						je32_to_cpu ( node->d.version), je32_to_cpu (node->d.ino), 
+						node->d.nsize, name);
+	
+				crc = crc32 (0, node, sizeof (struct jffs2_raw_dirent) - 8);
+				if (crc != je32_to_cpu (node->d.node_crc)) {
+					printf ("Wrong node_crc at  0x%08x, 0x%08x instead of 0x%08x\n", p - file_buffer, je32_to_cpu (node->d.node_crc), crc);
+					p += PAD(je32_to_cpu (node->d.totlen));
+					continue;
+				}
+				
+				crc = crc32(0, p + sizeof (struct jffs2_raw_dirent), node->d.nsize);
+				if (crc != je32_to_cpu(node->d.name_crc)) {
+					printf ("Wrong name_crc at  0x%08x, 0x%08x instead of 0x%08x\n", p - file_buffer, je32_to_cpu (node->d.name_crc), crc);
+					p += PAD(je32_to_cpu (node->d.totlen));
+					continue;
+				}
+	
+				write_dirent_to_buff(node);
+				
+				p += PAD(je32_to_cpu (node->d.totlen));						
+				break;
+		
+			case JFFS2_NODETYPE_CLEANMARKER:
+				if (verbose) {
+					printf ("%8s Cleanmarker     at 0x%08x, totlen 0x%08x\n", 
+						obsolete ? "Obsolete" : "",
+						p - file_buffer, je32_to_cpu (node->u.totlen));
+				}
+				
+				if(!found_cleanmarkers){
+					found_cleanmarkers = 1;
+					
+					if(add_cleanmarkers == 1 && use_input_cleanmarker_size == 1){
+						cleanmarker_size = je32_to_cpu (node->u.totlen);
+						setup_cleanmarker();
+					}
+				}			
+				
+				p += PAD(je32_to_cpu (node->u.totlen));						
+				break;
+		
+			case JFFS2_NODETYPE_PADDING:
+				if (verbose) {
+					printf ("%8s Padding    node at 0x%08x, totlen 0x%08x\n", 
+						obsolete ? "Obsolete" : "",
+						p - file_buffer, je32_to_cpu (node->u.totlen));
+				}		
+				p += PAD(je32_to_cpu (node->u.totlen));						
+				break;
+				
+			case 0xffff:
+				p += 4;
+				break;
+				
+			default:	
+				if (verbose) {
+					printf ("%8s Unknown    node at 0x%08x, totlen 0x%08x\n", 
+						obsolete ? "Obsolete" : "",
+						p - file_buffer, je32_to_cpu (node->u.totlen));
+				}
+				
+				write_dirent_to_buff(node);
+				
+				p += PAD(je32_to_cpu (node->u.totlen));						
+		}	
+	}
+}
+
+int main(int argc, char **argv) {
+	
+	int ret;
+	
+	process_options(argc,argv);
+	
+	if ((in_fd == -1) || (out_fd == -1))	{
+		
+		if(in_fd != -1)
+			close(in_fd);
+		if(out_fd != -1)
+			close(out_fd);
+		
+		error_msg_and_die("You must specify input and output files!\n");
+	}
+	
+	init_buffers();
+	
+	while ((ret = load_next_block())) {
+		create_summed_image(ret);	
+	}
+
+	flush_buffers();
+	clean_buffers();
+	
+	if (in_fd != -1)
+		close(in_fd);
+	if (out_fd != -1)
+		close(out_fd);
+	
+	return 0;
+}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [OBORONA-SPAM]  JFFS2 mount time
  2004-10-20 14:26 JFFS2 mount time Ferenc Havasi
@ 2004-10-20 15:26 ` Artem B. Bityuckiy
  2004-10-20 15:49   ` Ferenc Havasi
  2004-10-21  6:29 ` Artem B. Bityuckiy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-20 15:26 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: dwmw2, linux-mtd

Ferenc,

I haven't studied your patch closely yet, but after 3 minutes of looking 
at it I have two questions:

1. Why did you introduce the JFFS2_SUM_MAGIC constant? As I understand, 
the node's magic field is needed to identify the *beginning of node*, 
*not the node type*. The type of node is defined by the next field, 
called 'nodetype'. You use it (JFFS2_NODETYPE_INODE_SUM). So, IMHO, the 
JFFS2_SUM_MAGIC constant doesn't fit into the common rules...

2. This is very minor of course, just a remark. IMHO, it's better to 
avoid too many ifdefs, so I think it is unnecessary to place the 
function prototype under an ifdef. I mean:

+#ifdef CONFIG_JFFS2_FS_SUMMARY
+static struct jffs2_inode_cache *jffs2_scan_make_ino_cache(struct 
jffs2_sb_info *c, uint32_t ino);
+#endif



Ferenc Havasi wrote:
> Dear All,
> 
> Here is the latest version of our mount time improvement.
> 
> Using of it:
> - apply this patch on the latest version of MTD
> - compile sumtool (make command in mtd/util)
> - make your JFFS2 image as before (or you can use already created images 
> as well)
> - run sumtool to insert summary information, for example:
>   ./sumtool -i original.jffs2 -o new.jffs2 -e128KiB
> - recompile your kernel with "JFFS2 inode summary support"
> 
> Jarkko made a measurement on a real NAND device: his JFFS2 image was 
> 120819928 (115M), after running sumtool the new image was 123338752 (117M).
> 
> Using the original mount time was 55 sec, with the new image it is only 
> 8.5 sec.
> 
> It works very similar as our previous improvement: stores special 
> information at the end of the erase blocks, and at mount time if there 
> is this kind of information the scaning of the erase block is unneccessary.
> 
> New things compared to our previous improvement:
> - it was fully rewritten
> - we separated the user space tool from mkfs. (sumtool)
> - sumtool now not only inserts the summary information but also make 
> some node-reordering. There will be two kind of erase blocks: in the 
> "first type" there will be only jffs2_raw_inodes, and all other node 
> (jffs2_raw_dirent) will be stored in the "second type". It generates 
> summary at the end of all "fist type" eraseblock. (the "second type" 
> will be scanned as before, because all information is needed in 
> jffs_raw_dirent at mount time)
> 
> Ceratinly all of these things are optional (as you can see above you 
> have to select it from kernel config). The JFFS2 image produced by 
> sumtool is also usable with previous kernel because the summary node is 
> JFFS2_FEATURE_RWCOMPAT_DELETE.
> 
> I think it can be usefull not only for us. David, may I commit it to the 
> CVS?
> 
> Regards,
> Ferenc

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS2 mount time
  2004-10-20 15:26 ` [OBORONA-SPAM] " Artem B. Bityuckiy
@ 2004-10-20 15:49   ` Ferenc Havasi
  2004-10-20 15:53     ` Artem B. Bityuckiy
  0 siblings, 1 reply; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-20 15:49 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: dwmw2, linux-mtd

Hi Artem,

> 1. Why did you introduce the JFFS2_SUM_MAGIC constant? As I understand, 
> the node's magic field is needed to identify the *beginning of node*, 
> *not the node type*. The type of node is defined by the next field, 
> called 'nodetype'. You use it (JFFS2_NODETYPE_INODE_SUM). So, IMHO, the 
> JFFS2_SUM_MAGIC constant doesn't fit into the common rules...

The reason is the following: the summary node is at the end of the erase 
block, and it does not have a fixed size (its size depends on the 
information it stores).

The main advantage of the summary node is that it avoids the original 
scanning method. So we cannot use the original full-scanning method to 
determine the beginning of the summary node (using only 
JFFS2_NODETYPE_INODE_SUM).

Our method is the following:
- read some bytes at the end of the erase block
- if the last word is JFFS2_SUM_MAGIC, then we can be almost sure that it 
is an erase block which has a summary
- the word before this magic is the length of the node
- using this length we can check that it is really a 
JFFS2_NODETYPE_INODE_SUM node, and process it

I can't imagine a more effective method to determine the beginning of the 
summary node (if you have a better suggestion...). And because the magic 
is inside the summary node, I think it fits the philosophy of 
JFFS2, though it is a little bit tricky.
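
The lookup steps listed above can be sketched in user-space C as follows. 
This is only an illustration under stated assumptions, not the code from 
the patch: the SKETCH_SUM_MAGIC and SKETCH_NODETYPE_SUM values are 
placeholders, fields are read in host byte order, and the length word is 
assumed to cover the whole node, including the two trailing words, so that 
the node ends exactly at the end of the erase block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder constants for this sketch only; the real values are
 * defined by the patch and are not reproduced here. */
#define SKETCH_SUM_MAGIC     0x53554d31u  /* stands in for JFFS2_SUM_MAGIC */
#define SKETCH_NODE_MAGIC    0x1985u      /* value of JFFS2_MAGIC_BITMASK */
#define SKETCH_NODETYPE_SUM  0x2006u      /* stands in for JFFS2_NODETYPE_INODE_SUM */

/* Smallest node accepted here: magic + nodetype + the two trailing words. */
#define SKETCH_MIN_SUM_LEN   12u

static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static uint32_t rd32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }

/*
 * Return the offset of the summary node inside the erase block, or -1 if
 * the block has no valid-looking summary and must be scanned normally.
 */
static long find_summary(const uint8_t *block, size_t block_size)
{
	uint32_t len;
	size_t start;

	assert(block != NULL);
	if (block_size < SKETCH_MIN_SUM_LEN)
		return -1;

	/* Step 1: the last word must be the summary magic. */
	if (rd32(block + block_size - 4) != SKETCH_SUM_MAGIC)
		return -1;

	/* Step 2: the word before it holds the length of the node. */
	len = rd32(block + block_size - 8);
	if (len < SKETCH_MIN_SUM_LEN || len > block_size || len % 4 != 0)
		return -1;

	/* Step 3: jump back by that length and check the node header. */
	start = block_size - len;
	if (rd16(block + start) != SKETCH_NODE_MAGIC)
		return -1;
	if (rd16(block + start + 2) != SKETCH_NODETYPE_SUM)
		return -1;

	return (long)start;
}
```

A caller would fall back to the normal full scan whenever find_summary() 
returns -1, so a block without a summary costs only one extra small read.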

> 2. This is very minor of course, just a remark. IMHO, its better to 
> avoid too many ifdefs, so, I think it is unnecessary to place the 
> function prototype under ifdef. I mead:
> 
> +#ifdef CONFIG_JFFS2_FS_SUMMARY
> +static struct jffs2_inode_cache *jffs2_scan_make_ino_cache(struct 
> jffs2_sb_info *c, uint32_t ino);
> +#endif

Yes, I agree. I will modify it.

Bye,
Ferenc

* Re: JFFS2 mount time
  2004-10-20 15:49   ` Ferenc Havasi
@ 2004-10-20 15:53     ` Artem B. Bityuckiy
  0 siblings, 0 replies; 27+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-20 15:53 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: linux-mtd

Ferenc Havasi wrote:
> The reason is the following: the summary node is at the end of the erase 
> block, and it has not fixed size (its size depends on the information it 
> stores).
> 
> The main advantage of using summary node is to avoid the original 
> scanning method. So we cannot use the original full-scanning method to 
> determine  the begining of the summary node (using only 
> JFFS2_NODETYPE_INODE_SUM).
> 
> Our method is the following:
> - read some bytes at the end of the erase block
> - if the last word is JFFS2_SUM_MAGIC than we will almost sure that it 
> is an erase block which has summary
> - the word before this magic is the length of the node
> - using this length we can check that it is really a 
> JFFS2_NODETYPE_INODE_SUM node, and process it
> 
> I can't image more effective method to determine the begining of the 
> summary node. (if you have better suggestion...) And because the magic 
> is inside of the summary node I think it is fit to the philosophy of 
> JFFS2 - but a little bit tricky.
Ok, I got it. I was wrong, sorry.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS2 mount time
  2004-10-20 14:26 JFFS2 mount time Ferenc Havasi
  2004-10-20 15:26 ` [OBORONA-SPAM] " Artem B. Bityuckiy
@ 2004-10-21  6:29 ` Artem B. Bityuckiy
  2004-10-21  6:54   ` Ferenc Havasi
  2004-10-21  7:30 ` JFFS2 mount time - more Artem B. Bityuckiy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-21  6:29 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: dwmw2, linux-mtd, jffs-dev

Hello Ferenc,

As I understand it, you only prepare a JFFS2 image with summaries. This is 
great as long as we do not change anything. For read-only file systems 
this is OK.

But what if files/direntries are changed/deleted? Do you write summary 
information dynamically? How are you going to place nodes/direntries in 
different blocks dynamically?

Ferenc Havasi wrote:
> Dear All,
> 
> Here is the latest version of our mount time improvement.
> 
> Using of it:
> - apply this patch on the latest version of MTD
> - compile sumtool (make command in mtd/util)
> - make your JFFS2 image as before (or you can use already created images 
> as well)
> - run sumtool to insert summary information, for example:
>   ./sumtool -i original.jffs2 -o new.jffs2 -e128KiB
> - recompile your kernel with "JFFS2 inode summary support"
> 
> Jarkko made a measurement on a real NAND device: his JFFS2 image was 
> 120819928 (115M), after running sumtool the new image was 123338752 (117M).
> 
> Using the original mount time was 55 sec, with the new image it is only 
> 8.5 sec.
> 
> It works very similar as our previous improvement: stores special 
> information at the end of the erase blocks, and at mount time if there 
> is this kind of information the scaning of the erase block is unneccessary.
> 
> New things compared to our previous improvement:
> - it was fully rewritten
> - we separated the user space tool from mkfs. (sumtool)
> - sumtool now not only inserts the summary information but also make 
> some node-reordering. There will be two kind of erase blocks: in the 
> "first type" there will be only jffs2_raw_inodes, and all other node 
> (jffs2_raw_dirent) will be stored in the "second type". It generates 
> summary at the end of all "fist type" eraseblock. (the "second type" 
> will be scanned as before, because all information is needed in 
> jffs_raw_dirent at mount time)
> 
> Ceratinly all of these things are optional (as you can see above you 
> have to select it from kernel config). The JFFS2 image produced by 
> sumtool is also usable with previous kernel because the summary node is 
> JFFS2_FEATURE_RWCOMPAT_DELETE.
> 
> I think it can be usefull not only for us. David, may I commit it to the 
> CVS?
> 
> Regards,
> Ferenc

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS2 mount time
  2004-10-21  6:29 ` Artem B. Bityuckiy
@ 2004-10-21  6:54   ` Ferenc Havasi
  2004-10-21  7:16     ` Artem B. Bityuckiy
  0 siblings, 1 reply; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-21  6:54 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: dwmw2, linux-mtd, jffs-dev

Hi Artem,

> As I understand, you only prepare JFFS2 image with summaries. This is 
> great until we do not change anything. For read-only file-systems this 
> is OK.
> 
> But what if files/direntries are changed/deleted ? Do you write summary 
> information dynamically? How are you going to place nodes/direntries to 
> different blocks dynamically?

You are right, there is a small change which is really important (and 
will be ready very soon): extending jffs2_mark_node_obsolete() to mark 
not only the node but also its entry in the summary.

Any other improvement can be done later, because after that the filesystem 
will always be coherent: we write the summary only at the end of the 
erase blocks, when they are fully "finished". So if there is a summary 
somewhere we will not need to extend it, only to mark the obsoleted nodes.

In the near future we also plan to implement generating the summary 
dynamically when the filesystem finishes an erase block, which keeps this 
"fast mount time" permanent.

Bye,
Ferenc

* Re: JFFS2 mount time
  2004-10-21  6:54   ` Ferenc Havasi
@ 2004-10-21  7:16     ` Artem B. Bityuckiy
  2004-10-21 19:50       ` Ferenc Havasi
  0 siblings, 1 reply; 27+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-21  7:16 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: dwmw2, linux-mtd, jffs-dev

> You are right, there is a small change which is really important (and 
> will be ready very soon) to extend jffs2_mark_node_obsolete() to mark 
> not only the node but also its entry in the summary.
Unfortunately, you cannot mark entries as obsoleted in your summary 
node in the case of NAND.

If you write your summary only for *full* blocks, you will not need to 
mark entries obsoleted, even if you have NOR flash (but you can on NOR). 
The partially filled blocks must not have a summary node (you can 
introduce a special marker and write it to the OOB of the last page on 
NAND/the last word of the sector on NOR, which tells whether a summary 
node is present).

So, fully filled blocks will have a summary and will be scanned very 
quickly, partially filled ones will have no summary and will be fully 
scanned, free blocks will have cleanmarkers and will not be scanned, and 
other blocks will be either erased or considered free.
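
The four cases above amount to a small decision table. The sketch below is 
only an illustration: the block_probe structure and its flags are 
hypothetical, and how the flags are obtained (reading the OOB marker, the 
cleanmarker and the first node) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum scan_action {
	SCAN_SKIP,          /* free block: cleanmarker only, nothing to scan */
	SCAN_SUMMARY_ONLY,  /* full block: read just the summary node */
	SCAN_FULL,          /* partially filled block: scan node by node */
	SCAN_ERASE_OR_FREE  /* anything else: erase it or treat it as free */
};

/* Hypothetical result of a cheap look at one erase block. */
struct block_probe {
	bool has_summary_marker;  /* "summary present" marker was found */
	bool has_cleanmarker;     /* block starts with a cleanmarker */
	bool has_nodes;           /* block holds data after the cleanmarker */
};

static enum scan_action choose_scan_action(const struct block_probe *b)
{
	assert(b != NULL);
	if (b->has_summary_marker)
		return SCAN_SUMMARY_ONLY;
	if (b->has_nodes)
		return SCAN_FULL;
	if (b->has_cleanmarker)
		return SCAN_SKIP;
	return SCAN_ERASE_OR_FREE;
}
```

Only the SCAN_FULL case costs a complete read of the block, which is why 
the scheme pays off when most blocks are either full or free.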

> 
> Any other improvement can be done later, because after it the filesystem 
> will be always coherent, because we write summary only at the of the 
> erasy blocks, when it is fully "finished" - so if there is a summary 
> somewhere we will not need to extend it, only to mark the obscolated nodes.
Yes, nice, but why do you need to mark obsoleted nodes in the summary? 
When you insert nodes into the fragtree or dirents into the list, the 
JFFS2 code will detect obsoleted nodes automatically; there is no need to 
mark them physically.

> 
> We also plan in the near future to implement the ability of generating 
> summary dinamically when the filesystem finishes an erase block - which 
> keep this "fast mount time" permament.
This would be perfect.


-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* JFFS2 mount time - more
  2004-10-20 14:26 JFFS2 mount time Ferenc Havasi
  2004-10-20 15:26 ` [OBORONA-SPAM] " Artem B. Bityuckiy
  2004-10-21  6:29 ` Artem B. Bityuckiy
@ 2004-10-21  7:30 ` Artem B. Bityuckiy
       [not found] ` <41776351.4040204@yandex.ru>
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Artem B. Bityuckiy @ 2004-10-21  7:30 UTC (permalink / raw)
  To: linux-mtd

Ferenc,

I have 3 more questions.

1. How large are your summary nodes (on average) for blocks full of 
dirents/nodes?
2. Why don't you use compression for them?
3. Why did you introduce a new tool instead of just adding new options to 
mkfs.jffs2?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

* Re: JFFS2 mount time - 3 more questions
       [not found] ` <41776351.4040204@yandex.ru>
@ 2004-10-21  7:39   ` Ferenc Havasi
  0 siblings, 0 replies; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-21  7:39 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: dwmw2, linux-mtd, jffs-dev

Hi Artem,

> 1. How large are your summary nodes (in average) for blocks full of 
> dirents/nodes ?

It heavily depends on
- the size of the erase block
- the sizes of the nodes
It is 4 words for every jffs2_raw_inode. Dirents are stored separately, 
without a summary.
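
As a rough illustration of that size, the sketch below mirrors the 
"datasize" arithmetic used by sumtool in the patch: one 4-word record per 
inode node, plus the summary node header, plus 8 bytes, rounded up to a 
multiple of 4. The header size is passed in as a parameter because 
sizeof(struct jffs2_inode_sum_node) is not shown here, and reading the 
extra 8 bytes as the two trailing words (length and magic) is an 
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One record holds inode, version, offset and totlen: 4 words. */
#define SUM_RECORD_SIZE   (4 * sizeof(uint32_t))
/* Assumed to be the trailing length word plus the magic word. */
#define SUM_TRAILER_SIZE  8

/* Bytes needed for a summary with `records` entries, padded to 4 bytes. */
static size_t summary_size(size_t records, size_t header_size)
{
	size_t size = records * SUM_RECORD_SIZE + header_size + SUM_TRAILER_SIZE;

	size += (4 - size % 4) % 4;
	return size;
}

/*
 * True if one more inode node of node_len bytes can be written at ino_ofs
 * while still leaving room for a summary that includes its record.
 */
static bool inode_fits(size_t ino_ofs, size_t node_len, size_t records,
		       size_t header_size, size_t erase_block_size)
{
	size_t reserved = summary_size(records + 1, header_size);

	assert(ino_ofs <= erase_block_size);
	if (reserved > erase_block_size)
		return false;
	return ino_ofs + node_len <= erase_block_size - reserved;
}
```

With an example header size of 12 bytes, three records need 
3 * 16 + 12 + 8 = 68 bytes, so the overhead grows by 16 bytes per inode 
node stored in the block.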

> 2. Why do not you use compression for them?

To make boot time as fast as possible :) But it is not a bad idea. If 
someone needs it we can add a new option.

> 3. Why did you introduce new tool instead of just adding new options to 
> the mkfs.jffs2 ?

I think it is a "nicer", cleaner design, and with this separation the 
reordering of the nodes is much easier.

Bye,
Ferenc

* Re: JFFS2 mount time
  2004-10-20 14:26 JFFS2 mount time Ferenc Havasi
                   ` (3 preceding siblings ...)
       [not found] ` <41776351.4040204@yandex.ru>
@ 2004-10-21 12:49 ` Jarkko Lavinen
  2004-10-21 19:11   ` Ferenc Havasi
  2004-10-22  9:58   ` Ferenc Havasi
  2004-10-21 13:24 ` David Woodhouse
  5 siblings, 2 replies; 27+ messages in thread
From: Jarkko Lavinen @ 2004-10-21 12:49 UTC (permalink / raw)
  Cc: dwmw2, linux-mtd

On Wed, Oct 20, 2004 at 04:26:27PM +0200, ext Ferenc Havasi wrote:
> Jarkko made a measurement on a real NAND device: his JFFS2 image was 
> 120819928 (115M), after running sumtool the new image was 123338752 (117M).
> 
> Using the original mount time was 55 sec, with the new image it is only 
> 8.5 sec.

My initial test was only about the mount time. I have now also tried
to exercise the patched file system and with very little testing I get
CRC or ECC errors.

  # mount /dev/mtdblock2 /mnt -t jffs2
  # mkdir /mnt/testdir
  # umount /mnt
  jffs2_flush_wbuf(): Write failed with -5
  # mount /dev/mtdblock2 /mnt -t jffs2
  mtd->read(0x1fbec bytes from 0x1fc0414) returned ECC error
  Empty flash at 0x01fc2f1c ends at 0x01fc3000
  #

With plain 2.6.9-rc4-omap1 and fresh CVS MTD code, I don't see anything
weird occurring.

Jarkko Lavinen

* Re: JFFS2 mount time
  2004-10-20 14:26 JFFS2 mount time Ferenc Havasi
                   ` (4 preceding siblings ...)
  2004-10-21 12:49 ` JFFS2 mount time Jarkko Lavinen
@ 2004-10-21 13:24 ` David Woodhouse
  2004-10-21 20:05   ` Ferenc Havasi
  5 siblings, 1 reply; 27+ messages in thread
From: David Woodhouse @ 2004-10-21 13:24 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: linux-mtd, jffs-dev

On Wed, 2004-10-20 at 16:26 +0200, Ferenc Havasi wrote:
> Dear All,
> 
> Here is the latest version of our mount time improvement.

It's looking good, but the kernel really needs to be able to write these
summaries for _itself_ in order to give a real improvement over the long
term. If the file system has to be read-only we might as well be using
cramfs, and if the summary becomes obsolete over time we might as well
not bother in a lot of cases.

-- 
dwmw2

* Re: JFFS2 mount time
  2004-10-21 12:49 ` JFFS2 mount time Jarkko Lavinen
@ 2004-10-21 19:11   ` Ferenc Havasi
  2004-10-22  9:58   ` Ferenc Havasi
  1 sibling, 0 replies; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-21 19:11 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: dwmw2, Kluba Patrik, linux-mtd

Hi Jarkko,

> My initial test was only about the mount time. I have now also tried
> to exercise the patched file system and with very little testing I get
> CRC or ECC errors.
> 
>   # mount /dev/mtdblock2 /mnt -t jffs2
>   # mkdir /mnt/testdir
>   # umount /mnt
>   jffs2_flush_wbuf(): Write failed with -5
>   # mount /dev/mtdblock2 /mnt -t jffs2
>   mtd->read(0x1fbec bytes from 0x1fc0414) returned ECC error
>   Empty flash at 0x01fc2f1c ends at 0x01fc3000
>   #
> 
> With plain 2.6.9-rc4-omap1 with fresh CVS MTD code, I don't see anything
> weird occuring.

Can you send me the full kernel log file?

Another interesting test would be to try the new image (sumtool) with 
the CVS MTD code. Because the summary is a RWCOMPAT_DELETE node, it should 
work well. It would be nice to know if there are CRC/ECC errors in this 
case.

Thanks,
Ferenc

* Re: JFFS2 mount time
  2004-10-21  7:16     ` Artem B. Bityuckiy
@ 2004-10-21 19:50       ` Ferenc Havasi
  0 siblings, 0 replies; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-21 19:50 UTC (permalink / raw)
  To: Artem B. Bityuckiy; +Cc: dwmw2, linux-mtd, jffs-dev

Hi Artem,

> Unfortunately, you can not mark entries as obsoleted in your summary 
> node in case of NAND.
> 
> If you write your summary only for *full* blocks, you will not need to 
> mark entries obsoleted, even if you have NOR flash (but you can on NOR). 
> The partially filled blocks must not have the summary node (you can 
> introduce special marker and write it to OOB of the last page of 
> NAND/last word of sector on NOR which tells if there is the summary node 
> present).

Indeed, you are right.

So we only have to solve this problem on NOR. I think the easiest 
solution is to make jffs2_can_mark_obsolete() false if summary support 
is enabled.

Bye,
Ferenc

* Re: JFFS2 mount time
  2004-10-21 13:24 ` David Woodhouse
@ 2004-10-21 20:05   ` Ferenc Havasi
  2004-10-22 12:44     ` Artem Bityuckiy
  0 siblings, 1 reply; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-21 20:05 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-mtd, jffs-dev

David Woodhouse wrote:
> It's looking good, but the kernel really needs to be able to write these
> summaries for _itself_ in order to give a real improvement over the long
> term. If the file system has to be read-only we might as well be using
> cramfs, and if the summary becomes obsolete over time we might as well
> not bother in a lot of cases.

Our plan for it:

We would like to store some additional information in the jeb struct:
- type information, where the type can be INODE_ONLY or 
ANYTHING_OTHER. This information is easy to detect at mount time.
- a predicted summary size (calculated dynamically). It will be used to 
decide when to generate the summary. Certainly only for INODE_ONLY 
erase blocks.

If I am right, every node allocation is done by jffs2_reserve_space(). We 
would like to modify it and introduce a new interface next to it, a 
function called jffs2_reserve_space_for_inode(). Every function that 
stores an inode (there are not too many, I think) should call 
jffs2_reserve_space_for_inode() with some extra information (inode 
number...).

jffs2_reserve_space() should use only ANYTHING_OTHER eraseblocks, while 
jffs2_reserve_space_for_inode() uses only INODE_ONLY ones. If there is no 
free space in them, it should use the usual technique to find a clean 
eraseblock and start to store the new node in it.

Generating the summary is also the task of 
jffs2_reserve_space_for_inode(): if the new inode (+summary) does not fit 
in the erase block, it will generate the summary.
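
The bookkeeping described in this plan could look roughly like the sketch 
below. It is a user-space illustration, not kernel code: struct 
sketch_eraseblock, its fields and sketch_reserve_space_for_inode() are 
hypothetical stand-ins for the proposed jeb additions and 
jffs2_reserve_space_for_inode().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_block_type { SKETCH_INODE_ONLY, SKETCH_ANYTHING_OTHER };

enum sketch_reserve_result {
	SKETCH_RESERVED,        /* space granted in this erase block */
	SKETCH_BLOCK_FINISHED,  /* no room: write the summary, pick a new block */
	SKETCH_WRONG_TYPE       /* block is not an INODE_ONLY block */
};

/* Hypothetical subset of the per-eraseblock state. */
struct sketch_eraseblock {
	enum sketch_block_type type;
	uint32_t size;                /* erase block size in bytes */
	uint32_t used;                /* bytes already taken by inode nodes */
	uint32_t predicted_sum_size;  /* summary size if written right now */
};

static enum sketch_reserve_result
sketch_reserve_space_for_inode(struct sketch_eraseblock *jeb,
			       uint32_t node_len, uint32_t record_len)
{
	uint64_t needed;

	assert(jeb != NULL);
	if (jeb->type != SKETCH_INODE_ONLY)
		return SKETCH_WRONG_TYPE;

	/* The node and the grown summary must both fit in the block. */
	needed = (uint64_t)jeb->used + node_len +
		 jeb->predicted_sum_size + record_len;
	if (needed > jeb->size)
		return SKETCH_BLOCK_FINISHED;

	jeb->used += node_len;
	jeb->predicted_sum_size += record_len;
	return SKETCH_RESERVED;
}
```

On SKETCH_BLOCK_FINISHED the caller would write out the summary for the 
current block and then look for a clean erase block in the usual way.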

What do you think?

Regards,
Ferenc

* Re: JFFS2 mount time
  2004-10-21 12:49 ` JFFS2 mount time Jarkko Lavinen
  2004-10-21 19:11   ` Ferenc Havasi
@ 2004-10-22  9:58   ` Ferenc Havasi
  1 sibling, 0 replies; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-22  9:58 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: dwmw2, linux-mtd

Jarkko Lavinen wrote:

> My initial test was only about the mount time. I have now also tried
> to exercise the patched file system and with very little testing I get
> CRC or ECC errors.
> 
>   # mount /dev/mtdblock2 /mnt -t jffs2
>   # mkdir /mnt/testdir
>   # umount /mnt
>   jffs2_flush_wbuf(): Write failed with -5
>   # mount /dev/mtdblock2 /mnt -t jffs2
>   mtd->read(0x1fbec bytes from 0x1fc0414) returned ECC error
>   Empty flash at 0x01fc2f1c ends at 0x01fc3000
>   #
> 
> With plain 2.6.9-rc4-omap1 with fresh CVS MTD code, I don't see anything
> weird occuring.

We are trying to find out what happens here...

Jarkko previously sent me some more detail; the log starts with:
> sh-2.05b# /rootfstest.sh
> Mounting file system: 			Ok
> Creating a test directory: 						Ok
> Creating a test file: mtd->read(0x44 bytes from 0x1fa344c) returned ECC error
> Data CRC 6d0b1da8 != calculated CRC 9c4f3838 for node at 01fa344c
> mtd->read(0x44 bytes from 0x1fa3e20) returned ECC error
> Data CRC 5127cb7f != calculated CRC 057e127c for node at 01fa3e20
> mtd->read(0x44 bytes from 0x1fa4388) returned ECC error
> mtd->read(0x44 bytes from 0x1fa5cb8) returned ECC error
> mtd->read(0x44 bytes from 0x1fa6748) returned ECC error
> mtd->read(0x44 bytes from 0x1fa7b30) returned ECC error
> Data CRC 72b41a04 != calculated CRC ebc121db for node at 01fa7b30
> mtd->read(0x44 bytes from 0x1fa866c) returned ECC error
> Data CRC 9ff1d419 != calculated CRC cb2cce56 for node at 01fa866c

It means the mount completed successfully. The first problem occurs when 
the filesystem tries to read jffs2_raw_inode nodes (if I am right, 0x44 
is the size of that struct). The read fails (I don't know why), and it 
causes CRC errors, too. The only difference should be that the original 
version has already read these 0x44 bytes before (at mount time), while 
the summary version has not read them yet and only knows from the summary 
where they are.

Jarkko, one more interesting thing (if you still have that image) would be 
to see what is at offsets 0x1fa344c, 0x01fa3e20, ... with the jffs2dump 
tool.

If anyone has any ideas, they are welcome. Unfortunately we don't have a 
real NAND device, and it works with our emulator.

Thanks,
Ferenc

* Re: JFFS2 mount time
  2004-10-21 20:05   ` Ferenc Havasi
@ 2004-10-22 12:44     ` Artem Bityuckiy
  2004-10-25  9:36       ` Ferenc Havasi
  2004-10-26  9:29       ` Jarkko Lavinen
  0 siblings, 2 replies; 27+ messages in thread
From: Artem Bityuckiy @ 2004-10-22 12:44 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: linux-mtd, David Woodhouse, jffs-dev

Hello Ferenc,

First, please let me briefly describe your design, to be sure I 
understand it and that we are both thinking the same way.

Essentially, your design is based on the fact that you do not want to 
reference directory entries in the summary nodes. The motivation is that 
you would keep almost a full copy of the direntries in the summary, thus:
1. duplicating too much information;
2. you suppose there would be no mount speed-up.

So, for this purpose you are going to put the inode nodes and the other 
nodes (including direntry nodes) into different blocks. Blocks which 
contain only inode nodes will have summaries; other blocks will not.

I think this is not the best solution. Why? In general, because I do not 
like the following:
A. Your idea to distribute inode nodes and other nodes between different 
blocks.
B. Your assumption that the directory information in summaries will not 
affect the mount time.

The following are reasons concerning the item A.

1. Your change will affect JFFS2 very heavily. You will introduce a
restriction into JFFS2. Other improvements may not work with such a
restriction. Now all the blocks are equivalent. But you want to
distinguish between two kinds of blocks. Don't you think it is too
complicated a decision?

2. Think about the wear-leveling. In JFFS it was ideal. In JFFS2 it is
good, but not so ideal. On average, the inode nodes are changed more
often (just think about FIFOs, which we discussed on this list
recently). So, you will need to garbage collect the NODE_ONLY blocks
more often. So, I am afraid the wear-leveling will suffer from your
improvement.

3. Imagine a file system with *lots* of very small files. In this case,
the direntries portion on the media will be quite large, and the
mount time of such a file system will not improve much.

4. It seems to me you will need to increase the number of blocks which
are reserved for garbage collection (double?). This is also a minor
drawback.

The following are reasons concerning the item B.

I believe that if we have directory references in summaries, this will 
increase the mount speed.

1. First, we will store less data! We don't need to keep the common
headers, CRCs and mctimes.
2. Second, we may compress the summary (direntries aren't compressed)!
3. And third, on NAND there is a difference between reading lots of
different pages and reading a few pages.

I propose another design.

1. Keep direntry references in summaries too and hence, do not 
distinguish between blocks with inode nodes and direntries.
2. Compress summaries.

So, you will avoid a lot of problems related to teaching the GC to
distinguish between different blocks. This will be more natural. I
believe summaries must refer to *any* node in the block. This is a
simpler and cleaner design.

Why do you not like this?

I see only one potential problem: direntries may have long names (up to
255 characters). This may lead to large summaries.

But in this case we may do one of the following:
1. Improve JFFS2 itself. Keep, say, only 20 characters in the
full_dirent structure. Most direntries will fit. For the others, we will
just read the flash.
2. Leave JFFS2 untouched and keep only 20 characters in summaries. For
other direntries, we may read them from flash (keeping their flash
offsets instead of names).
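
To make the second option concrete, here is a minimal sketch of a summary entry that either carries a short name inline or falls back to a flash offset. The class and function names are hypothetical; only the 20-character cut-off comes from the proposal above.

```python
from dataclasses import dataclass
from typing import Optional

MAX_INLINE_NAME = 20  # cut-off suggested in the message


@dataclass
class SummaryDirent:
    """Hypothetical summary record for one directory entry."""
    ino: int
    name: Optional[bytes]         # inline name when it is short enough
    flash_offset: Optional[int]   # otherwise: where the full dirent lives


def summarize_dirent(ino, name, flash_offset):
    """Keep short names in the summary; only reference long ones."""
    if len(name) <= MAX_INLINE_NAME:
        return SummaryDirent(ino, name, None)
    return SummaryDirent(ino, None, flash_offset)


short = summarize_dirent(12, b"passwd", 0x1000)
long_ = summarize_dirent(13, b"a-very-long-file-name-that-exceeds-the-limit", 0x2000)
print(short.name, long_.flash_offset)
```

With such a record the mount code would only have to touch the flash for the (presumably rare) long names.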

Comments?

-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru


* Re: JFFS2 mount time
  2004-10-22 12:44     ` Artem Bityuckiy
@ 2004-10-25  9:36       ` Ferenc Havasi
  2004-10-25 10:56         ` Artem Bityuckiy
  2004-10-25 11:21         ` Artem Bityuckiy
  2004-10-26  9:29       ` Jarkko Lavinen
  1 sibling, 2 replies; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-25  9:36 UTC (permalink / raw)
  To: dedekind, David Woodhouse; +Cc: linux-mtd, jffs-dev

Hi Artem,

> So, for this purpose you are going to distribute the inode nodes and 
> other (including direntry nodes) by different blocks. Those blocks, who 
> contain only the inode nodes, will have summaries, other blocks - will not.

Yes, I think there are three kinds of nodes:
- type A contains a relevant amount of data which is not needed at mount
time (jffs2_raw_inode)
- type B is (almost) fully needed at mount time (jffs2_raw_dirent)
- type C is any other (unknown, future developments...)

To achieve as much mount time speed up as possible I think we should 
distinguish them.

With a summary, the really relevant speed-up comes only from type A
nodes. We can also generate a summary for type B, but then (as you wrote)
a relevant share of the information will be duplicated.

So we would like to introduce two kinds of erase blocks:
- erase blocks with summary: they will store (for now only) type A nodes,
maybe later some of type B
- erase blocks without summary: they will store all type C nodes and
those type B nodes which are not stored in the former

> 1. Your change will affect JFFS2 very heavily. You will introduce 
> restriction into JFFS2. Another improvements may not work with such 
> restriction. Now all the blocks are equivalent. But you want to 
> distinguish between two kins of blocks. Don't you think it is too 
> complicated decision?

What kind of restriction do you mean? We don't introduce any
restrictions. The "type C" kind of nodes are processed as before, using
the usual scanning method. If you want to force every node to be
represented in the summary, that would be a restriction.

I think a summary is meaningful for some kinds of nodes and not for
others.

If we mix them, that can cause a very big slowdown if you want to process
them only by putting a reference to their offset in the summary, because
if you (for example) want to read only 50 bytes (the size of the node) you
will have to read 512/2048 bytes depending on the flash (where mostly
there will be inode nodes, which do not need to be read because they are
in the summary).

But if all of these "not summarized, small" nodes are stored in a
"separate" erase block, then this 512/2048-byte read will not be
wasted (because the remaining 462-1998 bytes will also hold
relevant information which is not in the summary).
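
The 462-1998 figures are simply the page size minus the 50-byte node. A small sketch of that arithmetic, assuming the node does not straddle a page boundary (the function name is made up for illustration):

```python
def read_amplification(node_size, page_size):
    """Bytes fetched but not needed when one node is read page by page.

    Assumes the node starts at a page boundary, so it occupies the
    minimum possible number of pages.
    """
    pages = -(-node_size // page_size)  # ceiling division
    return pages * page_size - node_size


for page_size in (512, 2048):
    print(page_size, read_amplification(50, page_size))
```

For a 50-byte node this gives 462 wasted bytes on 512-byte pages and 1998 on 2048-byte pages, matching the range quoted above.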

> 2. Think about the wear-leveling. In JFFS it was ideal. In JFFS2 it is 
> good, but not so ideal. I average, the inode nodes are changed more 
> often (just think about FIFOs, we told about them in this list 
> recently). So, you will need to Garbage Collect the NODE_ONLY blocks 
> more often. So, I afraid the wear-leveling will suffer from your 
> improvement.

I think the GC solves it "automatically". This mark
(SUMMARIZED/NOT_SUMMARIZED) is not a permanent thing; it is assigned
"pseudo randomly".

I agree that it causes somewhat different wear-leveling behavior, but I
don't think it makes it significantly worse.

> 4. It seems for me you will need to increase the number of blocks which 
> are reserved for the garbage collection (double ?). This is also minor 
> drawback.

I don't understand what you mean here.

> I believe that if we have directory references in summaries, this will 
> increase the mount speed.
> 1. At first, we will store fewer data! We don't need to keep the common 
> headers, CRCs and mctimes.
> 2. At the second, we may compress summary (direntries aren't compressed)!
> 3. And the third, on NAND there is difference between reading lots of 
> different pages or few pages.

Yes, we should try it: storing dirents in SUMMARIZED erase blocks. But
that can be a later improvement; first we need a stable, well-working
system, and this is urgent for us now.

> 2. Compress summaries.

It makes it harder to determine the optimal time for summary generation
(it is easy to see the summary size, but here its compressed size is the
relevant one). It can produce a smaller image but may cause some slowdown,
too. We may introduce it later as an option.
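
The difficulty can be illustrated with a small sketch: the raw summary size is a running sum, while the compressed size is only known after actually compressing the collected entries. The entry layout below is purely hypothetical, and zlib stands in for whichever compressor would be used.

```python
import struct
import zlib


def summary_sizes(entries):
    """Return (raw_size, compressed_size) for a list of summary entries."""
    raw = b"".join(entries)
    return len(raw), len(zlib.compress(raw))


# Hypothetical fixed-size entries: (node offset, inode number, version).
entries = [struct.pack("<III", 68 * i, 100 + i % 4, 1) for i in range(200)]
raw_size, compressed_size = summary_sizes(entries)
print(raw_size, compressed_size)
```

Deciding when the summary still fits at the end of the erase block would need the second number, which costs a compression pass each time it is checked.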

So now we have two open discussions:
- is the SUMMARIZED / NOT_SUMMARIZED distinction good or not
- do we need dirents in the summary in the first version or not

Fortunately, the effects (and side effects) of these improvements will be
active only if the new kernel option is enabled, and they don't rule out
any other future improvements.

I am curious about (at least) David's opinion on these topics.

Bye,
Ferenc


* Re: JFFS2 mount time
  2004-10-25  9:36       ` Ferenc Havasi
@ 2004-10-25 10:56         ` Artem Bityuckiy
  2004-10-25 15:30           ` Ferenc Havasi
  2004-10-25 11:21         ` Artem Bityuckiy
  1 sibling, 1 reply; 27+ messages in thread
From: Artem Bityuckiy @ 2004-10-25 10:56 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: linux-mtd, David Woodhouse, jffs-dev

Hello Ferenc,

 > Yes, I think there are three kinds of nodes:
 > - type A contains relevant amount of data which is not needed at mount
 > time (jffs2_raw_inode)
 > - type B is (almost) fully needed at mount time (jffs2_raw_dirent)
 > - type C is any other (unkown, developements in the future...)
 >
 > To achieve as much mount time speed up as possible I think we should
 > distinguish them.
This is what I really do not like.

Ok, let us discuss only this topic now. Let me explain why I believe it is
bad and very *unnatural* to introduce two or more kinds of blocks.

The example of a JFFS2 change that I consider natural is the introduction
of a new node type. It is natural because, when JFFS2 was designed,
this possibility was foreseen and taken into account. It is relatively
easy to do this, and it can be done without affecting other
things in JFFS2.

Conversely, introducing several block types was not foreseen in the
JFFS2 design, and everything in JFFS2 is coded with the assumption
that all the blocks are equivalent.

This is my point of view on the issue in general.

Now I will try to illustrate why I think so.

1. In JFFS2 there are several lists of blocks - clean_list, dirty_list, 
very_dirty_list?. Are you going to introduce clean_list_typeA, 
dirty_list_typeA, very_dirty_list_typeA, clean_list_typeB, 
dirty_list_typeB, very_dirty_list_typeB ?

2. Just do 'grep "_list" * | grep -e "\(dirty\)\|\(very\)"' and see how
many places in JFFS2 change these lists. Do you think it is
natural to introduce 3 more lists? I believe not. What if somebody else
introduces one more type of block?

3. There is a write buffer in JFFS2 which is used in the case of NAND. Are
you going to have two wbufs? This is also a significant change.

4. Now the GC just takes one block and moves all the valid nodes to
another one. In your case (if you have a JFFS2 image which was created
by older code, without your patch, where all node types are mixed),
you will need to move one type of node to one block and the other type
to another block.

So, I think you will need to change many things in JFFS2. You risk
opening a can of worms.

So, do you agree that this change is *unnatural* ?

===================================================================

 >> 4. It seems for me you will need to increase the number of blocks
 >> which are reserved for the garbage collection (double ?). This is also
 >> minor drawback.
 > I don't understand what do you mean here

I mean sb->resv_blocks_gcmerge and the related values. You would need to
increase them, which is not very good.

-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru


* Re: JFFS2 mount time
  2004-10-25  9:36       ` Ferenc Havasi
  2004-10-25 10:56         ` Artem Bityuckiy
@ 2004-10-25 11:21         ` Artem Bityuckiy
  1 sibling, 0 replies; 27+ messages in thread
From: Artem Bityuckiy @ 2004-10-25 11:21 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: linux-mtd, jffs-dev

> I curious about (at least) David's optinion about these topics.
I also wonder why people are not very active :-)


-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru


* Re: JFFS2 mount time
  2004-10-25 10:56         ` Artem Bityuckiy
@ 2004-10-25 15:30           ` Ferenc Havasi
  2004-10-26  9:59             ` Artem Bityuckiy
  0 siblings, 1 reply; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-25 15:30 UTC (permalink / raw)
  To: dedekind; +Cc: linux-mtd, David Woodhouse, jffs-dev

Hi Artem,

>  > To achieve as much mount time speed up as possible I think we should
>  > distinguish them.
> This is what I really do not like.
> 
> Ok, let us discuss now only this topic. Lt I explain why I believe it is 
> vad and very *unnatural* to introduce two or more kinds of blocks.

You are right, it may be unnatural from the point of view of the original
JFFS2 design. But I think that, from the point of view of how this
optimization connects to JFFS2, it is more natural than simply storing
offsets in the summary or copying all the information into it.

Our plan was to modify wbuf (add a second one) and modify
jffs2_reserve_space to select the right wbuf and generate the summary. We
never planned to introduce new clean_*, dirty_*, ... lists; that is really
too difficult.

> 3. There is write buffer in the JFFS2 which is used in case of NAND. Are 
> you going to have two wbufs? This is also significant change.

Yes, we started to implement it yesterday and now agree. It is really
not easy, and we don't want to rewrite the NAND handling part of JFFS2
without a real NAND device. Maybe in the design of JFFS3 :)

So you convinced me. We will change the design of the summary. Inodes
and dirents will both be in the summary. All other nodes will be copied
as they are into the summary and cause a warning. Summary support will
be a requirement for new node types, too.

In the kernel we will have to modify
1. jffs2_scan_eraseblock(), as is already done in our patch
2. the jeb struct, to store the dynamically generated summary (one extra
field)
3. jffs2_reserve_space(), which will get a new parameter (summary
size), which can be JFFS2_SUMMARY_INODE_SIZE or
JFFS2_SUMMARY_DIRENT_SIZE(namelen). It can decide when to generate the
summary and it can do the generation.
4. jffs2_flash_writev(), which is used to write data to flash. It can
parse the node (similarly to sumtool) and store its summary in its jeb.
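
A rough Python model of item 4 may help: each node written to the current erase block adds one entry to an in-memory summary, whose size grows by a fixed amount for inodes and by a name-dependent amount for dirents. The numeric sizes are placeholders, since the thread names the JFFS2_SUMMARY_* macros but not their values, and the class is hypothetical.

```python
# Placeholder sizes; the thread does not give the real values.
SUMMARY_INODE_SIZE = 18
SUMMARY_DIRENT_FIXED_SIZE = 24


def summary_dirent_size(namelen):
    """Stand-in for JFFS2_SUMMARY_DIRENT_SIZE(namelen)."""
    return SUMMARY_DIRENT_FIXED_SIZE + namelen


class BlockSummary:
    """Summary being collected for the erase block currently written."""

    def __init__(self):
        self.entries = []
        self.size = 0

    def note_inode(self, offset):
        self.entries.append(("inode", offset))
        self.size += SUMMARY_INODE_SIZE

    def note_dirent(self, offset, name):
        self.entries.append(("dirent", offset, name))
        self.size += summary_dirent_size(len(name))


summary = BlockSummary()
summary.note_inode(0)
summary.note_dirent(68, b"passwd")
print(summary.size)
```

The running size is what the space reservation step would need in order to know how much room to leave at the end of the block.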

If it works, we'll check the effect of compressing the summary (size and
speed).

Comments?

Bye,
Ferenc

P.s.: Thanks for this good conversation.


* Re: JFFS2 mount time
  2004-10-22 12:44     ` Artem Bityuckiy
  2004-10-25  9:36       ` Ferenc Havasi
@ 2004-10-26  9:29       ` Jarkko Lavinen
  2004-10-26 10:24         ` Ferenc Havasi
  2004-10-26 10:34         ` Artem Bityuckiy
  1 sibling, 2 replies; 27+ messages in thread
From: Jarkko Lavinen @ 2004-10-26  9:29 UTC (permalink / raw)
  To: linux-mtd, jffs-dev; +Cc: ext Artem Bityuckiy, David Woodhouse


I tried to see with jffs2dump how many inodes and dirents I have on the
root filesystem of an ARM testbed. A quick and dirty Perl script is
attached. It isn't accurate, as the calculated total image size misses at
least the final padding on the last erase block.

The size of the plain JFFS2 image is 31.1 MiB. The root fs consists of
all applications and libraries and no user data.

  $ jffs2dump -c rootfs.jffs2 | perl jffs2stats.pl
  Number of dirents:   6144.
   Total dirent node space:  304911 (0.9%)
   Average dirent len: 49.6
   Total dirent name space:  76671
   Average name len:   12.5

  Number of Inodes:    21197
   Total Inode space:  32254866 (99.1%)
   Average Inode size: 1521.7

  Padding:        37326 0.1%
  Total image size: 32559777
  $ ls -l rootfs.jffs2
  -rw-r--r--  1 root root 32597104 Oct 20 15:11 rootfs.jffs2

With sumtool the image size grows to 31.8 MiB

  $ jffs2dump -c rootfs-sum.jffs2 | perl jffs2stats.pl
  Number of dirents:   6144.
   Total dirent node space:  304911 (0.9%)
   Average dirent len: 49.6
   Total dirent name space:  76671
   Average name len:   12.5

  Number of Inodes:    21197
   Total Inode space:  32254866 (97.2%)
   Average Inode size: 1521.7

  Number of Inode Summary nodes:  251
   Total Inode Sum space: 631524, (1.9%)
   Average Sum node size: 2516.0

  Padding:        153063 0.5%
  Total image size: 33191301
  $ ls -l rootfs-sum.jffs2
  -rw-r--r--  1 root root 33423360 Oct 20 15:23 rootfs-sum.jffs2


If dentries were stored just as they are (unstripped and uncompressed)
in the summary, the summary size would grow by 50% to about 3% of the
whole image size.
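
The estimate can be reproduced from the figures printed above. This is only a sketch of the arithmetic; it assumes the image grows by exactly the dirent node space and ignores any extra padding.

```python
inode_summary_bytes = 631524   # "Total Inode Sum space"
dirent_bytes = 304911          # "Total dirent node space"
image_bytes = 33191301         # computed total for the sumtool image


def summary_with_dirents():
    """Summary size, growth and share if raw dirents were copied in."""
    new_summary = inode_summary_bytes + dirent_bytes
    new_image = image_bytes + dirent_bytes
    growth = dirent_bytes / inode_summary_bytes
    share = new_summary / new_image
    return new_summary, growth, share


new_summary, growth, share = summary_with_dirents()
print(new_summary, f"{growth:.1%}", f"{share:.1%}")
```

This gives a summary of 936435 bytes, roughly 48% larger than today and a little under 3% of the image.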

On Fri, Oct 22, 2004 at 04:44:13PM +0400, ext Artem Bityuckiy wrote:
> I believe that if we have directory references in summaries, this will 
> increase the mount speed.
> 
> 1. At first, we will store fewer data! We don't need to keep the common 
> headers, CRCs and mctimes.
> 2. At the second, we may compress summary (direntries aren't compressed)!
> 3. And the third, on NAND there is difference between reading lots of 
> different pages or few pages.


I tried Ferenc's earlier mount time patch in August and the 52s mount
time dropped then to 14s. If I understand right, inodes and dentries
were then mixed in the erase block and the summary was for inodes
only.  This shows reading dentries from semirandom places is
expensive.

Ferenc's latest patch puts dentries on their own erase blocks in
consecutive order.  Considering only the read efficiency from the
media, reading consecutive, uncompressed, and unstripped dentries from
a summary should cost no more than reading them from a dedicated erase
block.

Jarkko Lavinen

[-- Attachment #2: jffs2stats.pl --]
[-- Type: text/x-perl, Size: 3230 bytes --]

#! /usr/bin/perl

$EBLOCKSIZE=131072;

$dirents = $totdirentlen = $totnamelen = 0;
$inodes = $totinodelen = 0;
$sumnodes = $totsumnodelen = 0;
$totalpadlen = 0;
$gaps = $totgaplen = 0;
$nextaddr = 0;

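# Account for the bytes between the end of the previous node and $addr:
# either padding (up to an erase block boundary, or at most 3 bytes of
# alignment) or an unexpected gap.
sub checkpadding {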
    my ($addr, $totlen) = @_;

    my $len = hex($addr) - $nextaddr;

    if ($len > 0) {
	if (hex($addr) % $EBLOCKSIZE == 0 || $len <= 3) {
	    $totalpadlen += $len;
	} else {
	    print sprintf "Gap seen at %08x .. $addr, length $len\n", $nextaddr;

	    $gaps++;
	    $totgaplen += hex($addr) - $nextaddr;
	}
    }

    $nextaddr = hex($addr) + hex($totlen);
}

while(<>) {
    chomp;
    if (/^\s+Dirent/) {
	die "Cannot parse $_" if (! /^ \s+ 
				  Dirent     \s+ 
				  node \s at \s+ (\w+), \s+ 
				  totlen     \s+ (\w+), \s+ 
				  \#pino     \s+ (\w+), \s+ 
				  version    \s+ (\w+), \s+ 
				  \#ino      \s+ (\w+), \s+ 
				  nsize      \s+ (\w+), \s+
				  name       \s+ (.*) 
				  $/x);

	my ($addr, $totlen, $pino, $version, $ino, $nsize, $name) = ($1, $2, $3, $4, $5, $6, $7);
	&checkpadding($addr, $totlen);

	$dirents++;
	$totdirentlen += hex($totlen);
	$totnamelen += hex($nsize);
    } elsif (/^\s+Inode Sum/) { 
	die "Cannot parse $_" if (! /^ \s+ 
				  Inode \s Sum \s+
				  node \s at          \s+ (\w+), \s+
				  totlen              \s+ (\w+), \s+
				  sum_num             \s+ (\w+), \s+
				  cleanmarker \s size \s+ (\w+)  \s*
				  $/x);

	my ($addr, $totlen, $sum_num, $cleanmarksize) = ($1, $2, $3, $4);
	&checkpadding($addr, $totlen);
	
	$sumnodes++;
	$totsumnodelen += hex($totlen);
    } elsif (/^\s+Inode/) {
	die "Cannot parse $_" if (! /^ \s+ 
				  Inode \s+
				  node \s at \s+ (\w+), \s+
				  totlen     \s+ (\w+), \s+
				  \#ino      \s+ (\w+), \s+
				  version    \s+ (\w+), \s+
				  isize      \s+ (\w+), \s+
				  csize      \s+ (\w+), \s+ 
				  dsize      \s+ (\w+), \s+
				  offset     \s+ (\w+)  \s*
				  $/x);

	my ($addr, $totlen, $ino, $version, $isize, $csize, $dsize, $offset) = ($1, $2, $3, $4, $5, $6, $7, $8);
	&checkpadding($addr, $totlen);

	$inodes++;
	$totinodelen += hex($totlen);
    } else {
	die "Cannot parse $_";
    }
}

# Sum every category, including the padding that checkpadding()
# accumulates in $totalpadlen.
$totalsize = $totdirentlen + $totinodelen + $totsumnodelen + $totalpadlen + $totgaplen;

print "Number of dirents:\t$dirents.\n";
print sprintf " Total dirent node space:\t$totdirentlen (%.1f%%)\n", 100.0*$totdirentlen/$totalsize;
print " Average dirent len:\t", sprintf("%.1f", $totdirentlen/$dirents), "\n" if ($dirents > 0);
print " Total dirent name space:\t$totnamelen\n";
print sprintf(" Average name len:\t%.1f\n", $totnamelen/$dirents) if ($dirents > 0);
print "\n";
print "Number of Inodes:\t$inodes\n";
print sprintf " Total Inode space:\t$totinodelen (%.1f%%)\n", 100.0*$totinodelen/$totalsize;
print " Average Inode size:\t", sprintf("%.1f", $totinodelen/$inodes), "\n" if ($inodes > 0);

if ($sumnodes) {
    print "\n";
    print "Number of Inode Summary nodes:\t$sumnodes\n";
    print sprintf " Total Inode Sum space:\t$totsumnodelen, (%.1f%%)\n", 100.0*$totsumnodelen/$totalsize;
    print " Average Sum node size:\t", sprintf("%.1f", $totsumnodelen/$sumnodes), "\n";
}

print sprintf "\nPadding:\t$totalpadlen %.1f%%\n", 100.0*$totalpadlen/$totalsize;

print "Total image size: $totalsize\n";


* Re: JFFS2 mount time
  2004-10-25 15:30           ` Ferenc Havasi
@ 2004-10-26  9:59             ` Artem Bityuckiy
  2004-10-26 10:21               ` Ferenc Havasi
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Bityuckiy @ 2004-10-26  9:59 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: David Woodhouse, linux-mtd, jffs-dev

Hello Ferenc,

Ferenc Havasi wrote:
> In the kernel we will have to modify
> 1. jffs2_scan_eraseblock(), as it is already in our patch
> 2. jeb struct to store generated the summary dinamically (one plus field)
IMHO, since the summary relates only to one block, the current block, it
is logical to reference the summary from jffs2_sb_info, not from
jffs2_erase_blocks. It is also not very nice to store it in
jffs2_erase_blocks since that will increase the size of the array of
JFFS2 blocks (c->blocks[]).

> 3. jffs2_reserve_space(), which will have a new parameter (summary 
> size), which can be JFFS2_SUMMARY_INODE_SIZE or 
> JFFS2_SUMMARY_DIRENT_SIZE(namelen). It can decide when to generate 
> summary and it can do this generation.
Yes, I also think so.

Currently jffs2_do_reserve_space() does the following (as I understand it):
1. If the current block (c->nextblock) has enough space for the
request, it reserves it.
2. If c->nextblock has less space than requested, c->nextblock
is wasted and put on the corresponding list (dirty_list, etc.), and a
free block is taken and reserved.

Thus, jffs2_do_reserve_space() should be improved so that it can save
some space for the summary. We also need some function like
jffs2_write_summary(), which will be called before
jffs2_do_reserve_space() takes a new block from the free_list.
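
A toy model of that idea, in Python rather than kernel C: the allocator keeps room for the pending summary in the current block and writes the summary just before switching to a new block. All class and method names are invented for the sketch, and it assumes a single node plus its summary entry always fits in an empty block.

```python
class Block:
    def __init__(self, size):
        self.size = size
        self.used = 0
        self.summary_size = 0  # bytes the pending summary would need

    def free(self):
        return self.size - self.used


class Allocator:
    def __init__(self, free_blocks):
        self.free_list = list(free_blocks)
        self.nextblock = self.free_list.pop(0)
        self.closed = []  # stand-in for clean_list, dirty_list, ...
        self.summaries_written = 0

    def write_summary(self, block):
        # Hypothetical counterpart of the proposed jffs2_write_summary().
        block.used += block.summary_size
        self.summaries_written += 1

    def reserve(self, node_size, summary_entry_size):
        blk = self.nextblock
        # The node must fit together with the summary grown by one entry.
        needed = node_size + blk.summary_size + summary_entry_size
        if blk.free() < needed:
            self.write_summary(blk)
            self.closed.append(blk)
            blk = self.nextblock = self.free_list.pop(0)
        offset = blk.used
        blk.used += node_size
        blk.summary_size += summary_entry_size
        return blk, offset


alloc = Allocator([Block(100), Block(100)])
first = alloc.nextblock
results = [alloc.reserve(30, 5) for _ in range(3)]
print([offset for _, offset in results], first.used)
```

Because every reservation accounts for the grown summary, the summary is always guaranteed to fit when the block is closed.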

> 4. jffs2_flash_writev(), which is used to write info to flash. It can 
> parse the node (similar to sumtool) and store the summary of it in its jeb.
Maybe write it here... I haven't thought about it much... Or maybe, as I
wrote, in jffs2_do_reserve_space()...

I also suggest that you include direntries in summaries and compress them.
See:

sizeof(struct jffs2_raw_dirent) = 40 (without name)
in your summary you will need to store only:

totlen
pino
version
ino
nsize
type
name

which is 24 bytes. You don't store all the data! Of course, with long
names things are not so good...
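
The size comparison can be sketched with Python's struct module. Both layouts are assumptions: the first follows the usual layout of struct jffs2_raw_dirent, the second packs just the fixed-size fields listed above with their natural widths. Those listed fields alone come to fewer than 24 bytes, so the 24-byte figure presumably counts something more (for example a node offset); that is a guess, not something stated in the thread.

```python
import struct

# Assumed packed layout of struct jffs2_raw_dirent without the name:
# magic, nodetype, totlen, hdr_crc, pino, version, ino, mctime,
# nsize, type, unused[2], node_crc, name_crc.
RAW_DIRENT_FMT = "<HHIIIIIIBB2sII"

# Fixed-size fields from the list above: totlen, pino, version, ino,
# nsize, type (the name itself is variable-length).
LISTED_FIELDS_FMT = "<IIIIBB"

raw_dirent_size = struct.calcsize(RAW_DIRENT_FMT)
listed_fields_size = struct.calcsize(LISTED_FIELDS_FMT)
print(raw_dirent_size, listed_fields_size)
```

Either way, the per-dirent saving is a fixed number of header bytes, so it matters most when names are short.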

If you also compress them, they will be smaller (minus 50-70%)!

So, if there are few direntries in a block, why not store them in the
summary?

Have you measured the summary decompression time on your system? I can't
know for sure, but I suspect that on, say, a 200MHz system, the
decompression time = o(time of block read)!

There is one more issue: if there are too many direntries in a block, the
summary may become too large (the compression helps here). In this case
you may skip writing the summary, or leave direntries out of the summary.

-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru


* Re: JFFS2 mount time
  2004-10-26  9:59             ` Artem Bityuckiy
@ 2004-10-26 10:21               ` Ferenc Havasi
  2004-10-26 11:05                 ` Artem Bityuckiy
  0 siblings, 1 reply; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-26 10:21 UTC (permalink / raw)
  To: dedekind; +Cc: David Woodhouse, linux-mtd, jffs-dev

Hi Artem,

> IMHO, since the summary relates only to one block, the current block, it 
> is logical to refer the summary from the jffs2_sb_info, not from 
> jffs2_erase_blocks. It is also not very nice to store it in the 
> jffs2_erase_blocks since it will increase the size of array of JFFS2 
> blocks (c->blocks[]).

Is it certain that there is only one non-full erase block in the
filesystem? Non-full means here that the block already holds some nodes
but also has some free space at its end.

>> 4. jffs2_flash_writev(), which is used to write info to flash. It can 
>> parse the node (similar to sumtool) and store the summary of it in its 
>> jeb.
> 
> May be write here... Didn't think a lot... May be as I wrote, in 
> jffs2_do_reserve_space()...

As I see it, jffs2_do_reserve_space is called before inode/... allocation
in most cases. So at that time the summary information is not known, but
at write time it certainly has to be known.

> So, if there are few direntries in block, why not to store them in summary?

You may have misunderstood me. In the previous letter I wrote: "So you
convinced me. We will change the design of summary. The inodes and
dirents will be also in the summary."

So now we do plan to store dirents in the summary. :)

> Did you measured the time of summary uncompress on your system? I can't 
> know for sure, but I suspect that if you have, say, 200MHz system, the 
> time of uncompression = o(time of block read)!

It depends on the compressor.

We will test it with zlib/rtime. I would like to implement it as an
optional feature.

> There is one more issue: if there are too many direntries in block, 
> summary may become too large (the compression helps here). In this case 
> you may not write summary or don't mention direntries in summary.

Let's see how it works, and afterwards we can optimize it :)

Bye,
Ferenc


* Re: JFFS2 mount time
  2004-10-26  9:29       ` Jarkko Lavinen
@ 2004-10-26 10:24         ` Ferenc Havasi
  2004-10-26 10:34         ` Artem Bityuckiy
  1 sibling, 0 replies; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-26 10:24 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: ext Artem Bityuckiy, linux-mtd, David Woodhouse, jffs-dev

Hi Jarkko,

 > If dentries were stored just as they are (unstripped and uncompressed)
 > in the summary, the summary size would grow by 50% to about 3% of the
 > whole image size.

Thanks, good to know.

Did you get ECC/CRC errors? The most interesting test for me would be to
run the new (sumtool) image with the original kernel (because the
summary nodes are compatible, it should work), and see whether there are
ECC/CRC errors or not.

Bye,
Ferenc


* Re: JFFS2 mount time
  2004-10-26  9:29       ` Jarkko Lavinen
  2004-10-26 10:24         ` Ferenc Havasi
@ 2004-10-26 10:34         ` Artem Bityuckiy
  1 sibling, 0 replies; 27+ messages in thread
From: Artem Bityuckiy @ 2004-10-26 10:34 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: David Woodhouse, linux-mtd, jffs-dev

Hello Jarkko,

> I tried Ferenc's earlier mount time patch in August and the 52s mount
> time dropped then to 14s. If I understand right, inodes and dentries
> were then mixed in the erase block and the summary was for inodes
> only.  This shows reading dentries from semirandom places is
> expensive.
It is very good that direntries are, on average, distributed more or less
uniformly.

> 
> Ferenc's latest patch put dentries on their own erase block in
> consecutive order.  Considering only the read efficiency from the
> media, reading consecutive, uncompressed, and unstripped dentries from
> a summary should cost no more than reading them from dedicated erase
> block.
> 
Definitely true: the second patch must be better than the first one. But
unfortunately, it is hard to do this dynamically :-( Ferenc tried...

But in my proposal, we will also reference direntries in the summary.
This is not the same as reading direntries from where they are placed;
it is another thing, especially in the case of NAND! There is a difference
(if we have NAND) between reading one 512-byte NAND page containing
compressed information about 20-25 direntries and reading 20-25
*different* NAND pages.
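
The difference can be counted directly. In the sketch below, the 20-byte packed entry size and the one-dirent-per-page scattering are illustrative assumptions; only the 512-byte page and the 20-25 direntries come from the message.

```python
def pages_touched(offsets, length, page_size=512):
    """Number of distinct NAND pages covered by nodes of a given length."""
    pages = set()
    for off in offsets:
        first = off // page_size
        last = (off + length - 1) // page_size
        pages.update(range(first, last + 1))
    return len(pages)


# 25 packed summary entries of 20 bytes each, stored back to back.
packed = pages_touched([20 * i for i in range(25)], 20)

# 25 raw dirents of 50 bytes, each landing in a different page.
scattered = pages_touched([2048 * i for i in range(25)], 50)

print(packed, scattered)
```

In this setup the packed entries need a single page read, while the scattered dirents need 25.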

So, I think the new design will also be better than Ferenc's early patch :-)

-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru


* Re: JFFS2 mount time
  2004-10-26 10:21               ` Ferenc Havasi
@ 2004-10-26 11:05                 ` Artem Bityuckiy
  2004-10-26 13:52                   ` Ferenc Havasi
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Bityuckiy @ 2004-10-26 11:05 UTC (permalink / raw)
  To: Ferenc Havasi; +Cc: linux-mtd, David Woodhouse, jffs-dev

Ferenc,

> Is it sure than only one non-full erase block is in the filesystem? 
> Non-full means here that there is some nodes already in that, but also 
> there is some free space at the end of it.
I didn't analyse this carefully, but my understanding is that there is one
current block (c->nextblock). Even GC moves nodes to it. This is because
jffs2_do_reserve_space() is always used (even by GC), and
jffs2_do_reserve_space() always uses c->nextblock.

> As I see jffs2_do_reserve space is called before inode/... allocation in 
> most cases. So at that time the summary information is not know - but at 
> writing it have to be known certainly.
Maybe... On the other hand, you may write the summary every time
jffs2_reserve_space() fetches a new block from the free_list...
Anyway, this is not fundamental...

> You may misunderstood me. In the previous letter I wrote: "So you 
> convinced me. We will change the design of summary. The inodes and 
> dirents will be also in the summary."
> 
> So now we do plan to store dirents in the summary. :)
OK, sorry. :-)

> Let see how it work, and after we can make it more optimal :)
Agreed :-)

Also, please take into account that there may be checkpoint nodes (I'm
implementing this). So, I think you need to have a generic mechanism to
add new node types to your summary.

Also, I think it is good to have a generic mechanism to just reference
some nodes from summaries (for example, direntries with long names or
something else).

Thank you for the conversation, too.
:-)

-- 
Best regards, Artem B. Bityuckiy
Oktet Labs (St. Petersburg), Software Engineer.
+78124286709 (office) +79112449030 (mobile)
E-mail: dedekind@oktetlabs.ru, web: http://www.oktetlabs.ru


* Re: JFFS2 mount time
  2004-10-26 11:05                 ` Artem Bityuckiy
@ 2004-10-26 13:52                   ` Ferenc Havasi
  0 siblings, 0 replies; 27+ messages in thread
From: Ferenc Havasi @ 2004-10-26 13:52 UTC (permalink / raw)
  To: dedekind; +Cc: linux-mtd, David Woodhouse, jffs-dev

Hi Artem,

> Also, please, take into account that there may be checkpoint nodes (I'm 
> implementing this). So, I think you need to have a generic mechanism to 
> add new node types to your summary.
 >
> Also, I think it is good to have a generic mechanism to just refer some 
> nodes from summaries (for example, direntries with long names or 
> something else).

Yes, it will be easy to extend.

We also need this general support, because we will introduce a new
node type, too, for the model file support, which we will start to
commit when David finishes his patch for Linus.

Bye,
Ferenc

