public inbox for linux-mtd@lists.infradead.org
* Benchmarking JFFS2
@ 2002-05-02 12:56 Jarkko Lavinen
  2002-05-02 13:40 ` David Woodhouse
  0 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-02 12:56 UTC (permalink / raw)
  To: MTD List

[-- Attachment #1: Type: text/plain, Size: 5395 bytes --]

I have been running a simple benchmark program based on the Byte Unix
file-system benchmark, which is over 10 years old. I have modified the
program to measure time using gettimeofday(), report throughput better,
collect latency profiles and report memory consumption.

I have then measured write throughput as a function of block size. I am
using an embedded device with an ARM9 CPU @ 120 MHz, 8 MB of RAM and a
4 MiB Intel NOR flash (28F320B3T), running kernel 2.4.15. My JFFS2 code
is from a CVS snapshot from February 2002 and may be too old, as may
the kernel.

I am attaching the test program, sample parameters and a figure of the
results seen on this particular device. The figure shows two curves:
the first is an upside-down "V", the second a bent "S" shape.

I first tried running the benchmark runs in sequence, with random
data. The best performance, 27000 bytes/s, was achieved with block
sizes of 1024 +- 512 bytes. With a block size of 4 KiB the performance
is only 50% of the peak throughput.

Then I ran the test on a freshly formatted file-system. I erased the
flash partition, mounted it, unmounted it and mounted it again. After
that I ran only a single benchmark run, so that the file created did
not fill the file-system, no overwriting occurred and no garbage was
produced. The performance increased steadily, and at block sizes that
are multiples of 4 KiB it reached about 56000 bytes/s. The raw write
speed through /dev/mtd0 is 67000 B/s.

Question 1:

Is the drop in performance at larger block sizes normal?

Question 2:

Is the drop in performance at larger block sizes due to garbage
collection?


I have also tried running the tests with linear data that compresses
easily. I have repeatedly encountered very low memory and out-of-memory
conditions, with messages like "Memory fail" and "deflateInit failed",
and, when memory really runs out, repeated "Out of memory". I don't
think a benchmark program should be able to bring the system to its
knees simply by exercising the file-system. I wouldn't bet on the
stability and maturity of the embedded device either.

To bring out this behavior I ran the test with block sizes from 1K to
128K in proportional steps. Typically somewhere around 2K I start to
see the first messages, and by 4 KiB the system has run out of memory.
A single benchmark run alone is not enough; the effect seems to be
cumulative and requires sustained loading of the file-system.

Regards,
Jarkko Lavinen


------------------------------------------------------------------------------
Some output from the program when memory problems occur:

Running fstime 60 seconds, 1722 byte block size, linear data, max size 17500
Write test: 61.4s elapsed, 1218 blocks, 2.00 MB @ 33.4 Kbytes/sec written.
Read test: 60.0s elapsed, 628782 blocks, 1.01 GB @ 17.2 Mbytes/sec read.
Copy test: Memory fault

Running fstime 60 seconds, 3444 byte block size, linear data, max size 17500
Write test: 61.5s elapsed, 821 blocks, 2.70 MB @ 44.9 Kbytes/sec written.
Read test: 60.0s elapsed, 330471 blocks, 1.06 GB @ 18.1 Mbytes/sec read.
Copy test: deflateInit failed
deflateInit failed
deflateInit failed
60.2s elapsed, 1086 blocks, 3.57 MB @ 60.7 Kbytes/sec copied.
# 3444 45976.000000 18969034.000000 62160.281250 1777664 5234688 24576 2445312 1908736 5103616 24576 4820992 1581056 5431296 24576 5226496 2232320 4780032 24576 4587520 2232320 4780032 24576 2392064

Running fstime 60 seconds, 4096 byte block size, linear data, max size 17500
Write test: 60.8s elapsed, 1739 blocks, 6.79 MB @ 114 Kbytes/sec written.
Read test: 60.0s elapsed, 19406 blocks, 75.8 MB @ 1.26 Mbytes/sec read.
deflateInit failed
Copy test: deflateInit failed
["deflateInit failed" repeated 36 more times]
Out of Memory: Killed process 8 (sh).
["deflateInit failed" repeated 35 more times]
Out of Memory: Killed process 8 (sh).
["deflateInit failed" repeated 10 more times]
Out of Memory: Killed process 8 (sh).
Out of Memory: Killed process 8 (sh).
Out of Memory: Killed process 8 (sh).

[-- Attachment #2: combined.png --]
[-- Type: image/png, Size: 5142 bytes --]

[-- Attachment #3: fstime.c --]
[-- Type: text/plain, Size: 13788 bytes --]

/* fstime.c, based on the old Byte Unix filesystem benchmark */

#include <stdio.h>
#include <signal.h>
#include <errno.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

#define SECONDS      10
#define BUFF_SIZE    131072
#define BLOCK_SIZE   1024 /* default block size */
#define MAX_BLOCKS   2000  /* max number of BUFF_SIZE blocks in file */
#define MAX_SIZE     3584  /* Some default flash size limit */
#define LATENCYSCALE 6
#define FILE0        "testfile0"
#define FILE1        "testfile1"

/* Globals */
int  block_size = BLOCK_SIZE;
int  max_size =   MAX_SIZE;
int  max_blocks = MAX_BLOCKS;
int  seconds    = SECONDS;
int  sigalarm   = 0;
int  verbose    = 0;

char buf[BUFF_SIZE];
int  latencylogging = 0;
int  fd0, fd1;
unsigned w_latencies[LATENCYSCALE][100], 
  c_latencies[LATENCYSCALE][100];

struct fs_stat_st {
  float speed;
  unsigned long used[4];
} w_stat, r_stat, c_stat;

int w_test(void);
int r_test(void);
int c_test(void);
void stop_count(int);
void clean_up(int prob);

float elapsed_seconds(struct timeval *start, struct timeval *end)
{
  float secs = (end->tv_sec - start->tv_sec) + 
    (end->tv_usec - start->tv_usec) / 1000000.0;
  if (secs < 0) secs = 0.0;
  return secs;
}

static void floatprint(FILE *fp, float f, int decimals)
{
  char fmt[10];
  int intdigits;

  if (f >= 100.0)
    intdigits=3;
  else if(f >= 10.0)
    intdigits=2;
  else
    intdigits=1;
		
  decimals -= intdigits;
  if (decimals < 0)
    decimals = 0;
  sprintf(fmt, "%%.%df ", decimals);	
  fprintf(fp, fmt, f);
}

static void floatscaleprint(FILE *fp, float f, int decimals)
{
  if (f >= 1024*1024*1024.0) {
    floatprint(stderr, f/(1024*1024*1024.0), decimals);
    fputc('G', stderr);
  } else if (f >= 1024*1024.0) {
    floatprint(stderr, f/(1024*1024.0), decimals);
    fputc('M', stderr);
  } else if (f >= 1024.0) {
    floatprint(stderr, f/1024.0, decimals);
    fputc('K', stderr);
  } else floatprint(stderr, f, decimals);
}

void skipff(FILE *fp)
{
  int c;
	
  c = getc(fp);
  while (c != EOF && c != '\n')
    c = getc(fp);
}

static void freemem(unsigned long *used)
{
  FILE *fp;
  int n;
	
  if (used == NULL)
    return;

  if ((fp = fopen("/proc/meminfo", "r")) == NULL) {
    fprintf(stderr, "Cannot open /proc/meminfo\n");
    exit(10);
  }

  skipff(fp);

  n = fscanf(fp, "Mem: %*u %lu %lu %*u %lu %lu",
	     &used[0], &used[1], &used[2], &used[3]);

  if (n >= 4) {
    used[1] += used[2] + used[3];
    used[0] -= used[2] + used[3];
  }

  fclose(fp);
}

static void report(float elapsed, int n_blocks, char *opdonename,
		   struct fs_stat_st *stat)
{
  float size = ((float) n_blocks * block_size);
  float speed = size / elapsed;

  if (verbose) {
    fprintf(stderr, "%.1fs elapsed, %d blocks, ",
	    elapsed, n_blocks);

    floatscaleprint(stderr, size, 3);
    fprintf(stderr,"B @ ");

    floatscaleprint(stderr, speed, 3);
    fprintf(stderr, "bytes/sec %s.\n", opdonename);
  }

  if (stat != NULL) {
    stat->speed = speed;
    freemem(stat->used);
  }
	       
  return;
}

static void loglatency(unsigned latencies[LATENCYSCALE][100],
		       struct timeval *before,
		       struct timeval *after)
{
  long usecs = (after->tv_sec - before->tv_sec)*1000000+
    (after->tv_usec - before->tv_usec);
  unsigned ix;

  if (usecs < 0)
    usecs = 0;

  if (usecs < 1000) { /* below 1 ms */
    ix = usecs / 10; /* max 990 us => 99 */
    latencies[0][ix]++;
  } else if (usecs < 10000) { /* 1 .. 10 ms */
    ix = usecs / 100; /* min 1000 us, max 9900 us  */
    latencies[1][ix]++;
  } else if (usecs < 100000) { /* 10.0 .. 99.9 ms */
    ix = usecs / 1000; /* max 99000 us */
    latencies[2][ix]++;
  } else if (usecs < 1000000) { /* 100 .. 999 ms */
    ix = usecs / 10000; /* min 100 000 max 990 000 us */
    latencies[3][ix]++;
  } else if (usecs < 10000000) { /* 1.. 9.9s s */
    ix = usecs / 100000; /* max 9 900 000 us */
    latencies[4][ix]++;
  } else { /* 10 .. 999 s */
    ix = usecs / 1000000; /* 10 s and up; capped at the 99 s bucket */
    if (ix > 99)
      ix = 99 ;
    latencies[5][ix]++;
  }
}

void latencystats(char *name, unsigned latencies[LATENCYSCALE][100])
{
  float lat,sum = 0.0;
  float min = 1e6, max = 0.0;
  unsigned cnt = 0, medcnt = 0;
  int i, j;
  char *fmt[] = {"%.2f %u\n", "%.1f %u\n", "%.0f %u\n"};
  float mul[LATENCYSCALE] = {0.01, 0.1, 1.0, 10, 100, 1000}; 
  float prev, median=-1;
	
  printf("Latency profile for %s\n", name);

  for(j = 0; j < LATENCYSCALE; j++) {
    char *s = fmt[j < 2 ? j : 2 ];

    /* The first latency array covers 0..990 us with 10us granularity,
       the next 1.0 .. 9.9 ms with 0.1ms granularty, then 10 .. 99 ms with 1ms
       granularty, then 100 .. 990 with 10 ms and so on. */
    for(i = (j > 0 ? 10 : 0); i < 100; i++)
      if (latencies[j][i]) {
	lat = i * mul[j];
	sum += lat*latencies[j][i];
	cnt += latencies[j][i];
	if (lat < min)
	  min = lat;
	if (lat > max)
	  max = lat;

	printf(s, lat, latencies[j][i]);
      }	
  }

  if (cnt > 0) {		
    prev=0;
    for(j = 0; j < LATENCYSCALE; j++) {
      for(i = (j > 0 ? 10 : 0); i < 100; i++) {
	if ((medcnt + latencies[j][i])*2 > cnt) {
	  median = (prev*medcnt + 
		    (i+1)*mul[j]*(cnt - medcnt)) / cnt;
	  goto loopout;
	}
	medcnt += latencies[j][i];
	prev = i*mul[j];
      }
    }
  loopout:

    printf("Avg latency:     %9.3g ms\n", sum / cnt);
    printf("Median latency:  %9.3g ms\n", median);
    printf("Minimum latency: %9g ms\n", min);
    printf("Maximum latency: %9g ms\n", max);
  }
}



/******************** MAIN ****************************/

int main(int argc, char *argv[])
{
  int random = 0;
  unsigned long begin[4], end[4];
  int i, c;
  char *dir = NULL;
	
  char *usage="fstime [-t seconds] [-b blocks] [-d dir] [-m max size]\n";

  do {
    /* Parse command line options */
    c = getopt (argc, argv, "b:t:d:m:vrh?l");
    switch (c) {
    case -1:
      break;
    case 'v':
      verbose++;
      break;
    case 'b':
      block_size = atoi(optarg);
      break;
    case 'd':
      dir=optarg;
      break;

    case 't':
      seconds = atoi(optarg);
      break;
    case 'r':
      random=1;
      break;
    case 'm':
      max_size=atoi(optarg);
      break;
    case 'l':
      latencylogging = 1;
      break;
    case 'h':
    case '?':
      fputs(usage, stderr);
      exit(0);
	
    default:
      printf ("Unknown option '%c'\n", c);
      break;
    }
  } while (c != -1);

  /**** initialize ****/
  if (seconds <= 0) {
    fprintf(stderr, "No time given\n");
    exit(1);
  }
	
  if (dir == NULL) {
    fprintf(stderr, "No test directory given\n");
    exit(1);
  }
  if (chdir(dir) == -1) {
    perror("fstime: chdir");
    exit(1);
  }

  if (block_size <= 0 || block_size > BUFF_SIZE) {
    fprintf(stderr, 
	    "Illegal block size. Must be 1..%d\n",
	    BUFF_SIZE);
    exit(1);
  }

  max_blocks = max_size*1024 / block_size;

  freemem(&begin[0]);

  if (verbose)
    printf("Running fstime %d seconds, %d byte block size, %s data, max size %d\n",
	   seconds,
	   block_size,
	   random?"random":"linear",
	   max_size);
  
  if ((fd0 = open(FILE0, O_CREAT | O_TRUNC | O_RDWR, 0666)) == -1) {
    perror("fstime: open");
    exit(1);
  }

  if ((fd1 = open(FILE1, O_CREAT|O_TRUNC|O_RDWR, 0666)) == -1) {
    perror("fstime: open");
    exit(1);
  }

  /* fill buffer with random or linear data */
  for (i = 0; i < BUFF_SIZE; i++) {
    if (random)
      buf[i] = rand() & 0xff;
    else
      buf[i] = i & 0xff;
  }
	
  if (latencylogging) 
    for (i = 0; i < 100; i++) {
      int j;
      for(j = 0; j < LATENCYSCALE; j++) {
	w_latencies[j][i] = 0;
	c_latencies[j][i] = 0;
      }
    }
	

  signal(SIGINT, clean_up); /* SIGKILL cannot be caught; use SIGINT */
  if(w_test() || r_test() || c_test()) { 
    clean_up(0);
    exit(1);
  }
  
  clean_up(0);
  freemem(&end[0]);
  
  if (verbose > 1) 
    printf("Fields:\t 1  blocksize\n"
	   "\t 2 write speed\n"
	   "\t 3 read speed\n"
	   "\t 4 copy speed\n"
			    
	   "\t 5 bytes available at begin before opening files\n"
	   "\t 6 bytes used at begin\n"
	   "\t 7 bytes buffered at begin\n"
	   "\t 8 bytes cached at begin\n"
			    
	   "\t 9 bytes used after write test\n"
	   "\t10 bytes available after write test\n"
	   "\t11 bytes buffered after write test\n"
	   "\t12 bytes cached after write test\n"
			
	   "\t13 bytes used after read test\n"
	   "\t14 bytes available after read test\n"
	   "\t15 bytes buffered after read test\n"
	   "\t16 bytes cached after read test\n"
			
	   "\t17 bytes used after copy test\n"
	   "\t18 bytes available after copy test\n"
	   "\t19 bytes buffered after copy test\n"
	   "\t20 bytes cached after copy test\n"
			
	   "\t21 bytes used at the end after closing files\n"
	   "\t22 bytes available at the end\n"
	   "\t23 bytes buffered at the end\n"
	   "\t24 bytes cached at the end\n");

  printf("# %d %f %f %f ", 
	 block_size, w_stat.speed, r_stat.speed, c_stat.speed);

  printf("%lu %lu %lu %lu ",
	 begin[0], begin[1], begin[2], begin[3]);

  printf("%lu %lu %lu %lu ",
	 w_stat.used[0],
	 w_stat.used[1],
	 w_stat.used[2],
	 w_stat.used[3]);

  printf("%lu %lu %lu %lu ",
	 r_stat.used[0],
	 r_stat.used[1],
	 r_stat.used[2],
	 r_stat.used[3]);

  printf("%lu %lu %lu %lu ",
	 c_stat.used[0],
	 c_stat.used[1],
	 c_stat.used[2],
	 c_stat.used[3]);

  printf("%lu %lu %lu %lu\n",
	 end[0], end[1], end[2], end[3]);

  if (latencylogging) {
    latencystats("write latencies in the write test", w_latencies);
    latencystats("write latencies in the copy test",  c_latencies);
  }

  exit(0);
}

/* write test */
int w_test() 
{
  long n_blocks = 0L;
  int f_blocks;
  struct timeval start, end, l0, l1;
  int status;

#ifdef USE_SYNC 
  sync();
  sleep(5); /* to ensure the sync completes */
#endif

  if (verbose)
    fprintf(stderr, "Write test: ");
  signal(SIGALRM,stop_count);
  sigalarm = 0; /* reset alarm flag */
  alarm(seconds);
  gettimeofday(&start, NULL);
  while(!sigalarm) {
      /* On the first round the alarm may break the loop before the
	 file is complete; on later rounds we must finish the pass */
      for(f_blocks=0; 
	  f_blocks < max_blocks &&
	    (!sigalarm || n_blocks >= max_blocks);
	  ++f_blocks) {
	  if (latencylogging) {
	    gettimeofday(&l0, NULL);
	    status = write(fd0, buf, block_size);
	    gettimeofday(&l1, NULL);
	    loglatency(w_latencies, &l0, &l1);
	  } else
	    status = write(fd0, buf, block_size);

	  if (status < 0) {
	    if (errno != EINTR) {
	      perror("fstime: write");
	      return(-1);
	    } else stop_count(0);
	  }
	  ++ n_blocks;
	}
      lseek(fd0, 0, SEEK_SET); /* rewind */
    }
  /* stop clock */
  gettimeofday(&end, NULL);
  report(elapsed_seconds(&start, &end), n_blocks, "written",
	 &w_stat);
  return(0);
}

/* read test */
int r_test()
{
  long n_blocks = 0L;
  int n_read;
  struct timeval start, end;
  
  if (verbose)
    fprintf(stderr, "Read test: ");

  /* rewind */
#ifdef USE_SYNC
  sync();
  sleep(10+seconds/2);
#endif

  errno = 0;
  lseek(fd0, 0, SEEK_SET);

  signal(SIGALRM,stop_count);
  sigalarm = 0; /* reset alarm flag */
  alarm(seconds);
  gettimeofday(&start, NULL);
  while(!sigalarm) {
    /* read while checking for an error */
    n_read = read(fd0, buf, block_size);
    if (n_read == 0) {
      lseek(fd0, 0, SEEK_SET);  /* rewind at end of file */
      continue;
    } else if (n_read < 0)
      switch(errno) {
      case EINVAL:
	lseek(fd0, 0, SEEK_SET);  /* rewind at end of file */
	continue;
	break;
      case EINTR:
	stop_count(0);
	break;
      default:
	perror("fstime: read");
	return(-1);
	break;
      }
    ++ n_blocks;
  }
  /* stop clock */
  gettimeofday(&end, NULL);
  report(elapsed_seconds(&start, &end), n_blocks, "read", &r_stat);
  return(0);
}


/* copy test */
int c_test() 
{
  long n_blocks = 0L;
  struct timeval start, end, l0, l1;
  int n_read, n_written;

  if (verbose)
    fprintf(stderr, "Copy test: ");
  /* rewind */
#ifdef USE_SYNC
  sync();
  sleep(10+seconds/2); /* to ensure the sync completes */
#endif
  lseek(fd0, 0, SEEK_SET);

  signal(SIGALRM,stop_count);
  sigalarm = 0; /* reset alarm flag */
  alarm(seconds);
  gettimeofday(&start, NULL);
  while(!sigalarm) {
    n_read = read(fd0, buf, block_size);
    if (n_read == 0) { /* EOF */
      lseek(fd0, 0, SEEK_SET);  /* rewind at end of file */
      lseek(fd1, 0, SEEK_SET);  /* rewind the output as well */
      continue;
    } else if (n_read < 0) {
      switch(errno) {
      case 0:
      case EINVAL:
	  lseek(fd0, 0, SEEK_SET);  /* rewind at end of file */
	  lseek(fd1, 0, SEEK_SET);  /* rewind the output as well */
	  continue;
	  break;
      case EINTR:
	stop_count(0);
	break;
      default:
	fprintf(stderr, "fstime: copy read (%d): %s\n",
		errno, strerror(errno));
	return(-1);
	break;
      }
    }

    if (latencylogging) {
      gettimeofday(&l0, NULL);
      n_written = write(fd1, buf, block_size);
      gettimeofday(&l1, NULL);
      loglatency(c_latencies, &l0, &l1);
    } else
      n_written = write(fd1, buf, block_size);
    
    if (n_written < 0) {
      if (errno == ENOSPC) {
	printf("fstime: copy write: %s at block %ld, max blocks %d.\n",
	       strerror(errno), n_blocks, max_blocks);
	system("df .");
	break; /* FS full */
      } else if (errno != EINTR) {
	fprintf(stderr, "fstime: copy write (%d): %s\n",
		errno, strerror(errno));
	return(-1);
      } else stop_count(0);
    } else
      ++ n_blocks;
  }
  /* stop clock */
  gettimeofday(&end, NULL);
  report(elapsed_seconds(&start, &end), n_blocks, "copied", &c_stat);
  return(0);
}

void stop_count(int i)
{
  sigalarm = 1;
}

void clean_up(int prob)
{  
  if (close(fd0) || close(fd1))
    perror("fstime: close");

  if (unlink(FILE0) || unlink(FILE1))
    perror("fstime: unlink");
}

[-- Attachment #4: test-rnd.sh --]
[-- Type: application/x-sh, Size: 19462 bytes --]

[-- Attachment #5: test-lin.sh --]
[-- Type: application/x-sh, Size: 1476 bytes --]


* Re: Benchmarking JFFS2
  2002-05-02 12:56 Benchmarking JFFS2 Jarkko Lavinen
@ 2002-05-02 13:40 ` David Woodhouse
  2002-05-03 17:19   ` Jarkko Lavinen
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-02 13:40 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

jlavi@iki.fi said:
>  I have also tried running the tests with linear data that compresses
> easily. I have repeatedly encountered very low memory and
> out-of-memory conditions, with messages like "Memory fail" and
> "deflateInit failed", and, when memory really runs out, repeated "Out
> of memory". I don't think a benchmark program should be able to bring
> the system to its knees simply by exercising the file-system. I
> wouldn't bet on the stability and maturity of the embedded device
> either.

The 'deflateInit failed' and memory problems are solved with the 
application of the 'shared-zlib' patches. I'm waiting for 2.4.19 to be 
released before sending those to Marcelo for 2.4.20-pre1, but they're at 
ftp.kernel.org:/pub/linux/kernel/people/dwmw2/shared-zlib and in the 2.4-ac 
trees. 

Your results on a clean file system are as expected. We write nodes which 
do not cross a 4096-byte boundary. So 4096-byte writes, and multiples of 
4096 bytes, will always produce full-sized nodes carrying a full 4096 bytes 
of data preceded by a node header, and the effective write speed approaches 
a reasonable proportion of the maximum write bandwidth available. Because of 
the node headers and the time taken by compression, the full write bandwidth 
of the raw flash chips cannot be achieved. 

Where your write size is not a multiple of 4096 bytes, some nodes which do
not carry a full payload must be written, and this is obviously less 
efficient. 

jlavi@iki.fi said:
> Question 1:
> Is the lack of performance at higher block sizes normal?
> Question 2:
> Is the lack of performance at higher blocks sizes due to garbage
> collection? 

We break up writes of greater than 4 KiB into 4 KiB chunks. A write size of 
8 KiB or any other multiple of 4 KiB should give you performance identical 
to a write size of 4 KiB. I suspect your results are skewed, and I can see 
two possible reasons.

1. The file system is getting progressively dirtier as your tests continue.
   Perhaps you should take a complete snapshot of the flash when the file
   system is 'dirty', and reinstall that precise image before each run.

2. Garbage collection is happening in the background thread between your
   benchmark's timed write attempts, thereby making the smaller writes 
   _look_ more efficient. Possibly either kill (or SIGSTOP) the GC thread
   to prevent this or call gettimeofday() once each time round the loop 
   rather than twice, comparing with the value from the previous loop.

Neither of the above is a valid excuse for the fact that write performance 
on a dirty file system sucks royally. There are some things we can do about 
that.

1. Stop the GC from decompressing then immediately recompressing nodes that
   it's just going to write out unchanged. It's a stupid waste of CPU time.

2. We have a 'dirty_list' containing blocks which have _any_ dirty space, 
   and we pick blocks to garbage-collect from it. If there's only a 
   few bytes of dirty space, we GC the whole block just to gain a few 
   bytes. We should keep a 'very_dirty_list' of blocks with _significant_ 
   amounts of dirty space and favour it even more when picking blocks
   to GC, especially when doing just-in-time GC rather than GC from the 
   background thread.

If we're feeling really brave then we can try:

3. JFFS2 currently keeps a single 'nextblock' pointer for the block to which
   new nodes are written. We interleave new writes from userspace with GC
   copies of old data; mixing long-lived data with new. This means we end up
   with blocks to be GC'd which have static long-lived data in. We should 
   keep _two_ blocks for writing, one for new data and one for data being 
   GC'd; this way the static data tend to get grouped together into blocks
   which stay clean and are (almost) never GC'd, while short-lived data are
   also grouped together into blocks which will have a higher proportion 
   of dirty space and hence will give faster GC progress.

   If we do this, it's going to completely screw up our NAND wbuf support/
   flushing logic, but it's probably worth it anyway.


--
dwmw2


* Re: Benchmarking JFFS2
  2002-05-02 13:40 ` David Woodhouse
@ 2002-05-03 17:19   ` Jarkko Lavinen
  2002-05-03 17:54     ` David Woodhouse
  2002-05-12 19:04   ` David Woodhouse
  2002-10-17 10:36   ` Jarkko Lavinen
  2 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-03 17:19 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List, jffs-dev

[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]

> I suspect your results are skewed, and I can see two possible reasons.
> 1. The file system is getting progressively dirtier as your tests continue.
>    Perhaps you should take a complete snapshot of the flash when the file
>    system is 'dirty', and reinstall that precise image before each run.

It is quite laborious to flash a new file-system for each benchmark
run. Instead I added an option to make a clean file-system dirty.

The program opens two files and fills the file-system by writing these
two files in turn. File A is written in big blocks (about 2K) and is
thrown away once the file-system fills. File B is written in smaller
blocks but is left in place. Because the writes to the two files
occurred in turns, every erase block ends up about 90% garbage with
small pieces of live data here and there. This dirtiness is
reproducible, but perhaps even too dirty and artificial for
benchmarking.

Rerunning the benchmark with this setup had a dramatic result. You were
right: my results were badly skewed.

Now the write performance peak is 11300 B/s at a 1024 byte block
size. Beyond a 4K block size the throughput drops close to 2 KB/s. I have
included the gnuplot figure containing the previous result from the clean
file-system and the new result from the extra-dirty file-system. 

During my benchmark runs I have encountered 'Eep. read_inode() failed for 
ino #...'. Is this something I should be concerned about?

Jarkko Lavinen

[-- Attachment #2: combined.png --]
[-- Type: image/png, Size: 3253 bytes --]


* Re: Benchmarking JFFS2
  2002-05-03 17:19   ` Jarkko Lavinen
@ 2002-05-03 17:54     ` David Woodhouse
  2002-05-06 12:19       ` Jarkko Lavinen
  0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2002-05-03 17:54 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

jlavi@iki.fi said:
>  During my benchmark runs I have encountered 'Eep. read_inode() failed
> for  ino #...'. Is this something I should be concerned?

Er, yes, that is concerning. Does it say why? If it's only occasional, then 
it's almost certainly memory allocation problems. 

It would be useful if you could provide a profiling run from the
worst-performing case, so we can see where it's spending the time. I've 
already voiced my suspicions, but I've been wrong before; I only wrote it :)

--
dwmw2


* Re: Benchmarking JFFS2
  2002-05-03 17:54     ` David Woodhouse
@ 2002-05-06 12:19       ` Jarkko Lavinen
  2002-05-07 15:02         ` David Woodhouse
  2002-10-08 17:10         ` David Woodhouse
  0 siblings, 2 replies; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-06 12:19 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List, jffs-dev

> Er, yes, that is concerning. Does it say why? If it's only occasional, then 
> it's almost certainly memory allocation problems. 

I have caught the message 8 times. It doesn't occur all that seldom when
I have been benchmarking JFFS2 over the weekend.

In five cases the first line was "Unknown INCOMPAT nodetype C002 at 0020293C"
(the address varies).

In all 8 cases the next lines are "jffs2_do_read_inode(): No data 
nodes found for ino #" followed by "Eep. read_inode() failed for ino #".

Last year there were two reports about C002-type nodes: 28 Jun 2001 by Frederic 
Giasson, and 08 Aug 2001 by Xavier DEBREUIL. Is this the same problem they had?


> It would be useful if you could provide a profiling run from the

Do you mean kernel profiling? I am using JFFS2 on an ARM-based device; 
is kernel profiling available for ARM?

Jarkko Lavinen


* Re: Benchmarking JFFS2
  2002-05-06 12:19       ` Jarkko Lavinen
@ 2002-05-07 15:02         ` David Woodhouse
  2002-05-07 15:13           ` Thomas Gleixner
  2002-05-07 17:46           ` Jarkko Lavinen
  2002-10-08 17:10         ` David Woodhouse
  1 sibling, 2 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-07 15:02 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

jlavi@iki.fi said:
>  I have caught the message 8 times. It doesn't occur all that seldom
> when I have been benchmarking JFFS2 over the weekend.

> On five cases the first line was "Unknown INCOMPAT nodetype C002 at
> 0020293C" (address varies). 

Er, this means it's marked as obsolete on the flash but not in memory. I 
have a vague recollection of having seen this and worked out the cause. 
Is this with the latest code from the jffs2-2_4-branch of CVS? Does it also 
happen with the trunk code?

>  Do you mean kernel profiling? I am using JFFS2 on an ARM-based
> device; is kernel profiling available for ARM?

Yes. Kernel profiling should work fine on ARM. 

--
dwmw2


* Re: Benchmarking JFFS2
  2002-05-07 15:02         ` David Woodhouse
@ 2002-05-07 15:13           ` Thomas Gleixner
  2002-05-07 17:46           ` Jarkko Lavinen
  1 sibling, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2002-05-07 15:13 UTC (permalink / raw)
  To: David Woodhouse, Jarkko Lavinen; +Cc: MTD List, jffs-dev

On Tuesday, 7. May 2002 17:02, David Woodhouse wrote:
> jlavi@iki.fi said:
> >  I have caught the message 8 times. It doesn't occur all that seldom
> > when I have been benchmarking JFFS2 over the weekend.
> >
> > On five cases the first line was "Unknown INCOMPAT nodetype C002 at
> > 0020293C" (address varies).
>
> Er, this means it's marked as obsolete on the flash but not in memory. I
> have a vague recollection of having seen this and worked out the cause.
> Is this with the latest code from the jffs2-2_4-branch of CVS? Does it also
> happen with the trunk code?
Yep, this happens with the trunk code too. As far as I can remember, it 
happened when I removed a directory and had a power loss before everything 
was written to flash. On the next mount I got some errors and was not able 
to remove the remains of this directory.

-- 
Thomas
___________________________________
autronix automation GmbH
http://www.autronix.de gleixner@autronix.de


* Re: Benchmarking JFFS2
  2002-05-07 15:02         ` David Woodhouse
  2002-05-07 15:13           ` Thomas Gleixner
@ 2002-05-07 17:46           ` Jarkko Lavinen
  2002-05-07 19:42             ` David Woodhouse
  1 sibling, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-07 17:46 UTC (permalink / raw)
  To: David Woodhouse; +Cc: MTD List, jffs-dev

> Is this with the latest code from the jffs2-2_4-branch of CVS? Does it also 
> happen with the trunk code?

The code was not the latest, and I really hope I am not raising a false alarm.
Kernel 2.4.15, with JFFS2 and MTD code from the 11 February 2002 snapshot 
at ftp://ftp.uk.linux.org/pub/people/dwmw2/mtd/cvs/.

Jarkko Lavinen


* Re: Benchmarking JFFS2
  2002-05-07 17:46           ` Jarkko Lavinen
@ 2002-05-07 19:42             ` David Woodhouse
  0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-07 19:42 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

jlavi@iki.fi said:
>  The code was not the latest and I really hope I am not raising a
> false alarm. Kernel 2.4.15, JFFS2 and mtd code from the 11 February
> 2002 snapshot at ftp://ftp.uk.linux.org/pub/people/dwmw2/mtd/cvs/.

Forgive me - my brain goes in and out of phase with JFFS2, and it's mostly 
out at this point. Thomas seems generally more clueful than I.

Version 1.63 of scan.c contains what was intended to be an optimisation (from
Joakim Tjernlund), but which should also have the effect of preventing 
obsolete nodes from ending up on the lists. That's in the trunk code but 
not the jffs2-2_4-branch.

--
dwmw2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Benchmarking JFFS2
  2002-05-02 13:40 ` David Woodhouse
  2002-05-03 17:19   ` Jarkko Lavinen
@ 2002-05-12 19:04   ` David Woodhouse
  2002-05-13  8:40     ` Jarkko Lavinen
  2002-10-17 10:36   ` Jarkko Lavinen
  2 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2002-05-12 19:04 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

dwmw2@infradead.org said:
> 2. We have a 'dirty_list' containing blocks which have _any_ dirty space, 
>    and we pick blocks to garbage-collect from. If there's only a 
>    few bytes of dirty space, we GC the whole block just to gain a few 
>    bytes. We should keep a 'very_dirty_list' of blocks with _significant_ 
>    amounts of dirty space and favour that even more when picking blocks
>    to GC, especially when doing just-in-time GC rather than from the 
>    background thread. 

I've done this, and every block with more than 50% dirty space now goes on
the 'very_dirty_list'. With the file system snapshot I'd taken from my iPAQ
to investigate what turned out to be the double-free bug, the garbage collect 
thread was GC'ing 6 blocks immediately after mounting before it stopped. 
With the very_dirty_list optimisation, that dropped to two.

I'm beginning to wonder whether we should actually abolish the separate
clean/dirty/very_dirty lists and keep a single used_list in decreasing 
order of dirty space, then in jffs2_find_gc_block just pick the block 
number $N in that list, where N is exponentially distributed -- high 
chance of being '1', small but non-negligible chance of being near the end 
of the list.

Keeping the used_list sorted should be fairly simple and cheap - each time 
you make a block slightly dirtier you just do a little bit of bubble-sort, 
and moving nextblock onto the used_list is fairly rare so it doesn't matter 
too much that it's a bit slower.
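As a rough illustration of that incremental bubble step (an array-based
sketch with a made-up 'eraseblock' struct, not the actual JFFS2 list code):

```c
#include <stddef.h>

/* Hypothetical stand-in for a JFFS2 erase block record. */
struct eraseblock {
	unsigned int offset;
	unsigned int dirty_size;
};

/* After blocks[i] has become dirtier, restore the invariant that
 * blocks[] is sorted in decreasing order of dirty_size by bubbling
 * the entry toward the front. Returns the entry's new index. */
static size_t bubble_dirtier(struct eraseblock *blocks, size_t i)
{
	while (i > 0 && blocks[i].dirty_size > blocks[i - 1].dirty_size) {
		struct eraseblock tmp = blocks[i];

		blocks[i] = blocks[i - 1];
		blocks[i - 1] = tmp;
		i--;
	}
	return i;
}
```

Since a block's dirty size only grows a little at a time, the loop almost
always terminates after zero or one swaps, which is why the cost stays low.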

Does anyone have any ideas on how we'd generate the random number N for 
jffs2_find_gc_block() though?

--
dwmw2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Benchmarking JFFS2
  2002-05-12 19:04   ` David Woodhouse
@ 2002-05-13  8:40     ` Jarkko Lavinen
  2002-05-13  9:11       ` David Woodhouse
  0 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-13  8:40 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Jarkko Lavinen, MTD List, jffs-dev

> order of dirty space, then in jffs2_find_gc_block just pick the block 
> number $N in that list, where N is exponentially distributed -- high 
> chance of being '1', small but non-negligible chance of being near the end 
> of the list.
...
> Does anyone have any ideas on how we'd generate the random number N for 
> jffs2_find_gc_block() though?

Starting from the list head, one could produce random numbers x, 0 <= x < 1,
stop at the current node if x is below a ratio r, 0 < r < 1, and otherwise
proceed to the next node. Repeating this at each node, the probabilities of
stopping at the first few nodes would be p1 = r, p2 = r*(1 - r),
p3 = r*(1 - r)^2 and, in general, pi = r*(1 - r)^(i - 1). The probability
of reaching the last node n would be pn = (1 - r)^(n - 1).

For example, if r were 0.9 and the list length 4, p1 = 0.9, p2 = 0.09, 
p3 = 0.009 and p4 = 0.001.

Is this too naive an approach? It relies perhaps too much on the randomness
of pseudo-random numbers and might mean only the first few nodes are ever
picked and never nodes from the tail.
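A minimal user-space sketch of this walk (a hypothetical helper using
rand(); the kernel would need its own entropy source, and integer
percentages stand in for the ratio r to avoid floating point):

```c
#include <stdlib.h>

/* Walk a list of n blocks, dirtiest first, stopping at each node
 * with probability stop_pct/100 as described above. Returns a
 * 1-based index; the chance of reaching node i decays geometrically,
 * p_i = (stop_pct/100) * (1 - stop_pct/100)^(i-1). */
static int pick_gc_index(int n, int stop_pct)
{
	int i;

	for (i = 1; i < n; i++) {
		if (rand() % 100 < stop_pct)
			return i;	/* stop at this node */
	}
	return n;	/* walked off the end: take the last node */
}
```

With stop_pct = 90 and a 4-node list this reproduces the 0.9 / 0.09 /
0.009 / 0.001 distribution from the example above.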

Jarkko Lavinen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Benchmarking JFFS2
  2002-05-13  8:40     ` Jarkko Lavinen
@ 2002-05-13  9:11       ` David Woodhouse
  0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-13  9:11 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

jlavi@iki.fi said:
>  Is this too naive an approach? It relies perhaps too much on the
> randomness of pseudo-random numbers and might mean only the first few
> nodes are ever picked and never nodes from the tail.

I think naïve is fine. In fact, I can't even justify the suggestion that it 
be exponential -- we could probably do just as well by picking the first 
(i.e. dirtiest) block with 85% probability, and picking some other block 
off the list with uniform distribution the other 15% of the time.
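That simpler scheme could be sketched like this (same caveats as before:
rand()-based and hypothetical; 85 and 15 are just the figures suggested
above, not tuned values):

```c
#include <stdlib.h>

/* Pick the dirtiest block (index 1) 85% of the time, otherwise a
 * uniformly random block from the rest of an n-entry list. */
static int pick_block_simple(int n)
{
	if (n == 1 || rand() % 100 < 85)
		return 1;	/* dirtiest block */
	return 2 + rand() % (n - 1);	/* uniform over the rest */
}
```

This avoids tuning a decay ratio entirely: only one probability needs
choosing, and every block still has a non-negligible chance of selection.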

--
dwmw2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Benchmarking JFFS2
  2002-05-06 12:19       ` Jarkko Lavinen
  2002-05-07 15:02         ` David Woodhouse
@ 2002-10-08 17:10         ` David Woodhouse
  1 sibling, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2002-10-08 17:10 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

On Mon, 6 May 2002, jlavi@iki.fi said:
>  I have caught the message 8 times. It occurs rather seldom; I have
> been benchmarking JFFS2 over the weekend.

> On five cases the first line was "Unknown INCOMPAT nodetype C002 at
> 0020293C" (address varies).

> On all 8 cases the next lines are "jffs2_do_read_inode(): No data
> nodes found for ino #" then "Eep. read_inode() failed for ino #"

> Last year there were two reports about C002 type node: 28 Jun 2001
> Frederic  Giasson, and 08 Aug 2001 Xavier DEBREUIL. Is this the same
> problem they had? 

Er, yes. What happens is that the VFS calls jffs2_read_inode() at the same 
time as it's calling jffs2_clear_inode() for the inode in question. So 
clear_inode() is busily obsoleting all the nodes which belong to the inode 
while read_inode() is trying to read them, giving you either a whinge that 
clear_inode() got there first and there are _no_ nodes, or sometimes a 
whinge about a node which was marked valid in memory but was already marked 
obsolete by the time we read it from the flash.

As this behaviour from the VFS is intentional, try this (patch to 2.4 
branch; it's already in the CVS head for other reasons):

Index: fs/jffs2/gc.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/gc.c,v
retrieving revision 1.52.2.3
diff -u -p -r1.52.2.3 gc.c
--- fs/jffs2/gc.c	12 May 2002 17:27:08 -0000	1.52.2.3
+++ fs/jffs2/gc.c	8 Oct 2002 17:06:44 -0000
@@ -134,8 +134,10 @@ int jffs2_garbage_collect_pass(struct jf
 
 	D1(printk(KERN_DEBUG "garbage collect from block at phys 0x%08x\n", jeb->offset));
 
-	if (!jeb->used_size)
+	if (!jeb->used_size) {
+		up(&c->alloc_sem);
 		goto eraseit;
+	}
 
 	raw = jeb->gc_node;
 			
@@ -156,6 +158,7 @@ int jffs2_garbage_collect_pass(struct jf
 		/* Inode-less node. Clean marker, snapshot or something like that */
 		spin_unlock_bh(&c->erase_completion_lock);
 		jffs2_mark_node_obsolete(c, raw);
+		up(&c->alloc_sem);
 		goto eraseit_lock;
 	}
 						     
@@ -170,8 +173,8 @@ int jffs2_garbage_collect_pass(struct jf
 	if (is_bad_inode(inode)) {
 		printk(KERN_NOTICE "Eep. read_inode() failed for ino #%u\n", inum);
 		/* NB. This will happen again. We need to do something appropriate here. */
-		iput(inode);
 		up(&c->alloc_sem);
+		iput(inode);
 		return -EIO;
 	}
 
@@ -234,6 +237,7 @@ int jffs2_garbage_collect_pass(struct jf
 	}
  upnout:
 	up(&f->sem);
+	up(&c->alloc_sem);
 	iput(inode);
 
  eraseit_lock:
@@ -250,7 +254,6 @@ int jffs2_garbage_collect_pass(struct jf
 		jffs2_erase_pending_trigger(c);
 	}
 	spin_unlock_bh(&c->erase_completion_lock);
-	up(&c->alloc_sem);
 
 	return ret;
 }
Index: fs/jffs2/readinode.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/readinode.c,v
retrieving revision 1.58.2.5
diff -u -p -r1.58.2.5 readinode.c
--- fs/jffs2/readinode.c	5 Mar 2002 22:40:03 -0000	1.58.2.5
+++ fs/jffs2/readinode.c	8 Oct 2002 17:06:44 -0000
@@ -468,15 +468,28 @@ void jffs2_clear_inode (struct inode *in
 	struct jffs2_node_frag *frag, *frags;
 	struct jffs2_full_dirent *fd, *fds;
 	struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
+        /* I don't think we care about the potential race due to reading this
+           without f->sem. It can never get undeleted. */
+        int deleted = f->inocache && !f->inocache->nlink;
 
 	D1(printk(KERN_DEBUG "jffs2_clear_inode(): ino #%lu mode %o\n", inode->i_ino, inode->i_mode));
 
+	/* If it's a deleted inode, grab the alloc_sem. This prevents
+	   jffs2_garbage_collect_pass() from deciding that it wants to
+	   garbage collect one of the nodes we're just about to mark 
+	   obsolete -- by the time we drop alloc_sem and return, all
+	   the nodes are marked obsolete, and jffs2_g_c_pass() won't
+	   call iget() for the inode in question.
+	*/
+	if (deleted)
+		down(&c->alloc_sem);
+
 	down(&f->sem);
 
 	frags = f->fraglist;
 	fds = f->dents;
 	if (f->metadata) {
-		if (!f->inocache->nlink)
+		if (deleted)
 			jffs2_mark_node_obsolete(c, f->metadata->raw);
 		jffs2_free_full_dnode(f->metadata);
 	}
@@ -488,7 +501,7 @@ void jffs2_clear_inode (struct inode *in
 
 		if (frag->node && !(--frag->node->frags)) {
 			/* Not a hole, and it's the final remaining frag of this node. Free the node */
-			if (!f->inocache->nlink)
+			if (deleted)
 				jffs2_mark_node_obsolete(c, frag->node->raw);
 
 			jffs2_free_full_dnode(frag->node);
@@ -502,5 +515,8 @@ void jffs2_clear_inode (struct inode *in
 	}
 
 	up(&f->sem);
+
+	if(deleted)
+		up(&c->alloc_sem);
 };
 


--
dwmw2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Benchmarking JFFS2
  2002-05-02 13:40 ` David Woodhouse
  2002-05-03 17:19   ` Jarkko Lavinen
  2002-05-12 19:04   ` David Woodhouse
@ 2002-10-17 10:36   ` Jarkko Lavinen
  2003-01-23 12:09     ` David Woodhouse
  2 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-10-17 10:36 UTC (permalink / raw)
  To: MTD List, jffs-dev; +Cc: David Woodhouse

On Thu, May 02, 2002 at 02:40:58PM +0100, David Woodhouse wrote:

[ JFFS2 dirty fs poor performance discussed ]

> Neither of the above are valid excuses for the fact that write performance
> on a dirty file system sucks royally. There are some things we can do about
> that.

[ three optimizations suggested ]


In May I was using a prototype board running an ARM 925T (OMAP). Since
then I have switched testing to the TI EVM 1510 evaluation system
available from TI. In the spring I was running at 120 MHz, now at 60 MHz.
The flash was then an Intel 28F320B3 chip; now it is an Intel 28F128
StrataFlash. I am also now using a reasonably new snapshot of JFFS2, from
October 9th.

The performance last spring:
http://www.hut.fi/~jlavi/jffs2/h_fstime1605combined.png

Please note the block size scale is logarithmic. Dirty file system
performance peaked at block sizes of 600..700 bytes and at the
singular point of 4096 bytes.

The performance now with the JFFS2 snapshot from October 9th:
http://www.hut.fi/~jlavi/jffs2/h_14102002_dirtyblocks_2.4.19_CVS0910.png

Please note the block size scale is linear. The dirty file system
performance now tracks the clean file system performance very closely.
This is partially due to improved measurement accuracy: a longer
benchmark time, a warmup pass, and having the benchmark do an equal
amount of overwriting at each data point. The dirty file system
performance measured this way fluctuates +- 1% with a 480-second
benchmark time.

Latencies:
http://www.hut.fi/~jlavi/jffs2/refpoint_1610_60M_+ID_t480w120_l100000_v2.4.19_CVS0910.png

This time the chart is double logarithmic. I don't see over-10-second
latencies anymore. In the latency benchmark the maximum latency of
3.3s occurred once among 100,000 samples.

Performance fluctuation due to GC:
http://www.hut.fi/~jlavi/jffs2/gc_10102002_scatterdirty_2.4.19_MTD0910.png

The sample is short but shows the general pattern of how the background GC 
thread does its work.

Of course measuring write throughput is just one side of performance,
but I think it is fair to say JFFS2 write performance on a dirty file
system with low-performance CPUs has improved dramatically since last
spring.

Jarkko Lavinen

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Benchmarking JFFS2
  2002-10-17 10:36   ` Jarkko Lavinen
@ 2003-01-23 12:09     ` David Woodhouse
  2003-02-13 12:38       ` Jarkko Lavinen
  0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2003-01-23 12:09 UTC (permalink / raw)
  To: Jarkko Lavinen; +Cc: MTD List, jffs-dev

On Thu, 17 Oct 2002, jlavi@iki.fi said:
>  Of course measuring write throughput is just one side of performance,
> but I think it is fair to say JFFS2 write performance on a dirty file
> system with low-performance CPUs has improved dramatically since last
> spring.

Fancy repeating the test with current CVS? I just committed the 
oft-discussed code required to avoid decompressing and recompressing nodes 
which haven't changed -- and to avoid even doing the iget() and building up 
the node lists for the inodes to which they belong in that case too.

--
dwmw2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Benchmarking JFFS2
  2003-01-23 12:09     ` David Woodhouse
@ 2003-02-13 12:38       ` Jarkko Lavinen
  0 siblings, 0 replies; 16+ messages in thread
From: Jarkko Lavinen @ 2003-02-13 12:38 UTC (permalink / raw)
  To: linux-mtd

On Thu, Jan 23, 2003 at 12:09:55PM +0000, David Woodhouse wrote:
> Fancy repeating the test with current CVS? I just committed the 
> oft-discussed code required to avoid decompressing and recompressing nodes 
> which haven't changed -- and to avoid even doing the iget() and building up 
> the node lists for the inodes to which they belong in that case too.

Jan 23 CVS. Latency accumulation around 1 second; after that, only sporadic
long latencies. 99.99% of the 1 million writes occur in less than
1.3 seconds. Both frequency and time scales are logarithmic.
http://www.hut.fi/~jlavi/jffs2/refpoint_0302_60M+ID_t480w240_l1000000_linux-2.4.19_MTD2301.png

Nov 27 CVS: Latency accumulations at 1, 2 and 3 seconds.
http://www.hut.fi/~jlavi/jffs2/refpoint_2811_60M_+ID_t480w120_l100000_v2.4.19_CVS2711.png

Comparing the November 27 snapshot to the January 23 snapshot, there seems
to be no difference in average write throughput (at a block size of 4KiB),
but the write latencies and the latency distribution differ. In the new
code, latencies longer than 1 second (approx. the time to erase one sector)
are less frequent. I was still able to catch latencies of 5..10 seconds,
but with an occurrence of less than 1 in 100,000.

On a dirty file system the benchmark overwrites the file several times,
and the overwriting is sequential: write up to the given file size, seek to
the beginning, overwrite to the end, seek... I think the garbage collection
likes this sequential overwriting. There will be plenty of very dirty
erase sectors with very little live data, and GC doesn't have to move
much live data to other erase sectors.

Because of the sequential overwriting, the cost of copying live data in GC
is negligible. I have also tried overwriting at random locations instead
of in sequential order.
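The overwrite pass can be sketched roughly like this (a hypothetical
reconstruction of the benchmark's inner loop, with made-up parameters;
the actual test program differs):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* One sequential overwrite pass: write block-sized chunks until the
 * file size is reached, timing each write() with gettimeofday().
 * Returns total write time in seconds, or -1.0 on open failure. */
static double write_pass(const char *path, const char *buf,
			 size_t bsize, size_t fsize, double *max_lat)
{
	struct timeval t0, t1;
	size_t written = 0;
	double total = 0.0;
	int fd = open(path, O_WRONLY | O_CREAT, 0644);

	if (fd < 0)
		return -1.0;
	while (written < fsize) {
		gettimeofday(&t0, NULL);
		if (write(fd, buf, bsize) != (ssize_t)bsize)
			break;
		gettimeofday(&t1, NULL);
		double lat = (double)(t1.tv_sec - t0.tv_sec)
			   + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
		total += lat;
		if (lat > *max_lat)
			*max_lat = lat;
		written += bsize;
	}
	close(fd);
	return total;
}
```

Repeating the pass over the same file, without truncating, produces the
sequential overwrite pattern described above: every pass obsoletes the
previous pass's nodes in order, so whole erase sectors turn dirty together.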

Here is another comparison of the results:
+--------------------+---------------+-------+--------+------+------+
|                    | Avg write spd |  Avg  | 99.9%  |  Max |      |
|                    |           B/sa|  lat  | < lat  |  lat |  Cnt |
+--------------------+---------------+-------+--------+------+------+
| Nov27, sequential  | 39000 / 39300 | 104ms | 3200ms | 3.4s | 100k |
| Nov27, seq, no gcd | 39100 / 39100 | 104ms | 3100ms | 3.1s |  10k |
| Jan23, sequential  | 38800 / 38700 | 105ms | 1000ms |  10s |   1M |
| Jan23, seq, no gcd | 38200 / 38900 | 105ms | 1200ms | 7.4s |   1M |
| Nov27, random seek | 38000 / 40000 | 102ms | 3600ms | 4.0s |  10K |
| Jan23, random seek | 38300 / 42600 |  97ms | 1200ms | 1.3s |  10k |
+--------------------+---------------+-------+--------+------+------+

There are two writing speeds in the table. The first is without latency
logging and the second is with latencies logged. Latencies are measured
by calling gettimeofday() once per loop pass. "Cnt" stands for how many
times write() was called.

In the sequential overwrite tests the measurement error is +- 2%, and if
the measurement is repeated several times the standard deviation is 1%.
For the random seek tests I haven't measured the error.

It is interesting to notice that, with updates at random locations, the
throughput increases by about 5% when the latencies are measured.

One can grow the maximum latency observed in the tests simply by running
the benchmark longer. A short maximum latency means either that the writes
really do complete quickly or that the benchmark was not run long enough.

Jarkko Lavinen

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2003-02-13 12:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-02 12:56 Benchmarking JFFS2 Jarkko Lavinen
2002-05-02 13:40 ` David Woodhouse
2002-05-03 17:19   ` Jarkko Lavinen
2002-05-03 17:54     ` David Woodhouse
2002-05-06 12:19       ` Jarkko Lavinen
2002-05-07 15:02         ` David Woodhouse
2002-05-07 15:13           ` Thomas Gleixner
2002-05-07 17:46           ` Jarkko Lavinen
2002-05-07 19:42             ` David Woodhouse
2002-10-08 17:10         ` David Woodhouse
2002-05-12 19:04   ` David Woodhouse
2002-05-13  8:40     ` Jarkko Lavinen
2002-05-13  9:11       ` David Woodhouse
2002-10-17 10:36   ` Jarkko Lavinen
2003-01-23 12:09     ` David Woodhouse
2003-02-13 12:38       ` Jarkko Lavinen
