* Benchmarking JFFS2
@ 2002-05-02 12:56 Jarkko Lavinen
2002-05-02 13:40 ` David Woodhouse
0 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-02 12:56 UTC (permalink / raw)
To: MTD List
[-- Attachment #1: Type: text/plain, Size: 5395 bytes --]
I have been running a simple benchmark program based on the Byte Unix
file-system benchmark, which is over 10 years old. I have modified the
program to measure time using gettimeofday(), report throughput more
clearly, collect latency profiles and report memory consumption.
I have then measured how write throughput relates to block size. I am
using an embedded device with an ARM9 CPU @ 120 MHz, 8 MB of RAM and a
4 MiB Intel NOR flash (28F320B3T), running kernel 2.4.15. My JFFS2 code
is from a CVS snapshot from February 2002 and may be too old, as may
the kernel.
I am attaching the test program, sample parameters and a figure of the
results seen on this particular device. The figure shows two curves: in
the first curve, an upside-down "V" is shown; the other curve has a
bent "S" shape.
I first tried running the benchmark runs in sequence, with random
data. The best performance, 27000 bytes/s, was achieved using block
sizes of 1024 +- 512 bytes. With a block size of 4 KiB the performance
is only 50% of the peak throughput.
Then I ran the test on a freshly formatted file-system. I erased the
flash partition, mounted it, unmounted it, and mounted it again. After
that I ran only a single benchmark pass, so that the file created
didn't fill the file-system, nothing was overwritten and no garbage
was produced. The performance increased steadily, reaching about
56000 bytes/s at block sizes that are multiples of 4 KiB. The raw
write speed through /dev/mtd0 is 67000 B/s.
Question 1:
Is the lack of performance at higher block sizes normal?
Question 2:
Is the lack of performance at higher block sizes due to garbage
collection?
I have also tried running the tests with linear data that compresses
easily. I have repeatedly encountered very low memory and out-of-memory
conditions, with messages like "Memory fail" and "deflateInit failed",
and, when memory really runs out, repeated "Out of memory". I don't
think a benchmark program should be able to bring the system to its
knees simply by exercising the file-system. I wouldn't bet on the
stability and maturity of the embedded device either.
To bring out this behavior I ran the test with block sizes from 1K to
128K in proportional steps. Typically somewhere around 2K I start to
see the first messages, and at 4 KiB the system has run out of memory.
A single benchmark run alone is not enough; the effect seems to be
cumulative and requires sustained loading of the file-system.
Regards,
Jarkko Lavinen
------------------------------------------------------------------------------
Some output from the program when memory problems occur:
Running fstime 60 seconds, 1722 byte block size, linear data, max size 17500
Write test: 61.4s elapsed, 1218 blocks, 2.00 MB @ 33.4 Kbytes/sec written.
Read test: 60.0s elapsed, 628782 blocks, 1.01 GB @ 17.2 Mbytes/sec read.
Copy test: Memory fault
Running fstime 60 seconds, 3444 byte block size, linear data, max size 17500
Write test: 61.5s elapsed, 821 blocks, 2.70 MB @ 44.9 Kbytes/sec written.
Read test: 60.0s elapsed, 330471 blocks, 1.06 GB @ 18.1 Mbytes/sec read.
Copy test: deflateInit failed
deflateInit failed
deflateInit failed
60.2s elapsed, 1086 blocks, 3.57 MB @ 60.7 Kbytes/sec copied.
# 3444 45976.000000 18969034.000000 62160.281250 1777664 5234688 24576 2445312 1908736 5103616 24576 4820992 1581056 5431296 24576 5226496 2232320 4780032 24576 4587520 2232320 4780032 24576 2392064
Running fstime 60 seconds, 4096 byte block size, linear data, max size 17500
Write test: 60.8s elapsed, 1739 blocks, 6.79 MB @ 114 Kbytes/sec written.
Read test: 60.0s elapsed, 19406 blocks, 75.8 MB @ 1.26 Mbytes/sec read.
deflateInit failed
Copy test: deflateInit failed
["deflateInit failed" repeated 36 times]
Out of Memory: Killed process 8 (sh).
["deflateInit failed" repeated 35 times]
Out of Memory: Killed process 8 (sh).
["deflateInit failed" repeated 10 times]
Out of Memory: Killed process 8 (sh).
Out of Memory: Killed process 8 (sh).
Out of Memory: Killed process 8 (sh).
[-- Attachment #2: combined.png --]
[-- Type: image/png, Size: 5142 bytes --]
[-- Attachment #3: fstime.c --]
[-- Type: text/plain, Size: 13788 bytes --]
/* fstime.c, based on the old Byte Unix filesystem benchmark */
#include <stdio.h>
#include <signal.h>
#include <errno.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#define SECONDS 10
#define BUFF_SIZE 131072
#define BLOCK_SIZE 1024 /* default block size */
#define MAX_BLOCKS 2000 /* max number of BUFF_SIZE blocks in file */
#define MAX_SIZE 3584 /* Some default flash size limit */
#define LATENCYSCALE 6
#define FILE0 "testfile0"
#define FILE1 "testfile1"
/* Globals */
int block_size = BLOCK_SIZE;
int max_size = MAX_SIZE;
int max_blocks = MAX_BLOCKS;
int seconds = SECONDS;
int sigalarm = 0;
int verbose = 0;
char buf[BUFF_SIZE];
int latencylogging = 0;
int fd0, fd1;
unsigned w_latencies[LATENCYSCALE][100],
c_latencies[LATENCYSCALE][100];
struct fs_stat_st {
float speed;
unsigned long used[4];
} w_stat, r_stat, c_stat;
int w_test(void);
int r_test(void);
int c_test(void);
void stop_count(int);
void clean_up(int prob);
float elapsed_seconds(struct timeval *start, struct timeval *end)
{
float secs = (end->tv_sec - start->tv_sec) +
(end->tv_usec - start->tv_usec) / 1000000.0;
if (secs < 0) secs = 0.0;
return secs;
}
static void floatprint(FILE *fp, float f, int decimals)
{
char fmt[10];
int intdigits;
if (f >= 100.0)
intdigits=3;
else if(f >= 10.0)
intdigits=2;
else
intdigits=1;
decimals -= intdigits;
if (decimals < 0)
decimals = 0;
sprintf(fmt, "%%.%df ", decimals);
fprintf(fp, fmt, f);
}
static void floatscaleprint(FILE *fp, float f, int decimals)
{
if (f >= 1024*1024*1024.0) {
floatprint(stderr, f/(1024*1024*1024.0), decimals);
fputc('G', stderr);
} else if (f >= 1024*1024.0) {
floatprint(stderr, f/(1024*1024.0), decimals);
fputc('M', stderr);
} else if (f >= 1024.0) {
floatprint(stderr, f/1024.0, decimals);
fputc('K', stderr);
} else floatprint(stderr, f, decimals);
}
void skipff(FILE *fp)
{
int c;
c = getc(fp);
while (c != EOF && c != '\n')
c = getc(fp);
}
static void freemem(unsigned long *used)
{
FILE *fp;
int n;
if (used == NULL)
return;
if ((fp = fopen("/proc/meminfo", "r")) == NULL) {
fprintf(stderr, "Cannot open /proc/meminfo\n");
exit(10);
}
skipff(fp);
n = fscanf(fp, "Mem: %*u %lu %lu %*u %lu %lu",
&used[0], &used[1], &used[2], &used[3]);
if (n >= 4) {
used[1] += used[2] + used[3];
used[0] -= used[2] + used[3];
}
fclose(fp);
}
static void report(float elapsed, int n_blocks, char *opdonename,
struct fs_stat_st *stat)
{
float size = ((float) n_blocks * block_size);
float speed = size / elapsed;
if (verbose) {
fprintf(stderr, "%.1fs elapsed, %d blocks, ",
elapsed, n_blocks);
floatscaleprint(stderr, size, 3);
fprintf(stderr,"B @ ");
floatscaleprint(stderr, speed, 3);
fprintf(stderr, "bytes/sec %s.\n", opdonename);
}
if (stat != NULL) {
stat->speed = speed;
freemem(stat->used);
}
return;
}
static void loglatency(unsigned latencies[LATENCYSCALE][100],
struct timeval *before,
struct timeval *after)
{
long usecs = (after->tv_sec - before->tv_sec)*1000000+
(after->tv_usec - before->tv_usec);
unsigned ix;
if (usecs < 0)
usecs = 0;
if (usecs < 1000) { /* below 1 ms */
ix = usecs / 10; /* max 990 us => 99 */
latencies[0][ix]++;
} else if (usecs < 10000) { /* 1 .. 10 ms */
ix = usecs / 100; /* min 1000 us, max 9900 us */
latencies[1][ix]++;
} else if (usecs < 100000) { /* 10.0 .. 99.9 ms */
ix = usecs / 1000; /* max 99000 us */
latencies[2][ix]++;
} else if (usecs < 1000000) { /* 100 .. 999 ms */
ix = usecs / 10000; /* min 100 000 max 990 000 us */
latencies[3][ix]++;
} else if (usecs < 10000000) { /* 1.. 9.9s s */
ix = usecs / 100000; /* max 9 900 000 us */
latencies[4][ix]++;
} else { /* 10 .. 999 s */
ix = usecs / 1000000; /* max 999 000 000 us */
if (ix > 99)
ix = 99 ;
latencies[5][ix]++;
}
}
void latencystats(char *name, unsigned latencies[LATENCYSCALE][100])
{
float lat,sum = 0.0;
float min = 1e6, max = 0.0;
unsigned cnt = 0, medcnt = 0;
int i, j;
char *fmt[] = {"%.2f %u\n", "%.1f %u\n", "%.0f %u\n"};
float mul[LATENCYSCALE] = {0.01, 0.1, 1.0, 10, 100, 1000};
float prev, median=-1;
printf("Latency profile for %s\n", name);
for(j = 0; j < LATENCYSCALE; j++) {
char *s = fmt[j < 2 ? j : 2 ];
/* The first latency array covers 0..990 us with 10 us granularity,
the next 1.0 .. 9.9 ms with 0.1 ms granularity, then 10 .. 99 ms with 1 ms
granularity, then 100 .. 990 ms with 10 ms, and so on. */
for(i = (j > 0 ? 10 : 0); i < 100; i++)
if (latencies[j][i]) {
lat = i * mul[j];
sum += lat*latencies[j][i];
cnt += latencies[j][i];
if (lat < min)
min = lat;
if (lat > max)
max = lat;
printf(s, lat, latencies[j][i]);
}
}
if (cnt > 0) {
prev=0;
for(j = 0; j < LATENCYSCALE; j++) {
for(i = (j > 0 ? 10 : 0); i < 100; i++) {
if ((medcnt + latencies[j][i])*2 > cnt) {
median = (prev*medcnt +
(i+1)*mul[j]*(cnt - medcnt)) / cnt;
goto loopout;
}
medcnt += latencies[j][i];
prev = i*mul[j];
}
}
loopout:
printf("Avg latency: %9.3g ms\n", sum / cnt);
printf("Median latency: %9.3g ms\n", median);
printf("Minimum latency: %9g ms\n", min);
printf("Maximum latency: %9g ms\n", max);
}
}
/******************** MAIN ****************************/
int main(int argc, char *argv[])
{
int random = 0;
unsigned long begin[4], end[4];
int i, c;
char *dir = NULL;
char *usage="fstime [-t seconds] [-b blocks] [-d dir] [-m max size]\n";
do {
/* Use posixly correct flag '+' */
c = getopt (argc, argv, "b:t:d:m:vrh?l");
switch (c) {
case -1:
break;
case 'v':
verbose++;
break;
case 'b':
block_size = atoi(optarg);
break;
case 'd':
dir=optarg;
break;
case 't':
seconds = atoi(optarg);
break;
case 'r':
random=1;
break;
case 'm':
max_size=atoi(optarg);
break;
case 'l':
latencylogging = 1;
break;
case 'h':
case '?':
fputs(usage, stderr);
exit(0);
default:
printf ("Unknown option '%c'\n", c);
break;
}
} while (c != -1);
/**** initialize ****/
if (seconds <= 0) {
fprintf(stderr, "No time given\n");
exit(1);
}
if (dir == NULL) {
fprintf(stderr, "No test directory given\n");
exit(1);
}
if (chdir(dir) == -1) {
perror("fstime: chdir");
exit(1);
}
if (block_size <= 0 || block_size > BUFF_SIZE) {
fprintf(stderr,
"Illegal blocksize. Must be 1..%d\n",
BUFF_SIZE);
exit(1);
}
max_blocks = max_size*1024 / block_size;
freemem(&begin[0]);
if (verbose)
printf("Running fstime %d seconds, %d byte block size, %s data, max size %d\n",
seconds,
block_size,
random?"random":"linear",
max_size);
if ((fd0 = open(FILE0, O_CREAT | O_TRUNC | O_RDWR, 0666)) == -1) {
perror("fstime: open");
exit(1);
}
if ((fd1 = open(FILE1, O_CREAT|O_TRUNC|O_RDWR, 0666)) == -1) {
perror("fstime: open");
exit(1);
}
/* fill buffer with random or linear data */
for (i = 0; i < BUFF_SIZE; i++) {
if (random)
buf[i] = rand() & 0xff;
else
buf[i] = i & 0xff;
}
if (latencylogging)
for (i = 0; i < 100; i++) {
int j;
for(j = 0; j < LATENCYSCALE; j++) {
w_latencies[j][i] = 0;
c_latencies[j][i] = 0;
}
}
signal(SIGINT, clean_up); /* SIGKILL cannot be caught */
if(w_test() || r_test() || c_test()) {
clean_up(0);
exit(1);
}
clean_up(0);
freemem(&end[0]);
if (verbose > 1)
printf("Fields:\t 1 blocksize\n"
"\t 2 write speed\n"
"\t 3 read speed\n"
"\t 4 copy speed\n"
"\t 5 bytes available at begin before opening files\n"
"\t 6 bytes used at begin\n"
"\t 7 bytes buffered at begin\n"
"\t 8 bytes cached at begin\n"
"\t 9 bytes used after write test\n"
"\t10 bytes available after write test\n"
"\t11 bytes buffered after write test\n"
"\t12 bytes cached after write test\n"
"\t13 bytes used after read test\n"
"\t14 bytes available after read test\n"
"\t15 bytes buffered after read test\n"
"\t16 bytes cached after read test\n"
"\t17 bytes used after copy test\n"
"\t18 bytes available after copy test\n"
"\t19 bytes buffered after copy test\n"
"\t20 bytes cached after copy test\n"
"\t21 bytes used at the end after closing files\n"
"\t22 bytes available at the end\n"
"\t23 bytes buffered at the end\n"
"\t24 bytes cached at the end\n");
printf("# %d %f %f %f ",
block_size, w_stat.speed, r_stat.speed, c_stat.speed);
printf("%lu %lu %lu %lu ",
begin[0], begin[1], begin[2], begin[3]);
printf("%lu %lu %lu %lu ",
w_stat.used[0],
w_stat.used[1],
w_stat.used[2],
w_stat.used[3]);
printf("%lu %lu %lu %lu ",
r_stat.used[0],
r_stat.used[1],
r_stat.used[2],
r_stat.used[3]);
printf("%lu %lu %lu %lu ",
c_stat.used[0],
c_stat.used[1],
c_stat.used[2],
c_stat.used[3]);
printf("%lu %lu %lu %lu\n",
end[0], end[1], end[2], end[3]);
if (latencylogging) {
latencystats("write latencies in the write test", w_latencies);
latencystats("write latencies in the copy test", c_latencies);
}
exit(0);
}
/* write test */
int w_test()
{
long n_blocks = 0L;
int f_blocks;
struct timeval start, end, l0, l1;
int status;
#ifdef USE_SYNC
sync();
sleep(5); /* to ensure the sync */
#endif
if (verbose)
fprintf(stderr, "Write test: ");
signal(SIGALRM,stop_count);
sigalarm = 0; /* reset alarm flag */
alarm(seconds);
gettimeofday(&start, NULL);
while(!sigalarm) {
/* On the first pass the alarm may break this loop early; on later
passes the file must be written to completion */
for(f_blocks=0;
f_blocks < max_blocks &&
(!sigalarm || n_blocks >= max_blocks);
++f_blocks) {
if (latencylogging) {
gettimeofday(&l0, NULL);
status = write(fd0, buf, block_size);
gettimeofday(&l1, NULL);
loglatency(w_latencies, &l0, &l1);
} else
status = write(fd0, buf, block_size);
if (status < 0) {
if (errno != EINTR) {
perror("fstime: write");
return(-1);
} else stop_count(0);
}
++ n_blocks;
}
lseek(fd0, 0, SEEK_SET); /* rewind; offset comes before whence */
}
/* stop clock */
gettimeofday(&end, NULL);
report(elapsed_seconds(&start, &end), n_blocks, "written",
&w_stat);
return(0);
}
/* read test */
int r_test()
{
long n_blocks = 0L;
int n_read;
extern int errno;
struct timeval start, end;
if (verbose)
fprintf(stderr, "Read test: ");
/* rewind */
#ifdef USE_SYNC
sync();
sleep(10+seconds/2);
#endif
errno = 0;
lseek(fd0, 0, SEEK_SET);
signal(SIGALRM,stop_count);
sigalarm = 0; /* reset alarm flag */
alarm(seconds);
gettimeofday(&start, NULL);
while(!sigalarm) {
/* read while checking for an error */
n_read = read(fd0, buf, block_size);
if (n_read == 0) {
lseek(fd0, 0, SEEK_SET); /* rewind at end of file */
continue;
} else if (n_read < 0)
switch(errno) {
case EINVAL:
lseek(fd0, 0, SEEK_SET); /* rewind at end of file */
continue;
break;
case EINTR:
stop_count(0);
break;
default:
perror("fstime: read");
return(-1);
break;
}
++ n_blocks;
}
/* stop clock */
gettimeofday(&end, NULL);
report(elapsed_seconds(&start, &end), n_blocks, "read", &r_stat);
return(0);
}
/* copy test */
int c_test()
{
long n_blocks = 0L;
struct timeval start, end, l0, l1;
int n_read, n_written;
if (verbose)
fprintf(stderr, "Copy test: ");
/* rewind */
#ifdef USE_SYNC
sync();
sleep(10+seconds/2); /* to ensure the sync */
#endif
lseek(fd0, 0, SEEK_SET);
signal(SIGALRM,stop_count);
sigalarm = 0; /* reset alarm flag */
alarm(seconds);
gettimeofday(&start, NULL);
while(!sigalarm) {
n_read = read(fd0, buf, block_size);
if (n_read == 0) { /* EOF */
lseek(fd0, 0, SEEK_SET); /* rewind at end of file */
lseek(fd1, 0, SEEK_SET); /* rewind the output as well */
continue;
} else if (n_read < 0) {
switch(errno) {
case 0:
case EINVAL:
lseek(fd0, 0, SEEK_SET); /* rewind at end of file */
lseek(fd1, 0, SEEK_SET); /* rewind the output as well */
continue;
break;
case EINTR:
stop_count(0);
break;
default:
fprintf(stderr, "fstime: copy read (%d): %s\n",
errno, strerror(errno));
return(-1);
break;
}
}
if (latencylogging) {
gettimeofday(&l0, NULL);
n_written = write(fd1, buf, block_size);
gettimeofday(&l1, NULL);
loglatency(c_latencies, &l0, &l1);
} else
n_written = write(fd1, buf, block_size);
if (n_written < 0) {
if (errno == ENOSPC) {
printf("fstime: copy write: %s at block %ld, max blocks %d.\n",
strerror(errno), n_blocks, max_blocks);
system("df .");
break; /* FS full */
} else if (errno != EINTR) {
fprintf(stderr, "fstime: copy write (%d): %s\n",
errno, strerror(errno));
return(-1);
} else stop_count(0);
} else
++ n_blocks;
}
/* stop clock */
gettimeofday(&end, NULL);
report(elapsed_seconds(&start, &end), n_blocks, "copied", &c_stat);
return(0);
}
void stop_count(int i)
{
sigalarm = 1;
}
void clean_up(int prob)
{
if (close(fd0) || close(fd1))
perror("fstime: close");
if (unlink(FILE0) || unlink(FILE1))
perror("fstime: unlink");
}
[-- Attachment #4: test-rnd.sh --]
[-- Type: application/x-sh, Size: 19462 bytes --]
[-- Attachment #5: test-lin.sh --]
[-- Type: application/x-sh, Size: 1476 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-02 12:56 Benchmarking JFFS2 Jarkko Lavinen
@ 2002-05-02 13:40 ` David Woodhouse
2002-05-03 17:19 ` Jarkko Lavinen
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-02 13:40 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
jlavi@iki.fi said:
> I have also tried running the tests with linear data that compresses
> easily. I have repeatedly encountered very low memory and out-of-memory
> conditions, with messages like "Memory fail" and "deflateInit failed",
> and, when memory really runs out, repeated "Out of memory". I don't
> think a benchmark program should be able to bring the system to its
> knees simply by exercising the file-system. I wouldn't bet on the
> stability and maturity of the embedded device either.
The 'deflateInit failed' and memory problems are solved with the
application of the 'shared-zlib' patches. I'm waiting for 2.4.19 to be
released before sending those to Marcelo for 2.4.20-pre1, but they're at
ftp.kernel.org:/pub/linux/kernel/people/dwmw2/shared-zlib and in the 2.4-ac
trees.
Your results on a clean file system are as expected. We write nodes which
do not cross a 4096-byte boundary, so writes of 4096 bytes or any multiple
of 4096 will always produce full-sized nodes carrying a full 4096 bytes of
data preceded by a node header, and the effective write speed approaches a
reasonable proportion of the maximum write bandwidth available. Due to the
addition of node headers and the time taken by compression, the full write
bandwidth of the raw flash chips cannot be achieved.
Where your write size is not a multiple of 4096 bytes, some nodes which do
not carry a full payload must be written, and this is obviously less
efficient.
jlavi@iki.fi said:
> Question 1:
> Is the lack of performance at higher block sizes normal?
> Question 2:
> Is the lack of performance at higher blocks sizes due to garbage
> collection?
We break up writes of more than 4 KiB into 4 KiB chunks. A write size of
8 KiB, or any other multiple of 4 KiB, should give you performance
identical to a write size of 4 KiB. I suspect your results are skewed,
and I can see two possible reasons.
1. The file system is getting progressively dirtier as your tests continue.
Perhaps you should take a complete snapshot of the flash when the file
system is 'dirty', and reinstall that precise image before each run.
2. Garbage collection is happening in the background thread between your
benchmark's timed write attempts, thereby making the smaller writes
_look_ more efficient. Possibly either kill (or SIGSTOP) the GC thread
to prevent this or call gettimeofday() once each time round the loop
rather than twice, comparing with the value from the previous loop.
Neither of the above are valid excuses for the fact that write performance
on a dirty file system sucks royally. There are some things we can do about
that.
1. Stop the GC from decompressing then immediately recompressing nodes that
it's just going to write out unchanged. It's a stupid waste of CPU time.
2. We have a 'dirty_list' containing blocks which have _any_ dirty space,
   and we pick blocks from it to garbage-collect. If there's only a
few bytes of dirty space, we GC the whole block just to gain a few
bytes. We should keep a 'very_dirty_list' of blocks with _significant_
amounts of dirty space and favour that even more when picking blocks
to GC, especially when doing just-in-time GC rather than from the
background thread.
If we're feeling really brave then we can try:
3. JFFS2 currently keeps a single 'nextblock' pointer for the block to which
new nodes are written. We interleave new writes from userspace with GC
copies of old data; mixing long-lived data with new. This means we end up
with blocks to be GC'd which have static long-lived data in. We should
keep _two_ blocks for writing, one for new data and one for data being
GC'd; this way the static data tend to get grouped together into blocks
which stay clean and are (almost) never GC'd, while short-lived data are
also grouped together into blocks which will have a higher proportion
of dirty space and hence will give faster GC progress.
If we do this, it's going to completely screw up our NAND wbuf support/
flushing logic, but it's probably worth it anyway.
--
dwmw2
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-02 13:40 ` David Woodhouse
@ 2002-05-03 17:19 ` Jarkko Lavinen
2002-05-03 17:54 ` David Woodhouse
2002-05-12 19:04 ` David Woodhouse
2002-10-17 10:36 ` Jarkko Lavinen
2 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-03 17:19 UTC (permalink / raw)
To: David Woodhouse; +Cc: MTD List, jffs-dev
[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]
> I suspect your results are skewed, and can see two possible reasons.
> 1. The file system is getting progressively dirtier as your tests continue.
> Perhaps you should take a complete snapshot of the flash when the file
> system is 'dirty', and reinstall that precise image before each run.
It is quite laborious to flash a new file-system image for each benchmark
run. Instead I added an option that makes a clean file-system dirty.
The program opens two files and fills the file-system by writing to
these two files in turns. File A is written in big blocks (about 2K)
and is thrown away once the file-system fills; file B is written in
smaller blocks but is left in place. Because the writes to the two
files occurred in turns, every erase block ends up with about 90%
garbage and small pieces of live data here and there. This dirtiness is
reproducible, but perhaps even too dirty and artificial for benchmarking.
Rerunning the benchmark with this setup had dramatic result. You were
right. My results were badly skewed.
Now the write performance peak is 11300 B/s at 1024 byte block
size. After 4K block size the throughput drops close to 2KB/s. I have
included the gnuplot figure containing the previous result from clean
file-system and the new result from extra dirty file-system.
During my benchmark runs I have encountered 'Eep. read_inode() failed for
ino #...'. Is this something I should be concerned about?
Jarkko Lavinen
[-- Attachment #2: combined.png --]
[-- Type: image/png, Size: 3253 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-03 17:19 ` Jarkko Lavinen
@ 2002-05-03 17:54 ` David Woodhouse
2002-05-06 12:19 ` Jarkko Lavinen
0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2002-05-03 17:54 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
jlavi@iki.fi said:
> During my benchmark runs I have encountered 'Eep. read_inode() failed
> for ino #...'. Is this something I should be concerned about?
Er, yes, that is concerning. Does it say why? If it's only occasional, then
it's almost certainly memory allocation problems.
It would be useful if you could provide a profiling run from the
worst-performing case, so we can see where it's spending the time. I've
already voiced my suspicions, but I've been wrong before; I only wrote it :)
--
dwmw2
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-03 17:54 ` David Woodhouse
@ 2002-05-06 12:19 ` Jarkko Lavinen
2002-05-07 15:02 ` David Woodhouse
2002-10-08 17:10 ` David Woodhouse
0 siblings, 2 replies; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-06 12:19 UTC (permalink / raw)
To: David Woodhouse; +Cc: MTD List, jffs-dev
> Er, yes, that is concerning. Does it say why? If it's only occasional, then
> it's almost certainly memory allocation problems.
I have caught the message 8 times; it occurs rather seldom, considering
I have been benchmarking JFFS2 over the weekend.
In five cases the first line was "Unknown INCOMPAT nodetype C002 at 0020293C"
(the address varies).
In all 8 cases the next lines are "jffs2_do_read_inode(): No data
nodes found for ino #" followed by "Eep. read_inode() failed for ino #".
Last year there were two reports about C002 type node: 28 Jun 2001 Frederic
Giasson, and 08 Aug 2001 Xavier DEBREUIL. Is this the same problem they had?
> It would be useful if you could provide a profiling run from the
Do you mean kernel profiling? I am using JFFS2 on an ARM-based device,
and kernel profiling is not available for ARM, or is it?
Jarkko Lavinen
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-06 12:19 ` Jarkko Lavinen
@ 2002-05-07 15:02 ` David Woodhouse
2002-05-07 15:13 ` Thomas Gleixner
2002-05-07 17:46 ` Jarkko Lavinen
2002-10-08 17:10 ` David Woodhouse
1 sibling, 2 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-07 15:02 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
jlavi@iki.fi said:
> I have caught the message 8 times; it occurs rather seldom,
> considering I have been benchmarking JFFS2 over the weekend.
> In five cases the first line was "Unknown INCOMPAT nodetype C002 at
> 0020293C" (the address varies).
Er, this means it's marked as obsolete on the flash but not in memory. I
have a vague recollection of having seen this and worked out the cause.
Is this with the latest code from the jffs2-2_4-branch of CVS? Does it also
happen with the trunk code?
> Do you mean kernel profiling? I am using JFFS2 on an Arm based device
> and kernel profiling is not available for Arm or is it?
Yes. Kernel profiling should work fine on ARM.
--
dwmw2
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-07 15:02 ` David Woodhouse
@ 2002-05-07 15:13 ` Thomas Gleixner
2002-05-07 17:46 ` Jarkko Lavinen
1 sibling, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2002-05-07 15:13 UTC (permalink / raw)
To: David Woodhouse, Jarkko Lavinen; +Cc: MTD List, jffs-dev
On Tuesday, 7. May 2002 17:02, David Woodhouse wrote:
> jlavi@iki.fi said:
> > I have caught the message 8 times; it occurs rather seldom,
> > considering I have been benchmarking JFFS2 over the weekend.
> >
> > In five cases the first line was "Unknown INCOMPAT nodetype C002 at
> > 0020293C" (the address varies).
>
> Er, this means it's marked as obsolete on the flash but not in memory. I
> have a vague recollection of having seen this and worked out the cause.
> Is this with the latest code from the jffs2-2_4-branch of CVS? Does it also
> happen with the trunk code?
Yep, this happens with the trunk code too. As far as I can remember, it
happened when I removed a directory and had a power loss before everything
was written to flash. On the next mount I got some errors and was not able
to remove the remains of this directory.
--
Thomas
___________________________________
autronix automation GmbH
http://www.autronix.de gleixner@autronix.de
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-07 15:02 ` David Woodhouse
2002-05-07 15:13 ` Thomas Gleixner
@ 2002-05-07 17:46 ` Jarkko Lavinen
2002-05-07 19:42 ` David Woodhouse
1 sibling, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-07 17:46 UTC (permalink / raw)
To: David Woodhouse; +Cc: MTD List, jffs-dev
> Is this with the latest code from the jffs2-2_4-branch of CVS? Does it also
> happen with the trunk code?
The code was not the latest, and I really hope I am not raising a false alarm.
Kernel 2.4.15, JFFS2 and mtd code from 11 February 2002 snapshot
in ftp://ftp.uk.linux.org/pub/people/dwmw2/mtd/cvs/.
Jarkko Lavinen
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-07 17:46 ` Jarkko Lavinen
@ 2002-05-07 19:42 ` David Woodhouse
0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-07 19:42 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
jlavi@iki.fi said:
> The code was not the latest, and I really hope I am not raising a
> false alarm. Kernel 2.4.15, JFFS2 and mtd code from 11 February 2002
> snapshot in ftp://ftp.uk.linux.org/pub/people/dwmw2/mtd/cvs/.
Forgive me - my brain goes in and out of phase with JFFS2, and it's mostly
out at this point. Thomas seems generally more clueful than I.
Version 1.63 of scan.c contains what was intended to be an optimisation (from
Joakim Tjernlund), but which should also have the effect of preventing
obsolete nodes from ending up on the lists. That's in the trunk code but
not the jffs2-2_4-branch.
--
dwmw2
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Benchmarking JFFS2
2002-05-02 13:40 ` David Woodhouse
2002-05-03 17:19 ` Jarkko Lavinen
@ 2002-05-12 19:04 ` David Woodhouse
2002-05-13 8:40 ` Jarkko Lavinen
2002-10-17 10:36 ` Jarkko Lavinen
2 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2002-05-12 19:04 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
dwmw2@infradead.org said:
> 2. We have a 'dirty_list' containing blocks which have _any_ dirty space,
>    and we pick blocks from it to garbage-collect. If there's only a
> few bytes of dirty space, we GC the whole block just to gain a few
> bytes. We should keep a 'very_dirty_list' of blocks with _significant_
> amounts of dirty space and favour that even more when picking blocks
> to GC, especially when doing just-in-time GC rather than from the
> background thread.
I've done this, and every block with more than 50% dirty space now goes on
the 'very_dirty_list'. With the file system snapshot I'd taken from my iPAQ
to investigate what turned out to be the double-free bug, the garbage
collection thread was GC'ing 6 blocks immediately after mounting before it
stopped. With the very_dirty_list optimisation, that dropped to two.
I'm beginning to wonder whether we should actually abolish the separate
clean/dirty/very_dirty lists and keep a single used_list in decreasing
order of dirty space, then in jffs2_find_gc_block just pick the block
number $N in that list, where N is exponentially distributed -- high
chance of being '1', small but non-negligible chance of being near the end
of the list.
Keeping the used_list sorted should be fairly simple and cheap - each time
you make a block slightly dirtier you just do a little bit of bubble-sort,
and moving nextblock onto the used_list is fairly rare so it doesn't matter
too much that it's a bit slower.
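That single bubble step can be sketched with an array of dirty-space values standing in for the linked used_list (illustrative only):

```c
/* After the block at index i gets dirtier, restore the descending
 * order of dirty space with one bubble pass toward the front.  An
 * array stands in for the linked used_list here. */
void bubble_up(unsigned int dirty[], int i)
{
	while (i > 0 && dirty[i] > dirty[i - 1]) {
		unsigned int tmp = dirty[i];
		dirty[i] = dirty[i - 1];
		dirty[i - 1] = tmp;
		i--;
	}
}
```

Each write dirties one block by a bounded amount, so the block rarely has to move far and the pass is cheap in practice.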
Does anyone have any ideas on how we'd generate the random number N for
jffs2_find_gc_block() though?
--
dwmw2
* Re: Benchmarking JFFS2
2002-05-12 19:04 ` David Woodhouse
@ 2002-05-13 8:40 ` Jarkko Lavinen
2002-05-13 9:11 ` David Woodhouse
0 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-05-13 8:40 UTC (permalink / raw)
To: David Woodhouse; +Cc: Jarkko Lavinen, MTD List, jffs-dev
> order of dirty space, then in jffs2_find_gc_block just pick the block
> number $N in that list, where N is exponentially distributed -- high
> chance of being '1', small but non-negligible chance of being near the end
> of the list.
...
> Does anyone have any ideas on how we'd generate the random number N for
> jffs2_find_gc_block() though?
Starting from the list head, one could draw random numbers x, 0 <= x < 1,
and stop at the current node when x is below a ratio r, 0 < r < 1, otherwise
proceeding to the next node, for at most n - 1 steps. The probabilities of
stopping at the first few nodes would be p1 = r, p2 = r*(1 - r),
p3 = r*(1 - r)^2 and in general pi = r*(1 - r)^(i - 1). The probability of
reaching the last node n would be pn = (1 - r)^(n - 1).
For example, if r were 0.9 and the list length 4, p1 = 0.9, p2 = 0.09,
p3 = 0.009 and p4 = 0.001.
Is this too naive an approach? It relies perhaps too much on the randomness
of semi-random numbers and might mean only the first few nodes are ever
picked and never nodes from the tail.
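One way to realise those probabilities (p1 = r, p2 = r*(1 - r), ...) is the following quick user-space sketch using libc rand() -- illustrative code with 1-based node indices, not JFFS2 code:

```c
#include <stdlib.h>

/* Truncated-geometric pick: stop at the current node with probability
 * r, otherwise step to the next one, up to node n.  Returns a 1-based
 * index into the list, so the head is reached with probability r. */
int pick_node(int n, double r)
{
	int i;

	for (i = 1; i < n; i++) {
		double x = rand() / ((double)RAND_MAX + 1.0); /* 0 <= x < 1 */
		if (x < r)		/* stop here with probability r */
			return i;
	}
	return n;			/* fell through to the tail */
}
```

In the kernel one would of course substitute something like a cheap pseudo-random source for rand(); the distribution is what matters, not the generator.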
Jarkko Lavinen
* Re: Benchmarking JFFS2
2002-05-13 8:40 ` Jarkko Lavinen
@ 2002-05-13 9:11 ` David Woodhouse
0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2002-05-13 9:11 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
jlavi@iki.fi said:
> Is this too naive an approach? It relies perhaps too much on the
> randomness of semi-random numbers and might mean only the first few
> nodes are ever picked and never nodes from the tail.
I think naïve is fine. In fact, I can't even justify the suggestion that it
be exponential -- we could probably do just as well by picking the first
(i.e. dirtiest) block with 85% probability, and picking some other block
off the list with uniform distribution the other 15% of the time.
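That simpler 85/15 scheme is even easier to sketch (again illustrative user-space code with rand(); index 0 is the dirtiest block):

```c
#include <stdlib.h>

/* 85% of the time take the dirtiest block (index 0); the other 15%
 * pick one of the remaining n - 1 blocks uniformly at random. */
int pick_block(int n)
{
	if (n == 1 || rand() % 100 < 85)
		return 0;			/* dirtiest block */
	return 1 + rand() % (n - 1);		/* uniform over the rest */
}
```

The uniform 15% tail guarantees every block, however lightly dirtied, is eventually considered, which avoids the starvation Jarkko worried about.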
--
dwmw2
* Re: Benchmarking JFFS2
2002-05-06 12:19 ` Jarkko Lavinen
2002-05-07 15:02 ` David Woodhouse
@ 2002-10-08 17:10 ` David Woodhouse
1 sibling, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2002-10-08 17:10 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
On Mon, 6 May 2002, jlavi@iki.fi said:
> I have caught the message 8 times. It occurs rather seldom when I
> have been benchmarking JFFS2 over the weekend.
> On five cases the first line was "Unknown INCOMPAT nodetype C002 at
> 0020293C" (address varies).
> On all 8 cases the next lines are "jffs2_do_read_inode(): No data
> nodes found for ino #" then "Eep. read_inode() failed for ino #"
> Last year there were two reports about C002 type node: 28 Jun 2001
> Frederic Giasson, and 08 Aug 2001 Xavier DEBREUIL. Is this the same
> problem they had?
Er, yes. What happens is that the VFS calls jffs2_read_inode() at the same
time as it's calling jffs2_clear_inode() for the inode in question. So
clear_inode() is busily obsoleting all the nodes which belong to the inode
while read_inode() is trying to read them. This gives you either a whinge
that clear_inode() got there first and there are _no_ nodes, or sometimes a
whinge about a node which was marked valid in memory but which, by the time
we read it from the flash, was marked obsolete.
As this behaviour from the VFS is intentional, try this (patch to 2.4
branch; it's already in the CVS head for other reasons):
Index: fs/jffs2/gc.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/gc.c,v
retrieving revision 1.52.2.3
diff -u -p -r1.52.2.3 gc.c
--- fs/jffs2/gc.c 12 May 2002 17:27:08 -0000 1.52.2.3
+++ fs/jffs2/gc.c 8 Oct 2002 17:06:44 -0000
@@ -134,8 +134,10 @@ int jffs2_garbage_collect_pass(struct jf
D1(printk(KERN_DEBUG "garbage collect from block at phys 0x%08x\n", jeb->offset));
- if (!jeb->used_size)
+ if (!jeb->used_size) {
+ up(&c->alloc_sem);
goto eraseit;
+ }
raw = jeb->gc_node;
@@ -156,6 +158,7 @@ int jffs2_garbage_collect_pass(struct jf
/* Inode-less node. Clean marker, snapshot or something like that */
spin_unlock_bh(&c->erase_completion_lock);
jffs2_mark_node_obsolete(c, raw);
+ up(&c->alloc_sem);
goto eraseit_lock;
}
@@ -170,8 +173,8 @@ int jffs2_garbage_collect_pass(struct jf
if (is_bad_inode(inode)) {
printk(KERN_NOTICE "Eep. read_inode() failed for ino #%u\n", inum);
/* NB. This will happen again. We need to do something appropriate here. */
- iput(inode);
up(&c->alloc_sem);
+ iput(inode);
return -EIO;
}
@@ -234,6 +237,7 @@ int jffs2_garbage_collect_pass(struct jf
}
upnout:
up(&f->sem);
+ up(&c->alloc_sem);
iput(inode);
eraseit_lock:
@@ -250,7 +254,6 @@ int jffs2_garbage_collect_pass(struct jf
jffs2_erase_pending_trigger(c);
}
spin_unlock_bh(&c->erase_completion_lock);
- up(&c->alloc_sem);
return ret;
}
Index: fs/jffs2/readinode.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/readinode.c,v
retrieving revision 1.58.2.5
diff -u -p -r1.58.2.5 readinode.c
--- fs/jffs2/readinode.c 5 Mar 2002 22:40:03 -0000 1.58.2.5
+++ fs/jffs2/readinode.c 8 Oct 2002 17:06:44 -0000
@@ -468,15 +468,28 @@ void jffs2_clear_inode (struct inode *in
struct jffs2_node_frag *frag, *frags;
struct jffs2_full_dirent *fd, *fds;
struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
+ /* I don't think we care about the potential race due to reading this
+ without f->sem. It can never get undeleted. */
+ int deleted = f->inocache && !f->inocache->nlink;
D1(printk(KERN_DEBUG "jffs2_clear_inode(): ino #%lu mode %o\n", inode->i_ino, inode->i_mode));
+ /* If it's a deleted inode, grab the alloc_sem. This prevents
+ jffs2_garbage_collect_pass() from deciding that it wants to
+ garbage collect one of the nodes we're just about to mark
+ obsolete -- by the time we drop alloc_sem and return, all
+ the nodes are marked obsolete, and jffs2_g_c_pass() won't
+ call iget() for the inode in question.
+ */
+ if (deleted)
+ down(&c->alloc_sem);
+
down(&f->sem);
frags = f->fraglist;
fds = f->dents;
if (f->metadata) {
- if (!f->inocache->nlink)
+ if (deleted)
jffs2_mark_node_obsolete(c, f->metadata->raw);
jffs2_free_full_dnode(f->metadata);
}
@@ -488,7 +501,7 @@ void jffs2_clear_inode (struct inode *in
if (frag->node && !(--frag->node->frags)) {
/* Not a hole, and it's the final remaining frag of this node. Free the node */
- if (!f->inocache->nlink)
+ if (deleted)
jffs2_mark_node_obsolete(c, frag->node->raw);
jffs2_free_full_dnode(frag->node);
@@ -502,5 +515,8 @@ void jffs2_clear_inode (struct inode *in
}
up(&f->sem);
+
+ if(deleted)
+ up(&c->alloc_sem);
};
--
dwmw2
* Re: Benchmarking JFFS2
2002-05-02 13:40 ` David Woodhouse
2002-05-03 17:19 ` Jarkko Lavinen
2002-05-12 19:04 ` David Woodhouse
@ 2002-10-17 10:36 ` Jarkko Lavinen
2003-01-23 12:09 ` David Woodhouse
2 siblings, 1 reply; 16+ messages in thread
From: Jarkko Lavinen @ 2002-10-17 10:36 UTC (permalink / raw)
To: MTD List, jffs-dev; +Cc: David Woodhouse
On Thu, May 02, 2002 at 02:40:58PM +0100, David Woodhouse wrote:
[ JFFS2 dirty fs poor performance discussed ]
> Neither of the above are valid excuses for the fact that write performance
> on a dirty file system sucks royally. There are some things we can do about
> that.
[ three optimizations suggested ]
In May I was using a prototype board running an ARM925T (OMAP). Since
then I have changed the testing to the TI EVM 1510 evaluation system
available from TI. In the spring I was running at 120 MHz, now at 60 MHz.
The flash was then an Intel 28F320B3 chip; now it is an Intel 28F128
StrataFlash. I am also now using a reasonably new snapshot of JFFS2, from
October 9th.
The performance last spring:
http://www.hut.fi/~jlavi/jffs2/h_fstime1605combined.png
Please note the block size scale is logarithmic. Dirty file system
performance peaked at a block size of 600..700 bytes and at the
singular point of 4096 bytes.
The performance now with the JFFS2 snapshot from October 9th:
http://www.hut.fi/~jlavi/jffs2/h_14102002_dirtyblocks_2.4.19_CVS0910.png
Please note the block size scale is linear. The dirty file system
performance now follows the clean file system performance very closely.
This is partially due to improved measurement accuracy: longer benchmark
time, using a warmup pass, and setting the benchmark to do an equal
amount of overwriting among data points. The dirty file system
performance measured this way fluctuates +- 1% with a 480 second
benchmark time.
Latencies:
http://www.hut.fi/~jlavi/jffs2/refpoint_1610_60M_+ID_t480w120_l100000_v2.4.19_CVS0910.png
This time the chart is double logarithmic. I don't see over-10-second
latencies anymore. In the latency benchmark the maximum latency of
3.3s occurred once among 100,000 samples.
Performance fluctuation due to GC:
http://www.hut.fi/~jlavi/jffs2/gc_10102002_scatterdirty_2.4.19_MTD0910.png
The sample is short but shows the general pattern of how the background GC
thread does its work.
Of course measuring write throughput is just one side of performance,
but I think it is fair to say JFFS2 write performance on a dirty file
system with low performance CPUs has improved dramatically since last
spring.
Jarkko Lavinen
* Re: Benchmarking JFFS2
2002-10-17 10:36 ` Jarkko Lavinen
@ 2003-01-23 12:09 ` David Woodhouse
2003-02-13 12:38 ` Jarkko Lavinen
0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2003-01-23 12:09 UTC (permalink / raw)
To: Jarkko Lavinen; +Cc: MTD List, jffs-dev
On Thu, 17 Oct 2002, jlavi@iki.fi said:
> Of course measuring write throughput is just one side of performance
> but I think it is fair to say JFFS2 write performane on dirty file
> system with low performance CPUs has improved dramatically since last
> spring.
Fancy repeating the test with current CVS? I just committed the
oft-discussed code required to avoid decompressing and recompressing nodes
which haven't changed -- and to avoid even doing the iget() and building up
the node lists for the inodes to which they belong in that case too.
--
dwmw2
* Benchmarking JFFS2
2003-01-23 12:09 ` David Woodhouse
@ 2003-02-13 12:38 ` Jarkko Lavinen
0 siblings, 0 replies; 16+ messages in thread
From: Jarkko Lavinen @ 2003-02-13 12:38 UTC (permalink / raw)
To: linux-mtd
On Thu, Jan 23, 2003 at 12:09:55PM +0000, David Woodhouse wrote:
> Fancy repeating the test with current CVS? I just committed the
> oft-discussed code required to avoid decompressing and recompressing nodes
> which haven't changed -- and to avoid even doing the iget() and building up
> the node lists for the inodes to which they belong in that case too.
Jan 23 CVS: latency accumulation around 1 second; after that, only
sporadic long latencies. 99.99% of the 1 million writes complete in
less than 1.3 seconds. Both frequency and time scales are logarithmic.
http://www.hut.fi/~jlavi/jffs2/refpoint_0302_60M+ID_t480w240_l1000000_linux-2.4.19_MTD2301.png
Nov 27 CVS: latency accumulations at 1, 2 and 3 seconds.
http://www.hut.fi/~jlavi/jffs2/refpoint_2811_60M_+ID_t480w120_l100000_v2.4.19_CVS2711.png
Comparing the November 27 snapshot to the January 23 snapshot, there
seems to be no difference in the average write throughput (at the block
size of 4KiB), but the write latencies and the latency distribution
differ. In the new code, latencies longer than 1 second (approximately
the time to erase one sector) are less frequent. I was still able to
catch latencies of 5..10 seconds, but with an occurrence of less than
1 in 100,000.
On a dirty file system the benchmark overwrites the file several times,
and the overwriting is sequential: write until the given file size, seek
to the beginning, overwrite until the end, seek again, and so on. I think
the garbage collection likes this sequential overwriting: there will be
plenty of very dirty erase sectors with very little live data, so GC
doesn't have to move much live data to other erase sectors.
Because of the sequential overwriting, the cost of copying live data in
GC is negligible. I have also tried overwriting at random locations
instead of in sequential order.
Here is another comparison of the results:
+--------------------+---------------+-------+--------+------+------+
|                    | Avg write spd | Avg   | 99.9%  | Max  |      |
|                    | B/s           | lat   | < lat  | lat  | Cnt  |
+--------------------+---------------+-------+--------+------+------+
| Nov27, sequential  | 39000 / 39300 | 104ms | 3200ms | 3.4s | 100k |
| Nov27, seq, no gcd | 39100 / 39100 | 104ms | 3100ms | 3.1s | 10k  |
| Jan23, sequential  | 38800 / 38700 | 105ms | 1000ms | 10s  | 1M   |
| Jan23, seq, no gcd | 38200 / 38900 | 105ms | 1200ms | 7.4s | 1M   |
| Nov27, random seek | 38000 / 40000 | 102ms | 3600ms | 4.0s | 10k  |
| Jan23, random seek | 38300 / 42600 | 97ms  | 1200ms | 1.3s | 10k  |
+--------------------+---------------+-------+--------+------+------+
There are two writing speeds mentioned in the table. The first is
without latency logging and the second is with latencies logged.
Latencies are measured by calling gettimeofday() once per loop pass.
The "Cnt" stands for how many times write() was called.
In the sequential overwrite tests the measurement error is +- 2%, and if
the measurement is repeated several times the standard deviation is 1%.
With the random seek tests I haven't measured the error.
It is interesting to notice that the throughput increases about 5% when
the latencies are measured with updates at random locations.
One can grow the maximum latency observed in the tests by running the
benchmark longer. A short maximum latency means either that the writes
really do complete faster, or that the benchmark program was not run
long enough.
Jarkko Lavinen
end of thread, other threads:[~2003-02-13 12:38 UTC | newest]
Thread overview: 16+ messages
2002-05-02 12:56 Benchmarking JFFS2 Jarkko Lavinen
2002-05-02 13:40 ` David Woodhouse
2002-05-03 17:19 ` Jarkko Lavinen
2002-05-03 17:54 ` David Woodhouse
2002-05-06 12:19 ` Jarkko Lavinen
2002-05-07 15:02 ` David Woodhouse
2002-05-07 15:13 ` Thomas Gleixner
2002-05-07 17:46 ` Jarkko Lavinen
2002-05-07 19:42 ` David Woodhouse
2002-10-08 17:10 ` David Woodhouse
2002-05-12 19:04 ` David Woodhouse
2002-05-13 8:40 ` Jarkko Lavinen
2002-05-13 9:11 ` David Woodhouse
2002-10-17 10:36 ` Jarkko Lavinen
2003-01-23 12:09 ` David Woodhouse
2003-02-13 12:38 ` Jarkko Lavinen