* Was able to reproduce "cp: cannot stat file.x: Input/output error"
@ 2004-08-06 4:54 David Dabbs
2004-08-06 7:31 ` mjt
0 siblings, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-06 4:54 UTC (permalink / raw)
To: 'ReiserFS List'
Ran
./mongo.pl INFO_R4=rc2-mm2 FSTYPE=reiser4 LOG=/home/ddabbs/mongo/reiser4.log
dev=/dev/hdb1 dir=/mnt/testfs file_size=8000 bytes=512000000 NPROC=3
REP_COUNTER=3 SYNC=off WRITE_BUFFER=4096 GAMMA=0.0 DD_MBCOUNT=768
PHASE_MKDIRS=on PHASE_MKFILES=on PHASE_COPY=cp PHASE_OVERWRITE=on
PHASE_APPEND=on PHASE_READ=find PHASE_DELETE=rm PHASE_MODIFY=on
MAX_FNAME_LEN=23 RUN
Somewhere during PHASE_COPY, errors similar to the following (the ones I saw
the other day) spewed across the terminal:
cp: cannot stat `/mnt/testfs/testdir0-0-0/d00000000/d000000000000000022/d0000000000023/f14618.d': Input/output error
I can cat and ls the file:
-rw-r--r-- 1 root root 7512 2004-08-05 21:54 /mnt/testfs/testdir0-0-0/d00000000/d000000000000000022/d0000000000023/f14622.c
Other than the following log items
...
Aug 5 22:06:09 linux kernel: reiser4[cp(1088)]: cbk_level_lookup
(fs/reiser4/search.c:1033)[vs-3533]:
Aug 5 22:06:09 linux kernel: WARNING: Keys are inconsistent. Fsck?
Aug 5 22:06:09 linux kernel: ]: key_warning
(fs/reiser4/plugin/object.c:97)[nikita-717]:
Aug 5 22:06:09 linux kernel: WARNING: Error for inode 137388 (-5)
Aug 5 22:06:09 linux kernel: reiser4[cp(1087)]: cbk_level_lookup
(fs/reiser4/search.c:1033)[vs-3533]:
Aug 5 22:06:09 linux kernel: WARNING: Keys are inconsistent. Fsck?
Aug 5 22:06:09 linux kernel: reiser4[cp(1087)]: key_warning
(fs/reiser4/plugin/object.c:97)[nikita-717]:
Aug 5 22:06:09 linux kernel: WARNING: Error for inode 137404 (-5)
Aug 5 22:06:09 linux kernel: reiser4[cp(1087)]: cbk_level_lookup
(fs/reiser4/search.c:1033)[vs-3533]:
Aug 5 22:06:09 linux kernel: WARNING: Keys are inconsistent. Fsck?
...
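The "(-5)" in the key_warning lines reads as a negated errno value. On Linux, errno 5 is EIO, whose message is the "Input/output error" that cp prints. A minimal sketch of that mapping, assuming Linux errno numbering (the helper name is hypothetical and is not part of reiser4 or mongo):

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map a kernel-style negative return value
 * (such as the -5 in the log above) to the user-space error message. */
const char *kernel_error_text(int kernel_ret)
{
    return strerror(-kernel_ret);
}
```

Calling kernel_error_text(-5) on a glibc system should give the same text that cp reported.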
I could find nothing to indicate there was any sort of disk/ide hardware
error. Yes, there were some of the (apparently) innocuous driver artifacts
as Nikita called them.
All the relevant info I could pull together is available at
http://dabbs.net/reiser4/stat_error_mongo.tar.bz2. Other than gathering this
info, I have left the system as is. Let me know if you need more info, for
me to poke around or run commands, retest, etc.
BTW, built with yesterday's r4 snapshot.
David
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
@ 2004-08-06 6:53 David Dabbs
2004-08-06 15:51 ` Vladimir V. Saveliev
0 siblings, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-06 6:53 UTC (permalink / raw)
To: 'ReiserFS List'
This is definitely reproducible. I rebooted and reran the script. Same
result at about the same point in the sequence (part way into first COPY
iteration).
Right now I'm running identical mongo config against reiserfs and it appears
to be running just fine into the second copy iteration.
Should someone from Namesys download the bzip2 tarball I mentioned before,
here's a summary of the changes in my version of mongo:
reiser_fract_tree.c
- added extensions to generated filenames by adding rnd to a base
char of backtick. Extensions generated should be in the range of
'a' to '`' + MAX_FNAME_LEN.
mongo.pl
- added PHASE_MKDIRS
Makes directories on the target device from a static list.
- added PHASE_MKFILES
Makes files on the target device from a static list.
- added MAX_FNAME_LEN option
Allows me to use something different than a hard coded value of 6.
I ran it with 23.
- Some extra LOG calls so I can see commands being executed.
David
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 4:54 David Dabbs
@ 2004-08-06 7:31 ` mjt
0 siblings, 0 replies; 47+ messages in thread
From: mjt @ 2004-08-06 7:31 UTC (permalink / raw)
To: David Dabbs; +Cc: 'ReiserFS List'
On Thu, Aug 05, 2004 at 11:54:38PM -0500, David Dabbs wrote:
>
>All the relevant info I could pull together is available at
>http://dabbs.net/reiser4/stat_error_mongo.tar.bz2. Other than gathering this
>info, I have left the system as is. Let me know if you need more info, for
>me to poke around or run commands, retest, etc.
I dunno if you have it there already, but the metadata of the filesystem
is usually good to have included..
My just-woke-up two cents :)
--
mjt
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 6:53 Was able to reproduce "cp: cannot stat file.x: Input/output error" David Dabbs
@ 2004-08-06 15:51 ` Vladimir V. Saveliev
2004-08-06 17:10 ` Philippe Gramoullé
0 siblings, 1 reply; 47+ messages in thread
From: Vladimir V. Saveliev @ 2004-08-06 15:51 UTC (permalink / raw)
To: David Dabbs; +Cc: 'ReiserFS List'
Hello
David Dabbs wrote:
> This is definitely reproducible. I rebooted and reran the script. Same
> result at about the same point in the sequence (part way into first COPY
> iteration).
>
> Right now I'm running identical mongo config against reiserfs and it appears
> to be running just fine into the second copy iteration.
>
> Should someone from Namesys downloads the bzip tar I mentioned before here's
> a summary of the changes in my version of mongo:
>
Yes, I am running your version of mongo.
> reiser_fract_tree.c
> - added extensions to generated filenames bay adding rnd to a base
> char of backtick. Extensions generated should be in the range of
> 'a' to 'a' + MAX_NAME_LEN.
>
> mongo.pl
> - added PHASE_MKDIRS
> Makes directories on the target device from a static list.
> - added PHASE_MKFILES
> Makes files on the target device from a static list.
> - added MAX_FNAME_LEN option
> Allows me to use something different than a hard coded value of 6.
> I ran it with 23.
> - Some extra LOG calls so I can see commands being executed.
>
> David
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 15:51 ` Vladimir V. Saveliev
@ 2004-08-06 17:10 ` Philippe Gramoullé
2004-08-06 17:39 ` Vladimir V. Saveliev
` (2 more replies)
0 siblings, 3 replies; 47+ messages in thread
From: Philippe Gramoullé @ 2004-08-06 17:10 UTC (permalink / raw)
To: Vladimir V. Saveliev; +Cc: David Dabbs, 'ReiserFS List'
Hello,
I've run David's version of Mongo and hit the following error during the PHASE MODIFY sequence.
reiser4[mongo_modify(4127)]: commit_current_atom (fs/reiser4/txnmgr.c:1206)[nikita-3176]:
WARNING: Flushing like mad: 16384
All 3 mongo_modify processes are stuck in D+
Running a fsck --check now.
Thanks,
Philippe
On Fri, 06 Aug 2004 19:51:58 +0400
"Vladimir V. Saveliev" <vs@namesys.com> wrote:
| > Should someone from Namesys downloads the bzip tar I mentioned before here's
| > a summary of the changes in my version of mongo:
| >
|
| Yes, i am running your version of mongo.
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:10 ` Philippe Gramoullé
@ 2004-08-06 17:39 ` Vladimir V. Saveliev
2004-08-06 19:06 ` Philippe Gramoullé
2004-08-07 4:14 ` Hans Reiser
2004-08-06 17:46 ` David Dabbs
2004-08-06 17:51 ` Alex Zarochentsev
2 siblings, 2 replies; 47+ messages in thread
From: Vladimir V. Saveliev @ 2004-08-06 17:39 UTC (permalink / raw)
To: Philippe Gramoullé; +Cc: David Dabbs, 'ReiserFS List'
Hello
Philippe Gramoullé wrote:
> Hello,
>
> I've run David's version of Mongo and hit the following error during the PHASE MODIFY sequence.
>
> reiser4[mongo_modify(4127)]: commit_current_atom (fs/reiser4/txnmgr.c:1206)[nikita-3176]:
> WARNING: Flushing like mad: 16384
>
This is a harmless message.
> All 3 mongo_modify processes are stuck in D+
>
So, it did not complete?
It is still running here.
> Running a fsck --check now.
>
> Thanks,
>
> Philippe
>
> On Fri, 06 Aug 2004 19:51:58 +0400
> "Vladimir V. Saveliev" <vs@namesys.com> wrote:
>
> | > Should someone from Namesys downloads the bzip tar I mentioned before here's
> | > a summary of the changes in my version of mongo:
> | >
> |
> | Yes, i am running your version of mongo.
>
>
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:10 ` Philippe Gramoullé
2004-08-06 17:39 ` Vladimir V. Saveliev
@ 2004-08-06 17:46 ` David Dabbs
2004-08-06 19:11 ` Philippe Gramoullé
2004-08-07 4:15 ` Hans Reiser
2004-08-06 17:51 ` Alex Zarochentsev
2 siblings, 2 replies; 47+ messages in thread
From: David Dabbs @ 2004-08-06 17:46 UTC (permalink / raw)
To: 'Philippe Gramoullé'; +Cc: 'ReiserFS List'
Philippe (or anyone else using my modified Mongo), note that if you use the
MAX_FNAME_LEN option you should restrict the value to something under 26.
The extension character is the backtick (decimal 96) plus the value of the
random number the code already generates to left-zero-pad the filename in the
sprintf, i.e.
".%c", '`' + (char) rnd
If you exceed 26 then you will generate filenames that may cause problems.
David
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:10 ` Philippe Gramoullé
2004-08-06 17:39 ` Vladimir V. Saveliev
2004-08-06 17:46 ` David Dabbs
@ 2004-08-06 17:51 ` Alex Zarochentsev
2004-08-06 19:10 ` Philippe Gramoullé
2 siblings, 1 reply; 47+ messages in thread
From: Alex Zarochentsev @ 2004-08-06 17:51 UTC (permalink / raw)
To: Philippe Gramoullé
Cc: Vladimir V. Saveliev, David Dabbs, 'ReiserFS List'
On Fri, Aug 06, 2004 at 07:10:45PM +0200, Philippe Gramoullé wrote:
>
> Hello,
>
> I've run David's version of Mongo and hit the following error during the PHASE MODIFY sequence.
>
> reiser4[mongo_modify(4127)]: commit_current_atom (fs/reiser4/txnmgr.c:1206)[nikita-3176]:
> WARNING: Flushing like mad: 16384
>
> All 3 mongo_modify processes are stuck in D+
did vmstat(8) show no disk i/o?
>
> Running a fsck --check now.
>
> Thanks,
>
> Philippe
>
> On Fri, 06 Aug 2004 19:51:58 +0400
> "Vladimir V. Saveliev" <vs@namesys.com> wrote:
>
> | > Should someone from Namesys downloads the bzip tar I mentioned before here's
> | > a summary of the changes in my version of mongo:
> | >
> |
> | Yes, i am running your version of mongo.
--
Alex.
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:39 ` Vladimir V. Saveliev
@ 2004-08-06 19:06 ` Philippe Gramoullé
2004-08-07 4:14 ` Hans Reiser
1 sibling, 0 replies; 47+ messages in thread
From: Philippe Gramoullé @ 2004-08-06 19:06 UTC (permalink / raw)
To: Vladimir V. Saveliev; +Cc: David Dabbs, 'ReiserFS List'
Hello,
On Fri, 06 Aug 2004 21:39:15 +0400
"Vladimir V. Saveliev" <vs@namesys.com> wrote:
| > reiser4[mongo_modify(4127)]: commit_current_atom (fs/reiser4/txnmgr.c:1206)[nikita-3176]:
| > WARNING: Flushing like mad: 16384
| >
| This is harmless message
Ok,
|
| > All 3 mongo_modify processes are stuck in D+
| >
|
| So, it did not complete?
no
| It is working yet here
Here mongo_modify processes were stuck in D+ state doing nothing, strace has shown no activity.
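A process shown as "D" by ps is in uninterruptible sleep inside the kernel, which is why strace sees no system-call activity. A minimal sketch of reading that state letter directly, assuming a Linux /proc filesystem (the function name is hypothetical):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Return the one-letter state from /proc/<pid>/stat ('R', 'S', 'D', ...),
 * or '?' if the file cannot be read or parsed. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    FILE *fp;
    char *p;
    size_t n;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (!fp)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    /* The command name is wrapped in parentheses and may itself contain
     * spaces or ')', so the state field is located after the last ')'. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```

Polling proc_state() for each mongo_modify PID alongside vmstat would show whether the processes stay in 'D' while no disk I/O is happening.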
Thanks,
Philippe
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:51 ` Alex Zarochentsev
@ 2004-08-06 19:10 ` Philippe Gramoullé
0 siblings, 0 replies; 47+ messages in thread
From: Philippe Gramoullé @ 2004-08-06 19:10 UTC (permalink / raw)
To: Alex Zarochentsev
Cc: Vladimir V. Saveliev, David Dabbs, 'ReiserFS List'
Hello Alex,
I was in a hurry to leave and didn't have a look.
I'll try to reproduce the problem and get a better description of it.
Thanks,
Philippe
On Fri, 6 Aug 2004 21:51:57 +0400
Alex Zarochentsev <zam@namesys.com> wrote:
| > All 3 mongo_modify processes are stuck in D+
|
| did vmstat(8) show no disk i/o?
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:46 ` David Dabbs
@ 2004-08-06 19:11 ` Philippe Gramoullé
2004-08-07 4:15 ` Hans Reiser
1 sibling, 0 replies; 47+ messages in thread
From: Philippe Gramoullé @ 2004-08-06 19:11 UTC (permalink / raw)
To: David Dabbs; +Cc: 'ReiserFS List'
Hello David,
I copy/pasted the exact command line that you posted at the beginning of this thread, so with MAX_FNAME_LEN=23.
Thanks for the info.
Philippe
On Fri, 6 Aug 2004 12:46:02 -0500
"David Dabbs" <david@dabbs.net> wrote:
|
| If you exceed 26 then you will generate filenames that may cause problems.
|
| David
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:39 ` Vladimir V. Saveliev
2004-08-06 19:06 ` Philippe Gramoullé
@ 2004-08-07 4:14 ` Hans Reiser
1 sibling, 0 replies; 47+ messages in thread
From: Hans Reiser @ 2004-08-07 4:14 UTC (permalink / raw)
To: Vladimir V. Saveliev
Cc: Philippe Gramoullé, David Dabbs, 'ReiserFS List'
Vladimir V. Saveliev wrote:
> Hello
>
> Philippe Gramoullé wrote:
>
>> Hello,
>>
>> I've run David's version of Mongo and hit the following error during
>> the PHASE MODIFY sequence.
>>
>> reiser4[mongo_modify(4127)]: commit_current_atom
>> (fs/reiser4/txnmgr.c:1206)[nikita-3176]:
>> WARNING: Flushing like mad: 16384
>>
> This is harmless message
Then remove it.
>
>> All 3 mongo_modify processes are stuck in D+
>>
>
> So, it did not complete?
> It is working yet here
>
>> Running a fsck --check now.
>>
>> Thanks,
>>
>> Philippe
>>
>> On Fri, 06 Aug 2004 19:51:58 +0400
>> "Vladimir V. Saveliev" <vs@namesys.com> wrote:
>>
>> | > Should someone from Namesys downloads the bzip tar I mentioned
>> before here's
>> | > a summary of the changes in my version of mongo:
>> | > | | Yes, i am running your version of mongo.
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-06 17:46 ` David Dabbs
2004-08-06 19:11 ` Philippe Gramoullé
@ 2004-08-07 4:15 ` Hans Reiser
2004-08-07 6:46 ` David Dabbs
1 sibling, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2004-08-07 4:15 UTC (permalink / raw)
To: David Dabbs; +Cc: 'Philippe Gramoullé', 'ReiserFS List'
David Dabbs wrote:
>Philippe (or anyone else using my modified Mongo), note that if you use the
>MAX_FNAME_LEN option you should restrict the value to something under 26.
>The extension character is the backtick (dec 96) + the value of the random
>number the code already generates left zero pad the filename in the sprintf.
>i.e.
>
> "%.c", '`' + (char) rnd
>
>If you exceed 26 then you will generate filenames that may cause problems.
>
>
Problems that indicate reiser4 bugs or mongo bugs?
>David
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-07 4:15 ` Hans Reiser
@ 2004-08-07 6:46 ` David Dabbs
2004-08-07 7:49 ` Hans Reiser
0 siblings, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-07 6:46 UTC (permalink / raw)
To: 'Hans Reiser'; +Cc: 'ReiserFS List'
> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Friday, August 06, 2004 11:16 PM
> To: David Dabbs
> Cc: 'Philippe Gramoullé'; 'ReiserFS List'
> Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> error"
>
> David Dabbs wrote:
>
> >i.e.
> >
> > "%.c", '`' + (char) rnd
> >
> >If you exceed 26 then you will generate filenames that may cause
> problems.
> >
> >
> Problems that indicate reiser4 bugs or mongo bugs?
>
Mongo. The pipe character, for example, is a shell metacharacter and causes
trouble in an unquoted filename.
I think I have discovered the problem. Unless there was a reason for it, mongo
was issuing mount/unmount commands at the start/end of a mongo 'run' as well as
before/after _each phase_. I just removed the mount/unmount calls that
bracket each phase execution and I am not seeing the errors. In addition,
reiser4 is running much faster. For instance, with the 'extra'
mount/unmounts r4 reported REAL TIME stats of approximately 700 & 40 for
CREATE and MKFILES. After removing the mount/unmount calls it is reporting
224 & 50 REAL TIME.
I will reboot and run a sequence of each of the filesystem configs overnight
and see what happens.
David
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-07 6:46 ` David Dabbs
@ 2004-08-07 7:49 ` Hans Reiser
2004-08-08 2:54 ` David Dabbs
2004-08-10 3:21 ` Valdis.Kletnieks
0 siblings, 2 replies; 47+ messages in thread
From: Hans Reiser @ 2004-08-07 7:49 UTC (permalink / raw)
To: David Dabbs; +Cc: 'ReiserFS List'
David Dabbs wrote:
>
>
>>-----Original Message-----
>>From: Hans Reiser [mailto:reiser@namesys.com]
>>Sent: Friday, August 06, 2004 11:16 PM
>>To: David Dabbs
>>Cc: 'Philippe Gramoullé'; 'ReiserFS List'
>>Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
>>error"
>>
>>David Dabbs wrote:
>>
>>
>>
>>>i.e.
>>>
>>> "%.c", '`' + (char) rnd
>>>
>>>If you exceed 26 then you will generate filenames that may cause
>>>
>>>
>>problems.
>>
>>
>>>
>>>
>>Problems that indicate reiser4 bugs or mongo bugs?
>>
>>
>>
>
>Mongo. The pipe character is not legal in a filename as an example.
>
>I think I have discovered the problem - unless there was a reason mongo was
>issuing mount/unmount commands at the start/end of a mongo 'run' as well as
>before/after _each phase_. I just removed the mount/unmount calls that
>bracket each phase execution and I am not seeing the errors. In addition,
>reiser4 is running much faster. For instance, with the 'extra'
>mount/unmounts r4 reported a REAL TIME stats of approximately 700 & 40 for
>CREATE and MKFILES. After removing the mount/un calls it is reporting 224 &
>50 REAL TIME.
>
>
can you quote code more in these emails?
>I will reboot and run a sequence of each of the filesystem configs overnight
>and see what happens.
>
>David
Probably someone wanted to separate the measurement of the phases. It
has been a while since I read mongo.....
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-07 7:49 ` Hans Reiser
@ 2004-08-08 2:54 ` David Dabbs
2004-08-10 3:21 ` Valdis.Kletnieks
1 sibling, 0 replies; 47+ messages in thread
From: David Dabbs @ 2004-08-08 2:54 UTC (permalink / raw)
Cc: 'ReiserFS List'
> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Saturday, August 07, 2004 2:50 AM
> To: David Dabbs
> Cc: 'ReiserFS List'
> Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> error"
>
> >
> >Mongo. The pipe character is not legal in a filename as an example.
> >
> >I think I have discovered the problem - unless there was a reason mongo
> was
> >issuing mount/unmount commands at the start/end of a mongo 'run' as well
> as
> >before/after _each phase_. I just removed the mount/unmount calls that
> >bracket each phase execution and I am not seeing the errors. In addition,
> >reiser4 is running much faster. For instance, with the 'extra'
> >mount/unmounts r4 reported a REAL TIME stats of approximately 700 & 40
> for
> >CREATE and MKFILES. After removing the mount/un calls it is reporting 224
> &
> >50 REAL TIME.
> >
> >
> can you quote code more in these emails?
>
Instead of cutting/pasting sections here see
http://dabbs.net/reiser4/mongopl.html. I have annotated my modifications to
mongo's mount handling with 'dmd.' There are other non-annotated changes I
have made, but they are not germane to this issue. If you want to know
exactly what is different, diff this with the mongo.pl in the build Elena
recently uploaded at
http://thebsh.namesys.com/benchmarks/mongo-2004.07.26.tar.gz.
I'm still rerunning tests to see if I can reproduce, but time is limited
this weekend.
David
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
[not found] <4115A979.5090002@namesys.com>
@ 2004-08-08 7:07 ` David Dabbs
2004-08-08 18:08 ` Hans Reiser
0 siblings, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-08 7:07 UTC (permalink / raw)
To: 'Hans Reiser'; +Cc: 'ReiserFS List'
> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Saturday, August 07, 2004 11:18 PM
> To: David Dabbs
> Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> error"
>
> >
> How about, code was X, now it is Y, with just the relevant parts of the
> code cited?
Probably the most significant change to mongo sources is how it generates
file names. Mongo.pl had a hard coded value of 6 for the "max_fname"
parameter passed to reiser_fract_tree. I made this a parameter one can
specify on the mongo command line. I passed it a value of 23.
In addition, reiser_fract_tree.c now generates file names (not directories)
with extensions. Before, every file name generated had none. Here is the
original and modified code.
CODE WAS
/* generate a unique filename */
void get_name_by_number(num_t this_files_number, char * str, char type)
{
    double rnd;
    char t[16];
    /* We need to generate filenames of different lengths */
    rnd=rand();
    rnd=rnd/RAND_MAX*max_fname+1;
    sprintf (t, "%%c%%0%ulu",(int) rnd );
    sprintf (str, t, type, this_files_number);
}
NOW CODE IS
/* generate a unique filename */
void get_name_by_number(num_t this_files_number, char * str, char type)
{
    double rnd;
    char t[16];
    /* We need to generate filenames of different lengths */
    rnd=rand();
    rnd=rnd/RAND_MAX*max_fname+1;
    if( type == 'f' )
        sprintf (t, "%%c%%0%ulu.%c",(int) rnd, '`'+(char)rnd );
    else
        sprintf (t, "%%c%%0%ulu",(int) rnd );
    sprintf (str, t, type, this_files_number);
}
Backtick is the character immediately preceding 'a', so a max_fname large
enough to push the sum past 'z' (anything over 26) yields '{', '|', '}' or
'~' as the extension character, some of which are shell metacharacters.
Hence my earlier warning about using this parameter.
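The arithmetic behind that limit can be checked in isolation. The sketch below assumes ASCII and uses hypothetical helper names; it mirrors only the '`' + rnd expression from the modified code. Because rnd/RAND_MAX*max_fname+1 reaches max_fname+1 when rand() returns RAND_MAX, a max_fname of exactly 26 can still occasionally produce '{', which is a reason to stay under 26.

```c
/* Extension character as the modified get_name_by_number() computes it:
 * backtick (ASCII 96) plus the integer part of rnd. Assumes ASCII. */
char ext_char(int rnd)
{
    return (char)('`' + rnd);
}

/* Nonzero if rnd still maps to a lowercase letter, i.e. 1 <= rnd <= 26. */
int ext_is_letter(int rnd)
{
    char c = ext_char(rnd);
    return c >= 'a' && c <= 'z';
}

/* Largest value (int)rnd can take for a given max_fname: the expression
 * rnd/RAND_MAX*max_fname+1 equals max_fname+1 when rand() == RAND_MAX. */
int max_rnd(int max_fname)
{
    return max_fname + 1;
}
```

With max_fname=23, as in the reported run, every possible extension is a letter; with 27 or more, '|' (rnd of 28) becomes reachable.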
I also added two phases to mongo: MKDIRS and MKFILES. The first makes
directories from a static list of directories cat-ted from a file. MKFILES
does the same, except with files. The files are created by executing the
following code taken directly from reiser_fract_tree:
/* make a file of a specified size */
void make_file(int size, char * fname)
{
    char string [1025] = {0};
    char * str = string;
    int fd = 0;
    int error;
    static num_t this_files_number = 1;
    /* open the file, and deal with the various errors that can occur */
    if ((fd = open(fname, g_flags, 0666)) == -1 ) {
        if (errno == ENOSPC) {
            if (!already_whined) {
                printf("reiser-2021A: out of disk (or inodes) space, will keep trying\n");
                already_whined = 1; /* we continue other file creation in out of
                                       space conditions */
            }
            return;
        }
        /* it is sometimes useful to be able to run this program more than once
           inside the same directory, and that means skipping over filenames that
           already exist. Thus we ignore EEXIST, and pay attention to all
           else. */
        if ( errno == EEXIST) { /* just skip existing file */
            return;
        }
        perror ("open");
        exit (errno);
    }
    /* close the file */
    if (close(fd)) {
        perror("close() failed");
        exit(errno);
    }
}
Since the above code simply opens and closes the file, every file created is
zero length, and it creates _many_ zero-length files. Before, all files
created during a mongo run had some content, though I'd need to go back and
check the size distribution function to be sure.
Finally, there is the bit about mounting and unmounting. Originally, mongo.pl
first unmounted the target filesystem, then ran the proper mkfs command,
mounted the filesystem, ran df to get the block usage, then unmounted the
filesystem. To run each phase, mongo called the function mongo_launcher,
which mounted the filesystem, ran the command for N iterations, then
unmounted the filesystem.
I made the following changes, see http://dabbs.net/reiser4/mongopl.html for
more context.
In init_fsys:
don't unmount the filesystem after mkfs and df.
In mongo_launcher:
don't call &mount_fsys; at the beginning and &umount_fsys; at the end of
each phase.
Only call umount_fsys at the end of function mongo.
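The difference between the two mount strategies can be written out as a sequence of steps. mongo.pl itself is Perl; the sketch below is C only to match the other code in this thread, and every name in it is hypothetical. It records 'M' (mount), 'R' (run one iteration), 'S' (sync) and 'U' (umount) in order, assuming a sync after each iteration and, in the modified flow, one more sync before the final umount.

```c
#include <stddef.h>

/* Append one step code to trace, keeping it NUL-terminated. */
static void add_step(char *trace, size_t cap, size_t *len, char code)
{
    if (*len + 1 < cap) {
        trace[(*len)++] = code;
        trace[*len] = '\0';
    }
}

/* Write the step sequence for nphases phases of reps iterations each.
 * remount_each_phase != 0 models the original per-phase mount/umount;
 * 0 models mounting once and unmounting once at the very end.
 * Returns the number of steps written. */
size_t plan_run(int nphases, int reps, int remount_each_phase,
                char *trace, size_t cap)
{
    size_t len = 0;
    int phase, rep;

    if (cap > 0)
        trace[0] = '\0';
    if (!remount_each_phase)
        add_step(trace, cap, &len, 'M');
    for (phase = 0; phase < nphases; phase++) {
        if (remount_each_phase)
            add_step(trace, cap, &len, 'M');
        for (rep = 0; rep < reps; rep++) {
            add_step(trace, cap, &len, 'R');
            add_step(trace, cap, &len, 'S');
        }
        if (remount_each_phase)
            add_step(trace, cap, &len, 'U');
    }
    if (!remount_each_phase) {
        add_step(trace, cap, &len, 'S');
        add_step(trace, cap, &len, 'U');
    }
    return len;
}
```

For two phases of two iterations, the original flow gives MRSRSUMRSRSU and the modified flow gives MRSRSRSRSSU, so the per-phase timings no longer include a mount/umount pair.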
David
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-08 7:07 ` David Dabbs
@ 2004-08-08 18:08 ` Hans Reiser
2004-08-08 19:09 ` David Dabbs
2004-08-09 15:13 ` Nikita Danilov
0 siblings, 2 replies; 47+ messages in thread
From: Hans Reiser @ 2004-08-08 18:08 UTC (permalink / raw)
To: David Dabbs
Cc: 'ReiserFS List', Nikita Danilov, Alexander Zarochentcev,
E. Gryaznova
David Dabbs wrote:
>
>
>>-----Original Message-----
>>From: Hans Reiser [mailto:reiser@namesys.com]
>>Sent: Saturday, August 07, 2004 11:18 PM
>>To: David Dabbs
>>Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
>>error"
>>
>>
>>
>
>
>
>>How about, code was X, now it is Y, with just the relevant parts of the
>>code cited?
>>
>>
>
>Probably the most significant change to mongo sources is how it generates
>file names. Mongo.pl had a hard coded value of 6 for the "max_fname"
>parameter passed to reiser_fract_tree. I made this a parameter one can
>specify on the mongo command line. I passed it a value of 23.
>
>
most filenames are small. We have an optimization in reiser4 that
assumes filenames are less than 15 characters, which most of them are.
Maybe you should use a list of real filenames as well as a list of real
directory names. You can reuse the filenames, so long as they are
unique within a directory.
Hmm, it occurs to me that rather than using the 7 character filename
prefix in the hash we should use first 4 letters and last 3 letters.
Nikita, zam, what do you think?
>In addition, reiser_fract_tree.c now generates file names (not directories)
>with extensions. Before, every file name generated had none. Here is the
>original and modified code.
>
>
>CODE WAS
>/* generate a unique filename */
>void get_name_by_number(num_t this_files_number, char * str, char type)
>{
> double rnd;
> char t[16];
> /* We need to generate filenames of different lengths */
> rnd=rand();
> rnd=rnd/RAND_MAX*max_fname+1;
> sprintf (t, "%%c%%0%ulu",(int) rnd );
> sprintf (str, t, type, this_files_number);
>
>}
>
>NOW CODE IS
>
>/* generate a unique filename */
>void get_name_by_number(num_t this_files_number, char * str, char type)
>{
> double rnd;
> char t[16];
> /* We need to generate filenames of different lengths */
> rnd=rand();
> rnd=rnd/RAND_MAX*max_fname+1;
> if( type == 'f' )
> sprintf (t, "%%c%%0%ulu.%c",(int) rnd, '`'+(char)rnd );
> else
> sprintf (t, "%%c%%0%ulu",(int) rnd );
>
>
> sprintf (str, t, type, this_files_number);
>
>}
>
>Backtick is the character immediately preceding 'a', so passing a value for
>max_fname that generates a character greater than 'z' might generate an
>extension character that might be a shell metacharacter. Hence my earlier
>warning about using this parameter.
>
>Finally, I added two phases to mongo. MKDIRS and MKFILES. The first makes
>directories from a static list of directories cat-ted from a file. MKFILES
>does the same, except with files. The files are created by executing the
>following code taken directory from reiser_fract_tree:
>
>/* make a file of a specified size */
>void make_file(int size, char * fname)
>{
> char string [1025] = {0};
> char * str = string;
> int fd = 0;
> int error;
> static num_t this_files_number = 1;
>
> /* open the file, and deal with the various errors that can occur */
>
> if ((fd = open(fname, g_flags, 0666)) == -1 ) {
> if (errno == ENOSPC) {
> if (!already_whined) {
> printf("reiser-2021A: out of disk (or inodes) space, will keep
>trying\n");
> already_whined = 1; /* we continue other file creation in out of
> space conditions */
> }
> return;
> }
> /* it is sometimes useful to be able to run this program more than once
> inside the same directory, and that means skipping over filenames
>that
> already exist. Thus we ignore EEXIST, and pay attention to all
> else. */
> if ( errno == EEXIST) { /* just skip existing file */
> return;
> }
> perror ("open");
> exit (errno);
> }
>
> /* close the file */
> if (close(fd)) {
> perror("close() failed");
> exit(errno);
> }
>}
>
>Since the above code simply opens and closes the file, every file created is
>zero length, and it creates _many_ zero-length files.
>
Why would you want that?
> Before, all files
>created during a mongo run had some content, though I'd need to go back and
>check the size distribution function to be sure.
>
>
>Finally, there is the bit about mounting and unmounting. Mongo.pl first
>unmounts the target filesystem, then runs the proper mkfs command, mounted
>the filesystem, ran df to get the block usage, then unmounted the
>filesystem. To run each phase, mongo calls the function mongo_launcher,
>which mounted the filesystem, ran the command for N iterations, then
>unmounted the filesystem.
>
>I made the following changes, see http://dabbs.net/reiser4/mongopl.html for
>more context.
>
>In
>
>init_fsys:
> don't unmount the filesystem after mkfs * df.
>
>mongo_launcher:
> don't call &mount_fsys; at beginning and &umount_fsys; at end of each
>phase
>
>
>Only umount_fsys at end of function mongo.
>
>
>David
ok. Unmounting between phases was probably a bad idea, as it benchmarks
unmounting. It should be fixed.
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-08 18:08 ` Hans Reiser
@ 2004-08-08 19:09 ` David Dabbs
2004-08-09 6:17 ` Hans Reiser
2004-08-09 15:13 ` Nikita Danilov
1 sibling, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-08 19:09 UTC (permalink / raw)
To: 'Hans Reiser'
Cc: 'ReiserFS List', 'Nikita Danilov',
'Alexander Zarochentcev', 'E. Gryaznova'
> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Sunday, August 08, 2004 1:09 PM
> To: David Dabbs
> Cc: 'ReiserFS List'; Nikita Danilov; Alexander Zarochentcev; E. Gryaznova
> Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> error"
>
> most filenames are small. We have an optimization in reiser4 that
> assumes filenames are less than 15 characters, which most of them are.
>
This is true, but most file names are also not six characters or less with
no extensions.
> Maybe you should use a list of real filenames as well as a list of real
> directory names. You can reuse the filenames, so long as they are
> unique within a directory.
>
I'm not sure uniqueness is a requirement, as mongo allows for duplicate
filenames. I did use real dirs and files. This is what the MKDIRS and
MKFILES phases do. I generated a list of most of the actual directories and
files in my / partition. In my config these are created by MKFILES and
MKDIRS before the 'stock' mongo phases run.
> Hmm, it occurs to me that rather than using the 7 character filename
> prefix in the hash we should use first 4 letters and last 3 letters.
>
> Nikita, zam, what do you think?
>
Then you wouldn't store directories and filenames in lexicographic order as
you originally intended.
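A small sketch of that objection, under one hypothetical reading of the proposal: take the first 4 and last 3 characters of a long name as the 7-byte ordering prefix. This is not reiser4's actual key layout, and the function name is made up for illustration.

```c
#include <string.h>

/* Build a 7-character prefix from the first 4 and last 3 characters of
 * name. Names of 7 characters or fewer are copied whole. out must have
 * room for 8 bytes and is always NUL-terminated. */
void prefix_4_3(const char *name, char out[8])
{
    size_t len = strlen(name);

    memset(out, 0, 8);
    if (len <= 7) {
        memcpy(out, name, len);
        return;
    }
    memcpy(out, name, 4);
    memcpy(out + 4, name + len - 3, 3);
}
```

"abcd_first_zzz" sorts before "abcd_second_aaa" lexicographically, but their prefixes "abcdzzz" and "abcdaaa" compare the other way round, so entries ordered by such a prefix would no longer follow name order.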
> >
> >Since the above code simply opens and closes the file, every file created
> is zero length, and it creates _many_ zero-length files.
> >
> Why would you want that?
I didn't explicitly _want_ zero length files, I just did the minimum to
generate the files. I can easily go back and write one byte to the files,
but that shouldn't really have an impact on the operations against the
files created by reiser_fract_tree and later
modified/read/overwritten/deleted by later phases because the files created
by MKFILES are in directories not manipulated by the stock mongo phases.
They use the files and directories created by reiser_fract_tree. The reason
I added the dirs and files created by MKDIRS/FILES was to have mongo work
with a filesystem 'loaded' with as much metadata as a real system might
have. If this adds nothing to the benchmarking then simply run with
MKFILES=off and MKDIRS=off.
> >
> >I made the following changes, see http://dabbs.net/reiser4/mongopl.html
> for more context.
> >
> >In
> >
> >init_fsys:
> > don't unmount the filesystem after mkfs * df.
> >
> >mongo_launcher:
> > don't call &mount_fsys; at beginning and &umount_fsys; at end of each
> >phase
> >
> >
> >Only umount_fsys at end of function mongo.
> >
> >
> >David
> >
> >
> ok. Unmounting between phases was probably a bad idea, as it benchmarks
> unmounting. It should be fixed.
Mounting/umounting between phases shouldn't cause errors, but I focused
there because the errors I saw didn't begin until the copy phase, which is
the first phase where mongo _reads_ what was created. Perhaps something is
not being committed, at least with my hardware/config, despite the fact that
there's a sync at the end of every phase iteration. I also added a sync
command before the final umount.
David
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 6:17 ` Hans Reiser
@ 2004-08-08 21:40 ` David Dabbs
2004-08-09 0:01 ` Hans Reiser
0 siblings, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-08 21:40 UTC (permalink / raw)
To: 'Hans Reiser'
Cc: 'ReiserFS List', 'Nikita Danilov',
'Alexander Zarochentcev', 'E. Gryaznova'
> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Monday, August 09, 2004 1:18 AM
> To: David Dabbs
> Cc: 'ReiserFS List'; 'Nikita Danilov'; 'Alexander Zarochentcev'; 'E.
> Gryaznova'
> Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> error"
>
>
...snip...
> Hans:
> yes, it [hashing] is only a good idea for filenames larger than 15
> characters.
>
Actually, larger than 7 + 8 + 8 characters, if LARGE_KEY is used. This means
that no keys were hashed in my tests, or the original mongo for that matter,
since no file name would have been considered 'large' in kassign.c. I should
up the file name limit slightly to exercise it with at least some hashed
keys.
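
As a rough sketch of the arithmetic above (not the kassign.c code; the
function and constant names are hypothetical, and the 7 + 8 limit for the
non-LARGE_KEY case is an assumption inferred from the "15 characters"
figure):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits: 7 + 8 characters without LARGE_KEY, 7 + 8 + 8 with it. */
enum {
    SHORT_KEY_NAME_MAX = 7 + 8,
    LARGE_KEY_NAME_MAX = 7 + 8 + 8
};

/* Returns 1 if a name of length len is too long to be embedded in the key
 * and would therefore have to be hashed, 0 otherwise. */
int name_needs_hash(size_t len, int large_key)
{
    size_t limit = large_key ? LARGE_KEY_NAME_MAX : SHORT_KEY_NAME_MAX;

    assert(len > 0); /* a file name is never empty */
    return len > limit;
}
```

Under these assumptions a run with MAX_FNAME_LEN=23 and LARGE_KEY never
reaches the hashed path; a limit of 24 or more would.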
> >
> it is extremely unrealistic to create the files in one phase and write
> to them in the next phase. the filesystem will behave completely
> differently.
>
Of course. The extra files I added are created once, and should never be
touched again since mongo only operates on the objects it creates. I am
simply loading the test filesystem with a reasonable facsimile of a 'normal'
Linux file set (with the exception that the files are all zero length). To
make these 'static' dirs/files really useful, mongo would need to be written
to use these directories for its randomly created files. As for my files all
being zero length, it would be easy to give them all some small file
content. If these additional dirs/files as currently created add no value to
the test, then do not enable these phases.
On top of this Mongo randomly creates (in PHASE CREATE) its directories and
files, with each file filled at creation with a random number of 'a'
characters. Each subsequent phase, if specified, does one of the following
operations in the order you see below. Since you said you hadn't looked at
mongo for some time, I have included the commands mongo executes. These
examples are from a run with three processes and three iterations, so each
command below was run three times in succession. Note that mongo drives each
phase from a file list, so the files are always manipulated in the same
order - the order in which they were created.
a) COPY command: Copy all files from some base iteration directory.
(update-flist.pl /mnt/testfs/testdir0-0-0 /mnt/testfs/testdir1-0-0 \
    /var/tmp/mongo.flist0-0-0 /var/tmp/mongo.flist1-0-0 || touch ERR.file ) &
(update-flist.pl /mnt/testfs/testdir0-0-1 /mnt/testfs/testdir1-0-1 \
    /var/tmp/mongo.flist0-0-1 /var/tmp/mongo.flist1-0-1 || touch ERR.file ) &
(update-flist.pl /mnt/testfs/testdir0-0-2 /mnt/testfs/testdir1-0-2 \
    /var/tmp/mongo.flist0-0-2 /var/tmp/mongo.flist1-0-2 || touch ERR.file ) &
wait; sync
/*******************************************************************/
update-flist.pl
($DIR0, $DIR1, $FLIST0, $FLIST1) = @ARGV;
system("cp -r $DIR0 $DIR1");
open(FL0, $FLIST0) || DIE("Cannot open $FLIST0");
open(FL1, ">$FLIST1") || DIE("Cannot open $FLIST1");
while (<FL0>) {
    s|^$DIR0|$DIR1|;
    print FL1 $_;
}
close(FL1);
close(FL0);
/*******************************************************************/
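
The file-list rewrite in update-flist.pl amounts to swapping a leading
directory prefix on each line. A minimal C sketch of that one step follows;
rewrite_prefix is a hypothetical name, and unlike the Perl substitution it
compares the prefix literally rather than treating it as a pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If line starts with old_dir, write new_dir plus the rest of the line to
 * out; otherwise copy line unchanged. Returns 0 on success, -1 if out is
 * too small to hold the result. */
int rewrite_prefix(const char *line, const char *old_dir,
                   const char *new_dir, char *out, size_t out_size)
{
    size_t old_len = strlen(old_dir);
    int n;

    assert(out_size > 0);
    if (strncmp(line, old_dir, old_len) == 0)
        n = snprintf(out, out_size, "%s%s", new_dir, line + old_len);
    else
        n = snprintf(out, out_size, "%s", line);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Applied to a line such as "/mnt/testfs/testdir0-0-0/d0/f1.c" with the
testdir0-0-0 and testdir1-0-0 directories from the command above, the result
names the same file under testdir1-0-0.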
b) APPEND command:
(cat /var/tmp/mongo.flist[01]-0-0 | grep -v -e /\$ | \
    mongo_append 0.5 4096 off || touch ERR.file ) &
(cat /var/tmp/mongo.flist[01]-0-1 | grep -v -e /\$ | \
    mongo_append 0.5 4096 off || touch ERR.file ) &
(cat /var/tmp/mongo.flist[01]-0-2 | grep -v -e /\$ | \
    mongo_append 0.5 4096 off || touch ERR.file ) &
wait; sync
/*******************************************************************/
mongo_append.c
    while (getline(&line_buffer, &line_buffer_size, stdin) != -1) {
        int fd, written = 0;
        char * lf_pos;
        if ((lf_pos = index(line_buffer, '\n')) != NULL) *lf_pos = '\0';
        if ( (fd = open(line_buffer,O_RDWR)) == -1) {
            fprintf(stderr, "%s :", line_buffer);
            perror("cannot open file");
            continue;
        }
        append_size = append_factor * lseek(fd, 0, SEEK_END);
        while (written < append_size) {
            int error;
            written += (error = write(fd, buffer,
                ( append_size - written < writesize ) ?
                    append_size - written : writesize));
            if (error == -1) {
                close(fd);
                break;
            }
        }
        if (use_fsync) {
            if (fsync(fd) < 0) {
                fprintf(stderr, "%s : failed to fsync file\n", line_buffer);
                exit(1);
            }
        }
        close(fd);
    }
    free (buffer);
    free (line_buffer);
    return 0;
/*******************************************************************/
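
Note that the excerpt adds the return value of write() to `written` before
testing it for -1, and on a write error closes fd inside the loop and then
falls through to fsync() and a second close(). A hedged sketch of the same
loop that checks the result first and leaves closing to the caller might
look like this (append_bytes is a hypothetical name, not part of mongo):

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Append total bytes to fd, reusing the buf_size bytes in buf for every
 * chunk. Returns the number of bytes written, or -1 on the first write
 * error. The caller keeps ownership of fd and closes it exactly once. */
long append_bytes(int fd, const char *buf, size_t buf_size, long total)
{
    long written = 0;

    assert(buf_size > 0);
    while (written < total) {
        size_t left = (size_t)(total - written);
        size_t chunk = left < buf_size ? left : buf_size;
        ssize_t n = write(fd, buf, chunk);

        if (n < 0)
            return -1;  /* do not fold the error into the byte count */
        if (n == 0)
            break;      /* no progress; stop instead of spinning */
        written += n;
    }
    return written;
}
```

A caller would compute total as append_factor times the current file size,
exactly as the excerpt does, and decide for itself whether to fsync().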
c) MODIFY command:
(cat /var/tmp/mongo.flist[01]-0-0 | grep -v -e /\$ | \
    mongo_modify 0.02 4096 off || touch ERR.file ) &
(cat /var/tmp/mongo.flist[01]-0-1 | grep -v -e /\$ | \
    mongo_modify 0.02 4096 off || touch ERR.file ) &
(cat /var/tmp/mongo.flist[01]-0-2 | grep -v -e /\$ | \
    mongo_modify 0.02 4096 off || touch ERR.file ) &
wait; sync
/*******************************************************************/
mongo_modify.c
    while (getline(&line_buffer, &line_buffer_size, stdin) != -1) {
        int fd;
        off_t fileend;
        char * lf_pos;
        if ((lf_pos = index(line_buffer, '\n')) != NULL) *lf_pos = '\0';
        if ( (fd = open(line_buffer,O_RDWR)) == -1) {
            perror("cannot open file");
            continue;
        }
        fileend = lseek(fd,0,SEEK_END);
        if (fileend == (off_t) - 1) {
            perror("lseek failed");
            close (fd);
            continue;
        }
        if (!fileend) {
            /* nothing to modify */
            close (fd);
            continue;
        }
        {
            off_t region_size = (double)fileend * modify_factor;
            off_t write_pos = (double)(fileend - region_size) * rand() /
                (RAND_MAX + 1.0);
            size_t bytes = 0;
            if (lseek(fd, write_pos, SEEK_SET) == (off_t) - 1) {
                perror("lseek failed");
                close (fd);
                continue;
            }
            while (bytes < region_size) {
                size_t write_count = region_size - bytes;
                if (write_count > write_buffer_size)
                    write_count = write_buffer_size;
                write_count = write(fd, buffer, write_count);
                if (write_count == 0) break;
                if (write_count == -1) {
                    perror("write failed");
                    break;
                }
                bytes += write_count;
            }
        }
        if (use_fsync) {
            if (fsync(fd) < 0) {
                fprintf(stderr, "%s : failed to fsync file\n", line_buffer);
                exit(1);
            }
        }
        close(fd);
    }
    free (buffer);
    free (line_buffer);
    return 0;
/*******************************************************************/
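
The region arithmetic in mongo_modify.c can be isolated as below. This is
only a sketch: the struct and function names are hypothetical, and the
random value is passed in rather than taken from rand() so the result is
easy to check by hand.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical container for the region that gets rewritten. */
struct region {
    off_t pos;   /* offset at which writing starts */
    off_t size;  /* number of bytes to rewrite */
};

/* Same arithmetic as the excerpt: the region is modify_factor of the file,
 * placed at a random offset that keeps it inside the file. r stands in for
 * the value rand() would return. */
struct region pick_region(off_t file_size, double modify_factor, int r)
{
    struct region reg;

    assert(modify_factor >= 0.0 && modify_factor <= 1.0);
    assert(r >= 0 && r <= RAND_MAX);
    reg.size = (off_t)((double)file_size * modify_factor);
    reg.pos = (off_t)((double)(file_size - reg.size) * r / (RAND_MAX + 1.0));
    return reg;
}
```

For a 7512-byte file, a factor of 0.02 gives a 150-byte region somewhere in
the first 7362 bytes, while a factor of 1 gives a region of the whole file
starting at offset 0.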
d) OVERWRITE command:
( cat /var/tmp/mongo.flist[01]-0-0 | grep -v -e /\$ | \
    mongo_modify 1 4096 off || touch ERR.file ) &
( cat /var/tmp/mongo.flist[01]-0-1 | grep -v -e /\$ | \
    mongo_modify 1 4096 off || touch ERR.file ) &
( cat /var/tmp/mongo.flist[01]-0-2 | grep -v -e /\$ | \
    mongo_modify 1 4096 off || touch ERR.file ) &
wait; sync
See mongo_modify.c above.
/*******************************************************************/
e) READ command:
(find /mnt/testfs/testdir[01]-1-0 -type f | mongo_read || touch ERR.file ) &
(find /mnt/testfs/testdir[01]-1-1 -type f | mongo_read || touch ERR.file ) &
(find /mnt/testfs/testdir[01]-1-2 -type f | mongo_read || touch ERR.file ) &
wait; sync
/*******************************************************************/
mongo_read.c
    /* Read all file names from the standard input */
    while ((rd = getline(&file_name_buf, &file_name_buf_size, stdin)) != -1) {
        /* remove the new line character. */
        if (rd > 0 && file_name_buf[rd - 1] == '\n')
            file_name_buf[rd - 1] = 0;
        /* open the file */
        fd = open (file_name_buf, O_RDONLY);
        if (fd == -1) {
            fprintf (stderr, "Open failed (%s)\n", strerror (errno));
            return errno;
        }
        /* read the file */
        while (1) {
            rd = read (fd, read_buf, bufsize);
            if (rd == -1) {
                fprintf (stderr, "Read failed (%s)\n", strerror (errno));
                return errno;
            }
            if (rd == 0)
                break; /* EOF */
            /* file consists of 'a'. Check that */
            if (char_to_check) {
                int j;
                for (j = 0; j < rd; j ++)
                    if (read_buf[j] != char_to_check) {
                        fprintf (stderr, "Incorrect data were read\n");
                        return EIO;
                    }
            }
        }
        close (fd);
    }
    if (file_name_buf)
        free(file_name_buf);
    free (read_buf);
    snapstats();
    return 0;
/*******************************************************************/
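
The content check in mongo_read.c reduces to scanning each buffer for a byte
other than the expected fill character. A small sketch of that check on its
own (first_mismatch is a hypothetical name) that also reports where the
first bad byte sits:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first byte in buf[0..n) that differs from
 * expected, or -1 if every byte matches. */
long first_mismatch(const char *buf, size_t n, char expected)
{
    size_t j;

    assert(buf != NULL || n == 0);
    for (j = 0; j < n; j++)
        if (buf[j] != expected)
            return (long)j;
    return -1;
}
```

Adding the returned index to the number of bytes already read would give the
file offset of the first unexpected byte, which the excerpt does not report.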
f) STATS command:
((cat /var/tmp/mongo.flist[01]-0-0 | xargs ls -d) > /dev/null || \
    touch ERR.file ) &
((cat /var/tmp/mongo.flist[01]-0-1 | xargs ls -d) > /dev/null || \
    touch ERR.file ) &
((cat /var/tmp/mongo.flist[01]-0-2 | xargs ls -d) > /dev/null || \
    touch ERR.file ) &
wait; sync
g) DELETE command:
( rm -r /mnt/testfs/testdir[01]-0-0 || touch ERR.file ) &
( rm -r /mnt/testfs/testdir[01]-0-1 || touch ERR.file ) &
( rm -r /mnt/testfs/testdir[01]-0-2 || touch ERR.file ) &
wait; sync
Finally, if specified, it executes the large file phases:
h) dd_writing_largefile command:
( dd if=/dev/zero of=/mnt/testfs/largefile bs=1M count=768 ) &
( dd if=/dev/zero of=/mnt/testfs/largefile bs=1M count=768 ) &
( dd if=/dev/zero of=/mnt/testfs/largefile bs=1M count=768 ) &
wait; sync
i) dd_reading_largefile command:
( dd of=/dev/null if=/mnt/testfs/largefile bs=1M count=768 ) &
( dd of=/dev/null if=/mnt/testfs/largefile bs=1M count=768 ) &
( dd of=/dev/null if=/mnt/testfs/largefile bs=1M count=768 ) &
wait; sync
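
Every phase above follows the same shape: start one background job per
process, wait for all of them, then sync. A minimal C sketch of that pattern
follows. It is not how mongo.pl does it (mongo builds the shell commands
shown above); run_phase and the two demo jobs are hypothetical, and a failed
job is counted here instead of touching ERR.file.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo jobs; a return value of 0 means success. */
int job_always_ok(int i) { (void)i; return 0; }
int job_fail_on_one(int i) { return i == 1 ? -1 : 0; }

/* Run job(0) .. job(nproc - 1) in parallel child processes, wait for all
 * of them, then sync. Returns the number of jobs that failed. */
int run_phase(int nproc, int (*job)(int))
{
    int i, failed = 0;

    assert(nproc > 0);
    for (i = 0; i < nproc; i++) {
        pid_t pid = fork();

        if (pid == -1) {
            failed++;               /* could not start this job */
            continue;
        }
        if (pid == 0)
            _exit(job(i) == 0 ? 0 : 1);
    }
    for (;;) {
        int status;
        pid_t pid = wait(&status);

        if (pid == -1) {
            if (errno == EINTR)
                continue;
            break;                  /* ECHILD: no children left */
        }
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    sync();                         /* mirrors the trailing "wait; sync" */
    return failed;
}
```

With three processes, run_phase(3, job_always_ok) reports no failures and
run_phase(3, job_fail_on_one) reports one.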
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-08 21:40 ` David Dabbs
@ 2004-08-09 0:01 ` Hans Reiser
2004-08-09 1:55 ` David Dabbs
2004-08-09 2:38 ` David Dabbs
0 siblings, 2 replies; 47+ messages in thread
From: Hans Reiser @ 2004-08-09 0:01 UTC (permalink / raw)
To: David Dabbs
Cc: 'ReiserFS List', 'Nikita Danilov',
'Alexander Zarochentcev', 'E. Gryaznova'
David Dabbs wrote:
>
>
>>-----Original Message-----
>>From: Hans Reiser [mailto:reiser@namesys.com]
>>Sent: Monday, August 09, 2004 1:18 AM
>>To: David Dabbs
>>Cc: 'ReiserFS List'; 'Nikita Danilov'; 'Alexander Zarochentcev'; 'E.
>>Gryaznova'
>>Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
>>error"
>>
>>
>>
>>
>
>...snip...
>
>
>
>>Hans:
>>yes, it [hashing] is only a good idea for filenames larger than 15
>>characters.
>>
>>
>>
>
>Actually, larger than 7 + 8 + 8 characters, if LARGE_KEY is used.
>
Oh. Ok. LARGE_KEY is the only thing that should be used.
> This means
>that no keys were hashed in my tests, or the original mongo for that matter,
>since no file name would have been considered 'large' in kassign.c. I should
>up the file name limit slightly to exercise it with at least some hashed
>keys.
>
>
>
>>it is extremely unrealistic to create the files in one phase and write
>>to them in the next phase. the filesystem will behave completely
>>differently.
>>
>>
>>
>
>Of course. The extra files I added are created once, and should never be
>touched again since mongo only operates on the objects it creates. I am
>simply loading the test filesystem with a reasonable facsimile of a 'normal'
>Linux file set (with the exception that the files are all zero length).
>
a huge exception. I don't grok what you are trying to do here.
> To
>make these 'static' dirs/files really useful, mongo would need to be written
>to use these directories for its randomly created files. As for my files all
>being zero length, it would be easy to give them all some small file
>content. If these additional dirs/files as currently created add no value to
>the test, then do not enable these phases.
>
>On top of this Mongo randomly creates (in PHASE CREATE) its directories and
>files, with each file filled at creation with a random number of 'a'
>characters. Each subsequent phase, if specified, does one of the following
>operations in the order you see below. Since you said you hadn't looked at
>mongo for some time, I have included the commands mongo executes. These
>examples are from a run with three processes and three iterations, so each
>command below was run three times in succession. Note that mongo drives each
>phase from a file list, so the files are always manipulated in the same
>order - the order in which they were created.
>
>a) COPY command: Copy all files from some base iteration directory.
>(update-flist.pl /mnt/testfs/testdir0-0-0 /mnt/testfs/testdir1-0-0
>/var/tmp/mongo.flist0-0-0 /var/tmp/mongo.flist1-0-0 || touch ERR.file ) &
>(update-flist.pl /mnt/testfs/testdir0-0-1 /mnt/testfs/testdir1-0-1
>/var/tmp/mongo.flist0-0-1 /var/tmp/mongo.flist1-0-1 || touch ERR.file ) &
>(update-flist.pl /mnt/testfs/testdir0-0-2 /mnt/testfs/testdir1-0-2
>/var/tmp/mongo.flist0-0-2 /var/tmp/mongo.flist1-0-2 || touch ERR.file ) &
>wait; sync
>
>
>/*******************************************************************/
>update-flist.pl
>
>($DIR0, $DIR1, $FLIST0, $FLIST1) = @ARGV;
>
>system("cp -r $DIR0 $DIR1");
>
>open(FL0, $FLIST0) || DIE("Cannot open $FLIST0");
>open(FL1, ">$FLIST1") || DIE("Cannot open $FLIST1");
>while (<FL0>) {
> s|^$DIR0|$DIR1|;
> print FL1 $_;
>}
>close(FL1);
>close(FL0);
>
>/*******************************************************************/
>
>b) APPEND command:
>(cat /var/tmp/mongo.flist[01]-0-0 | grep -v -e /\$ | mongo_append 0.5 4096
>off || touch ERR.file ) &
>(cat /var/tmp/mongo.flist[01]-0-1 | grep -v -e /\$ | mongo_append 0.5 4096
>off || touch ERR.file ) &
>(cat /var/tmp/mongo.flist[01]-0-2 | grep -v -e /\$ | mongo_append 0.5 4096
>off || touch ERR.file ) &
>wait; sync
>
>/*******************************************************************/
>mongo_append.c
>
> while (getline(&line_buffer, &line_buffer_size, stdin) != -1) {
> int fd, written = 0;
> char * lf_pos;
>
> if ((lf_pos = index(line_buffer, '\n')) != NULL) *lf_pos = '\0';
>
> if ( (fd = open(line_buffer,O_RDWR)) == -1) {
> fprintf(stderr, "%s :", line_buffer);
> perror("cannot open file");
> continue;
> }
>
> append_size = append_factor * lseek(fd, 0, SEEK_END);
>
> while (written < append_size) {
> int error;
> written += (error = write(fd, buffer,
> ( append_size - written < writesize ) ? append_size -
>written:writesize));
> if (error == -1) {
> close(fd);
> break;
> }
> }
> if (use_fsync) {
> if (fsync(fd) < 0) {
> fprintf(stderr, "%s : failed to fsync file\n",
>line_buffer);
> exit(1);
> }
> }
> close(fd);
> }
> free (buffer);
> free (line_buffer);
> return 0;
>
>/*******************************************************************/
>
>
>c) MODIFY command:
>(cat /var/tmp/mongo.flist[01]-0-0 | grep -v -e /\$ | mongo_modify 0.02 4096
>off || touch ERR.file ) &
>(cat /var/tmp/mongo.flist[01]-0-1 | grep -v -e /\$ | mongo_modify 0.02 4096
>off || touch ERR.file ) &
>(cat /var/tmp/mongo.flist[01]-0-2 | grep -v -e /\$ | mongo_modify 0.02 4096
>off || touch ERR.file ) &
>wait; sync
>
>/*******************************************************************/
>mongo_modify.c
>
> while (getline(&line_buffer, &line_buffer_size, stdin) != -1) {
> int fd;
> off_t fileend;
> char * lf_pos;
>
> if ((lf_pos = index(line_buffer, '\n')) != NULL) *lf_pos = '\0';
>
> if ( (fd = open(line_buffer,O_RDWR)) == -1) {
> perror("cannot open file");
> continue;
> }
> fileend = lseek(fd,0,SEEK_END);
>
> if (fileend == (off_t) - 1) {
> perror("lseek failed");
> close (fd);
> continue;
> }
> if (!fileend) {
> /* nothing to modify */
> close (fd);
> continue;
> }
>
> {
> off_t region_size = (double)fileend * modify_factor;
> off_t write_pos = (double)(fileend - region_size) * rand() /
>(RAND_MAX + 1.0);
> size_t bytes = 0;
>
> if (lseek(fd, write_pos, SEEK_SET) == (off_t) - 1) {
> perror("lseek failed");
> close (fd);
> continue;
> }
>
> while (bytes < region_size) {
> size_t write_count = region_size - bytes;
>
> if (write_count > write_buffer_size)
> write_count = write_buffer_size;
>
> write_count = write(fd, buffer, write_count);
>
> if (write_count == 0) break;
>
> if (write_count == -1) {
> perror("write failed");
> break;
> }
>
> bytes += write_count;
> }
>
> }
> if (use_fsync) {
> if (fsync(fd) < 0) {
> fprintf(stderr, "%s : failed to fsync file\n",
>line_buffer);
> exit(1);
> }
> }
> close(fd);
> }
> free (buffer);
> free (line_buffer);
> return 0;
>
>/*******************************************************************/
>
>
>d) OVERWRITE command:
>( cat /var/tmp/mongo.flist[01]-0-0 | grep -v -e /\$ | mongo_modify 1 4096
>off || touch ERR.file ) &
>( cat /var/tmp/mongo.flist[01]-0-1 | grep -v -e /\$ | mongo_modify 1 4096
>off || touch ERR.file ) &
>( cat /var/tmp/mongo.flist[01]-0-2 | grep -v -e /\$ | mongo_modify 1 4096
>off || touch ERR.file ) &
>wait; sync
>
>
>See mongo_modify.c above.
>
>/*******************************************************************/
>
>e) READ command:
>(find /mnt/testfs/testdir[01]-1-0 -type f | mongo_read || touch ERR.file ) &
>(find /mnt/testfs/testdir[01]-1-1 -type f | mongo_read || touch ERR.file ) &
>(find /mnt/testfs/testdir[01]-1-2 -type f | mongo_read || touch ERR.file ) &
>wait; sync
>
>/*******************************************************************/
>mongo_read.c
>
> /* Read all file names from the standard input */
> while ((rd = getline(&file_name_buf, &file_name_buf_size, stdin)) !=
>-1) {
>
> /* remove the new line character. */
> if (rd > 0 && file_name_buf[rd - 1] == '\n')
> file_name_buf[rd - 1] = 0;
>
> /* open the file */
> fd = open (file_name_buf, O_RDONLY);
> if (fd == -1) {
> fprintf (stderr, "Open failed (%s)\n", strerror
>(errno));
> return errno;
> }
>
> /* read the file */
> while (1) {
> rd = read (fd, read_buf, bufsize);
> if (rd == -1) {
> fprintf (stderr, "Read failed (%s)\n",
>strerror (errno));
> return errno;
> }
> if (rd == 0)
> break; /* EOF */
>
> /* file consists of 'a'. Check that */
> if (char_to_check) {
> int j;
> for (j = 0; j < rd; j ++)
> if (read_buf[j] != char_to_check) {
> fprintf (stderr, "Incorrect
>data were read\n");
> return EIO;
> }
> }
> }
> close (fd);
> }
>
> if (file_name_buf)
> free(file_name_buf);
> free (read_buf);
> snapstats();
> return 0;
>
>/*******************************************************************/
>
>f) STATS command:
>((cat /var/tmp/mongo.flist[01]-0-0 | xargs ls -d) > /dev/null || touch
>ERR.file ) &
>((cat /var/tmp/mongo.flist[01]-0-1 | xargs ls -d) > /dev/null || touch
>ERR.file ) &
>((cat /var/tmp/mongo.flist[01]-0-2 | xargs ls -d) > /dev/null || touch
>ERR.file ) &
>wait; sync
>
>
>g) DELETE command:
>( rm -r /mnt/testfs/testdir[01]-0-0 || touch ERR.file ) &
>( rm -r /mnt/testfs/testdir[01]-0-1 || touch ERR.file ) &
>( rm -r /mnt/testfs/testdir[01]-0-2 || touch ERR.file ) &
>wait; sync
>
>
>Finally, if specified, it executes the large file phases
>
>h) dd_writing_largefile command:
>( dd if=/dev/zero of=/mnt/testfs/largefile bs=1M count=768 ) &
>( dd if=/dev/zero of=/mnt/testfs/largefile bs=1M count=768 ) &
>( dd if=/dev/zero of=/mnt/testfs/largefile bs=1M count=768 ) &
>wait; sync
>
>i) dd_reading_largefile command:
>( dd of=/dev/null if=/mnt/testfs/largefile bs=1M count=768 ) &
>( dd of=/dev/null if=/mnt/testfs/largefile bs=1M count=768 ) &
>( dd of=/dev/null if=/mnt/testfs/largefile bs=1M count=768 ) &
>wait; sync
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 0:01 ` Hans Reiser
@ 2004-08-09 1:55 ` David Dabbs
2004-08-09 17:43 ` Hans Reiser
2004-08-09 2:38 ` David Dabbs
1 sibling, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-09 1:55 UTC (permalink / raw)
To: 'Hans Reiser'
Cc: 'ReiserFS List', 'Nikita Danilov',
'Alexander Zarochentcev', 'E. Gryaznova'
> >
> >>Hans:
> >>yes, it [hashing] is only a good idea for filenames larger than 15
> >>characters.
> >>
> >>
> >>
> >
> >Actually, larger than 7 + 8 + 8 characters, if LARGE_KEY is used.
> >
> Oh. Ok. LARGE_KEY is the only thing that should be used.
>
> >
> >>it is extremely unrealistic to create the files in one phase and write
> >>to them in the next phase. the filesystem will behave completely
> >>differently.
> >>
> >
> >Of course. The extra files I added are created once, and should never be
> >touched again since mongo only operates on the objects it creates. I am
> >simply loading the test filesystem with a reasonable facsimile of a
> 'normal' Linux file set (with the exception that the files are all zero
> length).
> >
> a huge exception. I don't grok what you are trying to do here.
>
Mongo starts with a clean-slate filesystem. I was attempting to run the
tests on top of something other than a clean filesystem, because a clean
filesystem is an exceptional situation. Based on your reaction, starting
clean is evidently not detrimental to measuring relative performance. I
didn't really care about the performance of the MKFILES & MKDIRS phases
themselves - that was irrelevant. I simply wanted to run the tests on a
filesystem that looked as much as possible like what one might start with
under 'normal' conditions. The only reason the files were zero length - and
not a copy of real files - was that at the time I had limited disk space
with which to test.
But enough of extensions and experiments. I am in no way trying to press a
case for my MKDIRS/FILES experiments. I do, however, believe mongo should be
run with file names that are longer than six characters and that have
extensions. Names of six characters or fewer without extensions are not
representative of Linux file system usage, nor do they exercise reiser
features designed to improve performance, e.g. fibration. In addition, as
you point out, mongo should probably not benchmark mounting and unmounting
the filesystem -- unless there was good reason to have done so.
The real issue, for me at least, is finding a method of determining whether
the errors I saw were due to hardware or software; if it is the latter,
whether it can be traced to mongo, the filesystem or Linux.
David
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 0:01 ` Hans Reiser
2004-08-09 1:55 ` David Dabbs
@ 2004-08-09 2:38 ` David Dabbs
2004-08-09 17:59 ` Alex Zarochentsev
1 sibling, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-09 2:38 UTC (permalink / raw)
To: 'Hans Reiser'; +Cc: 'ReiserFS List'
One other somewhat related FYI. In the recent fibration discussion you
thought it would probably be good to make FIBRATION_EXT_1 the default
fibration plugin instead of FIBRATION_DOT_O, but dot-O is still the default:
[PSET_FIBRATION] = {
.type = REISER4_FIBRATION_PLUGIN_TYPE,
.id = FIBRATION_DOT_O
},
David
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-08 19:09 ` David Dabbs
@ 2004-08-09 6:17 ` Hans Reiser
2004-08-08 21:40 ` David Dabbs
0 siblings, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2004-08-09 6:17 UTC (permalink / raw)
To: David Dabbs
Cc: 'ReiserFS List', 'Nikita Danilov',
'Alexander Zarochentcev', 'E. Gryaznova'
David Dabbs wrote:
>
>
>>-----Original Message-----
>>From: Hans Reiser [mailto:reiser@namesys.com]
>>Sent: Sunday, August 08, 2004 1:09 PM
>>To: David Dabbs
>>Cc: 'ReiserFS List'; Nikita Danilov; Alexander Zarochentcev; E. Gryaznova
>>Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
>>error"
>>
>>most filenames are small. We have an optimization in reiser4 that
>>assumes filenames are less than 15 characters, which most of them are.
>>
>>
>>
>
>This is true, but most file names are also not six characters or less with
>no extensions.
>
>
>
>>Maybe you should use a list of real filenames as well as a list of real
>>directory names. You can reuse the filenames, so long as they are
>>unique within a directory.
>>
>>
>>
>
>I'm not sure uniqueness is a requirement, as mongo allows for duplicate
>filenames. I did use real dirs and files. This is what the MKDIRS and
>MKFILES phases do. I generated a list of most of the actual directories and
>files in my / partition. In my config these are created by MKFILES and
>MKDIRS before the 'stock' mongo phases run.
>
>
>
>>Hmm, it occurs to me that rather than using the 7 character filename
>>prefix in the hash we should use first 4 letters and last 3 letters.
>>
>>Nikita, zam, what do you think?
>>
>>
>>
>
>Then you wouldn't store directories and filenames in lexicographic order as
>you originally intended.
>
>
yes, it is only a good idea for filenames larger than 15 characters.
>
>
>
>>>Since the above code simply opens and closes the file, every file created
>>>
>>>
>>is zero length, and it creates _many_ zero-length files.
>>
>>
>>Why would you want that?
>>
>>
>
>I didn't explicitly _want_ zero length files, I just did the minimum to
>generate the files. I can easily go back and write one byte to the files,
>but that (shouldn't) really have an impact on the operations against the
>files created by reiser_fract_tree and later
>modified/read/overwritten/deleted by later phases because the files created
>by MKFILES are in directories not manipulated by the stock mongo phases.
>They use the files and directories created by reiser_fract_tree. The reason
>I added the dirs and files created by MKDIRS/FILES was to have mongo work
>with a filesystem 'loaded' with as much metadata as a real system might
>have. If this adds nothing to the benchmarking then simply run with
>MKFILES=off and MKDIRS=off.
>
>
it is extremely unrealistic to create the files in one phase and write
to them in the next phase. the filesystem will behave completely
differently.
>
>
>
>>>I made the following changes, see http://dabbs.net/reiser4/mongopl.html
>>>
>>>
>>for more context.
>>
>>
>>>In
>>>
>>>init_fsys:
>>> don't unmount the filesystem after mkfs * df.
>>>
>>>mongo_launcher:
>>> don't call &mount_fsys; at beginning and &umount_fsys; at end of each
>>>phase
>>>
>>>
>>>Only umount_fsys at end of function mongo.
>>>
>>>
>>>David
>>>
>>>
>>>
>>>
>>ok. Unmounting between phases was probably a bad idea, as it benchmarks
>>unmounting. It should be fixed.
>>
>>
>
>Mounting/umounting between phases shouldn't cause errors, but I focused
>there because the errors I saw didn't begin until the copy phase, which is
>the first phase where mongo _reads_ what was created. Perhaps something is
>not being committed, at least with my hardware/config, despite the fact that
>there's a sync at the end of every phase iteration. I also added a sync
>command before the final umount.
>
>David
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-08 18:08 ` Hans Reiser
2004-08-08 19:09 ` David Dabbs
@ 2004-08-09 15:13 ` Nikita Danilov
2004-08-09 17:48 ` Hans Reiser
1 sibling, 1 reply; 47+ messages in thread
From: Nikita Danilov @ 2004-08-09 15:13 UTC (permalink / raw)
To: Hans Reiser
Cc: David Dabbs, 'ReiserFS List', Alexander Zarochentcev,
E. Gryaznova
Hans Reiser writes:
> David Dabbs wrote:
>
> >
> >
> >>-----Original Message-----
> >>From: Hans Reiser [mailto:reiser@namesys.com]
> >>Sent: Saturday, August 07, 2004 11:18 PM
> >>To: David Dabbs
> >>Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> >>error"
> >>
> >>
> >>
> >
> >
> >
> >>How about, code was X, now it is Y, with just the relevant parts of the
> >>code cited?
> >>
> >>
> >
> >Probably the most significant change to mongo sources is how it generates
> >file names. Mongo.pl had a hard coded value of 6 for the "max_fname"
> >parameter passed to reiser_fract_tree. I made this a parameter one can
> >specify on the mongo command line. I passed it a value of 23.
> >
> >
> most filenames are small. We have an optimization in reiser4 that
> assumes filenames are less than 15 characters, which most of them are.
With large keys (which is the default) up to 23 characters are
embedded into key.
[...]
Nikita.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 1:55 ` David Dabbs
@ 2004-08-09 17:43 ` Hans Reiser
2004-08-09 18:32 ` David Dabbs
0 siblings, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2004-08-09 17:43 UTC (permalink / raw)
To: David Dabbs
Cc: 'ReiserFS List', 'Nikita Danilov',
'Alexander Zarochentcev', 'E. Gryaznova'
David Dabbs wrote:
>
>
>>>>Hans:
>>>>yes, it [hashing] is only a good idea for filenames larger than 15
>>>>characters.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>Actually, larger than 7 + 8 + 8 characters, if LARGE_KEY is used.
>>>
>>>
>>>
>>Oh. Ok. LARGE_KEY is the only thing that should be used.
>>
>>
>>
>>>>it is extremely unrealistic to create the files in one phase and write
>>>>to them in the next phase. the filesystem will behave completely
>>>>differently.
>>>>
>>>>
>>>>
>>>Of course. The extra files I added are created once, and should never be
>>>touched again since mongo only operates on the objects it creates. I am
>>>simply loading the test filesystem with a reasonable facsimile of a
>>>
>>>
>>'normal' Linux file set (with the exception that the files are all zero
>>length).
>>
>>
>>a huge exception. I don't grok what you are trying to do here.
>>
>>
>>
>
>Mongo starts with a clean slate filesystem. I was attempting to run the
>tests on top of something other than a clean filesystem, because that is an
>exceptional situation. Based on your reaction, it is obviously not
>detrimental to measuring relative performance measures. I didn't really care
>about the performance of the MKFILES & MKDIRS phases themselves - that was
>irrelevant. I simply wanted to run the tests on a filesystem that looked as
>much as possible as what one might start with under 'normal' conditions. The
>only reason the files were zero length - and not a copy of real files - was
>that at the time I had limited disk space with which to test.
>
>But enough of extensions and experiments. I am in no way trying to press a
>case for my MKDIRS/FILES experiments. I do, however, believe mongo should be
>run with file names longer than six characters
>
ok.
>and that have extensions.
>
>
this should be without effect, provided that the file creation order is
the same as the hash/fibration order.
>This is not representative of Linux file system usage, nor does it exercise
>reiser features designed to improve performance, e.g. fibration. In
>addition, as you point out, it should probably not benchmark mounting and
>unmounting the filesystem -- unless there was good reason to have done so.
>
>The real issue, for me at least, is finding a method of determining whether
>the errors I saw were due to hardware or software; if it is the latter,
>whether it can be traced to mongo, the filesystem or Linux.
>
>David
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 15:13 ` Nikita Danilov
@ 2004-08-09 17:48 ` Hans Reiser
0 siblings, 0 replies; 47+ messages in thread
From: Hans Reiser @ 2004-08-09 17:48 UTC (permalink / raw)
To: Nikita Danilov
Cc: David Dabbs, 'ReiserFS List', Alexander Zarochentcev,
E. Gryaznova
Nikita Danilov wrote:
>
>With large keys (which is the default) up to 23 characters are
>embedded into key.
>
Thanks for coding it that way.:)
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 2:38 ` David Dabbs
@ 2004-08-09 17:59 ` Alex Zarochentsev
2004-08-09 18:22 ` David Dabbs
0 siblings, 1 reply; 47+ messages in thread
From: Alex Zarochentsev @ 2004-08-09 17:59 UTC (permalink / raw)
To: David Dabbs; +Cc: 'Hans Reiser', 'ReiserFS List'
On Sun, Aug 08, 2004 at 09:38:29PM -0500, David Dabbs wrote:
>
>
> One other somewhat related FYI. In the recent fibration discussion you
> thought it would probably be good to make FIBRATION_EXT_1 the default
> fibration plugin instead of FIBRATION_DOT_O, but dot-O is still the default:
>
> [PSET_FIBRATION] = {
> .type = REISER4_FIBRATION_PLUGIN_TYPE,
> .id = FIBRATION_DOT_O
> },
mkfs.reiser4 sets the fs-wide default fibration plugin.
> David
>
--
Alex.
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 17:59 ` Alex Zarochentsev
@ 2004-08-09 18:22 ` David Dabbs
2004-08-09 18:42 ` Alex Zarochentsev
0 siblings, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-09 18:22 UTC (permalink / raw)
To: 'Alex Zarochentsev'; +Cc: 'ReiserFS List'
> -----Original Message-----
> From: Alex Zarochentsev [mailto:zam@namesys.com]
> Sent: Monday, August 09, 2004 1:00 PM
> To: David Dabbs
>
> mkfs.reiser4 sets the fs-wide default fibration plugin.
>
> Alex.
[David Dabbs]
So does mkfs.reiser4 use ext-1 as its default? I don't have the code
available to check before asking.
David
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 17:43 ` Hans Reiser
@ 2004-08-09 18:32 ` David Dabbs
0 siblings, 0 replies; 47+ messages in thread
From: David Dabbs @ 2004-08-09 18:32 UTC (permalink / raw)
To: 'ReiserFS List'; +Cc: 'Hans Reiser'
> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Monday, August 09, 2004 12:43 PM
>
> David Dabbs wrote:
> >
> >But enough of extensions and experiments. I am in no way trying to press
> >a case for my MKDIRS/FILES experiments. I do, however, believe mongo
> >should run with file names longer than six characters
>
> ok.
>
> >and that have extensions.
> >
> >
> this should be without effect, provided that the file creation order is
> the same as the hash/fibration order.
>
[David Dabbs]
Yes, it is moot when using exclusively all short, extension-less file names,
because all files have the same fibration bits and no file has the hash bit
in key element 2. But with some long file names (len > 23), extensions and
DOT-O, and especially EXT-1, all the .a's, .b's, etc. would be grouped
together in-tree and on-disk, then ordered by file name. This ordering
differs from how mongo currently runs, though the net effect on performance
remains to be measured.
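
To make the ordering difference concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `fibre_ext_1` and `tree_order` are hypothetical names, the rule "a name ending in a dot plus one character gets its own fibre" is a simplification of what an ext-1 style fibration does, and real reiser4 keys carry more than this (hashes for long names, for instance).

```python
def fibre_ext_1(name: str) -> int:
    """Toy ext-1 style fibre: names ending in '.<c>' are grouped by c.

    Assumption for illustration only; not the kernel's implementation.
    """
    if len(name) > 2 and name[-2] == ".":
        return ord(name[-1])
    return 0  # no single-character extension: default fibre


def tree_order(names):
    """Order names by fibre first, then by file name within a fibre."""
    return sorted(names, key=lambda n: (fibre_ext_1(n), n))


# Files in the order a benchmark might create them.
creation_order = ["f1.c", "f2.a", "f3.b", "f4.a", "f5"]
print(tree_order(creation_order))
```

With these names the extension-less file sorts first, then the two `.a` files together, then `.b`, then `.c`, which is not the order in which they were created.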
Besides the fact that configuring mongo this way is closer to average usage,
the other (potential) benefit is that it exercises more of the reiser4 code
base. But having it exercise more code may not matter to, or may be
orthogonal to, generating meaningful relative performance metrics.
As an aside, if Namesys shipped r4 to Andrew Morton, is there an ETA on when
he will include it in an -mm patch?
David
p.s. For those benchmarking ext3 vs. r4 using mongo or some other tool,
there is an ext3 performance regression patch included in -rc3-mm2, so YMMV
with rc3-mm1.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-09 18:22 ` David Dabbs
@ 2004-08-09 18:42 ` Alex Zarochentsev
0 siblings, 0 replies; 47+ messages in thread
From: Alex Zarochentsev @ 2004-08-09 18:42 UTC (permalink / raw)
To: David Dabbs; +Cc: 'ReiserFS List'
On Mon, Aug 09, 2004 at 01:22:36PM -0500, David Dabbs wrote:
>
>
> > -----Original Message-----
> > From: Alex Zarochentsev [mailto:zam@namesys.com]
> > Sent: Monday, August 09, 2004 1:00 PM
> > To: David Dabbs
> >
> > mkfs.reiser4 sets the fs-wide default fibration plugin.
> >
> > Alex.
> [David Dabbs]
>
> So does mkfs.reiser4 use ext-1 as its default? I don't have the code
> available to check before asking.
mkfs.reiser4 can report default plugins:
darkstar:~ # ~zam/bk/reiser4progs/progs/mkfs/mkfs.reiser4 -p
/home/zam/bk/reiser4progs/progs/mkfs/.libs/lt-mkfs.reiser4 0.5.7
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by
reiser4progs/COPYING.
Default profiles:
format: "format40" (id:0x0 type:0x8)
journal: "journal40" (id:0x0 type:0xf)
oid: "oid40" (id:0x0 type:0x9)
alloc: "alloc40" (id:0x0 type:0xe)
key: "key_large" (id:0x1 type:0x10)
node: "node40" (id:0x0 type:0x2)
statdata: "stat40" (id:0x0 type:0x1)
nodeptr: "nodeptr40" (id:0x3 type:0x1)
direntry: "cde40" (id:0x2 type:0x1)
tail: "tail40" (id:0x6 type:0x1)
extent: "extent40" (id:0x5 type:0x1)
acl: "absent (id:0x4 type:0x1)"
permission: "absent (id:0x0 type:0x6)"
regular: "reg40" (id:0x0 type:0x0)
directory: "dir40" (id:0x1 type:0x0)
symlink: "sym40" (id:0x2 type:0x0)
special: "spl40" (id:0x3 type:0x0)
hash: "r5_hash" (id:0x1 type:0x3)
fibration: "ext_1_fibre" (id:0x2 type:0x4)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
formatting: "smart" (id:0x2 type:0x5)
darkstar:~ #
>
> David
>
--
Alex.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-07 7:49 ` Hans Reiser
2004-08-08 2:54 ` David Dabbs
@ 2004-08-10 3:21 ` Valdis.Kletnieks
2004-08-10 8:31 ` Hans Reiser
2004-08-10 9:20 ` Alex Zarochentsev
1 sibling, 2 replies; 47+ messages in thread
From: Valdis.Kletnieks @ 2004-08-10 3:21 UTC (permalink / raw)
To: Hans Reiser; +Cc: David Dabbs, 'ReiserFS List'
On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:
> >I think I have discovered the problem - unless there was a reason mongo was
> >issuing mount/unmount commands at the start/end of a mongo 'run' as well as
> >before/after _each phase_.
> Probably someone wanted to separate the measurement of the phases. It
> has been a while since I read mongo.....
Note that an unmount/mount pair will force a flush of all dirtied pages in the
in-memory file cache, and *really* not return until it's really done and really
out on disk. In addition, sync() will force stuff to disk, but *not* invalidate
in-cache pages - more drastic measures are needed if you want to benchmark
with a cold cache (which is almost a must if you're doing actual filesystem
benchmarking, as otherwise you're benching the in-core cache instead).
As an aside, although the Linux fs/buffer:do_sync() won't return until it's
all really done, there is no mandate that the sync() syscall wait (and in fact,
this is the source of the old "type 'sync' three times, then 'halt'" advice:
the second and third times you typed sync and hit return hopefully gave the
I/O scheduled by the *first* sync time to complete). At least one 'Unix for
Dummies' book proved their lack of depth of understanding when they recommended:
# sync;sync;sync;halt
;)
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 3:21 ` Valdis.Kletnieks
@ 2004-08-10 8:31 ` Hans Reiser
2004-08-10 15:41 ` Valdis.Kletnieks
2004-08-10 9:20 ` Alex Zarochentsev
1 sibling, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2004-08-10 8:31 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: David Dabbs, 'ReiserFS List'
Valdis.Kletnieks@vt.edu wrote:
>On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:
>
>
>
>>>I think I have discovered the problem - unless there was a reason mongo was
>>>issuing mount/unmount commands at the start/end of a mongo 'run' as well as
>>>before/after _each phase_.
>>>
>>>
>
>
>
>>Probably someone wanted to separate the measurement of the phases. It
>>has been a while since I read mongo.....
>>
>>
>
>Note that an unmount/mount pair will force a flush of all dirtied pages in the
>in-memory file cache, and *really* not return until it's really done and really
>out on disk. In addition, sync() will force stuff to disk, but *not* invalidate
>in-cache pages - more drastic measures are needed if you want to benchmark
>with a cold cache (which is almost a must if you're doing actual filesystem
>benchmarking, as otherwise you're benching the in-core cache instead).
>
>
>
Yes, but benchmarking the invalidation of the cache is a mistake to avoid.
>As an aside, although the Linux fs/buffer:do_sync() won't return until it's
>all really done, there is no mandate that the sync() syscall wait (and in fact,
>is the source of the old "type 'sync' three times, then 'halt'" - the second
>and third times you typed sync and hit return hopefully gave the I/O scheduled
>by the *first* sync time to complete. At least one 'Unix for Dummies' book
>proved their lack of depth of understanding when they recommended:
>
># sync;sync;sync;halt
>
>;)
>
>
>
Thanks for explaining
sync;sync;sync;halt
I always felt I was failing to grok something.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 3:21 ` Valdis.Kletnieks
2004-08-10 8:31 ` Hans Reiser
@ 2004-08-10 9:20 ` Alex Zarochentsev
2004-08-10 17:35 ` Hans Reiser
1 sibling, 1 reply; 47+ messages in thread
From: Alex Zarochentsev @ 2004-08-10 9:20 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Hans Reiser, David Dabbs, 'ReiserFS List'
On Mon, Aug 09, 2004 at 11:21:17PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:
>
> > >I think I have discovered the problem - unless there was a reason mongo was
> > >issuing mount/unmount commands at the start/end of a mongo 'run' as well as
> > >before/after _each phase_.
>
> > Probably someone wanted to separate the measurement of the phases. It
> > has been a while since I read mongo.....
>
> Note that an unmount/mount pair will force a flush of all dirtied pages in the
> in-memory file cache, and *really* not return until it's really done and really
> out on disk. In addition, sync() will force stuff to disk, but *not* invalidate
> in-cache pages - more drastic measures are needed if you want to benchmark
> with a cold cache (which is almost a must if you're doing actual filesystem
> benchmarking, as otherwise you're benching the in-core cache instead).
That was designed to make the results of each phase as independent as we can.
For example, if we have a read slowdown in mongo, we will analyze only reads
and, probably, fs fragmentation; we won't deal with an unmeasurable cache state
before the read phase. A known and persistent "cold cache effect" is better
than an unknown hot cache one :) Also, mongo phases are designed to be long, to
keep the cold cache effect at a minimum.
> As an aside, although the Linux fs/buffer:do_sync() won't return until it's
> all really done, there is no mandate that the sync() syscall wait (and in fact,
> is the source of the old "type 'sync' three times, then 'halt'" - the second
> and third times you typed sync and hit return hopefully gave the I/O scheduled
> by the *first* sync time to complete. At least one 'Unix for Dummies' book
> proved their lack of depth of understanding when they recommended:
>
> # sync;sync;sync;halt
>
--
Alex.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 8:31 ` Hans Reiser
@ 2004-08-10 15:41 ` Valdis.Kletnieks
0 siblings, 0 replies; 47+ messages in thread
From: Valdis.Kletnieks @ 2004-08-10 15:41 UTC (permalink / raw)
To: Hans Reiser; +Cc: David Dabbs, 'ReiserFS List'
On Tue, 10 Aug 2004 01:31:17 PDT, Hans Reiser said:
> Thanks for explaining
>
> sync;sync;sync;halt
>
> I always felt I was failing to grok something.
As was the author who recommended it. It started out as:
# sync ( this one schedules the I/O)
# sync ( just a time waster typing)
# sync ( just a time waster typing )
# halt ( and we finally actually shut down).
The disks on the old PDP and Vax 750 boxes were actually sluggish enough that
if you had a whole 1M or 2M in the buffer cache to flush out, it was actually not
difficult to enter "sync", hit return, enter "halt", hit return, and have the
halt happen before sync finished, doing the predictable to the non-journaled file
systems. Empirical studies showed that even on the biggest-memory boxes,
sync could almost always finish with 2 time-wasters before the halt.. ;)
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 9:20 ` Alex Zarochentsev
@ 2004-08-10 17:35 ` Hans Reiser
2004-08-10 17:42 ` David Dabbs
2004-08-10 18:05 ` Alex Zarochentsev
0 siblings, 2 replies; 47+ messages in thread
From: Hans Reiser @ 2004-08-10 17:35 UTC (permalink / raw)
To: Alex Zarochentsev; +Cc: Valdis.Kletnieks, David Dabbs, 'ReiserFS List'
Alex Zarochentsev wrote:
>On Mon, Aug 09, 2004 at 11:21:17PM -0400, Valdis.Kletnieks@vt.edu wrote:
>
>
>>On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:
>>
>>
>>
>>>>I think I have discovered the problem - unless there was a reason mongo was
>>>>issuing mount/unmount commands at the start/end of a mongo 'run' as well as
>>>>before/after _each phase_.
>>>>
>>>>
>>>Probably someone wanted to separate the measurement of the phases. It
>>>has been a while since I read mongo.....
>>>
>>>
>>Note that an unmount/mount pair will force a flush of all dirtied pages in the
>>in-memory file cache, and *really* not return until it's really done and really
>>out on disk. In addition, sync() will force stuff to disk, but *not* invalidate
>>in-cache pages - more drastic measures are needed if you want to benchmark
>>with a cold cache (which is almost a must if you're doing actual filesystem
>>benchmarking, as otherwise you're benching the in-core cache instead).
>>
>>
>
>That was designed to have result in each phase as independent as we can. For
>example, if we have read slowdown in mongo, we will analyze only reads and,
>probably, fs fragmentation, we won't deal with unmeasurable cache state before
>the read phase. Known and persistent "cold cache effect" is better than unknown
>hot cache one :) And, mongo phases are designed to be long and keep the cold
>cache effect at minimum.
>
>
Let me be more precise here. Is the time spent mounting and umounting
included in the time for the phase?
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 17:35 ` Hans Reiser
@ 2004-08-10 17:42 ` David Dabbs
2004-08-10 17:46 ` Hans Reiser
2004-08-10 18:05 ` Alex Zarochentsev
1 sibling, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-10 17:42 UTC (permalink / raw)
To: 'Hans Reiser', 'Alex Zarochentsev'
Cc: 'ReiserFS List'
>
> Let me be more precise here. Is the time spent mounting and umounting
> included in the time for the phase?
[David Dabbs]
No. Each phase iteration is wrapped in a time call. The mounting and
unmounting happen before and after this, so there shouldn't be any timing
impact. I'd need to double check the sources, but I'm pretty sure that's the
case.
David
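
A minimal sketch of the wrapping described above, written in Python rather than mongo's Perl. `run_phase`, `before`, and `after` are hypothetical names and are not taken from the mongo sources; the point is only that the setup and teardown hooks sit outside the timed window.

```python
import time


def run_phase(phase, before=None, after=None):
    """Return wall-clock seconds spent inside phase() only.

    before/after stand in for mount and umount; they run outside
    the timed window, so their cost is not charged to the phase.
    """
    if before is not None:
        before()                      # e.g. mount the test filesystem
    start = time.perf_counter()
    phase()                           # the work being benchmarked
    elapsed = time.perf_counter() - start
    if after is not None:
        after()                       # e.g. umount; not timed
    return elapsed
```

Anything that should count against the phase (a final sync, for example) would have to be called from inside `phase` itself.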
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 17:42 ` David Dabbs
@ 2004-08-10 17:46 ` Hans Reiser
0 siblings, 0 replies; 47+ messages in thread
From: Hans Reiser @ 2004-08-10 17:46 UTC (permalink / raw)
To: David Dabbs; +Cc: 'Alex Zarochentsev', 'ReiserFS List'
David Dabbs wrote:
>
>
>
>>Let me be more precise here. Is the time spent mounting and umounting
>>included in the time for the phase?
>>
>>
>
> [David Dabbs]
>No. Each phase iteration is wrapped in a time call. The mounting and
>unmounting happen before and after this, so there shouldn't be any timing
>impact. I'd need to double check the sources, but I'm pretty sure that's the
>case.
>
>David
>
>
>
>
>
>
Oh, well, then my criticism is invalid.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 17:35 ` Hans Reiser
2004-08-10 17:42 ` David Dabbs
@ 2004-08-10 18:05 ` Alex Zarochentsev
2004-08-10 19:55 ` Hans Reiser
1 sibling, 1 reply; 47+ messages in thread
From: Alex Zarochentsev @ 2004-08-10 18:05 UTC (permalink / raw)
To: Hans Reiser; +Cc: Valdis.Kletnieks, David Dabbs, 'ReiserFS List'
On Tue, Aug 10, 2004 at 10:35:58AM -0700, Hans Reiser wrote:
> Alex Zarochentsev wrote:
>
> >On Mon, Aug 09, 2004 at 11:21:17PM -0400, Valdis.Kletnieks@vt.edu wrote:
> >
> >
> >>On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:
> >>
> >>
> >>
> >>>>I think I have discovered the problem - unless there was a reason mongo
> >>>>was
> >>>>issuing mount/unmount commands at the start/end of a mongo 'run' as
> >>>>well as
> >>>>before/after _each phase_.
> >>>>
> >>>>
> >>>Probably someone wanted to separate the measurement of the phases. It
> >>>has been a while since I read mongo.....
> >>>
> >>>
> >>Note that an unmount/mount pair will force a flush of all dirtied pages
> >>in the
> >>in-memory file cache, and *really* not return until it's really done and
> >>really
> >>out on disk. In addition, sync() will force stuff to disk, but *not*
> >>invalidate
> >>in-cache pages - more drastic measures are needed if you want to benchmark
> >>with a cold cache (which is almost a must if you're doing actual
> >>filesystem
> >>benchmarking, as otherwise you're benching the in-core cache instead).
> >>
> >>
> >
> >That was designed to have result in each phase as independent as we can.
> >For
> >example, if we have read slowdown in mongo, we will analyze only reads and,
> >probably, fs fragmentation, we won't deal with unmeasurable cache state
> >before
> >the read phase. Known and persistent "cold cache effect" is better than
> >unknown
> >hot cache one :) And, mongo phases are designed to be long and keep the
> >cold
> >cache effect at minimum.
> >
> >
>
> Let me be more precise here. Is the time spent mounting and umounting
> included in the time for the phase?
oops. sync time included. mount/umount time is not.
--
Alex.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 18:05 ` Alex Zarochentsev
@ 2004-08-10 19:55 ` Hans Reiser
2004-08-10 20:41 ` Alex Zarochentsev
0 siblings, 1 reply; 47+ messages in thread
From: Hans Reiser @ 2004-08-10 19:55 UTC (permalink / raw)
To: Alex Zarochentsev; +Cc: Valdis.Kletnieks, David Dabbs, 'ReiserFS List'
Alex Zarochentsev wrote:
>On Tue, Aug 10, 2004 at 10:35:58AM -0700, Hans Reiser wrote:
>
>
>>Alex Zarochentsev wrote:
>>
>>
>>
>>>On Mon, Aug 09, 2004 at 11:21:17PM -0400, Valdis.Kletnieks@vt.edu wrote:
>>>
>>>
>>>
>>>
>>>>On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>>I think I have discovered the problem - unless there was a reason mongo
>>>>>>was
>>>>>>issuing mount/unmount commands at the start/end of a mongo 'run' as
>>>>>>well as
>>>>>>before/after _each phase_.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>Probably someone wanted to separate the measurement of the phases. It
>>>>>has been a while since I read mongo.....
>>>>>
>>>>>
>>>>>
>>>>>
>>>>Note that an unmount/mount pair will force a flush of all dirtied pages
>>>>in the
>>>>in-memory file cache, and *really* not return until it's really done and
>>>>really
>>>>out on disk. In addition, sync() will force stuff to disk, but *not*
>>>>invalidate
>>>>in-cache pages - more drastic measures are needed if you want to benchmark
>>>>with a cold cache (which is almost a must if you're doing actual
>>>>filesystem
>>>>benchmarking, as otherwise you're benching the in-core cache instead).
>>>>
>>>>
>>>>
>>>>
>>>That was designed to have result in each phase as independent as we can.
>>>For
>>>example, if we have read slowdown in mongo, we will analyze only reads and,
>>>probably, fs fragmentation, we won't deal with unmeasurable cache state
>>>before
>>>the read phase. Known and persistent "cold cache effect" is better than
>>>unknown
>>>hot cache one :) And, mongo phases are designed to be long and keep the
>>>cold
>>>cache effect at minimum.
>>>
>>>
>>>
>>>
>>Let me be more precise here. Is the time spent mounting and umounting
>>included in the time for the phase?
>>
>>
>
>oops. sync time included. mount/umount time is not.
>
>
>
>
You see the problem, yes? Your suggestion?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 19:55 ` Hans Reiser
@ 2004-08-10 20:41 ` Alex Zarochentsev
0 siblings, 0 replies; 47+ messages in thread
From: Alex Zarochentsev @ 2004-08-10 20:41 UTC (permalink / raw)
To: Hans Reiser; +Cc: Valdis.Kletnieks, David Dabbs, 'ReiserFS List'
On Tue, Aug 10, 2004 at 12:55:27PM -0700, Hans Reiser wrote:
> Alex Zarochentsev wrote:
>
> >On Tue, Aug 10, 2004 at 10:35:58AM -0700, Hans Reiser wrote:
> >
> >
> >>Alex Zarochentsev wrote:
> >>
> >>
> >>
> >>>On Mon, Aug 09, 2004 at 11:21:17PM -0400, Valdis.Kletnieks@vt.edu wrote:
> >>>
> >>>
> >>>
> >>>
> >>>>On Sat, 07 Aug 2004 00:49:43 PDT, Hans Reiser said:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>>I think I have discovered the problem - unless there was a reason
> >>>>>>mongo was
> >>>>>>issuing mount/unmount commands at the start/end of a mongo 'run' as
> >>>>>>well as
> >>>>>>before/after _each phase_.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>Probably someone wanted to separate the measurement of the phases. It
> >>>>>has been a while since I read mongo.....
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>Note that an unmount/mount pair will force a flush of all dirtied pages
> >>>>in the
> >>>>in-memory file cache, and *really* not return until it's really done
> >>>>and really
> >>>>out on disk. In addition, sync() will force stuff to disk, but *not*
> >>>>invalidate
> >>>>in-cache pages - more drastic measures are needed if you want to
> >>>>benchmark
> >>>>with a cold cache (which is almost a must if you're doing actual
> >>>>filesystem
> >>>>benchmarking, as otherwise you're benching the in-core cache instead).
> >>>>
> >>>>
> >>>>
> >>>>
> >>>That was designed to have result in each phase as independent as we can.
> >>>For
> >>>example, if we have read slowdown in mongo, we will analyze only reads
> >>>and,
> >>>probably, fs fragmentation, we won't deal with unmeasurable cache state
> >>>before
> >>>the read phase. Known and persistent "cold cache effect" is better than
> >>>unknown
> >>>hot cache one :) And, mongo phases are designed to be long and keep the
> >>>cold
> >>>cache effect at minimum.
> >>>
> >>>
> >>>
> >>>
> >>Let me be more precise here. Is the time spent mounting and umounting
> >>included in the time for the phase?
> >>
> >>
> >
> >oops. sync time included. mount/umount time is not.
> >
> >
> >
> >
> You see the problem, yes? Your suggestion?
The problem is that sync may not wait, yes? That might be considered an fs bug.
We can time the umount() and, if it takes significant time (say 0.5+% of the
test phase duration), report a benchmark error. If the number of "buggy"
filesystems turns out to be big enough, we will include umount() in the timed
part.
However, we can add options to control sync()/umount() between phases and see
that benchmark results do not depend on it too much :)
--
Alex.
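
The umount check proposed above could be sketched roughly as follows. This is Python for illustration, not mongo code; `umount_is_suspicious` and its `threshold` parameter are hypothetical names, and 0.005 is simply the 0.5% figure from the message expressed as a fraction.

```python
def umount_is_suspicious(phase_seconds, umount_seconds, threshold=0.005):
    """Return True if umount took at least `threshold` of the phase time.

    A long umount suggests the phase left dirty data behind that the
    timed part never paid for, so the phase result is questionable.
    """
    if phase_seconds <= 0:
        raise ValueError("phase_seconds must be positive")
    return umount_seconds / phase_seconds >= threshold


# A 200 s phase followed by a 0.5 s umount is within tolerance;
# the same phase followed by a 2 s umount would be flagged.
print(umount_is_suspicious(200.0, 0.5))
print(umount_is_suspicious(200.0, 2.0))
```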
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
[not found] <20040810205450.GU9811@backtop.namesys.com>
@ 2004-08-10 21:06 ` David Dabbs
2004-08-10 21:06 ` Alex Zarochentsev
0 siblings, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-10 21:06 UTC (permalink / raw)
To: 'Alex Zarochentsev'
Cc: 'Hans Reiser', 'ReiserFS List'
>
> On Tue, Aug 10, 2004 at 03:40:14PM -0500, David Dabbs wrote:
> >
> > The solution I proposed is below, and I followed up that the wait should
> be in the iteration command, but it should probably be moved _after_ all
> reps as I have it below. This will sync/wait after all the phase
> iterations have run instead of after each individual phase iteration
> command. In either case it will not be timed. Sound right, Alex?
>
> one "wait" should be at the old place still.
>
[David Dabbs]
I thought so, then second guessed it.
> but, I don't think we should measure writes w/o final cleaning of the fs
> cache.
>
[David Dabbs]
If you mean at phase end, yes that does sound right for write ops.
>
> Reiser4 had no support for write throttling at some early stages of
> development, w/o final sync() it might look faster than physical disk
> drive.
>
> > dd
> >
I have been running benchmarks with SYNC=off, as the mongo page recommends
against doing so. In configs where SYNC=ON we would omit the syncs/waits,
yes?
David
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 21:06 ` David Dabbs
@ 2004-08-10 21:06 ` Alex Zarochentsev
2004-08-10 21:19 ` David Dabbs
2004-08-10 21:26 ` David Dabbs
0 siblings, 2 replies; 47+ messages in thread
From: Alex Zarochentsev @ 2004-08-10 21:06 UTC (permalink / raw)
To: David Dabbs; +Cc: 'Hans Reiser', 'ReiserFS List'
On Tue, Aug 10, 2004 at 04:06:24PM -0500, David Dabbs wrote:
>
>
> >
> > On Tue, Aug 10, 2004 at 03:40:14PM -0500, David Dabbs wrote:
> > >
> > > The solution I proposed is below, and I followed up that the wait should
> > be in the iteration command, but it should probably be moved _after_ all
> > reps as I have it below. This will sync/wait after all the phase
> > iterations have run instead of after each individual phase iteration
> > command. In either case it will not be timed. Sound right, Alex?
> >
> > one "wait" should be at the old place still.
> >
> [David Dabbs]
> I thought so, then second guessed it.
>
> > but, I don't think we should measure writes w/o final cleaning of the fs
> > cache.
> >
> [David Dabbs]
> If you mean at phase end, yes that does sound right for write ops.
>
>
> >
> > Reiser4 had no support for write throttling at some early stages of
> > development, w/o final sync() it might look faster than physical disk
> > drive.
> >
> > > dd
> > >
>
> I have been running benchmarks with SYNC=off, as the mongo page recommends
> against doing so. In configs where SYNC=ON we would omit the syncs/waits,
> yes?
not sure. metadata need to be synched still.
> David
--
Alex.
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 21:06 ` Alex Zarochentsev
@ 2004-08-10 21:19 ` David Dabbs
2004-08-11 10:03 ` Vladimir V. Saveliev
2004-08-10 21:26 ` David Dabbs
1 sibling, 1 reply; 47+ messages in thread
From: David Dabbs @ 2004-08-10 21:19 UTC (permalink / raw)
To: 'Alex Zarochentsev'
Cc: 'Hans Reiser', 'ReiserFS List'
> -----Original Message-----
> From: Alex Zarochentsev [mailto:zam@namesys.com]
> Sent: Tuesday, August 10, 2004 4:07 PM
> To: David Dabbs
> Cc: 'Hans Reiser'; 'ReiserFS List'
> Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> error"
>
> On Tue, Aug 10, 2004 at 04:06:24PM -0500, David Dabbs wrote:
> >
> > I have been running benchmarks with SYNC=off, as the mongo page
> > recommends against doing so. In configs where SYNC=ON we would
> > omit the syncs/waits, yes?
>
> not sure. metadata need to be synched still.
>
> Alex.
Saw following recently on LKML re. ext3. Does r4 O_SYNC not sync metadata?
From Theodore Ts'o
Subject Re: ext3 and SPEC SFS Run rules.
On Mon, Jul 26, 2004 at 10:12:01AM +0100, Tigran Aivazian wrote:
> On Mon, 26 Jul 2004, Andrew Morton wrote:
> > ext3 should be fully syncing data and metadata for both fsync()
> > and O_SYNC writes in all three journalling modes.
> > If not, that's a big bug.
David
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 21:06 ` Alex Zarochentsev
2004-08-10 21:19 ` David Dabbs
@ 2004-08-10 21:26 ` David Dabbs
1 sibling, 0 replies; 47+ messages in thread
From: David Dabbs @ 2004-08-10 21:26 UTC (permalink / raw)
To: 'Alex Zarochentsev'
Cc: 'Hans Reiser', 'ReiserFS List'
> > I have been running benchmarks with SYNC=off, as the mongo page
> recommends
> > against doing so. In configs where SYNC=ON we would omit the syncs/waits,
> > yes?
>
> not sure. metadata need to be synched still.
>
> > David
>
> --
> Alex.
See also below. I'm pretty sure my IDE disks are write cache-enabled.
Perhaps the statement below that "ide write caches must be disabled for
reliable fsync operations with Linux" is an explanation for the errors I saw
during my benchmarking. Will look more into the "IDE barrier and true
fsync() in Linux on IDE" threads.
David
Subject: Re: Why O_SYNC is faster than fsync on ext3
Date: Sun, 21 Mar 2004 11:45:18 +0100
-------------------------------------------------------------------------
Yusuf Goolamabbas wrote:
>I sent this to Bruce but forgot to cc pgsql-hackers, The patches are
>likely to go into 2.6.6. People interested in extremely safe fsync
>writes should also follow the IDE barrier thread and the true fsync()
>in Linux on IDE thread
Actually the most interesting part of the thread was the initial post from
Peter Zaitsev on a fcntl(fd, F_FULLSYNC, NULL): He wrote that this is
necessary for Mac OS X to force a flush of the write caches in the disks.
Unfortunately I can't find anything about this flag with google.
Another interesting point is that right now, ide write caches must be
disabled for reliable fsync operations with Linux. Recent suse kernels
contain partial support. If the existing patches are completed and merged,
it will be safe to enable write caching.
Perhaps Bruce's cache flush test could be modified slightly to check that
the OS isn't lying about fsync: if fsync is faster than the rotational delay
of the disks, then the setup is not suitable for postgres. This could be
recommended as a setup test in the install document.
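
A rough sketch of such a setup test, in Python. Everything here is an assumption for illustration: the function names are hypothetical, the bound used is one full platter revolution (60/RPM seconds), and the check only makes sense for a rotating disk whose RPM you know. It is not Bruce's test.

```python
import os
import time


def rotation_period(rpm):
    """Seconds for one full platter revolution at the given RPM."""
    return 60.0 / rpm


def fsync_latency(path, rounds=20):
    """Median seconds per fsync() after rewriting the same 512 bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    samples = []
    try:
        for _ in range(rounds):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, b"x" * 512)
            start = time.perf_counter()
            os.fsync(fd)              # should not return before data is on disk
            samples.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    samples.sort()
    return samples[len(samples) // 2]


def fsync_looks_suspicious(latency_seconds, rpm):
    """True if fsync returned faster than one revolution would allow."""
    return latency_seconds < rotation_period(rpm)
```

For a 7200 RPM drive one revolution takes about 8.3 ms, so a median fsync latency well under that hints that a write cache is acknowledging the flush early.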
^ permalink raw reply [flat|nested] 47+ messages in thread
* RE: Was able to reproduce "cp: cannot stat file.x: Input/output error"
[not found] <411944EF.7000504@namesys.com>
@ 2004-08-10 22:05 ` David Dabbs
0 siblings, 0 replies; 47+ messages in thread
From: David Dabbs @ 2004-08-10 22:05 UTC (permalink / raw)
To: 'Hans Reiser'
Cc: 'Alex Zarochentsev', 'ReiserFS List'
> -----Original Message-----
> From: Hans Reiser [mailto:reiser@namesys.com]
> Sent: Tuesday, August 10, 2004 4:58 PM
> To: David Dabbs
> Cc: 'Alex Zarochentsev'
> Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
> error"
>
> Do you report the time spent on wait as a separate metric or?
>
> Hans
>
No, there is nothing like that now and it didn't occur to me. Alex's earlier
message proposed timing the umount, but nothing regarding the wait. There is
(an undoc?) mongo option for EXTENDED_STATISTICS, but I think that strictly
deals with disk fragmentation.
david
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Was able to reproduce "cp: cannot stat file.x: Input/output error"
2004-08-10 21:19 ` David Dabbs
@ 2004-08-11 10:03 ` Vladimir V. Saveliev
0 siblings, 0 replies; 47+ messages in thread
From: Vladimir V. Saveliev @ 2004-08-11 10:03 UTC (permalink / raw)
To: David Dabbs
Cc: 'Alex Zarochentsev', 'Hans Reiser',
'ReiserFS List'
Hello
David Dabbs wrote:
>
>>-----Original Message-----
>>From: Alex Zarochentsev [mailto:zam@namesys.com]
>>Sent: Tuesday, August 10, 2004 4:07 PM
>>To: David Dabbs
>>Cc: 'Hans Reiser'; 'ReiserFS List'
>>Subject: Re: Was able to reproduce "cp: cannot stat file.x: Input/output
>>error"
>>
>>On Tue, Aug 10, 2004 at 04:06:24PM -0500, David Dabbs wrote:
>>
>>>I have been running benchmarks with SYNC=off, as the mongo page
>>>recommends against doing so. In configs where SYNC=ON we would
>>>omit the syncs/waits, yes?
>>
>>not sure. metadata need to be synched still.
>>
>>Alex.
>
>
> Saw following recently on LKML re. ext3. Does r4 O_SYNC not sync metadata?
>
It does. Well, it should.
>
> From Theodore Ts'o
> Subject Re: ext3 and SPEC SFS Run rules.
>
> On Mon, Jul 26, 2004 at 10:12:01AM +0100, Tigran Aivazian wrote:
>
>>On Mon, 26 Jul 2004, Andrew Morton wrote:
>>
>>>ext3 should be fully syncing data and metadata for both fsync()
>>>and O_SYNC writes in all three journalling modes.
>>>If not, that's a big bug.
>
>
> David
>
>
>
>
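
For reference, the two write paths discussed above (O_SYNC writes and an explicit fsync()) can be exercised from user space roughly like this. It is a minimal sketch with made-up helper names. It shows only how each path is requested, and says nothing about what a given filesystem actually flushes.

```python
import os
import tempfile

# O_SYNC is POSIX-only; fall back to 0 so the sketch still imports elsewhere.
O_SYNC = getattr(os, "O_SYNC", 0)


def write_with_o_sync(path, data):
    """Open with O_SYNC: each write() should return only once it is stable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_SYNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def write_then_fsync(path, data):
    """Ordinary buffered write followed by an explicit fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_with_o_sync(os.path.join(d, "osync.dat"), b"hello")
        write_then_fsync(os.path.join(d, "fsync.dat"), b"world")
```

Timing both helpers on the same filesystem is one way to notice when one of the two paths is not being synced as expected.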
Thread overview: 47+ messages
2004-08-06 6:53 Was able to reproduce "cp: cannot stat file.x: Input/output error" David Dabbs
2004-08-06 15:51 ` Vladimir V. Saveliev
2004-08-06 17:10 ` Philippe Gramoullé
2004-08-06 17:39 ` Vladimir V. Saveliev
2004-08-06 19:06 ` Philippe Gramoullé
2004-08-07 4:14 ` Hans Reiser
2004-08-06 17:46 ` David Dabbs
2004-08-06 19:11 ` Philippe Gramoullé
2004-08-07 4:15 ` Hans Reiser
2004-08-07 6:46 ` David Dabbs
2004-08-07 7:49 ` Hans Reiser
2004-08-08 2:54 ` David Dabbs
2004-08-10 3:21 ` Valdis.Kletnieks
2004-08-10 8:31 ` Hans Reiser
2004-08-10 15:41 ` Valdis.Kletnieks
2004-08-10 9:20 ` Alex Zarochentsev
2004-08-10 17:35 ` Hans Reiser
2004-08-10 17:42 ` David Dabbs
2004-08-10 17:46 ` Hans Reiser
2004-08-10 18:05 ` Alex Zarochentsev
2004-08-10 19:55 ` Hans Reiser
2004-08-10 20:41 ` Alex Zarochentsev
2004-08-06 17:51 ` Alex Zarochentsev
2004-08-06 19:10 ` Philippe Gramoullé
[not found] <411944EF.7000504@namesys.com>
2004-08-10 22:05 ` David Dabbs
[not found] <20040810205450.GU9811@backtop.namesys.com>
2004-08-10 21:06 ` David Dabbs
2004-08-10 21:06 ` Alex Zarochentsev
2004-08-10 21:19 ` David Dabbs
2004-08-11 10:03 ` Vladimir V. Saveliev
2004-08-10 21:26 ` David Dabbs
[not found] <4115A979.5090002@namesys.com>
2004-08-08 7:07 ` David Dabbs
2004-08-08 18:08 ` Hans Reiser
2004-08-08 19:09 ` David Dabbs
2004-08-09 6:17 ` Hans Reiser
2004-08-08 21:40 ` David Dabbs
2004-08-09 0:01 ` Hans Reiser
2004-08-09 1:55 ` David Dabbs
2004-08-09 17:43 ` Hans Reiser
2004-08-09 18:32 ` David Dabbs
2004-08-09 2:38 ` David Dabbs
2004-08-09 17:59 ` Alex Zarochentsev
2004-08-09 18:22 ` David Dabbs
2004-08-09 18:42 ` Alex Zarochentsev
2004-08-09 15:13 ` Nikita Danilov
2004-08-09 17:48 ` Hans Reiser
-- strict thread matches above, loose matches on Subject: below --
2004-08-06 4:54 David Dabbs
2004-08-06 7:31 ` mjt