* AMD64/Reiser4 testing and problems
@ 2004-12-02 23:13 Isaac Chanin
  2004-12-04 21:27 ` Alex Zarochentsev
  0 siblings, 1 reply; 6+ messages in thread
From: Isaac Chanin @ 2004-12-02 23:13 UTC (permalink / raw)
  To: reiserfs-list

Hi,

I did some testing with Reiser4 on AMD64 and was wondering whether debug
information or anything else from it could help with getting Reiser4
stable and working on AMD64.

I have read that it would be a big help if AMD donated an AMD64 CPU,
but it doesn't seem inherently impossible to fix the problem from
error reports and such.

Anyway, here are the results of my testing. I used an -mm 2.6.10-rc2
kernel, and the filesystem was in a file mounted via loopback.

http://users.wpi.edu/~chanin/r4log.txt

The commands I tried were as follows:
(r4 was a 512 MB file created with dd)
mount -o loop /root/r4 /root/r4dir (worked)
df -h (worked)
cd /root/r4dir (worked)
ls (worked)
ls -la (worked)
mkdir linux (worked)
cp -r /usr/src/linux linux (worked, but the program hung at the end;
kill -9 had no effect)
ls (worked, but the program hung at the end; kill -9 had no effect)
cd /root/r4dir (no output, program crashed immediately)


Thanks for taking the time to read over the problems - and if there's 
anymore testing I could do or whatnot just ask; and, thanks for making 
what will (hopefully) soon be the fastest file system to work on my 
computer!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: AMD64/Reiser4 testing and problems
  2004-12-02 23:13 AMD64/Reiser4 testing and problems Isaac Chanin
@ 2004-12-04 21:27 ` Alex Zarochentsev
  0 siblings, 0 replies; 6+ messages in thread
From: Alex Zarochentsev @ 2004-12-04 21:27 UTC (permalink / raw)
  To: Isaac Chanin; +Cc: reiserfs-list

Hello Isaac

On Thu, Dec 02, 2004 at 06:13:39PM -0500, Isaac Chanin wrote:
> Hi,
> 
> I did some testing with Reiser4 on AMD64 and was wondering whether debug
> information or anything else from it could help with getting Reiser4
> stable and working on AMD64.
> 
> I have read that it would be a big help if AMD donated an AMD64 CPU,
> but it doesn't seem inherently impossible to fix the problem from
> error reports and such.
> 
> Anyway, here are the results of my testing. I used an -mm 2.6.10-rc2
> kernel, and the filesystem was in a file mounted via loopback.
> 
> http://users.wpi.edu/~chanin/r4log.txt

Thanks a lot for the report. Can you try the following patch?

===== plugin/space/bitmap.c 1.183 vs edited =====
--- 1.183/plugin/space/bitmap.c	Wed Oct 13 17:22:01 2004
+++ edited/plugin/space/bitmap.c	Sun Dec  5 00:18:55 2004
@@ -170,7 +170,7 @@
 static int
 find_next_zero_bit_in_word(ulong_t word, int start_bit)
 {
-	unsigned int mask = 1 << start_bit;
+	ulong_t mask = 1 << start_bit;
 	int i = start_bit;
 
 	while ((word & mask) != 0) {
@@ -234,7 +234,7 @@
 /* search for the first set bit in single word. */
 static int find_last_set_bit_in_word (ulong_t word, int start_bit)
 {
-	unsigned bit_mask;
+	ulong_t bit_mask;
 	int nr = start_bit;
 
 	assert ("zam-965", start_bit < BITS_PER_LONG);
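The patch widens the masks from unsigned int to ulong_t because, on x86_64, a long is 64 bits while an unsigned int is 32, so a 32-bit mask cannot reach bits 32 through 63 of a bitmap word. Note that the patched line still evaluates `1 << start_bit` in int arithmetic, which is undefined for start_bit >= 32 (the assertion allows start_bit up to BITS_PER_LONG - 1); a `1UL` literal appears to be needed as well. Below is a minimal sketch, not the reiser4 code, assuming ulong_t is unsigned long; the local ulong_t/BITS_PER_LONG definitions and the "not found" return values (BITS_PER_LONG and -1) are assumptions for illustration:

```c
#include <limits.h>

/* Assumption: ulong_t is a plain unsigned long, which is 64 bits wide
 * on x86_64 (LP64) and 32 bits wide on i386. */
typedef unsigned long ulong_t;
#define BITS_PER_LONG ((int)(sizeof(ulong_t) * CHAR_BIT))

/* Return the index of the first zero bit at or above start_bit,
 * or BITS_PER_LONG if all of those bits are set (assumed convention). */
static int find_next_zero_bit_in_word(ulong_t word, int start_bit)
{
	int i;

	for (i = start_bit; i < BITS_PER_LONG; i++) {
		/* 1UL keeps the shift in ulong_t width; shifting an int 1
		 * by 32 or more is undefined behaviour in C. */
		if ((word & (1UL << i)) == 0)
			return i;
	}
	return BITS_PER_LONG;
}

/* Return the index of the highest set bit at or below start_bit,
 * or -1 if none of those bits is set (assumed convention). */
static int find_last_set_bit_in_word(ulong_t word, int start_bit)
{
	int nr;

	for (nr = start_bit; nr >= 0; nr--) {
		if (word & (1UL << nr))
			return nr;
	}
	return -1;
}
```

For an all-ones word on x86_64 this sketch returns 64 (no free bit). A loop that shifts a 32-bit mask left one bit per iteration, as the original appears to, runs out of mask after bit 31 and would stop at bit 32 even though that bit is set.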


* Re: AMD64/Reiser4 testing and problems
@ 2004-12-08  8:04 Isaac Chanin
  0 siblings, 0 replies; 6+ messages in thread
From: Isaac Chanin @ 2004-12-08  8:04 UTC (permalink / raw)
  To: reiserfs-list

Hey Alex,

Thanks for the patch. I wish there was more I could say, but so far I
haven't been able to produce any problems whatsoever with this new patch
on reiser4/amd64. The first thing I tried was Jake Maciejewski's
kernel-compiling command, which I modified slightly to better suit my
system. The following ran for at least 30 minutes and a couple of
kernel-compile iterations:

    for i in `seq 1 20` ; do make mrproper ; cat /boot/2.6.10-r4-mm-config > .config ; make ; echo $i ; done &
    for i in `seq 1 5` ; do dd if=/dev/zero of=large_file bs=1M count=20k ; rm large_file ; echo $i ; done

I also tried filling the filesystem, and testing for data retention after
forced umounts. For fun I also tried mounting an ext2 and a second reiser4
filesystem from image files inside the reiser4 filesystem. Everything
worked fine; I was unable to produce a single error in the logs.

The only thing that could arguably be said to have gone wrong was
force-umounting with open file handles, which didn't work out quite as
well. However, for all sane usage reiser4 seems rather stable on amd64
to me.

I will try running an entire Linux system on it once I get the time, and
perhaps then I will be able to give you some more feedback.

So once again, thanks for the patch, and best of luck with reiser4,

Isaac




* Re: AMD64/Reiser4 testing and problems
@ 2004-12-21  5:46 Isaac Chanin
  2004-12-22  5:45 ` Isaac Chanin
  0 siblings, 1 reply; 6+ messages in thread
From: Isaac Chanin @ 2004-12-21  5:46 UTC (permalink / raw)
  To: reiserfs-list


Hello Alex et al,

After my last e-mail was so useless I figured I would follow it up with
a more useful one. I've attempted to build an entire AMD64 system on top
of reiser4 (aside from /boot, which is still ext2), and in the process
I've run into two more bugs.


The first bug would give output (without the reiser4 debugging options
enabled) like:

Dec 20 02:33:14 [kernel] reiser4[portageq(31681)]: traverse_tree (fs/reiser4/search.c:789)[nikita-373]:
Dec 20 02:33:14 [kernel] reiser4[portageq(31681)]: traverse_tree (fs/reiser4/search.c:755)[nikita-1481]:

over and over until the system was rebooted. The affected process (so far
it has happened with portageq, wget, and I think nano) becomes completely
unresponsive to all kill signals and takes 100% of CPU time.

So far I have absolutely no clue what triggers this bug; it seems to pop
up completely at random (e.g. copying a large file from one reiser4 logical
partition to another, or wget'ing a tiny file to a reiser4 partition). About
the only common factor I could find was writing to the filesystem.

I could only reproduce this bug on 2.6.10-rc2-mm4; linux-2.6.10-rc3-mm1
(both kernels with Alex's patch) would boot (onto reiser3) but wouldn't do
much else.


This leads nicely into the other bug, which is that it does not seem
possible to boot into a reiser4 system (ext2 for /boot, reiser4 for /, et
cetera) on an AMD64. I don't have any log output, as the system hangs
before it can load the logger, and no screen output either, because my
laptop (no matter what framebuffer configuration I use) shows a completely
black screen during boot-up until the initial kernel loading is complete.
Neither 2.6.10-rc2-mm4 nor linux-2.6.10-rc3-mm1 gets any farther in the
boot process.

It occurred to me that this could simply be happening because of all of
the failures the partitions had while I was installing the system on them
(though fsck.reiser4 seemed to handle them all nicely at the time). So
perhaps this bug is not really a bug, and more of a misconfiguration on my
part; just a thought.


I'll put full logs (with the reiser4 debug options enabled in the kernel;
I still can't believe I forgot those) up at
http://users.wpi.edu/~chanin/newr4log.txt once I can get the first bug to
occur again (or if I can somehow get some output from the second). I
should be able to get them up later tonight, and if not then, probably
sometime tomorrow.


Thanks,

Isaac



* Re: AMD64/Reiser4 testing and problems
  2004-12-21  5:46 Isaac Chanin
@ 2004-12-22  5:45 ` Isaac Chanin
  0 siblings, 0 replies; 6+ messages in thread
From: Isaac Chanin @ 2004-12-22  5:45 UTC (permalink / raw)
  To: reiserfs-list


Hi everybody,
I have a few updates. The second bug that I mentioned earlier can be
dismissed: it is definitely possible to boot into a reiser4 system on
AMD64. The first bug is very hard to replicate. I've tried everything
that caused it the first time without any luck, so I'm starting to think
that it must be caused by some race condition (and hence comes up
randomly) or that one of the debug options prevents it from happening
(however unlikely that may be).

However, it may have shown its face again: after successfully compiling
everything from glibc to xfce4, I finally got another reiser4 failure. It
happened pretty much like so:

---------------------------------------------------------
root /etc/X11 # /etc/init.d/gpm stop
reiser4 panicked cowardly: assertion failed: reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, start_offset) >= end_offset
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at debug:131
invalid operand: 0000 [1]
CPU 0
Modules linked in:
Pid: 6648, comm: runscript.sh Tainted: G   M  2.6.10-rc2-mm4
RIP: 0010:[<ffffffff801a0070>] <ffffffff801a0070>{reiser4_do_panic+768}
RSP: 0018:ffff81003d5497c8  EFLAGS: 00010246
RAX: ffffffff80564ac0 RBX: ffff81003d549da8 RCX: ffffffff80564ac0
RDX: ffff81003d549e78 RSI: ffffffff80564ac0 RDI: ffff81003f1e9400
RBP: ffff81003f1e9400 R08: 0000000000000000 R09: 0000000000000005
R10: 00000000ffffffff R11: 0000000000000000 R12: 00000000000009e4
R13: 00000000000009e4 R14: 0000000000000001 R15: 00000000000009c3
FS:  00002aaaaaeb8700(0000) GS:ffffffff805a2080(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00005555556729c0 CR3: 000000003d844000 CR4: 00000000000006e0
Process runscript.sh (pid: 6648, threadinfo ffff81003d548000, task ffff81003e8ee860)
Stack: 0000003000000010 ffff81003d5498a8 ffff81003d5497e8 00000000000019f8
       ffffffff804081b0 ffffffff80441d90 0000000000000000 0000000000000000
       0000000000000000 0000000000000003
Call Trace:<ffffffff801a0348>{get_current_log_flags+8} <ffffffff801eda78>{reiser4_block_count+72}
        <ffffffff801a00a9>{schedulable+9} <ffffffff802617a0>{load_and_lock_bnode+96}
        <ffffffff80260ff6>{parse_blocknr+326} <ffffffff8019fd55>{reiser4_print_prefix+133}
        <ffffffff8019fc29>{report_err+9} <ffffffff802627a2>{check_blocks_bitmap+1042}
        <ffffffff801c62b6>{reiser4_check_block+22} <ffffffff801a7df0>{zget+1088}
        <ffffffff80200c27>{do_reiser4_file_readahead+999} <ffffffff8015dd63>{handle_mm_fault+1107}
        <ffffffff80201090>{reiser4_file_readahead+240} <ffffffff8026b627>{read_unix_file+775}
        <ffffffff80235d20>{read_tail+0} <ffffffff8026911c>{unix_file_filemap_nopage+236}
        <ffffffff801bea75>{done_context+741} <ffffffff801fb1f1>{reiser4_read+689}
        <ffffffff8015cb79>{do_wp_page+153} <ffffffff8016cc27>{vfs_read+199}
        <ffffffff8016cf13>{sys_read+83} <ffffffff8010e16a>{system_call+126}

Code: 0f 0b aa 1a 42 80 ff ff ff ff 83 00 48 c7 c6 40 46 56 80 48
RIP <ffffffff801a0070>{reiser4_do_panic+768} RSP <ffff81003d5497c8>
Segmentation fault
--------------------------------------------------------
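The failed assertion checks that reiser4_find_next_zero_bit() finds no zero bit between start_offset and end_offset in the bitmap node's working data, which presumably means every block in that range must already be marked as used (the call trace shows it coming from check_blocks_bitmap). A minimal sketch of that check follows, assuming a bitmap stored as an array of native unsigned long words; the real reiser4 bitmap layout and helper may differ, and find_next_zero_bit_range() and range_fully_allocated() are hypothetical names:

```c
#include <limits.h>

/* Assumption: bit n lives in word n / BITS_PER_LONG of an array of
 * native unsigned longs. reiser4's real bitmap layout may differ. */
typedef unsigned long ulong_t;
#define BITS_PER_LONG ((unsigned long)(sizeof(ulong_t) * CHAR_BIT))

/* Hypothetical helper: first zero bit in [start, max), or max if every
 * bit in that range is set. The argument order (addr, max, start)
 * mirrors the call in the failed assertion. */
static unsigned long find_next_zero_bit_range(const ulong_t *addr,
					      unsigned long max,
					      unsigned long start)
{
	unsigned long i;

	for (i = start; i < max; i++) {
		ulong_t word = addr[i / BITS_PER_LONG];

		if ((word & (1UL << (i % BITS_PER_LONG))) == 0)
			return i;
	}
	return max;
}

/* Hypothetical check modelled on the assertion: nonzero if every
 * block in [start, end) is marked as used in the bitmap. */
static int range_fully_allocated(const ulong_t *bitmap,
				 unsigned long start, unsigned long end)
{
	return find_next_zero_bit_range(bitmap, end, start) >= end;
}
```

This only models what the assertion tests: a single block in the range still marked free is enough to trip it. It says nothing about why the bitmap and the block range disagreed in this report.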

A few things to note about the circumstances:
 - I had restarted without properly umounting the drives a few times
before the bug occurred (I'm not sure how much fsck checking happens at
boot time; it doesn't seem like much).
 - Everything worked fine after the bug until...
 - Once I tried to stop gpm again, it went into an infinite loop of
outputting errors (the syslogger died, so I couldn't capture them) much
like the previous error.

Also, if anyone has any suggestions, based on either of my past two
e-mails, about anything to try (either to fix the bug or to make it occur
again), please feel free to share them; the output from these tells me
next to nothing, so I'm kind of working blind here.

Anyway, I hope that output is a bit more helpful,

Isaac


Also, sorry about responding to myself three times now - it's just that it 
took me so long to get another bug that I figured an update was in order 
(not like it's so bad compared to the real spam on the list).



* Re: AMD64/Reiser4 testing and problems
  2005-01-01 21:49 Recursive modfied-timestamp? Hans Reiser
@ 2005-01-02  4:22 ` Isaac Chanin
  0 siblings, 0 replies; 6+ messages in thread
From: Isaac Chanin @ 2005-01-02  4:22 UTC (permalink / raw)
  To: reiserfs-list


Hello all,

Just following up on my previous messages. There's not too much new to
say, aside from a bunch of new bug reports/error messages.

If you're interested they're at http://users.wpi.edu/~chanin/r4more.txt.

The old 'random' bug is still popping up. It definitely looks like it has
something to do with the reiser4_find_next_zero_bit function in bitmap.c.
I've looked through the file (and its includes) and haven't found anything
obvious, but my C skills aren't quite what they should be for debugging
something like this.

Also, there appears to be a new bug, or perhaps a simple fluke event, that
resulted in some random file corruption. I've yet to formulate even
uninformed opinions about what caused that one, however.

Finally, if there's no need for more bug reports (apparently my last one
did not warrant a patch or response, or some people just enjoy the season
more than I do), feel free to tell me. I do recall reading that an
x86_64 machine would be on its way to Namesys soon.


Thanks,
Isaac



Thread overview: 6+ messages
2004-12-02 23:13 AMD64/Reiser4 testing and problems Isaac Chanin
2004-12-04 21:27 ` Alex Zarochentsev
  -- strict thread matches above, loose matches on Subject: below --
2004-12-08  8:04 Isaac Chanin
2004-12-21  5:46 Isaac Chanin
2004-12-22  5:45 ` Isaac Chanin
2005-01-01 21:49 Recursive modfied-timestamp? Hans Reiser
2005-01-02  4:22 ` AMD64/Reiser4 testing and problems Isaac Chanin
