AMD64 progress?


* AMD64 progress?
@ 2005-01-13  5:33 Jake Maciejewski
  2005-01-14  0:03 ` David Masover
  2005-01-14 20:17 ` Alex Zarochentsev
  0 siblings, 2 replies; 12+ messages in thread
From: Jake Maciejewski @ 2005-01-13  5:33 UTC (permalink / raw)
  To: reiserfs-list

There was mention of Namesys getting an AMD64 machine. Do you guys have
it yet? Has there been any progress on my reiser4 bug(s), particularly
the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset,
start_offset) >= end_offset" problem?

-- 
Jake Maciejewski <maciejej@msoe.edu>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-01-13  5:33 AMD64 progress? Jake Maciejewski
@ 2005-01-14  0:03 ` David Masover
  2005-01-14 20:17 ` Alex Zarochentsev
  1 sibling, 0 replies; 12+ messages in thread
From: David Masover @ 2005-01-14  0:03 UTC (permalink / raw)
  To: Jake Maciejewski; +Cc: reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jake Maciejewski wrote:
| There was mention of Namesys getting an AMD64 machine. Do you guys have
| it yet? Has there been any progress on my reiser4 bug(s), particularly
| the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset,
| start_offset) >= end_offset" problem?

I'd like to second this.  I'm looking at getting an AMD64 machine soon,
but if AMD64 support isn't stable by the time I have the money, I might
dump it for desktop use.  No repacker, no shrinker, bugs with AMD64...
Hate to say it, but until the transactions / metafiles / transparent
compression/encryption is done, XFS is looking better every day.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iQIVAwUBQecMZngHNmZLgCUhAQI5vhAAn+p2ni1YLptts79vDdJZy9l/YS3AhgAQ
kh9EYBRonrLTCPdEYTExvvkQlEug0v1CZWY5ovFSLRJAjsj6prBPBIXrnxm/xLtX
v4ZQu7RAyAPSDdsBq6J42QA8mb2oL34qG65wJdcWZLgZfDbAdi3nf8iaIyY3kq7t
vRcGbrllikcn1UwlDF1b/urTbyHmP7Gc64VJlL0JbPlAi7VdiiVEauc/5XWD38l9
NhKjg2+z0d4qe6JMmMK0i6y65bGdRD9he1rLpCO1YfQ4NlEXg4ml+L757XN9b3Ys
1Q1TxEenCV4jff1Mjji75Uz+EbxasWGN2OEPk7g0qdY9TaMo3KyaUV9ANl3FnCUu
rncJVjfnAhFQ8polmyQtGWg7D4gi3yOdqDp/oAMeeofUNoMbl7MnPv2lX+z0kXHX
SQgV2MX1RTQgtTsrO+ENwMx625oYWHrV9gU9WIyn1WvV7fe/X6OxBnL0arbM71mn
/q9PEmIowDidN9MxqDasP5W4Hl/EDLNJpZU+WoymJa2o2jK29o/HIe+LyuUkMytP
hhylAE77iaTWiTITCFjXr6gf53/8Z2Nue3OPmQqjswVkrJ8EaqACvhwamytwNHQs
C8ZAjwNwQottjb7AtJglVkQ8JtGsGxMV5ogwVR4ej/Tnz4VUbw3QzisBBypFvVpp
Uwsd80C2lYw=
=f0Zv
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-01-13  5:33 AMD64 progress? Jake Maciejewski
  2005-01-14  0:03 ` David Masover
@ 2005-01-14 20:17 ` Alex Zarochentsev
  2005-01-14 22:58   ` Jake Maciejewski
  2005-01-16  2:26   ` Isaac Chanin
  1 sibling, 2 replies; 12+ messages in thread
From: Alex Zarochentsev @ 2005-01-14 20:17 UTC (permalink / raw)
  To: Jake Maciejewski; +Cc: reiserfs-list

On Wed, Jan 12, 2005 at 11:33:27PM -0600, Jake Maciejewski wrote:
> There was mention of Namesys getting an AMD64 machine. Do you guys have
> it yet? Has there been any progress on my reiser4 bug(s), particularly
> the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset,
> start_offset) >= end_offset" problem?

Can you test the following patch? I know that amd64 does not require
strict word alignment, but maybe the amd64 bitops coded in assembler
do?

===== plugin/space/bitmap.c 1.185 vs edited =====
--- 1.185/plugin/space/bitmap.c	Sun Dec  5 19:49:52 2004
+++ edited/plugin/space/bitmap.c	Fri Jan 14 23:15:53 2005
@@ -57,13 +57,15 @@
 
 #define CHECKSUM_SIZE    4
 
+#define BYTES_PER_LONG   (sizeof(long))
+
 #if BITS_PER_LONG == 64
 #  define LONG_INT_SHIFT (6)
 #else
 #  define LONG_INT_SHIFT (5)
 #endif
 
-#define LONG_INT_MASK (BITS_PER_LONG - 1)
+#define LONG_INT_MASK (BITS_PER_LONG - 1UL)
 
 typedef unsigned long ulong_t;
 
@@ -182,12 +184,41 @@
 
 #include <asm/bitops.h>
 
+#if BITS_PER_LONG == 64
+
+#define OFF(addr)  (((ulong_t)(addr) & (BYTES_PER_LONG - 1)) << 3)
+#define BASE(addr) ((ulong_t*) ((ulong_t)(addr) & ~(BYTES_PER_LONG - 1)))
+
+static inline void reiser4_set_bit(int nr, void * addr)
+{
+	ext2_set_bit(nr + OFF(addr), BASE(addr));
+}
+
+static inline void reiser4_clear_bit(int nr, void * addr)
+{
+	ext2_clear_bit(nr + OFF(addr), BASE(addr));
+}
+
+static inline int reiser4_test_bit(int nr, void * addr)
+{
+	return ext2_test_bit(nr + OFF(addr), BASE(addr));
+}
+static inline int reiser4_find_next_zero_bit(void * addr, int maxoffset, int offset) 
+{
+	int off = OFF(addr);
+
+	return ext2_find_next_zero_bit(BASE(addr), maxoffset + off, offset + off) - off;
+}
+
+#else
+
 #define reiser4_set_bit(nr, addr)    ext2_set_bit(nr, addr)
 #define reiser4_clear_bit(nr, addr)  ext2_clear_bit(nr, addr)
 #define reiser4_test_bit(nr, addr)  ext2_test_bit(nr, addr)
 
 #define reiser4_find_next_zero_bit(addr, maxoffset, offset) \
 ext2_find_next_zero_bit(addr, maxoffset, offset)
+#endif
 
 /* Search for a set bit in the bit array [@start_offset, @max_offset[, offsets
  * are counted from @addr, return the offset of the first bit if it is found,



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-01-14 20:17 ` Alex Zarochentsev
@ 2005-01-14 22:58   ` Jake Maciejewski
  2005-01-16  2:26   ` Isaac Chanin
  1 sibling, 0 replies; 12+ messages in thread
From: Jake Maciejewski @ 2005-01-14 22:58 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: reiserfs-list

I used the 2.6.10-1 patch on an otherwise vanilla 2.6.10. The filesystem
tested was freshly created with 1.0.3. I did my usual dd and kernel
compile. After the kernel compilation failed, all sorts of processes
just hung, even simple stuff like copying and removing files on other
filesystems. 

reiser4[cc1(11377)]: check_blocks_bitmap (fs/reiser4/plugin/space/bitmap.c:1203)[zam-623]:
code: -2 at fs/reiser4/search.c:1285
reiser4 panicked cowardly: assertion failed: reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, start_offset) >= end_offset
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at debug:131
invalid operand: 0000 [1] 
CPU 0 
Modules linked in: ipv6 i2c_dev it87 i2c_sensor i2c_isa i2c_core ipt_state ip_conntrack iptable_filter ip_tables snd_ioctl32 snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss 3c59x e1000 snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc ehci_hcd ohci_hcd
Pid: 11377, comm: cc1 Not tainted 2.6.10+reiser4
RIP: 0010:[<ffffffff801bf010>] <ffffffff801bf010>{reiser4_do_panic+704}
RSP: 0018:00000100168116e8  EFLAGS: 00010246
RAX: ffffffff8052f4e0 RBX: 0000010016811db8 RCX: ffffffff8052f4e0
RDX: 000001001df1de88 RSI: 000001001df1de88 RDI: 000001001cb2ec00
RBP: ffffff00006e8090 R08: 0000000000000000 R09: 000000000000000d
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001ce2
R13: 0000000000000001 R14: 0000000000001c81 R15: 0000000000001c81
FS:  0000002a95863ae0(0000) GS:ffffffff80546e80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000002a95fb8000 CR3: 0000000000101000 CR4: 00000000000006e0
Process cc1 (pid: 11377, threadinfo 0000010016810000, task 00000100019bb4f0)
Stack: 0000003000000010 00000100168117c8 0000010016811708 0000000000002c71 
       ffffffff803fc0f0 ffffffff8042e4c8 0000000000000000 0000000000000000 
       0000000000000000 0000000000000003 
Call Trace:<ffffffff80208acd>{reiser4_block_count+77} <ffffffff801bf2c9>{schedulable+9} 
       <ffffffff8027b5d5>{load_and_lock_bnode+101} <ffffffff8027ab66>{parse_blocknr+326} 
       <ffffffff801bf175>{reiser4_print_prefix+133} <ffffffff801bf049>{report_err+9} 
       <ffffffff8027c18e>{check_blocks_bitmap+1134} <ffffffff801e3a06>{reiser4_check_block+22} 
       <ffffffff801c69d3>{zget+1075} <ffffffff801ff950>{cbk_level_lookup+112} 
       <ffffffff801ff5fd>{cbk_node_lookup+2445} <ffffffff801d5202>{lock_tail+1298} 
       <ffffffff801d33a5>{znode_is_any_locked+37} <ffffffff801c2544>{check_jload+52} 
       <ffffffff801c46e4>{jload_gfp+1092} <ffffffff801d4815>{move_lh_internal+1429} 
       <ffffffff80200895>{traverse_tree+1749} <ffffffff802017e3>{object_lookup+915} 
       <ffffffff80282c65>{find_file_item+565} <ffffffff8024fed0>{read_tail+0} 
       <ffffffff802853cc>{read_file+524} <ffffffff801e785a>{txn_end+218} 
       <ffffffff80285cc0>{read_unix_file+736} <ffffffff801dc525>{done_context+773} 
       <ffffffff80214185>{reiser4_lookup+485} <ffffffff802146fc>{reiser4_getattr+332} 
       <ffffffff802159c5>{reiser4_read+485} <ffffffff80164d14>{vfs_read+228} 
       <ffffffff80164fc3>{sys_read+83} <ffffffff8010d19a>{system_call+126} 
       

Code: 0f 0b 4d ad 40 80 ff ff ff ff 83 00 48 c7 c6 a0 f0 52 80 48 
RIP <ffffffff801bf010>{reiser4_do_panic+704} RSP <00000100168116e8>
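
Going only by the text of the failed assertion (not by the reiser4
source), the condition says that no zero bit may exist in the range
[start_offset, end_offset) of the working bitmap, i.e. every block in
the range must be marked allocated. A minimal user-space sketch of that
invariant, with a deliberately simple bit-at-a-time scan; both function
names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Return the first zero bit in [offset, max_offset), or max_offset if
 * there is none. Bit n lives in byte n / 8 at position n % 8. */
static int find_next_zero_bit_bytes(const uint8_t *map, int max_offset,
				    int offset)
{
	assert(offset >= 0);
	for (int n = offset; n < max_offset; n++)
		if (!(map[n >> 3] & (1u << (n & 7))))
			return n;
	return max_offset;
}

/* 1 if every bit in [start, end) is set; this mirrors the shape of the
 * assertion "find_next_zero_bit(map, end, start) >= end". */
static int range_is_allocated(const uint8_t *map, int start, int end)
{
	return find_next_zero_bit_bytes(map, end, start) >= end;
}
```

With this reading, the panic means the scan found a zero bit inside a
range that was expected to be fully allocated, which can come either
from a wrong bitmap or from a scan that looks at the wrong bits.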
 
-- 
Jake Maciejewski <maciejej@msoe.edu>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-01-14 20:17 ` Alex Zarochentsev
  2005-01-14 22:58   ` Jake Maciejewski
@ 2005-01-16  2:26   ` Isaac Chanin
  2005-02-07 13:29     ` Alex Zarochentsev
  1 sibling, 1 reply; 12+ messages in thread
From: Isaac Chanin @ 2005-01-16  2:26 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: reiserfs-list

I also tested this patch, with results similar to what I've been getting,
i.e. working for a while (more than a few hours this time, but that
could just be coincidence, I suppose) and then going down with the usual
reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset,
start_offset) >= end_offset error.  For the exact message see
http://users.wpi.edu/~chanin/r4newpatch.txt.

As an aside, I've also been trying to fix this problem in my own 
haphazard/change-random-things way and have tried a few things that might 
be somewhat useful to know, a few less things to try, at least:

Tried removing the #define's for find_next_zero_bit and 
find_first_zero_bit in bitops.h so that those functions instead definitely 
call what's in bitops.c.  Same errors with or without them.

Tried changing the asm in bitops.c for find_next_zero_bit to use the
same opcodes as the x86 version (well, slightly different, quadwords and
all, but you get the idea), which didn't help.

And I wrote a little C program to test the find_next_zero_bit function,
with and without the redefining of the functions (as is done in bitops.h),
and couldn't get the x86_64 code to behave differently from the x86 code.
Though, to be fair, it wasn't very good code, so it probably didn't test
the function very well.
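
A user-space check of this kind could look roughly like the sketch
below. It is not the program described above, only an illustration: it
compares a bit-at-a-time reference scan against a word-at-a-time scan
over pseudo-random bitmaps. All names are made up for the example,
uint64_t stands in for a 64-bit long, and the bitmap buffer is assumed
to be padded to a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: one bit at a time, little-endian bit numbering. */
static int ref_find_next_zero_bit(const uint8_t *map, int size, int offset)
{
	for (int n = offset; n < size; n++)
		if (!(map[n >> 3] & (1u << (n & 7))))
			return n;
	return size;
}

/* Assemble a 64-bit little-endian word from bytes, so the result does
 * not depend on the host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t w = 0;

	for (int i = 7; i >= 0; i--)
		w = (w << 8) | p[i];
	return w;
}

/* Candidate: scan 64 bits at a time. */
static int word_find_next_zero_bit(const uint8_t *map, int size, int offset)
{
	int n = offset;

	while (n < size) {
		uint64_t w = load_le64(map + (size_t)(n >> 6) * 8);
		uint64_t inv = ~w >> (n & 63);	/* zero bits become ones */

		if (inv != 0) {
			while (!(inv & 1)) {
				inv >>= 1;
				n++;
			}
			return n < size ? n : size;
		}
		n = (n | 63) + 1;		/* start of the next word */
	}
	return size;
}

/* Count disagreements between the two scans on pseudo-random bitmaps. */
static int count_mismatches(uint32_t seed)
{
	uint8_t map[32];			/* 256 bits, 4 words */
	int bad = 0;

	for (int round = 0; round < 100; round++) {
		for (size_t i = 0; i < sizeof(map); i++) {
			seed = seed * 1664525u + 1013904223u;
			/* OR two bytes so that set bits dominate */
			map[i] = (uint8_t)(seed >> 24) | (uint8_t)(seed >> 16);
		}
		for (int size = 1; size <= 256; size += 17)
			for (int off = 0; off < size; off++)
				if (ref_find_next_zero_bit(map, size, off) !=
				    word_find_next_zero_bit(map, size, off))
					bad++;
	}
	return bad;
}
```

A check like this only exercises the generic scan logic. It would not
by itself catch a problem in a different helper that the filesystem
uses alongside find_next_zero_bit.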

I realize that my testing is not the best, but it doesn't seem like it
would hurt to hypothesize that perhaps find_next_zero_bit is not the
problem, but instead, the find_next_zero_bit error is manifesting itself
because of some other problem elsewhere in the code?

By the way, I tested the new patch on 2.6.10-mm2 and 2.6.11_rc1-mm1, both
of which gave similar error messages (the one I linked to is from the
11_rc1 version).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-01-16  2:26   ` Isaac Chanin
@ 2005-02-07 13:29     ` Alex Zarochentsev
  2005-02-07 19:34       ` Jake Maciejewski
  2005-02-08 20:29       ` Isaac Chanin
  0 siblings, 2 replies; 12+ messages in thread
From: Alex Zarochentsev @ 2005-02-07 13:29 UTC (permalink / raw)
  To: Isaac Chanin; +Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

Hello,

On Sat, Jan 15, 2005 at 09:26:01PM -0500, Isaac Chanin wrote:
> 
> Also tested this patch, similar results as to what I've been getting - 
> ie. working for awhile (more than a few hours even this time, but that 
> could just be coincidence, i suppose) and then going down with the normal 
> reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, 
> start_offset) >= end_offset error.  For the exact message see 
> http://users.wpi.edu/~chanin/r4newpatch.txt.
> 

Please try the attached patch (it is for the fs/reiser4 subtree).

-- 
Alex.

[-- Attachment #2: bitmap-amd64-fix-2.diff --]
[-- Type: text/plain, Size: 590 bytes --]

===== plugin/space/bitmap.c 1.186 vs edited =====
--- 1.186/plugin/space/bitmap.c	Wed Jan 19 18:52:52 2005
+++ edited/plugin/space/bitmap.c	Mon Feb  7 16:18:37 2005
@@ -165,7 +165,7 @@
 static int
 find_next_zero_bit_in_word(ulong_t word, int start_bit)
 {
-	ulong_t mask = 1 << start_bit;
+	ulong_t mask = 1UL << start_bit;
 	int i = start_bit;
 
 	while ((word & mask) != 0) {
@@ -235,7 +235,7 @@
 	assert ("zam-965", start_bit < BITS_PER_LONG);
 	assert ("zam-966", start_bit >= 0);
 
-	bit_mask = (1 << nr);
+	bit_mask = (1UL << nr);
 
 	while (bit_mask != 0) {
 		if (bit_mask & word)
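
The reason the 1UL suffix matters: the literal 1 has type int, so
"1 << start_bit" is evaluated in int arithmetic (32 bits on x86 and
x86-64) before the result is widened to ulong_t. For a shift count of
32 or more that shift is undefined behaviour in C and cannot yield a
mask for the upper half of a 64-bit word. Below is a minimal user-space
sketch of the corrected scan. It assumes a 64-bit word (uint64_t stands
in for ulong_t), and the function name and loop body are illustrative
rather than copied from bitmap.c.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64

/* Return the index of the first zero bit at or after start_bit, or
 * WORD_BITS if every bit from start_bit upward is set. */
static int next_zero_bit_in_word(uint64_t word, int start_bit)
{
	assert(start_bit >= 0 && start_bit < WORD_BITS);

	/* The mask is built in the full word width, never in int. */
	uint64_t mask = UINT64_C(1) << start_bit;
	int i = start_bit;

	while (mask != 0 && (word & mask) != 0) {
		mask <<= 1;	/* becomes 0 after passing bit 63 */
		i++;
	}
	return i;
}
```

With a word whose low 40 bits are set, a scan starting at bit 35 has to
walk through bits that a 32-bit mask could never reach. On i386, where
long is 32 bits wide, start_bit never reaches 32, which would explain
why the problem only showed up on AMD64.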

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-02-07 13:29     ` Alex Zarochentsev
@ 2005-02-07 19:34       ` Jake Maciejewski
  2005-02-07 19:51         ` Alex Zarochentsev
  2005-02-08 20:29       ` Isaac Chanin
  1 sibling, 1 reply; 12+ messages in thread
From: Jake Maciejewski @ 2005-02-07 19:34 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: Isaac Chanin, reiserfs-list

I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from
2.6.11-rc3-mm1 (and this patch).

I've been doing the simultaneous dd and kernel compilation that has
always crashed reiser4 on AMD64 in the past. After about an hour with
debugging and two hours without debugging, I'm thinking of more ways to
torture the FS. For now it looks like reiser4 is working on AMD64!

How did you track this bug down anyway? Do you have AMD64 hardware, or
did you look over the code and discover an invalid assumption? If this
bug is indeed fixed, are we any closer to inclusion in vanilla?

-- 
Jake Maciejewski <maciejej@msoe.edu>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-02-07 19:34       ` Jake Maciejewski
@ 2005-02-07 19:51         ` Alex Zarochentsev
  2005-02-08 18:25           ` Jake Maciejewski
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Zarochentsev @ 2005-02-07 19:51 UTC (permalink / raw)
  To: Jake Maciejewski; +Cc: Isaac Chanin, reiserfs-list

On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote:
> I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from
> 2.6.11-rc3-mm1 (and this patch).
> 
> I've been doing the simultaneous dd and kernel compilation that has
> always crashed reiser4 on AMD64 in the past. After about an hour with
> debugging and two hours without debugging, I'm thinking of more ways to
> torture the FS. For now it looks like reiser4 is working on AMD64!

I think so.  reiser4/amd64 passed 5h of stress testing instead of crashing in
the first 30min.

> 
> How did you track this bug down anyway? Do you have AMD64 hardware, or

yes, we have amd64 h/w now.

> did you look over the code and discover an invalid assumption? 

I found it only after realizing that reiser4_find_next_set_bit
is broken :( (its simple replacement worked fine)

> If this
> bug is indeed fixed, are we any closer to inclusion in vanilla?

-- 
Alex.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-02-07 19:51         ` Alex Zarochentsev
@ 2005-02-08 18:25           ` Jake Maciejewski
  2005-02-08 19:12             ` Alex Zarochentsev
  0 siblings, 1 reply; 12+ messages in thread
From: Jake Maciejewski @ 2005-02-08 18:25 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: Isaac Chanin, reiserfs-list

On Mon, 2005-02-07 at 22:51 +0300, Alex Zarochentsev wrote:
> On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote:
> > I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from
> > 2.6.11-rc3-mm1 (and this patch).
> > 
> > I've been doing the simultaneous dd and kernel compilation that has
> > always crashed reiser4 on AMD64 in the past. After about an hour with
> > debugging and two hours without debugging, I'm thinking of more ways to
> > torture the FS. For now it looks like reiser4 is working on AMD64!
> 
> i think so.  reiser4/amd64 passed 5h of stress testing instead of crashing in
> first 30min.
> 

Have you been stress testing with debugging disabled? I was doing some
extreme testing and crashed reiser4 with this patch twice. The same test
that crashed it one of the times passes on reiserfs (didn't try the
other), and if I enable debugging, I can torture reiser4 all night and
still not crash it. I'll do some more tests and try to identify a
simple, reproducible crash scenario.

-- 
Jake Maciejewski <maciejej@msoe.edu>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-02-08 18:25           ` Jake Maciejewski
@ 2005-02-08 19:12             ` Alex Zarochentsev
  2005-02-08 20:49               ` Jake Maciejewski
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Zarochentsev @ 2005-02-08 19:12 UTC (permalink / raw)
  To: Jake Maciejewski; +Cc: Isaac Chanin, reiserfs-list

On Tue, Feb 08, 2005 at 12:25:58PM -0600, Jake Maciejewski wrote:
> On Mon, 2005-02-07 at 22:51 +0300, Alex Zarochentsev wrote:
> > On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote:
> > > I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from
> > > 2.6.11-rc3-mm1 (and this patch).
> > > 
> > > I've been doing the simultaneous dd and kernel compilation that has
> > > always crashed reiser4 on AMD64 in the past. After about an hour with
> > > debugging and two hours without debugging, I'm thinking of more ways to
> > > torture the FS. For now it looks like reiser4 is working on AMD64!
> > 
> > i think so.  reiser4/amd64 passed 5h of stress testing instead of crashing in
> > first 30min.
> > 
> Have you been stress testing with debugging disabled? I was doing some
> extreme testing and crashed reiser4 with this patch twice. The same test

How did it crash?  Was the fs corrupted after the crash?

> that crashed it one of the times passes on reiserfs (didn't try the
> other), and if enable debugging, I can torture reiser4 all night and
> still not crash it. I'll do some more tests and try to identify a
> simple, reproducible crash scenario.

Thanks,
Alex.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-02-08 19:12             ` Alex Zarochentsev
@ 2005-02-08 20:49               ` Jake Maciejewski
  0 siblings, 0 replies; 12+ messages in thread
From: Jake Maciejewski @ 2005-02-08 20:49 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: reiserfs-list

On Tue, 2005-02-08 at 22:12 +0300, Alex Zarochentsev wrote:
> On Tue, Feb 08, 2005 at 12:25:58PM -0600, Jake Maciejewski wrote:
> > On Mon, 2005-02-07 at 22:51 +0300, Alex Zarochentsev wrote:
> > > On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote:
> > > > I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from
> > > > 2.6.11-rc3-mm1 (and this patch).
> > > > 
> > > > I've been doing the simultaneous dd and kernel compilation that has
> > > > always crashed reiser4 on AMD64 in the past. After about an hour with
> > > > debugging and two hours without debugging, I'm thinking of more ways to
> > > > torture the FS. For now it looks like reiser4 is working on AMD64!
> > > 
> > > i think so.  reiser4/amd64 passed 5h of stress testing instead of crashing in
> > > first 30min.
> > > 
> > Have you been stress testing with debugging disabled? I was doing some
> > extreme testing and crashed reiser4 with this patch twice. The same test
> 
> how it crashed?  Was the fs corrupted after the crash?

As I said, it was extreme testing. I didn't keep track of my testing
procedures because I expected reiser4 to take whatever I threw at it.

Anyway, the first test involved one hard drive with a reiser4 filesystem
on a partition, and another drive with a reiser4 loopback filesystem on
a reiser4 loopback filesystem on XFS. The partition-based filesystem had
bonnie++, a kernel compile, and dd'ing a large file from /dev/zero all
going on at once, as I recall. The top-level loopback filesystem was
also running bonnie++, and I was continually cat'ing its loopback file
to /dev/null. Of course while all this was going on, since I didn't expect
trouble, I was seeding about 30 torrents with Azureus (most of which are
actually stored on a Samba server mounted as SMBFS because CIFS has been
unstable ever since I added a gigabit card) and listening to MP3s with
XMMS. The music stopped, X froze, and I couldn't SSH in. After
rebooting, I discovered minor corruption on the partition-based reiser4
filesystem but on neither loopback. --fix fixed it and --check after --fix
came up clean. The log from --fix:

FSCK: Directory [12557:6b636f6e666967:1257d]: can't find the object
[1257d:c673796d626f6c2e:12591] pointed by the entry [symbol.c].
FSCK: Directory [12557:6b636f6e666967:1257d]: can't find the object
[1257d:c673796d626f6c2e:12591] pointed by the entry [symbol.c]. Entry is
removed.

I probably should have checked what happened to symbol.c, but I didn't
think anything of it and continued testing on a fresh filesystem.

My next test was dd'ing a large file from /dev/zero, compiling a kernel,
running bonnie++, and "find . -type f -exec cat {} >/dev/null \;", all
looping and running simultaneously on a fresh filesystem on a
partition, no loopback involved at all. Once again I was doing other
stuff at the time. I know I was watching a movie with mplayer, but I
don't remember if Azureus was going or not. It froze like the previous
time.

Figuring I was onto something with the above test, I tried reiserfs on
the same drive, same partition to eliminate hardware and other errors. It
ran for a few hours until I decided to test reiser4 with debugging.

The same crazy combination of dd, kernel compilation, bonnie++, and
find/cat worked at least 7 hours with debugging enabled, although I
might not have been reproducing the conditions exactly because I think I
changed the options to dd so that the filesystem wouldn't fill up, as it
did several times before the crash.

At some point I tried compiling 2.6.11-rc3-mm1, but it failed. My crash
still isn't reproducible, but if I ever get something reproducible, I have
a 32-bit installation to test whether it's an AMD64-only problem.

-- 
Jake Maciejewski <maciejej@msoe.edu>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: AMD64 progress?
  2005-02-07 13:29     ` Alex Zarochentsev
  2005-02-07 19:34       ` Jake Maciejewski
@ 2005-02-08 20:29       ` Isaac Chanin
  1 sibling, 0 replies; 12+ messages in thread
From: Isaac Chanin @ 2005-02-08 20:29 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: reiserfs-list

On Mon, 7 Feb 2005, Alex Zarochentsev wrote:

> Hello,
> 
> On Sat, Jan 15, 2005 at 09:26:01PM -0500, Isaac Chanin wrote:
> > 
> > Also tested this patch, similar results as to what I've been getting - 
> > ie. working for awhile (more than a few hours even this time, but that 
> > could just be coincidence, i suppose) and then going down with the normal 
> > reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, 
> > start_offset) >= end_offset error.  For the exact message see 
> > http://users.wpi.edu/~chanin/r4newpatch.txt.
> > 
> 
> please try attached patch (it is for fs/reiser4 subtree)
> 
> -- 
> Alex.
> 

Alex,

Thanks for the patch, just chiming in to say that this seems to be
working well for me too.  I've never had any consistent crashes to test,
but with this patch reiser4 has managed to not trigger a kernel panic in
at least twice as long as without it. It could just be luck, but with
Jake's test no longer crashing it on his machine I'd say "the bug" is
fixed.

If anything does go wrong eventually I'll be sure to give you the error 
messages and such.

Thanks again,
Isaac

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2005-02-08 20:49 UTC | newest]

Thread overview: 12+ messages
2005-01-13  5:33 AMD64 progress? Jake Maciejewski
2005-01-14  0:03 ` David Masover
2005-01-14 20:17 ` Alex Zarochentsev
2005-01-14 22:58   ` Jake Maciejewski
2005-01-16  2:26   ` Isaac Chanin
2005-02-07 13:29     ` Alex Zarochentsev
2005-02-07 19:34       ` Jake Maciejewski
2005-02-07 19:51         ` Alex Zarochentsev
2005-02-08 18:25           ` Jake Maciejewski
2005-02-08 19:12             ` Alex Zarochentsev
2005-02-08 20:49               ` Jake Maciejewski
2005-02-08 20:29       ` Isaac Chanin
