* AMD64 progress? @ 2005-01-13 5:33 Jake Maciejewski 2005-01-14 0:03 ` David Masover 2005-01-14 20:17 ` Alex Zarochentsev 0 siblings, 2 replies; 12+ messages in thread From: Jake Maciejewski @ 2005-01-13 5:33 UTC (permalink / raw) To: reiserfs-list There was mention of Namesys getting an AMD64 machine. Do you guys have it yet? Has there been any progress on my reiser4 bug(s), particularly the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, start_offset) >= end_offset" problem? -- Jake Maciejewski <maciejej@msoe.edu> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: AMD64 progress? 2005-01-13 5:33 AMD64 progress? Jake Maciejewski @ 2005-01-14 0:03 ` David Masover 2005-01-14 20:17 ` Alex Zarochentsev 1 sibling, 0 replies; 12+ messages in thread From: David Masover @ 2005-01-14 0:03 UTC (permalink / raw) To: Jake Maciejewski; +Cc: reiserfs-list Jake Maciejewski wrote: | There was mention of Namesys getting an AMD64 machine. Do you guys have | it yet? Has there been any progress on my reiser4 bug(s), particularly | the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, | start_offset) >= end_offset" problem? I'd like to second this. I'm looking at getting an AMD64 machine soon, but if AMD64 support isn't stable by the time I have the money, I might dump it for desktop use. No repacker, no shrinker, bugs with AMD64... Hate to say it, but until the transactions / metafiles / transparent compression/encryption is done, XFS is looking better every day. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: AMD64 progress? 2005-01-13 5:33 AMD64 progress? Jake Maciejewski 2005-01-14 0:03 ` David Masover @ 2005-01-14 20:17 ` Alex Zarochentsev 2005-01-14 22:58 ` Jake Maciejewski 2005-01-16 2:26 ` Isaac Chanin 1 sibling, 2 replies; 12+ messages in thread From: Alex Zarochentsev @ 2005-01-14 20:17 UTC (permalink / raw) To: Jake Maciejewski; +Cc: reiserfs-list

On Wed, Jan 12, 2005 at 11:33:27PM -0600, Jake Maciejewski wrote:
> There was mention of Namesys getting an AMD64 machine. Do you guys have
> it yet? Has there been any progress on my reiser4 bug(s), particularly
> the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset,
> start_offset) >= end_offset" problem?

Can you test the following patch? I know that amd64 does not require
strong word alignment, but maybe the amd64 bitops coded in assembler
do?

===== plugin/space/bitmap.c 1.185 vs edited =====
--- 1.185/plugin/space/bitmap.c	Sun Dec 5 19:49:52 2004
+++ edited/plugin/space/bitmap.c	Fri Jan 14 23:15:53 2005
@@ -57,13 +57,15 @@

 #define CHECKSUM_SIZE 4

+#define BYTES_PER_LONG (sizeof(long))
+
 #if BITS_PER_LONG == 64
 # define LONG_INT_SHIFT (6)
 #else
 # define LONG_INT_SHIFT (5)
 #endif

-#define LONG_INT_MASK (BITS_PER_LONG - 1)
+#define LONG_INT_MASK (BITS_PER_LONG - 1UL)

 typedef unsigned long ulong_t;

@@ -182,12 +184,41 @@

 #include <asm/bitops.h>

+#if BITS_PER_LONG == 64
+
+#define OFF(addr) (((ulong_t)(addr) & (BYTES_PER_LONG - 1)) << 3)
+#define BASE(addr) ((ulong_t*) ((ulong_t)(addr) & ~(BYTES_PER_LONG - 1)))
+
+static inline void reiser4_set_bit(int nr, void * addr)
+{
+	ext2_set_bit(nr + OFF(addr), BASE(addr));
+}
+
+static inline void reiser4_clear_bit(int nr, void * addr)
+{
+	ext2_clear_bit(nr + OFF(addr), BASE(addr));
+}
+
+static inline int reiser4_test_bit(int nr, void * addr)
+{
+	return ext2_test_bit(nr + OFF(addr), BASE(addr));
+}
+static inline int reiser4_find_next_zero_bit(void * addr, int maxoffset, int offset)
+{
+	int off = OFF(addr);
+
+	return ext2_find_next_zero_bit(BASE(addr), maxoffset + off, offset + off) - off;
+}
+
+#else
+
 #define reiser4_set_bit(nr, addr) ext2_set_bit(nr, addr)
 #define reiser4_clear_bit(nr, addr) ext2_clear_bit(nr, addr)
 #define reiser4_test_bit(nr, addr) ext2_test_bit(nr, addr)

 #define reiser4_find_next_zero_bit(addr, maxoffset, offset) \
 ext2_find_next_zero_bit(addr, maxoffset, offset)
+#endif

 /* Search for a set bit in the bit array [@start_offset, @max_offset[, offsets
  * are counted from @addr, return the offset of the first bit if it is found,

^ permalink raw reply [flat|nested] 12+ messages in thread
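For readers following the OFF/BASE wrappers in the patch above: the idea is to round an unaligned bitmap address down to the start of its 8-byte word and add the skipped bytes, converted to bits, to the bit number. The user-space sketch below illustrates that arithmetic only. It assumes 8-byte words; the names `bit_offset_in_word`, `aligned_base`, `test_bit_le` and `test_bit_unaligned` are made up for this illustration, and the byte-wise `test_bit_le` is a portable stand-in, not the kernel's ext2_test_bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counterpart of OFF(addr): distance, in bits, from the start of the
 * aligned 8-byte word containing addr to addr itself. */
static unsigned bit_offset_in_word(const void *addr)
{
    return (unsigned)(((uintptr_t)addr & (sizeof(uint64_t) - 1)) << 3);
}

/* Counterpart of BASE(addr): addr rounded down to an 8-byte boundary. */
static const uint8_t *aligned_base(const void *addr)
{
    return (const uint8_t *)((uintptr_t)addr &
                             ~(uintptr_t)(sizeof(uint64_t) - 1));
}

/* Stand-in for a little-endian bit test: bit nr lives in byte nr / 8,
 * at position nr % 8 within that byte. */
static int test_bit_le(unsigned nr, const uint8_t *base)
{
    return (base[nr >> 3] >> (nr & 7)) & 1;
}

/* Same shape as the patched reiser4_test_bit(): shift the bit number by
 * the misalignment and operate on the aligned base address. */
static int test_bit_unaligned(unsigned nr, const void *addr)
{
    return test_bit_le(nr + bit_offset_in_word(addr), aligned_base(addr));
}
```

With a bitmap that starts 3 bytes into an aligned word, bit 2 of the bitmap becomes bit 26 relative to the aligned base, which is the translation the wrappers perform before calling the ext2 bit operations.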
* Re: AMD64 progress? 2005-01-14 20:17 ` Alex Zarochentsev @ 2005-01-14 22:58 ` Jake Maciejewski 2005-01-16 2:26 ` Isaac Chanin 1 sibling, 0 replies; 12+ messages in thread From: Jake Maciejewski @ 2005-01-14 22:58 UTC (permalink / raw) To: Alex Zarochentsev; +Cc: reiserfs-list I used the 2.6.10-1 patch on an otherwise vanilla 2.6.10. The filesystem tested was freshly created with 1.0.3. I did my usual dd and kernel compile. After the kernel compilation failed, all sorts of processes just hung, even simple stuff like copying and removing files on other filesystems. reiser4[cc1(11377)]: check_blocks_bitmap (fs/reiser4/plugin/space/bitmap.c:1203)[zam-623]: code: -2 at fs/reiser4/search.c:1285 reiser4 panicked cowardly: assertion failed: reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, start_offset) >= end_offset ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at debug:131 invalid operand: 0000 [1] CPU 0 Modules linked in: ipv6 i2c_dev it87 i2c_sensor i2c_isa i2c_core ipt_state ip_conntrack iptable_filter ip_tables snd_ioctl32 snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss 3c59x e1000 snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc ehci_hcd ohci_hcd Pid: 11377, comm: cc1 Not tainted 2.6.10+reiser4 RIP: 0010:[<ffffffff801bf010>] <ffffffff801bf010>{reiser4_do_panic+704} RSP: 0018:00000100168116e8 EFLAGS: 00010246 RAX: ffffffff8052f4e0 RBX: 0000010016811db8 RCX: ffffffff8052f4e0 RDX: 000001001df1de88 RSI: 000001001df1de88 RDI: 000001001cb2ec00 RBP: ffffff00006e8090 R08: 0000000000000000 R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001ce2 R13: 0000000000000001 R14: 0000000000001c81 R15: 0000000000001c81 FS: 0000002a95863ae0(0000) GS:ffffffff80546e80(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000002a95fb8000 CR3: 0000000000101000 CR4: 00000000000006e0 Process cc1 (pid: 11377, threadinfo 0000010016810000, task 
00000100019bb4f0) Stack: 0000003000000010 00000100168117c8 0000010016811708 0000000000002c71 ffffffff803fc0f0 ffffffff8042e4c8 0000000000000000 0000000000000000 0000000000000000 0000000000000003 Call Trace:<ffffffff80208acd>{reiser4_block_count+77} <ffffffff801bf2c9>{schedulable+9} <ffffffff8027b5d5>{load_and_lock_bnode+101} <ffffffff8027ab66>{parse_blocknr+326} <ffffffff801bf175>{reiser4_print_prefix+133} <ffffffff801bf049>{report_err+9} <ffffffff8027c18e>{check_blocks_bitmap+1134} <ffffffff801e3a06>{reiser4_check_block+22} <ffffffff801c69d3>{zget+1075} <ffffffff801ff950>{cbk_level_lookup+112} <ffffffff801ff5fd>{cbk_node_lookup+2445} <ffffffff801d5202>{lock_tail+1298} <ffffffff801d33a5>{znode_is_any_locked+37} <ffffffff801c2544>{check_jload+52} <ffffffff801c46e4>{jload_gfp+1092} <ffffffff801d4815>{move_lh_internal+1429} <ffffffff80200895>{traverse_tree+1749} <ffffffff802017e3>{object_lookup+915} <ffffffff80282c65>{find_file_item+565} <ffffffff8024fed0>{read_tail+0} <ffffffff802853cc>{read_file+524} <ffffffff801e785a>{txn_end+218} <ffffffff80285cc0>{read_unix_file+736} <ffffffff801dc525>{done_context+773} <ffffffff80214185>{reiser4_lookup+485} <ffffffff802146fc>{reiser4_getattr+332} <ffffffff802159c5>{reiser4_read+485} <ffffffff80164d14>{vfs_read+228} <ffffffff80164fc3>{sys_read+83} <ffffffff8010d19a>{system_call+126} Code: 0f 0b 4d ad 40 80 ff ff ff ff 83 00 48 c7 c6 a0 f0 52 80 48 RIP <ffffffff801bf010>{reiser4_do_panic+704} RSP <00000100168116e8> On Fri, 2005-01-14 at 23:17 +0300, Alex Zarochentsev wrote: > On Wed, Jan 12, 2005 at 11:33:27PM -0600, Jake Maciejewski wrote: > > There was mention of Namesys getting an AMD64 machine. Do you guys have > > it yet? Has there been any progress on my reiser4 bug(s), particularly > > the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, > > start_offset) >= end_offset" problem? > > can you test the following patch? 
I know that amd64 does not require > strong word aligment, but may be amd64 bitops coded in assembler > do that? > > ===== plugin/space/bitmap.c 1.185 vs edited ===== > --- 1.185/plugin/space/bitmap.c Sun Dec 5 19:49:52 2004 > +++ edited/plugin/space/bitmap.c Fri Jan 14 23:15:53 2005 > @@ -57,13 +57,15 @@ > > #define CHECKSUM_SIZE 4 > > +#define BYTES_PER_LONG (sizeof(long)) > + > #if BITS_PER_LONG == 64 > # define LONG_INT_SHIFT (6) > #else > # define LONG_INT_SHIFT (5) > #endif > > -#define LONG_INT_MASK (BITS_PER_LONG - 1) > +#define LONG_INT_MASK (BITS_PER_LONG - 1UL) > > typedef unsigned long ulong_t; > > @@ -182,12 +184,41 @@ > > #include <asm/bitops.h> > > +#if BITS_PER_LONG == 64 > + > +#define OFF(addr) (((ulong_t)(addr) & (BYTES_PER_LONG - 1)) << 3) > +#define BASE(addr) ((ulong_t*) ((ulong_t)(addr) & ~(BYTES_PER_LONG - 1))) > + > +static inline void reiser4_set_bit(int nr, void * addr) > +{ > + ext2_set_bit(nr + OFF(addr), BASE(addr)); > +} > + > +static inline void reiser4_clear_bit(int nr, void * addr) > +{ > + ext2_clear_bit(nr + OFF(addr), BASE(addr)); > +} > + > +static inline int reiser4_test_bit(int nr, void * addr) > +{ > + return ext2_test_bit(nr + OFF(addr), BASE(addr)); > +} > +static inline int reiser4_find_next_zero_bit(void * addr, int maxoffset, int offset) > +{ > + int off = OFF(addr); > + > + return ext2_find_next_zero_bit(BASE(addr), maxoffset + off, offset + off) - off; > +} > + > +#else > + > #define reiser4_set_bit(nr, addr) ext2_set_bit(nr, addr) > #define reiser4_clear_bit(nr, addr) ext2_clear_bit(nr, addr) > #define reiser4_test_bit(nr, addr) ext2_test_bit(nr, addr) > > #define reiser4_find_next_zero_bit(addr, maxoffset, offset) \ > ext2_find_next_zero_bit(addr, maxoffset, offset) > +#endif > > /* Search for a set bit in the bit array [@start_offset, @max_offset[, offsets > * are counted from @addr, return the offset of the first bit if it is found, > > -- Jake Maciejewski <maciejej@msoe.edu> ^ permalink raw reply [flat|nested] 
12+ messages in thread
* Re: AMD64 progress? 2005-01-14 20:17 ` Alex Zarochentsev 2005-01-14 22:58 ` Jake Maciejewski @ 2005-01-16 2:26 ` Isaac Chanin 2005-02-07 13:29 ` Alex Zarochentsev 1 sibling, 1 reply; 12+ messages in thread From: Isaac Chanin @ 2005-01-16 2:26 UTC (permalink / raw) To: Alex Zarochentsev; +Cc: reiserfs-list Also tested this patch, similar results as to what I've been getting - ie. working for awhile (more than a few hours even this time, but that could just be coincidence, i suppose) and then going down with the normal reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, start_offset) >= end_offset error. For the exact message see http://users.wpi.edu/~chanin/r4newpatch.txt. As an aside, I've also been trying to fix this problem in my own haphazard/change-random-things way and have tried a few things that might be somewhat useful to know, a few less things to try, at least: Tried removing the #define's for find_next_zero_bit and find_first_zero_bit in bitops.h so that those functions instead definitely call what's in bitops.c. Same errors with or without them. Tried changing the asm in bitops.c for find_next_zero_bit to use the same opcodes as the x86 version (well, slightly different, quadwords and all, but you get the idea,) which didn't help. And, I wrote a little c program to test the find_next_zero_bit function, with and without the redefining of the functions (as is done in bitops.h) and couldn't get the x86_64 code to behave differently from the x86 code. Though, to be fair, it wasn't very good code, so it probably didn't test the function very well. I realize that my testing is not the best, but it doesn't seem like it would hurt to hypothesize that perhaps find_next_zero_bit is not the problem, but instead, the find_next_zero_bit error is manifesting itself because of some other problem elsewhere in the code? 
By the way, I tested the new patch on 2.6.10-mm2 and 2.6.11_rc1-mm1 both of which gave similar error messages (the one i linked to is from the 11_rc1 version. On Fri, 14 Jan 2005, Alex Zarochentsev wrote: > On Wed, Jan 12, 2005 at 11:33:27PM -0600, Jake Maciejewski wrote: > > There was mention of Namesys getting an AMD64 machine. Do you guys have > > it yet? Has there been any progress on my reiser4 bug(s), particularly > > the "reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, > > start_offset) >= end_offset" problem? > > can you test the following patch? I know that amd64 does not require > strong word aligment, but may be amd64 bitops coded in assembler > do that? > > ===== plugin/space/bitmap.c 1.185 vs edited ===== > --- 1.185/plugin/space/bitmap.c Sun Dec 5 19:49:52 2004 > +++ edited/plugin/space/bitmap.c Fri Jan 14 23:15:53 2005 > @@ -57,13 +57,15 @@ > > #define CHECKSUM_SIZE 4 > > +#define BYTES_PER_LONG (sizeof(long)) > + > #if BITS_PER_LONG == 64 > # define LONG_INT_SHIFT (6) > #else > # define LONG_INT_SHIFT (5) > #endif > > -#define LONG_INT_MASK (BITS_PER_LONG - 1) > +#define LONG_INT_MASK (BITS_PER_LONG - 1UL) > > typedef unsigned long ulong_t; > > @@ -182,12 +184,41 @@ > > #include <asm/bitops.h> > > +#if BITS_PER_LONG == 64 > + > +#define OFF(addr) (((ulong_t)(addr) & (BYTES_PER_LONG - 1)) << 3) > +#define BASE(addr) ((ulong_t*) ((ulong_t)(addr) & ~(BYTES_PER_LONG - 1))) > + > +static inline void reiser4_set_bit(int nr, void * addr) > +{ > + ext2_set_bit(nr + OFF(addr), BASE(addr)); > +} > + > +static inline void reiser4_clear_bit(int nr, void * addr) > +{ > + ext2_clear_bit(nr + OFF(addr), BASE(addr)); > +} > + > +static inline int reiser4_test_bit(int nr, void * addr) > +{ > + return ext2_test_bit(nr + OFF(addr), BASE(addr)); > +} > +static inline int reiser4_find_next_zero_bit(void * addr, int maxoffset, int offset) > +{ > + int off = OFF(addr); > + > + return ext2_find_next_zero_bit(BASE(addr), maxoffset + off, offset + off) - 
off; > +} > + > +#else > + > #define reiser4_set_bit(nr, addr) ext2_set_bit(nr, addr) > #define reiser4_clear_bit(nr, addr) ext2_clear_bit(nr, addr) > #define reiser4_test_bit(nr, addr) ext2_test_bit(nr, addr) > > #define reiser4_find_next_zero_bit(addr, maxoffset, offset) \ > ext2_find_next_zero_bit(addr, maxoffset, offset) > +#endif > > /* Search for a set bit in the bit array [@start_offset, @max_offset[, offsets > * are counted from @addr, return the offset of the first bit if it is found, > > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
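A user-space check of the kind Isaac describes could compare a word-at-a-time search against a bit-by-bit reference for every starting offset. The sketch below is not his program and does not use the kernel's assembler routines; `find_next_zero_bit_words`, `find_next_zero_bit_naive` and `search_routines_agree` are hypothetical names, and 64-bit words are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word-at-a-time search for the first zero bit in [offset, size).
 * Returns size when no zero bit is found. */
static size_t find_next_zero_bit_words(const uint64_t *map, size_t size,
                                       size_t offset)
{
    while (offset < size) {
        uint64_t w = map[offset / 64];
        /* Pretend the bits below offset are set so they are skipped. */
        w |= (UINT64_C(1) << (offset % 64)) - 1;
        if (w != UINT64_MAX) {
            size_t bit = offset - offset % 64;
            while (w & 1) {
                w >>= 1;
                bit++;
            }
            return bit < size ? bit : size;
        }
        offset = offset - offset % 64 + 64; /* advance to next word */
    }
    return size;
}

/* Bit-by-bit reference with the same contract. */
static size_t find_next_zero_bit_naive(const uint64_t *map, size_t size,
                                       size_t offset)
{
    for (; offset < size; offset++)
        if (!((map[offset / 64] >> (offset % 64)) & 1))
            return offset;
    return size;
}

/* Returns 1 when both searches agree for every starting offset. */
static int search_routines_agree(const uint64_t *map, size_t size)
{
    assert(map != NULL);
    for (size_t off = 0; off <= size; off++)
        if (find_next_zero_bit_words(map, size, off) !=
            find_next_zero_bit_naive(map, size, off))
            return 0;
    return 1;
}
```

A harness like this only exercises the search itself. If the fault lies in how a caller builds its masks, as Isaac suspects might be the case, such a test would pass while the filesystem still fails.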
* Re: AMD64 progress? 2005-01-16 2:26 ` Isaac Chanin @ 2005-02-07 13:29 ` Alex Zarochentsev 2005-02-07 19:34 ` Jake Maciejewski 2005-02-08 20:29 ` Isaac Chanin 0 siblings, 2 replies; 12+ messages in thread From: Alex Zarochentsev @ 2005-02-07 13:29 UTC (permalink / raw) To: Isaac Chanin; +Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

Hello,

On Sat, Jan 15, 2005 at 09:26:01PM -0500, Isaac Chanin wrote:
>
> Also tested this patch, similar results as to what I've been getting -
> ie. working for awhile (more than a few hours even this time, but that
> could just be coincidence, i suppose) and then going down with the normal
> reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset,
> start_offset) >= end_offset error. For the exact message see
> http://users.wpi.edu/~chanin/r4newpatch.txt.
>

Please try the attached patch (it is for the fs/reiser4 subtree).

--
Alex.

[-- Attachment #2: bitmap-amd64-fix-2.diff --]
[-- Type: text/plain, Size: 590 bytes --]

===== plugin/space/bitmap.c 1.186 vs edited =====
--- 1.186/plugin/space/bitmap.c	Wed Jan 19 18:52:52 2005
+++ edited/plugin/space/bitmap.c	Mon Feb 7 16:18:37 2005
@@ -165,7 +165,7 @@

 static int find_next_zero_bit_in_word(ulong_t word, int start_bit)
 {
-	ulong_t mask = 1 << start_bit;
+	ulong_t mask = 1UL << start_bit;
 	int i = start_bit;

 	while ((word & mask) != 0) {
@@ -235,7 +235,7 @@

 	assert ("zam-965", start_bit < BITS_PER_LONG);
 	assert ("zam-966", start_bit >= 0);

-	bit_mask = (1 << nr);
+	bit_mask = (1UL << nr);
 	while (bit_mask != 0) {
 		if (bit_mask & word)

^ permalink raw reply [flat|nested] 12+ messages in thread
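Why the `1UL` in the patch above matters: the literal `1` has type int, so `1 << start_bit` is a 32-bit shift even when the result is stored in a 64-bit ulong_t. For start_bit of 32 or more the expression is undefined in C, and on x86-64 the 32-bit shift instruction uses only the low 5 bits of the count, so the mask commonly ends up on the wrong bit. The sketch below is a user-space illustration, not the reiser4 code: `next_zero_bit_in_word` follows the shape visible in the diff with an assumed loop body, and `mask_from_int_shift_emulated` only imitates the commonly observed result of the old expression without invoking undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64

/* Fixed variant: the mask is built with a 64-bit shift.
 * Returns the index of the first zero bit at or after start_bit,
 * or WORD_BITS if every bit from start_bit upward is set. */
static int next_zero_bit_in_word(uint64_t word, int start_bit)
{
    assert(start_bit >= 0 && start_bit < WORD_BITS);

    uint64_t mask = UINT64_C(1) << start_bit;
    int i = start_bit;

    while (i < WORD_BITS && (word & mask) != 0) {
        mask <<= 1;
        i++;
    }
    return i;
}

/* Emulation (an assumption about typical x86-64 code generation, since
 * the real expression is undefined for counts >= 32): shift a 32-bit
 * one by the count modulo 32, then sign-extend to 64 bits as an
 * int-to-unsigned-long conversion would. */
static uint64_t mask_from_int_shift_emulated(int start_bit)
{
    uint32_t m32 = UINT32_C(1) << (start_bit & 31);
    return (uint64_t)(int64_t)(int32_t)m32;
}
```

Under this emulation a request for bit 40 yields a mask for bit 8, and a request for bit 31 yields a mask with all of the upper 32 bits set. Either would make a bitmap scan report bits in the upper half of a word incorrectly, which fits a failure that appears only on 64-bit builds.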
* Re: AMD64 progress? 2005-02-07 13:29 ` Alex Zarochentsev @ 2005-02-07 19:34 ` Jake Maciejewski 2005-02-07 19:51 ` Alex Zarochentsev 2005-02-08 20:29 ` Isaac Chanin 1 sibling, 1 reply; 12+ messages in thread From: Jake Maciejewski @ 2005-02-07 19:34 UTC (permalink / raw) To: Alex Zarochentsev; +Cc: Isaac Chanin, reiserfs-list I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from 2.6.11-rc3-mm1 (and this patch). I've been doing the simultaneous dd and kernel compilation that has always crashed reiser4 on AMD64 in the past. After about an hour with debugging and two hours without debugging, I'm thinking of more ways to torture the FS. For now it looks like reiser4 is working on AMD64! How did you track this bug down anyway? Do you have AMD64 hardware, or did you look over the code and discover an invalid assumption? If this bug is indeed fixed, are we any closer to inclusion in vanilla? On Mon, 2005-02-07 at 16:29 +0300, Alex Zarochentsev wrote: > Hello, > > On Sat, Jan 15, 2005 at 09:26:01PM -0500, Isaac Chanin wrote: > > > > Also tested this patch, similar results as to what I've been getting - > > ie. working for awhile (more than a few hours even this time, but that > > could just be coincidence, i suppose) and then going down with the normal > > reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, > > start_offset) >= end_offset error. For the exact message see > > http://users.wpi.edu/~chanin/r4newpatch.txt. > > > > please try attached patch (it is for fs/reiser4 subtree) > -- Jake Maciejewski <maciejej@msoe.edu> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: AMD64 progress? 2005-02-07 19:34 ` Jake Maciejewski @ 2005-02-07 19:51 ` Alex Zarochentsev 2005-02-08 18:25 ` Jake Maciejewski 0 siblings, 1 reply; 12+ messages in thread From: Alex Zarochentsev @ 2005-02-07 19:51 UTC (permalink / raw) To: Jake Maciejewski; +Cc: Isaac Chanin, reiserfs-list On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote: > I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from > 2.6.11-rc3-mm1 (and this patch). > > I've been doing the simultaneous dd and kernel compilation that has > always crashed reiser4 on AMD64 in the past. After about an hour with > debugging and two hours without debugging, I'm thinking of more ways to > torture the FS. For now it looks like reiser4 is working on AMD64! i think so. reiser4/amd64 passed 5h of stress testing instead of crashing in first 30min. > > How did you track this bug down anyway? Do you have AMD64 hardware, or yes, we have amd64 h/w now. > did you look over the code and discover an invalid assumption? I found it only after realizing that reiser4_find_next_set_bit is broken :( (its simple replacement worked fine) > If this > bug is indeed fixed, are we any closer to inclusion in vanilla? > > On Mon, 2005-02-07 at 16:29 +0300, Alex Zarochentsev wrote: > > Hello, > > > > On Sat, Jan 15, 2005 at 09:26:01PM -0500, Isaac Chanin wrote: > > > > > > Also tested this patch, similar results as to what I've been getting - > > > ie. working for awhile (more than a few hours even this time, but that > > > could just be coincidence, i suppose) and then going down with the normal > > > reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, > > > start_offset) >= end_offset error. For the exact message see > > > http://users.wpi.edu/~chanin/r4newpatch.txt. > > > > > > > please try attached patch (it is for fs/reiser4 subtree) > > > -- > Jake Maciejewski <maciejej@msoe.edu> > -- Alex. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: AMD64 progress? 2005-02-07 19:51 ` Alex Zarochentsev @ 2005-02-08 18:25 ` Jake Maciejewski 2005-02-08 19:12 ` Alex Zarochentsev 0 siblings, 1 reply; 12+ messages in thread From: Jake Maciejewski @ 2005-02-08 18:25 UTC (permalink / raw) To: Alex Zarochentsev; +Cc: Isaac Chanin, reiserfs-list On Mon, 2005-02-07 at 22:51 +0300, Alex Zarochentsev wrote: > On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote: > > I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from > > 2.6.11-rc3-mm1 (and this patch). > > > > I've been doing the simultaneous dd and kernel compilation that has > > always crashed reiser4 on AMD64 in the past. After about an hour with > > debugging and two hours without debugging, I'm thinking of more ways to > > torture the FS. For now it looks like reiser4 is working on AMD64! > > i think so. reiser4/amd64 passed 5h of stress testing instead of crashing in > first 30min. > Have you been stress testing with debugging disabled? I was doing some extreme testing and crashed reiser4 with this patch twice. The same test that crashed it one of the times passes on reiserfs (didn't try the other), and if I enable debugging, I can torture reiser4 all night and still not crash it. I'll do some more tests and try to identify a simple, reproducible crash scenario. > > > > How did you track this bug down anyway? Do you have AMD64 hardware, or > > yes, we have amd64 h/w now. > > > did you look over the code and discover an invalid assumption? > > I found it only after realizing that reiser4_find_next_set_bit > is broken :( (its simple replacement worked fine) > > > If this > > bug is indeed fixed, are we any closer to inclusion in vanilla? > > > > On Mon, 2005-02-07 at 16:29 +0300, Alex Zarochentsev wrote: > > > Hello, > > > > > > On Sat, Jan 15, 2005 at 09:26:01PM -0500, Isaac Chanin wrote: > > > > > > > > Also tested this patch, similar results as to what I've been getting - > > > > ie. 
working for awhile (more than a few hours even this time, but that > > > > could just be coincidence, i suppose) and then going down with the normal > > > > reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, > > > > start_offset) >= end_offset error. For the exact message see > > > > http://users.wpi.edu/~chanin/r4newpatch.txt. > > > > > > > > > > please try attached patch (it is for fs/reiser4 subtree) > > > > > -- > > Jake Maciejewski <maciejej@msoe.edu> > > > -- Jake Maciejewski <maciejej@msoe.edu> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: AMD64 progress? 2005-02-08 18:25 ` Jake Maciejewski @ 2005-02-08 19:12 ` Alex Zarochentsev 2005-02-08 20:49 ` Jake Maciejewski 0 siblings, 1 reply; 12+ messages in thread From: Alex Zarochentsev @ 2005-02-08 19:12 UTC (permalink / raw) To: Jake Maciejewski; +Cc: Isaac Chanin, reiserfs-list On Tue, Feb 08, 2005 at 12:25:58PM -0600, Jake Maciejewski wrote: > On Mon, 2005-02-07 at 22:51 +0300, Alex Zarochentsev wrote: > > On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote: > > > I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from > > > 2.6.11-rc3-mm1 (and this patch). > > > > > > I've been doing the simultaneous dd and kernel compilation that has > > > always crashed reiser4 on AMD64 in the past. After about an hour with > > > debugging and two hours without debugging, I'm thinking of more ways to > > > torture the FS. For now it looks like reiser4 is working on AMD64! > > > > i think so. reiser4/amd64 passed 5h of stress testing instead of crashing in > > first 30min. > > > Have you been stress testing with debugging disabled? I was doing some > extreme testing and crashed reiser4 with this patch twice. The same test How did it crash? Was the fs corrupted after the crash? > that crashed it one of the times passes on reiserfs (didn't try the > other), and if enable debugging, I can torture reiser4 all night and > still not crash it. I'll do some more tests and try to identify a > simple, reproducible crash scenario. Thanks, Alex. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: AMD64 progress? 2005-02-08 19:12 ` Alex Zarochentsev @ 2005-02-08 20:49 ` Jake Maciejewski 0 siblings, 0 replies; 12+ messages in thread From: Jake Maciejewski @ 2005-02-08 20:49 UTC (permalink / raw) To: Alex Zarochentsev; +Cc: reiserfs-list On Tue, 2005-02-08 at 22:12 +0300, Alex Zarochentsev wrote: > On Tue, Feb 08, 2005 at 12:25:58PM -0600, Jake Maciejewski wrote: > > On Mon, 2005-02-07 at 22:51 +0300, Alex Zarochentsev wrote: > > > On Mon, Feb 07, 2005 at 01:34:56PM -0600, Jake Maciejewski wrote: > > > > I'm running reiser4progs 1.0.3 and 2.6.10 patched with reiser4 from > > > > 2.6.11-rc3-mm1 (and this patch). > > > > > > > > I've been doing the simultaneous dd and kernel compilation that has > > > > always crashed reiser4 on AMD64 in the past. After about an hour with > > > > debugging and two hours without debugging, I'm thinking of more ways to > > > > torture the FS. For now it looks like reiser4 is working on AMD64! > > > > > > i think so. reiser4/amd64 passed 5h of stress testing instead of crashing in > > > first 30min. > > > > > Have you been stress testing with debugging disabled? I was doing some > > extreme testing and crashed reiser4 with this patch twice. The same test > > how it crashed? Was the fs corrupted after the crash? As I said, it was extreme testing. I didn't keep track of my testing procedures because I expected reiser4 to take whatever I threw at it. Anyway, the first test involved one hard drive with a reiser4 filesystem on a partition, and another drive with a reiser4 loopback filesystem on a reiser4 loopback filesystem on XFS. The partition-based filesystem had bonnie++, a kernel compile, and dd'ing a large file from /dev/zero all going on at once, as I recall. The top-level loopback filesystem was also running bonnie++, and I was continually cat'ing its loopback file to /dev/null. 
Of course while all this was going, since I didn't expect trouble, I was seeding about 30 torrents with Azureus (most of which are actually stored on a Samba server mounted as SMBFS because CIFS has been unstable ever since I added a gigabit card) and listening to MP3s with XMMS. The music stopped, X froze, and I couldn't SSH in. After rebooting, I discovered minor corruption on the partition-based reiser4 filesystem but neither loopback. --fix fixed it and --check after --fix came up clean. The log from --fix: FSCK: Directory [12557:6b636f6e666967:1257d]: can't find the object [1257d:c673796d626f6c2e:12591] pointed by the entry [symbol.c]. FSCK: Directory [12557:6b636f6e666967:1257d]: can't find the object [1257d:c673796d626f6c2e:12591] pointed by the entry [symbol.c]. Entry is removed. I probably should have checked what happened to symbol.c, but I didn't think anything of it and continued testing on a fresh filesystem. My next test was dd'ing a large file from /dev/zero, compiling a kernel, running bonnie++, and "find . -type f -exec cat {} >/dev/null \;", all looping and running simultaneously on a fresh filesystem on a partition, no loopback involved at all. Once again I was doing other stuff at the time. I know I was watching a movie with mplayer, but I don't remember if Azureus was going or not. It froze like the previous time. Figuring I was onto something with the above test, I tried reiserfs on the same drive, same partition to eliminate hardware and other errors. It ran for a few hours until I decided to test reiser4 with debugging. The same crazy combination of dd, kernel compilation, bonnie++, and find/cat worked at least 7 hours with debugging enabled, although I might not have been reproducing the conditions exactly because I think I changed the options to dd so that the filesystem wouldn't fill up, as it did several times before the crash. At some point I tried compiling 2.6.11-rc3-mm1, but it failed. 
My crash still isn't reproducible, but if I ever get something I have a 32-bit installation to test if it's an AMD64-only problem. > > > that crashed it one of the times passes on reiserfs (didn't try the > > other), and if enable debugging, I can torture reiser4 all night and > > still not crash it. I'll do some more tests and try to identify a > > simple, reproducible crash scenario. > > Thanks, > Alex. -- Jake Maciejewski <maciejej@msoe.edu> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: AMD64 progress? 2005-02-07 13:29 ` Alex Zarochentsev 2005-02-07 19:34 ` Jake Maciejewski @ 2005-02-08 20:29 ` Isaac Chanin 1 sibling, 0 replies; 12+ messages in thread From: Isaac Chanin @ 2005-02-08 20:29 UTC (permalink / raw) To: Alex Zarochentsev; +Cc: reiserfs-list On Mon, 7 Feb 2005, Alex Zarochentsev wrote: > Hello, > > On Sat, Jan 15, 2005 at 09:26:01PM -0500, Isaac Chanin wrote: > > > > Also tested this patch, similar results as to what I've been getting - > > ie. working for awhile (more than a few hours even this time, but that > > could just be coincidence, i suppose) and then going down with the normal > > reiser4_find_next_zero_bit(bnode_working_data(bnode), end_offset, > > start_offset) >= end_offset error. For the exact message see > > http://users.wpi.edu/~chanin/r4newpatch.txt. > > > > please try attached patch (it is for fs/reiser4 subtree) > > -- > Alex. > Alex, Thanks for the patch, just chiming in to say that this seems to be working well for me too. I've never had any consistent crashes to test, but with this patch reiser4 has managed to not trigger a kernel panic in at least twice as long as without it - could just be luck; but, with Jake's test no longer crashing it on his machine I'd say "the bug" is fixed. If anything does go wrong eventually I'll be sure to give you the error messages and such. Thanks again, Isaac ^ permalink raw reply [flat|nested] 12+ messages in thread