From: "Brian Chu"
To: reiserfs-list@namesys.com
Subject: reiserfsck --rebuild-tree all-in-one problem.
Date: Sun, 2 Feb 2003 13:33:17 -0500
Message-ID: <063201c2cae9$8dfb69d0$0201010a@brian>

Hello.

Last Friday, when I went to upgrade my server, I noticed a lot of kernel messages showing that one partition was spewing this:

Jan 5 13:48:14 simmy kernel: hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 5 13:48:14 simmy kernel: hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=91887, high=0, low=91887, sector=91824
Jan 5 13:48:14 simmy kernel: end_request: I/O error, dev 21:01 (hde), sector 91824
Jan 5 13:48:14 simmy kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [7495 7710 0x0 SD]

I checked the logs again for this email just now and discovered that the problem has been persisting for at least one month (logrotate deleted anything older), which is surprising because I never noticed any problems with the hard drive in all this time.

Either way, once I was done upgrading my server and it was freshly rebooted, I figured I could run 'reiserfsck --check /dev/hde1' (version 3.6.3), which proved to be fatal. The first fsck did not exit cleanly (I don't remember the error; this was two days ago), but I was still able to mount the partition again. So I unmounted it and ran fsck again, and again it did not exit cleanly. This time, however, I could not mount the partition: mount gave a "mount: Not a directory" error and exited, even though reiserfs did the journal replay and all. I restarted the machine, but the partition still was not mounted, so I took the hard drive out.

It was at this point that I realized the errors were probably caused by bad sectors that had developed on the drive. Since I had an identical drive (160GB Maxtor 4G160J8), I brought it to a spare computer, installed Debian (testing) on that machine, and put in the hard drives (reiserfsck version 3.6.4).

From there, I ran dd to copy the data from the damaged drive to the new, unused drive, and I started running reiserfsck --check. --check told me I had to run --rebuild-tree, so I ran --rebuild-tree with a logfile. I have now run this twice, and each time --rebuild-tree stopped at the second pass, with the leaf insertion.

The first time, the log file had taken up all the space in the root partition of the machine (it was a 1.7GB file, *twice*), so I figured that was what caused reiserfsck to stop. I gave up that night: running dd once took 7 hours and reiserfsck twice took 2 hours each, so the whole day was wasted.

During my first --rebuild-tree run I had read that "dd_rescue" was suggested, so I downloaded it, installed it, and redid the copy with it (I had used just plain dd the first time). I'm not sure whether that made a difference.

Today I started again, assuming that with dd_rescue I would have a better chance of getting the filesystem recovered, but --check again told me I had to run --rebuild-tree. This time I just used --logfile /dev/null, because screen dumps during the run would make it impossible to see what's going on. But it stopped at the same place again: Pass 2. Since the logfiles contain so much output, I have none at the moment (I can regenerate them if needed).

Screen dump:

Pass 0:
Loading on-disk bitmap .. ok, 35629753 blocks marked used
Skipping 9432 blocks (super block, journal, bitmaps) 35620321 blocks will be read
0%....20%....40%....60%....80%....100% left 0, 6936 /sec
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 35620321
Leaves among those 68299
- leaves all contents of which could not be saved and deleted 1
Objectids found 152402
Pass 1 (will try to insert 68298 leaves):
Looking for allocable blocks .. fininshed
0%....20%....40%....60%....80%....100% left 0, 1219 /sec
Flushing..finished
68298 leaves read
68262 inserted
36 not inserted
Pass 2:
0%....20%....40%.. left 36, 0 /sec

And it stops there. top indicates reiserfsck is using all of the CPU cycles, even after it seemingly freezes. debugreiserfs -p... creates a huge file, so I stopped it.

The filesystem has about 136GB of data that I would really like to recover. (Of course, because of the 1000/1024 thing, the partition has only 152GB of space.)

Throughout the process reiserfsck reported a lot of problems. Is there a way to have reiserfsck skip passes? As it stands, regenerating the pass 1 and pass 2 messages (which are probably the more important ones) would require that I wait about two hours to get through pass 0.

mount ... weird. mount gives a different message now. Before this last run of reiserfsck, mount was giving the same "mount: Not a directory" error that the first computer had given.

simmy:~# mount -t reiserfs /dev/hdd1 /mnt
Feb 2 13:41:00 simmy kernel: dev 16:41: Unfinished reiserfsck --rebuild-tree run detected. Please run
Feb 2 13:41:00 simmy kernel: reiserfsck --rebuild-tree and wait for a completion. If that fails
Feb 2 13:41:00 simmy kernel: get newer reiserfsprogs package
Feb 2 13:41:00 simmy kernel: read_super_block: can't find a reiserfs filesystem on (dev 16:41, block 2, size 4096)
mount: wrong fs type, bad option, bad superblock on /dev/hdd1, or too many mounted file systems

Any (quick) help will be appreciated. If any information is missing, please ask.

chub@stuy.yi.org

PS: I'm on the list, so I can find the replies there, but please cc them to my email address if possible (list email goes to another address).