* Bug in reiserfsck 3.6.5
@ 2003-04-09 10:57 Kelledin
  2003-04-09 10:09 ` Yury Umanets
  2003-04-09 10:15 ` Oleg Drokin
  0 siblings, 2 replies; 5+ messages in thread
From: Kelledin @ 2003-04-09 10:57 UTC
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2184 bytes --]

(I'd send this directly to Yura, but I'm having a bit of trouble 
getting mail through.  Twice it's bounced off the 
namesys.botik.ru mailserver as suspected spam, and when I 
finally employ a mailertable trick, namesys.botik.ru no longer 
resolves.  I think some angry god is conspiring against me and 
my bug report...)

I recently found that when reiserfsprogs is installed with
/sbin/fsck.reiserfs symlinked to reiserfsck, fsck.reiserfs
aborts with SIGABRT when called as an fsck backend via "fsck -a
-A -C -T" (a fairly common command in system boot scripts).  It
was quite interesting to troubleshoot, as the problem _didn't_
occur if I turned fsck.reiserfs into a wrapper script, or called
"fsck.reiserfs -a /dev/sda14" directly from the bash prompt...

I traced it down to this line in lib/io.c:

--
void flush_buffers (dev_t dev)
{
    if (!dev)
        die ("flush_buffers: device is not specifed");
        ^^^^^
--

I'm fairly certain that when fsck calls the fsck.reiserfs
backend, it closes all the default stream descriptors (stdin,
stdout, stderr) before exec'ing it.  So if fsck.reiserfs opens
the device file (/dev/sda14) before anything else, open() hands
back the lowest free descriptor and fs->fs_dev gets a descriptor
value of zero.  This eventually trickles down to
flush_buffers(), which treats zero as "no device" and croaks.

(This is obviously incorrect thinking on the part of
flush_buffers().  Having a general-purpose file descriptor with
a value of 0 is unusual, but not really incorrect.)
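
To illustrate the mechanism described above, here is a minimal
sketch (not reiserfsprogs code).  It relies only on the POSIX
rule that open() returns the lowest-numbered descriptor not
currently open.  The helper name open_with_stdin_closed() and
the use of /dev/null as a stand-in for the device file are my
own assumptions for the example.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: mimic a process that was exec'ed with stdin
 * already closed, then open a file.  Because descriptor 0 is free,
 * POSIX open() hands it back, so a successful call returns 0.
 * Returns -1 if the open itself fails.
 */
int open_with_stdin_closed(const char *path)
{
    close(STDIN_FILENO);          /* harmless if 0 was already closed */
    return open(path, O_RDONLY);  /* lowest free descriptor, i.e. 0 */
}
```

Calling open_with_stdin_closed("/dev/null") from a
single-threaded test program should yield 0, which is exactly
the value that a test like "if (!dev)" mistakes for "no device".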

When reiserfsck is called directly from the shell prompt, or is
executed via a wrapper script, it actually gets its own
stdin/stdout/stderr sitting on descriptors 0/1/2 and thus
doesn't trip over this bug.  So creating a wrapper script works
as a quick band-aid fix.

The proper solution is to change the flush_buffers() way of
thinking; the attached patch might be enough.  Or it might not 
be.  If some other bit of code is actually setting fs->fs_dev to 
0 to signify a real error condition, then a real fix is going to 
require more far-reaching changes.
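
For what such a more far-reaching change might look like, here
is a rough sketch under stated assumptions: it supposes the
descriptor is kept in a plain int and that "not specified" is
marked with -1, a value a successful open() can never return.
FD_UNSPECIFIED and fd_is_specified() are hypothetical names and
do not exist in reiserfsprogs.

```c
#include <stdbool.h>

/* Hypothetical sentinel: open() returns -1 only on failure, so -1
 * can safely mean "no descriptor", while 0 stays a valid value. */
#define FD_UNSPECIFIED (-1)

/* Hypothetical check to use instead of "if (!dev)": any
 * non-negative value, including 0, counts as a real descriptor. */
static bool fd_is_specified(int fd)
{
    return fd >= 0;
}
```

With this convention every place that currently stores 0 to
signal an error would have to store FD_UNSPECIFIED instead,
which is why it is a bigger change than the attached patch.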

--
Kelledin
"If a server crashes in a server farm and no one pings it, does
it still cost four figures to fix?"

[-- Attachment #2: reiserfsprogs-3.6.5-fdzero.patch --]
[-- Type: text/x-diff, Size: 603 bytes --]

diff -Naur reiserfsprogs-3.6.5/lib/io.c reiserfsprogs-3.6.5-fdzero/lib/io.c
--- reiserfsprogs-3.6.5/lib/io.c	2003-03-12 11:34:43.000000000 -0600
+++ reiserfsprogs-3.6.5-fdzero/lib/io.c	2003-04-09 05:43:05.000000000 -0500
@@ -390,8 +390,12 @@
 
 void flush_buffers (dev_t dev)
 {
+/* Scrap this test.  A file descriptor with a value of zero is perfectly
+ * valid.
+ *                                                        --kelledin
     if (!dev)
 	die ("flush_buffers: device is not specifed");
+*/
     sync_buffers (&Buffer_list_head, dev, 0/*all*/);
     buffer_soft_limit = BUFFER_SOFT_LIMIT;
 }


Thread overview: 5+ messages
2003-04-09 10:57 Bug in reiserfsck 3.6.5 Kelledin
2003-04-09 10:09 ` Yury Umanets
2003-04-09 10:15 ` Oleg Drokin
2003-04-09 17:17   ` Hans Reiser
2003-04-10  4:12     ` Kelledin
