Re: 2.4.0-test11 ext2 fs corruption

public inbox for linux-kernel@vger.kernel.org

* Re: 2.4.0-test11 ext2 fs corruption
From: Alexander Viro @ 2000-11-28 20:02 UTC
  To: Petr Vandrovec; +Cc: linux-kernel, tytso



On Tue, 28 Nov 2000, Petr Vandrovec wrote:

> Hi Al,
>   during the weekend I was uncompressing XFree (Debian's 4.0.1-7) at home,
> with 2.4.0-test11 running on a Celeron 300A, 128MB RAM, SMP kernel on UP.
> It failed to compile lbxproxy/di/main.c. After some investigation I found
> that the files were overwritten by some source font data. fsck did not reveal
> any cross-linked clusters, nothing. The filesystem itself uses 4KB clusters.
> 
>   Today I found some spare time and investigated it further. The same
> data contents appear in:
> 
> programs/lbxproxy/di/init.c 0-8720  fonts/bdf/75dpi/lubR24.bdf  0x5000-0x7210
>                  lbxfuncs.c 0x0000-0x0EC0           lubR24.bdf  0x8000-0x8EC0
>                             0x0EC1-0x0FFF                      zero
>                             0x1000-0x5ABC           lutBS08.bdf 0x0000-0x4ABC
>                             0x5ABD-0x5FFF                      zero
>                             0x6000-0x92C1           lutBS10.bdf 0x0000-0x32C1
>                  lbxutil.c  0x0000-0x1E27           lutBS10.bdf 0x4000-0x5E27
>                             0x1E28-0x1FFF                      zero
>                             0x2000-0x3452           lutBS12.bdf 0x0000-0x1452   
>                  main.c     0-4614                  lutBS12.bdf 0x2000-0x3206
>                  options.c  0x0000-0x222E           lutBS12.bdf 0x4000-0x622E
>                             0x222F-0x2FFF                      zero
>                             0x3000-0x4E30           lutBS14.bdf 0x0000-0x1E30
>                  pm.c       0-11706                 lutBS14.bdf 0x2000-0x4DA8
>              (blocks 722433-722459)                (blocks 558899-~558927)
>              
> Other files are intact. As you can see, some disk blocks ended up
> somewhere other than where they should be, in addition to the correct place.
> I also found that the data after the end of file in the di/*.c files is not
> cleared, so maybe the IDE driver made a mistake? But I was not able
> to find how to convert either the block address, the LBA address, or the CHS
> address (the drive uses 839/240/63, but I hope that it runs in LBA) to
> get 558899 from 722433 or vice versa.

Erm... Do you mean that you've got a 1-1 correspondence in data between these
two ranges? Then it looks like something way below the fs level... Weird.
Could you verify it with dd?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: 2.4.0-test11 ext2 fs corruption
From: David S. Miller @ 2000-11-28 20:04 UTC
  To: VANDROVE; +Cc: viro, linux-kernel, tytso

   From: "Petr Vandrovec" <VANDROVE@vc.cvut.cz>
   Date:          Tue, 28 Nov 2000 21:10:36 MET-1

   Yes, it is an identical copy. But I do not think that the hdd can write the same
   data into two places with one command...

Petr, did the af_inet.c assertions get triggered on this
same machine?

If yes, you seem to have some crazy kernel data corruptions
going on, and whatever it is would seem to be the cause of
both these problems you are reporting.

Later,
David S. Miller
davem@redhat.com

* 2.4.0-test11 ext2 fs corruption
From: Petr Vandrovec @ 2000-11-28 20:32 UTC
  To: viro; +Cc: linux-kernel, tytso

Hi Al,
  during the weekend I was uncompressing XFree (Debian's 4.0.1-7) at home,
with 2.4.0-test11 running on a Celeron 300A, 128MB RAM, SMP kernel on UP.
It failed to compile lbxproxy/di/main.c. After some investigation I found
that the files were overwritten by some source font data. fsck did not reveal
any cross-linked clusters, nothing. The filesystem itself uses 4KB clusters.

  Today I found some spare time and investigated it further. The same
data contents appear in:

programs/lbxproxy/di/init.c 0-8720  fonts/bdf/75dpi/lubR24.bdf  0x5000-0x7210
                 lbxfuncs.c 0x0000-0x0EC0           lubR24.bdf  0x8000-0x8EC0
                            0x0EC1-0x0FFF                      zero
                            0x1000-0x5ABC           lutBS08.bdf 0x0000-0x4ABC
                            0x5ABD-0x5FFF                      zero
                            0x6000-0x92C1           lutBS10.bdf 0x0000-0x32C1
                 lbxutil.c  0x0000-0x1E27           lutBS10.bdf 0x4000-0x5E27
                            0x1E28-0x1FFF                      zero
                            0x2000-0x3452           lutBS12.bdf 0x0000-0x1452   
                 main.c     0-4614                  lutBS12.bdf 0x2000-0x3206
                 options.c  0x0000-0x222E           lutBS12.bdf 0x4000-0x622E
                            0x222F-0x2FFF                      zero
                            0x3000-0x4E30           lutBS14.bdf 0x0000-0x1E30
                 pm.c       0-11706                 lutBS14.bdf 0x2000-0x4DA8
             (blocks 722433-722459)                (blocks 558899-~558927)

Other files are intact. As you can see, some disk blocks ended up
somewhere other than where they should be, in addition to the correct place.
I also found that the data after the end of file in the di/*.c files is not
cleared, so maybe the IDE driver made a mistake? But I was not able
to find how to convert either the block address, the LBA address, or the CHS
address (the drive uses 839/240/63, but I hope that it runs in LBA) to
get 558899 from 722433 or vice versa.
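A minimal C sketch of the address arithmetic Petr is trying, assuming 4KB ext2 blocks, 512-byte sectors and the 839/240/63 geometry quoted above. The helper names (block_to_sector, lba_to_chs, block_delta, block_xor) are hypothetical, and because the start sector of hdd1 is not given in the report, sector numbers here are relative to the partition; lba_to_chs expects an absolute disk LBA, so the partition start would have to be added first.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KB ext2 blocks, 512-byte sectors, 240 heads, 63 sectors/track. */
#define FS_BLOCK_SIZE 4096u
#define SECTOR_SIZE 512u
#define HEADS 240u
#define SECTORS_PER_TRACK 63u

/* ext2 block number -> 512-byte sector number, relative to the partition start. */
static uint64_t block_to_sector(uint64_t block)
{
    return block * (FS_BLOCK_SIZE / SECTOR_SIZE);
}

struct chs {
    uint32_t cyl, head, sect;   /* sect is 1-based, as in classic CHS addressing */
};

/* Classic LBA -> CHS mapping. The argument must be an absolute disk LBA
   (partition-relative sector + partition start sector, not known here). */
static struct chs lba_to_chs(uint64_t lba)
{
    struct chs r;
    r.cyl = (uint32_t)(lba / (HEADS * SECTORS_PER_TRACK));
    r.head = (uint32_t)((lba / SECTORS_PER_TRACK) % HEADS);
    r.sect = (uint32_t)(lba % SECTORS_PER_TRACK) + 1u;
    return r;
}

/* Constant offset between two block numbers. */
static int64_t block_delta(uint64_t a, uint64_t b)
{
    return (int64_t)a - (int64_t)b;
}

/* Address bits that differ between two block numbers (a stuck or flipped
   address line would show up as a single set bit here). */
static uint64_t block_xor(uint64_t a, uint64_t b)
{
    return a ^ b;
}
```

On Petr's numbers the offset is 163534 blocks (1308272 sectors) and the differing bits are 0x38132, so the two ranges are not related by one flipped address bit, and 1308272 is not a whole number of 15120-sector cylinders. That is consistent with his not finding a simple mapping, though it does not rule out other kinds of translation error.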

The motherboard is an i440BX; the HDD was an IDE TOSHIBA MK6409MAV on the
secondary IDE channel, running UDMA2.

Nobody complained (neither IDE nor the kernel nor ext2); just the data were
damaged. The machine does not have any other problems, so I have no idea
what caused this incident. Maybe I stressed the MM system too much with
some GNOME app during the untar?

One last note: according to debian/scripts/source.unpack, programs/lbxproxy
was created first, and fonts/bdf/... was created after that (i.e.
X401src-1 was decompressed first, X401src-2_debian was decompressed
second). This also agrees with the zeroed bytes in these data blocks.
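The zero runs in the table follow from tail arithmetic: a file's last 4KB block is only partly used, and the table shows the unused rest of that block as zeros. A small C sketch, assuming 4KB blocks; tail_start and tail_slack are hypothetical helper names. Assuming lubR24.bdf ends at offset 0x8EC0 (size 0x8EC1), as the zero run that follows it suggests, the sketch predicts the 0x0EC1-0x0FFF zero range seen in lbxfuncs.c; likewise lutBS08.bdf ending at 0x4ABC matches the 0x5ABD-0x5FFF zero range.

```c
#include <assert.h>
#include <stdint.h>

#define FS_BLOCK_SIZE 4096u   /* assumed ext2 block size */

/* Offset inside the file's last block where the unused tail begins
   (0 means the last block is completely full). */
static uint32_t tail_start(uint64_t size)
{
    return (uint32_t)(size % FS_BLOCK_SIZE);
}

/* Number of unused (here: zero) bytes at the end of the file's last block. */
static uint32_t tail_slack(uint64_t size)
{
    return (FS_BLOCK_SIZE - tail_start(size)) % FS_BLOCK_SIZE;
}
```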
                                    Thanks,
                                            Petr Vandrovec
                                            vandrove@vc.cvut.cz

P.S.: Ted, why is the field 'Blocks: XXX' in debugfs (1.19) really 'Sectors: '?
It reports blocks * 8, so I assume (as I have 4KB clusters) that it
converts the value to a sector count.
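For reference, the ext2 inode field i_blocks is kept in 512-byte units regardless of the filesystem block size, which would account for the factor of 8 on a 4KB filesystem. A hypothetical helper that converts it back to filesystem blocks is sketched below; note that i_blocks also counts indirect blocks, so the result can exceed the number of data blocks.

```c
#include <assert.h>
#include <stdint.h>

/* ext2's i_blocks counts 512-byte units, not filesystem blocks. */
static uint32_t i_blocks_to_fs_blocks(uint32_t i_blocks, uint32_t fs_block_size)
{
    assert(fs_block_size >= 512 && fs_block_size % 512 == 0);
    return i_blocks / (fs_block_size / 512);
}
```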

* Re: 2.4.0-test11 ext2 fs corruption
From: Alexander Viro @ 2000-11-28 20:41 UTC
  To: Petr Vandrovec; +Cc: linux-kernel, tytso, Linus Torvalds, Andrea Arcangeli

On Tue, 28 Nov 2000, Petr Vandrovec wrote:

> > two ranges? Then it looks like something way below the fs level... Weird.
> > Could you verify it with dd?
> 
> Yes, it is an identical copy. But I do not think that the hdd can write the same
> data into two places with one command...
> 
> vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=722433 | md5sum
> 27+0 records in
> 27+0 records out
> 613de4a7ea664ce34b2a9ec8203de0f4
> vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=558899 | md5sum
> 27+0 records in
> 27+0 records out
> 613de4a7ea664ce34b2a9ec8203de0f4
> vana:/#

Bloody hell... OK, let's see. Both ranges are covered by multiple files
and are way larger than one page. I.e. anything on pagecache level is
extremely unlikely - pages are not searched by physical location on
disk. And I really doubt that it's ext2_get_block() - we would have
to get a systematic error (constant offset), then read the data in
for no good reason, then forget the page->buffers, then get the right
values from ext2_get_block(), leave the data unmodified _and_ write it.

It almost looks like a request in the queue got fscked up, retaining the
->bh from one of the previous (also coalesced) requests while having the
correct ->sector. Weird.
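A toy user-space model of the failure mode Al describes, purely for illustration: toy_request, toy_submit and toy_reuse are hypothetical stand-ins for a queued request, its ->sector and its ->bh data, and the sizes are tiny rather than 4KB. If a reused request gets a correct new target but keeps the data pointer of an earlier request, the earlier data lands on disk twice and the intended data never arrives, with no error reported anywhere.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCK 8      /* toy block size, not 4 KB */
#define TOY_NBLOCKS 16

static char toy_disk[TOY_NBLOCKS][TOY_BLOCK];

/* Hypothetical stand-in for a queued write: target block (cf. ->sector)
   and the buffer whose contents will be transferred (cf. ->bh data). */
struct toy_request {
    int block;
    const char *data;
};

/* "Service" a request: copy its attached data to its target block. */
static void toy_submit(const struct toy_request *rq)
{
    assert(rq->block >= 0 && rq->block < TOY_NBLOCKS);
    memcpy(toy_disk[rq->block], rq->data, TOY_BLOCK);
}

/* Reuse a request for new I/O. With stale_data set, only the target block
   is updated and the old data pointer is (wrongly) kept. */
static void toy_reuse(struct toy_request *rq, int block, const char *data,
                      int stale_data)
{
    rq->block = block;
    if (!stale_data)
        rq->data = data;
}
```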

Linus, Andrea - any ideas? The situation looks like this: after massive file
creation, a range of disk with the data from new files (many new files) got
duplicated over another range - one with the data from older files
(also many of them). 27 blocks, block size == 4KB. No intersection
between inodes, fsck is happy with the fs, just the data ending up in two
places on disk. No warnings from the IDE or ext2 drivers.

Kernel: test11 built with gcc 2.95.2, so a gcc bug may very well be there.
However, I really wonder what could trigger it in ll_rw_blk.c - 5:1
that shit had hit the fan there.


* Re: 2.4.0-test11 ext2 fs corruption
From: Petr Vandrovec @ 2000-11-28 21:10 UTC
  To: Alexander Viro; +Cc: linux-kernel, tytso

On 28 Nov 00 at 15:02, Alexander Viro wrote:
> On Tue, 28 Nov 2000, Petr Vandrovec wrote:
> 
> > Hi Al,
> >   during the weekend I was uncompressing XFree (Debian's 4.0.1-7) at home,
> > with 2.4.0-test11 running on a Celeron 300A, 128MB RAM, SMP kernel on UP.
> > It failed to compile lbxproxy/di/main.c. After some investigation I found
> > that the files were overwritten by some source font data. fsck did not reveal
> > any cross-linked clusters, nothing. The filesystem itself uses 4KB clusters.
> > 
> >   Today I found some spare time and investigated it further. The same
> > data contents appear in:
> > 
> > programs/lbxproxy/di/init.c 0-8720  fonts/bdf/75dpi/lubR24.bdf  0x5000-0x7210
> >                  lbxfuncs.c 0x0000-0x0EC0           lubR24.bdf  0x8000-0x8EC0
> >                             0x0EC1-0x0FFF                      zero
> >                             0x1000-0x5ABC           lutBS08.bdf 0x0000-0x4ABC
> >                             0x5ABD-0x5FFF                      zero
> >                             0x6000-0x92C1           lutBS10.bdf 0x0000-0x32C1
> >                  lbxutil.c  0x0000-0x1E27           lutBS10.bdf 0x4000-0x5E27
> >                             0x1E28-0x1FFF                      zero
> >                             0x2000-0x3452           lutBS12.bdf 0x0000-0x1452   
> >                  main.c     0-4614                  lutBS12.bdf 0x2000-0x3206
> >                  options.c  0x0000-0x222E           lutBS12.bdf 0x4000-0x622E
> >                             0x222F-0x2FFF                      zero
> >                             0x3000-0x4E30           lutBS14.bdf 0x0000-0x1E30
> >                  pm.c       0-11706                 lutBS14.bdf 0x2000-0x4DA8
> >              (blocks 722433-722459)                (blocks 558899-~558927)
> >              
> > Other files are intact. As you can see, some disk blocks ended up
> > somewhere other than where they should be, in addition to the correct place.
> > I also found that the data after the end of file in the di/*.c files is not
> > cleared, so maybe the IDE driver made a mistake? But I was not able
> > to find how to convert either the block address, the LBA address, or the CHS
> > address (the drive uses 839/240/63, but I hope that it runs in LBA) to
> > get 558899 from 722433 or vice versa.
> 
> Erm... Do you mean that you've got a 1-1 correspondence in data between these
> two ranges? Then it looks like something way below the fs level... Weird.
> Could you verify it with dd?

Yes, it is an identical copy. But I do not think that the hdd can write the same
data into two places with one command...

vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=722433 | md5sum
27+0 records in
27+0 records out
613de4a7ea664ce34b2a9ec8203de0f4
vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=558899 | md5sum
27+0 records in
27+0 records out
613de4a7ea664ce34b2a9ec8203de0f4
vana:/#

I found the match by searching other files for the contents of init.c.
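Petr found the second copy by searching for the contents of init.c. One way to do the same search directly on the device, sketched in C under the assumption that the copy is 4KB-block aligned (as the dd comparison suggests): read a reference block, then scan for blocks with identical bytes. find_block is a hypothetical helper; pread() is the standard POSIX call, and on the real system the descriptor would come from something like open("/dev/hdd1", O_RDONLY).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Scan the device or image behind fd in blocks of bs bytes, starting at
   block `from`, and return the first block whose contents equal needle.
   Returns -1 if nothing matches before end of file or on a read error. */
static long find_block(int fd, size_t bs, const unsigned char *needle, long from)
{
    unsigned char *buf = malloc(bs);
    long found = -1;
    if (buf == NULL)
        return -1;
    for (long blk = from; ; blk++) {
        ssize_t n = pread(fd, buf, bs, (off_t)blk * (off_t)bs);
        if (n != (ssize_t)bs)
            break;                      /* EOF, short read or error */
        if (memcmp(buf, needle, bs) == 0) {
            found = blk;
            break;
        }
    }
    free(buf);
    return found;
}
```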

It is just these 27 blocks; the blocks before and after the range differ.
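Confirming that the duplicate is exactly those 27 blocks amounts to extending the match block by block in both directions until the contents differ. A minimal C sketch, assuming the two ranges do not overlap; blocks_equal and matching_run are hypothetical helpers built on POSIX pread().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Compare block a with block b (bs bytes each, counted from the start of fd).
   Returns 1 if identical, 0 if different, -1 on a read error or short read. */
static int blocks_equal(int fd, size_t bs, long a, long b)
{
    unsigned char *x = malloc(bs);
    unsigned char *y = malloc(bs);
    int r = -1;
    if (x != NULL && y != NULL &&
        pread(fd, x, bs, (off_t)a * (off_t)bs) == (ssize_t)bs &&
        pread(fd, y, bs, (off_t)b * (off_t)bs) == (ssize_t)bs)
        r = memcmp(x, y, bs) == 0;
    free(x);
    free(y);
    return r;
}

/* Starting from a pair of blocks (a, b) known to match, extend the match
   backwards and forwards while the contents stay identical. Stores the
   first block of each copy in *start_a / *start_b, returns the run length. */
static long matching_run(int fd, size_t bs, long a, long b,
                         long *start_a, long *start_b)
{
    long before = 0, after = 0;
    while (a - before > 0 && b - before > 0 &&
           blocks_equal(fd, bs, a - before - 1, b - before - 1) == 1)
        before++;
    while (blocks_equal(fd, bs, a + after, b + after) == 1)
        after++;
    *start_a = a - before;
    *start_b = b - before;
    return before + after;
}
```

With fd open on /dev/hdd1, bs = 4096 and the pair (722433, 558899), this would be expected to report a 27-block run starting at those block numbers, if the observation above holds.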
                                        Best regards,
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

* Re: 2.4.0-test11 ext2 fs corruption
From: Petr Vandrovec @ 2000-11-28 21:46 UTC
  To: David S. Miller; +Cc: viro, linux-kernel, tytso

On 28 Nov 00 at 12:04, David S. Miller wrote:
> 
>    Yes, it is an identical copy. But I do not think that the hdd can write the same
>    data into two places with one command...
> 
> Petr, did the af_inet.c assertions get triggered on this
> same machine?

No: the ext2fs problem is at home, and af_inet is at work... At work I'm using
VMware; at home I do not use it... But the kernel sources are the same
(g450 patch for matroxfb, ncpfs supporting device nodes, threaded IPX;
but neither ncpfs nor IPX is compiled in at home).
                                                   Petr Vandrovec
                                                   vandrove@vc.cvut.cz
                                                   
                                                                

* Re: 2.4.0-test11 ext2 fs corruption
From: Jens Axboe @ 2000-11-29  0:43 UTC
  To: Petr Vandrovec; +Cc: David S. Miller, viro, linux-kernel, tytso

On Tue, Nov 28 2000, Petr Vandrovec wrote:
> On 28 Nov 00 at 12:04, David S. Miller wrote:
> > 
> >    Yes, it is an identical copy. But I do not think that the hdd can write the same
> >    data into two places with one command...
> > 
> > Petr, did the af_inet.c assertions get triggered on this
> > same machine?
> 
> No: the ext2fs problem is at home, and af_inet is at work... At work I'm using
> VMware; at home I do not use it... But the kernel sources are the same
> (g450 patch for matroxfb, ncpfs supporting device nodes, threaded IPX;
> but neither ncpfs nor IPX is compiled in at home).
>                                                    Petr Vandrovec
>                                                    vandrove@vc.cvut.cz

Petr,

Could you try to reproduce it with the attached patch? If this bug were the
trigger, I would expect fs corruption as well (which doesn't seem to be the
case for you), but it's worth a shot.

--- drivers/block/ll_rw_blk.c~	Wed Nov 29 01:30:22 2000
+++ drivers/block/ll_rw_blk.c	Wed Nov 29 01:33:00 2000
@@ -684,7 +684,7 @@
 	int max_segments = MAX_SEGMENTS;
 	struct request * req = NULL, *freereq = NULL;
 	int rw_ahead, max_sectors, el_ret;
-	struct list_head *head = &q->queue_head;
+	struct list_head *head;
 	int latency;
 	elevator_t *elevator = &q->elevator;
 
@@ -734,6 +734,7 @@
 	 */
 again:
 	spin_lock_irq(&io_request_lock);
+	head = &q->queue_head;
 
 	/*
 	 * skip first entry, for devices with active queue head

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs

* Re: 2.4.0-test11 ext2 fs corruption
From: Andrea Arcangeli @ 2000-11-29  1:08 UTC
  To: Jens Axboe; +Cc: Petr Vandrovec, David S. Miller, viro, linux-kernel, tytso

Side note: that bug could cause mem/IO corruption only on head-active devices
(like IDE).

Andrea

* Re: 2.4.0-test11 ext2 fs corruption
From: Jens Axboe @ 2000-11-29  1:11 UTC
  To: Andrea Arcangeli
  Cc: Petr Vandrovec, David S. Miller, viro, linux-kernel, tytso

On Wed, Nov 29 2000, Andrea Arcangeli wrote:
> Side note: that bug could cause mem/IO corruption only on head-active devices
> (like IDE).

Yep, that's why I told Linus it was a long shot and couldn't possibly
account for all the corruption cases reported. And one would expect
fs corruption to go with that as well. So it's of course a long shot,
but still worth trying for Petr.

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs

* Re: 2.4.0-test11 ext2 fs corruption
From: Andre Hedrick @ 2000-11-29  1:32 UTC
  To: Jens Axboe
  Cc: Andrea Arcangeli, Petr Vandrovec, David S. Miller, viro,
	linux-kernel, tytso

On Wed, 29 Nov 2000, Jens Axboe wrote:

> On Wed, Nov 29 2000, Andrea Arcangeli wrote:
> > Side note: that bug could cause mem/IO corruption only on head-active devices
> > (like IDE).
> 
> Yep, that's why I told Linus it was a long shot and couldn't possibly
> account for all the corruption cases reported. And one would expect
> fs corruption to go with that as well. So it's of course a long shot,
> but still worth trying for Petr.

Okay, I have spent part of the afternoon kicking my FW around and have not
followed all of the thread. However, we are talking FSC and ATA, so what
are the details? And where are we poking into the driver?

Andre Hedrick
Linux ATA Development


* Re: 2.4.0-test11 ext2 fs corruption
From: schwidefsky @ 2000-11-29 12:21 UTC
  To: linux-kernel



>--- drivers/block/ll_rw_blk.c~  Wed Nov 29 01:30:22 2000
>+++ drivers/block/ll_rw_blk.c   Wed Nov 29 01:33:00 2000
>@@ -684,7 +684,7 @@
>        int max_segments = MAX_SEGMENTS;
>        struct request * req = NULL, *freereq = NULL;
>        int rw_ahead, max_sectors, el_ret;
>-       struct list_head *head = &q->queue_head;
>+       struct list_head *head;
>        int latency;
>        elevator_t *elevator = &q->elevator;

head = &q->queue_head is a simple offset calculation in the request
queue structure. Moving this into the spinlock won't change anything,
since q->queue_head isn't a pointer that can change.

Independently of that, I can confirm the observation that test11 can corrupt
ext2 in memory. I think that this is related to the memory management
problems I see, but I can't prove it yet.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



* Re: 2.4.0-test11 ext2 fs corruption
From: Petr Vandrovec @ 2000-11-29 12:41 UTC
  To: Jens Axboe; +Cc: David S. Miller, viro, linux-kernel, tytso

On 29 Nov 00 at 1:43, Jens Axboe wrote:

> Could you try to reproduce it with the attached patch? If this bug were the
> trigger, I would expect fs corruption as well (which doesn't seem to be the
> case for you), but it's worth a shot.

I'll try, but it is not easily reproducible. Fortunately.

BTW, during the night it occurred to me that I may have been biased in my
original diagnosis (data written twice), as about 3 weeks ago
XF4.0.1-0phase?v27 had been unpacked on the same disk.

As the font data did not change between these two versions, it is possible
that one 27-block chunk (the *.c files) was lost (or written somewhere I
have not found yet), rather than another one (the fonts) being duplicated.
                                                Thanks,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz
                                                    

* Re: 2.4.0-test11 ext2 fs corruption
From: Alexander Viro @ 2000-11-29 12:43 UTC
  To: schwidefsky; +Cc: linux-kernel



On Wed, 29 Nov 2000 schwidefsky@de.ibm.com wrote:

> 
> 
> >--- drivers/block/ll_rw_blk.c~  Wed Nov 29 01:30:22 2000
> >+++ drivers/block/ll_rw_blk.c   Wed Nov 29 01:33:00 2000
> >@@ -684,7 +684,7 @@
> >        int max_segments = MAX_SEGMENTS;
> >        struct request * req = NULL, *freereq = NULL;
> >        int rw_ahead, max_sectors, el_ret;
> >-       struct list_head *head = &q->queue_head;
> >+       struct list_head *head;
> >        int latency;
> >        elevator_t *elevator = &q->elevator;
> 
> head = &q->queue_head is a simple offset calculation in the request
> queue structure. Moving this into the spinlock won't change anything,
> since q->queue_head isn't a pointer that can change.

That's fine, but head is _re_assigned later. Grep for 'head =' and 'again'
in __make_request().
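Al's point can be illustrated with a toy user-space model of the control flow. This is not the kernel code, and first_merge_candidate and toy_entry are hypothetical names: head is advanced once to skip the entry the driver is actively working on (the "skip first entry, for devices with active queue head" step visible in the patch context), and the function may jump back to again after waiting for a free request. If head is initialised only before the label, every retry skips one more entry; the patch re-initialises it after the label.

```c
#include <assert.h>
#include <stddef.h>

/* Toy queue entry; in this simplified model a NULL next ends the list. */
struct toy_entry {
    struct toy_entry *next;
    int id;
};

/* Return the id of the first queued request that would be considered for
   merging, or -1 if none. `retries` models how many times the code jumps
   back to `again`; `fixed` selects where `head` is initialised. */
static int first_merge_candidate(struct toy_entry *queue_head, int head_active,
                                 int retries, int fixed)
{
    struct toy_entry *head = queue_head;   /* old code: set once, before the label */
again:
    if (fixed)
        head = queue_head;                 /* patched code: reset after the label */
    if (head_active && head->next != NULL)
        head = head->next;                 /* skip the entry the driver is working on */
    if (retries-- > 0)
        goto again;                        /* stands in for waiting for a free request */
    return head->next != NULL ? head->next->id : -1;
}
```

In this model the old code silently excludes one more queued request per retry, and only when head_active is set, which fits Andrea's remark that the problem is limited to head-active devices.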


* Re: 2.4.0-test11 ext2 fs corruption
From: Daniel Phillips @ 2000-11-29 16:26 UTC
  To: Alexander Viro, linux-kernel

Alexander Viro wrote:
> 
> On Tue, 28 Nov 2000, Petr Vandrovec wrote:
> 
> > > two ranges? Then it looks like something way below the fs level... Weird.
> > > Could you verify it with dd?
> >
> > Yes, it is an identical copy. But I do not think that the hdd can write the same
> > data into two places with one command...
> >
> > vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=722433 | md5sum
> > 27+0 records in
> > 27+0 records out
> > 613de4a7ea664ce34b2a9ec8203de0f4
> > vana:/# dd if=/dev/hdd1 bs=4096 count=27 skip=558899 | md5sum
> > 27+0 records in
> > 27+0 records out
> > 613de4a7ea664ce34b2a9ec8203de0f4
> > vana:/#
> 
> Bloody hell... OK, let's see. Both ranges are covered by multiple files
> and are way larger than one page. I.e. anything on pagecache level is
> extremely unlikely - pages are not searched by physical location on
> disk. And I really doubt that it's ext2_get_block() - we would have
> to get a systematic error (constant offset), then read the data in
> for no good reason, then forget the page->buffers, then get the right
> values from ext2_get_block(), leave the data unmodified _and_ write it.
> 
> It almost looks like a request in the queue got fscked up, retaining the
> ->bh from one of the previous (also coalesced) requests while having the
> correct ->sector. Weird.
> 
> Linus, Andrea - any ideas? The situation looks like this: after massive file
> creation, a range of disk with the data from new files (many new files) got
> duplicated over another range - one with the data from older files
> (also many of them). 27 blocks, block size == 4KB. No intersection
> between inodes, fsck is happy with the fs, just the data ending up in two
> places on disk. No warnings from the IDE or ext2 drivers.
> 
> Kernel: test11 built with gcc 2.95.2, so a gcc bug may very well be there.
> However, I really wonder what could trigger it in ll_rw_blk.c - 5:1
> that shit had hit the fan there.

I picked up a bug in ialloc.c back in February and submitted a poorly
constructed patch for it (for which I was privately and properly flamed;
as I recall, my subsequent attempts to post an improved version failed
for various reasons, which may or may not include ORBS). Anyway, the
basic idea is clear:

  http://marc.theaimsgroup.com/?l=linux-kernel&m=95162877201890&w=2

I'll make a proper patch out of this if you like.  This *could* cause
the effect we're seeing here.

--
Daniel