Testing framework

linux-fsdevel.vger.kernel.org archive mirror

* Testing framework
@ 2006-12-28 13:18 Karuna sagar k
  0 siblings, 0 replies; 8+ messages in thread
From: Karuna sagar k @ 2006-12-28 13:18 UTC
  To: linux-fsdevel, linux-kernel

Hi,

I am working on a testing framework for file systems that focuses on
repair and recovery. So far I have been timing fsck and trying to
determine how effective it is. The idea is outlined below.

In abstract terms, I create a file system (ideal state), corrupt it,
run fsck on it and compare the recovered state with the ideal one. So
the whole process is divided into phases:

1. Prepare Phase - a new file system is created, populated and aged.
This state of the file system is consistent and is considered the ideal
state. It is recorded for later use in the comparison phase.

2. Corruption Phase - the file system on the disk is corrupted. We
should be able to reproduce the corruption across test runs (though
probably not exactly), so that different file systems can be compared
on a fair basis.

3. Repair Phase - fsck is run to repair and recover the disk to a
consistent state. The time taken by fsck is determined here.

4. Comparison Phase - the current state (recovered state) is compared
(a logical comparison) with the ideal state. The comparison shows what
fsck recovered and how close the two states are.

Apart from this, we also need a component to record the state of the
file system. For this we construct a tree (which represents the state
of the file system) where each node stores some information about a
file or directory, and the tree structure records the parent-child
relationships among the files and directories.
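
A minimal sketch of such a state tree, assuming the file system is
mounted and can be walked through the ordinary VFS (a real tool might
read the on-disk structures instead). The names `Node` and `snapshot`
and the choice of recorded fields are hypothetical.

```python
import os
import stat
from dataclasses import dataclass, field


@dataclass
class Node:
    """One file or directory in the recorded state (hypothetical layout)."""
    name: str
    mode: int
    size: int
    children: dict = field(default_factory=dict)  # name -> Node, directories only


def snapshot(path):
    """Recursively record the tree rooted at `path` without following symlinks."""
    st = os.lstat(path)
    node = Node(os.path.basename(path) or path, st.st_mode, st.st_size)
    if stat.S_ISDIR(st.st_mode):
        for entry in sorted(os.listdir(path)):
            node.children[entry] = snapshot(os.path.join(path, entry))
    return node
```

Keeping children in a dict keyed by name makes it cheap to look up the
counterpart of a node when two such trees are compared later.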

Currently I am focusing on the corruption and comparison phases:

Corruption Phase:
Focus - the corruption should be reproducible. This gives control
over the comparison between test runs and between file systems. I am
assuming here that different test runs deal with the same files.

I have come across two approaches here:

Approach 1 -
1. Among the files present on the disk, we randomly choose a few files.
2. For each such file, we find the metadata block info (inode block).
3. We seek to these blocks and corrupt them (by writing either random
data or some predetermined pattern to the block).
4. We note which files were chosen, so that a similar corruption can
be reproduced.

The above approach looks at the file system from a very abstract view.
But there may be file system specific data structures on the disk (e.g.
group descriptors in the case of ext2). These are not handled by this
approach directly.

Approach 2 - We pick the metadata blocks from the disk and form a
logical disk structure containing just these blocks. The blocks on the
logical disk map onto the physical disk. We randomly pick blocks from
this logical disk and corrupt them.

Comparison Phase:
We determine the logical equivalence between the recovered and ideal
states of the disk. For example, if a file was lost during corruption
and fsck recovered it and put it in the lost+found directory, we have
to recognize this as logically equivalent and not report the file as
lost.
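
One way to sketch this logical comparison, assuming each state has been
reduced to a mapping from relative path to a content digest. Since a
file reattached under lost+found normally loses its original name, the
sketch matches such files by digest. The function name, the status
labels and the digest-based matching are all assumptions, not the
existing implementation.

```python
def compare_states(ideal, recovered, lost_found="lost+found/"):
    """Classify every file of the ideal state.

    `ideal` and `recovered` map a relative path to a content digest.
    Returns a dict mapping each ideal path to one of:
    'intact', 'data_changed', 'lost_and_found', 'lost'.
    """
    # Digests of files that fsck reattached under lost+found; each
    # orphan may account for at most one missing file.
    orphans = {}
    for path, digest in recovered.items():
        if path.startswith(lost_found):
            orphans.setdefault(digest, []).append(path)

    result = {}
    for path, digest in sorted(ideal.items()):
        if path in recovered:
            result[path] = "intact" if recovered[path] == digest else "data_changed"
        elif orphans.get(digest):
            orphans[digest].pop()
            result[path] = "lost_and_found"
        else:
            result[path] = "lost"
    return result
```

Matching by digest cannot tell apart two lost files with identical
contents, which is acceptable for counting but not for naming them.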

Right now I have a basic implementation of the above with random
corruption (using fsfuzzer logic).

Any suggestions?

Thanks,
Karuna

* Testing framework
@ 2007-04-22 20:46 Karuna sagar K
  2007-04-23  9:09 ` Kalpak Shah
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Karuna sagar K @ 2007-04-22 20:46 UTC
  To: linux-fsdevel, linux-kernel

Hi,

For some time I have been working on this file system test framework.
I now have an implementation, and an explanation follows.
Any comments are welcome.

Introduction:
The testing tools and benchmarks currently available do not take into
account the repair and recovery aspects of file systems. The test
framework described here focuses on the repair and recovery
capabilities of file systems. Since most file systems use 'fsck' to
recover from file system inconsistencies, the test framework
characterizes file systems based on the outcomes of running 'fsck'.

Overview:
The model can be described in brief as - prepare a file system, record
the state of the file system, corrupt it, use repair and recovery
tools and finally compare and report the status of the recovered file
system against its initial state.

Prepare Phase:
This is the first phase in the model. Here we prepare a file system for
the subsequent phases. A new file system image is created with the
specified name. The 'mkfs' program is run on this image, and the file
system is then populated sufficiently and aged. This state of the file
system is considered the ideal state.

Corruption Phase:
The file system prepared in the prepare phase is corrupted to simulate
a system crash or, more generally, an inconsistency in the file system.
Naturally we are most interested in corrupting the metadata. Random
corruption would give results like those of fs_mutator or fs_fuzzer.
However, the corruption would then vary between test runs, which would
make comparing file systems both unfair and tedious. So we would like
a mechanism that makes the corruption replayable, ensuring that almost
the same amount of corruption is reproduced across test runs. The
techniques for corruption are:

Higher level perspective/approach:
In this approach the file system is viewed as a tree of nodes, where
nodes are either files or directories. The metadata corresponding to
some randomly chosen nodes of the tree is corrupted. The corrupted
nodes are recorded so that the corruption can be replayed later. The
file system that is corrupted first is called the source file system,
while the file system on which we replay the corruption is called the
target file system. The assumption is that the target file system
contains a set of files and directories that is a superset of that in
the source file system. Hence, to replay the corruption, we need to
identify which nodes were corrupted in the source file system and
corrupt the corresponding nodes in the target file system.
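
The record-and-replay step could look roughly like this, assuming nodes
are identified by their path and that a seeded pseudo-random generator
makes the choice repeatable. `choose_nodes` and `replay_plan` are
hypothetical names; how a chosen node's metadata is actually damaged is
left out.

```python
import random


def choose_nodes(source_paths, count, seed):
    """Pick `count` nodes of the source file system, reproducibly for a given seed."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on directory listing order.
    return sorted(rng.sample(sorted(source_paths), count))


def replay_plan(recorded, target_paths):
    """Return the recorded nodes that must be corrupted on the target.

    Raises ValueError if the target lacks a recorded node, i.e. if the
    superset assumption does not hold.
    """
    target = set(target_paths)
    missing = [p for p in recorded if p not in target]
    if missing:
        raise ValueError(f"target lacks recorded nodes: {missing}")
    return list(recorded)
```

Storing the list returned by `choose_nodes` (rather than only the seed)
keeps the replay valid even if the target holds extra files.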

A major disadvantage with this approach is that on-disk structures
(like superblocks, block group descriptors, etc.) are not considered
for corruption.

Lower level perspective/approach:
The file system is viewed as a set of blocks (more precisely, metadata
blocks). We randomly choose blocks from this set to corrupt. This
overcomes the deficiency of the previous approach. However, this
approach makes it difficult to have a replayable corruption, and it
needs further thought.

We could blend both approaches in the program to trade off corruption
coverage against replayability.

Repair Phase:
The corrupted file system is repaired and recovered with 'fsck' or any
other tool; this phase treats the repair and recovery action on the
file system as a black box. The time the tool takes to repair is
measured.

Comparison Phase:
The current state of the file system is compared with the ideal state.
The metadata of the file system is checked against that of the ideal
file system, and the outcome is noted for the summary of this test run.
If the repair tool is 100% effective, the current state of the file
system should be exactly the same as the ideal state. Simply checking
for equality wouldn't be right, because it doesn't account for files
moved to lost+found. Hence we need to check node by node, for each
node in the ideal state of the file system.

State Record:
The comparison phase requires that the ideal state of the file system
be known. Replicating the whole file system would eat up a lot of disk
space, and holding the state in memory would be costly for huge file
systems. So we need to store the state on disk in a form that doesn't
take up much space. We record the metadata and store it in a file. One
approach is to replicate the metadata blocks of the source file system
and store the replicas in a single file called the state file.
Additional metadata, such as checksums of the data blocks, can be
stored in the same state file. However, this may store unnecessary
metadata and swell the state file for huge source file systems. So,
instead of storing the metadata blocks themselves, we summarize the
information in them before storing it in the state file.
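
As an illustration of a summarized state file, the sketch below writes
one small JSON record per file, with a few metadata fields and a
checksum of the file data. It assumes the file system is mounted and
read through the VFS; the record layout, the field selection and the
use of SHA-256 are choices made for the example, not a description of
the attached code.

```python
import hashlib
import json
import os
import stat


def summarize(root):
    """Yield one small record per file or directory under `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fixed traversal order keeps state files comparable
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rec = {
                "path": os.path.relpath(full, root),
                "mode": st.st_mode,
                "uid": st.st_uid,
                "gid": st.st_gid,
                "size": st.st_size,
            }
            if stat.S_ISREG(st.st_mode):
                # Reads the whole file at once; fine for a sketch.
                with open(full, "rb") as fh:
                    rec["sha256"] = hashlib.sha256(fh.read()).hexdigest()
            yield rec


def write_state_file(root, state_path):
    """Store the summary as one JSON object per line."""
    with open(state_path, "w") as out:
        for rec in summarize(root):
            out.write(json.dumps(rec, sort_keys=True) + "\n")
```

Because records are streamed to the file one at a time, the state never
has to fit in memory as a whole.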

Summary Phase:
This is the final phase in the model. A report file is prepared which
summarizes the result of this test run. The summary contains:

Average time taken for recovery
Number of files lost at the end of each iteration
Number of files with metadata corruption at the end of each iteration
Number of files with data corruption at the end of each iteration
Number of files lost and found at the end of each iteration

Putting it all together:
The Corruption, Repair and Comparison phases could be repeated a
number of times (each repetition is called an iteration) before the
summary of that test run is prepared.
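
A rough sketch of how the per-iteration results could be folded into
the report, assuming the comparison phase hands back a status for every
file. The status names and the dictionary layout are invented for the
example.

```python
from statistics import mean

# Status names are assumptions of this sketch, one per report line.
STATUSES = ("lost", "metadata_corrupt", "data_corrupt", "lost_and_found")


def summarize_run(iterations):
    """Build the report for one test run.

    Each iteration is a dict with 'repair_seconds' (float) and 'files',
    a mapping of path -> status as classified by the comparison phase.
    """
    per_iteration = []
    for it in iterations:
        statuses = list(it["files"].values())
        per_iteration.append({s: statuses.count(s) for s in STATUSES})
    return {
        "average_repair_seconds": mean(it["repair_seconds"] for it in iterations),
        "per_iteration": per_iteration,
    }
```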

TODO:
Account for files in the lost+found directory during the comparison phase.
Support for other file systems (only ext2 is supported currently).
The state of the entire file system is stored, which may be huge, time
consuming and unnecessary. So, we could find better ways of storing
the state.

Comments are welcome!!

Thanks,
Karuna

[-- Attachment #2: tf.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 26358 bytes --]

* Re: Testing framework
  2007-04-22 20:46 Testing framework Karuna sagar K
@ 2007-04-23  9:09 ` Kalpak Shah
  2007-04-23 11:25   ` Karuna sagar K
  2007-04-23 14:04 ` Avishay Traeger
  2007-04-28  9:35 ` Pavel Machek
  2 siblings, 1 reply; 8+ messages in thread
From: Kalpak Shah @ 2007-04-23  9:09 UTC
  To: Karuna sagar K; +Cc: linux-fsdevel, linux-kernel

On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
> Hi,
> 
> For some time I had been working on this file system test framework.
> Now I have a implementation for the same and below is the explanation.
> Any comments are welcome.
> 
> Introduction:
> The testing tools and benchmarks available around do not take into
> account the repair and recovery aspects of file systems. The test
> framework described here focuses on repair and recovery capabilities
> of file systems. Since most file systems use 'fsck' to recover from
> file system inconsistencies, the test framework characterizes file
> systems based on outcomes of running 'fsck'.

<snip>

> Higher level perspective/approach:
> In this approach the file system is viewed as a tree of nodes, where
> nodes are either files or directories. The metadata information
> corresponding to some randomly chosen nodes of the tree are corrupted.
> Nodes which are corrupted are marked or recorded to be able to replay
> later. This file system is called source file system while the file
> system on which we need to replay the corruption is called target file
> system. The assumption is that the target file system contains a set
> of files and directories which is a superset of that in the source
> file system. Hence to replay the corruption we need point out which
> nodes in the source file system were corrupted in the source file
> system and corrupt the corresponding nodes in the target file system.
> 
> A major disadvantage with this approach is that on-disk structures
> (like superblocks, block group descriptors, etc.) are not considered
> for corruption.
> 
> Lower level perspective/approach:
> The file system is looked upon as a set of blocks (more precisely
> metadata blocks). We randomly choose from this set of blocks to
> corrupt. Hence we would be able to overcome the deficiency of the
> previous approach. However this approach makes it difficult to have a
> replayable corruption. Further thought about this approach has to be
> given.
> 

Fill a test filesystem with data and save it. Corrupt it by copying a
chunk of data from random locations A to B. Save positions A and B so
that you can reproduce the corruption. 

Or corrupt random bits (ideally in metadata blocks) and maintain the
list of the bit numbers for reproducing the corruption.
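
Both suggestions can be sketched as a small corruption log that is
saved and replayed on a copy of the same saved image. The log format,
the function names and the seeded generator are made up for the
example; restricting the bit flips to metadata blocks would need a list
of those blocks, which is not shown.

```python
import random


def copy_chunk(img, src, dst, length):
    """Overwrite `length` bytes at offset B (`dst`) with the bytes at A (`src`)."""
    img.seek(src)
    chunk = img.read(length)
    img.seek(dst)
    img.write(chunk)


def flip_bit(img, bit):
    """Invert one bit; bit n lives in byte n // 8 at position n % 8."""
    img.seek(bit // 8)
    byte = img.read(1)[0]
    img.seek(bit // 8)
    img.write(bytes([byte ^ (1 << (bit % 8))]))


def random_log(image_size, chunk_len, flips, seed):
    """Generate one chunk copy and `flips` bit flips, repeatable per seed."""
    rng = random.Random(seed)
    a = rng.randrange(image_size - chunk_len + 1)
    b = rng.randrange(image_size - chunk_len + 1)
    log = [("copy", a, b, chunk_len)]
    log += [("flip", rng.randrange(image_size * 8)) for _ in range(flips)]
    return log


def apply_log(image_path, log):
    """Replay a saved corruption log on an image file."""
    with open(image_path, "r+b") as img:
        for op, *args in log:
            if op == "copy":
                copy_chunk(img, *args)
            elif op == "flip":
                flip_bit(img, *args)
            else:
                raise ValueError(f"unknown operation: {op}")
```

Applying the same log to two byte-identical images yields byte-identical
corrupted images, which is what makes the run repeatable.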

> We could have a blend of both the approaches in the program to
> compromise between corruption and replayability.
> 
> Repair Phase:
> The corrupted file system is repaired and recovered with 'fsck' or any
> other tools; this phase considers the repair and recovery action on
> the file system as a black box. The time taken to repair by the tool
> is measured.

I see that you are running fsck just once on the test filesystem. It
might be a good idea to run it twice; if the second fsck does not find
the filesystem to be completely clean, that indicates a bug in fsck.
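
A minimal sketch of that double run, assuming e2fsck as the repair tool
(with -f to force a check and -y to answer yes to all questions) and
relying on its convention that exit code 0 means no errors were found.
`e2fsck_runner` and `repair_and_verify` are made-up names; the runner
is passed in as a callable so that another tool can be substituted.

```python
import subprocess
import time


def e2fsck_runner(image):
    """Return a callable that runs `e2fsck -f -y` on `image` and gives its exit code."""
    def run():
        return subprocess.run(["e2fsck", "-f", "-y", image]).returncode
    return run


def repair_and_verify(run_fsck):
    """Run the repair tool twice and time the first (repairing) pass.

    A non-zero exit code on the second pass means the first pass left
    the file system in a state the tool itself still complains about.
    """
    start = time.monotonic()
    first = run_fsck()
    elapsed = time.monotonic() - start
    second = run_fsck()
    return {
        "first_exit": first,
        "second_exit": second,
        "repair_seconds": elapsed,
        "clean_after_repair": second == 0,
    }
```

Usage would be along the lines of
`repair_and_verify(e2fsck_runner("fs.img"))`, where "fs.img" stands for
the corrupted image.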

<snip>

> Summary Phase:
> This is the final phase in the model. A report file is prepared which
> summarizes the result of this test run. The summary contains:
> 
> Average time taken for recovery
> Number of files lost at the end of each iteration
> Number of files with metadata corruption at the end of each iteration
> Number of files with data corruption at the end of each iteration
> Number of files lost and found at the end of each iteration
> 
> Putting it all together:
> The Corruption, Repair and Comparison phases could be repeated a
> number of times (each repetition is called an iteration) before the
> summary of that test run is prepared.
> 
> TODO:
> Account for files in the lost+found directory during the comparison phase.
> Support for other file systems (only ext2 is supported currently)
> State of the either file system is stored, which may be huge, time
> consuming and not necessary. So, we could have better ways of storing
> the state.

Also, people may want to test with different mount options, so something
like "mount -t $fstype -o loop,$MOUNT_OPTIONS $imgname $mountpt" may be
useful. Similarly it may also be useful to have MKFS_OPTIONS while
formatting the filesystem.
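
For illustration, the two command lines could be assembled like this,
with empty defaults so that existing runs behave as before. The helper
names are hypothetical; the argument order follows the mount line
quoted above and the usual `mkfs -t type [fs-options] device` form.

```python
import shlex


def mkfs_command(fstype, imgname, mkfs_options=""):
    """Build the argv for formatting the image, e.g. mkfs -t ext2 -b 1024 img."""
    return ["mkfs", "-t", fstype, *shlex.split(mkfs_options), imgname]


def mount_command(fstype, imgname, mountpt, mount_options=""):
    """Build the argv for a loop mount, appending any extra -o options."""
    opts = "loop" + ("," + mount_options if mount_options else "")
    return ["mount", "-t", fstype, "-o", opts, imgname, mountpt]
```

Returning argument lists rather than a shell string avoids quoting
problems when an option value contains spaces.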

Thanks,
Kalpak.

> 
> Comments are welcome!!
> 
> Thanks,
> Karuna


* Re: Testing framework
  2007-04-23  9:09 ` Kalpak Shah
@ 2007-04-23 11:25   ` Karuna sagar K
  0 siblings, 0 replies; 8+ messages in thread
From: Karuna sagar K @ 2007-04-23 11:25 UTC
  To: Kalpak Shah; +Cc: linux-fsdevel, linux-kernel

On 4/23/07, Kalpak Shah <kalpak@linsyssoft.com> wrote:
> On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:

.....
>> The file system is looked upon as a set of blocks (more precisely
>> metadata blocks). We randomly choose from this set of blocks to
>> corrupt. Hence we would be able to overcome the deficiency of the
>> previous approach. However this approach makes it difficult to have a
>> replayable corruption. Further thought about this approach has to be
>> given.
>
> Fill a test filesystem with data and save it. Corrupt it by copying a
> chunk of data from random locations A to B. Save positions A and B so
> that you can reproduce the corruption.
>

Hey, that's a nice idea :). But this wouldn't reproduce the same
corruption, right? Say on the first run of the tool there is metadata
stored at locations A and B, and on the second run there may be user
data there instead. I mean the allocation may be different.

> Or corrupt random bits (ideally in metadata blocks) and maintain the
> list of the bit numbers for reproducing the corruption.
>

.....
>> The corrupted file system is repaired and recovered with 'fsck' or any
>> other tools; this phase considers the repair and recovery action on
>> the file system as a black box. The time taken to repair by the tool
>> is measured
>
> I see that you are running fsck just once on the test filesystem. It
> might be a good idea to run it twice and if second fsck does not find
> the filesystem to be completely clean that means it is a bug in fsck.

You are right. Will modify that.

>
> <snip>
>

......
>> State of the either file system is stored, which may be huge, time
>> consuming and not necessary. So, we could have better ways of storing
>> the state.
>
> Also, people may want to test with different mount options, so something
> like "mount -t $fstype -o loop,$MOUNT_OPTIONS $imgname $mountpt" may be
> useful. Similarly it may also be useful to have MKFS_OPTIONS while
> formatting the filesystem.
>

Right. I didn't think of that. Will look into it.

> Thanks,
> Kalpak.
>
> >
> > Comments are welcome!!
> >
> > Thanks,
> > Karuna
>
>

Thanks,
Karuna

* Re: Testing framework
  2007-04-22 20:46 Testing framework Karuna sagar K
  2007-04-23  9:09 ` Kalpak Shah
@ 2007-04-23 14:04 ` Avishay Traeger
  2007-04-23 22:11   ` Ric Wheeler
  2007-04-25 11:28   ` Karuna sagar K
  2007-04-28  9:35 ` Pavel Machek
  2 siblings, 2 replies; 8+ messages in thread
From: Avishay Traeger @ 2007-04-23 14:04 UTC
  To: Karuna sagar K; +Cc: linux-fsdevel, linux-kernel

On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
> For some time I had been working on this file system test framework.
> Now I have a implementation for the same and below is the explanation.
> Any comments are welcome.

<snip>

You may want to check out the paper "EXPLODE: A Lightweight, General
System for Finding Serious Storage System Errors" from OSDI 2006 (if you
haven't already).  The idea sounds very similar to me, although I
haven't read all the details of your proposal.

Avishay


* Re: Testing framework
  2007-04-23 14:04 ` Avishay Traeger
@ 2007-04-23 22:11   ` Ric Wheeler
  2007-04-25 11:28   ` Karuna sagar K
  1 sibling, 0 replies; 8+ messages in thread
From: Ric Wheeler @ 2007-04-23 22:11 UTC
  To: Avishay Traeger; +Cc: Karuna sagar K, linux-fsdevel, linux-kernel

Avishay Traeger wrote:
> On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
>> For some time I had been working on this file system test framework.
>> Now I have a implementation for the same and below is the explanation.
>> Any comments are welcome.
> 
> <snip>
> 
> You may want to check out the paper "EXPLODE: A Lightweight, General
> System for Finding Serious Storage System Errors" from OSDI 2006 (if you
> haven't already).  The idea sounds very similar to me, although I
> haven't read all the details of your proposal.
> 
> Avishay
> 

It would also be interesting to use the disk error injection patches 
that Mark Lord sent out recently to introduce real sector level 
corruption.  When your file systems are large enough and old enough, 
getting bad sectors and IO errors during an fsck stresses things in 
interesting ways ;-)

ric

* Re: Testing framework
  2007-04-23 14:04 ` Avishay Traeger
  2007-04-23 22:11   ` Ric Wheeler
@ 2007-04-25 11:28   ` Karuna sagar K
  1 sibling, 0 replies; 8+ messages in thread
From: Karuna sagar K @ 2007-04-25 11:28 UTC
  To: Avishay Traeger; +Cc: linux-fsdevel, linux-kernel

On 4/23/07, Avishay Traeger <atraeger@cs.sunysb.edu> wrote:
> On Mon, 2007-04-23 at 02:16 +0530, Karuna sagar K wrote:
> <snip>
>
> You may want to check out the paper "EXPLODE: A Lightweight, General
> System for Finding Serious Storage System Errors" from OSDI 2006 (if you
> haven't already).  The idea sounds very similar to me, although I
> haven't read all the details of your proposal.

EXPLODE is more of a generic tool, i.e. it is used to find a larger set
of errors/bugs in file systems than the Test framework, which focuses
on the repair of file systems.

The Test framework is focused on the repairability of file systems. It
doesn't use the model checking concept, it uses a replayable corruption
mechanism, and it is a user space implementation. That's why it is not
similar to EXPLODE.

>
> Avishay
>
>

Thanks,
Karuna

* Re: Testing framework
  2007-04-22 20:46 Testing framework Karuna sagar K
  2007-04-23  9:09 ` Kalpak Shah
  2007-04-23 14:04 ` Avishay Traeger
@ 2007-04-28  9:35 ` Pavel Machek
  2 siblings, 0 replies; 8+ messages in thread
From: Pavel Machek @ 2007-04-28  9:35 UTC
  To: Karuna sagar K; +Cc: linux-fsdevel, linux-kernel

Hi!

> For some time I had been working on this file system 
> test framework.
> Now I have a implementation for the same and below is 
> the explanation.
> Any comments are welcome.
> 
> Introduction:
> The testing tools and benchmarks available around do not 
> take into
> account the repair and recovery aspects of file systems. 
> The test
> framework described here focuses on repair and recovery 
> capabilities
> of file systems. Since most file systems use 'fsck' to 
> recover from
> file system inconsistencies, the test framework 
> characterizes file
> systems based on outcomes of running 'fsck'.

Thanks for your work.

> Putting it all together:
> The Corruption, Repair and Comparison phases could be 
> repeated a
> number of times (each repetition is called an iteration) 
> before the
> summary of that test run is prepared.
> 
> TODO:
> Account for files in the lost+found directory during the 
> comparison phase.
> Support for other file systems (only ext2 is supported 
> currently)

Yes, please. ext2 does really well in the fsck area; unfortunately some
other filesystems (vfat, reiserfs) do not work that well.

						Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
