* Data Integrity Test with fio
@ 2013-08-21 22:02 Juan Casse
  2013-08-22  7:10 ` Erwan Velu
  0 siblings, 1 reply; 3+ messages in thread
From: Juan Casse @ 2013-08-21 22:02 UTC (permalink / raw)
  To: Jens Axboe, fio@vger.kernel.org; +Cc: grundler

Hi Jens:

I wanted to let you know what we intend to do in advance in case you
can foresee any problems or have any preferences.

We would like fio to test for data integrity/retention as follows.

Phase 1: Write data to storage device
Run fio with a job file specifying the following options.
readwrite=
 - randwrite
   (does random writes)
 - randrw
   (does random reads and writes)
size=64k or any other size
(specifies the total amount of data that will be written/read)
bs=8192 or any other size
(specifies the size of each data block)
runtime=1 or any other number of seconds
(specifies the duration of the fio run)
time_based
(tells fio to keep running for the full runtime, even after size=
bytes have been transferred)
verify=meta
(adds block number, numberio and timestamp to the block header)
verify_pattern=0xffffffffffffffff or any other pattern
(fills the rest of the block with the specified pattern)
verify_dump=1
(writes to a file the data read from disk and the data that was
expected, in case of data corruption)
continue_on_error=verify
(allows fio to continue even if corrupted blocks are found, otherwise
fio will stop execution on the first corrupted block)
random_generator=lfsr
(uses an LFSR as the random number generator)
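
For illustration, the Phase 1 options above can be gathered into a
single job file. The following is a minimal Python sketch that only
assembles the job-file text. The section name "integrity-write" and
the target path "/tmp/fio-integrity.dat" are hypothetical
placeholders; the option names and values are the ones listed above.

```python
# Sketch: assemble the Phase 1 options into fio job-file text.
# The section name and filename passed in below are hypothetical.
PHASE1_OPTIONS = [
    ("readwrite", "randwrite"),
    ("size", "64k"),
    ("bs", "8192"),
    ("runtime", "1"),
    ("time_based", None),  # flag option, written without a value
    ("verify", "meta"),
    ("verify_pattern", "0xffffffffffffffff"),
    ("verify_dump", "1"),
    ("continue_on_error", "verify"),
    ("random_generator", "lfsr"),
]


def render_job(section, filename, options):
    """Return job-file text with one option per line."""
    lines = [f"[{section}]", f"filename={filename}"]
    for key, value in options:
        lines.append(key if value is None else f"{key}={value}")
    return "\n".join(lines) + "\n"


job_text = render_job("integrity-write", "/tmp/fio-integrity.dat",
                      PHASE1_OPTIONS)
print(job_text)
```

The printed text can be saved to a file and passed to fio as the job
file. Swap readwrite=randwrite for randrw to mix reads and writes.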

Phase 2: Verify data
(Recall that we mentioned adding a "generation" number to the block
header data? Well, we think we can use the existing numberio instead.)
Run fio with the same options as before plus specifying a new option:
 - data_integrity_check
(I already modified the code to accept this option).
When this option is given in the job file, fio will "replay" the
workload (without actually writing data to storage). Once we have done
this, we can read each block back and compare its numberio with the
one obtained by running lfsr in reverse. We run lfsr in reverse
because the numberio that was last written to a block will be found
toward the end of the lfsr sequence when the data is written multiple
times, which is what we want.

The numberio is incremented each time we read or write, so it can
easily be computed going backwards by decrementing its value.

How is the block offset computed from the lfsr? Do you see any
problems trying to compute the offset going backwards with the lfsr?

One way to perform the data integrity check is to verify each block in
order of block number (offset). For each block number, run the lfsr
backwards starting from the end until we hit the block number. We then
compare the numberio obtained by running the lfsr backwards with the
one read from storage.
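
To make the backward walk concrete, here is a toy Python sketch. It
is not fio's LFSR code: it assumes a 4-bit Galois LFSR (polynomial
x^4 + x^3 + 1, period 15), assumes that state s maps to block s - 1
with out-of-range states skipped, and assumes numberio starts at 0
and is incremented once per I/O. All function names are made up for
the example.

```python
MASK = 0xC      # Galois taps for x^4 + x^3 + 1 (toy width, period 15)
TOP_BIT = 0x8   # highest bit of the 4-bit state
NBLOCKS = 8     # size=64k / bs=8192


def lfsr_next(state):
    """Advance the LFSR by one step."""
    lsb = state & 1
    state >>= 1
    return state ^ MASK if lsb else state


def lfsr_prev(state):
    """Undo one lfsr_next() step."""
    if state & TOP_BIT:  # the forward step applied the mask
        return ((state ^ MASK) << 1) | 1
    return state << 1


def to_block(state):
    """Map a state to a block number, or None if it is out of range."""
    block = state - 1
    return block if block < NBLOCKS else None


def replay_forward(seed, total_ios):
    """Return (blocks in I/O order, state that produced the last I/O)."""
    state, order = seed, []
    while True:
        block = to_block(state)
        if block is not None:
            order.append(block)
            if len(order) == total_ios:
                return order, state
        state = lfsr_next(state)


def last_numberio(block, end_state, total_ios):
    """Walk backwards from the end until `block` is hit."""
    state, numberio = end_state, total_ios - 1
    while numberio >= 0:
        hit = to_block(state)
        if hit is not None:
            if hit == block:
                return numberio
            numberio -= 1  # only in-range states consumed a numberio
        state = lfsr_prev(state)
    return None  # block was never touched


order, end_state = replay_forward(seed=1, total_ios=20)
for blk in range(NBLOCKS):
    print(blk, last_numberio(blk, end_state, 20))
```

Each lookup walks back at most one full pass over the I/Os, so
checking every block this way costs roughly blocks x I/Os steps in
the worst case.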

Any concerns with this?

In the fio code, the function do_io() performs the workload specified,
whether it be writes, reads and writes, or just reads.
The function do_verify() is executed after do_io() only when the
workload does any writes. If the workload does only reads, do_verify()
is not executed. This function reads the blocks back and compares the
offset (block number). I already have code in place that checks for
numberio as well.

However, if the job file specifies to run based on time rather than
total number of bytes (setting runtime= and time_based), then
do_verify() is not performed. We would also need to run do_verify() in
this case to make sure that the correct data was indeed written to
storage.
Note: fio needs to be run based on time if we want numberio
incremented when a block is rewritten. If we set fio to run a number
of iterations instead (by specifying loops=int), the same numberio
will be written every time the block is rewritten.

Thanks,
Juan



* Re: Data Integrity Test with fio
  2013-08-21 22:02 Data Integrity Test with fio Juan Casse
@ 2013-08-22  7:10 ` Erwan Velu
  2013-08-22 16:48   ` Juan Casse
  0 siblings, 1 reply; 3+ messages in thread
From: Erwan Velu @ 2013-08-22  7:10 UTC (permalink / raw)
  To: Juan Casse; +Cc: Jens Axboe, fio@vger.kernel.org, grundler

On 22/08/2013 00:02, Juan Casse wrote:
> Hi Jens:
>
> I wanted to let you know what we intend to do in advance in case you
> can foresee any problems or have any preferences.
>
> We would like fio to test for data integrity/retention as follows.
Isn't this sample fio job what you are looking for?

http://git.kernel.dk/?p=fio.git;a=blob;f=examples/surface-scan.fio;h=dc3373a2ea48f495cdc03ccf4dc2e1ed23e3e434;hb=HEAD


Erwan,



* Re: Data Integrity Test with fio
  2013-08-22  7:10 ` Erwan Velu
@ 2013-08-22 16:48   ` Juan Casse
  0 siblings, 0 replies; 3+ messages in thread
From: Juan Casse @ 2013-08-22 16:48 UTC (permalink / raw)
  To: Erwan Velu; +Cc: Juan Casse, Jens Axboe, fio@vger.kernel.org, grundler

Hi Erwan:

Thank you for the prompt reply. I apologize for my lengthy previous email.

On Thu, Aug 22, 2013 at 12:10 AM, Erwan Velu <erwan@enovance.com> wrote:
>
> On 22/08/2013 00:02, Juan Casse wrote:
>>
>> Hi Jens:
>>
>> I wanted to let you know what we intend to do in advance in case you
>> can foresee any problems or have any preferences.
>>
>> We would like fio to test for data integrity/retention as follows.
>
> Isn't this sample fio job what you are looking for?
>
> http://git.kernel.dk/?p=fio.git;a=blob;f=examples/surface-scan.fio;h=dc3373a2ea48f495cdc03ccf4dc2e1ed23e3e434;hb=HEAD
>

The sample job is fine. We want to improve the data integrity check to
detect more types of failures.

We want fio to check for stale data. We would like to add a generation
number, which counts the number of times the same block has been
written.

So here is what Grant and I are currently considering doing; a bit
different from what I proposed in my previous email.

- Use separate LFSRs, one for reads and one for writes.
- Add a check for numberio_w; numberio_w is a global count of writes
that fio already keeps track of, and it can serve the same purpose as
the generation number.
- During the verification phase, we simply run the LFSR forward
(without actually writing) and fill a numberio_w table with the last
computed numberio_w for each block, then read each block back and
compare against the computed numberio_w.
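
A rough Python sketch of that verification pass follows. It is a toy
model, not fio code: it assumes a 4-bit Galois LFSR (x^4 + x^3 + 1)
for the write sequence, assumes state s maps to block s - 1 with
out-of-range states skipped, and assumes numberio_w starts at 0. The
"stored" list stands in for the values read back from the block
headers, and the function names are hypothetical.

```python
MASK = 0xC    # Galois taps for x^4 + x^3 + 1 (toy width, period 15)
NBLOCKS = 8   # size=64k / bs=8192


def lfsr_next(state):
    """Advance the write LFSR by one step."""
    lsb = state & 1
    state >>= 1
    return state ^ MASK if lsb else state


def expected_table(seed, total_writes):
    """Replay the writes (no I/O) and keep the last numberio_w per block."""
    table = [None] * NBLOCKS
    state, numberio_w = seed, 0
    while numberio_w < total_writes:
        block = state - 1
        if block < NBLOCKS:
            table[block] = numberio_w  # later writes overwrite earlier ones
            numberio_w += 1
        state = lfsr_next(state)
    return table


def find_stale(table, stored):
    """Return blocks whose stored numberio_w differs from the expected one."""
    return [blk for blk, want in enumerate(table)
            if want is not None and stored[blk] != want]


table = expected_table(seed=1, total_writes=20)

# Pretend block 2 still holds the data from its previous write.
stored = list(table)
stored[2] = 10
print(find_stale(table, stored))
```

Compared with walking the LFSR backwards once per block, the table is
filled in a single forward pass, at the cost of one entry per block.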

Thank you,
Juan

>
> Erwan,


