* raid1 recovery: reads and writes
From: Jon Nelson @ 2008-12-15 13:45 UTC (permalink / raw)
To: LinuxRaid
I have a software raid1 which underwent a recovery recently.
While it was recovering, I could not understand why I was seeing an
equal amount of *read* I/O as write I/O to the device the raid1 was
recovering to. That's to say, the raid1 was rebuilding on to a spare
but instead of just writing I was seeing (nearly exactly) the same
amount of read I/O as write I/O. The non-spare device was getting
almost exclusively read I/O as expected. Why was the spare getting any
read I/O at all?
--
Jon
* Re: raid1 recovery: reads and writes
From: Neil Brown @ 2008-12-15 21:09 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid
On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
> I have a software raid1 which underwent a recovery recently.
> While it was recovering, I could not understand why I was seeing an
> equal amount of *read* I/O as write I/O to the device the raid1 was
> recovering to. That's to say, the raid1 was rebuilding on to a spare
> but instead of just writing I was seeing (nearly exactly) the same
> amount of read I/O as write I/O. The non-spare device was getting
> almost exclusively read I/O as expected. Why was the spare getting any
> read I/O at all?
It sounds like it is doing a 'repair' rather than a 'recover'.
'repair' happens when you write the word 'repair' to
/sys/block/mdX/md/sync_action.
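To see which operation md is actually running, the current value of that file can be read back. A minimal Python sketch, assuming only the sysfs path given above; the helper name `read_sync_action` and the `sysfs_root` parameter are hypothetical, and "md0" is a placeholder for the array in question:

```python
from pathlib import Path


def read_sync_action(md_name, sysfs_root="/sys"):
    """Return the contents of <sysfs_root>/block/<md_name>/md/sync_action.

    sysfs_root is an illustrative parameter (not part of any md tooling)
    so the function can be pointed at a test directory.
    """
    path = Path(sysfs_root) / "block" / md_name / "md" / "sync_action"
    return path.read_text().strip()


# Example (placeholder array name):
#   read_sync_action("md0")
# A value of "repair" versus "recover" would distinguish the two cases
# discussed here.
```

Checking this value while the rebuild is in progress would help tell a 'repair' apart from a 'recover'.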
In situations like this it is always best to provide lots of detail,
e.g. kernel logs from the time when things were happening, "mdadm -E" of
all devices, and the exact data you looked at that made you think there
was something going wrong (in this case, whatever statistics you found
that suggested lots of read requests).
Without those specifics it is hard to try to reproduce, and hard to
guess what was actually happening.
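One way to capture such read/write statistics is to take two snapshots of /proc/diskstats during the rebuild and compare the sector counters per device. The following is only a sketch under stated assumptions: the function names are hypothetical, and it relies on the standard field layout in which the third column is the device name, the sixth is sectors read and the tenth is sectors written.

```python
def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written).

    `text` is the contents of /proc/diskstats. Lines with fewer than
    ten fields are skipped, since they do not carry both counters.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def io_delta(before, after):
    """Per-device (read, written) sector deltas between two snapshots."""
    return {
        dev: (after[dev][0] - before[dev][0], after[dev][1] - before[dev][1])
        for dev in after
        if dev in before
    }


# Example use (reads the live file twice, some seconds apart):
#   import time
#   a = parse_diskstats(open("/proc/diskstats").read())
#   time.sleep(10)
#   b = parse_diskstats(open("/proc/diskstats").read())
#   print(io_delta(a, b))
```

A device that is only being rebuilt onto would be expected to show a write delta much larger than its read delta; roughly equal deltas would match the behaviour reported above.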
NeilBrown
* Re: raid1 recovery: reads and writes
From: Dan Williams @ 2008-12-16 0:54 UTC (permalink / raw)
To: Neil Brown; +Cc: Jon Nelson, LinuxRaid
On Mon, Dec 15, 2008 at 2:09 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>> I have a software raid1 which underwent a recovery recently.
>> While it was recovering, I could not understand why I was seeing an
>> equal amount of *read* I/O as write I/O to the device the raid1 was
>> recovering to. That's to say, the raid1 was rebuilding on to a spare
>> but instead of just writing I was seeing (nearly exactly) the same
>> amount of read I/O as write I/O. The non-spare device was getting
>> almost exclusively read I/O as expected. Why was the spare getting any
>> read I/O at all?
>
> It sounds like it is doing a 'repair' rather than a 'recover'.
> 'repair' happens when you write the word 'repair' to
> /sys/block/mdX/md/sync_action.
>
> In situations like this it is always best to provide lots of detail,
> e.g. kernel logs from the time when things were happening, "mdadm -E" of
> all devices, and the exact data you looked at that made you think there
> was something going wrong (in this case, whatever statistics you found
> that suggested lots of read requests).
>
> Without those specifics it is hard to try to reproduce, and hard to
> guess what was actually happening.
>
Yes it is hard to guess what is happening here; however, should not
the following commit go to -stable?
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=56ac36d722d0d27c03599d1245ac0ab59e474e5c
Regards,
Dan
* Re: raid1 recovery: reads and writes
From: Jon Nelson @ 2008-12-16 2:40 UTC (permalink / raw)
To: Dan Williams; +Cc: Neil Brown, LinuxRaid
On Mon, Dec 15, 2008 at 6:54 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Mon, Dec 15, 2008 at 2:09 PM, Neil Brown <neilb@suse.de> wrote:
>> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>>> I have a software raid1 which underwent a recovery recently.
>>> While it was recovering, I could not understand why I was seeing an
>>> equal amount of *read* I/O as write I/O to the device the raid1 was
>>> recovering to. That's to say, the raid1 was rebuilding on to a spare
>>> but instead of just writing I was seeing (nearly exactly) the same
>>> amount of read I/O as write I/O. The non-spare device was getting
>>> almost exclusively read I/O as expected. Why was the spare getting any
>>> read I/O at all?
..
> Yes it is hard to guess what is happening here; however, should not
> the following commit go to -stable?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=56ac36d722d0d27c03599d1245ac0ab59e474e5c
Is it possible this commit is (partially) responsible for what I was
seeing? Under what conditions would it trigger?
For me, it was a thoroughly unexpected recovery - I had expected to
--re-add the device and have it resync up very quickly - this normally
takes a few seconds, but instead for some reason it went into a full
recovery. I was stracing the nbd-server process responsible for the
block device and it was definitely getting read and write requests in
equal measure. However, later when I actually did rebuild the array
the recovery was (almost) 100% write.
--
Jon