linux-raid.vger.kernel.org archive mirror
* raid1 recovery: reads and writes
@ 2008-12-15 13:45 Jon Nelson
  2008-12-15 21:09 ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Jon Nelson @ 2008-12-15 13:45 UTC
  To: LinuxRaid

I have a software raid1 which underwent a recovery recently.
While it was recovering, I could not understand why I was seeing an
equal amount of *read* I/O as write I/O to the device the raid1 was
recovering to. That's to say, the raid1 was rebuilding on to a spare
but instead of just writing I was seeing (nearly exactly) the same
amount of read I/O as write I/O. The non-spare device was getting
almost exclusively read I/O as expected. Why was the spare getting any
read I/O at all?

-- 
Jon

* Re: raid1 recovery: reads and writes
  2008-12-15 13:45 raid1 recovery: reads and writes Jon Nelson
@ 2008-12-15 21:09 ` Neil Brown
  2008-12-16  0:54   ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2008-12-15 21:09 UTC
  To: Jon Nelson; +Cc: LinuxRaid

On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
> I have a software raid1 which underwent a recovery recently.
> While it was recovering, I could not understand why I was seeing an
> equal amount of *read* I/O as write I/O to the device the raid1 was
> recovering to. That's to say, the raid1 was rebuilding on to a spare
> but instead of just writing I was seeing (nearly exactly) the same
> amount of read I/O as write I/O. The non-spare device was getting
> almost exclusively read I/O as expected. Why was the spare getting any
> read I/O at all?

It sounds like it is doing a 'repair' rather than a 'recover'.
'repair' happens when you write the word 'repair' to
/sys/block/mdX/md/sync_action.
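
To tell the two cases apart while the array is busy, the same sysfs file
can be read back. The following is only a minimal sketch, not something
posted in the thread: the helper names and the `sysfs_root` parameter are
hypothetical, and it assumes the file holds a single word such as
'recover' or 'repair'.

```python
from pathlib import Path


def read_sync_action(md_name, sysfs_root="/sys/block"):
    """Return the word currently held in <sysfs_root>/<md_name>/md/sync_action.

    `md_name` is the array name (for example "md0"); `sysfs_root` is a
    hypothetical parameter that exists only so the function can be pointed
    at a scratch directory when testing.
    """
    path = Path(sysfs_root) / md_name / "md" / "sync_action"
    return path.read_text().strip()


def is_repair(action):
    """True if the reported action is a user-requested 'repair'."""
    return action == "repair"


def is_recover(action):
    """True if the reported action is a 'recover' (rebuild onto a spare)."""
    return action == "recover"
```

On a live system one would call `read_sync_action("md0")` (with the real
array name) while the rebuild is running and include the result in the
report.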

In situations like this it is always best to provide lots of detail,
e.g. kernel logs from the time when things were happening, "mdadm -E"
of all devices, and the exact data you looked at that made you think
something was going wrong (in this case, whatever statistics you found
that suggested lots of read requests).

Without those specifics it is hard to try to reproduce, and hard to
guess what was actually happening.
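
One way to capture such statistics is to sample /proc/diskstats twice and
compare the per-device sector counters. This is a hedged sketch rather
than a method anyone in the thread used: the function names are
hypothetical, and it assumes the usual diskstats layout in which the
third field is the device name, the sixth is sectors read and the tenth
is sectors written.

```python
def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written).

    `text` is the content of /proc/diskstats. Assumed layout per line:
    major minor name reads reads_merged sectors_read ms_reading
    writes writes_merged sectors_written ...
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def io_delta(before, after, device):
    """Sectors read and written on `device` between two parsed samples."""
    read_before, written_before = before[device]
    read_after, written_after = after[device]
    return read_after - read_before, written_after - written_before
```

Taking one sample, waiting a few seconds, taking another, and reporting
`io_delta(first, second, name)` for each member device would show whether
the spare really sees as many sectors read as written.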

NeilBrown

* Re: raid1 recovery: reads and writes
  2008-12-15 21:09 ` Neil Brown
@ 2008-12-16  0:54   ` Dan Williams
  2008-12-16  2:40     ` Jon Nelson
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Williams @ 2008-12-16  0:54 UTC
  To: Neil Brown; +Cc: Jon Nelson, LinuxRaid

On Mon, Dec 15, 2008 at 2:09 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>> I have a software raid1 which underwent a recovery recently.
>> While it was recovering, I could not understand why I was seeing an
>> equal amount of *read* I/O as write I/O to the device the raid1 was
>> recovering to. That's to say, the raid1 was rebuilding on to a spare
>> but instead of just writing I was seeing (nearly exactly) the same
>> amount of read I/O as write I/O. The non-spare device was getting
>> almost exclusively read I/O as expected. Why was the spare getting any
>> read I/O at all?
>
> It sounds like it is doing a 'repair' rather than a 'recover'.
> 'repair' happens when you write the word 'repair' to
> /sys/block/mdX/md/sync_action.
>
> In situations like this it is always best to provide lots of detail.
> e.g. kernel logs of the time when things were happening, "mdadm -E" of
> all devices.  The exact data you looked at to make you think there was
> something going wrong (in this case, whatever statistics you found
> that suggested lots of read requests).
>
> Without those specifics it is hard to try to reproduce, and hard to
> guess what was actually happening.
>

Yes it is hard to guess what is happening here; however, should not
the following commit go to -stable?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=56ac36d722d0d27c03599d1245ac0ab59e474e5c

Regards,
Dan

* Re: raid1 recovery: reads and writes
  2008-12-16  0:54   ` Dan Williams
@ 2008-12-16  2:40     ` Jon Nelson
  2008-12-16 21:04       ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Jon Nelson @ 2008-12-16  2:40 UTC
  To: Dan Williams; +Cc: Neil Brown, LinuxRaid

On Mon, Dec 15, 2008 at 6:54 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Mon, Dec 15, 2008 at 2:09 PM, Neil Brown <neilb@suse.de> wrote:
>> On Monday December 15, jnelson-linux-raid@jamponi.net wrote:
>>> I have a software raid1 which underwent a recovery recently.
>>> While it was recovering, I could not understand why I was seeing an
>>> equal amount of *read* I/O as write I/O to the device the raid1 was
>>> recovering to. That's to say, the raid1 was rebuilding on to a spare
>>> but instead of just writing I was seeing (nearly exactly) the same
>>> amount of read I/O as write I/O. The non-spare device was getting
>>> almost exclusively read I/O as expected. Why was the spare getting any
>>> read I/O at all?

..

> Yes it is hard to guess what is happening here; however, should not
> the following commit go to -stable?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=56ac36d722d0d27c03599d1245ac0ab59e474e5c

Is it possible this commit is (partially) responsible for what I was
seeing? Under what conditions would it trigger?

For me, it was a thoroughly unexpected recovery. I had expected to
--re-add the device and have it resync very quickly (this normally
takes a few seconds), but for some reason it went into a full
recovery instead. I was stracing the nbd-server process responsible
for the block device and it was definitely getting read and write
requests in equal measure. However, when I later actually did rebuild
the array, the recovery was (almost) 100% write.

-- 
Jon

* Re: raid1 recovery: reads and writes
  2008-12-16  2:40     ` Jon Nelson
@ 2008-12-16 21:04       ` Dan Williams
  0 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2008-12-16 21:04 UTC
  To: Jon Nelson; +Cc: Neil Brown, LinuxRaid

Jon Nelson wrote:
> ..
> 
>> Yes it is hard to guess what is happening here; however, should not
>> the following commit go to -stable?
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=56ac36d722d0d27c03599d1245ac0ab59e474e5c
> 
> Is it possible this commit is (partially) responsible for what I was
> seeing? Under what conditions would it trigger?

I don't think this commit is responsible; it is a fix for a case where
the kernel is asked to do a repair while a recovery is pending.

This commit addressed a bug in external-metadata support when adding a
spare disk via sysfs.  It would be fairly difficult to trigger this in
the non-external-metadata case: it would require a check/repair
request to come in immediately after the disk was (re-)added, but before
the kernel started the resync.

--
Dan
