* [PATCH] md: Add two chances to update sync/recovery checkpoint
From: Jianpeng Ma @ 2012-09-15 8:59 UTC
To: Neil Brown; +Cc: linux-raid
According to commit 97e4f42d62badb0f9fbc27c013e89, md_do_sync() updates the
sync/recovery checkpoint 16 times over the course of a sync/recovery.
Because HDDs have become larger, a sync/recovery can take a long time,
so 1/16 of that time may be half an hour or more.
So more chances to update the checkpoint should be added.
There are two places in md_do_sync() where the checkpoint can be updated:
1: If cond_resched() is called and actually reschedules
2: If currspeed is larger than the maximum sync speed
If the above conditions are met, we can try to update the checkpoint.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
---
drivers/md/md.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3f6203a..c7993d6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7496,7 +7496,14 @@ void md_do_sync(struct mddev *mddev)
 		 * about not overloading the IO subsystem. (things like an
 		 * e2fsck being done on the RAID array should execute fast)
 		 */
-		cond_resched();
+		if (cond_resched())
+			if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
+			    mddev->curr_resync_completed != j &&
+			    atomic_read(&mddev->recovery_active) == 0) {
+				mddev->curr_resync_completed = j;
+				set_bit(MD_CHANGE_CLEAN, &mddev->flags);
+				sysfs_notify(&mddev->kobj, NULL, "sync_completed");
+			}

 		currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2
 			/((jiffies-mddev->resync_mark)/HZ +1) +1;
@@ -7505,6 +7512,13 @@ void md_do_sync(struct mddev *mddev)
 			if ((currspeed > speed_max(mddev)) ||
 					!is_mddev_idle(mddev, 0)) {
 				msleep(500);
+				if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
+				    mddev->curr_resync_completed != j &&
+				    atomic_read(&mddev->recovery_active) == 0) {
+					mddev->curr_resync_completed = j;
+					set_bit(MD_CHANGE_CLEAN, &mddev->flags);
+					sysfs_notify(&mddev->kobj, NULL, "sync_completed");
+				}
 				goto repeat;
 			}
 		}
--
1.7.9.5
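
To put the commit message's "half an hour or more" estimate in numbers, here is a rough sketch. It assumes the 16 checkpoints are evenly spaced and the sync runs at a constant rate, which a real array will only approximate. `checkpoint_interval_seconds` is a hypothetical helper, not something in md.c, and the device size and speed in the note below are illustrative assumptions rather than measurements from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of md.c): seconds between checkpoints
 * when a sync of size_bytes runs at a constant bytes_per_sec and the
 * checkpoint is updated `checkpoints` times, evenly spaced.
 * Integer division truncates, so the result is rounded down.
 */
static uint64_t checkpoint_interval_seconds(uint64_t size_bytes,
					    uint64_t bytes_per_sec,
					    unsigned int checkpoints)
{
	assert(bytes_per_sec > 0 && checkpoints > 0);
	return size_bytes / bytes_per_sec / checkpoints;
}
```

For an assumed 3 TB device syncing at an assumed 100 MB/s, `checkpoint_interval_seconds(3000000000000ULL, 100000000ULL, 16)` gives 1875 seconds, a little over half an hour of work that could be repeated after a crash.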
* Re: [PATCH] md: Add two chances to update sync/recovery checkpoint
From: NeilBrown @ 2012-09-20 3:36 UTC
To: Jianpeng Ma; +Cc: linux-raid
On Sat, 15 Sep 2012 16:59:34 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
> According to commit 97e4f42d62badb0f9fbc27c013e89, md_do_sync() updates the
> sync/recovery checkpoint 16 times over the course of a sync/recovery.
> Because HDDs have become larger, a sync/recovery can take a long time,
> so 1/16 of that time may be half an hour or more.
> So more chances to update the checkpoint should be added.
> There are two places in md_do_sync() where the checkpoint can be updated:
> 1: If cond_resched() is called and actually reschedules
> 2: If currspeed is larger than the maximum sync speed
> If the above conditions are met, we can try to update the checkpoint.
>
> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> ---
> drivers/md/md.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3f6203a..c7993d6 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7496,7 +7496,14 @@ void md_do_sync(struct mddev *mddev)
> * about not overloading the IO subsystem. (things like an
> * e2fsck being done on the RAID array should execute fast)
> */
> - cond_resched();
> + if (cond_resched())
> + if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> + mddev->curr_resync_completed != j &&
> + atomic_read(&mddev->recovery_active) == 0) {
> + mddev->curr_resync_completed = j;
> + set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> + sysfs_notify(&mddev->kobj, NULL, "sync_completed");
> + }
>
> currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2
> /((jiffies-mddev->resync_mark)/HZ +1) +1;
> @@ -7505,6 +7512,13 @@ void md_do_sync(struct mddev *mddev)
> if ((currspeed > speed_max(mddev)) ||
> !is_mddev_idle(mddev, 0)) {
> msleep(500);
> + if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> + mddev->curr_resync_completed != j &&
> + atomic_read(&mddev->recovery_active) == 0) {
> + mddev->curr_resync_completed = j;
> + set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> + sysfs_notify(&mddev->kobj, NULL, "sync_completed");
> + }
> goto repeat;
> }
> }
I don't really like this. These two conditions seem rather arbitrary.
If we want to do a checkpoint more often, we should use some time-based test
to do it.
What results do you get with this change? How often does a checkpoint happen
on a busy system? How often on an idle system?
A time-based update could be done in user-space. Just write 'idle' to
'sync_action' and it should do a checkpoint, then immediately restart from
where it left off.
NeilBrown
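
The user-space approach described above could be sketched roughly as follows. `write_sync_action` is a hypothetical helper name; it simply writes a string to a file, so it can be pointed at the `sync_action` attribute of an array. The device name `md0` in the note below is an assumption, and the resulting checkpoint-and-resume behaviour is as described in the message above rather than something this sketch verifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `action` (for example "idle") to the
 * sysfs attribute at `path`. Returns 0 on success, -1 on failure.
 */
static int write_sync_action(const char *path, const char *action)
{
	FILE *f;
	size_t len;
	int ok;

	assert(path != NULL && action != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	len = strlen(action);
	ok = fwrite(action, 1, len, f) == len;
	/* fclose() flushes the buffer, so a rejected write can surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A small daemon or cron job could call `write_sync_action("/sys/block/md0/md/sync_action", "idle")` at whatever interval is wanted, typically as root, which keeps the checkpoint policy out of the kernel.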
* Re: Re: [PATCH] md: Add two chances to update sync/recovery checkpoint
From: Jianpeng Ma @ 2012-09-20 6:14 UTC
To: Neil Brown; +Cc: linux-raid
On 2012-09-20 11:36 NeilBrown <neilb@suse.de> Wrote:
>On Sat, 15 Sep 2012 16:59:34 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:
>
>> According to commit 97e4f42d62badb0f9fbc27c013e89, md_do_sync() updates the
>> sync/recovery checkpoint 16 times over the course of a sync/recovery.
>> Because HDDs have become larger, a sync/recovery can take a long time,
>> so 1/16 of that time may be half an hour or more.
>> So more chances to update the checkpoint should be added.
>> There are two places in md_do_sync() where the checkpoint can be updated:
>> 1: If cond_resched() is called and actually reschedules
>> 2: If currspeed is larger than the maximum sync speed
>> If the above conditions are met, we can try to update the checkpoint.
>>
>> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>> ---
>> drivers/md/md.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 3f6203a..c7993d6 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -7496,7 +7496,14 @@ void md_do_sync(struct mddev *mddev)
>> * about not overloading the IO subsystem. (things like an
>> * e2fsck being done on the RAID array should execute fast)
>> */
>> - cond_resched();
>> + if (cond_resched())
>> + if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>> + mddev->curr_resync_completed != j &&
>> + atomic_read(&mddev->recovery_active) == 0) {
>> + mddev->curr_resync_completed = j;
>> + set_bit(MD_CHANGE_CLEAN, &mddev->flags);
>> + sysfs_notify(&mddev->kobj, NULL, "sync_completed");
>> + }
>>
>> currspeed = ((unsigned long)(io_sectors-mddev->resync_mark_cnt))/2
>> /((jiffies-mddev->resync_mark)/HZ +1) +1;
>> @@ -7505,6 +7512,13 @@ void md_do_sync(struct mddev *mddev)
>> if ((currspeed > speed_max(mddev)) ||
>> !is_mddev_idle(mddev, 0)) {
>> msleep(500);
>> + if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>> + mddev->curr_resync_completed != j &&
>> + atomic_read(&mddev->recovery_active) == 0) {
>> + mddev->curr_resync_completed = j;
>> + set_bit(MD_CHANGE_CLEAN, &mddev->flags);
>> + sysfs_notify(&mddev->kobj, NULL, "sync_completed");
>> + }
>> goto repeat;
>> }
>> }
>
>I don't really like this. These two conditions seem rather arbitrary.
>If we want to do a checkpoint more often, we should use some time-based test
>to do it.
>
>What results do you get with this change? How often does a checkpoint happen
>on a busy system? How often on an idle system?
My thought is that if cond_resched() or msleep() has returned and
atomic_read(&mddev->recovery_active) == 0, we can update recovery_cp right
away, without having to wait for mddev->recovery_active to reach 0.
Many places check recovery_cp, so updating recovery_cp as often as possible
may be good.
>
>A time-based update could be done in user-space. Just write 'idle' to
>'sync_action' and it should do a checkpoint, then immediately restart from
>where it left off.
>
>NeilBrown
>