linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] md/r5cache: improve add-journal
@ 2017-03-15 18:28 Song Liu
  2017-03-15 18:28 ` [PATCH 2/2] md/r5cache: journal remove support Song Liu
  2017-03-15 22:51 ` [PATCH 1/2] md/r5cache: improve add-journal Shaohua Li
  0 siblings, 2 replies; 5+ messages in thread
From: Song Liu @ 2017-03-15 18:28 UTC (permalink / raw)
  To: linux-raid
  Cc: shli, neilb, kernel-team, dan.j.williams, hch, jes.sorensen,
	Song Liu

1. suspend the array before adding journal, so that we can add journal
   when the array is not read-only;
2. allow recreate journal when existing journal is Faulty. So that we can
   add-journal before removing failed journal.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/md.c    | 5 +++--
 drivers/md/raid5.c | 6 ++----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 42e68b2..ac3bd15 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6230,9 +6230,10 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
 			struct md_rdev *rdev2;
 			bool has_journal = false;
 
-			/* make sure no existing journal disk */
+			/* make sure no active journal disk */
 			rdev_for_each(rdev2, mddev) {
-				if (test_bit(Journal, &rdev2->flags)) {
+				if (test_bit(Journal, &rdev2->flags) &&
+				    !test_bit(Faulty, &rdev2->flags)) {
 					has_journal = true;
 					break;
 				}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 447d9dd..ee8648b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7758,11 +7758,9 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 			return -EBUSY;
 
 		rdev->raid_disk = 0;
-		/*
-		 * The array is in readonly mode if journal is missing, so no
-		 * write requests running. We should be safe
-		 */
+		mddev_suspend(mddev);
 		log_init(conf, rdev);
+		mddev_resume(mddev);
 		return 0;
 	}
 	if (mddev->recovery_disabled == conf->recovery_disabled)
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] md/r5cache: journal remove support
  2017-03-15 18:28 [PATCH 1/2] md/r5cache: improve add-journal Song Liu
@ 2017-03-15 18:28 ` Song Liu
  2017-03-15 22:52   ` Shaohua Li
  2017-03-15 22:51 ` [PATCH 1/2] md/r5cache: improve add-journal Shaohua Li
  1 sibling, 1 reply; 5+ messages in thread
From: Song Liu @ 2017-03-15 18:28 UTC (permalink / raw)
  To: linux-raid
  Cc: shli, neilb, kernel-team, dan.j.williams, hch, jes.sorensen,
	Song Liu

When journal device of an array fails, the array is forced into read-only
mode. To make the array normal without adding another journal device, we
need to remove journal _feature_ from the array.

This patch allows remove journal _feature_ from an array, For journal
remove to work, existing journal should be either missing or faulty.

Two flags are added to GET_ARRAY_INFO for mdadm.
  1. MD_SB_HAS_JOURNAL: meaning the array have journal feature;
  2. MD_SB_JOURNAL_REMOVABLE: meaning the journal is faulty or missing

When both flags are set, mdadm can clear MD_SB_HAS_JOURNAL to remove
journal _feature_.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 drivers/md/md.c                | 42 ++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/raid/md_p.h | 11 ++++++++---
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ac3bd15..32ee994 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5981,6 +5981,25 @@ static void autorun_devices(int part)
 }
 #endif /* !MODULE */
 
+/*
+ * the journal _feature_ is removable when:
+ * the array has journal support &&
+ *    (journal is missing || journal is faulty)
+ */
+static bool journal_removable(struct mddev *mddev)
+{
+	struct md_rdev *rdev;
+
+	if (!test_bit(MD_HAS_JOURNAL, &mddev->flags))
+		return false;
+
+	rdev_for_each_rcu(rdev, mddev)
+		if (test_bit(Journal, &rdev->flags) &&
+		    !test_bit(Faulty, &rdev->flags))
+		    return false;
+	return true;
+}
+
 static int get_version(void __user *arg)
 {
 	mdu_version_t ver;
@@ -6041,6 +6060,10 @@ static int get_array_info(struct mddev *mddev, void __user *arg)
 		info.state |= (1<<MD_SB_BITMAP_PRESENT);
 	if (mddev_is_clustered(mddev))
 		info.state |= (1<<MD_SB_CLUSTERED);
+	if (test_bit(MD_HAS_JOURNAL, &mddev->flags))
+		info.state |= (1<<MD_SB_HAS_JOURNAL);
+	if (journal_removable(mddev))
+		info.state |= (1<<MD_SB_JOURNAL_REMOVABLE);
 	info.active_disks  = insync;
 	info.working_disks = working;
 	info.failed_disks  = failed;
@@ -6721,6 +6744,8 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 	/* calculate expected state,ignoring low bits */
 	if (mddev->bitmap && mddev->bitmap_info.offset)
 		state |= (1 << MD_SB_BITMAP_PRESENT);
+	if (journal_removable(mddev))
+		state |= (1 << MD_SB_JOURNAL_REMOVABLE);
 
 	if (mddev->major_version != info->major_version ||
 	    mddev->minor_version != info->minor_version ||
@@ -6730,8 +6755,11 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 /*	    mddev->layout        != info->layout        || */
 	    mddev->persistent	 != !info->not_persistent ||
 	    mddev->chunk_sectors != info->chunk_size >> 9 ||
-	    /* ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT to change */
-	    ((state^info->state) & 0xfffffe00)
+	    /*
+	     * ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT
+	     * and SB_HAS_JOURNAL to change
+	     */
+	    ((state^info->state) & 0xfffffc00)
 		)
 		return -EINVAL;
 	/* Check there is only one change */
@@ -6743,6 +6771,8 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 		cnt++;
 	if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT))
 		cnt++;
+	if ((state ^ info->state) & (1<<MD_SB_HAS_JOURNAL))
+		cnt++;
 	if (cnt == 0)
 		return 0;
 	if (cnt > 1)
@@ -6831,6 +6861,14 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
 			mddev->bitmap_info.offset = 0;
 		}
 	}
+	if ((state ^ info->state) & (1<<MD_SB_HAS_JOURNAL)) {
+		if (!journal_removable(mddev)) {
+			rv = -EINVAL;
+			goto err;
+		}
+		clear_bit(MD_HAS_JOURNAL,  &mddev->flags);
+	}
+
 	md_update_sb(mddev, 1);
 	return rv;
 err:
diff --git a/include/uapi/linux/raid/md_p.h b/include/uapi/linux/raid/md_p.h
index d9a1ead..b1f2b63 100644
--- a/include/uapi/linux/raid/md_p.h
+++ b/include/uapi/linux/raid/md_p.h
@@ -1,15 +1,15 @@
 /*
    md_p.h : physical layout of Linux RAID devices
           Copyright (C) 1996-98 Ingo Molnar, Gadi Oxman
-	  
+
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2, or (at your option)
    any later version.
-   
+
    You should have received a copy of the GNU General Public License
    (for example /usr/src/linux/COPYING); if not, write to the Free
-   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  
+   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
 
 #ifndef _MD_P_H
@@ -119,6 +119,11 @@ typedef struct mdp_device_descriptor_s {
 
 #define	MD_SB_CLUSTERED		5 /* MD is clustered */
 #define	MD_SB_BITMAP_PRESENT	8 /* bitmap may be present nearby */
+#define	MD_SB_HAS_JOURNAL	9
+#define	MD_SB_JOURNAL_REMOVABLE	10 /* journal _feature_ can be removed,
+				    * which means the journal is either
+				    * missing or Faulty
+				    */
 
 /*
  * Notes:
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] md/r5cache: improve add-journal
  2017-03-15 18:28 [PATCH 1/2] md/r5cache: improve add-journal Song Liu
  2017-03-15 18:28 ` [PATCH 2/2] md/r5cache: journal remove support Song Liu
@ 2017-03-15 22:51 ` Shaohua Li
  2017-03-15 23:48   ` Song Liu
  1 sibling, 1 reply; 5+ messages in thread
From: Shaohua Li @ 2017-03-15 22:51 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, shli, neilb, kernel-team, dan.j.williams, hch,
	jes.sorensen

On Wed, Mar 15, 2017 at 11:28:15AM -0700, Song Liu wrote:
> 1. suspend the array before adding journal, so that we can add journal
>    when the array is not read-only;

we can't call mddev_suspend in raid5d, because there is deadlock.
raid5_add_disk can be called in raid5d.

> 2. allow recreate journal when existing journal is Faulty. So that we can
>    add-journal before removing failed journal.

this is weird usage, why don't we remove the failed journal first?
 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/md.c    | 5 +++--
>  drivers/md/raid5.c | 6 ++----
>  2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 42e68b2..ac3bd15 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6230,9 +6230,10 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
>  			struct md_rdev *rdev2;
>  			bool has_journal = false;
>  
> -			/* make sure no existing journal disk */
> +			/* make sure no active journal disk */
>  			rdev_for_each(rdev2, mddev) {
> -				if (test_bit(Journal, &rdev2->flags)) {
> +				if (test_bit(Journal, &rdev2->flags) &&
> +				    !test_bit(Faulty, &rdev2->flags)) {
>  					has_journal = true;
>  					break;
>  				}
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 447d9dd..ee8648b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7758,11 +7758,9 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
>  			return -EBUSY;
>  
>  		rdev->raid_disk = 0;
> -		/*
> -		 * The array is in readonly mode if journal is missing, so no
> -		 * write requests running. We should be safe
> -		 */
> +		mddev_suspend(mddev);
>  		log_init(conf, rdev);
> +		mddev_resume(mddev);
>  		return 0;
>  	}
>  	if (mddev->recovery_disabled == conf->recovery_disabled)
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] md/r5cache: journal remove support
  2017-03-15 18:28 ` [PATCH 2/2] md/r5cache: journal remove support Song Liu
@ 2017-03-15 22:52   ` Shaohua Li
  0 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2017-03-15 22:52 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, shli, neilb, kernel-team, dan.j.williams, hch,
	jes.sorensen

On Wed, Mar 15, 2017 at 11:28:16AM -0700, Song Liu wrote:
> When journal device of an array fails, the array is forced into read-only
> mode. To make the array normal without adding another journal device, we
> need to remove journal _feature_ from the array.
> 
> This patch allows remove journal _feature_ from an array, For journal
> remove to work, existing journal should be either missing or faulty.
> 
> Two flags are added to GET_ARRAY_INFO for mdadm.
>   1. MD_SB_HAS_JOURNAL: meaning the array have journal feature;
>   2. MD_SB_JOURNAL_REMOVABLE: meaning the journal is faulty or missing
> 
> When both flags are set, mdadm can clear MD_SB_HAS_JOURNAL to remove
> journal _feature_.

please use the new 'consistency_policy' interface to remove journal support

> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  drivers/md/md.c                | 42 ++++++++++++++++++++++++++++++++++++++++--
>  include/uapi/linux/raid/md_p.h | 11 ++++++++---
>  2 files changed, 48 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ac3bd15..32ee994 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5981,6 +5981,25 @@ static void autorun_devices(int part)
>  }
>  #endif /* !MODULE */
>  
> +/*
> + * the journal _feature_ is removable when:
> + * the array has journal support &&
> + *    (journal is missing || journal is faulty)
> + */
> +static bool journal_removable(struct mddev *mddev)
> +{
> +	struct md_rdev *rdev;
> +
> +	if (!test_bit(MD_HAS_JOURNAL, &mddev->flags))
> +		return false;
> +
> +	rdev_for_each_rcu(rdev, mddev)
> +		if (test_bit(Journal, &rdev->flags) &&
> +		    !test_bit(Faulty, &rdev->flags))
> +		    return false;
> +	return true;
> +}
> +
>  static int get_version(void __user *arg)
>  {
>  	mdu_version_t ver;
> @@ -6041,6 +6060,10 @@ static int get_array_info(struct mddev *mddev, void __user *arg)
>  		info.state |= (1<<MD_SB_BITMAP_PRESENT);
>  	if (mddev_is_clustered(mddev))
>  		info.state |= (1<<MD_SB_CLUSTERED);
> +	if (test_bit(MD_HAS_JOURNAL, &mddev->flags))
> +		info.state |= (1<<MD_SB_HAS_JOURNAL);
> +	if (journal_removable(mddev))
> +		info.state |= (1<<MD_SB_JOURNAL_REMOVABLE);
>  	info.active_disks  = insync;
>  	info.working_disks = working;
>  	info.failed_disks  = failed;
> @@ -6721,6 +6744,8 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
>  	/* calculate expected state,ignoring low bits */
>  	if (mddev->bitmap && mddev->bitmap_info.offset)
>  		state |= (1 << MD_SB_BITMAP_PRESENT);
> +	if (journal_removable(mddev))
> +		state |= (1 << MD_SB_JOURNAL_REMOVABLE);
>  
>  	if (mddev->major_version != info->major_version ||
>  	    mddev->minor_version != info->minor_version ||
> @@ -6730,8 +6755,11 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
>  /*	    mddev->layout        != info->layout        || */
>  	    mddev->persistent	 != !info->not_persistent ||
>  	    mddev->chunk_sectors != info->chunk_size >> 9 ||
> -	    /* ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT to change */
> -	    ((state^info->state) & 0xfffffe00)
> +	    /*
> +	     * ignore bottom 8 bits of state, and allow SB_BITMAP_PRESENT
> +	     * and SB_HAS_JOURNAL to change
> +	     */
> +	    ((state^info->state) & 0xfffffc00)
>  		)
>  		return -EINVAL;
>  	/* Check there is only one change */
> @@ -6743,6 +6771,8 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
>  		cnt++;
>  	if ((state ^ info->state) & (1<<MD_SB_BITMAP_PRESENT))
>  		cnt++;
> +	if ((state ^ info->state) & (1<<MD_SB_HAS_JOURNAL))
> +		cnt++;
>  	if (cnt == 0)
>  		return 0;
>  	if (cnt > 1)
> @@ -6831,6 +6861,14 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
>  			mddev->bitmap_info.offset = 0;
>  		}
>  	}
> +	if ((state ^ info->state) & (1<<MD_SB_HAS_JOURNAL)) {
> +		if (!journal_removable(mddev)) {
> +			rv = -EINVAL;
> +			goto err;
> +		}
> +		clear_bit(MD_HAS_JOURNAL,  &mddev->flags);
> +	}
> +
>  	md_update_sb(mddev, 1);
>  	return rv;
>  err:
> diff --git a/include/uapi/linux/raid/md_p.h b/include/uapi/linux/raid/md_p.h
> index d9a1ead..b1f2b63 100644
> --- a/include/uapi/linux/raid/md_p.h
> +++ b/include/uapi/linux/raid/md_p.h
> @@ -1,15 +1,15 @@
>  /*
>     md_p.h : physical layout of Linux RAID devices
>            Copyright (C) 1996-98 Ingo Molnar, Gadi Oxman
> -	  
> +
>     This program is free software; you can redistribute it and/or modify
>     it under the terms of the GNU General Public License as published by
>     the Free Software Foundation; either version 2, or (at your option)
>     any later version.
> -   
> +
>     You should have received a copy of the GNU General Public License
>     (for example /usr/src/linux/COPYING); if not, write to the Free
> -   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.  
> +   Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
>  */
>  
>  #ifndef _MD_P_H
> @@ -119,6 +119,11 @@ typedef struct mdp_device_descriptor_s {
>  
>  #define	MD_SB_CLUSTERED		5 /* MD is clustered */
>  #define	MD_SB_BITMAP_PRESENT	8 /* bitmap may be present nearby */
> +#define	MD_SB_HAS_JOURNAL	9
> +#define	MD_SB_JOURNAL_REMOVABLE	10 /* journal _feature_ can be removed,
> +				    * which means the journal is either
> +				    * missing or Faulty
> +				    */
>  
>  /*
>   * Notes:
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] md/r5cache: improve add-journal
  2017-03-15 22:51 ` [PATCH 1/2] md/r5cache: improve add-journal Shaohua Li
@ 2017-03-15 23:48   ` Song Liu
  0 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2017-03-15 23:48 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-raid, Shaohua Li, NeilBrown, Kernel Team, Dan Williams,
	Christoph Hellwig, jes.sorensen@gmail.com


> On Mar 15, 2017, at 3:51 PM, Shaohua Li <shli@kernel.org> wrote:
> 
> On Wed, Mar 15, 2017 at 11:28:15AM -0700, Song Liu wrote:
>> 1. suspend the array before adding journal, so that we can add journal
>>   when the array is not read-only;
> 
> we can't call mddev_suspend in raid5d, because there is deadlock.
> raid5_add_disk can be called in raid5d.

I see. I guess we can just require setting read only before adding journal. 

> 
>> 2. allow recreate journal when existing journal is Faulty. So that we can
>>   add-journal before removing failed journal.
> 
> this is weird usage, why don't we remove the failed journal first?

This is really not necessary. 

I guess we can just drop this patch. 

Thanks,
Song


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-03-15 23:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-15 18:28 [PATCH 1/2] md/r5cache: improve add-journal Song Liu
2017-03-15 18:28 ` [PATCH 2/2] md/r5cache: journal remove support Song Liu
2017-03-15 22:52   ` Shaohua Li
2017-03-15 22:51 ` [PATCH 1/2] md/r5cache: improve add-journal Shaohua Li
2017-03-15 23:48   ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).