* mirroring: [patch 1 of 6] device failure tolerance
@ 2005-06-30  7:33 Jonathan E Brassow
  2005-08-03 15:50 ` Dick
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan E Brassow @ 2005-06-30  7:33 UTC (permalink / raw)
  To: device-mapper development; +Cc: Kergon Alasdair

Thanks to some great feedback from agk and alewis, I've reworked the 
patches.  I moved away from creating new LOG_<state> defines and from 
forcing functions to do what they probably weren't meant to.  In the 
process though, I have added more log functions.  I think this makes 
things cleaner and clearer.  These 6 patches completely replace the old 
set of 8.

This first patch handles log device failures.

  brassow

diff -urN linux-2.6.12-orig/drivers/md/dm-log.c linux-2.6.12-00001/drivers/md/dm-log.c
--- linux-2.6.12-orig/drivers/md/dm-log.c	2005-06-17 14:48:29.000000000 -0500
+++ linux-2.6.12-00001/drivers/md/dm-log.c	2005-06-29 19:23:58.371949200 -0500
@@ -150,6 +150,7 @@
  	/*
  	 * Disk log fields
  	 */
+	int log_dev_failed;
  	struct dm_dev *log_dev;
  	struct log_header header;

@@ -412,6 +413,7 @@

  	lc = (struct log_c *) log->context;
  	lc->log_dev = dev;
+	lc->log_dev_failed = 0;

  	/* setup the disk header fields */
  	lc->header_location.bdev = lc->log_dev->bdev;
@@ -465,6 +467,17 @@
  	return count;
  }

+static void fail_log_device(struct log_c *lc)
+{
+	lc->log_dev_failed = 1;
+	dm_table_event(lc->ti->table);
+}
+
+static void restore_log_device(struct log_c *lc)
+{
+	lc->log_dev_failed = 0;
+}
+
  static int disk_resume(struct dirty_log *log)
  {
  	int r;
@@ -472,15 +485,16 @@
  	struct log_c *lc = (struct log_c *) log->context;
  	size_t size = lc->bitset_uint32_count * sizeof(uint32_t);

-	/* read the disk header */
-	r = read_header(lc);
-	if (r)
-		return r;
-
-	/* read the bits */
-	r = read_bits(lc);
-	if (r)
-		return r;
+	/*
+	 * Read the disk header, but only if we know it is good.
+	 */
+	if (!lc->log_dev_failed) {
+		if (read_header(lc) || read_bits(lc)) {
+			DMERR("A read failure has occurred on a mirror log device.");
+			fail_log_device(lc);
+			lc->header.nr_regions = 0;
+		}
+	}

  	/* set or clear any new bits */
  	if (lc->sync == NOSYNC)
@@ -496,16 +510,17 @@
  	memcpy(lc->sync_bits, lc->clean_bits, size);
  	lc->sync_count = count_bits32(lc->clean_bits, lc->bitset_uint32_count);

-	/* write the bits */
-	r = write_bits(lc);
-	if (r)
-		return r;
-
  	/* set the correct number of regions in the header */
  	lc->header.nr_regions = lc->region_count;

-	/* write the new header */
-	return write_header(lc);
+	/* write out the log */
+	if ((r = write_bits(lc)) || (r = write_header(lc))) {
+		DMERR("A write failure has occurred on a mirror log device.");
+		fail_log_device(lc);
+	} else {
+		restore_log_device(lc);
+	}
+	return r;
  }

  static uint32_t core_get_region_size(struct dirty_log *log)
@@ -541,9 +556,29 @@
  	if (!lc->touched)
  		return 0;

+	/*
+	 * Could be dangerous if the write fails.
+	 * If the machine dies while the on-disk log is different from the core,
+	 * and the device is readable when the machine comes back, it may be
+	 * possible that not all regions will be recovered.
+	 *
+	 * The event is raised so that a user-land program 'waiting' on the
+	 * device can choose to stop the mirror, remap the mirror using a
+	 * different log device, or switch to core.
+	 *
+	 * So, not taking action in user-space AND having a machine fail
+	 * after a log has failed AND having the device available when the
+	 * machine reboots is a bad thing.
+	 */
  	r = write_bits(lc);
-	if (!r)
+	if (!r) {
  		lc->touched = 0;
+		restore_log_device(lc);
+	} else {
+		DMERR("A write failure has occurred on a mirror log device.");
+		DMERR("Log device is now not in-sync with the core.");
+		fail_log_device(lc);
+	}

  	return r;
  }
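
For readers who want to follow the flush path without a kernel tree, here is a minimal user-space sketch of the failure-flag logic in the disk_flush() hunk above. It is a model only, not the driver code: struct model_log, the model_* functions, the events_raised counter and the inject_write_error knob are all hypothetical names invented for this illustration, and the call to dm_table_event() is replaced by a simple counter.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the disk-log context (struct log_c in the
 * patch). Only the fields the failure logic touches are modelled.
 */
struct model_log {
	int log_dev_failed;	/* models lc->log_dev_failed */
	int touched;		/* models lc->touched */
	int events_raised;	/* counts what dm_table_event() would signal */
	int inject_write_error;	/* illustration knob: nonzero makes writes fail */
};

static void model_fail_log_device(struct model_log *lc)
{
	lc->log_dev_failed = 1;
	lc->events_raised++;	/* stands in for dm_table_event(lc->ti->table) */
}

static void model_restore_log_device(struct model_log *lc)
{
	lc->log_dev_failed = 0;
}

/* Stand-in for write_bits(): 0 on success, a negative value on failure. */
static int model_write_bits(struct model_log *lc)
{
	return lc->inject_write_error ? -5 : 0;
}

/* Same control flow as the patched disk_flush(). */
static int model_disk_flush(struct model_log *lc)
{
	int r;

	/* Nothing changed in core since the last flush: nothing to write. */
	if (!lc->touched)
		return 0;

	r = model_write_bits(lc);
	if (!r) {
		/* Log and core agree again, so clear any earlier failure. */
		lc->touched = 0;
		model_restore_log_device(lc);
	} else {
		/* Log is now stale; leave 'touched' set and raise an event. */
		fprintf(stderr, "model: write failure on mirror log device\n");
		model_fail_log_device(lc);
	}

	return r;
}
```

In this model, as in the patch, a failed flush leaves 'touched' set, so the next flush retries the write, and each failed attempt raises another event because the fail path signals unconditionally. A later successful flush clears the failure flag again.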


* Re: mirroring: [patch 1 of 6] device failure tolerance
  2005-06-30  7:33 mirroring: [patch 1 of 6] device failure tolerance Jonathan E Brassow
@ 2005-08-03 15:50 ` Dick
  2005-08-03 18:48   ` Jonathan E Brassow
  0 siblings, 1 reply; 5+ messages in thread
From: Dick @ 2005-08-03 15:50 UTC (permalink / raw)
  To: dm-devel

I'm really happy someone (Jonathan E Brassow) is working on this part of
device-mapper. I'd like to know if I can 'safely' try these patches, or will it
most likely nuke my data? I'm especially interested in the read balancing
updates with raid-1.

Greetings,
Dick


* Re: Re: mirroring: [patch 1 of 6] device failure tolerance
  2005-08-03 15:50 ` Dick
@ 2005-08-03 18:48   ` Jonathan E Brassow
  2006-04-26 17:09     ` Dick
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan E Brassow @ 2005-08-03 18:48 UTC (permalink / raw)
  To: device-mapper development

You're welcome to try them.  However, the user-space complement for 
handling device failures is not there yet.  You should be able to test 
read balancing though.

Please make sure you have the latest patches.  See:
http://www.redhat.com/archives/dm-devel/2005-August/msg00006.html
and
http://www.redhat.com/archives/dm-devel/2005-August/msg00018.html

  brassow

On Aug 3, 2005, at 10:50 AM, Dick wrote:

> I'm really happy someone (Jonathan E Brassow) is working on this part of
> device-mapper. I'd like to know if I can 'safely' try these patches, or will it
> most likely nuke my data? I'm especially interested in the read balancing
> updates with raid-1.
>
> Greetings,
> Dick


* Re: mirroring: [patch 1 of 6] device failure tolerance
  2005-08-03 18:48   ` Jonathan E Brassow
@ 2006-04-26 17:09     ` Dick
  2006-04-26 21:35       ` Jonathan E Brassow
  0 siblings, 1 reply; 5+ messages in thread
From: Dick @ 2006-04-26 17:09 UTC (permalink / raw)
  To: dm-devel

Hi Jonathan, or anyone else who is involved,

The old patches won't apply on linux-2.6.16(.3) anymore, could you please create
new patches (for linux 2.6.16) ?
The previous update worked for me quite well!

Thanks in advance,
Dick


* Re: Re: mirroring: [patch 1 of 6] device failure tolerance
  2006-04-26 17:09     ` Dick
@ 2006-04-26 21:35       ` Jonathan E Brassow
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan E Brassow @ 2006-04-26 21:35 UTC (permalink / raw)
  To: device-mapper development

You can check http://www.brassow.com/mirroring ...

New(er) patches will be coming soon, as well as userland code updates.

  brassow

On Apr 26, 2006, at 12:09 PM, Dick wrote:

> Hi Jonathan, or anyone else who is involved,
>
> The old patches won't apply on linux-2.6.16(.3) anymore, could you please create
> new patches (for linux 2.6.16) ?
> The previous update worked for me quite well!
>
> Thanks in advance,
> Dick

