dm-raid1 barriers

All of lore.kernel.org
 help / color / mirror / Atom feed

* dm-raid1 barriers
@ 2009-11-16 16:50 Mikulas Patocka
  2009-11-16 18:48 ` Alasdair G Kergon
  0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2009-11-16 16:50 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Hi

As you talked about setting the whole device dirty. The problem is this:

When there are no writes pending for a given region, we set the bit in 
memory, indicating that the region is clean.

Some times later (usually while setting another dirty bit in the log), we 
write the log to disk.

Before writing the log, we must flush cache on both mirror legs. 
(otherwise, we might write some bit as clean while the data is still 
pending in the disk cache)

If this flush fails, the bits in memory are already clean and we don't 
know which regions may hold unwritten data in the disk cache and which 
not. So, we must set all regions as dirty.

---

If you don't like setting all regions as dirty, we would have to introduce 
new region state ("no pending writes but possibly unflushed") and 
introduce new bitmap for regions in this state.

I wouldn't like to do it because it is considerable coding overhead, it 
makes a lot of opportunities for bugs (which are hard to find because 
failures happen rarely) and the whole effect for the user could be 
negative --- instead of resyncing the whole array it could introduce a bug 
that corrupts user's data.

BTW, when running with dmeventd, the whole issue is poitless because 
dmeventd removes failed mirror legs, so it always resyncs when recreating 
the mirror. Without dmeventd, the mirror is already unsafe and has 
problems leading to data corruption when failed devices reappear, so 
resyncing or not-resyncing the whole mirror is probably the least serious 
thing to worry about.

Mikulas

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: dm-raid1 barriers
  2009-11-16 16:50 dm-raid1 barriers Mikulas Patocka
@ 2009-11-16 18:48 ` Alasdair G Kergon
  2009-11-17 10:06   ` Mikulas Patocka
  0 siblings, 1 reply; 4+ messages in thread
From: Alasdair G Kergon @ 2009-11-16 18:48 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel

On Mon, Nov 16, 2009 at 11:50:56AM -0500, Mikulas Patocka wrote:
> When there are no writes pending for a given region, we set the bit in 
> memory, indicating that the region is clean.
> Some times later (usually while setting another dirty bit in the log), we 
> write the log to disk.
> Before writing the log, we must flush cache on both mirror legs. 
> (otherwise, we might write some bit as clean while the data is still 
> pending in the disk cache)
> 
> If this flush fails, the bits in memory are already clean and we don't 
> know which regions may hold unwritten data in the disk cache and which 

Actually we do: the on-disk version of the log tells us.

> not. So, we must set all regions as dirty.
 
Anyway, let's not try to handle these situations at the moment.
what I need for now, is a way to know whether or not this situation has
actually occurred when we get reports from users, so probably a kernel
log message, a dm event raised plus a failure indicator against the device in
the status line.

Alasdair

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: dm-raid1 barriers
  2009-11-16 18:48 ` Alasdair G Kergon
@ 2009-11-17 10:06   ` Mikulas Patocka
  2009-11-23  3:16     ` malahal
  0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2009-11-17 10:06 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1095 bytes --]

> > If this flush fails, the bits in memory are already clean and we don't 
> > know which regions may hold unwritten data in the disk cache and which 
> 
> Actually we do: the on-disk version of the log tells us.

Yes, it could be read into another buffer and anded.

> > not. So, we must set all regions as dirty.
>  
> Anyway, let's not try to handle these situations at the moment.
> what I need for now, is a way to know whether or not this situation has
> actually occurred when we get reports from users, so probably a kernel
> log message, a dm event raised plus a failure indicator against the device in
> the status line.
> 
> Alasdair

Here I'm attaching two patches that make it report 'F' on flush error. But 
I think it's not really important, the flush error could be treated as 
write error, there is no need for special handling of flush errors --- 
both flush errors and write errors can happen either on transport errors 
or if the device is out of spare sectors.

Note that this approach of reporting flushes speciall also breaks 
new-kernel-with-old-userspace case.

Mikulas

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3288 bytes --]

Report flush errors as 'F' instead of 'D' for log and mirror devices.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-log.c   |   16 ++++++++++++----
 drivers/md/dm-raid1.c |    6 ++++--
 2 files changed, 16 insertions(+), 6 deletions(-)

Index: linux-2.6.31.6-fast/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/md/dm-raid1.c	2009-11-17 09:08:15.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/md/dm-raid1.c	2009-11-17 09:10:07.000000000 +0100
@@ -35,6 +35,7 @@ static DECLARE_WAIT_QUEUE_HEAD(_kmirrord
  *---------------------------------------------------------------*/
 enum dm_raid1_error {
 	DM_RAID1_WRITE_ERROR,
+	DM_RAID1_FLUSH_ERROR,
 	DM_RAID1_SYNC_ERROR,
 	DM_RAID1_READ_ERROR
 };
@@ -270,7 +271,7 @@ static int mirror_flush(void *ms_)
 	if (unlikely(error_bits != 0)) {
 		for (i = 0; i < ms->nr_mirrors; i++)
 			if (test_bit(i, &error_bits))
-				fail_mirror(ms->mirror + i, DM_RAID1_WRITE_ERROR);
+				fail_mirror(ms->mirror + i, DM_RAID1_FLUSH_ERROR);
 		return -EIO;
 	}
 
@@ -1312,7 +1313,8 @@ static char device_status_char(struct mi
 	if (!atomic_read(&(m->error_count)))
 		return 'A';
 
-	return (test_bit(DM_RAID1_WRITE_ERROR, &(m->error_type))) ? 'D' :
+	return (test_bit(DM_RAID1_FLUSH_ERROR, &(m->error_type))) ? 'F' :
+		(test_bit(DM_RAID1_WRITE_ERROR, &(m->error_type))) ? 'D' :
 		(test_bit(DM_RAID1_SYNC_ERROR, &(m->error_type))) ? 'S' :
 		(test_bit(DM_RAID1_READ_ERROR, &(m->error_type))) ? 'R' : 'U';
 }
Index: linux-2.6.31.6-fast/drivers/md/dm-log.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/md/dm-log.c	2009-11-17 10:20:39.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/md/dm-log.c	2009-11-17 10:22:17.000000000 +0100
@@ -235,6 +235,7 @@ struct log_c {
 	 * Disk log fields
 	 */
 	int log_dev_failed;
+	int log_dev_flush_failed;
 	struct dm_dev *log_dev;
 	struct log_header header;
 
@@ -422,6 +423,7 @@ static int create_log_context(struct dm_
 	} else {
 		lc->log_dev = dev;
 		lc->log_dev_failed = 0;
+		lc->log_dev_flush_failed = 0;
 		lc->header_location.bdev = lc->log_dev->bdev;
 		lc->header_location.sector = 0;
 
@@ -630,8 +632,11 @@ static int disk_resume(struct dm_dirty_l
 
 	/* write the new header */
 	r = rw_header(lc, WRITE);
-	if (!r)
+	if (!r) {
 		r = flush_header(lc);
+		if (r)
+			lc->log_dev_flush_failed = 1;
+	}
 	if (r) {
 		DMWARN("%s: Failed to write header on dirty region log device",
 		       lc->log_dev->name);
@@ -701,9 +706,10 @@ static int disk_flush(struct dm_dirty_lo
 	else {
 		if (lc->touched_dirtied) {
 			r = flush_header(lc);
-			if (r)
+			if (r) {
+				lc->log_dev_flush_failed = 1;
 				fail_log_device(lc);
-			else
+			} else
 				lc->touched_dirtied = 0;
 		}
 		lc->touched_cleaned = 0;
@@ -803,7 +809,9 @@ static int disk_status(struct dm_dirty_l
 	switch(status) {
 	case STATUSTYPE_INFO:
 		DMEMIT("3 %s %s %c", log->type->name, lc->log_dev->name,
-		       lc->log_dev_failed ? 'D' : 'A');
+		       lc->log_dev_flush_failed ? 'F' :
+		       lc->log_dev_failed ? 'D' :
+		       'A');
 		break;
 
 	case STATUSTYPE_TABLE:

[-- Attachment #3: Type: TEXT/PLAIN, Size: 1481 bytes --]

Accept 'F' for devices with failed flush.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 daemons/dmeventd/plugins/mirror/dmeventd_mirror.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Index: LVM2.2.02.54/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
===================================================================
--- LVM2.2.02.54.orig/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c	2009-11-17 09:12:40.000000000 +0100
+++ LVM2.2.02.54/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c	2009-11-17 10:56:43.000000000 +0100
@@ -94,12 +94,23 @@ static int _get_mirror_event(char *param
 			syslog(LOG_ERR, "Mirror device, %s, has failed.\n", args[i]);
 			r = ME_FAILURE;
 		}
+		if (dev_status_str[i] == 'F') {
+			syslog(LOG_ERR, "Flush on mirror device, %s, has failed.\n", args[i]);
+			r = ME_FAILURE;
+		}
 
 	/* Check for bad disk log device */
-	if (log_argc > 1 && log_status_str[0] == 'D') {
-		syslog(LOG_ERR, "Log device, %s, has failed.\n",
-		       args[2 + num_devs + log_argc]);
-		r = ME_FAILURE;
+	if (log_argc > 1) {
+		if (log_status_str[0] == 'D') {
+			syslog(LOG_ERR, "Log device, %s, has failed.\n",
+			       args[2 + num_devs + log_argc]);
+			r = ME_FAILURE;
+		}
+		if (log_status_str[0] == 'F') {
+			syslog(LOG_ERR, "Flush on log device, %s, has failed.\n",
+			       args[2 + num_devs + log_argc]);
+			r = ME_FAILURE;
+		}
 	}
 
 	if (r == ME_FAILURE)

[-- Attachment #4: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: dm-raid1 barriers
  2009-11-17 10:06   ` Mikulas Patocka
@ 2009-11-23  3:16     ` malahal
  0 siblings, 0 replies; 4+ messages in thread
From: malahal @ 2009-11-23  3:16 UTC (permalink / raw)
  To: dm-devel

Mikulas Patocka [mpatocka@redhat.com] wrote:
> > > If this flush fails, the bits in memory are already clean and we don't 
> > > know which regions may hold unwritten data in the disk cache and which 
> > 
> > Actually we do: the on-disk version of the log tells us.
> 
> Yes, it could be read into another buffer and anded.
> 
> > > not. So, we must set all regions as dirty.
> >  
> > Anyway, let's not try to handle these situations at the moment.
> > what I need for now, is a way to know whether or not this situation has
> > actually occurred when we get reports from users, so probably a kernel
> > log message, a dm event raised plus a failure indicator against the device in
> > the status line.

Agreed, we don't want to handle this error case with reading the entries
from the disk log and comparing them with the in memory buffer at this
point.  Why not just set all regions to dirty as Mikulas pointed as a
short term solution? Not optimal but it is a correct way to handle.

--Malahal.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-11-23  3:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-16 16:50 dm-raid1 barriers Mikulas Patocka
2009-11-16 18:48 ` Alasdair G Kergon
2009-11-17 10:06   ` Mikulas Patocka
2009-11-23  3:16     ` malahal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.