* dm-raid1 barriers
@ 2009-11-16 16:50 Mikulas Patocka
2009-11-16 18:48 ` Alasdair G Kergon
0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2009-11-16 16:50 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: dm-devel
Hi
As you talked about setting the whole device dirty. The problem is this:
When there are no writes pending for a given region, we set the bit in
memory, indicating that the region is clean.
Some times later (usually while setting another dirty bit in the log), we
write the log to disk.
Before writing the log, we must flush cache on both mirror legs.
(otherwise, we might write some bit as clean while the data is still
pending in the disk cache)
If this flush fails, the bits in memory are already clean and we don't
know which regions may hold unwritten data in the disk cache and which
not. So, we must set all regions as dirty.
---
If you don't like setting all regions as dirty, we would have to introduce
new region state ("no pending writes but possibly unflushed") and
introduce new bitmap for regions in this state.
I wouldn't like to do it because it is considerable coding overhead, it
makes a lot of opportunities for bugs (which are hard to find because
failures happen rarely) and the whole effect for the user could be
negative --- instead of resyncing the whole array it could introduce a bug
that corrupts user's data.
BTW, when running with dmeventd, the whole issue is poitless because
dmeventd removes failed mirror legs, so it always resyncs when recreating
the mirror. Without dmeventd, the mirror is already unsafe and has
problems leading to data corruption when failed devices reappear, so
resyncing or not-resyncing the whole mirror is probably the least serious
thing to worry about.
Mikulas
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: dm-raid1 barriers
2009-11-16 16:50 dm-raid1 barriers Mikulas Patocka
@ 2009-11-16 18:48 ` Alasdair G Kergon
2009-11-17 10:06 ` Mikulas Patocka
0 siblings, 1 reply; 4+ messages in thread
From: Alasdair G Kergon @ 2009-11-16 18:48 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: dm-devel
On Mon, Nov 16, 2009 at 11:50:56AM -0500, Mikulas Patocka wrote:
> When there are no writes pending for a given region, we set the bit in
> memory, indicating that the region is clean.
> Some times later (usually while setting another dirty bit in the log), we
> write the log to disk.
> Before writing the log, we must flush cache on both mirror legs.
> (otherwise, we might write some bit as clean while the data is still
> pending in the disk cache)
>
> If this flush fails, the bits in memory are already clean and we don't
> know which regions may hold unwritten data in the disk cache and which
Actually we do: the on-disk version of the log tells us.
> not. So, we must set all regions as dirty.
Anyway, let's not try to handle these situations at the moment.
what I need for now, is a way to know whether or not this situation has
actually occurred when we get reports from users, so probably a kernel
log message, a dm event raised plus a failure indicator against the device in
the status line.
Alasdair
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: dm-raid1 barriers
2009-11-16 18:48 ` Alasdair G Kergon
@ 2009-11-17 10:06 ` Mikulas Patocka
2009-11-23 3:16 ` malahal
0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2009-11-17 10:06 UTC (permalink / raw)
To: Alasdair G Kergon; +Cc: dm-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1095 bytes --]
> > If this flush fails, the bits in memory are already clean and we don't
> > know which regions may hold unwritten data in the disk cache and which
>
> Actually we do: the on-disk version of the log tells us.
Yes, it could be read into another buffer and anded.
> > not. So, we must set all regions as dirty.
>
> Anyway, let's not try to handle these situations at the moment.
> what I need for now, is a way to know whether or not this situation has
> actually occurred when we get reports from users, so probably a kernel
> log message, a dm event raised plus a failure indicator against the device in
> the status line.
>
> Alasdair
Here I'm attaching two patches that make it report 'F' on flush error. But
I think it's not really important, the flush error could be treated as
write error, there is no need for special handling of flush errors ---
both flush errors and write errors can happen either on transport errors
or if the device is out of spare sectors.
Note that this approach of reporting flushes speciall also breaks
new-kernel-with-old-userspace case.
Mikulas
[-- Attachment #2: Type: TEXT/PLAIN, Size: 3288 bytes --]
Report flush errors as 'F' instead of 'D' for log and mirror devices.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
drivers/md/dm-log.c | 16 ++++++++++++----
drivers/md/dm-raid1.c | 6 ++++--
2 files changed, 16 insertions(+), 6 deletions(-)
Index: linux-2.6.31.6-fast/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/md/dm-raid1.c 2009-11-17 09:08:15.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/md/dm-raid1.c 2009-11-17 09:10:07.000000000 +0100
@@ -35,6 +35,7 @@ static DECLARE_WAIT_QUEUE_HEAD(_kmirrord
*---------------------------------------------------------------*/
enum dm_raid1_error {
DM_RAID1_WRITE_ERROR,
+ DM_RAID1_FLUSH_ERROR,
DM_RAID1_SYNC_ERROR,
DM_RAID1_READ_ERROR
};
@@ -270,7 +271,7 @@ static int mirror_flush(void *ms_)
if (unlikely(error_bits != 0)) {
for (i = 0; i < ms->nr_mirrors; i++)
if (test_bit(i, &error_bits))
- fail_mirror(ms->mirror + i, DM_RAID1_WRITE_ERROR);
+ fail_mirror(ms->mirror + i, DM_RAID1_FLUSH_ERROR);
return -EIO;
}
@@ -1312,7 +1313,8 @@ static char device_status_char(struct mi
if (!atomic_read(&(m->error_count)))
return 'A';
- return (test_bit(DM_RAID1_WRITE_ERROR, &(m->error_type))) ? 'D' :
+ return (test_bit(DM_RAID1_FLUSH_ERROR, &(m->error_type))) ? 'F' :
+ (test_bit(DM_RAID1_WRITE_ERROR, &(m->error_type))) ? 'D' :
(test_bit(DM_RAID1_SYNC_ERROR, &(m->error_type))) ? 'S' :
(test_bit(DM_RAID1_READ_ERROR, &(m->error_type))) ? 'R' : 'U';
}
Index: linux-2.6.31.6-fast/drivers/md/dm-log.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/md/dm-log.c 2009-11-17 10:20:39.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/md/dm-log.c 2009-11-17 10:22:17.000000000 +0100
@@ -235,6 +235,7 @@ struct log_c {
* Disk log fields
*/
int log_dev_failed;
+ int log_dev_flush_failed;
struct dm_dev *log_dev;
struct log_header header;
@@ -422,6 +423,7 @@ static int create_log_context(struct dm_
} else {
lc->log_dev = dev;
lc->log_dev_failed = 0;
+ lc->log_dev_flush_failed = 0;
lc->header_location.bdev = lc->log_dev->bdev;
lc->header_location.sector = 0;
@@ -630,8 +632,11 @@ static int disk_resume(struct dm_dirty_l
/* write the new header */
r = rw_header(lc, WRITE);
- if (!r)
+ if (!r) {
r = flush_header(lc);
+ if (r)
+ lc->log_dev_flush_failed = 1;
+ }
if (r) {
DMWARN("%s: Failed to write header on dirty region log device",
lc->log_dev->name);
@@ -701,9 +706,10 @@ static int disk_flush(struct dm_dirty_lo
else {
if (lc->touched_dirtied) {
r = flush_header(lc);
- if (r)
+ if (r) {
+ lc->log_dev_flush_failed = 1;
fail_log_device(lc);
- else
+ } else
lc->touched_dirtied = 0;
}
lc->touched_cleaned = 0;
@@ -803,7 +809,9 @@ static int disk_status(struct dm_dirty_l
switch(status) {
case STATUSTYPE_INFO:
DMEMIT("3 %s %s %c", log->type->name, lc->log_dev->name,
- lc->log_dev_failed ? 'D' : 'A');
+ lc->log_dev_flush_failed ? 'F' :
+ lc->log_dev_failed ? 'D' :
+ 'A');
break;
case STATUSTYPE_TABLE:
[-- Attachment #3: Type: TEXT/PLAIN, Size: 1481 bytes --]
Accept 'F' for devices with failed flush.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
daemons/dmeventd/plugins/mirror/dmeventd_mirror.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
Index: LVM2.2.02.54/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
===================================================================
--- LVM2.2.02.54.orig/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c 2009-11-17 09:12:40.000000000 +0100
+++ LVM2.2.02.54/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c 2009-11-17 10:56:43.000000000 +0100
@@ -94,12 +94,23 @@ static int _get_mirror_event(char *param
syslog(LOG_ERR, "Mirror device, %s, has failed.\n", args[i]);
r = ME_FAILURE;
}
+ if (dev_status_str[i] == 'F') {
+ syslog(LOG_ERR, "Flush on mirror device, %s, has failed.\n", args[i]);
+ r = ME_FAILURE;
+ }
/* Check for bad disk log device */
- if (log_argc > 1 && log_status_str[0] == 'D') {
- syslog(LOG_ERR, "Log device, %s, has failed.\n",
- args[2 + num_devs + log_argc]);
- r = ME_FAILURE;
+ if (log_argc > 1) {
+ if (log_status_str[0] == 'D') {
+ syslog(LOG_ERR, "Log device, %s, has failed.\n",
+ args[2 + num_devs + log_argc]);
+ r = ME_FAILURE;
+ }
+ if (log_status_str[0] == 'F') {
+ syslog(LOG_ERR, "Flush on log device, %s, has failed.\n",
+ args[2 + num_devs + log_argc]);
+ r = ME_FAILURE;
+ }
}
if (r == ME_FAILURE)
[-- Attachment #4: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: dm-raid1 barriers
2009-11-17 10:06 ` Mikulas Patocka
@ 2009-11-23 3:16 ` malahal
0 siblings, 0 replies; 4+ messages in thread
From: malahal @ 2009-11-23 3:16 UTC (permalink / raw)
To: dm-devel
Mikulas Patocka [mpatocka@redhat.com] wrote:
> > > If this flush fails, the bits in memory are already clean and we don't
> > > know which regions may hold unwritten data in the disk cache and which
> >
> > Actually we do: the on-disk version of the log tells us.
>
> Yes, it could be read into another buffer and anded.
>
> > > not. So, we must set all regions as dirty.
> >
> > Anyway, let's not try to handle these situations at the moment.
> > what I need for now, is a way to know whether or not this situation has
> > actually occurred when we get reports from users, so probably a kernel
> > log message, a dm event raised plus a failure indicator against the device in
> > the status line.
Agreed, we don't want to handle this error case with reading the entries
from the disk log and comparing them with the in memory buffer at this
point. Why not just set all regions to dirty as Mikulas pointed as a
short term solution? Not optimal but it is a correct way to handle.
--Malahal.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-11-23 3:16 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-16 16:50 dm-raid1 barriers Mikulas Patocka
2009-11-16 18:48 ` Alasdair G Kergon
2009-11-17 10:06 ` Mikulas Patocka
2009-11-23 3:16 ` malahal
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.