* [RFC][PATCH 0/4] dm-log: support multi-log devices
@ 2008-11-26  0:01 Takahiro Yasui
  2008-11-26 18:56 ` Phillip Susi
  2008-12-12 20:21 ` Jonathan Brassow
  0 siblings, 2 replies; 24+ messages in thread
From: Takahiro Yasui @ 2008-11-26  0:01 UTC (permalink / raw)
  To: dm-devel; +Cc: Alasdair G Kergon, Masami Hiramatsu

Hi All,

This is my first post to this mailing list. I would like to introduce
a patch set that implements multi-log devices for dm-mirror.
I would appreciate your comments and suggestions on this patch set.


PATCH SET
=========
  1/4: dm-log: fix no io_client_destroy
  2/4: dm-log: remove unnecessary updates of io_req members
  3/4: dm-log: introduce multi-log devices
  4/4: dm-log: update interface to control multi-log devices


BACKGROUND
==========

Device-mapper mirroring (dm-mirror) uses a "log" to track whether the
mirror devices are consistent with one another. There are two types of
log, "core" and "disk". The former keeps its data only in memory, and
the latter keeps its data on a device. When the system boots and sets
up the disks managed by device mapper, device mapper checks the log
data to see whether each region is in sync across the mirror devices.
If a region is not, it replicates the data from the default mirror
device to the others.

However, once the log disk breaks down, the whole of the data disks
must be replicated. If the data disks are large, replication takes a
long time and consumes considerable system resources, such as I/O
bandwidth, CPU, and memory. That might degrade system performance
until the replication completes.

This patch set introduces multi-log devices for the mirror target. It
stores log data on multiple log devices, which lowers the probability
of a disk replication when one log disk has broken down.


DESIGN OVERVIEW
===============

  * maximum number of log devices

    Up to nine log devices can be used, the same maximum as for
    mirror devices.

  * error handling during setting up a mirror device

    All log devices must be detected by the Linux kernel. If any of
    them is not found, construction of the mirror fails.

    Also, the log headers are checked to confirm that all log devices
    have the same number of regions. New log devices are excluded from
    this check. (They are usually initialized with "0".)

  * error handling during I/O operation

    If an error is detected on a log device after the mirror device
    has been constructed, that log device is marked "fail". I/O is
    issued only to valid log devices, and no I/O is issued to failed
    ones. If all log devices fail, the disk log works much like a
    "core" log.

  * ioctl interface

    This patch set does not affect the "core" log interface, but it
    changes the "disk" log interface. To keep backward compatibility,
    the path names of the log devices are simply listed, without a
    count of log devices.

    <current interface>

        disk_ctr():
             log_path region_size [[no]sync]

        disk_status() - STATUSTYPE_INFO:
             nr_params disk log_path log_status:"A" or "D"

        disk_status() - STATUSTYPE_TABLE:
            disk nr_params log_path region_size [[no]sync]

    <proposed interface>

        disk_ctr():
            [log_path]{1,} region_size [[no]sync]

        disk_status() - STATUSTYPE_INFO:
            nr_params disk [log_path]{1,} [log_status:"A" or "D"]{1,}

        disk_status() - STATUSTYPE_TABLE:
            disk nr_params [log_path]{1,} region_size [[no]sync]
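
As a rough illustration of the proposed disk_ctr() argument format, the
parsing could look like the sketch below. It is written in Python for
brevity rather than kernel C, and it is not taken from the patch: the
function name parse_disk_log_args and the returned dictionary are
hypothetical. It assumes that the trailing arguments are region_size and
the optional [no]sync flag, and that everything before them is a log path.

```python
MAX_LOG_DEVICES = 9  # limit stated in the design overview


def parse_disk_log_args(args):
    """Split disk-log constructor arguments into paths, region size, sync flag.

    Hypothetical helper mirroring "[log_path]{1,} region_size [[no]sync]".
    """
    args = list(args)
    sync = None  # None means the optional flag was not given
    if args and args[-1] in ("sync", "nosync"):
        sync = args.pop()
    if len(args) < 2:
        raise ValueError("need at least one log_path and a region_size")
    region_size = int(args.pop())  # raises ValueError if not a number
    if len(args) > MAX_LOG_DEVICES:
        raise ValueError("too many log devices")
    return {"log_paths": args, "region_size": region_size, "sync": sync}
```

The single-log form, for example ["253:0", "1024"], goes through the same
path as a multi-log list, which is the backward compatibility described
above.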


EXAMPLES
========

  * create mirror with one log device (the same operation as usual)

    # dmsetup create mirror --table \
      "0 2097152 mirror disk 2 253:0 1024 2 253:1 0 253:2 0"
    # dmsetup status mirror
    0 2097152 mirror 2 253:1 253:2 960/2048 1 AA 3 disk 253:0 A

  * create mirror with two log devices [NEW]

    # dmsetup create mirror --table \
      "0 2097152 mirror disk 3 253:0 253:1 1024 2 253:2 0 253:3 0"
    # dmsetup status mirror
    0 2097152 mirror 2 253:2 253:3 2048/2048 1 AA 4 disk 253:0 253:1 AA

  * create mirror with four log devices [NEW]

    # dmsetup create mirror --table \
      "0 2097152 mirror disk 5 253:0 253:1 253:2 253:3 1024 \
       2 253:4 0 253:5 0"
    # dmsetup status mirror
      0 2097152 mirror 2 253:4 253:5 316/2048 1 AA 6 disk \
      253:0 253:1 253:2 253:3 AAAA
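
The status lines in the examples above can be read mechanically. The
following is a minimal sketch in Python, not part of the patch set; the
helper name parse_mirror_status is hypothetical, and the field layout is
inferred from the example output only. In particular, it assumes that the
"1" before the mirror health string counts the single token that follows,
and that a continued status line has been joined onto one line first.

```python
def parse_mirror_status(line):
    """Parse a `dmsetup status` line of a mirror target that uses a disk log."""
    tok = line.split()
    if len(tok) < 3 or tok[2] != "mirror":
        raise ValueError("not a mirror status line")
    nr_mirrors = int(tok[3])
    pos = 4
    mirrors = tok[pos:pos + nr_mirrors]
    pos += nr_mirrors
    in_sync, total = (int(x) for x in tok[pos].split("/"))
    pos += 1
    nr_health = int(tok[pos])  # assumed: number of health tokens
    health = tok[pos + 1:pos + 1 + nr_health]
    pos += 1 + nr_health
    nr_log_params = int(tok[pos])
    log_params = tok[pos + 1:pos + 1 + nr_log_params]
    if len(log_params) != nr_log_params or log_params[0] != "disk":
        raise ValueError("unexpected log parameters")
    # Proposed layout: "disk", one path per log, then one status char per log.
    log_paths, log_status = log_params[1:-1], log_params[-1]
    if len(log_paths) != len(log_status):
        raise ValueError("expected one status character per log device")
    return {
        "mirrors": mirrors,
        "mirror_health": "".join(health),
        "in_sync": in_sync,
        "total": total,
        "log_status": dict(zip(log_paths, log_status)),
    }
```

A monitoring script could then list failed log devices by selecting the
entries of "log_status" whose value is "D".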


FUTURE WORKS
============

  * Independent header and bitmap I/O

    Currently disk_header holds both the header and the bitmap, and
    I/O is issued for the two together. But the header needs to be
    modified only during the construction and resume procedures, so
    the header I/O can be removed at other times.

  * Partial bitmap update

    The bitmap can become large if the mirror devices are huge or the
    region size is small. In the current implementation, the disk log
    handles the bitmap as one buffer and writes the whole bitmap on every
    update. Therefore, the amount of I/O issued to a log device can be
    larger than the amount issued to the mirror devices. A partial bitmap
    update would issue I/O only for the updated sectors.
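
To give a feel for the sizes involved, here is a small Python sketch of
the arithmetic. It is only an illustration under stated assumptions: one
bit per region, region_size given in 512-byte sectors, and no header or
word-alignment padding, which the real on-disk layout may add. The
function names are hypothetical.

```python
SECTOR_SIZE = 512  # bytes per sector


def region_count(dev_sectors, region_sectors):
    """Number of regions, rounding up for a partial last region."""
    return -(-dev_sectors // region_sectors)


def bitmap_bytes(regions):
    """Bytes needed to hold one bit per region."""
    return -(-regions // 8)


def bitmap_sectors_touched(changed_regions):
    """Bitmap sectors (0-based from the bitmap start) holding the given bits."""
    bits_per_sector = SECTOR_SIZE * 8
    return sorted({r // bits_per_sector for r in changed_regions})
```

For the 2097152-sector mirror with region size 1024 used in the examples,
this gives 2048 regions, which matches the "2048" total in the status
output, and a 256-byte bitmap. For a mirror of 2**31 sectors (1 TiB) with
the same region size, the bitmap is 262144 bytes, or 512 sectors, while a
change to a single region touches only one of those sectors.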

---
Takahiro Yasui
Hitachi Computer Products (America) Inc.


* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-11-26  0:01 [RFC][PATCH 0/4] dm-log: support multi-log devices Takahiro Yasui
@ 2008-11-26 18:56 ` Phillip Susi
  2008-11-27  7:50   ` Takahiro Yasui
  2008-12-12 20:21 ` Jonathan Brassow
  1 sibling, 1 reply; 24+ messages in thread
From: Phillip Susi @ 2008-11-26 18:56 UTC (permalink / raw)
  To: device-mapper development

Takahiro Yasui wrote:
> Hi All,
> 
> This is my first post to this mailing list, but let me introduce
> a patch set to implement multi-log devices for dm-mirror.
> I appreciate your kind comments and suggestions on this patch set.

Does it keep track of which log device corresponds to each mirror device
and make sure that the log on disk X is updated before the data on disk
X?  In other words, if you are about to write data to disk 1 that would
cause that section to be flagged as dirty, you have to update the log on
disk 1 first, not the log on disk 2.

The syntax and the fact that the number of logs does not have to equal
the number of mirrors makes me think this is not the case.


* Re: Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-11-26 18:56 ` Phillip Susi
@ 2008-11-27  7:50   ` Takahiro Yasui
  2008-11-28 20:06     ` Phillip Susi
  0 siblings, 1 reply; 24+ messages in thread
From: Takahiro Yasui @ 2008-11-27  7:50 UTC (permalink / raw)
  To: device-mapper development

Phillip Susi wrote:
> Takahiro Yasui wrote:
>> Hi All,
>>
>> This is my first post to this mailing list, but let me introduce
>> a patch set to implement multi-log devices for dm-mirror.
>> I appreciate your kind comments and suggestions on this patch set.
> 
> Does it keep track of which log device corresponds to each mirror device
> and make sure that the log on disk X is updated before the data on disk
> X?  In other words, if you are about to write data to disk 1 that would
> cause that section to be flagged as dirty, you have to update the log on
> disk 1 first, not the log on disk 2.
> 
> The syntax and the fact that the number of logs does not have to equal
> the number of mirrors makes me think this is not the case.

The bitmap on a log disk indicates whether each region is "clean" or
"dirty" across the mirror disks. If the bit for a region is "1", the
region is "clean": its data is synchronized on all mirror disks, and they
all store the same data in that region. On the other hand, if the bit for
a region is "0", the region is "dirty", that is, out of synchronization:
the mirror disks might contain different data in that region.

Therefore, a log disk holds the state of each region, but does not
correspond to a specific mirror device.
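
The bit convention described above can be sketched in a few lines of
Python. This is an illustration only, not code from the patch: the helper
names are hypothetical, and the bit order within a byte (least significant
bit first) is an assumption made for the example.

```python
def region_is_clean(bitmap, region):
    """True if the bit for `region` is 1 ("clean"), False if 0 ("dirty")."""
    return bool(bitmap[region // 8] & (1 << (region % 8)))


def mark_region(bitmap, region, clean):
    """Set the bit for `region` to 1 (clean) or clear it to 0 (dirty)."""
    mask = 1 << (region % 8)
    if clean:
        bitmap[region // 8] |= mask
    else:
        bitmap[region // 8] &= ~mask & 0xFF
```

Here `bitmap` is a bytearray with one bit per region. Nothing in it
identifies a particular mirror device, only whether a region is in sync.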

This patch set introduces redundant log disks for dm-mirror.

Thanks,
---
Takahiro Yasui
Hitachi Computer Products (America) Inc.


* Re: Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-11-27  7:50   ` Takahiro Yasui
@ 2008-11-28 20:06     ` Phillip Susi
  2008-12-01  7:00       ` Takahiro Yasui
  0 siblings, 1 reply; 24+ messages in thread
From: Phillip Susi @ 2008-11-28 20:06 UTC (permalink / raw)
  To: device-mapper development

Takahiro Yasui wrote:
> bitmap data on a log disk indicates if each region are "clean" or "dirty"
> among mirror disks. If a bit related to a region is "1", it means that
> the region is "clean" and data in the region is synchronized in all mirror
> disks and they store the same data in the region. On the other hand,
> if a bit related to a region is "0", the region is "dirty" and the region
> is out of synchronization. It means that each mirror disk might contain
> different data in the region.
> 
> Therefore, a log disk contains a state of each region, but does not
> correspond to a specific mirror device.
> 
> This patch set introduces redundant log disk on dm-mirror.

Right... and when recording a log on every disk in the mirror, each copy
of the log may not contain exactly the same information at any given
time.  If you write to one disk in the mirror first, then you need to
mark the region as dirty on that disk first, so that if the system
crashes before you can copy the data to the other mirror, you can see
that the first disk is more up to date than the second disk.

In other words, knowing that a region is or is not synchronized across
each disk is not enough; if they are out of sync you need to figure out
which disk has the most current information so it can be replicated to
the others, don't you?

Or do you just always write to the first disk first, and assume it has
the most recent data if the region was marked as dirty in ANY of the logs?


* Re: Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-11-28 20:06     ` Phillip Susi
@ 2008-12-01  7:00       ` Takahiro Yasui
  2008-12-01 11:09         ` Michał Mirosław
  2008-12-01 16:10         ` Phillip Susi
  0 siblings, 2 replies; 24+ messages in thread
From: Takahiro Yasui @ 2008-12-01  7:00 UTC (permalink / raw)
  To: device-mapper development

Phillip Susi wrote:
> Takahiro Yasui wrote:
>> bitmap data on a log disk indicates if each region are "clean" or "dirty"
>> among mirror disks. If a bit related to a region is "1", it means that
>> the region is "clean" and data in the region is synchronized in all mirror
>> disks and they store the same data in the region. On the other hand,
>> if a bit related to a region is "0", the region is "dirty" and the region
>> is out of synchronization. It means that each mirror disk might contain
>> different data in the region.
>>
>> Therefore, a log disk contains a state of each region, but does not
>> correspond to a specific mirror device.
>>
>> This patch set introduces redundant log disk on dm-mirror.
> 
> Right... and when recording a log on every disk in the mirror, each copy
> of the log may not contain exactly the same information at any given
> time.  If you write to one disk in the mirror first, then you need to
> mark the region as dirty on that disk first, so that if the system
> crashes before you can copy the data to the other mirror, you can see
> that the first disk is more up to date than the second disk.
> 
> In other words, knowing that a region is or is not synchronized across
> each disk is not enough; if they are out of sync you need to figure out
> which disk has the most current information so it can be replicated to
> the others, don't you?
>
> Or do you just always write to the first disk first, and assume it has
> the most recent data if the region was marked as dirty in ANY of the logs?

Log disks are updated in parallel, so if the system crashes during write
operations on the log devices, we do not know which log disk has the
latest, correct data. But this is not a problem.

There are two cases we need to think about.

1) Some log devices contain "clean", but mirror devices are not synchronized

This case would be a problem, but it never happens, because data is
written to the mirror devices only after the log devices have been marked
"dirty", and the region is marked "clean" again only after the write I/Os
to the mirror devices have completed and the mirrors are synchronized.

2) Some log devices contain "dirty", but mirror devices are synchronized

This case may happen but is not a problem. The region is simply
replicated among the mirror devices when the mirror is resumed.
The same case can occur with the current single log if the system
crashes after the log device has been marked "dirty" and before it
has been marked "clean" again.
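
The ordering argument in the two cases above can be checked with a toy
model. The Python sketch below is not the dm-mirror code: it applies the
updates one device at a time, whereas the real log updates run in
parallel, and the rule that a region is resynchronized when ANY log marks
it dirty is an assumption made for the example. All names are
hypothetical.

```python
def write_region(logs, mirrors, region, data, crash_after=None):
    """Write `data` to `region` in the order described above.

    Steps: 1) mark the region dirty on every log, 2) write every mirror,
    3) mark the region clean on every log. `crash_after` stops the
    sequence after that many single-device updates to simulate a crash.
    """
    steps = [(log, "dirty") for log in logs]
    steps += [(mirror, data) for mirror in mirrors]
    steps += [(log, "clean") for log in logs]
    for done, (target, value) in enumerate(steps):
        if crash_after is not None and done >= crash_after:
            return False  # crashed before this update reached the device
        target[region] = value
    return True


def needs_resync(logs, region):
    """Assumed rule: resynchronize if ANY log marks the region dirty."""
    return any(log.get(region, "clean") == "dirty" for log in logs)


def mirrors_differ(mirrors, region):
    """True if the mirrors hold different data for the region."""
    return len({m.get(region) for m in mirrors}) > 1
```

Running write_region() with every possible crash point and two logs shows
that case 1 (mirrors differ while the logs say "clean") never arises in
this model, while case 2 (a log says "dirty" although the mirrors agree)
does, and costs only a resynchronization of that region.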

Thanks,
---
Takahiro Yasui
Hitachi Computer Products (America) Inc.


* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-01  7:00       ` Takahiro Yasui
@ 2008-12-01 11:09         ` Michał Mirosław
  2008-12-02  6:05           ` Takahiro Yasui
  2008-12-01 16:10         ` Phillip Susi
  1 sibling, 1 reply; 24+ messages in thread
From: Michał Mirosław @ 2008-12-01 11:09 UTC (permalink / raw)
  To: dm-devel

2008/12/1 Takahiro Yasui <tyasui@redhat.com>:
> Phillip Susi wrote:
[...]
>> Right... and when recording a log on every disk in the mirror, each copy
>> of the log may not contain exactly the same information at any given
>> time.  If you write to one disk in the mirror first, then you need to
>> mark the region as dirty on that disk first, so that if the system
>> crashes before you can copy the data to the other mirror, you can see
>> that the first disk is more up to date than the second disk.
>>
>> In other words, knowing that a region is or is not synchronized across
>> each disk is not enough; if they are out of sync you need to figure out
>> which disk has the most current information so it can be replicated to
>> the others, don't you?
>>
>> Or do you just always write to the first disk first, and assume it has
>> the most recent data if the region was marked as dirty in ANY of the logs?
>
> log disks are updated in parallel and we do not know which disk has the
> latest and correct data if the system crashes during write operations
> on log devices. But there is no problem about it.
>
> There are two cases we need to think about.
>
> 1) Some log devices contain "clean", but mirror devices are not synchronized
>
> This case is problematic, but never happens, because data is written on
> mirror devices after marking log devices "dirty", and make it "clean"
> after write I/Os on mirror devices completed and mirrors get synchronized.
>
> 2) Some log devices contain "dirty", but mirror devices are synchronized
>
> This case may happen but is not problematic. Just data replication of
> the region among mirror devices will be done when the mirror is resumed.
> This case would also happen on the system with the current single log if
> the system crashes after marking a log device "dirty" and before marking
> it back to "clean".

What happens if some log devices contain "dirty" and not all mirrors were written
yet before a crash? How do you know which mirror has the most recent data?
Are the writes to mirrors ordered somehow?

Best Regards,
Michal Miroslaw


* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-01  7:00       ` Takahiro Yasui
  2008-12-01 11:09         ` Michał Mirosław
@ 2008-12-01 16:10         ` Phillip Susi
  2008-12-02  4:52           ` Takahiro Yasui
  1 sibling, 1 reply; 24+ messages in thread
From: Phillip Susi @ 2008-12-01 16:10 UTC (permalink / raw)
  To: device-mapper development

Takahiro Yasui wrote:
> log disks are updated in parallel and we do not know which disk has the
> latest and correct data if the system crashes during write operations
> on log devices. But there is no problem about it.

Once the IO request has been completed, the data needs to be stable on
the disk.  This means that either you have to wait until the data has
been written to all underlying mirror devices before completing the
request ( slow ) or you have to have some way of knowing which disk(s)
got written to, and which ones need updated after a crash.  Are you
saying you take the former path?

> There are two cases we need to think about.
> 
> 1) Some log devices contain "clean", but mirror devices are not synchronized
> 
> This case is problematic, but never happens, because data is written on
> mirror devices after marking log devices "dirty", and make it "clean"
> after write I/Os on mirror devices completed and mirrors get synchronized.

Does the entire log-data-log update cycle complete before dm completes
the higher level IO request?  That would maintain data integrity, but at
significant cost to performance.

For performance sake, don't you want to allow write requests to be
completed before the log is necessarily marked as clean again?  That way
multiple writes to the same data zone do not require multiple log
dirty/clean updates.  Also for performance reasons, don't you want to
allow the data to be written to only one mirror before completing the
request?  Then go back and do lazy synchronization?


* Re: Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-01 16:10         ` Phillip Susi
@ 2008-12-02  4:52           ` Takahiro Yasui
  0 siblings, 0 replies; 24+ messages in thread
From: Takahiro Yasui @ 2008-12-02  4:52 UTC (permalink / raw)
  To: device-mapper development

Phillip Susi wrote:
> Takahiro Yasui wrote:
>> log disks are updated in parallel and we do not know which disk has the
>> latest and correct data if the system crashes during write operations
>> on log devices. But there is no problem about it.
> 
> Once the IO request has been completed, the data needs to be stable on
> the disk.  This means that either you have to wait until the data has
> been written to all underlying mirror devices before completing the
> request ( slow ) or you have to have some way of knowing which disk(s)
> got written to, and which ones need updated after a crash.  Are you
> saying you take the former path?

Yes, the write I/O to all underlying mirror devices needs to complete.
I understand your concern, and I think there is room to study
performance enhancements.

>> There are two cases we need to think about.
>>
>> 1) Some log devices contain "clean", but mirror devices are not synchronized
>>
>> This case is problematic, but never happens, because data is written on
>> mirror devices after marking log devices "dirty", and make it "clean"
>> after write I/Os on mirror devices completed and mirrors get synchronized.
> 
> Does the entire log-data-log update cycle complete before dm completes
> the higher level IO request?  That would maintain data integrity, but at
> significant cost to performance.

The I/O request returns to the higher level once the data I/O has
completed; the log device is updated later.

> For performance sake, don't you want to allow write requests to be
> completed before the log is necessarily marked as clean again?  That way
> multiple writes to the same data zone do not require multiple log
> dirty/clean updates.  Also for performance reasons, don't you want to
> allow the data to be written to only one mirror before completing the
> request?  Then go back and do lazy synchronization?

I am thinking along exactly the lines you mention, and it would improve
the performance of dm-mirror. I am now trying to improve performance in
terms of:

> FUTURE WORKS
>   * Independent header and bitmap I/O
>   * Partial bitmap update

Thanks,
---
Takahiro Yasui
Hitachi Computer Products (America) Inc.


* Re: Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-01 11:09         ` Michał Mirosław
@ 2008-12-02  6:05           ` Takahiro Yasui
  0 siblings, 0 replies; 24+ messages in thread
From: Takahiro Yasui @ 2008-12-02  6:05 UTC (permalink / raw)
  To: device-mapper development

Michał Mirosław wrote:
> 2008/12/1 Takahiro Yasui <tyasui@redhat.com>:
>> Phillip Susi wrote:
> [...]
>>> Right... and when recording a log on every disk in the mirror, each copy
>>> of the log may not contain exactly the same information at any given
>>> time.  If you write to one disk in the mirror first, then you need to
>>> mark the region as dirty on that disk first, so that if the system
>>> crashes before you can copy the data to the other mirror, you can see
>>> that the first disk is more up to date than the second disk.
>>>
>>> In other words, knowing that a region is or is not synchronized across
>>> each disk is not enough; if they are out of sync you need to figure out
>>> which disk has the most current information so it can be replicated to
>>> the others, don't you?
>>>
>>> Or do you just always write to the first disk first, and assume it has
>>> the most recent data if the region was marked as dirty in ANY of the logs?
>> log disks are updated in parallel and we do not know which disk has the
>> latest and correct data if the system crashes during write operations
>> on log devices. But there is no problem about it.
>>
>> There are two cases we need to think about.
>>
>> 1) Some log devices contain "clean", but mirror devices are not synchronized
>>
>> This case is problematic, but never happens, because data is written on
>> mirror devices after marking log devices "dirty", and make it "clean"
>> after write I/Os on mirror devices completed and mirrors get synchronized.
>>
>> 2) Some log devices contain "dirty", but mirror devices are synchronized
>>
>> This case may happen but is not problematic. Just data replication of
>> the region among mirror devices will be done when the mirror is resumed.
>> This case would also happen on the system with the current single log if
>> the system crashes after marking a log device "dirty" and before marking
>> it back to "clean".
> 
> What happens if some log devices contain "dirty" and not all mirrors were written
> yet before a crash? How do you know which mirror has the most recent data?
> Are the writes to mirrors ordered somehow?

What happens if it is a raw device rather than dm-mirror? The I/O has
not completed yet and the request has not returned to the upper layer.
If the system crashes at this point, no one knows which data, new or old,
is on the device, and an application such as a database has to take
responsibility for the transaction if necessary.

In the dm-mirror situation you asked about, we do not know which mirror
has the latest data, but I think that is not a problem.

Thanks,
---
Takahiro Yasui
Hitachi Computer Products (America) Inc.


* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-11-26  0:01 [RFC][PATCH 0/4] dm-log: support multi-log devices Takahiro Yasui
  2008-11-26 18:56 ` Phillip Susi
@ 2008-12-12 20:21 ` Jonathan Brassow
  2008-12-15 14:56   ` Takahiro Yasui
  1 sibling, 1 reply; 24+ messages in thread
From: Jonathan Brassow @ 2008-12-12 20:21 UTC (permalink / raw)
  To: device-mapper development


On Nov 25, 2008, at 6:01 PM, Takahiro Yasui wrote:
> PATCH SET
> =========
>  1/4: dm-log: fix no io_client_destroy

definitely, ACK.

>  2/4: dm-log: remove unnecessary updates of io_req members

haven't fully reviewed yet.

>  3/4: dm-log: introduce multi-log devices
>  4/4: dm-log: update interface to control multi-log devices

No.  more follows...

> BACKGROUND
> ==========

<snip>

> However, once log disk breaks down, data replications are required
> for a whole data disks, and if a size of data disk is huge, it
> takes long time

Not entirely true.  When the log disk breaks down /and/ the machine  
crashes or reboots, then resynchronization is necessary.  However,  
this means that in almost all circumstances, you are immediately able  
to replace the failed disk log with another and maintain the in-sync  
state of the log - avoiding the resync.

> This patch introduces multi-log devices for mirror target, which
> stores log data on multiple log devices, and decreases probability
> of disk replication even if one log disk has broken down.

Given what I said above, I'd like to see intelligence added to the  
dmeventd mirror DSO to handle replacing mirror logs done first.  There  
is certainly a lot of low lying fruit in that space.  However, I can  
see a small benefit to implementing multi-log.  Specifically, to  
address the case where a log device dies and is immediately followed  
by a machine failure before corrective action can be taken.  IOW, you  
are targeting a very small window of time here.

If you choose to take on the multi-log (which it appears you are
mostly done), then I'd like to see it as a separate module.  IOW,
there should be no code changes to dm-log.c.  You would implement a
new module, named dm-log-multi[ple].ko.  The name would be
"multi-disk".  This frees you to have whatever constructor table
arguments you want.  (I've done something similar when creating my
cluster-aware logging.)

I'll be on the call on Monday, of course you can ask any questions  
then too.

  brassow


* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-12 20:21 ` Jonathan Brassow
@ 2008-12-15 14:56   ` Takahiro Yasui
  2008-12-20  0:13     ` malahal
  0 siblings, 1 reply; 24+ messages in thread
From: Takahiro Yasui @ 2008-12-15 14:56 UTC (permalink / raw)
  To: device-mapper development

Hi Jon,

Thanks for your kind comments.

Jonathan Brassow wrote:
> On Nov 25, 2008, at 6:01 PM, Takahiro Yasui wrote:
>> PATCH SET
>> =========
>>  1/4: dm-log: fix no io_client_destroy
> 
> definitely, ACK.
> 
>>  2/4: dm-log: remove unnecessary updates of io_req members
> 
> haven't fully reviewed yet.
> 
>>  3/4: dm-log: introduce multi-log devices
>>  4/4: dm-log: update interface to control multi-log devices
> 
> No.  more follows...
> 
>> BACKGROUND
>> ==========
> 
> <snip>
> 
>> However, once log disk breaks down, data replications are required
>> for a whole data disks, and if a size of data disk is huge, it
>> takes long time
> 
> Not entirely true.  When the log disk breaks down /and/ the machine  
> crashes or reboots, then resynchronization is necessary.  However,  
> this means that in almost all circumstances, you are immediately able  
> to replace the failed disk log with another and maintain the in-sync  
> state of the log - avoiding the resync.
> 
>> This patch introduces multi-log devices for mirror target, which
>> stores log data on multiple log devices, and decreases probability
>> of disk replication even if one log disk has broken down.
> 
> Given what I said above, I'd like to see intelligence added to the  
> dmeventd mirror DSO to handle replacing mirror logs done first.  There  
> is certainly a lot of low lying fruit in that space.  However, I can  
> see a small benefit to implementing multi-log.  Specifically, to  
> address the case where a log device dies and is immediately followed  
> by a machine failure before corrective action can be taken.  IOW, you  
> are targeting a very small window of time here.

Yes, as you mentioned, it is a very small window of time. But mirroring
may be used for very critical systems, and users would care about even
such a small window.

In addition, the multi-log feature is effective and very important
at system boot. It is quite possible for disks to break at that point.
I think that the current log feature cannot handle this situation:
there is only one log disk, so the only way to construct the mirror is
with a "core" log, and a whole-disk replication is then triggered.

Do you have a good idea for handling this case, too? If so, it could be
another solution to our concern.

> If you choose to take on the multi-log (which it appears you are  
> mostly done), then I'd like to see it as a separate module.  IOW,  
> there should be no code changes to dm-log.c.  You would implement a  
> new module, named dm-log-multi[ple].ko.  The name would be "multi- 
> disk".  This frees you to have whatever constructor table arguments  
> you want.  (I've done something similar when creating my cluster-aware  
> logging.)

Yes, it is easy to implement the multi-log feature as a separate
module, but many functions are common to both modules.
For maintainability, I think that those functions should be shared
by both modules rather than maintained separately in each.

Could you give me any suggestions for sharing the common functions?

Thanks,
---
Takahiro Yasui
Hitachi Computer Products (America) Inc.


* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-15 14:56   ` Takahiro Yasui
@ 2008-12-20  0:13     ` malahal
  2008-12-23  0:23       ` Takahiro Yasui
  0 siblings, 1 reply; 24+ messages in thread
From: malahal @ 2008-12-20  0:13 UTC (permalink / raw)
  To: dm-devel; +Cc: agk

Takahiro Yasui [tyasui@redhat.com] wrote:
> In addition, the multi-log feature is efficient and very important
> at system booting. It is highly possible that disks get broken.
> I think that the current log feature can not save this situation
> because there is only one log disk and there is only way to construct
> a mirror by "core" log. Then, a whole disk replication will be triggered.
> 
> Do you have a good idea to save it, too? If there is, it could be
> an another solution of our concern.

A while back IBM posted a patch to LVM that constructs a log device with
a mirror and then creates the real mirror using such a mirrored log
device. I think this may solve your problem. It was completely written
in LVM and Stefan refreshed it to the latest LVM.

Does anyone think this user-level-only solution is any better? We will be
very happy to have either one of the solutions upstream!

Here is a link to the posted patch
(http://permalink.gmane.org/gmane.linux.kernel.device-mapper.devel/3988)


--Malahal.


* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-20  0:13     ` malahal
@ 2008-12-23  0:23       ` Takahiro Yasui
  2008-12-23 18:02         ` malahal
  0 siblings, 1 reply; 24+ messages in thread
From: Takahiro Yasui @ 2008-12-23  0:23 UTC (permalink / raw)
  To: dm-devel; +Cc: agk

Hi Malahal,

malahal@us.ibm.com wrote:
> Takahiro Yasui [tyasui@redhat.com] wrote:
>> In addition, the multi-log feature is efficient and very important
>> at system booting. It is highly possible that disks get broken.
>> I think that the current log feature can not save this situation
>> because there is only one log disk and there is only way to construct
>> a mirror by "core" log. Then, a whole disk replication will be triggered.
>>
>> Do you have a good idea to save it, too? If there is, it could be
>> an another solution of our concern.
> 
> A while back IBM posted a patch to LVM that constructs a log device with
> a mirror and then creates the real mirror using such a mirrored log
> device. I think this may solve your problem. It was completely written
> in LVM and Stefan refreshed it to the latest LVM.

Thank you for the comment and information. Your approach seems to
address my problem, too. I do have a concern about write performance,
because an additional mirror mapping might introduce additional delay
and overhead. In addition, errors on log devices are better handled in
a simple way, and basic error handling would be sufficient.

I couldn't find any discussion after you posted the patch.
Could you tell me whether IBM has the same background as I do,
or whether you have a different issue to solve? I would also like to
know whether my approach solves your problem.

> Anyone thinks this user level only solution is any better? We will be
> very happy to have any one of the solutions in upstream!

Yes, I appreciate the comments, and I would also be glad to have a
solution upstream.

> Here is a link the posted patch
> (http://permalink.gmane.org/gmane.linux.kernel.device-mapper.devel/3988)
> 
> 
> --Malahal.

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-23  0:23       ` Takahiro Yasui
@ 2008-12-23 18:02         ` malahal
  2008-12-31 20:15           ` malahal
  2009-01-05 15:44           ` Takahiro Yasui
  0 siblings, 2 replies; 24+ messages in thread
From: malahal @ 2008-12-23 18:02 UTC (permalink / raw)
  To: Takahiro Yasui; +Cc: dm-devel, agk

Takahiro Yasui [tyasui@redhat.com] wrote:
> Hi Malahal,
> > A while back IBM posted a patch to LVM that constructs a log device with
> > a mirror and then creates the real mirror using such a mirrored log
> > device. I think this may solve your problem. It was completely written
> > in LVM and Stefan refreshed it to the latest LVM.
> 
> Thank you for the comment and information. It seems that your
> approach seems to address my problem, too. Here I have a concern
> about write performance because an additional mirror mapping might
> introduce additional delay and overhead. In addition, error for
> log devices is better to be handled by the simple way, and a basic
> error handling would work.

In theory yes, but I doubt it would be user visible that much. We expect
transient failures under some circumstances, so we would like to handle
them. In other words, a failed device is expected to come back and the
mirror target should re-integrate it automatically when it comes back.
Can your multi-log code handle re-synchronizing a log device?

With our user level only implementation, the log device handling would
be as good as the real mirror *leg* handling. We get all the benefits of
the mirror without writing any code! Wouldn't it be nice?
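
To make the stacking idea concrete, here is a minimal sketch of the two
device-mapper tables such a setup could load: an inner mirror that holds
the log, and an outer mirror that uses the inner one as its "disk" log.
The device paths, sizes, region size and the name /dev/mapper/vg-mlog are
assumptions chosen for illustration, as is the choice of a "core" log for
the inner mirror; none of them are taken from the posted LVM patch.

```python
def mirror_table(size_sectors, log_args, legs):
    """Build a one-line dm "mirror" table.

    log_args: log type, argument count and arguments, for example
              ["core", "1", "1024"] or ["disk", "2", "/dev/xyz", "1024"].
    legs:     list of (device, offset) pairs, one per mirror leg.
    """
    parts = ["0", str(size_sectors), "mirror", *log_args, str(len(legs))]
    for dev, offset in legs:
        parts += [dev, str(offset)]
    return " ".join(parts)


REGION = 1024  # region size in sectors (assumed value)

# Inner mirror: two small devices that together hold the log.
# It tracks its own state with an in-memory "core" log (assumption).
log_table = mirror_table(
    8192, ["core", "1", str(REGION)],
    [("/dev/sdc1", 0), ("/dev/sdd1", 0)],
)

# Outer mirror: the real data mirror, using the inner mirror
# (hypothetical name vg-mlog) as its "disk" log device.
data_table = mirror_table(
    2097152, ["disk", "2", "/dev/mapper/vg-mlog", str(REGION)],
    [("/dev/sda1", 0), ("/dev/sdb1", 0)],
)

print(log_table)
print(data_table)
```

Because the log is just another mirror, a failed log leg would be handled
by the same code path as a failed data leg, which is the benefit being
claimed here.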


> I couldn't find any discussion after you posted the patch.
> Could you tell me if IBM also have the same background as I have,
> or do you have another issue to solve? I would also like to know
> if my approach solves your problem.

Jonathan, Alasdair and I had discussions about the patch. I can send
them to you if you want.

As I said, we want to handle transient device failures. Can your patch
work with such log devices?

--Malahal.

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-23 18:02         ` malahal
@ 2008-12-31 20:15           ` malahal
  2009-01-08 18:30             ` Jonathan Brassow
  2009-01-05 15:44           ` Takahiro Yasui
  1 sibling, 1 reply; 24+ messages in thread
From: malahal @ 2008-12-31 20:15 UTC (permalink / raw)
  To: Takahiro Yasui, dm-devel, agk, jbrassow

Alasdair, Jonathan: Any comments regarding kernel module vs
implementing it entirely within LVM to support this multi-log feature?

Thanks, Malahal.

malahal@us.ibm.com [malahal@us.ibm.com] wrote:
> Takahiro Yasui [tyasui@redhat.com] wrote:
> > Hi Malahal,
> > > A while back IBM posted a patch to LVM that constructs a log device with
> > > a mirror and then creates the real mirror using such a mirrored log
> > > device. I think this may solve your problem. It was completely written
> > > in LVM and Stefan refreshed it to the latest LVM.
> > 
> > Thank you for the comment and information. It seems that your
> > approach seems to address my problem, too. Here I have a concern
> > about write performance because an additional mirror mapping might
> > introduce additional delay and overhead. In addition, error for
> > log devices is better to be handled by the simple way, and a basic
> > error handling would work.
> 
> In theory yes, but I doubt it would be user visible that much. We expect
> transient failures under some circumstances, so we would like to handle
> them. In other words, a failed device is expected to come back and the
> mirror target should re-integrate it automatically when it comes back.
> Can your multi-log code handle re-synchronizing a log device?
> 
> With our user level only implementation, the log device handling would
> be as good as the real mirror *leg* handling. We get all the benefits of
> the mirror without doing any code! Wouldn't it be nice?
> 
> 
> > I couldn't find any discussion after you posted the patch.
> > Could you tell me if IBM also have the same background as I have,
> > or do you have another issue to solve? I would also like to know
> > if my approach solves your problem.
> 
> Jonathan, Alasdair and I had discussions about the patch. I can send
> them to you if you want.
> 
> As I said, we want to handle transient device failures. Can your patch
> work with such log devices?
> 
> --Malahal.

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-23 18:02         ` malahal
  2008-12-31 20:15           ` malahal
@ 2009-01-05 15:44           ` Takahiro Yasui
  2009-01-05 16:24             ` malahal
  1 sibling, 1 reply; 24+ messages in thread
From: Takahiro Yasui @ 2009-01-05 15:44 UTC (permalink / raw)
  To: malahal; +Cc: dm-devel, agk

Hi Malahal,

Sorry for my late reply.

malahal@us.ibm.com wrote:
> Takahiro Yasui [tyasui@redhat.com] wrote:
>> Hi Malahal,
>>> A while back IBM posted a patch to LVM that constructs a log device with
>>> a mirror and then creates the real mirror using such a mirrored log
>>> device. I think this may solve your problem. It was completely written
>>> in LVM and Stefan refreshed it to the latest LVM.
>> Thank you for the comment and information. It seems that your
>> approach seems to address my problem, too. Here I have a concern
>> about write performance because an additional mirror mapping might
>> introduce additional delay and overhead. In addition, error for
>> log devices is better to be handled by the simple way, and a basic
>> error handling would work.
> 
> In theory yes, but I doubt it would be user visible that much. We expect
> transient failures under some circumstances, so we would like to handle
> them. In other words, a failed device is expected to come back and the
> mirror target should re-integrate it automatically when it comes back.
> Can your multi-log code handle re-synchronizing a log device?

My patch doesn't handle transient errors at present. I expect a log
device to be marked as failed and to stay blocked once an error has
happened.

> With our user level only implementation, the log device handling would
> be as good as the real mirror *leg* handling. We get all the benefits of
> the mirror without doing any code! Wouldn't it be nice?

I agree that a simple implementation is better, but the log could be
handled without any additional layer, and I think it could be handled
in a simpler way.

Lower layers, such as SCSI, also have a retry feature based on the
error type, and retries will be done in the proper way there. Do you
mean that this isn't enough and that the dm layer should handle errors
for log devices, too?

I introduced the multi-log feature so that the system can keep running
even if the lower layer cannot recover from errors.

Thanks,
Taka

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2009-01-05 15:44           ` Takahiro Yasui
@ 2009-01-05 16:24             ` malahal
  2009-01-05 17:36               ` Takahiro Yasui
  0 siblings, 1 reply; 24+ messages in thread
From: malahal @ 2009-01-05 16:24 UTC (permalink / raw)
  To: Takahiro Yasui; +Cc: dm-devel, agk

Takahiro Yasui [tyasui@redhat.com] wrote:
> Hi Malahal,
> 
> my patch doesn't handle transient error now. I expect log devices
> to be failed and got in a blockage status once an error has happened.
> 
> > With our user level only implementation, the log device handling would
> > be as good as the real mirror *leg* handling. We get all the benefits of
> > the mirror without doing any code! Wouldn't it be nice?
> 
> I agree that simple implementation is better, but log could be handled
> without any additional layer, and also I'm thinking that log could be
> handled in the simpler way.
> 
> Lower layer, such as SCSI, also has retry feature based on error type
> and will be done in the proper way. Do you mean that it isn't enough
> and should dm-layer handle errors for log device, too?

Not really. What I meant is re-integrating a failed log device when it
comes back again. That is also what I mean by handling 'transient
errors'.

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2009-01-05 16:24             ` malahal
@ 2009-01-05 17:36               ` Takahiro Yasui
  2009-01-05 18:44                 ` malahal
  0 siblings, 1 reply; 24+ messages in thread
From: Takahiro Yasui @ 2009-01-05 17:36 UTC (permalink / raw)
  To: malahal; +Cc: dm-devel, agk

malahal@us.ibm.com wrote:
> Takahiro Yasui [tyasui@redhat.com] wrote:
>> Hi Malahal,
>>
>> my patch doesn't handle transient error now. I expect log devices
>> to be failed and got in a blockage status once an error has happened.
>>
>>> With our user level only implementation, the log device handling would
>>> be as good as the real mirror *leg* handling. We get all the benefits of
>>> the mirror without doing any code! Wouldn't it be nice?
>> I agree that simple implementation is better, but log could be handled
>> without any additional layer, and also I'm thinking that log could be
>> handled in the simpler way.
>>
>> Lower layer, such as SCSI, also has retry feature based on error type
>> and will be done in the proper way. Do you mean that it isn't enough
>> and should dm-layer handle errors for log device, too?
> 
> Not really. What I meant is re-integrating a failed log device when it
> comes back again. That is also what I mean by handling 'transient
> errors'.

Thanks, I see your requirement for this feature.
Let me ask one more question.

LVM mirroring is used to keep a system available even if some devices,
such as data and log devices, have a problem. Currently, activation
by the "vgchange -ay" command seems to fail during system boot if one
of the devices belonging to the VG is missing or has an I/O error. For
example, if a mirror is made up of two data devices and log devices,
the mirror logical device should be activated and usable even if one
data device and one log device are missing.

Could you give me an idea or solution to handle this? Do we need to
enhance your feature to meet this requirement?

I appreciate your comments.

Thanks,
Taka

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2009-01-05 17:36               ` Takahiro Yasui
@ 2009-01-05 18:44                 ` malahal
  2009-01-05 18:51                   ` Alasdair G Kergon
  2009-01-05 22:26                   ` Takahiro Yasui
  0 siblings, 2 replies; 24+ messages in thread
From: malahal @ 2009-01-05 18:44 UTC (permalink / raw)
  To: Takahiro Yasui; +Cc: dm-devel, agk

Takahiro Yasui [tyasui@redhat.com] wrote:
> > Not really. What I meant is re-integrating a failed log device when it
> > comes back again. That is also what I mean by handling 'transient
> > errors'.
> 
> Thanks, I see your requirement on this feature.
> Let me put one more question.
> 
> LVM mirroring is used to make system available even if some devices,
> such as data and log device, have a problem. Currently, activations
> by "vgchange -ay" command seem to fail during system booting if one
> of devices related to VG are missing or had I/O error. For example,
> if a mirror is structured by two data devices and log devices, the
> mirror logical device should be activated and used even if one data
> device and one log device are missing.
>
> Could you give me an idea or solution to handle this? Do we need to
> enhance your feature to achieve this requirement?

The requirement you are describing goes beyond "mirrored log" support.
Actually it has nothing to do with "mirrored log" support! LVM has a
"--partial" option, but that creates read-only volumes.  IBM has
implemented "--partial-rw", which activates such mirror devices.
Essentially it creates an 'error device' in place of a missing device,
but the patch isn't complete as it doesn't work well with other segment
(target) types!

What we need is changes to the LVM "--partial" code path so that it can
create volumes read-only or read-write based on what is available,
instead of blindly doing read-only.
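
The 'error device' substitution can be sketched roughly as follows. This
is only an illustration of the idea, not the "--partial-rw" patch itself:
the function name partial_tables, the missing_N device names, the "core"
log and the region size are all hypothetical choices. The only real
device-mapper pieces used are the "error" target and the "mirror" table
layout.

```python
def partial_tables(size_sectors, legs, present, region=1024):
    """Return dm tables for a mirror in which absent legs are replaced.

    legs:    ordered list of device paths the mirror was built from.
    present: set of device paths that can actually be opened.
    Each absent leg gets a stand-in device backed by the "error" target,
    so the mirror can still be assembled and fail I/O on that leg only.
    """
    tables = {}
    mirror_legs = []
    for i, dev in enumerate(legs):
        if dev in present:
            mirror_legs.append(dev)
        else:
            name = f"missing_{i}"  # hypothetical stand-in device name
            tables[name] = f"0 {size_sectors} error"
            mirror_legs.append(f"/dev/mapper/{name}")
    leg_args = " ".join(f"{dev} 0" for dev in mirror_legs)
    tables["mirror"] = (
        f"0 {size_sectors} mirror core 1 {region} "
        f"{len(mirror_legs)} {leg_args}"
    )
    return tables


# Example: the second leg of a two-way mirror is missing at boot.
tables = partial_tables(
    2097152, ["/dev/sda1", "/dev/sdb1"], present={"/dev/sda1"}
)
for name, table in tables.items():
    print(name, "->", table)
```

Whether the resulting volume should then be activated read-only or
read-write is the policy question raised above; the sketch does not
attempt to decide it.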

Thank you for looking at these enterprise level features.

Thanks, Malahal.

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2009-01-05 18:44                 ` malahal
@ 2009-01-05 18:51                   ` Alasdair G Kergon
  2009-01-05 22:21                     ` Takahiro Yasui
  2009-01-05 22:26                   ` Takahiro Yasui
  1 sibling, 1 reply; 24+ messages in thread
From: Alasdair G Kergon @ 2009-01-05 18:51 UTC (permalink / raw)
  To: dm-devel

On Mon, Jan 05, 2009 at 10:44:08AM -0800, malahal@us.ibm.com wrote:
> What we need is changes to LVM "--partial" code path where it can create
> volumes read-only or read-write based on what is available instead of
> blindly doing read-only.

This functionality changed recently - try the latest release.

Alasdair
-- 
agk@redhat.com

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2009-01-05 18:51                   ` Alasdair G Kergon
@ 2009-01-05 22:21                     ` Takahiro Yasui
  0 siblings, 0 replies; 24+ messages in thread
From: Takahiro Yasui @ 2009-01-05 22:21 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Hi Alasdair,

Alasdair G Kergon wrote:
> On Mon, Jan 05, 2009 at 10:44:08AM -0800, malahal@us.ibm.com wrote:
>> What we need is changes to LVM "--partial" code path where it can create
>> volumes read-only or read-write based on what is available instead of
>> blindly doing read-only.
> 
> This functionality changed recently - try the latest release.

Thanks for the information. I tried the latest command from CVS, and
the VG could be activated in read-write mode, although the lvm command
still printed the read-only message:

 Partial mode. Incomplete volume groups will be activated read-only.

I will test it more.

Thanks,
Taka

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2009-01-05 18:44                 ` malahal
  2009-01-05 18:51                   ` Alasdair G Kergon
@ 2009-01-05 22:26                   ` Takahiro Yasui
  1 sibling, 0 replies; 24+ messages in thread
From: Takahiro Yasui @ 2009-01-05 22:26 UTC (permalink / raw)
  To: malahal; +Cc: dm-devel, agk

malahal@us.ibm.com wrote:
> Takahiro Yasui [tyasui@redhat.com] wrote:
>>> Not really. What I meant is re-integrating a failed log device when it
>>> comes back again. That is also what I mean by handling 'transient
>>> errors'.
>> Thanks, I see your requirement on this feature.
>> Let me put one more question.
>>
>> LVM mirroring is used to make system available even if some devices,
>> such as data and log device, have a problem. Currently, activations
>> by "vgchange -ay" command seem to fail during system booting if one
>> of devices related to VG are missing or had I/O error. For example,
>> if a mirror is structured by two data devices and log devices, the
>> mirror logical device should be activated and used even if one data
>> device and one log device are missing.
>>
>> Could you give me an idea or solution to handle this? Do we need to
>> enhance your feature to achieve this requirement?
> 
> The requirement you are asking is more than "mirrored log" support.
> Actually it is nothing to do with "mirrored log" support! LVM has
> "--partial' option but that creates read-only volumes.  IBM has
> implemented "--partial-rw" that activates such mirror devices.
> Essentially it creates an 'error device' in place of a missing device,
> but the patch isn't complete as it doesn't work well with other segment
> (target) types!
> 
> What we need is changes to LVM "--partial" code path where it can create
> volumes read-only or read-write based on what is available instead of
> blindly doing read-only.

Thank you for the kind reply. I tried the latest lvm command as
Alasdair suggested, and it seems to work as I expected.

I will test the latest lvm command first.

Thanks,
Taka

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2008-12-31 20:15           ` malahal
@ 2009-01-08 18:30             ` Jonathan Brassow
  2009-01-08 19:00               ` malahal
  0 siblings, 1 reply; 24+ messages in thread
From: Jonathan Brassow @ 2009-01-08 18:30 UTC (permalink / raw)
  To: device-mapper development

My general feeling is that it is better to do in userspace, but this  
is only because I think there is so much improvement to be done in the  
mirror DSO - transient fault handling being one of those areas.  If  
you all can get your benefits of multi-log, while I get my benefits of  
an improved DSO, then I am very happy.

That being said, there may also be merit in the kernel approach.  I  
haven't tried to think through all the nasty cases where log devices  
and mirror devices overlap.  For example, I want a 3-way mirror with a  
2-way redundant mirror log and I only have 3 physical disks.  If I get  
a failure on a device that contains both log and leg, how are the  
failures going to be handled?  It could get difficult with the  
layering...

And speaking of layering...  If we made LVM capable of generic  
layering (e.g. ability to stack targets, like RAID10 or snapshots of  
mirrors) and we improved the DSO, wouldn't we get everything we want?   
Stacking is already high on the list of priorities... so another good  
place to focus attention would be the mirror DSO.  :)

Perhaps others have a stronger opinion on kernel vs. userspace.

  brassow

On Dec 31, 2008, at 2:15 PM, malahal@us.ibm.com wrote:

> Alasdair, Jonathan: Any comments regarding kernel module vs
> implementing it entirely with in LVM to support this multi-log  
> feature?
>
> Thanks, Malahal.

* Re: [RFC][PATCH 0/4] dm-log: support multi-log devices
  2009-01-08 18:30             ` Jonathan Brassow
@ 2009-01-08 19:00               ` malahal
  0 siblings, 0 replies; 24+ messages in thread
From: malahal @ 2009-01-08 19:00 UTC (permalink / raw)
  To: Jonathan Brassow; +Cc: device-mapper development

Jonathan Brassow [jbrassow@redhat.com] wrote:
> My general feeling is that it is better to do in userspace, but this is 
> only because I think there is so much improvement to be done in the mirror 
> DSO - transient fault handling being one of those areas.  If you all can 
> get your benefits of multi-log, while I get my benefits of an improved DSO, 
> then I am very happy.
>
> That being said, there may also be merit in the kernel approach.  I haven't 
> tried to think through all the nasty cases where log devices and mirror 
> devices overlap.  For example, I want a 3-way mirror with a 2-way redundant 
> mirror log and I only have 3 physical disks.  If I get a failure on a 
> device that contains both log and leg, how are the failures going to be 
> handled?  It could get difficult with the layering...

The log mirror would be out-of-sync but should/would still continue to
operate.  The leg mirror would be out-of-sync as well. Depending on how
it is configured, the log mirror may go to linear/single device mode.
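
The 3-disk scenario can be modelled with a small sketch. The state names
and the helper functions are hypothetical and only illustrate the
reasoning: a disk that carries both a data leg and a log leg degrades
both mirrors at once, but neither one is lost.

```python
def describe(alive, total):
    """Summarise a mirror's state from its surviving member count."""
    if alive == 0:
        return "failed"
    if alive == total:
        return "in-sync"
    if alive == 1:
        return "degraded (single device)"
    return "degraded (still mirrored)"


def after_failure(data_legs, log_legs, failed_disk):
    """Return the state of the data mirror and the log mirror
    after one physical disk has failed."""
    live_data = [d for d in data_legs if d != failed_disk]
    live_log = [d for d in log_legs if d != failed_disk]
    return {
        "data": describe(len(live_data), len(data_legs)),
        "log": describe(len(live_log), len(log_legs)),
    }


# 3-way data mirror and 2-way log mirror on only three disks:
# sda and sdb each carry both a data leg and a log leg.
state = after_failure(
    data_legs=["sda", "sdb", "sdc"],
    log_legs=["sda", "sdb"],
    failed_disk="sda",
)
print(state)
```

In this model the data mirror keeps two legs and the log mirror drops to
a single device, which corresponds to the linear/single device mode
mentioned above.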

Thank you Jonathan for your comments.

--Malahal.

Thread overview: 24+ messages
2008-11-26  0:01 [RFC][PATCH 0/4] dm-log: support multi-log devices Takahiro Yasui
2008-11-26 18:56 ` Phillip Susi
2008-11-27  7:50   ` Takahiro Yasui
2008-11-28 20:06     ` Phillip Susi
2008-12-01  7:00       ` Takahiro Yasui
2008-12-01 11:09         ` Michał Mirosław
2008-12-02  6:05           ` Takahiro Yasui
2008-12-01 16:10         ` Phillip Susi
2008-12-02  4:52           ` Takahiro Yasui
2008-12-12 20:21 ` Jonathan Brassow
2008-12-15 14:56   ` Takahiro Yasui
2008-12-20  0:13     ` malahal
2008-12-23  0:23       ` Takahiro Yasui
2008-12-23 18:02         ` malahal
2008-12-31 20:15           ` malahal
2009-01-08 18:30             ` Jonathan Brassow
2009-01-08 19:00               ` malahal
2009-01-05 15:44           ` Takahiro Yasui
2009-01-05 16:24             ` malahal
2009-01-05 17:36               ` Takahiro Yasui
2009-01-05 18:44                 ` malahal
2009-01-05 18:51                   ` Alasdair G Kergon
2009-01-05 22:21                     ` Takahiro Yasui
2009-01-05 22:26                   ` Takahiro Yasui
