linux-fsdevel.vger.kernel.org archive mirror
* [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
@ 2007-08-12 10:35 Al Boldi
  2007-08-12 11:28 ` Jan Engelhardt
  2007-08-12 11:51 ` Evgeniy Polyakov
  0 siblings, 2 replies; 14+ messages in thread
From: Al Boldi @ 2007-08-12 10:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, netdev, linux-raid

Lars Ellenberg wrote:
> meanwhile, please, anyone interested,
> the drbd paper for LinuxConf Eu 2007 is finalized.
> http://www.drbd.org/fileadmin/drbd/publications/
> drbd8.linux-conf.eu.2007.pdf
>
> it does not give too much implementation detail (would be inappropriate
> for conference proceedings, imo; some paper commenting on the source
> code should follow).
>
> but it does give a good overview about what DRBD actually is,
> what exact problems it tries to solve,
> and what developments to expect in the near future.
>
> so you can make up your mind about
>  "Do we need it?", and
>  "Why DRBD? Why not NBD + MD-RAID?"

Ok, conceptually your driver sounds really interesting, but when I read the 
pdf I got completely turned off.  The problem is that the concepts are not 
clearly implemented, when in fact they are really simple:

  Allow shared access to remote block storage with fault tolerance.

The first thing to tackle here would be write serialization.  Then start 
thinking about fault tolerance.

Now, shared remote block access should theoretically be handled by a block 
layer driver, as DRBD does, but realistically it may be more appropriate 
to let it be handled by the combining end user, like OCFS or GFS.

The idea here is to simplify lower layer implementations while removing any 
preconceived dependencies, and to give upper layers free rein without 
incurring redundant overhead.

Look at ZFS; it illegally violates layering by combining md/dm/lvm with the 
fs, but it does this based on a realistic understanding of the problems 
involved, which enables it to improve performance, flexibility, and 
functionality specific to its use case.

This implies that there are two distinct forces at work here:

  1. Layer components
  2. Use-Case composers

Layer components should technically not implement any use case (other than 
providing a plumbing framework), as that would incur unnecessary 
dependencies, which could reduce their generality and thus their reusability.

Use-Case composers can then leverage layer components from across the 
layering hierarchy to yield a specific use-case implementation.

DRBD is such a Use-Case composer, as are md / dm / lvm and any fs in general, 
whereas aoe / nbd / loop and the VFS / FUSE are examples of layer 
components.

It follows that Use-Case composers, like DRBD, share common functionality 
that should be factored out into layer components, which they can then 
recompose to implement a specific use case.


Thanks!

--
Al



* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-12 10:35 [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid]) Al Boldi
@ 2007-08-12 11:28 ` Jan Engelhardt
  2007-08-12 16:39   ` david
  2007-08-12 11:51 ` Evgeniy Polyakov
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Engelhardt @ 2007-08-12 11:28 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel, netdev, linux-raid


On Aug 12 2007 13:35, Al Boldi wrote:
>Lars Ellenberg wrote:
>> meanwhile, please, anyone interessted,
>> the drbd paper for LinuxConf Eu 2007 is finalized.
>> http://www.drbd.org/fileadmin/drbd/publications/
>> drbd8.linux-conf.eu.2007.pdf
>>
>> but it does give a good overview about what DRBD actually is,
>> what exact problems it tries to solve,
>> and what developments to expect in the near future.
>>
>> so you can make up your mind about
>>  "Do we need it?", and
>>  "Why DRBD? Why not NBD + MD-RAID?"

I may have made a mistake when asking how it compares to NBD+MD.
Let me retry: what's the functional difference between
GFS2 on a DRBD vs. GFS2 on a DAS SAN?

>Now, shared remote block access should theoretically be handled, as does 
>DRBD, by a block layer driver, but realistically it may be more appropriate 
>to let it be handled by the combining end user, like OCFS or GFS.


	Jan
-- 


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-12 10:35 [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid]) Al Boldi
  2007-08-12 11:28 ` Jan Engelhardt
@ 2007-08-12 11:51 ` Evgeniy Polyakov
  2007-08-12 15:28   ` Al Boldi
  1 sibling, 1 reply; 14+ messages in thread
From: Evgeniy Polyakov @ 2007-08-12 11:51 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel, netdev, linux-raid

On Sun, Aug 12, 2007 at 01:35:17PM +0300, Al Boldi (a1426z@gawab.com) wrote:
> Lars Ellenberg wrote:
> > meanwhile, please, anyone interessted,
> > the drbd paper for LinuxConf Eu 2007 is finalized.
> > http://www.drbd.org/fileadmin/drbd/publications/
> > drbd8.linux-conf.eu.2007.pdf
> >
> > it does not give too much implementation detail (would be inapropriate
> > for conference proceedings, imo; some paper commenting on the source
> > code should follow).
> >
> > but it does give a good overview about what DRBD actually is,
> > what exact problems it tries to solve,
> > and what developments to expect in the near future.
> >
> > so you can make up your mind about
> >  "Do we need it?", and
> >  "Why DRBD? Why not NBD + MD-RAID?"
> 
> Ok, conceptually your driver sounds really interesting, but when I read the 
> pdf I got completely turned off.  The problem is that the concepts are not 
> clearly implemented, when in fact the concepts are really simple:
> 
>   Allow shared access to remote block storage with fault tolerance.
> 
> The first thing to tackle here would be write serialization.  Then start 
> thinking about fault tolerance.
> 
> Now, shared remote block access should theoretically be handled, as does 
> DRBD, by a block layer driver, but realistically it may be more appropriate 
> to let it be handled by the combining end user, like OCFS or GFS.
> 
> The idea here is to simplify lower layer implementations while removing any 
> preconceived dependencies, and let upper layers reign free without incurring 
> redundant overhead.
> 
> Look at ZFS; it illegally violates layering by combining md/dm/lvm with the 
> fs, but it does this based on a realistic understanding of the problems 
> involved, which enables it to improve performance, flexibility, and 
> functionality specific to its use case.
> 
> This implies that there are two distinct forces at work here:
> 
>   1. Layer components
>   2. Use-Case composers
> 
> Layer components should technically not implement any use case (other than 
> providing a plumbing framework), as that would incur unnecessary 
> dependencies, which could reduce its generality and thus reusability.
> 
> Use-Case composers can now leverage layer components from across the layering 
> hierarchy, to yield a specific use case implementation.
> 
> DRBD is such a Use-Case composer, as are md / dm / lvm and any fs in general, 
> whereas aoe / nbd / loop and the VFS / FUSE are examples of layer 
> components.
> 
> It follows that Use-case composers, like DRBD, need common functionality that 
> should be factored out into layer components, and then recompose to 
> implement a specific use case.

Out of curiosity, did you try nbd+dm+raid1 compared to drbd and/or zfs
on top of distributed storage (which is a surprise to me, that holy zfs
supports that)?
 
> Thanks!
> 
> --
> Al
> 

-- 
	Evgeniy Polyakov


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-12 11:51 ` Evgeniy Polyakov
@ 2007-08-12 15:28   ` Al Boldi
  0 siblings, 0 replies; 14+ messages in thread
From: Al Boldi @ 2007-08-12 15:28 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: linux-kernel, linux-fsdevel, netdev, linux-raid

Evgeniy Polyakov wrote:
> Al Boldi (a1426z@gawab.com) wrote:
> > Look at ZFS; it illegally violates layering by combining md/dm/lvm with
> > the fs, but it does this based on a realistic understanding of the
> > problems involved, which enables it to improve performance, flexibility,
> > and functionality specific to its use case.
> >
> > This implies that there are two distinct forces at work here:
> >
> >   1. Layer components
> >   2. Use-Case composers
> >
> > Layer components should technically not implement any use case (other
> > than providing a plumbing framework), as that would incur unnecessary
> > dependencies, which could reduce its generality and thus reusability.
> >
> > Use-Case composers can now leverage layer components from across the
> > layering hierarchy, to yield a specific use case implementation.
> >
> > DRBD is such a Use-Case composer, as is mdm / dm / lvm and any fs in
> > general, whereas aoe / nbd / loop and the VFS / FUSE are examples of
> > layer components.
> >
> > It follows that Use-case composers, like DRBD, need common functionality
> > that should be factored out into layer components, and then recompose to
> > implement a specific use case.
>
> Out of curiosity, did you try nbd+dm+raid1 compared to drbd and/or zfs
> on top of distributed storage (which is a surprise to me, that holy zfs
> supports that)?

Actually, I may not have been very clear in my Use-Case composer description: 
I meant internal in-kernel Use-Case composers, as opposed to external 
Userland Use-Case composers.

So, nbd+dm+raid1 would be an external Userland Use-Case composition, which 
obviously could have some drastic performance issues.
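
For concreteness, such a composition would look roughly like this (an 
untested sketch; the host, port, and device names are placeholders, and I 
use md rather than dm for the mirror, but the idea is the same):

  # import the remote disk as a local block device
  nbd-client backup.example.com 2000 /dev/nbd0
  # mirror a local partition with the imported remote disk
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/nbd0
  # put any ordinary local fs on top
  mkfs.ext3 /dev/md0 && mount /dev/md0 /data

Every piece here is glued together from Userland, with no single component 
aware of the whole use case.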

DRBD and ZFS are examples of internal in-kernel Use-Case composers, which 
obviously could show some drastic performance improvements.  

Although you could allow in-kernel Use-Case composers to be run on top of 
Userland Use-Case composers, that wouldn't be the preferred mode of 
operation.  Instead, you would for example recompose ZFS to incorporate an 
in-kernel distributed storage layer component, like nbd.

All this boils down to refactoring Use-Case composers to produce layer 
components with both in-kernel and userland interfaces.  Once we have that, 
it becomes a matter of plug-and-play to produce something awesome like ZFS.


Thanks!

--
Al



* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-12 11:28 ` Jan Engelhardt
@ 2007-08-12 16:39   ` david
  2007-08-12 17:03     ` Jan Engelhardt
  0 siblings, 1 reply; 14+ messages in thread
From: david @ 2007-08-12 16:39 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Al Boldi, linux-kernel, linux-fsdevel, netdev, linux-raid

On Sun, 12 Aug 2007, Jan Engelhardt wrote:

> On Aug 12 2007 13:35, Al Boldi wrote:
>> Lars Ellenberg wrote:
>>> meanwhile, please, anyone interessted,
>>> the drbd paper for LinuxConf Eu 2007 is finalized.
>>> http://www.drbd.org/fileadmin/drbd/publications/
>>> drbd8.linux-conf.eu.2007.pdf
>>>
>>> but it does give a good overview about what DRBD actually is,
>>> what exact problems it tries to solve,
>>> and what developments to expect in the near future.
>>>
>>> so you can make up your mind about
>>>  "Do we need it?", and
>>>  "Why DRBD? Why not NBD + MD-RAID?"
>
> I may have made a mistake when asking for how it compares to NBD+MD.
> Let me retry: what's the functional difference between
> GFS2 on a DRBD .vs. GFS2 on a DAS SAN?

GFS is a distributed filesystem; DRBD is a replicated block device. You 
wouldn't do GFS on top of DRBD, you would do ext2/3, XFS, etc.

DRBD is much closer to the NBD+MD option.

Now, I am not an expert on either option, but there are a couple of things 
that I would question about the NBD+MD option:

1. when the remote machine is down, how does MD deal with it for reads and 
writes?

2. MD over local drives will alternate reads between mirrors (or so I've 
been told); doing so over the network is wrong.

3. when writing, will MD wait for the network I/O to get the data saved on 
the backup before returning from the syscall? or can it sync the data out 
lazily?

>> Now, shared remote block access should theoretically be handled, as does
>> DRBD, by a block layer driver, but realistically it may be more appropriate
>> to let it be handled by the combining end user, like OCFS or GFS.

There are times when you want to replicate at the block layer, and there 
are times when you want to have a filesystem do the work. Don't force a 
filesystem on use cases where a block device is the right answer.

David Lang


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-12 16:39   ` david
@ 2007-08-12 17:03     ` Jan Engelhardt
  2007-08-12 17:45       ` Iustin Pop
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Engelhardt @ 2007-08-12 17:03 UTC (permalink / raw)
  To: david; +Cc: Al Boldi, linux-kernel, linux-fsdevel, netdev, linux-raid


On Aug 12 2007 09:39, david@lang.hm wrote:
>
> now, I am not an expert on either option, but there are a couple things that I
> would question about the NBD+MD option
>
> 1. when the remote machine is down, how does MD deal with it for reads and
> writes?

I suppose it kicks the drive and you'd have to re-add it by hand unless done by
a cronjob.

> 2. MD over local drive will alternate reads between mirrors (or so I've been
> told), doing so over the network is wrong.

Certainly. In which case you set "write_mostly" (or even write_only, not sure
of its name) on the raid component that is nbd.
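
Something like this at creation time, I think (a from-memory sketch, not 
tested; /dev/sda2 and /dev/nbd0 are placeholders, and --write-mostly applies 
to the devices listed after it):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/sda2 --write-mostly /dev/nbd0

That should keep reads on the local disk while still mirroring writes to the 
nbd leg.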

> 3. when writing, will MD wait for the network I/O to get the data saved on the
> backup before returning from the syscall? or can it sync the data out lazily

Can't answer this one - ask Neil :)




	Jan
-- 


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-12 17:03     ` Jan Engelhardt
@ 2007-08-12 17:45       ` Iustin Pop
  2007-08-13  1:41         ` Paul Clements
  0 siblings, 1 reply; 14+ messages in thread
From: Iustin Pop @ 2007-08-12 17:45 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: david, Al Boldi, linux-kernel, linux-fsdevel, netdev, linux-raid

On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote:
> 
> On Aug 12 2007 09:39, david@lang.hm wrote:
> >
> > now, I am not an expert on either option, but three are a couple things that I
> > would question about the DRDB+MD option
> >
> > 1. when the remote machine is down, how does MD deal with it for reads and
> > writes?
> 
> I suppose it kicks the drive and you'd have to re-add it by hand unless done by
> a cronjob.

From my tests, since NBD doesn't have a timeout option, MD hangs in the
write to that mirror indefinitely, somewhat like when dealing with a
broken IDE driver/chipset/disk.

> > 2. MD over local drive will alternate reads between mirrors (or so I've been
> > told), doing so over the network is wrong.
> 
> Certainly. In which case you set "write_mostly" (or even write_only, not sure
> of its name) on the raid component that is nbd.
> 
> > 3. when writing, will MD wait for the network I/O to get the data saved on the
> > backup before returning from the syscall? or can it sync the data out lazily
> 
> Can't answer this one - ask Neil :)

MD has the write-mostly/write-behind options - which help in this case
but only up to a certain amount.


In my experience DRBD wins hands-down over MD+NBD, because MD doesn't
know how to handle a component that never returns from a write, which is
quite different from returning with an error. Furthermore, DRBD was
designed to handle transient errors in the connection to the peer due to
its network-oriented design, whereas MD is mostly designed for local or
at least high-reliability disks (where a disk can be SAN, SCSI, etc.) and
a failure is not considered normal. Hence the need for manual reconnect in
the MD case and the automated handling of reconnects in the case of DRBD.

I'm just a happy user of both MD over local disks and DRBD for networked
raid.

regards,
iustin


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-12 17:45       ` Iustin Pop
@ 2007-08-13  1:41         ` Paul Clements
  2007-08-13  3:21           ` david
  2007-08-13  7:51           ` David Greaves
  0 siblings, 2 replies; 14+ messages in thread
From: Paul Clements @ 2007-08-13  1:41 UTC (permalink / raw)
  To: Jan Engelhardt, david, Al Boldi, linux-kernel, linux-fsdevel,
	netdev, linux-raid

Iustin Pop wrote:
> On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote:
>> On Aug 12 2007 09:39, david@lang.hm wrote:
>>> now, I am not an expert on either option, but three are a couple things that I
>>> would question about the DRDB+MD option
>>>
>>> 1. when the remote machine is down, how does MD deal with it for reads and
>>> writes?
>> I suppose it kicks the drive and you'd have to re-add it by hand unless done by
>> a cronjob.

Yes, and with a bitmap configured on the raid1, you just resync the 
blocks that have been written while the connection was down.
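
For example (a sketch, not copy-paste tested; /dev/md0 and /dev/nbd0 are 
placeholders for the array and its network leg):

  # add a write-intent bitmap to an existing raid1
  mdadm --grow /dev/md0 --bitmap=internal
  # once the link is back, re-add the kicked leg; only the blocks
  # marked dirty in the bitmap get resynced
  mdadm /dev/md0 --re-add /dev/nbd0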


> From my tests, since NBD doesn't have a timeout option, MD hangs in the
> write to that mirror indefinitely, somewhat like when dealing with a
> broken IDE driver/chipset/disk.

Well, if people would like to see a timeout option, I actually coded up 
a patch a couple of years ago to do just that, but I never got it into 
mainline because you can do almost as well by doing a check at 
user-level (I basically ping the nbd connection periodically and if it 
fails, I kill -9 the nbd-client).
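
The idea is roughly this (a simplified sketch, not the actual script; the 
peer address, interval, and device names are made up, and a plain ICMP ping 
stands in for whatever check of the nbd connection you prefer):

  # crude user-level watchdog: if the peer stops answering, fail the
  # nbd leg out of the mirror and kill the hung nbd-client
  while sleep 30; do
      if ! ping -c 1 -W 5 peer.example.com >/dev/null; then
          mdadm /dev/md0 --fail /dev/nbd0
          kill -9 $(pidof nbd-client)
      fi
  done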


>>> 2. MD over local drive will alternate reads between mirrors (or so I've been
>>> told), doing so over the network is wrong.
>> Certainly. In which case you set "write_mostly" (or even write_only, not sure
>> of its name) on the raid component that is nbd.
>>
>>> 3. when writing, will MD wait for the network I/O to get the data saved on the
>>> backup before returning from the syscall? or can it sync the data out lazily
>> Can't answer this one - ask Neil :)
> 
> MD has the write-mostly/write-behind options - which help in this case
> but only up to a certain amount.

You can configure write_behind (aka, asynchronous writes) to buffer as 
much data as you have RAM to hold. At a certain point, presumably, you'd 
want to just break the mirror and take the hit of doing a resync once 
your network leg falls too far behind.
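
For reference, the knobs involved look something like this at create time 
(a sketch; the device names and the 8192 limit are arbitrary examples):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=8192 \
        /dev/sda2 --write-mostly /dev/nbd0

write-behind only applies to write-mostly devices and requires the bitmap; 
the number is how many outstanding writes are allowed before writers block.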

--
Paul


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-13  1:41         ` Paul Clements
@ 2007-08-13  3:21           ` david
  2007-08-13  8:03             ` David Greaves
  2007-08-13  9:02             ` Jan Engelhardt
  2007-08-13  7:51           ` David Greaves
  1 sibling, 2 replies; 14+ messages in thread
From: david @ 2007-08-13  3:21 UTC (permalink / raw)
  To: Paul Clements
  Cc: Jan Engelhardt, Al Boldi, linux-kernel, linux-fsdevel, netdev,
	linux-raid

Per the message below, MD (or DM) would need to be modified to work 
reasonably well with one of the disk components being over an unreliable 
link (like a network link).

Are the MD/DM maintainers interested in extending their code in this 
direction? Or would they prefer to keep it simpler by continuing to assume 
that the raid components are connected over a highly reliable connection?

If they are interested in adding (and maintaining) this functionality, then 
there is a real possibility that NBD+MD/DM could eliminate the need for 
DRBD. However, if they are not interested in adding all the code to deal 
with the network-type issues, then the argument that DRBD should not be 
merged because you can do the same thing with MD/DM + NBD is invalid and 
can be dropped/ignored.

David Lang

On Sun, 12 Aug 2007, Paul Clements wrote:

> Iustin Pop wrote:
>>  On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote:
>> >  On Aug 12 2007 09:39, david@lang.hm wrote:
>> > >  now, I am not an expert on either option, but three are a couple 
>> > >  things that I
>> > >  would question about the DRDB+MD option
>> > > 
>> > >  1. when the remote machine is down, how does MD deal with it for reads 
>> > >  and
>> > >  writes?
>> >  I suppose it kicks the drive and you'd have to re-add it by hand unless 
>> >  done by
>> >  a cronjob.
>
> Yes, and with a bitmap configured on the raid1, you just resync the blocks 
> that have been written while the connection was down.
>
>
>> From my tests, since NBD doesn't have a timeout option, MD hangs in the
>>  write to that mirror indefinitely, somewhat like when dealing with a
>>  broken IDE driver/chipset/disk.
>
> Well, if people would like to see a timeout option, I actually coded up a 
> patch a couple of years ago to do just that, but I never got it into mainline 
> because you can do almost as well by doing a check at user-level (I basically 
> ping the nbd connection periodically and if it fails, I kill -9 the 
> nbd-client).
>
>
>> > >  2. MD over local drive will alternate reads between mirrors (or so 
>> > >  I've been
>> > >  told), doing so over the network is wrong.
>> >  Certainly. In which case you set "write_mostly" (or even write_only, not 
>> >  sure
>> >  of its name) on the raid component that is nbd.
>> > 
>> > >  3. when writing, will MD wait for the network I/O to get the data 
>> > >  saved on the
>> > >  backup before returning from the syscall? or can it sync the data out 
>> > >  lazily
>> >  Can't answer this one - ask Neil :)
>>
>>  MD has the write-mostly/write-behind options - which help in this case
>>  but only up to a certain amount.
>
> You can configure write_behind (aka, asynchronous writes) to buffer as much 
> data as you have RAM to hold. At a certain point, presumably, you'd want to 
> just break the mirror and take the hit of doing a resync once your network 
> leg falls too far behind.
>
> --
> Paul
>


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-13  1:41         ` Paul Clements
  2007-08-13  3:21           ` david
@ 2007-08-13  7:51           ` David Greaves
  1 sibling, 0 replies; 14+ messages in thread
From: David Greaves @ 2007-08-13  7:51 UTC (permalink / raw)
  To: Paul Clements
  Cc: Jan Engelhardt, david, Al Boldi, linux-kernel, linux-fsdevel,
	netdev, linux-raid

Paul Clements wrote:
> Well, if people would like to see a timeout option, I actually coded up 
> a patch a couple of years ago to do just that, but I never got it into 
> mainline because you can do almost as well by doing a check at 
> user-level (I basically ping the nbd connection periodically and if it 
> fails, I kill -9 the nbd-client).


Yes please.

David



* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-13  3:21           ` david
@ 2007-08-13  8:03             ` David Greaves
  2007-08-13  8:31               ` david
  2007-08-13  9:02             ` Jan Engelhardt
  1 sibling, 1 reply; 14+ messages in thread
From: David Greaves @ 2007-08-13  8:03 UTC (permalink / raw)
  To: david
  Cc: Paul Clements, Jan Engelhardt, Al Boldi, linux-kernel,
	linux-fsdevel, netdev, linux-raid

david@lang.hm wrote:
> per the message below MD (or DM) would need to be modified to work 
> reasonably well with one of the disk components being over an unreliable 
> link (like a network link)
> 
> are the MD/DM maintainers interested in extending their code in this 
> direction? or would they prefer to keep it simpler by being able to 
> continue to assume that the raid components are connected over a highly 
> reliable connection?
> 
> if they are interested in adding (and maintaining) this functionality 
> then there is a real possibility that NBD+MD/DM could eliminate the need 
> for DRBD. however if they are not interested in adding all the code to 
> deal with the network type issues, then the argument that DRBD should 
> not be merged because you can do the same thing with MD/DM + NBD is 
> invalid and can be dropped/ignored
> 
> David Lang

As a user I'd like to see md/nbd extended to cope with unreliable links.
I think md could handle link exceptions better. My unreliable memory 
recalls sporadic issues with hot-plug leaving md hanging, and certain lower 
level errors (or even very high latency) causing unsatisfactory behaviour in 
what is supposed to be a fault 'tolerant' subsystem.


Would this just be relevant to network devices, or would it also improve 
support for jostled USB and SATA hot-plugging, I wonder?

David



* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-13  8:03             ` David Greaves
@ 2007-08-13  8:31               ` david
  2007-08-13 12:43                 ` David Greaves
  0 siblings, 1 reply; 14+ messages in thread
From: david @ 2007-08-13  8:31 UTC (permalink / raw)
  To: David Greaves
  Cc: Paul Clements, Jan Engelhardt, Al Boldi, linux-kernel,
	linux-fsdevel, netdev, linux-raid

On Mon, 13 Aug 2007, David Greaves wrote:

> david@lang.hm wrote:
>>  per the message below MD (or DM) would need to be modified to work
>>  reasonably well with one of the disk components being over an unreliable
>>  link (like a network link)
>>
>>  are the MD/DM maintainers interested in extending their code in this
>>  direction? or would they prefer to keep it simpler by being able to
>>  continue to assume that the raid components are connected over a highly
>>  reliable connection?
>>
>>  if they are interested in adding (and maintaining) this functionality then
>>  there is a real possibility that NBD+MD/DM could eliminate the need for
>>  DRDB. however if they are not interested in adding all the code to deal
>>  with the network type issues, then the argument that DRDB should not be
>>  merged becouse you can do the same thing with MD/DM + NBD is invalid and
>>  can be dropped/ignored
>>
>>  David Lang
>
> As a user I'd like to see md/nbd be extended to cope with unreliable links.
> I think md could be better in handling link exceptions. My unreliable memory 
> recalls sporadic issues with hot-plug leaving md hanging and certain lower 
> level errors (or even very high latency) causing unsatisfactory behaviour in 
> what is supposed to be a fault 'tolerant' subsystem.
>
>
> Would this just be relevant to network devices or would it improve support 
> for jostled usb and sata hot-plugging I wonder?

Good question. I suspect that some of the error handling would be similar 
(not hanging the system when a device is unreachable, for example), but 
a lot of the rest would be different (do you really want to try to 
auto-resync to a drive that you _think_ just reappeared? what if it's a 
different drive? how can you be sure?). The error rate of a network is going 
to be significantly higher than for USB or SATA drives (although I suppose 
iSCSI would be similar).

David Lang


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-13  3:21           ` david
  2007-08-13  8:03             ` David Greaves
@ 2007-08-13  9:02             ` Jan Engelhardt
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Engelhardt @ 2007-08-13  9:02 UTC (permalink / raw)
  To: david
  Cc: Paul Clements, Al Boldi, linux-kernel, linux-fsdevel, netdev,
	linux-raid


On Aug 12 2007 20:21, david@lang.hm wrote:
>
> per the message below MD (or DM) would need to be modified to work
> reasonably well with one of the disk components being over an
> unreliable link (like a network link)

Does not dm-multipath do something like that?

> are the MD/DM maintainers interested in extending their code in this direction?
> or would they prefer to keep it simpler by being able to continue to assume
> that the raid components are connected over a highly reliable connection?

	Jan
-- 


* Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
  2007-08-13  8:31               ` david
@ 2007-08-13 12:43                 ` David Greaves
  0 siblings, 0 replies; 14+ messages in thread
From: David Greaves @ 2007-08-13 12:43 UTC (permalink / raw)
  To: david
  Cc: Paul Clements, Jan Engelhardt, Al Boldi, linux-kernel,
	linux-fsdevel, netdev, linux-raid

david@lang.hm wrote:
>> Would this just be relevant to network devices or would it improve 
>> support for jostled usb and sata hot-plugging I wonder?
> 
> good question, I suspect that some of the error handling would be 
> similar (for devices that are unreachable not hanging the system for 
> example), but a lot of the rest would be different (do you really want 
> to try to auto-resync to a drive that you _think_ just reappeared,
Well, omit 'think' and the answer may be "yes". A lot of systems are quite 
simple, and RAID is common on the desktop now. If jostled USB fits into this 
category, then "yes".

> what 
> if it's a different drive? how can you be sure?
And that's the key, isn't it? We have the RAID device UUID and the superblock 
info. Isn't that enough? If not, then given the work involved, an extended 
superblock wouldn't be unreasonable.
And I suspect the capabilities of devices would need recording in the 
superblock too, e.g. 'retry-on-fail'.
I can see how md would fail a device but might then periodically retry it. If 
a retry shows that it's back, it would validate it (UUID) and then resync it.
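
Something along these lines, perhaps, even if done from cron rather than 
inside md for now (a hypothetical sketch; the device names are placeholders):

  # re-add a failed leg only if its superblock still carries our array UUID
  UUID=$(mdadm --detail /dev/md0 | sed -n 's/.*UUID : //p')
  if mdadm --examine /dev/nbd0 2>/dev/null | grep -q "$UUID"; then
      mdadm /dev/md0 --re-add /dev/nbd0
  fi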

> ) the error rate of a 
> network is going to be significantly higher than for USB or SATA drives 
> (although I suppose iSCSI would be similar)

I do agree - I was looking for value-add for the existing subsystem. If this 
benefits existing RAID users then it's more likely to be attractive.

David

