linux-bcache.vger.kernel.org archive mirror
* Moving a backing device between 2 cachesets
@ 2014-01-28 21:59 Patrick Zwahlen
  2014-01-28 22:35 ` matthew patton
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Patrick Zwahlen @ 2014-01-28 21:59 UTC (permalink / raw)
  To: linux-bcache@vger.kernel.org


Hi,

We're working on a two-node Pacemaker cluster that provides iSCSI LUNs via
SCST. The LUNs are either software RAID arrays located in a shared JBOD, or
DRBD resources (active/passive).

We are adding bcache to the game with local SSDs (i.e. not shared, but
dedicated to each cluster node).

We are using write-through.

I need to evaluate the risk when moving a backing device (md) from cacheset1
(on node #1) to cacheset2 (on node #2) and then back to cacheset #1.

Scenario
- md attached to cacheset1 and working (on node 1)
- md detached from cacheset1
- md stopped on node 1
- md started on node 2
- md attached to cacheset2 on node 2

At this point, cacheset1 is attached to nothing, but still has valid blocks
"linked" to the backing md device.

- md detached from cacheset2
- md stopped on node 2
- md started on node 1
- md RE-attached to cacheset1 on node 1

At this point, I need to make sure that bcache will not serve "old" blocks
that were linked to the backing device.
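In bcache sysfs terms, the move could be sketched roughly as below. The device name (bcache0), the cache set UUIDs, and the mdadm steps are all placeholders, not taken from our actual setup, and the script only prints the commands (DRY_RUN defaults to 1) so the ordering can be reviewed without touching /sys:

```shell
#!/bin/sh
# Print-only sketch of the detach/attach sequence between two cache sets.
# bcache0 and both UUIDs are placeholders. With DRY_RUN=1 (the default)
# each command is echoed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

CSET1="11111111-1111-1111-1111-111111111111"   # placeholder UUID, node 1's cache set
CSET2="22222222-2222-2222-2222-222222222222"   # placeholder UUID, node 2's cache set

move_out() {
    # on node 1: detach (any write to the detach file triggers it)
    run "echo 1 > /sys/block/bcache0/bcache/detach"
    # ... stop md on node 1, start it on node 2 (mdadm steps omitted) ...
    # on node 2: attach to cacheset2
    run "echo $CSET2 > /sys/block/bcache0/bcache/attach"
}

move_back() {
    # on node 2: detach again
    run "echo 1 > /sys/block/bcache0/bcache/detach"
    # ... stop md on node 2, start it on node 1 ...
    # on node 1: RE-attach to cacheset1
    run "echo $CSET1 > /sys/block/bcache0/bcache/attach"
}

move_out
move_back
```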

My understanding is that, since we attached the backing device to a new
cacheset (#2) in between, this will be "recorded" in the bcache headers, and
the blocks that were valid in the first place won't be served.

Can you please confirm whether this is safe, or whether we need to take
special care to invalidate the original cacheset?

Thanks a lot, - Patrick -



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Moving a backing device between 2 cachesets
  2014-01-28 21:59 Moving a backing device between 2 cachesets Patrick Zwahlen
@ 2014-01-28 22:35 ` matthew patton
  2014-01-29  7:49   ` Patrick Zwahlen
  2014-01-29 10:42 ` Gabriel de Perthuis
       [not found] ` <lcalu5$iek$1@ger.gmane.org>
  2 siblings, 1 reply; 9+ messages in thread
From: matthew patton @ 2014-01-28 22:35 UTC (permalink / raw)
  To: Patrick Zwahlen, linux-bcache@vger.kernel.org



>Scenario
>- md attached to cacheset1 and working (on node 1)
>- md detached from cacheset1

You can't just detach; you must flush everything out of cacheset1 and destroy/revoke/disassemble the relationship between the cache and the MD device: 'destroy-bcache(8)' or 'make-bcache -D', if you will. Not that such a tool exists, AFAIK.

>- md stopped on node 1
>- md started on node 2
>- md attached to cacheset2 on node 2

No, you don't attach it; you have to create a new relationship, a la 'make-bcache -C ...'

>At this point, cacheset1 is attached to nothing, but still has valid blocks
>"linked" to the backing md device

If you do, your data is hopelessly gone.

The only way you can bounce a bcache device from one node to the other is if the ENTIRE stack moves back and forth, so the SSDs have to be in the shared JBOD. And you can skip all the malarkey of attach/detach. You can write udev rules to mask/ignore certain devices so that each node only knows about its own devices; when it's time to cross-mount the peer's, you register them with the receiving system, run 'probe-bcache', and resume operations. That's assuming the writes the client sent to the original node really got committed (write-through notwithstanding).
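The manual cross-mount could look something like this print-only sketch. /dev/sdc (an SSD in the shared JBOD) and /dev/md0 are placeholder names; probe-bcache and the register file are the standard bcache-tools/sysfs entry points:

```shell
#!/bin/sh
# Print-only sketch: the surviving head registers the peer's devices by
# hand after a failover. /dev/sdc and /dev/md0 are placeholders.
reg() { echo "+ echo $1 > /sys/fs/bcache/register"; }

takeover() {
    echo "+ probe-bcache /dev/sdc"   # confirm the bcache superblock first
    reg /dev/sdc                     # register the cache device first
    reg /dev/md0                     # then the backing device; bcacheN appears
}

takeover
```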


* RE: Moving a backing device between 2 cachesets
  2014-01-28 22:35 ` matthew patton
@ 2014-01-29  7:49   ` Patrick Zwahlen
  0 siblings, 0 replies; 9+ messages in thread
From: Patrick Zwahlen @ 2014-01-29  7:49 UTC (permalink / raw)
  To: linux-bcache@vger.kernel.org


> -----Original Message-----
> From: matthew patton [mailto:pattonme@yahoo.com]
> Sent: mardi 28 janvier 2014 23:36
> To: Patrick Zwahlen; linux-bcache@vger.kernel.org
> Subject: Re: Moving a backing device between 2 cachesets
> 
> 
> 
> >Scenario
> >- md attached to cacheset1 and working (on node 1)
> >- md detached from cacheset1
> 
> You can't just detach, you must flush everything out of cacheset1 and
> destroy/revoke/disassemble the relationship between the cache and MD.
> 'destroy-bcache(8) or make-bcache -D' if you will. Not that such a tool
> exists AFAIK.

Doc says:
detach
  Write to this file to detach from a cache set. If there is dirty data in
  the cache, it will be flushed first.

As we're in write-through mode, I don't care about dirty data.
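Just to be thorough, the dirty state can be sanity-checked before detaching. A sketch (the device name bcache0 is a placeholder; in write-through mode dirty_data should read 0):

```shell
#!/bin/sh
# Sketch: check a bcache device's state before detaching it.
# bcache0 is a placeholder device name.
check_clean() {
    d="/sys/block/$1/bcache"
    if [ ! -d "$d" ]; then
        echo "no bcache sysfs dir for $1"
        return 1
    fi
    echo "state: $(cat "$d/state")"        # expect "clean"
    echo "dirty: $(cat "$d/dirty_data")"   # expect 0 in write-through
}

check_clean bcache0 || true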

> >- md stopped on node 1
> >- md started on node 2
> >- md attached to cacheset2 on node 2
> 
> No, you don't attache it, you have to create a new relationship; ala
> 'make-bcache -C ...'

Hmmm. The local SSD (cacheset) is shared between multiple backing devices,
so having to "make-bcache -C" at this point is painful.

> >At this point, cacheset1 is attached to nothing, but still has valid
> blocks
> >"linked" to the backing md device
> 
> If you do, your data is hopelessly gone.

Again, write-through means all the writes should be safe. I mostly care
about reads that were cached in a previous cacheset and might be delivered
again.

The setup is maybe hard to describe, so for the sake of simplicity it can be
summarized as follows:

One node with 2 independent SSDs configured as write-through cachesets (-C)
and multiple storage devices configured as backing devices (-B).

I want to be able to safely move any single backing device from one SSD to
the other (and vice versa).

Would that work?

Regards, - Patrick -




* Re: Moving a backing device between 2 cachesets
  2014-01-28 21:59 Moving a backing device between 2 cachesets Patrick Zwahlen
  2014-01-28 22:35 ` matthew patton
@ 2014-01-29 10:42 ` Gabriel de Perthuis
  2014-01-29 15:55   ` Patrick Zwahlen
       [not found] ` <lcalu5$iek$1@ger.gmane.org>
  2 siblings, 1 reply; 9+ messages in thread
From: Gabriel de Perthuis @ 2014-01-29 10:42 UTC (permalink / raw)
  To: linux-bcache

On Tue, 28 Jan 2014 21:59:02 +0000, Patrick Zwahlen wrote:

> Hi,
> 
> We're working on a 2-nodes pacemaker cluster that provides iSCSI LUNs via
> SCST. The LUNs are either software RAID arrays located in a shared JBOD, or
> DRBD ressources (active/passive).
> 
> We are adding bcache to the game with local SSDs (ie not shared, but
> dedicated to each cluster node).
> 
> We are using write-through.
> 
> I need to evaluate the risk when moving a backing device (md) from cacheset1
> (on node #1) to cacheset2 (on node #2) and then back to cacheset #1.
> 
> Scenario
> - md attached to cacheset1 and working (on node 1)
> - md detached from cacheset1
> - md stopped on node 1
> - md started on node 2
> - md attached to cacheset2 on node 2
> 
> At this point, cacheset1 is attached to nothing, but still has valid blocks
> "linked" to the backing md device

From bcache.h:

When you register a newly formatted backing device it'll come up
in passthrough mode, and then you can attach and detach a backing device from
a cache set at runtime - while it's mounted and in use. Detaching implicitly
invalidates any cached data for that backing device.

After flushing, detaching does two things:
- the backing device gets flagged as detached
- the backing device is removed from the cache set's metadata
(stored as uuid_entry in a special bucket; the entry is flagged
with a bogus uuid but not reused).  The offset in that uuids
array constitutes an id, local to the cache set, that is not
reused after detaching.

The second step invalidates the backing device's id in the cache set,
and indirectly invalidates all buckets that referenced it (through
bkey->inode in the bucket key).
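For extra confidence, the on-disk superblocks can be inspected directly: bcache-super-show (from bcache-tools) dumps, among other fields, the cset.uuid a device is tied to. A sketch (/dev/md0 and /dev/sdc are placeholder paths; the awk filter just pulls out that one field):

```shell
#!/bin/sh
# Sketch: extract the cset.uuid field from a bcache-super-show dump, to see
# which cache set a device currently points at. Paths are placeholders.
cset_of() {
    bcache-super-show "$1" | awk '/cset\.uuid/ { print $2 }'
}

# cset_of /dev/md0   # backing device: the cset it is attached to
# cset_of /dev/sdc   # cache device:   the cset it belongs to
```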

> - md detached from cacheset2
> - md stopped on node 2
> - md started on node 1
> - md RE-attached to cacheset1 on node 1
> 
> At this point, I need to make sure that bcache will not serve "old" blocks
> that were linked to the backing device.
> 
> My understanding is that as we have attached the backing device to a new
> cacheset (#2) in-between, this will be "recorded" in the bcache headers and
> all the blocks that used to be valid in the first place won't be served.
> 
> Can you please validate if this is safe or if we need to take special care
> about invalidating the original cacheset ?

> Thanks a lot, - Patrick -


* RE: Moving a backing device between 2 cachesets
  2014-01-29 10:42 ` Gabriel de Perthuis
@ 2014-01-29 15:55   ` Patrick Zwahlen
  2014-01-29 17:50     ` matthew patton
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Zwahlen @ 2014-01-29 15:55 UTC (permalink / raw)
  To: linux-bcache@vger.kernel.org


> -----Original Message-----
> From: linux-bcache-owner@vger.kernel.org [mailto:linux-bcache-
> owner@vger.kernel.org] On Behalf Of Gabriel de Perthuis
> Sent: mercredi 29 janvier 2014 11:42
> To: linux-bcache@vger.kernel.org
> Subject: Re: Moving a backing device between 2 cachesets
>
> On Tue, 28 Jan 2014 21:59:02 +0000, Patrick Zwahlen wrote:
>
> > Hi,
> >
> > We're working on a 2-nodes pacemaker cluster that provides iSCSI LUNs
> via
> > SCST. The LUNs are either software RAID arrays located in a shared
> JBOD, or
> > DRBD ressources (active/passive).
> >
> > We are adding bcache to the game with local SSDs (ie not shared, but
> > dedicated to each cluster node).
> >
> > We are using write-through.
> >
> > I need to evaluate the risk when moving a backing device (md) from
> cacheset1
> > (on node #1) to cacheset2 (on node #2) and then back to cacheset #1.
> >
> > Scenario
> > - md attached to cacheset1 and working (on node 1)
> > - md detached from cacheset1
> > - md stopped on node 1
> > - md started on node 2
> > - md attached to cacheset2 on node 2
> >
> > At this point, cacheset1 is attached to nothing, but still has valid
> blocks
> > "linked" to the backing md device
>
> From bcache.h:
>
> When you register a newly formatted backing device it'll come up
> in passthrough mode, and then you can attach and detach a backing device
> from
> a cache set at runtime - while it's mounted and in use. Detaching
> implicitly
> invalidates any cached data for that backing device.
>
> After flushing, detaching does two things:
> - the backing device gets flagged as detached
> - the backing device is removed from the cache set's metadata
> (stored as uuid_entry in a special bucket; the entry is flagged
> with a bogus uuid but not reused).  The offset in that uuids
> array constitutes an id, local to the cache set, that is not
> reused after detaching.
>
> The second step invalidates the backing device's id in the cache set,
> and indirectly invalidates all buckets that referenced it (through
> bkey->inode in the bucket key).

I understand that we are safe, then. Right?

Thanks for your clarifications

>
> > - md detached from cacheset2
> > - md stopped on node 2
> > - md started on node 1
> > - md RE-attached to cacheset1 on node 1
> >
> > At this point, I need to make sure that bcache will not serve "old"
> blocks
> > that were linked to the backing device.
> >
> > My understanding is that as we have attached the backing device to a
> new
> > cacheset (#2) in-between, this will be "recorded" in the bcache
> headers and
> > all the blocks that used to be valid in the first place won't be
> served.
> >
> > Can you please validate if this is safe or if we need to take special
> care
> > about invalidating the original cacheset ?
>
> > Thanks a lot, - Patrick -



* Re: Moving a backing device between 2 cachesets
       [not found]   ` <56b62eb646f94bbe868ae6b2f26d94e8@navex1.navixia.local>
@ 2014-01-29 17:01     ` Gabriel de Perthuis
  0 siblings, 0 replies; 9+ messages in thread
From: Gabriel de Perthuis @ 2014-01-29 17:01 UTC (permalink / raw)
  To: linux-bcache

>> After flushing, detaching does two things:
>> - the backing device gets flagged as detached
>> - the backing device is removed from the cache set's metadata
>> (stored as uuid_entry in a special bucket; the entry is flagged
>> with a bogus uuid but not reused).  The offset in that uuids
>> array constitutes an id, local to the cache set, that is not
>> reused after detaching.
>>
>> The second step invalidates the backing device's id in the cache set,
>> and indirectly invalidates all buckets that referenced it (through
>> bkey->inode in the bucket key).
> 
> I understand that we are safe, then. Right ? 

tl;dr: after looking at the code, I expect you'll be fine.

That's what I expected anyway (IIRC these non-reusable ids appear
somewhere, maybe in the logs); I just spent a little more time checking.

> Thanks for your clarifications


* Re: Moving a backing device between 2 cachesets
  2014-01-29 15:55   ` Patrick Zwahlen
@ 2014-01-29 17:50     ` matthew patton
  2014-01-29 18:07       ` Patrick Zwahlen
  0 siblings, 1 reply; 9+ messages in thread
From: matthew patton @ 2014-01-29 17:50 UTC (permalink / raw)
  To: linux-bcache@vger.kernel.org



> When you register a newly formatted backing device it'll come up
> in passthrough mode, and then you can attach and detach a backing device
> from
> a cache set at runtime - while it's mounted and in use. Detaching
> implicitly
> invalidates any cached data for that backing device.


So it's better to think of bcache as adding and subtracting backing devices (aka real devices with user data) from a notionally "static" cacheset?


* RE: Moving a backing device between 2 cachesets
  2014-01-29 17:50     ` matthew patton
@ 2014-01-29 18:07       ` Patrick Zwahlen
  2014-01-29 18:42         ` matthew patton
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Zwahlen @ 2014-01-29 18:07 UTC (permalink / raw)
  To: linux-bcache@vger.kernel.org


> -----Original Message-----
> From: linux-bcache-owner@vger.kernel.org [mailto:linux-bcache-
> owner@vger.kernel.org] On Behalf Of matthew patton
> Sent: mercredi 29 janvier 2014 18:51
> To: linux-bcache@vger.kernel.org
> Subject: Re: Moving a backing device between 2 cachesets
> 
> 
> > When you register a newly formatted backing device it'll come up
> > in passthrough mode, and then you can attach and detach a backing
> device
> > from
> > a cache set at runtime - while it's mounted and in use. Detaching
> > implicitly
> > invalidates any cached data for that backing device.
> 
> 
> so it's better to think of bcache as adding and subtracting backing (aka
> real devices with user data) from a notionally "static" cacheset?

That's at least how we're currently using bcache on our test setup, in order
to avoid shared (SAS) SSDs.

I also know about interposers but never used any and would like to avoid
them if possible.

Moving a resource from one cacheset to the other will be a rare event
(maintenance, patch, test, crash), and I don't mind having to start again
with a cold cache in that case, including when re-assigning the resource to
its original "home" cacheset.

Regards, - Patrick -




* Re: Moving a backing device between 2 cachesets
  2014-01-29 18:07       ` Patrick Zwahlen
@ 2014-01-29 18:42         ` matthew patton
  0 siblings, 0 replies; 9+ messages in thread
From: matthew patton @ 2014-01-29 18:42 UTC (permalink / raw)
  To: Patrick Zwahlen, linux-bcache@vger.kernel.org



> Moving a resource from one cachet to the other will be a rare event
> (maintenance, patch, test, crash)

The crash aspect of this needs more investigation, IMO. There are ways to forcibly turn on a backing disk even if its cache device is gone (because it's now running on the other head-unit). In write-through, at least, you hope the written data is there (modulo caching further up the stack, like in the application or the Linux filesystem, or downstream in the HBA and on-disk caches).

Once the crashed head is booted you'll need to invalidate all associations still present in the cache device. I don't see how you can "echo <CSET-UUID> > /sys/block/bcache0/bcache/detach" when 'bcache0' doesn't exist anymore on that head. So now we've got no way to clean up the cacheset except to blow it away and re-init with 'make-bcache -C <ssd>'; or do we accomplish the same thing with
'echo unregister > /sys/fs/bcache/<cset-uuid>/bdev'?
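One possible shape of that recovery path, as a print-only sketch (sdb, sdc, and the UUID are placeholders; the running and unregister sysfs files are the documented bcache interfaces for forcing a cache-less start and for dropping a whole cache set):

```shell
#!/bin/sh
# Print-only sketch of the crash path: force the backing device online
# without its cache, then nuke the stale cache set on the rebooted head.
# sdb, sdc, and the UUID are placeholders.
run() { echo "+ $*"; }
STALE_CSET="33333333-3333-3333-3333-333333333333"   # placeholder UUID

recover() {
    # force the backing device to run without its (now unreachable) cache
    run "echo 1 > /sys/block/sdb/bcache/running"
    # later, on the rebooted head: no per-bdev detach is possible (no
    # bcacheN node exists there), so drop the stale cache set wholesale
    # and re-initialize it
    run "echo 1 > /sys/fs/bcache/$STALE_CSET/unregister"
    run "wipefs -a /dev/sdc"
    run "make-bcache -C /dev/sdc"
}

recover
```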


end of thread, other threads:[~2014-01-29 18:42 UTC | newest]

Thread overview: 9+ messages
2014-01-28 21:59 Moving a backing device between 2 cachesets Patrick Zwahlen
2014-01-28 22:35 ` matthew patton
2014-01-29  7:49   ` Patrick Zwahlen
2014-01-29 10:42 ` Gabriel de Perthuis
2014-01-29 15:55   ` Patrick Zwahlen
2014-01-29 17:50     ` matthew patton
2014-01-29 18:07       ` Patrick Zwahlen
2014-01-29 18:42         ` matthew patton
     [not found] ` <lcalu5$iek$1@ger.gmane.org>
     [not found]   ` <56b62eb646f94bbe868ae6b2f26d94e8@navex1.navixia.local>
2014-01-29 17:01     ` Gabriel de Perthuis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).