All of lore.kernel.org
* dm-cache refusing to come up again after a crash
@ 2013-12-06 15:49 Steinar H. Gunderson
  2013-12-06 17:57 ` Joe Thornber
  0 siblings, 1 reply; 9+ messages in thread
From: Steinar H. Gunderson @ 2013-12-06 15:49 UTC
  To: dm-devel

Linux (3.12.0-rc5) hung, and on boot, I can't get the dm-cache up again:

(initramfs) echo 0 23440891904 cache /dev/cache/metadata /dev/cache/blocks /dev/md1 1024 1 writeback default 4 random_threshold 8 sequential_threshold 512 | dmsetup create cache -u CACHE-0a8bb56fc873c195bf7117af925c7f08
device-mapper: reload ioctl on cache failed: Input/output error
Command failed
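For readers decoding the table line above, here is a minimal Python sketch that splits a dm-cache table line into named fields. It assumes the field order given in the kernel's dm-cache documentation (Documentation/device-mapper/cache.txt): start, length, metadata device, cache device, origin device, block size, a counted list of feature arguments, the policy name, and a counted list of policy arguments. `parse_cache_table` is a hypothetical helper; it only labels the words and does not validate them the way the kernel does.

```python
SECTOR_BYTES = 512  # device-mapper lengths and block sizes are in 512-byte sectors


def parse_cache_table(line: str) -> dict:
    """Label the words of a dm-cache table line (hypothetical helper)."""
    words = line.split()
    if len(words) < 10 or words[2] != "cache":
        raise ValueError("not a dm-cache table line")
    n_features = int(words[7])
    policy_at = 8 + n_features          # policy name follows the feature list
    n_policy_args = int(words[policy_at + 1])
    policy_args = words[policy_at + 2 : policy_at + 2 + n_policy_args]
    return {
        "start": int(words[0]),
        "length_sectors": int(words[1]),
        "metadata_dev": words[3],
        "cache_dev": words[4],
        "origin_dev": words[5],
        "block_size_sectors": int(words[6]),
        "features": words[8:policy_at],
        "policy": words[policy_at],
        # policy arguments are key/value pairs; their count is given in words
        "policy_args": dict(zip(policy_args[0::2], policy_args[1::2])),
    }


table = ("0 23440891904 cache /dev/cache/metadata /dev/cache/blocks /dev/md1 "
         "1024 1 writeback default 4 random_threshold 8 sequential_threshold 512")
t = parse_cache_table(table)
print(t["features"], t["policy"], t["policy_args"])
print(t["block_size_sectors"] * SECTOR_BYTES // 1024, "KiB per cache block")
print(round(t["length_sectors"] * SECTOR_BYTES / 2**40, 1), "TiB origin")
```

Read this way, the "4" after "default" counts the four words of the two key/value policy arguments. The block size of 1024 sectors corresponds to 512 KiB cache blocks, and the origin is roughly 10.9 TiB.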

The kernel complains with

[  639.189756] attempt to access beyond end of device
[  639.189761] dm-0: rw=0, want=18445688752888627208, limit=1048576
[  639.189764] device-mapper: transaction manager: couldn't open metadata space map
[  639.189767] device-mapper: cache metadata: tm_open_with_sm failed
[  639.283130] device-mapper: table: 254:2: cache: Error creating metadata object
[  639.283134] device-mapper: ioctl: error adding target to table
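The numbers in the first kernel line can be checked with a little arithmetic. The minimal Python sketch below assumes that the block layer's "attempt to access beyond end of device" message reports both `want` and `limit` in 512-byte sectors. Under that assumption, the limit is exactly 512 MiB, which matches the size of the metadata volume mentioned later in the thread. `want`, by contrast, falls less than 2**50 short of 2**64. A value that close to 2**64 looks much more like a corrupted block pointer read from the metadata than a real offset.

```python
SECTOR_BYTES = 512  # assumption: want/limit are counted in 512-byte sectors

want = 18445688752888627208   # from "dm-0: rw=0, want=..., limit=..."
limit = 1048576

limit_mib = limit * SECTOR_BYTES // 2**20
shortfall = 2**64 - want      # how far `want` sits below 2**64

print("device size:", limit_mib, "MiB")
print("request beyond end of device:", want > limit)
print("2**64 - want =", shortfall, "; below 2**50:", shortfall < 2**50)
```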

Is there anything I can do short of nuking the metadata partition
and taking the loss of whatever wasn't written back?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: dm-cache refusing to come up again after a crash
  2013-12-06 15:49 dm-cache refusing to come up again after a crash Steinar H. Gunderson
@ 2013-12-06 17:57 ` Joe Thornber
  2013-12-06 19:16   ` Steinar H. Gunderson
  0 siblings, 1 reply; 9+ messages in thread
From: Joe Thornber @ 2013-12-06 17:57 UTC
  To: device-mapper development

On Fri, Dec 06, 2013 at 04:49:14PM +0100, Steinar H. Gunderson wrote:
> Linux (3.12.0-rc5) hung, and on boot, I can't get the dm-cache up again:
> 
> (initramfs) echo 0 23440891904 cache /dev/cache/metadata /dev/cache/blocks /dev/md1 1024 1 writeback
> default 4 random_threshold 8 sequential_threshold 512 | dmsetup create cache -u CACHE-0a8bb56fc873c195bf7117af925c7f08
> device-mapper: reload ioctl on cache failed: Input/output error
> Command failed
> 
> The kernel complains with
> 
> [  639.189756] attempt to access beyond end of device
> [  639.189761] dm-0: rw=0, want=18445688752888627208, limit=1048576
> [  639.189764] device-mapper: transaction manager: couldn't open metadata space map
> [  639.189767] device-mapper: cache metadata: tm_open_with_sm failed
> [  639.283130] device-mapper: table: 254:2: cache: Error creating metadata object
> [  639.283134] device-mapper: ioctl: error adding target to table
> 
> Is there anything I can do short of nuking the metadata partition
> and taking the loss of whatever wasn't written back?

Yep, grab:

https://github.com/jthornber/thin-provisioning-tools

build, and then try cache_check on it (which should tell you what's
wrong).  Other programs to play with are cache_dump, cache_restore and
cache_repair.
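The first step can be scripted with the minimal Python sketch below. It assumes that `cache_check` from the thin-provisioning-tools build is on PATH. It also assumes that, like most command-line checkers, `cache_check` exits with status 0 only when it finds no problems; later in this thread, `echo $?` prints 0 after a clean run. The checker has to be pointed at the metadata device (here /dev/cache/metadata, the first device in the table line), not at the origin device. `check_metadata` and its `tool` parameter are hypothetical; `tool` exists only so that the logic can be exercised without the real binary.

```python
import subprocess


def check_metadata(device: str, tool: str = "cache_check"):
    """Run a metadata checker on `device`.

    Returns (passed, output), where `passed` is True when the checker
    exited with status 0 and `output` holds its combined stdout/stderr.
    Assumes problems are signalled through a non-zero exit status.
    """
    result = subprocess.run([tool, device], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

For example, `check_metadata("/dev/cache/metadata")` would run `cache_check /dev/cache/metadata` and keep its diagnostics for later inspection.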

Let me know how it goes,

- Joe


* Re: dm-cache refusing to come up again after a crash
  2013-12-06 17:57 ` Joe Thornber
@ 2013-12-06 19:16   ` Steinar H. Gunderson
  2013-12-06 19:35     ` Steinar H. Gunderson
  2013-12-09 10:28     ` Joe Thornber
  0 siblings, 2 replies; 9+ messages in thread
From: Steinar H. Gunderson @ 2013-12-06 19:16 UTC
  To: device-mapper development

On Fri, Dec 06, 2013 at 05:57:13PM +0000, Joe Thornber wrote:
> Yep, grab:
> 
> https://github.com/jthornber/thin-provisioning-tools
> 
> build, and then try cache_check on it (which should tell you what's
> wrong).  Other programs to play with are cache_dump, cache_restore and
> cache_repair.

Well, first of all, it doesn't compile, since you use typename outside of
templates :-) Fixing that is easy, though. But afterwards:

root@ubuntu:~/thin-provisioning-tools# ./cache_check /dev/md1
examining superblock
  superblock is corrupt
    bad checksum in superblock

So where do I want to go from there? cache_dump doesn't want to play with the
superblock because the checksum is bad... do I want cache_repair, then? Do I
want to take a backup of anything first?

/* Steinar */
-- 
Homepage: http://www.sesse.net/


* Re: dm-cache refusing to come up again after a crash
  2013-12-06 19:16   ` Steinar H. Gunderson
@ 2013-12-06 19:35     ` Steinar H. Gunderson
  2013-12-06 19:53       ` Steinar H. Gunderson
  2013-12-09 10:31       ` Joe Thornber
  2013-12-09 10:28     ` Joe Thornber
  1 sibling, 2 replies; 9+ messages in thread
From: Steinar H. Gunderson @ 2013-12-06 19:35 UTC
  To: device-mapper development

On Fri, Dec 06, 2013 at 08:16:05PM +0100, Steinar H. Gunderson wrote:
> Well, first of all, it doesn't compile, since you use typename outside of
> templates :-) Fixing that is easy, though. But afterwards:
> 
> root@ubuntu:~/thin-provisioning-tools# ./cache_check /dev/md1
> examining superblock
>   superblock is corrupt
>     bad checksum in superblock

Sorry, wrong device:

root@ubuntu:~/thin-provisioning-tools# ./cache_check /dev/cache/metadata 
examining superblock
examining mapping array
no hint array present
examining discard bitset
root@ubuntu:~/thin-provisioning-tools# echo $?
0

Does that mean it ought to have worked better? :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/


* Re: dm-cache refusing to come up again after a crash
  2013-12-06 19:35     ` Steinar H. Gunderson
@ 2013-12-06 19:53       ` Steinar H. Gunderson
  2013-12-07  0:16         ` Steinar H. Gunderson
  2013-12-09 10:31       ` Joe Thornber
  1 sibling, 1 reply; 9+ messages in thread
From: Steinar H. Gunderson @ 2013-12-06 19:53 UTC
  To: device-mapper development

On Fri, Dec 06, 2013 at 08:35:41PM +0100, Steinar H. Gunderson wrote:
> root@ubuntu:~/thin-provisioning-tools# ./cache_check /dev/cache/metadata 

And I forgot:

root@ubuntu:~/thin-provisioning-tools# ./cache_dump /dev/cache/metadata
<superblock uuid="" block_size="512" nr_cache_blocks="865560" policy="cleaner" hint_width="4">
  <mappings>
    <mapping cache_block="0" origin_block="6118373" dirty="false"/>
    <mapping cache_block="1" origin_block="6118275" dirty="false"/>
    <mapping cache_block="2" origin_block="5934780" dirty="false"/>
[... lots of blocks, none of them dirty ...]
    <mapping cache_block="505877" origin_block="889613" dirty="false"/>
    <mapping cache_block="505878" origin_block="690575" dirty="false"/>
    <mapping cache_block="505879" origin_block="875752" dirty="false"/>
  </mappings>
  <hints>
cache_dump: /usr/include/boost/smart_ptr/shared_ptr.hpp:418: T*
boost::shared_ptr< <template-parameter-1-1> >::operator->() const [with T =
caching::hint_array]: Assertion `px != 0' failed.

The cleaner policy is not the one I actually use, but I have used it in the
past, so I guess it's stuck somehow.
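The assertion cuts the dump above off inside `<hints>`, so a strict XML parser would reject the whole document. The minimal Python sketch below assumes that the dump's stdout was saved as text. It uses the standard library's incremental `XMLPullParser` to count the mappings and dirty flags that were read before the output stops or turns into non-XML debris. `summarize_dump` is a hypothetical helper. Such a count is only as trustworthy as the metadata it reads: in this thread, wiping the metadata after an all-clean dump still left filesystems damaged.

```python
import xml.etree.ElementTree as ET


def summarize_dump(text: str):
    """Summarize a possibly truncated cache_dump XML document.

    Returns (superblock_attributes, mapping_count, dirty_count), counting
    only what parsed successfully before the input ended or went bad.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(text)          # incomplete input is fine for a pull parser
    superblock, total, dirty = {}, 0, 0
    try:
        for event, elem in parser.read_events():
            if event == "start" and elem.tag == "superblock":
                superblock = dict(elem.attrib)
            elif event == "end" and elem.tag == "mapping":
                total += 1
                if elem.get("dirty") == "true":
                    dirty += 1
    except ET.ParseError:
        pass                   # stop at the first malformed input
    # close() is never called, so a missing closing tag is not an error
    return superblock, total, dirty
```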

/* Steinar */
-- 
Homepage: http://www.sesse.net/


* Re: dm-cache refusing to come up again after a crash
  2013-12-06 19:53       ` Steinar H. Gunderson
@ 2013-12-07  0:16         ` Steinar H. Gunderson
  0 siblings, 0 replies; 9+ messages in thread
From: Steinar H. Gunderson @ 2013-12-07  0:16 UTC
  To: device-mapper development

On Fri, Dec 06, 2013 at 08:53:10PM +0100, Steinar H. Gunderson wrote:
> <superblock uuid="" block_size="512" nr_cache_blocks="865560" policy="cleaner" hint_width="4">
>   <mappings>
>     <mapping cache_block="0" origin_block="6118373" dirty="false"/>
>     <mapping cache_block="1" origin_block="6118275" dirty="false"/>
>     <mapping cache_block="2" origin_block="5934780" dirty="false"/>
> [... lots of blocks, none of them dirty ...]
>     <mapping cache_block="505877" origin_block="889613" dirty="false"/>
>     <mapping cache_block="505878" origin_block="690575" dirty="false"/>
>     <mapping cache_block="505879" origin_block="875752" dirty="false"/>
>   </mappings>

OK, so since all blocks were marked as clean (non-dirty), I wiped the metadata
volume. That made the system boot just fine, but it seems to have been a big
mistake; a lot of filesystems had more or less fatal errors.

I'm restoring from backup right now. (Yes, I have them.)

/* Steinar */
-- 
Homepage: http://www.sesse.net/


* Re: dm-cache refusing to come up again after a crash
  2013-12-06 19:16   ` Steinar H. Gunderson
  2013-12-06 19:35     ` Steinar H. Gunderson
@ 2013-12-09 10:28     ` Joe Thornber
  2013-12-09 10:34       ` Steinar H. Gunderson
  1 sibling, 1 reply; 9+ messages in thread
From: Joe Thornber @ 2013-12-09 10:28 UTC
  To: device-mapper development

On Fri, Dec 06, 2013 at 08:16:05PM +0100, Steinar H. Gunderson wrote:
> On Fri, Dec 06, 2013 at 05:57:13PM +0000, Joe Thornber wrote:
> > Yep, grab:
> > 
> > https://github.com/jthornber/thin-provisioning-tools
> > 
> > build, and then try cache_check on it (which should tell you what's
> > wrong).  Other programs to play with are cache_dump, cache_restore and
> > cache_repair.
> 
> Well, first of all, it doesn't compile, since you use typename outside of
> templates :-) Fixing that is easy, though. But afterwards:

Grr, I thought that was fixed. What version of g++ are you using?

> 
> root@ubuntu:~/thin-provisioning-tools# ./cache_check /dev/md1
> examining superblock
>   superblock is corrupt
>     bad checksum in superblock
> 
> So where do I want to go from there? cache_dump doesn't want to play with the
> superblock because the checksum is bad... do I want cache_repair, then? Do I
> want to take a backup of anything first?

Ouch.  Could you go through what happened please?  Did dm-cache crash,
or did the machine die for some other reason?

- Joe


* Re: dm-cache refusing to come up again after a crash
  2013-12-06 19:35     ` Steinar H. Gunderson
  2013-12-06 19:53       ` Steinar H. Gunderson
@ 2013-12-09 10:31       ` Joe Thornber
  1 sibling, 0 replies; 9+ messages in thread
From: Joe Thornber @ 2013-12-09 10:31 UTC
  To: device-mapper development

On Fri, Dec 06, 2013 at 08:35:41PM +0100, Steinar H. Gunderson wrote:
> On Fri, Dec 06, 2013 at 08:16:05PM +0100, Steinar H. Gunderson wrote:
> > Well, first of all, it doesn't compile, since you use typename outside of
> > templates :-) Fixing that is easy, though. But afterwards:
> > 
> > root@ubuntu:~/thin-provisioning-tools# ./cache_check /dev/md1
> > examining superblock
> >   superblock is corrupt
> >     bad checksum in superblock
> 
> Sorry, wrong device:
> 
> root@ubuntu:~/thin-provisioning-tools# ./cache_check /dev/cache/metadata 
> examining superblock
> examining mapping array
> no hint array present
> examining discard bitset
> root@ubuntu:~/thin-provisioning-tools# echo $?
> 0
> 
> Does that mean it ought to have worked better? :-)

Yes, this is good news.  The damage is probably in the space maps,
which get completely regenerated during a restore/repair.

I see from a later mail that you're having another issue with the
tools.  I wonder if you could email me off list, and we'll work out
how I can get a copy of your metadata.

- Joe


* Re: dm-cache refusing to come up again after a crash
  2013-12-09 10:28     ` Joe Thornber
@ 2013-12-09 10:34       ` Steinar H. Gunderson
  0 siblings, 0 replies; 9+ messages in thread
From: Steinar H. Gunderson @ 2013-12-09 10:34 UTC
  To: device-mapper development

On Mon, Dec 09, 2013 at 10:28:11AM +0000, Joe Thornber wrote:
>> Well, first of all, it doesn't compile, since you use typename outside of
>> templates :-) Fixing that is easy, though. But afterwards:
> Grr, I thought that was fixed, what version of g++ are you using?

This is an Ubuntu 10.04 live CD, which was what I had handy.
It works fine on a Debian wheezy live CD (which I switched to later).

>> So where do I want to go from there? cache_dump doesn't want to play with the
>> superblock because the checksum is bad... do I want cache_repair, then? Do I
>> want to take a backup of anything first?
> Ouch.  Could you go through what happened please?  Did dm-cache crash,
> or did the machine die for some other reason?

The machine hung. I don't know exactly why (I don't have the logs).
I rebooted, and it refused to bring up the volume (this is what the original
post in this thread is about). After booting a live CD and running
cache_check and cache_dump, I was convinced there were no dirty blocks,
so I nuked the entire metadata volume (using dd from /dev/zero).

This made the machine boot again, but with tons of filesystem errors on
anything I'd written to in the last few months, so I restored from backup
(thankfully I do have working backups!). I also upgraded to 3.13-rc3 in the
hope of fixing whatever issue in 3.12 originally caused this; however, as
reported in the other thread, it turned out to be quite unstable, and after the
third crash, I was back in the “won't boot, but cache_check says everything
is fine” mode.

That's the current status; it's now sitting in a live CD and not doing anything
useful. I miss my machine :-) (And I hope I haven't lost data again.) Will it
help if I upload a dump of the 512MB metadata volume somewhere?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


Thread overview: 9+ messages
2013-12-06 15:49 dm-cache refusing to come up again after a crash Steinar H. Gunderson
2013-12-06 17:57 ` Joe Thornber
2013-12-06 19:16   ` Steinar H. Gunderson
2013-12-06 19:35     ` Steinar H. Gunderson
2013-12-06 19:53       ` Steinar H. Gunderson
2013-12-07  0:16         ` Steinar H. Gunderson
2013-12-09 10:31       ` Joe Thornber
2013-12-09 10:28     ` Joe Thornber
2013-12-09 10:34       ` Steinar H. Gunderson
