linux-bcache.vger.kernel.org archive mirror
From: Kai Krakow <hurikhan77@gmail.com>
To: linux-bcache@vger.kernel.org
Subject: Re: bcache fails after reboot if discard is enabled
Date: Sat, 11 Apr 2015 09:52:43 +0200
Message-ID: <bf0nvb-uvd.ln1@hurikhan77.spdns.de>
In-Reply-To: CAPL5yKcGvjC7ubzdKNjCFX6ZLc+qrr-Q0BQCX6W0zbueTkndPg@mail.gmail.com

Dan Merillat <dan.merillat@gmail.com> schrieb:

> Looking through the kernel log, this may be related: I booted into
> 4.0-rc7, and attempted to run it there at first:
> Apr  7 12:54:08 fileserver kernel: [ 2028.533893] bcache-register:
> page allocation failure: order:8, mode:0x
> ... memory dump
> Apr  7 12:54:08 fileserver kernel: [ 2028.541396] bcache:
> register_cache() error opening sda4: cannot allocate memory

Is your system under memory pressure? Are you perhaps using the "always"
transparent huge page (THP) policy in your kernel? If so, could you retry
without it, or at least set it to madvise mode?
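A quick way to check which THP mode is active: on kernels built with
transparent huge page support, /sys/kernel/mm/transparent_hugepage/enabled
lists the choices with the active one in brackets, e.g. "always [madvise]
never". The minimal Python sketch below parses that line. It also converts an
allocation order such as the "order:8" in the log into bytes: an order-n
request needs 2^n physically contiguous pages, so, assuming 4 KiB base pages,
order 8 means 1 MiB of contiguous memory, which can fail on a fragmented
system even when plenty of memory is free overall. The function names are
hypothetical, for illustration only.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
PAGE_SIZE = 4096  # assumption: 4 KiB base pages (typical on x86-64)


def active_thp_mode(line: str) -> str:
    """Return the bracketed (active) choice from a THP sysfs line,
    e.g. 'always [madvise] never' -> 'madvise'."""
    for word in line.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode marked in {line!r}")


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """An order-n allocation asks for 2**n physically contiguous pages."""
    return (1 << order) * page_size


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("THP not available on this kernel")
    print("order:8 =", order_to_bytes(8) // 1024, "KiB contiguous")
```

If the active mode turns out to be "always", it can be switched at runtime
as root with "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled",
or at boot with the "transparent_hugepage=madvise" kernel parameter.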

> Apr  7 12:55:29 fileserver kernel: [ 2109.303315] bcache:
> run_cache_set() invalidating existing data
> Apr  7 12:55:29 fileserver kernel: [ 2109.408255] bcache:
> bch_cached_dev_attach() Caching md127 as bcache0 on set
> 804d6906-fa80-40ac-9081-a71a4d595378

Why is it on md? I thought you were not using intermediate layers like LVM...

> Apr  7 12:55:29 fileserver kernel: [ 2109.408443] bcache:
> register_cache() registered cache device sda4
> Apr  7 12:55:33 fileserver kernel: [ 2113.307687] bcache:
> bch_cached_dev_attach() Can't attach md127: already attached

And why is the attach attempted twice? Something looks strange here... What
is your device layout?

> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] bcache:
> __cached_dev_store() Can't attach 804d6906-fa80-40ac-9081-a71a4d595378
> Apr  7 12:55:33 fileserver kernel: [ 2113.307747] : cache set not found

My first guess would be that two different caches overlap and try to share
the same device space. I had a similar problem after repartitioning because
I had not run "wipefs" on the device first.
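Leftover on-disk signatures are what wipefs finds and erases. As an
illustration, the read-only Python sketch below checks a device or image file
for a stale bcache superblock. It assumes bcache's on-disk layout (superblock
at sector 8, i.e. byte 4096, with the 16-byte magic 24 bytes into it, after
the csum, offset and version fields); the function name is hypothetical. For
real devices, running wipefs without -a simply lists every signature it
recognises without erasing anything, which is the safer first step.

```python
import sys

# Assumed on-disk layout (from bcache's struct cache_sb): the superblock
# starts at sector 8 (byte 4096); csum, offset and version are 8 bytes each,
# so the 16-byte magic begins 24 bytes into the superblock.
SB_OFFSET = 8 * 512
MAGIC_OFFSET = SB_OFFSET + 24
BCACHE_MAGIC = bytes([
    0xC6, 0x85, 0x73, 0xF6, 0x4E, 0x1A, 0x45, 0xCA,
    0x82, 0x65, 0xF5, 0x7F, 0x48, 0xBA, 0x6D, 0x81,
])


def has_bcache_signature(path: str) -> bool:
    """Return True if `path` (a block device or image file) carries the
    bcache superblock magic at the expected offset. Opens read-only."""
    with open(path, "rb") as dev:
        dev.seek(MAGIC_OFFSET)
        return dev.read(len(BCACHE_MAGIC)) == BCACHE_MAGIC


if __name__ == "__main__":
    for path in sys.argv[1:]:
        found = has_bcache_signature(path)
        print(f"{path}: {'bcache signature found' if found else 'no bcache signature'}")
```

A partition that was recreated at a different offset can still expose an old
signature like this, which is one way two caches could end up claiming
overlapping space.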

> A few hours later, I was getting stalls:
> Apr  7 18:00:20 fileserver kernel: [20400.288049] INFO: task java:3610
> blocked for more than 120 seconds.
> Apr  7 18:00:20 fileserver kernel: [20400.288069]       Not tainted
> 4.0.0-rc7 #1
> Apr  7 18:00:20 fileserver kernel: [20400.288085] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this
>  message.
> Apr  7 18:00:20 fileserver kernel: [20400.293521] INFO: task
> nmbd:22692 blocked for more than 120 seconds.
> Apr  7 18:00:20 fileserver kernel: [20400.293532]       Not tainted
> 4.0.0-rc7 #1
> Apr  7 18:00:20 fileserver kernel: [20400.293545] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this
>  message.

If you are using transparent huge pages, these stalls may be a side effect
of the allocation failure you saw initially.

> So I rebooted to 4.0-rc7 again:
> Apr  7 19:36:23 fileserver kernel: [    2.145004] bcache:
> journal_read_bucket() 157: too big, 552 bytes, offset 2047
> Apr  7 19:36:23 fileserver kernel: [    2.154586] bcache: prio_read()
> bad csum reading priorities
> Apr  7 19:36:23 fileserver kernel: [    2.154643] bcache: prio_read()
> bad magic reading priorities
> Apr  7 19:36:23 fileserver kernel: [    2.158008] bcache: error on
> 804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
> 65638, block 0, 0 keys, disabling caching

Same here: if two different caches somehow overwrite each other, that could
explain the problem.

> Apr  7 19:36:23 fileserver kernel: [    2.158408] bcache:
> cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
> unregistered
> Apr  7 19:36:23 fileserver kernel: [    2.158468] bcache:
> register_cache() registered cache device sda4
> 
> Apr  7 19:36:23 fileserver kernel: [    2.226581] md127: detected
> capacity change from 0 to 12001954234368

I wonder where md127 comes from... Maybe bcache probing is running too early 
and should run after md setup.

> Apr  7 19:36:23 fileserver kernel: [    2.265347] bcache:
> register_bdev() registered backing device md127
> 
> Apr  7 19:36:23 fileserver kernel: [   21.423819] bcache:
> journal_read_bucket() 157: too big, 552 bytes, offset 2047
> Apr  7 19:36:23 fileserver kernel: [   21.432091] bcache: prio_read()
> bad csum reading priorities
> Apr  7 19:36:23 fileserver kernel: [   21.432138] bcache: prio_read()
> bad magic reading priorities
> Apr  7 19:36:23 fileserver kernel: [   21.435613] bcache: error on
> 804d6906-fa80-40ac-9081-a71a4d595378: bad btree header at bucket
> 65638, block 0, 0 keys, disabling caching
> Apr  7 19:36:23 fileserver kernel: [   21.436225] bcache:
> cache_set_free() Cache set 804d6906-fa80-40ac-9081-a71a4d595378
> unregistered
> Apr  7 19:36:23 fileserver kernel: [   21.436273] bcache:
> register_cache() registered cache device sda4
> 
> At this point, everything is gone, and that's where I'm at right now.

-- 
Replies to list only preferred.

Thread overview: 25+ messages
2015-01-02  9:47 bcache fails after reboot if discard is enabled Stefan Priebe - Profihost AG
2015-01-02 10:00 ` Stefan Priebe - Profihost AG
2015-01-03 16:32   ` Rolf Fokkens
2015-01-03 19:32     ` Stefan Priebe
2015-01-05  0:06       ` Michael Goertz
2015-02-09 19:46         ` Kai Krakow
2015-04-08  0:06           ` Dan Merillat
2015-04-08 18:17             ` Eric Wheeler
2015-04-08 18:27               ` Stefan Priebe
2015-04-08 19:31                 ` Eric Wheeler
2015-04-08 19:54                   ` Kai Krakow
2015-04-08 22:02                     ` Dan Merillat
2015-04-10 23:00                       ` Kai Krakow
2015-04-11  0:14                         ` Kai Krakow
2015-04-11  6:31                           ` Dan Merillat
2015-04-11  6:54                             ` Dan Merillat
2015-04-11  7:52                               ` Kai Krakow [this message]
2015-04-11 18:53                                 ` Dan Merillat
     [not found]                                 ` <CAPL5yKfpk8+6Vw cUVcwJ9QxAZJQmqaa98spCyT7+LekkRvkeAw@mail.gmail.com>
2015-04-11 20:09                                   ` Kai Krakow
2015-04-12  5:56                                     ` Dan Merillat
2015-04-29 17:48                                       ` Dan Merillat
2015-04-29 18:00                                         ` Ming Lin
2015-04-29 19:57                                         ` Kai Krakow
2015-04-08 18:46             ` Kai Krakow
2015-06-05  5:11             ` Kai Krakow