linux-bcache.vger.kernel.org archive mirror
* every boot gives: bcache/alloc.c:78 WARNING
@ 2016-08-03 23:19 Marc MERLIN
  2016-08-04  0:26 ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2016-08-03 23:19 UTC (permalink / raw)
  To: linux-bcache

This happens on all kernels up to 4.7.
Sorry, it happens earlier than ethernet coming up or my storage, so I can't
use netconsole or other text dumps:
https://goo.gl/photos/ubsi6maZXsjkevYY7

Looks like the warning happens on the registration of one of my bcache devices, but
I can't tell which one or why.

Does the trace give any hints?
(please ignore the load modules errors below, different issue)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: every boot gives: bcache/alloc.c:78 WARNING
  2016-08-03 23:19 every boot gives: bcache/alloc.c:78 WARNING Marc MERLIN
@ 2016-08-04  0:26 ` Eric Wheeler
  2016-08-04  0:33   ` Marc MERLIN
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2016-08-04  0:26 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

On Wed, 3 Aug 2016, Marc MERLIN wrote:

> This happens on all kernels up to 4.7.
> Sorry, it happens earlier than ethernet coming up or my storage, so I can't
> use netconsole or other text dumps:
> https://goo.gl/photos/ubsi6maZXsjkevYY7
> 
> Looks like the warning happens on the registration of one of my bcache devices, but
> I can't tell which one or why.
> 
> Does the trace give any hints?
> (please ignore the load modules errors below, different issue)


Does it cause a problem, or just warn?

 73 uint8_t bch_inc_gen(struct cache *ca, struct bucket *b)
 74 {
 75         uint8_t ret = ++b->gen;
 76 
 77         ca->set->need_gc = max(ca->set->need_gc, bucket_gc_gen(b));
 78         WARN_ON_ONCE(ca->set->need_gc > BUCKET_GC_GEN_MAX);
 79 
 80         return ret;
 81 }

It looks like something needs to be garbage collected but perhaps isn't.  
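
To make the condition concrete, here is a minimal standalone sketch of the
arithmetic behind that WARN_ON_ONCE. It assumes bucket_gc_gen(b) is the 8-bit
difference between a bucket's gen and the gen recorded at its last GC pass;
the demo_* names and the last_gc field are illustrative, not copied from
alloc.c. The limit of 96 is the value of BUCKET_GC_GEN_MAX discussed in this
thread.

```c
#include <assert.h>
#include <stdint.h>

/* Limit discussed in this thread (BUCKET_GC_GEN_MAX is 96U). */
#define DEMO_BUCKET_GC_GEN_MAX 96U

/* Hypothetical, stripped-down bucket: only the two counters needed here. */
struct demo_bucket {
	uint8_t gen;     /* bumped each time the bucket is invalidated */
	uint8_t last_gc; /* assumed: gen as of the last GC pass */
};

/* Assumed definition: how far gen has moved since the last GC pass.
 * The cast to uint8_t keeps the result correct across wraparound. */
static uint8_t demo_bucket_gc_gen(const struct demo_bucket *b)
{
	return (uint8_t)(b->gen - b->last_gc);
}

/* Mirrors the quoted bch_inc_gen(): bump gen, track the maximum distance
 * in *need_gc, and return 1 where the kernel would emit the warning. */
static int demo_inc_gen(struct demo_bucket *b, uint8_t *need_gc)
{
	uint8_t d;

	assert(b && need_gc);
	b->gen++;
	d = demo_bucket_gc_gen(b);
	if (d > *need_gc)
		*need_gc = d;
	return *need_gc > DEMO_BUCKET_GC_GEN_MAX;
}

/* Count how many reuses of one bucket, with no GC pass in between,
 * it takes before the warning condition becomes true. */
static unsigned demo_reuses_until_warning(void)
{
	struct demo_bucket b = { .gen = 200, .last_gc = 200 };
	uint8_t need_gc = 0;
	unsigned n = 0;

	do {
		n++;
	} while (!demo_inc_gen(&b, &need_gc));
	return n;
}
```

Under these assumptions demo_reuses_until_warning() returns 97: a bucket
reused 97 times without a GC pass pushes need_gc past 96, even though gen
itself wraps around 255 along the way.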

You could write to sysfs/.../trigger_gc:
 
https://evilpiepirate.org/git/linux-bcache.git/tree/Documentation/bcache.txt
	trigger_gc
	  Writing to this file forces garbage collection to run.
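
A small sketch of what "writing to this file" amounts to, in C to match the
kernel snippet above. The <root>/<cset-uuid>/internal/trigger_gc layout is an
assumption (it matches the path used later in this thread, with root
/sys/fs/bcache); the helper names are made up for illustration and error
handling is minimal. Later in this thread a forced GC ended with caching
disabled, so this is only worth trying where losing the cache is acceptable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<root>/<cset_uuid>/internal/trigger_gc" into buf.
 * Returns 0 on success, -1 on an empty UUID or if buf is too small. */
static int trigger_gc_path(char *buf, size_t len,
			   const char *root, const char *cset_uuid)
{
	int n;

	assert(buf && root && cset_uuid);
	if (strlen(cset_uuid) == 0)
		return -1;
	n = snprintf(buf, len, "%s/%s/internal/trigger_gc", root, cset_uuid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to the trigger_gc file of the given cache set.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int trigger_gc(const char *root, const char *cset_uuid)
{
	char path[256];
	FILE *f;
	int rc;

	if (trigger_gc_path(path, sizeof(path), root, cset_uuid) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	rc = (fputs("1\n", f) == EOF) ? -1 : 0;
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}
```

A caller would pass "/sys/fs/bcache" as root and the cache set UUID (not the
backing device) as cset_uuid, and needs to run as root.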

If that doesn't work, I wonder what increasing BUCKET_GC_GEN_MAX would do, 
though I don't know if that is safe.  It's set to 96U, so it's not on a 
bit boundary, which sounds like it could be slightly safer, but I wouldn't 
try it unless this is a test machine.


--
Eric Wheeler



* Re: every boot gives: bcache/alloc.c:78 WARNING
  2016-08-04  0:26 ` Eric Wheeler
@ 2016-08-04  0:33   ` Marc MERLIN
  2016-08-04  1:43     ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2016-08-04  0:33 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache

On Wed, Aug 03, 2016 at 05:26:30PM -0700, Eric Wheeler wrote:
> Does it cause a problem, or just warn?
 
No problem, but since it's a warning, I'm reporting it.

> It looks like something needs to be garbage collected but perhaps isn't.  
> 
> You could write to sysfs/.../trigger_gc:
>  
> https://evilpiepirate.org/git/linux-bcache.git/tree/Documentation/bcache.txt
> 	trigger_gc
> 	  Writing to this file forces garbage collection to run.

BTW both this and the just released 4.7.0 are still missing the
documentation updates I contributed months ago. Any idea what's going on
there?

> If that doesn't work, I wonder what increasing BUCKET_GC_GEN_MAX would do, 
> though I don't know if that is safe.  It's set to 96U, so it's not on a 
> bit boundary, which sounds like it could be slightly safer, but I wouldn't 
> try it unless this is a test machine.

That's on the backing device, correct?
I have 3 of them, and I can only write to them way way later in the boot
process.
Should I do that one by one and see if I get output now?

Thanks,
Marc

* Re: every boot gives: bcache/alloc.c:78 WARNING
  2016-08-04  0:33   ` Marc MERLIN
@ 2016-08-04  1:43     ` Eric Wheeler
  2016-08-04  2:44       ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2016-08-04  1:43 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache


On Wed, 3 Aug 2016, Marc MERLIN wrote:

> That's on the backing device, correct?
> I have 3 of them, and I can only write to them way way later in the boot
> process.
> Should I do that one by one and see if I get output now?


When you say do "that", do you mean `trigger_gc`?

I think trigger_gc is a cache thing, but the whole bcacheN dev might need to 
be online before it can be triggered (not sure).  Backing devices really 
have metadata, just superblock.

--
Eric Wheeler



* Re: every boot gives: bcache/alloc.c:78 WARNING
  2016-08-04  1:43     ` Eric Wheeler
@ 2016-08-04  2:44       ` Eric Wheeler
  2016-08-04  3:23         ` Marc MERLIN
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wheeler @ 2016-08-04  2:44 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

On Wed, 3 Aug 2016, Eric Wheeler wrote:

> When you say do "that", do you mean `trigger_gc`?
> 
> I think trigger_gc is a cache thing, but the whole bcacheN dev might need to 
> be online before it can be triggered (not sure).  Backing devices really 
> have metadata, just superblock.

I meant to say: 

Backing devices have no metadata, just superblock.



--
Eric Wheeler



* Re: every boot gives: bcache/alloc.c:78 WARNING
  2016-08-04  2:44       ` Eric Wheeler
@ 2016-08-04  3:23         ` Marc MERLIN
  2016-08-04  4:32           ` Eric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2016-08-04  3:23 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache

On Wed, Aug 03, 2016 at 07:44:41PM -0700, Eric Wheeler wrote:
> > When you say do "that", do you mean `trigger_gc`?
> > 
> > I think trigger_gc is a cache thing, but the whole bcacheN dev might need to 
> > be online before it can be triggered (not sure).  Backing devices really 
> > have metadata, just superblock.
> 
> I meant to say: 
> 
> Backing devices have no metadata, just superblock.

Mmmh, doing this just killed my cache:
saruman:/sys/block/bcache0# echo 1 > /sys/fs/bcache/7f2e1508-8db6-48cb-85d6-606c88f81f63/internal/trigger_gc

[ 1639.204612] bcache: error on fc8cd783-346b-48f5-a619-fb0380584aa9: key too stale: 97, need_gc 97, disabling caching
[ 1639.204625] CPU: 7 PID: 519 Comm: bcache_gc Tainted: G        W  OE   4.4.5-amd64-volpreempt-sysrq-20160312bc5 #10
[ 1639.204627] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS N1DET41W (1.15 ) 12/31/2015
[ 1639.204629]  0000000000000000 ffff8808781fbbc0 ffffffff8134d88e ffff880875040ab8
[ 1639.204635]  ffff88087a3edcd0 ffff8808781fbc00 ffffffffc03b8609 0000000000000001
[ 1639.204639]  ffff880875040ab8 ffffffffc03b17ed ffff88087a3edcd0 ffff8808781fbc50
[ 1639.204644] Call Trace:
[ 1639.204649]  [<ffffffff8134d88e>] dump_stack+0x61/0x7d
[ 1639.204668]  [<ffffffffc03b8609>] bch_extent_bad+0xd7/0x12b [bcache]
[ 1639.204677]  [<ffffffffc03b17ed>] ? bch_ptr_invalid+0xc/0xc [bcache]
[ 1639.204684]  [<ffffffffc03b17f7>] bch_ptr_bad+0xa/0xc [bcache]
[ 1639.204690]  [<ffffffffc03b1646>] bch_btree_iter_next_filter+0x32/0x42 [bcache]
[ 1639.204695]  [<ffffffffc03b1ce2>] btree_gc_count_keys+0x3b/0x59 [bcache]
[ 1639.204701]  [<ffffffffc03b5e44>] btree_gc_recurse+0x11b/0x2db [bcache]
[ 1639.204705]  [<ffffffff8164ad9b>] ? __schedule+0x3b1/0x575
[ 1639.204710]  [<ffffffffc03b27e5>] ? __bch_btree_mark_key+0xba/0x1a4 [bcache]
[ 1639.204716]  [<ffffffffc03b63c9>] bch_btree_gc+0x246/0x3cc [bcache]
[ 1639.204722]  [<ffffffffc03b63c9>] ? bch_btree_gc+0x246/0x3cc [bcache]
[ 1639.204725]  [<ffffffff8108d0f8>] ? wake_up_atomic_t+0x2c/0x2c
[ 1639.204731]  [<ffffffffc03b6586>] bch_gc_thread+0x37/0xea [bcache]
[ 1639.204736]  [<ffffffffc03b654f>] ? bch_btree_gc+0x3cc/0x3cc [bcache]
[ 1639.204741]  [<ffffffffc03b654f>] ? bch_btree_gc+0x3cc/0x3cc [bcache]
[ 1639.204745]  [<ffffffff81075c36>] kthread+0xa5/0xad
[ 1639.204747]  [<ffffffff81075b91>] ? kthread_parkme+0x24/0x24
[ 1639.204750]  [<ffffffff8164decf>] ret_from_fork+0x3f/0x70
[ 1639.204752]  [<ffffffff81075b91>] ? kthread_parkme+0x24/0x24
[ 1639.246944] bcache: cache_set_free() Cache set fc8cd783-346b-48f5-a619-fb0380584aa9 unregistered

That's not good.

What should I do now?

Marc

* Re: every boot gives: bcache/alloc.c:78 WARNING
  2016-08-04  3:23         ` Marc MERLIN
@ 2016-08-04  4:32           ` Eric Wheeler
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wheeler @ 2016-08-04  4:32 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

On Wed, 3 Aug 2016, Marc MERLIN wrote:

> On Wed, Aug 03, 2016 at 07:44:41PM -0700, Eric Wheeler wrote:
> > > When you say do "that", do you mean `trigger_gc`?
> > > 
> > > I think trigger_gc is a cache thing, but the whole bcacheN dev might need to 
> > > be online before it can be triggered (not sure).  Backing devices really 
> > > have metadata, just superblock.
> > 
> > I meant to say: 
> > 
> > Backing devices have no metadata, just superblock.
> 
> Mmmh, doing this just killed my cache:
> saruman:/sys/block/bcache0# echo 1 > /sys/fs/bcache/7f2e1508-8db6-48cb-85d6-606c88f81f63/internal/trigger_gc
> 
> [ 1639.204612] bcache: error on fc8cd783-346b-48f5-a619-fb0380584aa9: key too stale: 97, need_gc 97, disabling caching
> [ 1639.204625] CPU: 7 PID: 519 Comm: bcache_gc Tainted: G        W  OE   4.4.5-amd64-volpreempt-sysrq-20160312bc5 #10
> [ 1639.204627] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS N1DET41W (1.15 ) 12/31/2015
> [ 1639.204629]  0000000000000000 ffff8808781fbbc0 ffffffff8134d88e ffff880875040ab8
> [ 1639.204635]  ffff88087a3edcd0 ffff8808781fbc00 ffffffffc03b8609 0000000000000001
> [ 1639.204639]  ffff880875040ab8 ffffffffc03b17ed ffff88087a3edcd0 ffff8808781fbc50
> [ 1639.204644] Call Trace:
> [ 1639.204649]  [<ffffffff8134d88e>] dump_stack+0x61/0x7d
> [ 1639.204668]  [<ffffffffc03b8609>] bch_extent_bad+0xd7/0x12b [bcache]
> [ 1639.204677]  [<ffffffffc03b17ed>] ? bch_ptr_invalid+0xc/0xc [bcache]
> [ 1639.204684]  [<ffffffffc03b17f7>] bch_ptr_bad+0xa/0xc [bcache]

It seems that you've hit something that might not be a bug.  Judging from 
the backtrace, this looks like on-disk corruption.  Possible causes: a 
failed on-SSD writeback flush, a failed writeback controller flush (if 
any), or just erase block wearout.
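
For readers decoding the "key too stale: 97, need_gc 97" line: the check in
bch_extent_bad() appears to compare the bucket's current gen with the gen
stored in the key's pointer and to complain when the gap exceeds 96. A
minimal sketch under that assumption (the demo_* names, the wraparound
helper, and the threshold constant are illustrative, not the kernel's code):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed threshold, matching the 96 discussed earlier in this thread. */
#define DEMO_STALE_MAX 96

/* Assumed helper: how many generations bucket_gen is ahead of ptr_gen,
 * computed modulo 256. A difference above 128 is treated as "pointer is
 * newer than the bucket" and reported as 0. */
static uint8_t demo_gen_after(uint8_t bucket_gen, uint8_t ptr_gen)
{
	uint8_t r = (uint8_t)(bucket_gen - ptr_gen);

	return r > 128U ? 0 : r;
}

/* Returns 1 where the "key too stale" error would be raised. */
static int demo_key_too_stale(uint8_t bucket_gen, uint8_t ptr_gen)
{
	return demo_gen_after(bucket_gen, ptr_gen) > DEMO_STALE_MAX;
}

/* Usage: values chosen to line up with the numbers in the log. */
static void demo_self_check(void)
{
	assert(demo_gen_after(97, 0) == 97);  /* "key too stale: 97" */
	assert(demo_key_too_stale(97, 0));    /* 97 > 96: error path */
	assert(!demo_key_too_stale(96, 0));   /* 96 is still tolerated */
	assert(demo_gen_after(5, 250) == 11); /* works across wraparound */
}
```

If that reading is right, a staleness of 97 means a pointer survived in the
btree while its bucket was reused 97 times, which fits the picture of GC not
having cleaned up (or not having been able to clean up) that key.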

A quick google shows the last person to have this was in writethrough and 
rebuilt their cache back in 3.11->3.14:
  http://www.spinics.net/lists/linux-bcache/msg02450.html

This looks like a better thread, possibly implicating TRIM:
  https://www.mail-archive.com/linux-bcache@vger.kernel.org/msg02720.html

If you are in writeback mode, then maybe you could disable gc.  I don't think 
there's a way to disable gc via sysfs, but you could try commenting this 
out:

drivers/md/bcache/super.c:
	1669         if (bch_gc_thread_start(c))
	1670                 goto err;

If it still functions (no idea, it might fail in other unexpected ways), 
then perhaps you can detach your cache and get it to write back its dirty data.  

--
Eric Wheeler


