public inbox for linux-bcache@vger.kernel.org
* bcache hangs with continuous write I/O to SSD device, bcache device stops working
@ 2013-03-22 12:34 Heiko Wundram
From: Heiko Wundram @ 2013-03-22 12:34 UTC
  To: linux-bcache@vger.kernel.org

Hey!

I've checked out several revisions of the current bcache repository
over the last two weeks (all 3.8.0+), and they all exhibit the same
broken behaviour: after some time (generally around 6 hours), I/O to the
bcache device stops working (I have LVM layered on top of it, and those
devices naturally stop working too), while the SSD that's part of the
cache set sees continuous write I/O at around 50MB/s. It makes no
difference whether I enable discard or not. I've only tested this in
writeback mode.

Is this a known problem?

-- 
--- Heiko.

* Re: bcache hangs with continuous write I/O to SSD device, bcache device stops working
@ 2013-03-22 14:16 Dongsu Park
From: Dongsu Park @ 2013-03-22 14:16 UTC
  To: Heiko Wundram; +Cc: linux-bcache@vger.kernel.org

On 22.03.2013 13:34, Heiko Wundram wrote:
> Hey!
> 
> I've checked out several revisions of the current bcache repository
> over the last two weeks (all 3.8.0+), and they all exhibit the same
> broken behaviour: after some time (generally around 6 hours), I/O to
> the bcache device stops working (I have LVM layered on top of it, and
> those devices naturally stop working too), while the SSD that's part
> of the cache set sees continuous write I/O at around 50MB/s. It makes
> no difference whether I enable discard or not. I've only tested this
> in writeback mode.

Hi Heiko,

That sounds pretty much the same as what I experienced this week.
After I finish some write benchmarks on a bcache device, the dirty data
blocks are written back to the backing devices really slowly
(20-30 MB/s in my case). That sync usually takes more than 5 minutes,
during which end users are unable to do anything.

My test environment:
kernel 3.4.23 with bcache patches backported from master.
Backing device is configured as MD-RAID10 on spindle drives.
Cache mode is writeback.

I've also tested with bcache-3.2,
but it was not much different from 3.4+.

Regards,
Dongsu

> 
> Is this a known problem?
> 
> -- 
> --- Heiko.

* Re: bcache hangs with continuous write I/O to SSD device, bcache device stops working
@ 2013-03-22 14:19 Heiko Wundram
From: Heiko Wundram @ 2013-03-22 14:19 UTC
  To: linux-bcache@vger.kernel.org

On 22.03.2013 15:16, Dongsu Park wrote:
> That sounds pretty much the same as what I experienced this week.
> After I finish some write benchmarks on a bcache device, the dirty data
> blocks are written back to the backing devices really slowly
> (20-30 MB/s in my case). That sync usually takes more than 5 minutes,
> during which end users are unable to do anything.

In my case, the syncing was still going on after around 16 hours (I
simply let the system run after it locked up in this state). As the
test system only has around 2GB of page cache, it most probably wasn't
still flushing data at that point, but rather spinning on something
else.
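
A rough back-of-the-envelope check of that reasoning, as a sketch only:
it assumes the SSD really sustained about 50MB/s of writes for the whole
16 hours (both figures are from this thread; treating MB and GB as
decimal units is an assumption) and compares the total against the
roughly 2GB of page cache.

```python
# Figures taken from this thread; decimal units (1 GB = 1000 MB) are an assumption.
WRITE_RATE_MB_S = 50   # observed write rate to the SSD
HANG_HOURS = 16        # how long the system was left in the hung state
PAGE_CACHE_GB = 2      # approximate page cache of the test system


def total_written_gb(rate_mb_s: float, hours: float) -> float:
    """Return the volume written at a constant rate, in GB."""
    return rate_mb_s * hours * 3600 / 1000


written = total_written_gb(WRITE_RATE_MB_S, HANG_HOURS)
ratio = written / PAGE_CACHE_GB
print(f"written: {written:.0f} GB, about {ratio:.0f}x the page cache")
```

Under those assumptions the SSD would have absorbed far more data than
could ever have been pending in memory, which fits the guess that the
writes were not an ordinary flush.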

The process using the most CPU time was a kworker, which migrated
between CPUs. If there's any sensible way to debug this (without a
serial console, alas...), I'd appreciate any hints.
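
One possible way to see what such a kworker is doing without a serial
console is to read its kernel stack from procfs. The following is only
a sketch, not something suggested in this thread: it assumes a kernel
that exposes /proc/<pid>/stack (which needs CONFIG_STACKTRACE and root
privileges) and a system that is still responsive enough to run a
script. The helper name kworker_stacks and its proc_root parameter are
made up for illustration.

```python
import os


def kworker_stacks(proc_root: str = "/proc") -> dict:
    """Map PID -> (thread name, kernel stack text) for all kworker threads.

    Hypothetical helper: reads <proc_root>/<pid>/comm and
    <proc_root>/<pid>/stack for every numeric directory entry.
    """
    stacks = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return stacks  # no procfs available at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
            if not name.startswith("kworker"):
                continue
            with open(os.path.join(base, "stack")) as f:
                stacks[int(entry)] = (name, f.read())
        except OSError:
            continue  # thread exited, or its stack is not readable
    return stacks


if __name__ == "__main__":
    for pid, (name, stack) in sorted(kworker_stacks().items()):
        print(f"--- {name} (pid {pid}) ---")
        print(stack)
```

Sampling the stacks a few times and comparing them may show where the
busy kworker keeps ending up; the output can go to a terminal or to a
file on a device that is not affected by the hang.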

-- 
--- Heiko.

* Re: bcache hangs with continuous write I/O to SSD device, bcache device stops working
@ 2013-03-27 19:17 Heiko Wundram
From: Heiko Wundram @ 2013-03-27 19:17 UTC
  To: linux-bcache@vger.kernel.org

On 22.03.2013 15:19, Heiko Wundram wrote:
> In my case, the syncing was still going on after around 16 hours (I
> simply let the system run after it locked up in this state). As the
> test system only has around 2GB of page cache, it most probably wasn't
> still flushing data at that point, but rather spinning on something
> else.

I'm not entirely sure which of my changes made the difference, but it
seems that this problem has disappeared (it hasn't surfaced over the
last three days, whereas before it was a matter of hours).

I've rebuilt the kernel with bcache and the DM/md infrastructure
compiled in statically (the latter should be unrelated, except for being
part of a stack of block devices). Anyway, that seems to have done the
trick, in case anybody else is experiencing this.

-- 
--- Heiko.
