public inbox for linux-bcache@vger.kernel.org
* bcache hang on suspend? sometimes?
From: Darrick J. Wong @ 2013-08-06  5:34 UTC
  To: kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi,

Lately (i.e. 3.10.5) I thought I'd give bcache a try on my work laptop.  I
paired up a boring SATA SSD with one of those newfangled 4k "advanced format"
drives, and formatted the whole stack with dm-crypt + lvm + ext4 on top of
bcache.  Things were looking pretty good -- LKML loads much faster in mutt, and
all was well with the world, even suspend/resume worked fine.

Then I rebooted the machine.  Since the reboot, the machine hangs every
time I suspend.  I set up netconsole and set no_console_suspend=1, but nothing
interesting showed up in dmesg.  I see that SCSI managed to flush the disks,
but then everything seems to stop dead.  No lockup messages or anything.

Curiously, if I set up a bcache between that SSD and a 512-byte-sector old
school disk, suspend/resume seem fine even after a reboot.  My bcache test
machine also suspends/resumes just fine.  I tried simulating a 4k disk with
qemu to see if I could arrange an easier testcase, but I couldn't reproduce the
hang there either.  (Yes, I do have bcache debugging turned on.)

I'll keep plugging away on this as time permits, but I was wondering -- has
anyone else seen this before?  Is this my own little crazy party?  I was
careful to make sure everything on the AF drive lined up on a 4k alignment.

<shrug>

--D


* Re: bcache hang on suspend? sometimes?
From: Darrick J. Wong @ 2013-08-06 20:56 UTC
  To: kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Ok, I've narrowed it down a little bit --

It's not caused by any of the fixes that went into 3.10.[1-5].  It doesn't seem
to be specific to any particular storage bus, controller, disk, or even cache
set -- the same bcache'd usb stick will crash my laptop and not crash my test
box.  The 4k/512b sector thing was a red herring; you can ignore that.

The test box is a boring old Core2 box; the laptop is an Ivy Bridge.  I'll try
to enable more verbose PM debugging to see if I can determine what exactly's
going on at sleep time.  (Again, shooting in the dark...)
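
One way to narrow this down, assuming a kernel built with CONFIG_PM_DEBUG, is
the /sys/power/pm_test knob, which stops a test suspend at a chosen depth
(freezer, devices, platform, processors, core).  The sketch below only builds
and applies the sysfs writes; the helper names pm_test_writes and run_writes
are made up for illustration and are not part of any kernel tooling.

```python
import os

# Levels accepted by /sys/power/pm_test on kernels built with CONFIG_PM_DEBUG,
# ordered from the shallowest test to the deepest.
PM_TEST_LEVELS = ["freezer", "devices", "platform", "processors", "core"]


def pm_test_writes(level, sysfs_power="/sys/power"):
    """Return the (path, value) writes for one test suspend at `level`.

    Hypothetical helper: it only plans the writes, it does not perform them.
    """
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level!r}")
    return [
        (os.path.join(sysfs_power, "pm_test"), level),   # choose how deep to go
        (os.path.join(sysfs_power, "state"), "mem"),     # start the test suspend
        (os.path.join(sysfs_power, "pm_test"), "none"),  # restore normal suspend
    ]


def run_writes(writes):
    """Perform the planned writes in order; needs root on a real system."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)
```

Running run_writes(pm_test_writes(level)) for each level in PM_TEST_LEVELS,
shallowest first, should show the first stage at which the machine stops
coming back.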

--D


* Re: bcache hang on suspend? sometimes?
From: Kent Overstreet @ 2013-08-06 21:02 UTC
  To: Darrick J. Wong
  Cc: kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 06, 2013 at 01:56:50PM -0700, Darrick J. Wong wrote:
> Ok, I've narrowed it down a little bit --
> 
> It's not caused by any of the fixes that went into 3.10.[1-5].  It doesn't seem
> to be specific to any particular storage bus, controller, disk, or even cache
> set -- the same bcache'd usb stick will crash my laptop and not crash my test
> box.  The 4k/512b sector thing was a red herring; you can ignore that.

Ok, that makes more sense...

> The test box is a boring old Core2 box; the laptop is an Ivy Bridge.  I'll try
> to enable more verbose PM debugging to see if I can determine what exactly's
> going on at sleep time.  (Again, shooting in the dark...)

I just looked at the code and it appears there was a freezer fix that
didn't make it into 3.10 and should have. Can you try the
bcache-for-3.11 branch and see if that fixes it? If so, I'll get that
patch sent out for stable.


* Re: bcache hang on suspend? sometimes?
From: Darrick J. Wong @ 2013-08-06 22:48 UTC
  To: Kent Overstreet
  Cc: kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 06, 2013 at 02:02:24PM -0700, Kent Overstreet wrote:
> I just looked at the code and it appears there was a freezer fix that
> didn't make it into 3.10 and should have. Can you try the
> bcache-for-3.11 branch and see if that fixes it? If so, I'll get that
> patch sent out for stable.

Oddly(!), setting CONFIG_INTEL_MEI=n fixed the suspend problem.  I see that mei
and ahci both wound up mapped to irq 47 on the two failing machines, which
makes me suspicious.
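
A quick way to spot this kind of IRQ sharing is to scan /proc/interrupts for
lines that list more than one handler.  The sketch below is a rough parser
under the assumption that handler names come last on each line and are
comma-separated; shared_irqs is a hypothetical helper name, and the sample
text is invented for illustration rather than copied from the failing
machines.

```python
def shared_irqs(text):
    """Map IRQ number -> handler names for IRQs with more than one handler."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row has one column per CPU
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip non-numeric rows such as NMI, LOC, ERR
        fields = rest.split(None, ncpu)
        if len(fields) <= ncpu:
            continue  # counts only, no description
        names = [p.strip() for p in fields[ncpu].split(",")]
        if len(names) < 2:
            continue  # a single handler, so nothing is shared
        # The first piece still carries the interrupt chip/type in front of it.
        names[0] = names[0].split()[-1]
        shared[int(label)] = names
    return shared


# Invented sample, shaped like /proc/interrupts on a two-CPU machine.
SAMPLE = """\
           CPU0       CPU1
  0:         40          0   IO-APIC-edge      timer
 47:       1200        300   IO-APIC-fasteoi   ahci, mei_me
NMI:          0          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a real system the input would be the contents of /proc/interrupts; handler
names that contain spaces would confuse this simple split.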

I'll try your branch to see if it fixes it too.

--D


* Re: bcache hang on suspend? sometimes?
From: Darrick J. Wong @ 2013-08-06 23:08 UTC
  To: Kent Overstreet
  Cc: kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 06, 2013 at 02:02:24PM -0700, Kent Overstreet wrote:
> I just looked at the code and it appears there was a freezer fix that
> didn't make it into 3.10 and should have. Can you try the
> bcache-for-3.11 branch and see if that fixes it? If so, I'll get that
> patch sent out for stable.

The suspend problem seems to be gone with the for-3.11 branch.  The branch
seems to be based off 3.10-rc7... is that correct?

Also, is there any way to figure out how much of a cache device is actively
holding cached data?

--D


* Re: bcache hang on suspend? sometimes?
From: Kent Overstreet @ 2013-08-06 23:12 UTC
  To: Darrick J. Wong
  Cc: kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 06, 2013 at 04:08:52PM -0700, Darrick J. Wong wrote:
> The suspend problem seems to be gone with the for-3.11 branch. 

Good!

> The branch
> seems to be based off 3.10-rc7... is that correct?

Yeah, that's just where it was when I sent Jens the pull request.

> Also, is there any way to figure out how much of a cache device is actively
> holding cached data?

<cache set dir>/cache0/priority_stats
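
For scripting, that file can be read as plain "Name: value" lines.  The sketch
below assumes the cache set directory lives under /sys/fs/bcache/ and that the
file has an "Unused" percentage line; the exact set of fields varies by kernel
version, so treat the field name and the helper names (parse_priority_stats,
percent_not_unused, read_priority_stats) as assumptions, not a fixed
interface.

```python
import os


def parse_priority_stats(text):
    """Split 'Name: value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def percent_not_unused(stats):
    """Return 100 minus the 'Unused' percentage, or None if it is absent.

    Assumes an 'Unused: N%' line; this counts everything that is not unused,
    metadata included.
    """
    unused = stats.get("Unused")
    if unused is None or not unused.endswith("%"):
        return None
    return 100 - int(unused.rstrip("%"))


def read_priority_stats(cache_set_dir, cache="cache0"):
    """Read <cache set dir>/<cache>/priority_stats and parse it."""
    path = os.path.join(cache_set_dir, cache, "priority_stats")
    with open(path) as f:
        return parse_priority_stats(f.read())
```

Typical use would be percent_not_unused(read_priority_stats(cache_set_dir)),
with cache_set_dir pointing at the cache set's directory.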
