From: Marc MERLIN <marc@merlins.org>
To: Eric Wheeler <bcache@lists.ewheeler.net>
Cc: Richard Bade <hitrich@gmail.com>, linux-bcache@vger.kernel.org
Subject: Re: Bcache still unstable for me (memory problems)
Date: Wed, 9 Mar 2016 18:15:16 -0800
Message-ID: <20160310021516.GM14112@merlins.org>
In-Reply-To: <alpine.LRH.2.11.1603100125110.13488@mail.ewheeler.net>
On Thu, Mar 10, 2016 at 01:34:56AM +0000, Eric Wheeler wrote:
> Hi Richard, Marc,
>
> >>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
>
> Do you still have the backtraces that show the function call stack for
> errors that look like this?
> %s: page allocation failure: order:%d, mode:0x%x
>
> Please send as many relevant OOM failure traces as you can. I would
> like to see which memory allocation(s) are failing and whether they
> always show the same stack trace.
It's the same one I already sent you, just from syslog instead of the
serial console (I was looking for other relevant cronjobs or errors per
your request).
> In the example above, order 7 means 2^7 of 4k pages, so it means the
> kernel can't find 512k of contiguous memory that can be allocated.
>
> It looks like the OOM is triggered in bch_cache_set_alloc, but might be
> cache_alloc too. I'm not sure if an alternate allocation mechanism can be
> used safely, but that's what I want to look into.
That was before your patches of course, so I'll report back on any
further crashes.
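As a quick check of the arithmetic in the quoted text, here is a minimal sketch that converts an allocation order into a size. The function name and the 4 KiB default page size are assumptions for illustration (4 KiB matches the "4k pages" in the quote; other page sizes exist).

```python
def order_to_bytes(order, page_size=4096):
    """Return the size in bytes of a 2**order block of contiguous pages.

    page_size defaults to 4096 bytes (an assumption; matches "4k pages").
    """
    return (1 << order) * page_size


# order:7 from the failure message -> 128 pages of 4 KiB each
print(order_to_bytes(7) // 1024, "KiB")  # 512 KiB
```

So an order-7 failure means the allocator could not find 128 physically contiguous 4 KiB pages, even if plenty of fragmented memory was free.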
By the way, a slightly related question: when I have a partially hung
system that will not reboot with 'reboot' and I use sysrq e + u + s + b,
I get:
[213056.198133] sysrq: SysRq : Emergency Remount R/O
[213058.266112] sysrq: SysRq : Emergency Sync
[213061.704158] sysrq: SysRq : Resetting
[213061.716559] ACPI MEMORY or I/O RESET_REG.
This does not properly stop bcache (I believe) or sw raid, or flush
things properly.
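For readers unfamiliar with the keys, the sequence above can also be issued from a root shell or script through /proc/sysrq-trigger. The sketch below is only an illustration: the helper name, the dry-run default, and the delay between keys are assumptions, and it says nothing about whether bcache or sw raid get stopped cleanly.

```python
import time

# Standard SysRq command keys, as described in the kernel's sysrq documentation.
SYSRQ_ACTIONS = {
    "e": "send SIGTERM to all processes except init",
    "u": "remount all filesystems read-only",
    "s": "sync all mounted filesystems",
    "b": "reboot immediately, without syncing or unmounting",
    "o": "power off",
}


def sysrq_sequence(keys, trigger_path="/proc/sysrq-trigger",
                   dry_run=True, delay=2.0):
    """Return the (key, action) steps for keys; write them only if not dry_run.

    dry_run and delay are hypothetical conveniences, not kernel features.
    Writing to trigger_path requires root and really does reset the machine.
    """
    steps = []
    for key in keys:
        if key not in SYSRQ_ACTIONS:
            raise ValueError(f"unknown SysRq key: {key!r}")
        steps.append((key, SYSRQ_ACTIONS[key]))
        if not dry_run:
            with open(trigger_path, "w") as trigger:
                trigger.write(key)
            time.sleep(delay)  # give each step time to finish before the next
    return steps


# Dry run: only list what e + u + s + b would do.
for key, action in sysrq_sequence("eusb"):
    print(f"sysrq-{key}: {action}")
```

With dry_run left at True nothing is written, so the listing is safe to run on a live machine.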
Instead of 'b', I usually use 'o'. It properly shuts everything down
and flushes all I/O, but it also turns off my machine, so I have to
rely on wake-on-LAN to bring it back up, which mostly works, until
maybe one day it won't :)
'o' gives me the much more reassuring:
[ 1744.758691] sysrq: SysRq : Emergency Remount R/O
[ 1745.867719] sysrq: SysRq : Emergency Sync
[ 1747.482890] sysrq: SysRq : Power Off
[ 1754.242984] Emergency Remount complete
[ 1758.535234] bcache: bcache_reboot() Stopping all devices:
[ 1758.551562] bcache: bcache_device_free() bcache0 stopped
[ 1760.539050] bcache: bcache_reboot() Timeout waiting for devices to be closed
[ 1760.560249] kvm: exiting hardware virtualization
[ 1760.574844] sd 17:0:0:0: [sdr] Synchronizing SCSI cache
[ 1760.590730] sd 17:0:0:0: [sdr] Stopping disk
[ 1760.891076] sd 16:0:0:0: [sdq] Synchronizing SCSI cache
[ 1760.911070] sd 16:0:0:0: [sdq] Stopping disk
[ 1761.219149] sd 15:0:0:0: [sdp] Synchronizing SCSI cache
[ 1761.235053] sd 15:0:0:0: [sdp] Stopping disk
[ 1761.535120] sd 14:0:0:0: [sdo] Synchronizing SCSI cache
[ 1761.555095] sd 14:0:0:0: [sdo] Stopping disk
[ 1761.855112] sd 13:0:0:0: [sdn] Synchronizing SCSI cache
[ 1761.870920] sd 13:0:0:0: [sdn] Stopping disk
[ 1762.751983] sd 11:4:0:0: [sdm] Synchronizing SCSI cache
[ 1762.767882] sd 11:4:0:0: [sdm] Stopping disk
[ 1763.191203] sd 11:3:0:0: [sdl] Synchronizing SCSI cache
[ 1763.207428] sd 11:3:0:0: [sdl] Stopping disk
[ 1763.631534] sd 11:2:0:0: [sdk] Synchronizing SCSI cache
[ 1763.647524] sd 11:2:0:0: [sdk] Stopping disk
[ 1764.071512] sd 11:1:0:0: [sdj] Synchronizing SCSI cache
[ 1764.087396] sd 11:1:0:0: [sdj] Stopping disk
[ 1764.510467] sd 11:0:0:0: [sdi] Synchronizing SCSI cache
[ 1764.526819] sd 11:0:0:0: [sdi] Stopping disk
[ 1764.950319] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
[ 1764.966079] sd 9:0:0:0: [sdh] Stopping disk
[ 1765.960508] sd 8:0:0:0: [sdg] Synchronizing SCSI cache
[ 1765.978370] sd 8:0:0:0: [sdg] Stopping disk
[ 1766.278896] r8169 0000:05:00.0: System wakeup enabled by ACPI
[ 1766.442869] sd 3:0:0:0: [sdf] Synchronizing SCSI cache
[ 1766.519912] sd 3:0:0:0: [sdf] Stopping disk
[ 1767.014799] sd 2:0:0:0: [sde] Synchronizing SCSI cache
[ 1767.042979] sd 2:0:0:0: [sde] Stopping disk
[ 1767.864325] sd 1:0:1:0: [sdd] Synchronizing SCSI cache
[ 1767.976656] sd 1:0:1:0: [sdd] Stopping disk
[ 1768.754903] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
[ 1770.197116] sd 1:0:0:0: [sdc] Stopping disk
[ 1771.084250] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 1771.125229] sd 0:0:1:0: [sdb] Stopping disk
[ 1771.558552] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1771.574145] sd 0:0:0:0: [sda] Stopping disk
[ 1772.008787] ACPI: Preparing to enter system sleep state S5
[ 1772.026660] reboot: Power down
[ 1772.037064] acpi_power_off called
Is there another way to get a proper flush of everything and still
reboot instead of powering off?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901