linux-btrfs.vger.kernel.org archive mirror
From: Nikolay Borisov <nborisov@suse.com>
To: Alex Adriaanse <alex@oseberg.io>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Ongoing Btrfs stability issues
Date: Fri, 16 Feb 2018 09:40:49 +0200
Message-ID: <57784be9-55b7-2465-3cfe-c693a55e44f7@suse.com>
In-Reply-To: <81FC1059-87DE-49DE-81D0-95C651A5A928@oseberg.io>



On 16.02.2018 06:54, Alex Adriaanse wrote:
> 
>> On Feb 15, 2018, at 2:42 PM, Nikolay Borisov <nborisov@suse.com> wrote:
>>
>> On 15.02.2018 21:41, Alex Adriaanse wrote:
>>>
>>>> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov <nborisov@suse.com> wrote:
>>>>
>>>> So in all of the cases you are hitting some form of premature enospc.
>>>> There was a fix that landed in 4.15 that should have fixed a rather
>>>> long-standing issue with the way metadata reservations are satisfied,
>>>> namely:
>>>>
>>>> 996478ca9c46 ("btrfs: change how we decide to commit transactions during
>>>> flushing").
>>>>
>>>> That commit was introduced in 4.14.3 stable kernel. Since you are not
>>>> using upstream kernel I'd advise you check whether the respective commit
>>>> is contained in the kernel versions you are using.
>>>>
>>>> Other than that in the reports you mentioned there is one crash in
>>>> __del_reloc_root which looks rather interesting, at the very least it
>>>> shouldn't crash...
>>>
>>> I checked the Debian source code that's used for building the kernels that we run, and can confirm that both 4.14.7-1~bpo9+1 and 4.14.13-1~bpo9+1 contain the changes associated with the commit you referenced. So crash instances #2, #3, and #4 at https://bugzilla.kernel.org/show_bug.cgi?id=198787 were all running kernels that contain this fix already.
>>>
>>> Could it be that some on-disk data structures got (silently) corrupted while we were running pre-4.14.7 kernels, and the aforementioned fix doesn't address anything relating to damage that has already been done? If so, is there a way to detect and/or repair this for existing filesystems other than running a "btrfs check --repair" or rebuilding filesystems (both of which require a significant amount of downtime)?
>>
>> From the logs provided I can see only a single crash, the others are
>> just ENOSPC which can cause corruption due to delayed refs (in majority
>> of examples) not finishing. Is btrfs hosted on the EBS volume or on the
>> ephemeral storage of the instance? Is the EBS an ssd? If it's ssd are
>> you using an io scheduler for those ebs devices? You can check what the
>> io scheduler for a device is by reading the following sysfs file:
>>
>> /sys/block/<disk device>/queue/scheduler
> 
> It's hosted on an EBS volume; we don't use ephemeral storage at all. The EBS volumes are all SSD. We didn't change the default schedulers on the VMs and it looks like it's using mq-deadline:
> 
> $ cat /sys/block/xvdc/queue/scheduler
> [mq-deadline] none

So one thing I'd advise testing is setting the scheduler for xvdc to
none. Next, I'd advise you to backport the following patch to your kernel:
https://github.com/kdave/btrfs-devel/commit/1b816c23e91f70603c532af52cccf17e68393682

Then mount the filesystem with -o enospc_debug. The next time an
ENOSPC occurs, additional information about the state of the space_info
structure should be printed in dmesg.
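
For reference, the two steps could look roughly like the sketch below.
This is only an illustration under some assumptions: active_scheduler is
a hypothetical helper (not an existing tool), xvdc is the device from
your output, and /mnt/data is a placeholder for wherever the filesystem
is mounted. I'm also assuming the option can be applied with a remount;
if it can't on your kernel, unmount and mount again with
-o enospc_debug. The commands that change system state need root and
are left commented out.

```shell
#!/bin/sh
# Hypothetical helper: given the contents of a sysfs "scheduler" file,
# print the active scheduler, i.e. the entry inside square brackets.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

dev=xvdc          # device name taken from this thread
mnt=/mnt/data     # placeholder mount point, adjust to your setup

# Report the current scheduler if the device exists on this machine.
if [ -r "/sys/block/$dev/queue/scheduler" ]; then
    current=$(cat "/sys/block/$dev/queue/scheduler")
    echo "current scheduler for $dev: $(active_scheduler "$current")"
fi

# Run these as root once you are ready to make the change:
# echo none > "/sys/block/$dev/queue/scheduler"
# mount -o remount,enospc_debug "$mnt"
# dmesg | grep -i btrfs    # after the next ENOSPC
```

Note that a scheduler change made through sysfs does not persist across
reboots, so it has to be reapplied (for example from a udev rule) if the
instance is restarted during the test.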

> 
> Alex--

