public inbox for linux-btrfs@vger.kernel.org
From: Marc Lehmann <schmorp@schmorp.de>
To: james harvey <jamespharvey20@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Chris Mason <clm@fb.com>, Michal Hocko <mhocko@suse.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	bugzilla-daemon@bugzilla.kernel.org,
	bugzilla.kernel.org@plan9.de,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	linux-mm@kvack.org, Jan Kara <jack@suse.cz>
Subject: Re: [Bug 199931] New: systemd/rtorrent file data corruption when using echo 3 >/proc/sys/vm/drop_caches
Date: Wed, 6 Jun 2018 21:06:35 +0200
Message-ID: <20180606190635.meodcz3mchhtqprb@schmorp.de>
In-Reply-To: <CA+X5Wn5_iJYS9MLFdArG9sDHQO2n=BkZmaYAOexhdoVc+tQnmw@mail.gmail.com>

On Tue, Jun 05, 2018 at 05:52:38PM -0400, james harvey <jamespharvey20@gmail.com> wrote:
> >> This is not always reproducible, but when deleting our journal, creating log
> >> messages for a few hours and then doing the above manually has a ~50% chance of
> >> corrupting the journal.
> ...
> 
> My strong bet is you have a hardware issue.

Strange, what kind of hardware bug would affect multiple very different
computers in exactly the same way?

> going bad, bad cables, bad port, etc.  My strong bet is you're also
> using BTRFS mirroring.

Not sure what exactly you mean by btrfs mirroring (there are many btrfs
features this could refer to), but the closest thing to that that I use is
dup for metadata (which is always checksummed), data is always single. All
btrfs filesystems are on lvm (not mirrored), and most (but not all) are
encrypted. One affected fs is on a hardware raid controller, one is on an
ssd. I have a single btrfs fs in that box with raid1 for metadata, as an
experiment, but I haven't used it for testing yet.

> You're describing intermittent data corruption on files that I'm
> thinking all have NOCOW turned on.

The systemd journal files are nocow (I re-enabled that after I turned it
off for a while), but the rtorrent directory (and the files in it) are
not.

I did experiment (a year ago) with nocow for torrent files and, more
importantly, vm images, but it didn't really solve the "millions of
fragments slow down" problem with btrfs, so I figured I can keep them cow
and regularly copy them to defragment them. That's why I am quite sure cow
was switched on long before I booted my first 4.14 kernel (and it still
is).

> it's done writing to a journal file, but in a way that guarantees it
> to fail.  This has been reported to systemd at
> https://github.com/systemd/systemd/issues/9112 but poettering has

I am aware that systemd tries to turn on nocow, and I think this is actually
a bug, but this wouldn't have an effect on rtorrent, which has corruption
problems on a different fs. And boy would it be wonderful if Debian switched
away from systemd, I feel I personally ran into every single bug that
exists...

However, no matter how much systemd plays with btrfs flags, it shouldn't
corrupt data.

> The context I ran into this problem was with several other bugs
> interacting, that "btrfs replace" has been guaranteed to corrupt
> non-checksummed (NOCOW) compressed data, which the combination of
> those shouldn't happen, but does in some defragmentation situations
> due to another bug.  In my situation, I don't have a hardware issue.

Yeah, btrfs is full of bugs that I constantly run into, but most of them
are containable, unlike this problem, which might or might not be a
btrfs bug - especially since all your bets seem to be wrong here.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\
