From: Abelardo Ricart III <aricart@memnix.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: dm-devel@redhat.com, Brandon Smith <freedom@reardencode.com>,
linux-kernel@vger.kernel.org, Mike Snitzer <snitzer@redhat.com>
Subject: Re: Regression: Disk corruption with dm-crypt and kernels >= 4.0
Date: Tue, 02 Jun 2015 22:21:10 -0400
Message-ID: <1433298070.5798.1.camel@memnix.com>
In-Reply-To: <alpine.LRH.2.02.1506021345310.22804@file01.intranet.prod.int.rdu2.redhat.com>
On Tue, 2015-06-02 at 13:51 -0400, Mikulas Patocka wrote:
>
> On Mon, 18 May 2015, Abelardo Ricart III wrote:
>
> > On Fri, 2015-05-15 at 08:04 -0700, Brandon Smith wrote:
> > > On 2015-05-01 (Fri) at 19:42:15 -0400, Abelardo Ricart III wrote:
> > > > > > The patchset in question was tested quite heavily so this is a
> > > > > > surprising report. I'm noticing you are opting in to dm-crypt
> > > > > > discard support. Have you tested without discards enabled?
> > > > >
> > > > > I've disabled discards universally and rebuilt a vanilla kernel. After
> > > > > running my heavy read-write-sync scripts, everything seems to be
> > > > > working fine now. I suppose this could be something that used to fail
> > > > > silently before, but now produces bad behavior? I seem to remember
> > > > > having something in my message log about "discards not supported on
> > > > > this device" when running with it enabled before.
> > > >
> > > > Forgive me, but I spoke too soon. The corruption and libata errors are
> > > > still there, as was evidenced when I went to reboot and got treated to
> > > > an eyeful of "read-only filesystem" and ata errors.
> > > >
> > > > So no, disabling discards unfortunately did nothing to help.
> > >
> > > I've been experiencing the same problem. Vanilla 4.0 series kernels,
> > > dm-crypt, with or without discards, on a ThinkPad X1 Carbon with a
> > > LiteOn LGT-256M6G SSD.
> > >
> > > After some googling around, I found some chatter relating to changes
> > > in NCQ on SSDs in 4.0. Been running w/o NCQ for a full kernel build so
> > > far without issue. Perhaps there's been some change in the interaction
> > > between dm-crypt and NCQ?
> > >
> > > Abelardo, can you try w/o NCQ and see if that helps your situation?
> > >
> > > Best,
> > >
> > > --Brandon
> >
> > I've been running with NCQ disabled and stress testing for a while, and
> > the issue is indeed gone. Thanks for the workaround!
> >
> > So it seems the issue is somehow related to the combination of NCQ,
> > dm-crypt, and possibly (some?) SSDs.
>
> Hi
>
> I suspect that this is a bug in kernel NCQ processing or in SSD firmware
> and recent dm-crypt changes made the bug show up.
>
> I suggest this:
>
> If you have some test that reliably reproduces the bug, please do this:
> take kernel 3.19 or 3.18 and apply the dm-crypt parallelization patches
> (commits f3396c58fd8442850e759843457d78b6ec3a9589,
> cf2f1abfbd0dba701f7f16ef619e4d2485de3366,
> 7145c241a1bf2841952c3e297c4080b357b3e52d,
> 94f5e0243c48aa01441c987743dc468e2d6eaca2,
> dc2676210c425ee8e5cb1bec5bc84d004ddf4179,
> 0f5d8e6ee758f7023e4353cca75d785b2d4f6abe,
> b3c5fd3052492f1b8d060799d4f18be5a5438add) on it. If the bug doesn't show
> up with the older kernel and dm-crypt parallelization patches, use git
> bisect to find out which patch broke NCQ. When you test a kernel with
> bisect, apply the above-mentioned patches to it.
>
> Mikulas
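The thread doesn't say how NCQ was turned off. A minimal sketch, assuming a SATA disk driven by libata and a hypothetical device name such as `sda`: writing 1 to the device's sysfs `queue_depth` limits it to one outstanding command, which turns NCQ off for that disk until the next reboot. Alternatively, the `libata.force=noncq` kernel boot parameter disables NCQ on all libata devices from boot. The `SYSFS_ROOT` variable and the function names are hypothetical; the override exists only so the helper can be tried against a scratch directory instead of `/sys`.

```sh
#!/bin/sh
# Sketch: disable NCQ on one libata disk at runtime by forcing its SCSI
# queue depth to 1 (needs root; lasts until reboot). For a boot-time,
# all-devices alternative, add libata.force=noncq to the kernel command line.
#
# SYSFS_ROOT is a hypothetical override for testing only; leave it unset
# on a real system so /sys is used.

ncq_queue_depth_path() {
    # $1: block device name such as "sda" (example name only)
    printf '%s/block/%s/device/queue_depth\n' "${SYSFS_ROOT:-/sys}" "$1"
}

ncq_disable() {
    if [ -z "$1" ]; then
        echo "usage: ncq_disable DEVICE" >&2
        return 2
    fi
    path=$(ncq_queue_depth_path "$1")
    if [ ! -w "$path" ]; then
        echo "cannot write $path (not root, or not a SCSI/ATA disk?)" >&2
        return 1
    fi
    # A queue depth of 1 allows a single outstanding command, i.e. no NCQ.
    echo 1 > "$path"
    printf '%s queue_depth is now %s\n' "$1" "$(cat "$path")"
}
```

On a real system this would be run as root, e.g. `ncq_disable sda`; reading `/sys/block/sda/device/queue_depth` afterwards should show 1. Because the setting does not survive a reboot, the boot parameter is the more convenient choice for a long stress test.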
Alright, I'll try this next and report back soon.
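The procedure Mikulas describes amounts to bisecting between a good 3.19 kernel (with the dm-crypt parallelization commits applied) and a bad 4.0 kernel, re-applying those commits at every step. A minimal sketch, assuming a Linux git tree that contains the seven commits and the `v3.19`/`v4.0` tags: `prep_step` applies, without committing, each listed commit the checked-out revision does not already contain (revisions inside the 4.0 merge window may already include some of them), and `mark_step` discards those uncommitted changes before recording the verdict. Building, installing, rebooting and running the reproducer stay manual, because each step has to be tested on the booted kernel. Both function names are hypothetical helpers, not git commands.

```sh
#!/bin/sh
# Sketch of the suggested bisect: v3.19 + dm-crypt patches (good) vs v4.0
# (bad), re-applying the patches at each step. Run from a Linux git tree.
# prep_step and mark_step are hypothetical helper names.

DMCRYPT_PATCHES="
f3396c58fd8442850e759843457d78b6ec3a9589
cf2f1abfbd0dba701f7f16ef619e4d2485de3366
7145c241a1bf2841952c3e297c4080b357b3e52d
94f5e0243c48aa01441c987743dc468e2d6eaca2
dc2676210c425ee8e5cb1bec5bc84d004ddf4179
0f5d8e6ee758f7023e4353cca75d785b2d4f6abe
b3c5fd3052492f1b8d060799d4f18be5a5438add
"

# Apply (to index and working tree only) every listed commit that the
# checked-out revision does not already contain.
prep_step() {
    for c in $DMCRYPT_PATCHES; do
        if git merge-base --is-ancestor "$c" HEAD; then
            continue    # already part of this revision
        fi
        if ! git cherry-pick --no-commit "$c"; then
            echo "could not apply $c here; consider 'mark_step skip'" >&2
            git reset --hard --quiet
            return 1
        fi
    done
}

# Drop the temporarily applied patches, then record the verdict so git
# checks out the next revision to test.
mark_step() {
    case $1 in
        good|bad|skip) ;;
        *) echo "usage: mark_step good|bad|skip" >&2; return 2 ;;
    esac
    git reset --hard --quiet && git bisect "$1"
}
```

A session would start with `git bisect start v4.0 v3.19`, then repeat: `prep_step`, build and boot the kernel, run the read-write-sync reproducer, and finish with `mark_step good` or `mark_step bad` until git names the first bad commit.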