linux-block.vger.kernel.org archive mirror
* Re: Regression in 4.14: wrong data being read from bcache device
       [not found] ` <CAPerZE-KiYVWXtBsu+SR0wCgbSJgzZAufm+3NYdoybjq+6ySbQ@mail.gmail.com>
@ 2017-11-16 19:48   ` Michael Lyle
  2017-11-16 20:31     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Lyle @ 2017-11-16 19:48 UTC (permalink / raw)
  To: Campbell Steven, Pavel Goran
  Cc: linux-bcache, linux-block@vger.kernel.org, Kent Overstreet

Hi Campbell & Pavel---

On 11/16/2017 11:36 AM, Campbell Steven wrote:
> On 16 November 2017 at 21:28, Pavel Goran <via-bcache@pvgoran.name> wrote:
>> Hello list,
>>
>> I encountered a severe problem when trying to switch to kernel version 4.14.
>> In short, reads from the bcache device produce different data in 4.14 and
>> 4.13.

Thanks for the report.  I've heard about this once on the #bcache IRC
channel too, so it seems this is a real problem, though I've not
encountered it in my testing yet.

This is just a note to let you all know that I'm looking at this and
will be seeking a clean repro that's not too painful to bisect to
determine what's going on.

Most of the 4.14 work predates my involvement on bcache, so I'm coming
up to speed on it.  That said, it looks pretty boring/safe within
bcache--- and where there's functional change it's almost all in the
write path which wouldn't be relevant here.  So I'm somewhat fearful
that there is an interaction with something else within the block layer.

Mike

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Regression in 4.14: wrong data being read from bcache device
  2017-11-16 19:48   ` Regression in 4.14: wrong data being read from bcache device Michael Lyle
@ 2017-11-16 20:31     ` Jens Axboe
  2017-11-16 23:02       ` Michael Lyle
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2017-11-16 20:31 UTC (permalink / raw)
  To: Michael Lyle, Campbell Steven, Pavel Goran
  Cc: linux-bcache, linux-block@vger.kernel.org, Kent Overstreet

On 11/16/2017 12:48 PM, Michael Lyle wrote:
> Hi Campbell & Pavel---
> 
> On 11/16/2017 11:36 AM, Campbell Steven wrote:
>> On 16 November 2017 at 21:28, Pavel Goran <via-bcache@pvgoran.name> wrote:
>>> Hello list,
>>>
>>> I encountered a severe problem when trying to switch to kernel version 4.14.
>>> In short, reads from the bcache device produce different data in 4.14 and
>>> 4.13.
> 
> Thanks for the report.  I've heard about this once on the #bcache IRC
> channel too, so it seems this is a real problem, though I've not
> encountered it in my testing yet.
> 
> This is just a note to let you all know that I'm looking at this and
> will be seeking a clean repro that's not too painful to bisect to
> determine what's going on.
> 
> Most of the 4.14 work predates my involvement on bcache, so I'm coming
> up to speed on it.  That said, it looks pretty boring/safe within
> bcache--- and where there's functional change it's almost all in the
> write path which wouldn't be relevant here.  So I'm somewhat fearful
> that there is an interaction with something else within the block layer.

The 4.14 block layer changes were pretty boring and uneventful, so that'd
be surprising as well.

I'd suggest just doing a bisect between 4.13 and 4.14, sounds like the
issue is trivially reproducible.

-- 
Jens Axboe

* Re: Regression in 4.14: wrong data being read from bcache device
  2017-11-16 20:31     ` Jens Axboe
@ 2017-11-16 23:02       ` Michael Lyle
  2017-11-17  3:07         ` Michael Lyle
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Lyle @ 2017-11-16 23:02 UTC (permalink / raw)
  To: Jens Axboe, Campbell Steven, Pavel Goran
  Cc: linux-bcache, linux-block@vger.kernel.org, Kent Overstreet

On 11/16/2017 12:31 PM, Jens Axboe wrote:
> I'd suggest just doing a bisect between 4.13 and 4.14, sounds like the
> issue is trivially reproducible.

Took me awhile to get a repro but I have one now.  Will start bisecting
tonight.

Mike

* Re: Regression in 4.14: wrong data being read from bcache device
  2017-11-16 23:02       ` Michael Lyle
@ 2017-11-17  3:07         ` Michael Lyle
  2017-11-17 15:08           ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Lyle @ 2017-11-17  3:07 UTC (permalink / raw)
  To: Jens Axboe, Campbell Steven, Pavel Goran
  Cc: linux-bcache, linux-block@vger.kernel.org, Kent Overstreet, hch

Jens, Christoph & everyone--

Hi-- I believe I have a general idea what's going on.

On Thu, Nov 16, 2017 at 3:02 PM, Michael Lyle <mlyle@lyle.org> wrote:
> On 11/16/2017 12:31 PM, Jens Axboe wrote:
> > I'd suggest just doing a bisect between 4.13 and 4.14, sounds like the
> > issue is trivially reproducible.
> Took me awhile to get a repro but I have one now.  Will start bisecting
> tonight.

It looks like the regression came from commit
74d46992e0d9dee7f1f376de0d56d31614c8a17a (3/3 repros with it, 0/3
repros without it and hacked to build).

I don't have time to analyze this tonight as I have a couple other
things to do, but will look at fully analyzing it tomorrow morning.

I think the probable cause is this: we construct bios based on
previously completed bios in a few places, but blk_partition_remap
sets the partno to 0 and adjusts the disk offset.  It seems like a lot
of potential uses of bio_copy_dev could be dangerous for this reason.

Mike
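
To make the hypothesis above concrete, here is a toy Python model of the
suspected hazard. It is not kernel code: the `Bio` class and the functions
are simplified stand-ins named after `bi_partno`, `blk_partition_remap` and
`bio_copy_dev`, and the partition start of 2048 sectors is an arbitrary
assumption. It only sketches why taking the device fields from an
already-remapped bio, while keeping a partition-relative sector, would
address the wrong place on disk.

```python
from dataclasses import dataclass

# Hypothetical partition table: partno -> start sector on the whole disk.
# partno 0 stands for the whole disk, as in the hypothesis above.
PART_START = {0: 0, 1: 2048}


@dataclass
class Bio:
    disk: str      # stand-in for bi_disk
    partno: int    # stand-in for bi_partno
    sector: int    # stand-in for the bio's start sector


def partition_remap(bio: Bio) -> None:
    """Model of the remap step: shift the sector by the partition start,
    then retarget the bio at the whole disk (partno 0)."""
    bio.sector += PART_START[bio.partno]
    bio.partno = 0


def copy_dev(dst: Bio, src: Bio) -> None:
    """Model of copying only the device fields from one bio to another."""
    dst.disk = src.disk
    dst.partno = src.partno


def absolute_sector(bio: Bio) -> int:
    """Where the bio would finally land on the whole disk."""
    return PART_START[bio.partno] + bio.sector


# A read of partition-relative sector 100 on partition 1.
first = Bio(disk="sda", partno=1, sector=100)
intended = absolute_sector(first)

partition_remap(first)  # submission remaps the bio in place

# Build a second bio for the same partition-relative sector, but take the
# device fields from the bio that has already been remapped.
second = Bio(disk="", partno=-1, sector=100)
copy_dev(second, first)

print(intended, absolute_sector(first), absolute_sector(second))
```

In this model the second bio resolves to absolute sector 100 instead of
2148, so it would silently return data from the wrong location rather than
fail with an error.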

* Re: Regression in 4.14: wrong data being read from bcache device
  2017-11-17  3:07         ` Michael Lyle
@ 2017-11-17 15:08           ` Christoph Hellwig
  2017-11-17 17:04             ` Michael Lyle
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2017-11-17 15:08 UTC (permalink / raw)
  To: Michael Lyle
  Cc: Jens Axboe, Campbell Steven, Pavel Goran, linux-bcache,
	linux-block@vger.kernel.org, Kent Overstreet, hch

On Thu, Nov 16, 2017 at 07:07:40PM -0800, Michael Lyle wrote:
> I think the probable cause is this: we construct bios based on
> previously completed bios in a few places,

That is an extremely bad idea in many ways, so I think we'll need
to fix this as the priority.

* Re: Regression in 4.14: wrong data being read from bcache device
  2017-11-17 15:08           ` Christoph Hellwig
@ 2017-11-17 17:04             ` Michael Lyle
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Lyle @ 2017-11-17 17:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Campbell Steven, Pavel Goran, linux-bcache,
	linux-block@vger.kernel.org, Kent Overstreet

On Fri, Nov 17, 2017 at 7:08 AM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Nov 16, 2017 at 07:07:40PM -0800, Michael Lyle wrote:
>> I think the probable cause is this: we construct bios based on
>> previously completed bios in a few places,
>
> That is an extremely bad idea in many ways, so I think we'll need
> to fix this as the priority.

Yah, sorry, this wasn't actually the case.  I was sleep deprived and
hadn't been able to follow it all yet.

Bart points out that there's many remaining cases of assignment of
disk where partno isn't assigned as well-- so I wonder if there are
other potential problems from this.

Mike
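
The concern about assigning the disk without the partition number can be
sketched with a toy Python model. It is not kernel code: `Bio`,
`set_disk_only`, `set_dev` and the 2048-sector partition start are all
assumptions made for illustration. The point is only that a bio whose disk
is set while its partition number keeps a default value resolves the same
relative sector to a different absolute sector.

```python
from dataclasses import dataclass

# Hypothetical layout: partno -> start sector; partno 0 is the whole disk.
PART_START = {0: 0, 1: 2048}


@dataclass
class Bio:
    disk: str = ""
    partno: int = 0   # a freshly initialised bio points at the whole disk
    sector: int = 0


def set_disk_only(bio: Bio, disk: str) -> None:
    """Assign the disk but leave partno at whatever it was."""
    bio.disk = disk


def set_dev(bio: Bio, disk: str, partno: int) -> None:
    """Assign the disk and the partition number together."""
    bio.disk = disk
    bio.partno = partno


def absolute_sector(bio: Bio) -> int:
    """Where the bio would finally land on the whole disk."""
    return PART_START[bio.partno] + bio.sector


# Both bios are meant to read relative sector 100 of partition 1 on "sda".
incomplete = Bio(sector=100)
set_disk_only(incomplete, "sda")

complete = Bio(sector=100)
set_dev(complete, "sda", 1)

print(absolute_sector(incomplete), absolute_sector(complete))
```

Here the bio that only had its disk assigned lands on absolute sector 100,
while the fully assigned one lands on 2148.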

Thread overview: 6+ messages
     [not found] <1321800569.20171116152838@pvgoran.name>
     [not found] ` <CAPerZE-KiYVWXtBsu+SR0wCgbSJgzZAufm+3NYdoybjq+6ySbQ@mail.gmail.com>
2017-11-16 19:48   ` Regression in 4.14: wrong data being read from bcache device Michael Lyle
2017-11-16 20:31     ` Jens Axboe
2017-11-16 23:02       ` Michael Lyle
2017-11-17  3:07         ` Michael Lyle
2017-11-17 15:08           ` Christoph Hellwig
2017-11-17 17:04             ` Michael Lyle