dm-cache: dm-3.14-fixes-4

* dm-cache: dm-3.14-fixes-4
@ 2014-03-17 13:43 George .
  2014-03-17 14:01 ` Mike Snitzer
  0 siblings, 1 reply; 6+ messages in thread
From: George . @ 2014-03-17 13:43 UTC (permalink / raw)
  To: dm-devel

Hi,

The description of dm-3.14-fixes-4 includes:

- fix corruption with >2TB fast device due to truncation bug

But looking at the diff, I can't find anything related to such a bug.

I'm asking because we are trying to use dm-cache on a machine with a
2.4 TB SSD cache, and after I took the following fix:

dm-3.14-fixes-1
dm cache: fix truncation bug when mapping I/O to >2TB fast device
dm-3.14-fixes-1<http://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/tag/?id=dm-3.14-fixes-1>

our cached device got corrupted again.

My question is: has another truncation bug been discovered?

I backported dm-3.14-fixes-1 to the 3.11.10 kernel because, when we tested
v3.14-rc5<http://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/tag/?id=v3.14-rc5>,
the cached device was corrupted after ~15 minutes and that kernel seemed
less stable overall.

Meanwhile, I'll try to find out what happens at the >2TB boundary and, if
I can, fix it.

* Re: dm-cache: dm-3.14-fixes-4
  2014-03-17 13:43 dm-cache: dm-3.14-fixes-4 George .
@ 2014-03-17 14:01 ` Mike Snitzer
  2014-03-17 14:30   ` George .
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Snitzer @ 2014-03-17 14:01 UTC (permalink / raw)
  To: George .; +Cc: Heinz Mauelshagen, dm-devel

On Mon, Mar 17 2014 at  9:43am -0400,
George . <george@ucdn.com> wrote:

> Hi,
> 
> The description of dm-3.14-fixes-4 includes:
> 
> - fix corruption with >2TB fast device due to truncation bug
> 
> But looking at the diff, I can't find anything related to such a bug.

Commit 8b9d96666529 ("dm cache: fix truncation bug when copying a block
to/from >2TB fast device") follows the same pattern as commit e0d849fad7
("dm cache: fix truncation bug when mapping I/O to >2TB fast device").
The pattern is that from_cblock() only returns a 32-bit value, so any
64-bit math operation must use a type that can accommodate 64 bits.  That
is why an intermediate sector_t value is now used in both commits.

> I'm asking because we are trying to use dm-cache on a machine with a
> 2.4 TB SSD cache, and after I took the following fix:
> 
> dm-3.14-fixes-1
> dm cache: fix truncation bug when mapping I/O to >2TB fast device
> dm-3.14-fixes-1<http://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/tag/?id=dm-3.14-fixes-1>
> 
> our cached device got corrupted again.

Commit e0d849fad7 wouldn't have been the cause.  If you didn't also
apply 8b9d96666529 then you could have hit that one.

> My question is: has another truncation bug been discovered?

Yeah, both the above referenced commits (commit 8b9d96666529 being the
most recent).

> I backported dm-3.14-fixes-1 to the 3.11.10 kernel because, when we tested
> v3.14-rc5<http://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/tag/?id=v3.14-rc5>,
> the cached device was corrupted after ~15 minutes and that kernel seemed
> less stable overall.

OK, well upstream dm-cache saw very little change for 3.14.  Just a
handful of bug fixes.  So you're likely hitting an outstanding bug that
we've yet to fix.  One issue that is being actively pursued is the
possibility that discards could be contributing to corruption.  Heinz
will have an update on this line of investigation soon.

* Re: dm-cache: dm-3.14-fixes-4
  2014-03-17 14:01 ` Mike Snitzer
@ 2014-03-17 14:30   ` George .
  2014-03-17 23:51     ` multipath prio issues Ross Anderson
  0 siblings, 1 reply; 6+ messages in thread
From: George . @ 2014-03-17 14:30 UTC (permalink / raw)
  To: Mike Snitzer, dm-devel


Hi Mike,

Thank you very much for your very fast response. I'll backport commit
e0d849fad7 to 3.11.10 and will let you know how dm-cache behaves with a
large cache SSD. I will probably also check the code for other truncation
issues.



* multipath prio issues
  2014-03-17 14:30   ` George .
@ 2014-03-17 23:51     ` Ross Anderson
  2014-03-18  6:55       ` Bart Van Assche
  0 siblings, 1 reply; 6+ messages in thread
From: Ross Anderson @ 2014-03-17 23:51 UTC (permalink / raw)
  To: dm-devel

Greetings,

I would appreciate help with another issue. We have two SCST
storage arrays configured with DRBD replication and ALUA settings via
targets. When we run sg_inq and sg_rtpg we see the preferred port.
Multipath, however, doesn't seem to respect the priority path. We then
tried setting active and nonoptimized on the two storage arrays, without
success. If I'm reading this correctly, the devices are reporting their
state to the multipath system, but it's not respecting the values.
I'd appreciate any assistance on the matter. I've put all the
relevant .conf files and command output on Pastebin.

Thanks

Ross Anderson

http://pastebin.com/pp72YBXw

* Re: multipath prio issues
  2014-03-17 23:51     ` multipath prio issues Ross Anderson
@ 2014-03-18  6:55       ` Bart Van Assche
  2014-03-18 16:13         ` Ross Anderson
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2014-03-18  6:55 UTC (permalink / raw)
  To: Ross Anderson, device-mapper development

On 03/18/14 00:51, Ross Anderson wrote:
> I would appreciate help with another issue. We have two SCST
> storage arrays configured with DRBD replication and ALUA settings via
> targets. When we run sg_inq and sg_rtpg we see the preferred port.
> Multipath, however, doesn't seem to respect the priority path. We then
> tried setting active and nonoptimized on the two storage arrays, without
> success. If I'm reading this correctly, the devices are reporting their
> state to the multipath system, but it's not respecting the values.
> I'd appreciate any assistance on the matter. I've put all the
> relevant .conf files and command output on Pastebin.

Hello Ross,

Have you already tried to change "prio group_by_prio" into "prio alua" ?

Bart.

* Re: multipath prio issues
  2014-03-18  6:55       ` Bart Van Assche
@ 2014-03-18 16:13         ` Ross Anderson
  0 siblings, 0 replies; 6+ messages in thread
From: Ross Anderson @ 2014-03-18 16:13 UTC (permalink / raw)
  To: Bart Van Assche, device-mapper development

On 3/18/2014 1:55 AM, Bart Van Assche wrote:
> On 03/18/14 00:51, Ross Anderson wrote:
>> I would appreciate help with another issue. We have two SCST
>> storage arrays configured with DRBD replication and ALUA settings via
>> targets. When we run sg_inq and sg_rtpg we see the preferred port.
>> Multipath, however, doesn't seem to respect the priority path. We then
>> tried setting active and nonoptimized on the two storage arrays, without
>> success. If I'm reading this correctly, the devices are reporting their
>> state to the multipath system, but it's not respecting the values.
>> I'd appreciate any assistance on the matter. I've put all the
>> relevant .conf files and command output on Pastebin.
> Hello Ross,
>
> Have you already tried to change "prio group_by_prio" into "prio alua" ?
>
> Bart.
>
Greetings,

Yes, the config file had multiple devices listed. The one I focused on
was the general one; however, they are all attached to the same set of
storage heads. In ALUA mode it gives them all prio 50, and the output of
multipath shows them not accepting the preferred value. I posted the
complete config with different options to show there is no real difference.

Thanks,
Ross
