* back-ported dm-cache not forwarding read-ahead bios to origin
@ 2015-07-20 16:50 Thanos Makatos
From: Thanos Makatos @ 2015-07-20 16:50 UTC
To: device-mapper development; +Cc: ejt, snitzer, Mauelshagen
I'm working on a back-ported dm-cache version for kernel 2.6.32-431.29.2
(the CentOS 6 patched one) and I'm trying to solve a corruption bug
apparently introduced during the back-port. I can consistently reproduce it
by simply mounting an ext4 file-system that contains some data and running
stat(1) against a specific directory. stat(1) fails with "Input/output
error" and dmesg says: "EXT4-fs error (device dm-8): ext4_lookup: deleted
inode referenced: 29". The file-system is mounted with options
"data=writeback,ro,nodiscard,inode_readahead_blks=1" in order to minimise
noise.
Since I'm using the pass-through mode, my theory is that dm-cache:
(1) forwards the bio to the wrong device, and/or
(2) forwards the bio to the wrong location on the device (e.g. the bio length
and/or offset are wrong), and/or
(3) copies the wrong piece of data from a forwarded bio to the original
bio, assuming it copies data in the first place (I don't know much about
the device mapper at this point).
I tried to confirm which of the above three could be happening by checking
which bio goes where using btrace(8). Specifically, I ran btrace(8) against
the cache target, the HDD, and the SSD data and metadata devices (4 traces in
total). I observed that no bios go to the SSD data and metadata devices, so
this rules out (1). I also observed that read-ahead requests issued to the
cache target don't get forwarded to the HDD. I don't know whether this can be
a problem in the first place (can read-ahead bios be ignored?), let alone
whether it is the cause of this bug, but I think it's worth making sure that
it really doesn't cause any problems.
Below are the traces when mounting the file-system:
cache target trace:
253,8 7 3 27.218701098 28536 Q R 2 + 2 [mount]
253,8 7 4 27.218726465 28536 U N [mount] 0
253,8 7 5 27.222694538 28536 Q R 0 + 8 [mount]
253,8 7 6 27.222707270 28536 U N [mount] 0
253,8 7 7 27.226580397 28536 Q R 8 + 8 [mount]
253,8 7 8 27.226598088 28536 U N [mount] 0
253,8 7 9 27.229666137 28536 Q RA 2832 + 8 [mount]
253,8 7 10 27.229677500 28536 Q RM 2824 + 8 [mount]
253,8 7 11 27.229679348 28536 U N [mount] 0
253,8 1 2 27.222630997 28198 C R 2 + 2 [0]
253,8 1 3 27.226560799 28198 C R 0 + 8 [0]
253,8 1 4 27.229570827 28198 C R 8 + 8 [0]
253,8 1 5 27.232313463 28198 C RM 2824 + 8 [0]
253,8 3 1 27.229683980 28291 C RA 2832 + 8 [0]
253,8 7 12 27.232360573 28536 Q R 4456448 + 8 [mount]
253,8 7 13 27.232402040 28536 U N [mount] 0
253,8 6 1 27.243263044 28204 C R 4456448 + 8 [0]
HDD trace:
253,5 1 3 27.222584291 28198 C R 2 + 2 [0]
253,5 7 2 27.218685545 28536 U N [(null)] 0
253,5 7 3 27.222664774 28536 U N [(null)] 0
253,5 3 1 27.218694575 28291 Q R 2 + 2 [dm-cache]
253,5 3 2 27.222670216 28291 Q R 0 + 8 [dm-cache]
253,5 3 3 27.226566647 28291 Q R 8 + 8 [dm-cache]
253,5 1 4 27.226516192 28198 C R 0 + 8 [0]
253,5 1 5 27.229526352 28198 C R 8 + 8 [0]
253,5 1 6 27.232269516 28198 C RM 2824 + 8 [0]
253,5 7 4 27.226555641 28536 U N [(null)] 0
253,5 7 5 27.229636877 28536 U N [(null)] 0
253,5 7 6 27.232331776 28536 A R 4456448 + 8 <- (253,8) 4456448
253,5 7 7 27.232332898 28536 Q R 4456448 + 8 [(null)]
253,5 7 8 27.232359557 28536 U N [(null)] 0
253,5 3 4 27.229649990 28291 Q RM 2824 + 8 [dm-cache]
253,5 6 1 27.243215063 28204 C R 4456448 + 8 [0]
The "RA 2832 + 8" request (7th line in the 1st trace) issued to the cache
target gets completed without ever reaching the HDD. Is this OK? I've
started looking at the code but haven't yet found anything specific to
read-ahead bios.
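The comparison above can be made systematic rather than done by eye: collect the requests queued on each device and diff them. Below is a minimal sketch, assuming blkparse's default output format as in the traces above and pass-through mode with an identity sector mapping (a bio at sector S on the cache target should be queued at sector S on the origin). `parse_events` and `unforwarded_reads` are hypothetical helper names, not part of blktrace.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches queue (Q) and complete (C) events in blkparse's default output, e.g.
#   "253,8 7 9 27.229666137 28536 Q RA 2832 + 8 [mount]"
# Fields: dev, cpu, sequence, timestamp, pid, action, RWBS, sector + count.
EVENT_RE = re.compile(
    r"^(?P<dev>\d+,\d+)\s+\d+\s+\d+\s+\d+\.\d+\s+\d+\s+"
    r"(?P<action>[QC])\s+(?P<rwbs>\S+)\s+(?P<sector>\d+)\s+\+\s+(?P<count>\d+)"
)


class Event(NamedTuple):
    dev: str
    action: str   # "Q" (queued) or "C" (completed)
    rwbs: str     # e.g. "R", "RA" (read-ahead), "RM" (metadata read)
    sector: int   # start sector, in 512-byte units
    count: int    # length, in 512-byte sectors


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Keep only Q/C events; plug (U), remap (A) and other lines are skipped."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m:
            events.append(Event(m["dev"], m["action"], m["rwbs"],
                                int(m["sector"]), int(m["count"])))
    return events


def unforwarded_reads(target_lines: Iterable[str],
                      origin_lines: Iterable[str]) -> List[Event]:
    """Reads queued on the cache target with no matching queued
    (sector, count) on the origin. Assumes identity sector mapping."""
    origin_queued = {(e.sector, e.count)
                     for e in parse_events(origin_lines) if e.action == "Q"}
    return [e for e in parse_events(target_lines)
            if e.action == "Q" and "R" in e.rwbs
            and (e.sector, e.count) not in origin_queued]
```

Fed the two traces above, this should report only the `RA 2832 + 8` request; every other read queued on 253,8 has a matching `Q` on 253,5.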
Regarding my 3rd theory (data getting corrupted by dm-cache after being read
from the HDD), is there a relatively easy way to confirm it? E.g. could
btrace report a checksum of the bio's data when the bio completes?
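blktrace records request metadata (device, sector, size, flags) rather than the data itself, so it is unlikely to help with checksums directly. A cruder check that may be enough here, since the file-system is mounted read-only and pass-through mode should send reads to the origin, is to read the same sector range through the cache target and directly from the HDD and compare digests. A minimal sketch, assuming `/dev/dm-8` is the cache target and `/dev/dm-5` is the HDD (inferred from the 253,8 and 253,5 device numbers in the traces); `range_digest` and `ranges_match` are hypothetical names. It uses buffered reads, so results may come from the page cache; dropping caches first (`echo 3 > /proc/sys/vm/drop_caches`) should avoid that.

```python
import hashlib
import os

SECTOR_SIZE = 512  # blktrace/blkparse sector numbers are in 512-byte units


def range_digest(path: str, sector: int, count: int) -> str:
    """SHA-256 of `count` sectors starting at `sector` in `path`.

    Uses ordinary buffered reads, so the data may be served from the
    page cache rather than from the device itself.
    """
    length = count * SECTOR_SIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, length, sector * SECTOR_SIZE)
    finally:
        os.close(fd)
    if len(data) != length:
        raise OSError(f"short read from {path}: {len(data)} of {length} bytes")
    return hashlib.sha256(data).hexdigest()


def ranges_match(cache_dev: str, origin_dev: str, sector: int, count: int) -> bool:
    """True if both devices return identical data for the sector range."""
    return (range_digest(cache_dev, sector, count)
            == range_digest(origin_dev, sector, count))
```

For example, `ranges_match("/dev/dm-8", "/dev/dm-5", 2824, 16)` would cover both the `RM 2824 + 8` and `RA 2832 + 8` requests from the traces; a mismatch would point towards theory (2) or (3).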
Is there something else that could be wrong?
* Re: back-ported dm-cache not forwarding read-ahead bios to origin
@ 2015-07-20 17:22 Mike Snitzer
From: Mike Snitzer @ 2015-07-20 17:22 UTC
To: Thanos Makatos; +Cc: device-mapper development, ejt, Mauelshagen
On Mon, Jul 20 2015 at 12:50pm -0400,
Thanos Makatos <thanos.makatos@onapp.com> wrote:
> I'm working on a back-ported dm-cache version for kernel 2.6.32-431.29.2
> (the CentOS 6 patched one) and I'm trying to solve a corruption bug
> apparently introduced during the back-port. I can consistently reproduce it
> by simply mounting an ext4 file-system that contains some data and running
> stat(1) against a specific directory. stat(1) fails with "Input/output
> error" and dmesg says:"EXT4-fs error (device dm-8): ext4_lookup: deleted
> inode referenced: 29". The file-system is mounted with options
> "data=writeback,ro,nodiscard,inode_readahead_blks=1" in order to minimise
> noise.
Upstream backports to RHEL6 are tricky because it doesn't have
upstream's latest block changes. Care must be taken during the
backport. I could easily see someone less aware of the pitfalls
producing a buggy kernel that causes corruption like you've reported.
The last dm-cache backport to RHEL6 was kernel-2.6.32-528.el6.
That backport sync'd dm-cache changes through upstream Linux 3.19.
I'm not going to put any time into this report. Your best bet is to
rebase to the >= 528.el6 kernel and go from there (this translates to
the forthcoming RHEL6.7 kernel as the first publicly available release
with the changes in question).
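To check whether a given RHEL6/CentOS 6 kernel is at least the 2.6.32-528.el6 build mentioned above, it should be enough to compare the build number after `2.6.32-`. This is a simplified sketch, not RPM's real version comparison (rpmvercmp), and it assumes CentOS rebuilds keep RHEL's kernel release numbering; `has_dm_cache_backport` and `running_kernel_has_backport` are hypothetical helper names.

```python
import platform
import re
from typing import Optional

# RHEL6/CentOS 6 kernel releases look like "2.6.32-431.29.2.el6.x86_64";
# the first number after "2.6.32-" is the build compared here.
RELEASE_RE = re.compile(r"^2\.6\.32-(\d+)(?:\.[\d.]+)?\.el6")


def has_dm_cache_backport(release: str, min_build: int = 528) -> Optional[bool]:
    """True/False for RHEL6-style releases, None if the string isn't one.

    Simplified: ignores minor release fields (e.g. ".29.2"), which is
    sufficient when the threshold itself is a plain "-528.el6" build.
    """
    m = RELEASE_RE.match(release)
    if m is None:
        return None
    return int(m.group(1)) >= min_build


def running_kernel_has_backport() -> Optional[bool]:
    """Apply the check to the running kernel's uname release string."""
    return has_dm_cache_backport(platform.release())
```

With this check, the 2.6.32-431.29.2 kernel from the original report would not qualify.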
Mike
* Re: back-ported dm-cache not forwarding read-ahead bios to origin
@ 2015-07-21 9:56 Thanos Makatos
From: Thanos Makatos @ 2015-07-21 9:56 UTC
To: Mike Snitzer; +Cc: device-mapper development, ejt, Mauelshagen
> Upstream backports to RHEL6 are tricky because it doesn't have
> upstream's latest block changes. Care must be taken during the
> backport. I could easily see someone less aware of the pitfalls
> producing a buggy kernel that causes corruption like you've reported.
>
> The last dm-cache backport to RHEL6 was kernel-2.6.32-528.el6.
> That backport sync'd dm-cache changes through upstream Linux 3.19.
>
> I'm not going to put any time into this report. Your best bet is to
> rebase to the >= 528.el6 kernel and go from there (this translates to
> the forthcoming RHEL6.7 kernel as the first publicly available release
> with the changes in question).
Thanks Mike. Had I known that a back-port was planned for 6.7, I wouldn't
have bothered back-porting it myself... I looked for an existing back-port
for CentOS a few months back but didn't find anything. Just so I learn for
next time: if I had asked on this list whether a back-port was planned,
would I have gotten a response, or is that kind of information something
that couldn't have been disclosed?
Since our product is based on CentOS, and it might take some time for
dm-cache to become available through CentOS 6.7, for now I'll have to keep
using my broken version in order to integrate it with our product. Any help
would be appreciated.
--
Thanos Makatos