From: caifeng.zhu@uniswdc.com
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: bad crc in data caused by a race condition in write_partial_message_data
Date: Mon, 16 Jan 2017 18:09:04 +0800
Message-ID: <20170116100904.GA38449@T530I>
In-Reply-To: <CAOi1vP9-Rwo5Mc-DzogTQSBFy3Hbcy=gKehG-xsYDP5fFm3PBg@mail.gmail.com>
On Mon, Jan 16, 2017 at 09:42:25AM +0100, Ilya Dryomov wrote:
> On Mon, Jan 16, 2017 at 4:24 AM, <caifeng.zhu@uniswdc.com> wrote:
> > On Sun, Jan 15, 2017 at 06:01:05PM +0100, Ilya Dryomov wrote:
> >> On Sun, Jan 15, 2017 at 8:45 AM, <caifeng.zhu@uniswdc.com> wrote:
> >> > Hi, all
> >> >
> >> > Let's look at the problem first. We have a lot of 'bad crc in data'
> >> > warnings at OSDs, like below:
> >> > 2017-01-14 23:25:54.671599 7f67201b3700 0 bad crc in data 1480547403 != exp 3751318843
> >> > 2017-01-14 23:25:54.681146 7f67201b3700 0 bad crc in data 3044715775 != exp 3018112170
> >> > 2017-01-14 23:25:54.681822 7f67201b3700 0 bad crc in data 2815383560 != exp 1455746011
> >> > 2017-01-14 23:25:54.686106 7f67205da700 0 bad crc in data 1781929234 != exp 498105391
> >> > 2017-01-14 23:25:54.688092 7f67205da700 0 bad crc in data 1845054835 != exp 3337474350
> >> > 2017-01-14 23:25:54.693225 7f67205da700 0 bad crc in data 1518733907 != exp 3781627678
> >> > 2017-01-14 23:25:54.755653 7f6724115700 0 bad crc in data 1173337243 != exp 3759627242
> >> > ...
> >> > This problem occurs when we test (with fio) an NFS client whose NFS server is
> >> > built on an XFS + RBD combination. The harmful effect is that the OSD closes
> >> > the connection that saw the CRC error and drops all reply messages sent through it.
> >> > But the kernel rbd client holds on to the requests and waits for the dropped
> >> > replies, which will never come. A deadlock occurs.
> >> >
> >> > After some analysis, we suspect write_partial_message_data may have a race condition.
> >> > (The code below is taken from GitHub.)
> >> > 1562 page = ceph_msg_data_next(cursor, &page_offset, &length,
> >> > 1563 &last_piece);
> >> > 1564 ret = ceph_tcp_sendpage(con->sock, page, page_offset,
> >> > 1565 length, !last_piece);
> >> > ...
> >> > 1572 if (do_datacrc && cursor->need_crc)
> >> > 1573 crc = ceph_crc32c_page(crc, page, page_offset, length);
> >> > At lines 1564 ~ 1572, a worker thread of the libceph workqueue may send the page out over TCP
> >> > and compute the CRC. But simultaneously, at the VFS/XFS level, another thread may be
> >> > writing to the file position cached by the page being sent. If the page send and the CRC
> >> > computation are interleaved with that data write, the receiving OSD will complain about a bad CRC.
> >> >
> >> > To verify our suspicion, we added the debug patch below:
> >> > (Code below is based on our linux version.)
> >>
> >> ... which is based on? This should be fixed in 4.3+ and all recent stable
> >> kernels.
> >>
> >
> > We are using CentOS 7.1, with kernel
> > kernel.osrelease = 3.10.0-229.14.1.el7.1.x86_64.
> > With the patches added by CentOS, the ceph kernel client is roughly equivalent to 4.0.
>
> No, it's not ~4.0. A lot of important fixes are missing from that
> kernel and I'd strongly encourage you to upgrade to the 7.3 kernel.
>
Thanks for your suggestion. We'll try it.
> >
> > Is there any info or doc about the fixes in 4.3+?
>
> This is the fix, trivial to cherry-pick and try out:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bae818ee1577c27356093901a0ea48f672eda514
>
This patch makes sense to me. It is an elegant solution,
much (much ...) better than my proposal. Thanks for your help!
> Thanks,
>
> Ilya
>