CEPH filesystem development
From: caifeng.zhu@uniswdc.com
To: Ilya Dryomov <idryomov@gmail.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: bad crc in data caused by a race condtion in write_partial_message_data
Date: Mon, 16 Jan 2017 18:09:04 +0800
Message-ID: <20170116100904.GA38449@T530I>
In-Reply-To: <CAOi1vP9-Rwo5Mc-DzogTQSBFy3Hbcy=gKehG-xsYDP5fFm3PBg@mail.gmail.com>

On Mon, Jan 16, 2017 at 09:42:25AM +0100, Ilya Dryomov wrote:
> On Mon, Jan 16, 2017 at 4:24 AM,  <caifeng.zhu@uniswdc.com> wrote:
> > On Sun, Jan 15, 2017 at 06:01:05PM +0100, Ilya Dryomov wrote:
> >> On Sun, Jan 15, 2017 at 8:45 AM,  <caifeng.zhu@uniswdc.com> wrote:
> >> > Hi, all
> >> >
> >> > Let's look at the problem first. We have a lot of 'bad crc in data'
> >> > warnings at OSDs, like below:
> >> >     2017-01-14 23:25:54.671599 7f67201b3700  0 bad crc in data 1480547403 != exp 3751318843
> >> >     2017-01-14 23:25:54.681146 7f67201b3700  0 bad crc in data 3044715775 != exp 3018112170
> >> >     2017-01-14 23:25:54.681822 7f67201b3700  0 bad crc in data 2815383560 != exp 1455746011
> >> >     2017-01-14 23:25:54.686106 7f67205da700  0 bad crc in data 1781929234 != exp 498105391
> >> >     2017-01-14 23:25:54.688092 7f67205da700  0 bad crc in data 1845054835 != exp 3337474350
> >> >     2017-01-14 23:25:54.693225 7f67205da700  0 bad crc in data 1518733907 != exp 3781627678
> >> >     2017-01-14 23:25:54.755653 7f6724115700  0 bad crc in data 1173337243 != exp 3759627242
> >> >     ...
> >> > This problem occurs when we test (with fio) an NFS client whose NFS server is
> >> > built on an XFS + RBD combination. The consequence is that the OSD closes
> >> > the connection that hit the CRC error and drops all reply messages sent through it,
> >> > while the kernel rbd client holds on to its requests and waits for the dropped
> >> > replies, which will never come. A deadlock occurs.
> >> >
> >> > After some analysis, we suspect write_partial_message_data may have a race condition.
> >> > (The code below is taken from GitHub.)
> >> >     1562                page = ceph_msg_data_next(cursor, &page_offset, &length,
> >> >     1563                                          &last_piece);
> >> >     1564                ret = ceph_tcp_sendpage(con->sock, page, page_offset,
> >> >     1565                                        length, !last_piece);
> >> >     ...
> >> >     1572                if (do_datacrc && cursor->need_crc)
> >> >     1573                        crc = ceph_crc32c_page(crc, page, page_offset, length);
> >> > At lines 1564 ~ 1572, a worker thread of the libceph workqueue may send the page out over TCP
> >> > and then compute the CRC. But simultaneously, at the VFS/XFS level, another thread may be
> >> > writing to the file position cached by the page being sent. If a data write lands between
> >> > the page send and the CRC computation, the receiving OSD will complain about a bad CRC.
> >> >
> >> > To verify our suspicion, we added the debug patch below:
> >> > (The code below is based on our Linux version.)
> >>
> >> ... which is based on?  This should be fixed in 4.3+ and all recent stable
> >> kernels.
> >>
> >
> > We are using CentOS 7.1, with this kernel:
> > kernel.osrelease = 3.10.0-229.14.1.el7.1.x86_64.
> > With the patches added by CentOS, the ceph kernel client is roughly at 4.0~.
> 
> No, it's not ~4.0.  A lot of important fixes are missing from that
> kernel and I'd strongly encourage you to upgrade to the 7.3 kernel.
> 

Thanks for your suggestion. We'll try it.

> >
> > Is there any info or doc about the fixes in 4.3+?
> 
> This is the fix, trivial to cherry-pick and try out:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bae818ee1577c27356093901a0ea48f672eda514
> 

This patch makes sense to me. It is an elegant solution,
much (much ...) better than my proposal. Thanks for your help!

> Thanks,
> 
>                 Ilya
> 




Thread overview: 7+ messages
2017-01-15  7:45 bad crc in data caused by a race condtion in write_partial_message_data caifeng.zhu
2017-01-15 16:26 ` Alex Elder
2017-01-16  3:09   ` caifeng.zhu
2017-01-15 17:01 ` Ilya Dryomov
2017-01-16  3:24   ` caifeng.zhu
2017-01-16  8:42     ` Ilya Dryomov
2017-01-16 10:09       ` caifeng.zhu [this message]
