From: Loic Dachary <loic@dachary.org>
To: Kevin Greenan <kmgreen2@gmail.com>
Cc: Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: jerasure/gf-complete segmentation violation
Date: Sun, 06 Apr 2014 12:12:43 +0200
Message-ID: <5341289B.4080701@dachary.org>
In-Reply-To: <CA+AFVBgVXsTLJuGh-FrJMx3ee11Ztf=g+B9gnHybg9EXwunfnw@mail.gmail.com>
Hi,
This time it is an illegal instruction (see http://tracker.ceph.com/issues/7914#note-31 ). Since the workload is slightly different, I'm going to run it 30 times to see whether that triggers the problem.
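
An illegal instruction can mean that code built with the SSE4/PCLMUL flags mentioned in the quoted thread runs on a CPU that lacks one of them, although a jump into corrupted memory would look the same. A minimal sketch to rule out the first case on a test node, assuming GCC or Clang on x86; the function name cpu_has_build_features is hypothetical and is not part of jerasure or gf-complete:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a jerasure/gf-complete function).
 * Returns 1 if the running CPU reports every instruction-set extension
 * the library was compiled with, 0 if at least one is missing, and -1
 * if the check is unavailable (not GCC/Clang, or not x86).
 */
int cpu_has_build_features(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* safe to call more than once */
    return __builtin_cpu_supports("sse")
        && __builtin_cpu_supports("sse2")
        && __builtin_cpu_supports("sse3")
        && __builtin_cpu_supports("ssse3")
        && __builtin_cpu_supports("sse4.1")
        && __builtin_cpu_supports("sse4.2")
        && __builtin_cpu_supports("pclmul");
#else
    return -1;
#endif
}

/* Print a one-line verdict; intended for a quick run on the test node. */
void report_cpu_features(FILE *out)
{
    int r = cpu_has_build_features();
    fprintf(out, "build features supported: %s\n",
            r == 1 ? "yes" : r == 0 ? "no" : "unknown");
}
```

Calling report_cpu_features(stderr) on the machine that crashed would tell whether a missing extension is even a candidate; if it prints "yes", memory corruption remains the more likely explanation.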
Cheers
On 02/04/2014 20:15, Kevin Greenan wrote:
> OK, it looks like this happens when the GF backend is first initialized (unless, like Loic pointed out, something is corrupted).
>
> Is this consistently happening for carry-free multiply and w=32 (i.e. gf_w32_cfm_init)?
>
> Can you send me a core + binary, so I can dig in gdb?
>
> -kevin
>
>
> On Wed, Apr 2, 2014 at 11:01 AM, Sage Weil <sage@inktank.com> wrote:
>
> On Wed, 2 Apr 2014, Loic Dachary wrote:
> >
> >
> > On 02/04/2014 19:44, Kevin Greenan wrote:
> > > Hey Loic,
> > >
> > > Are you ensuring that Jerasure (actually gf-complete) is getting memory buffers aligned on 16-byte boundaries? Without looking too deep, that is the first thing I would check.
> > >
> >
> > Yes
> >
> > https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L32
> > https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L242
> > https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L65
> > https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L108
> >
>
> In this case they are 2K aligned:
>
> (gdb) p data_ptrs[0]
> $1 = 0x3e46000 "I'm the", ' ' <repeats 16 times>, "3th object!"
> (gdb) p data_ptrs[1]
> $2 = 0x3e46800 'z' <repeats 200 times>...
> (gdb) p coding_ptrs[0]
> $3 = 0x338e000 "I'm the", ' ' <repeats 16 times>, "3th object!"
>
> sage
>
> > I'll re-read this logic tomorrow just to be sure.
> >
> > Cheers
> >
> > > I can have a deeper look later today or tomorrow.
> > >
> > > -kevin
> > >
> > >
> > > On Wed, Apr 2, 2014 at 10:35 AM, Loic Dachary <loic@dachary.org> wrote:
> > >
> > > Hi Kevin,
> > >
> > > In the context of http://tracker.ceph.com/issues/7914 we're trying to figure out why jerasure dumps core. We don't know how to reproduce it yet (we ran dozens of identical test suites with no such crash in the past few days, which is to be expected for rare bugs because the test suite introduces random errors / failures on purpose).
> > >
> > > The full stack trace is at http://tracker.ceph.com/issues/7914#note-24 but the relevant part is here:
> > >
> > > #0 0x00007f4756779b7b in raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
> > > #1 0x0000000000981b4e in reraise_fatal (signum=11) at global/signal_handler.cc:59
> > > #2 handle_fatal_signal (signum=11) at global/signal_handler.cc:105
> > > #3 <signal handler called>
> > > #4 0x0000000000000000 in ?? ()
> > > #5 0x00007f47385ae6b1 in jerasure_matrix_dotprod (k=2, w=8, matrix_row=0x31513a8, src_ids=0x0, dest_id=<optimized out>, data_ptrs=0x7f4741ec7a00, coding_ptrs=0x7f4741ec7a10,
> > > size=2048) at erasure-code/jerasure/jerasure/src/jerasure.c:607
> > > #6 0x00007f47385ae7d6 in jerasure_matrix_encode (k=2, m=1, w=8, matrix=<optimized out>, data_ptrs=0x7f4741ec7a00, coding_ptrs=0x7f4741ec7a10, size=2048)
> > > at erasure-code/jerasure/jerasure/src/jerasure.c:310
> > > ...
> > >
> > > Note that this jerasure/gf-complete combination has been compiled with SSE4.1, SSE4.2, PCLMUL, SSSE3, SSE3, SSE2, SSE flags activated. These are jerasure v2 and gf-complete v1, only slightly modified as found in https://github.com/ceph/jerasure/tree/v2-ceph and https://github.com/ceph/gf-complete/tree/v1-ceph (all commits there have a pending pull request under https://bitbucket.org/jimplank/gf-complete https://bitbucket.org/jimplank/jerasure, nothing you've not seen before).
> > >
> > > #5 is https://github.com/ceph/jerasure/blob/v2-ceph/src/jerasure.c#L607
> > >
> > > and then it dives into gf-complete, which most probably destroyed part of the stack when corrupting memory. I'll be chasing this tomorrow. If you have a brilliant idea on why that happens, I'll take it ;-)
> > >
> > > Cheers
> > >
> > > --
> > > Loïc Dachary, Artisan Logiciel Libre
> > >
> > >
> >
> > --
> > Loïc Dachary, Artisan Logiciel Libre
> >
> >
>
>
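
On the 16-byte alignment question quoted above, here is a minimal sketch of a check that could be asserted just before the encode call. The helper names is_aligned and first_misaligned are hypothetical (they exist neither in Ceph nor in jerasure); data_ptrs and coding_ptrs are assumed to be arrays of buffer pointers as in the gdb session:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: non-zero if p sits on a multiple of `alignment`
 * bytes. `alignment` must be non-zero (16 for the SSE code paths).
 */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/*
 * Hypothetical helper: index of the first pointer in ptrs[0..n-1] that
 * is not aligned, or -1 if all of them are.
 */
int first_misaligned(char **ptrs, int n, size_t alignment)
{
    for (int i = 0; i < n; i++) {
        if (!is_aligned(ptrs[i], alignment))
            return i;
    }
    return -1;
}
```

With k=2 and m=1 as in the stack trace, first_misaligned(data_ptrs, 2, 16) and first_misaligned(coding_ptrs, 1, 16) should both return -1; the addresses printed by gdb (0x3e46000, 0x3e46800, 0x338e000) are multiples of 2048 and therefore pass.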
--
Loïc Dachary, Artisan Logiciel Libre