All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Luis Lozano <llozano@chromium.org>
Cc: Jakub Jelinek <jakub@redhat.com>,
	Alexander Holler <holler@ahsoftware.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Richard Henderson <rth@twiddle.net>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Will Deacon <will.deacon@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	lttng-dev@lists.lttng.org, Nathan Lynch <Nathan_Lynch@mentor.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Bhaskar Janakiraman <bjanakiraman@chromium.org>,
	Han Shen <shenhan@chromium.org>
Subject: Re: current_thread_info() not respecting program order with gcc 4.8.x
Date: Fri, 22 Nov 2013 13:06:29 +0000 (UTC)	[thread overview]
Message-ID: <542673482.73156.1385125589922.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <CANxoKds7uwdwQ06AaqE-kcRbWGnqDU2gnFD-3GuKoF7tXigTHw@mail.gmail.com>

----- Original Message -----
> From: "Luis Lozano" <llozano@chromium.org>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: "Jakub Jelinek" <jakub@redhat.com>, "Alexander Holler" <holler@ahsoftware.de>, "Linus Torvalds"
> <torvalds@linux-foundation.org>, "Richard Henderson" <rth@twiddle.net>, "Linux Kernel Mailing List"
> <linux-kernel@vger.kernel.org>, "Will Deacon" <will.deacon@arm.com>, "Catalin Marinas" <catalin.marinas@arm.com>,
> "Peter Zijlstra" <peterz@infradead.org>, lttng-dev@lists.lttng.org, "Nathan Lynch" <Nathan_Lynch@mentor.com>, "Paul
> E. McKenney" <paulmck@linux.vnet.ibm.com>, "Andrew Morton" <akpm@linux-foundation.org>, "Bhaskar Janakiraman"
> <bjanakiraman@chromium.org>, "Han Shen" <shenhan@chromium.org>
> Sent: Friday, November 22, 2013 3:18:14 AM
> Subject: Re: current_thread_info() not respecting program order with gcc 4.8.x
> 
> On Thu, Nov 21, 2013 at 7:38 PM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> >
> > ----- Original Message -----
> > > From: "Luis Lozano" <llozano@google.com>
> > > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > > Cc: "Jakub Jelinek" <jakub@redhat.com>, "Alexander Holler"
> > > <holler@ahsoftware.de>, "Linus Torvalds"
> > > <torvalds@linux-foundation.org>, "Richard Henderson" <rth@twiddle.net>,
> > > "Linux Kernel Mailing List"
> > > <linux-kernel@vger.kernel.org>, "Will Deacon" <will.deacon@arm.com>,
> > > "Catalin Marinas" <catalin.marinas@arm.com>,
> > > "Peter Zijlstra" <peterz@infradead.org>, lttng-dev@lists.lttng.org,
> > > "Nathan Lynch" <Nathan_Lynch@mentor.com>, "Paul
> > > E. McKenney" <paulmck@linux.vnet.ibm.com>, "Andrew Morton"
> > > <akpm@linux-foundation.org>, "Bhaskar Janakiraman"
> > > <bjanakiraman@chromium.org>, "Han Shen" <shenhan@chromium.org>
> > > Sent: Thursday, November 21, 2013 9:36:27 PM
> > > Subject: Re: current_thread_info() not respecting program order with gcc
> > > 4.8.x
> > >
> > > Hi Mathieu,
> > >
> > > Yes the problem we were seeing with GCC bug 58854 is that an interrupt
> > > handler was corrupting the stack of a routine which had an invalid
> > > value of SP (not really on "top" of the stack) for part of the
> > > routine.
> > > But in the assembly code you sent, I don't see where sp is being
> > > modified... or where the access to "below" sp is happening.
> >
> > The following instruction
> >
> >   f1c:      e50b1048        str     r1, [fp, #-72]  ; 0xffffffb8
> >
> > appears in the assembly generated by gcc 4.8.2, but not in the one
> > generated by 4.7.3,
> 
> 
> 
> It is there in 4.7.3 just a few lines difference.
> I don't see any meaningful difference between the 2 assembly snippets...
> Just the instructions ordered a little different?
> 
> Luis

(resending reply in plain text)


Assembly generated by 4.7.3 vs 4.8.2 are much more different.

The diff I've presented yesterday is between 4.8.2 and 4.8.2+barrier() at the end of
the function error path. When I can get a cross compiler to work, I will send out more
the complete assembly listings of the function.

Thanks,

Mathieu


> 
> >
> > which makes me wonder if it's good or not. With
> > slightly different build output from the diff (probably a different config
> > from Nathan), the first function instructions look like:
> >
> > 00000efc <lttng_event_reserve>:
> >      efc:       e1a0c00d        mov     ip, sp
> >      f00:       e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl, fp,
> >      ip, lr, pc}
> >      f04:       e24cb004        sub     fp, ip, #4
> >      f08:       e24dd03c        sub     sp, sp, #60     ; 0x3c
> >      f0c:       e52de004        push    {lr}            ; (str lr, [sp,
> >      #-4]!)
> >      f10:       ebfffffe        bl      0 <__gnu_mcount_nc>
> >      f14:       e5903000        ldr     r3, [r0]
> >      f18:       e1a04000        mov     r4, r0
> >      f1c:       e1a0000d        mov     r0, sp
> >      f20:       e5936030        ldr     r6, [r3, #48]   ; 0x30
> >      f24:       e3c03d7f        bic     r3, r0, #8128   ; 0x1fc0
> >      f28:       e3c3303f        bic     r3, r3, #63     ; 0x3f
> >      f2c:       e50b104c        str     r1, [fp, #-76]  ; 0xffffffb4
> >      f30:       e5932004        ldr     r2, [r3, #4]
> >      f34:       e2822001        add     r2, r2, #1
> >      f38:       e5832004        str     r2, [r3, #4]
> >      f3c:       ebfffffe        bl      0 <debug_smp_processor_id>
> >      f40:       e59f2e44        ldr     r2, [pc, #3652] ; 1d8c
> >      <lttng_event_reserve+0xe90>
> >      f44:       e59f3e44        ldr     r3, [pc, #3652] ; 1d90
> >      <lttng_event_reserve+0xe94>
> >      f48:       e7921100        ldr     r1, [r2, r0, lsl #2]
> >      f4c:       e1a05000        mov     r5, r0
> >      f50:       e7932001        ldr     r2, [r3, r1]
> >
> > My rusty ARM assembler knowledge analyzes this like:
> >
> > ip = sp            (ip being the scratch register)
> > push 11 registers * 4 bytes = 44 bytes onto the stack
> > fp = ip - 4
> > sp = sp - 60   (sp = ip - 104)
> > push {lr} bl
> > 0 <__gnu_mcount_nc>  (dynamically patched to a pop {lr})
> > [...]
> > store r1 into mem location fp - 76
> >
> > So yes, it does look like the -76 from fp is within the stack.
> >
> >
> > > In GCC bug 58854 it was pretty clear that the restore of the sp
> > > register in the epilogue is moved to somewhere close to the prologue
> > > of the routine.
> > > Are we missing some diffs from the assembly comparison?
> >
> > My next step is to setup my own 4.8.2 ARM cross-compiler. I've tried this
> > morning,
> > did not manage to get one working even after following 2 howtos, trying
> > with Debian
> > packages, etc. Nathan told me this diff was the full comparison between the
> > two
> > functions, but you'll understand that at this point I want to reproduce
> > everything myself, because this is a _weird_ issue.
> >
> > But after a long day of debugging, it's time for some sleep. I will get
> > back to this
> > tomorrow,
> >
> > Thanks,
> >
> > Mathieu
> >
> > >
> > > Thanks
> > >
> > > Luis
> > >
> > >
> > >
> > >
> > > On Thu, Nov 21, 2013 at 5:57 PM, Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > > ----- Original Message -----
> > > >> From: "Jakub Jelinek" <jakub@redhat.com>
> > > >> To: "Luis Lozano" <llozano@google.com>
> > > >> Cc: "Alexander Holler" <holler@ahsoftware.de>, "Linus Torvalds"
> > > >> <torvalds@linux-foundation.org>, "Mathieu Desnoyers"
> > > >> <mathieu.desnoyers@efficios.com>, "Richard Henderson"
> > > >> <rth@twiddle.net>,
> > > >> "Linux Kernel Mailing List"
> > > >> <linux-kernel@vger.kernel.org>, "Will Deacon" <will.deacon@arm.com>,
> > > >> "Catalin Marinas" <catalin.marinas@arm.com>,
> > > >> "Peter Zijlstra" <peterz@infradead.org>, lttng-dev@lists.lttng.org,
> > > >> "Nathan Lynch" <Nathan_Lynch@mentor.com>, "Paul
> > > >> E. McKenney" <paulmck@linux.vnet.ibm.com>, "Andrew Morton"
> > > >> <akpm@linux-foundation.org>, "Bhaskar Janakiraman"
> > > >> <bjanakiraman@chromium.org>, "Han Shen" <shenhan@chromium.org>
> > > >> Sent: Thursday, November 21, 2013 7:39:04 PM
> > > >> Subject: Re: current_thread_info() not respecting program order with
> > > >> gcc
> > > >> 4.8.x
> > > >>
> > > >> On Thu, Nov 21, 2013 at 03:45:35PM -0800, Luis Lozano wrote:
> > > >> > I think we need a reproducer. Without this we may all be going on
> > > >> > the
> > > >> > wrong path. This whole conversation started on an *assumption* that
> > > >> > some accesses were being reordered.
> > > >> >
> > > >> > evidence of the reorder or reproducer please?
> > > >>
> > > >> Yeah, if a compiler bug is suspected, can anybody please open
> > > >> a bugreport in http://gcc.gnu.org/bugzilla/ with the preprocessed
> > > >> source,
> > > >> compiler version, flags and how it was configured and some hint in
> > > >> which
> > > >> function to look for what exactly?  We don't necessarily need a
> > > >> runtime
> > > >> small reproducer, but if it can be shown in the assembly what has been
> > > >> reordered and why you think it shouldn't, with the above mentioned
> > > >> input
> > > >> that ought to be sufficient.  Thanks.
> > > >
> > > > OK OK, let me reply on list first so I can share the result of a full
> > > > day
> > > > of bug hunting. We're not there yet, but many options have been
> > > > eliminated.
> > > >
> > > > The issue shows up in stress test, when tracing with lttng-modules
> > > > 2.4-rc1,
> > > > on ARM. It's been reproduced with a Linux kernel 3.12 so far, with
> > > > lttng-modules
> > > > compiled against that kernel.
> > > >
> > > > First, I asked Nathan to compile his kernel with gcc 4.7, and
> > > > lttng-modules
> > > > with gcc 4.8.x (and vice-versa). The problem only appears when
> > > > lttng-modules
> > > > are compiled with gcc 4.8.x. The compiler version used to compile the
> > > > rest
> > > > of the kernel does not matter.
> > > >
> > > > Then I looked at gcc 4.8 changelog for ARM, new feature:
> > > > -fno-sched-pressure
> > > > (sched pressure is there by default). Nathan tried compiling
> > > > lttng-modules
> > > > with
> > > > -fno-sched-pressure. The problem still reproduces.
> > > >
> > > > Knowing that adding barrier() outside of preempt_disable()/enable() was
> > > > fixing the issue, we tried identifying which code location was
> > > > responsible
> > > > for working around the issue. Skipping a long investigation, here is
> > > > the
> > > > executive summary:
> > > >
> > > > http://git.lttng.org/?p=lttng-modules.git;a=blob;f=lttng-ring-buffer-client.h;h=50c47b3bf49f6c2dd24e250cf1a9b97808cd8e27;hb=HEAD
> > > >
> > > > Has the following function. We identified that adding a barrier() as
> > > > shown
> > > > below
> > > > works around the issue:
> > > >
> > > > static
> > > > int lttng_event_reserve(struct lib_ring_buffer_ctx *ctx,
> > > >                       uint32_t event_id)
> > > > {
> > > >         struct lttng_channel *lttng_chan =
> > > >         channel_get_private(ctx->chan);
> > > >         int ret, cpu;
> > > >
> > > >         cpu = lib_ring_buffer_get_cpu(&client_config);
> > > >         if (cpu < 0)
> > > >                 return -EPERM;
> > > >         ctx->cpu = cpu;
> > > >
> > > >         switch (lttng_chan->header_type) {
> > > >         case 1: /* compact */
> > > >                 if (event_id > 30)
> > > >                         ctx->rflags |= LTTNG_RFLAG_EXTENDED;
> > > >                 break;
> > > >         case 2: /* large */
> > > >                 if (event_id > 65534)
> > > >                         ctx->rflags |= LTTNG_RFLAG_EXTENDED;
> > > >                 break;
> > > >         default:
> > > >                 WARN_ON_ONCE(1);
> > > >         }
> > > >
> > > >         ret = lib_ring_buffer_reserve(&client_config, ctx);
> > > >         if (ret)
> > > >                 goto put;
> > > >         lttng_write_event_header(&client_config, ctx, event_id);
> > > >         return 0;
> > > > put:
> > > >         lib_ring_buffer_put_cpu(&client_config);
> > > >         ---------------> barrier() added here;
> > > >         <----------------------------
> > > >         return ret;
> > > > }
> > > >
> > > > Where barrier() is the usual asm volatile with "memory" clobber,
> > > > nothing
> > > > else.
> > > >
> > > > Nathan gave me the binary diff for the assembly generated for this
> > > > function
> > > > without
> > > > the barrier and with the barrier:
> > > >
> > > > --- /tmp/lttng_event_reserve-4.8.2.dump 2013-11-21 11:14:14.536495079
> > > > -0600
> > > > +++ /tmp/lttng_event_reserve-with-barrier-4.8.2.dump    2013-11-21
> > > > 14:12:52.997355907 -0600
> > > > @@ -7,11 +7,11 @@
> > > >       f10:      ebfffffe        bl      0 <__gnu_mcount_nc>
> > > >       f14:      e5903000        ldr     r3, [r0]
> > > >       f18:      e1a04000        mov     r4, r0
> > > > -     f1c:      e50b1048        str     r1, [fp, #-72]  ; 0xffffffb8
> > > > +     f1c:      e1a0000d        mov     r0, sp
> > > >       f20:      e5936030        ldr     r6, [r3, #48]   ; 0x30
> > > > -     f24:      e1a0000d        mov     r0, sp
> > > > -     f28:      e3c03d7f        bic     r3, r0, #8128   ; 0x1fc0
> > > > -     f2c:      e3c3303f        bic     r3, r3, #63     ; 0x3f
> > > > +     f24:      e3c03d7f        bic     r3, r0, #8128   ; 0x1fc0
> > > > +     f28:      e3c3303f        bic     r3, r3, #63     ; 0x3f
> > > > +     f2c:      e50b1048        str     r1, [fp, #-72]  ; 0xffffffb8
> > > >       f30:      e5932004        ldr     r2, [r3, #4]
> > > >       f34:      e2822001        add     r2, r2, #1
> > > >       f38:      e5832004        str     r2, [r3, #4]
> > > > @@ -53,7 +53,7 @@
> > > >       fc8:      e3c3303f        bic     r3, r3, #63     ; 0x3f
> > > >       fcc:      e5933000        ldr     r3, [r3]
> > > >       fd0:      e3130002        tst     r3, #2
> > > > -     fd4:      0a000000        beq     fdc <lttng_event_reserve+0xe0>
> > > > +     fd4:      0a0002be        beq     1ad4
> > > > <lttng_event_reserve+0xbd8>
> > > >       fd8:      ebfffffe        bl      0 <preempt_schedule>
> > > >       fdc:      ea0002bc        b       1ad4
> > > >       <lttng_event_reserve+0xbd8>
> > > >       fe0:      e3500000        cmp     r0, #0
> > > >
> > > > We tried disabling the ftrace function tracing to get mcount out of the
> > > > way,
> > > > and the problem still reproduces.
> > > >
> > > > I'm stopping here in terms of details about the disassembly, because I
> > > > need to double check with Nathan that I get the right disassembly for
> > > > the
> > > > right
> > > > cases. I also terribly need to setup a 4.8.2 ARM cross-compiler on my
> > > > machine.
> > > >
> > > > I'm attaching Nathan's ARM configuration.
> > > >
> > > > It does look behave a bit like this bug pointed out by Luis:
> > > >
> > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> > > >
> > > > AFAIU (please correct me if I am wrong), ARM's interrupt handler run
> > > > on top of the thread stack (?). If it's the case, then anything stored
> > > > on the stack below "sp" could be overwritten by an interrupt handler.
> > > > This would fit well the reproduction scenario for this bug: Nathan runs
> > > > LTTng tracing of kmem_cache_free tracepoint with hackbench running.
> > > > A race between a short window of stack use below sp and interrupt
> > > > handlers
> > > > would trigger with this kind of stress-test.
> > > >
> > > > Thoughts ?
> > > >
> > > > Thanks,
> > > >
> > > > Mathieu
> > > >
> > > >
> > > > --
> > > > Mathieu Desnoyers
> > > > EfficiOS Inc.
> > > > http://www.efficios.com
> > >
> > >
> > >
> > > --
> > >
> > > Luis A. Lozano | Software Engineer | llozano@google.com | +1
> > > (408)431-5164
> > >
> >
> > --
> > Mathieu Desnoyers
> > EfficiOS Inc.
> > http://www.efficios.com
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

  parent reply	other threads:[~2013-11-22 13:06 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <52803E5D.3050109@mentor.com>
2013-11-11 15:47 ` might_sleep warnings in overwrite mode Nathan Lynch
2013-11-14 18:16 ` Nathan Lynch
     [not found] ` <52851395.3010306@mentor.com>
2013-11-15  2:34   ` Mathieu Desnoyers
     [not found]   ` <67652521.68027.1384482849638.JavaMail.zimbra@efficios.com>
2013-11-18 19:30     ` Nathan Lynch
2013-11-19 15:29     ` current_thread_info() not respecting program order with gcc 4.8.x Mathieu Desnoyers
2013-11-19 15:57       ` Peter Zijlstra
2013-11-19 16:13         ` Jakub Jelinek
2013-11-19 16:21           ` Peter Zijlstra
2013-11-19 16:05       ` Will Deacon
2013-11-19 17:02         ` Mathieu Desnoyers
2013-11-19 17:33           ` Peter Zijlstra
2013-11-19 21:56             ` Multiple local register variables w/ same register Richard Henderson
2013-11-19 22:08               ` Jakub Jelinek
2013-11-19 22:13               ` Måns Rullgård
2013-11-19 22:13                 ` Måns Rullgård
2013-11-19 22:25               ` Mathieu Desnoyers
2013-11-19 22:34                 ` [lttng-dev] " Mathieu Desnoyers
2013-11-20  0:41       ` current_thread_info() not respecting program order with gcc 4.8.x Linus Torvalds
2013-11-20 15:10         ` Mathieu Desnoyers
2013-11-21 16:02         ` Alexander Holler
2013-11-21 22:12           ` Luis Lozano
2013-11-21 22:12             ` Luis Lozano
2013-11-21 22:32           ` Linus Torvalds
2013-11-21 23:03             ` Bhaskar Janakiraman
2013-11-21 23:18             ` Alexander Holler
2013-11-21 23:45               ` Luis Lozano
2013-11-22  0:39                 ` Jakub Jelinek
2013-11-22  1:57                   ` Mathieu Desnoyers
2013-11-22  2:36                     ` Luis Lozano
2013-11-22  3:38                       ` Mathieu Desnoyers
2013-11-22  8:16                         ` Luis Lozano
2013-11-22  8:18                         ` Luis Lozano
2013-11-22  8:33                           ` Luis Lozano
2013-11-22 13:06                           ` Mathieu Desnoyers [this message]
2013-11-22 20:33                             ` [lttng-dev] " Mathieu Desnoyers
     [not found]                         ` <CANxoKduwK=9__0WFXFcTWjQn3Rbn+HgSWZyL0FN_VuJ2Q_2TPQ@mail.gmail.com>
2013-11-22 13:02                           ` Mathieu Desnoyers
2013-11-22  0:17               ` Linus Torvalds
2013-11-22  0:34                 ` Alexander Holler
2013-11-21  1:52 Luis Lozano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=542673482.73156.1385125589922.JavaMail.zimbra@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=Nathan_Lynch@mentor.com \
    --cc=akpm@linux-foundation.org \
    --cc=bjanakiraman@chromium.org \
    --cc=catalin.marinas@arm.com \
    --cc=holler@ahsoftware.de \
    --cc=jakub@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llozano@chromium.org \
    --cc=lttng-dev@lists.lttng.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rth@twiddle.net \
    --cc=shenhan@chromium.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.