From: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
Kyungmin Park <kyungmin.park@samsung.com>,
Felipe Balbi <balbi@ti.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Michal Nazarewicz <mina86@mina86.com>,
"David S. Miller" <davem@davemloft.net>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
James Morris <jmorris@namei.org>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
Patrick McHardy <kaber@trash.net>,
netdev@vger.kernel.org
Subject: Re: [PATCH] net: sk == 0xffffffff fix - not for commit
Date: Fri, 17 Jan 2014 13:47:28 +0100 [thread overview]
Message-ID: <52D92660.5060308@samsung.com> (raw)
In-Reply-To: <52D91FAF.8090109@samsung.com>
W dniu 17.01.2014 13:18, Andrzej Pietrasiewicz pisze:
> W dniu 16.01.2014 17:29, Eric Dumazet pisze:
>> On Thu, 2014-01-16 at 16:21 +0100, Andrzej Pietrasiewicz wrote:
>>> W dniu 10.12.2013 15:25, Eric Dumazet pisze:
>>>> On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:
>>>>> W dniu 09.12.2013 16:31, Eric Dumazet pisze:
>>>>>> On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
>>>>>>> NOT FOR COMMITTING TO MAINLINE.
>>>>>>>
>>>>>>> With g_ether loaded the sk occasionally becomes 0xffffffff.
>>>>>>> It happens usually after transferring few hundreds of kilobytes to few
>>>>>>> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
>>>>>>> kernel panic.
>>>>>>>
>>>>>>> This is a *workaround*. I don't know enough net code to understand the core
>>>>>>> of the problem. However, with this patch applied the problems are gone,
>>>>>>> or at least pushed farther away.
>>>>>>
>>>>>> Is it happening on SMP or UP ?
>>>>>
>>>>> UP build, S5PC110
>>>>
>>>> OK
>>>>
>>>> I believe you need additional debugging to track the exact moment
>>>> 0xffffffff is fed to 'sk'
>>>>
>>>> It looks like a very strange bug, involving a problem in some assembly
>>>> helper, register save/restore, compiler bug or stack corruption or
>>>> something.
>>>>
>>>
>>> I started with adding WARN_ON(sk == 0xffffffff); just before return in
>>> __inet_lookup_established(), and the problem was gone. So this looks
>>> very strange, like a toolchain problem.
>>
>> Or a timing issue. Adding a WARN_ON() adds extra instructions and might
>> really change the assembly output.
>>
>>>
>>> I used gcc-linaro-arm-linux-gnueabihf-4.8-2013.05.
>>>
>>> If I change the toolchain to
>>>
>>> gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415
>>>
>>> the problem seems to have gone away.
>>
>> Its totally possible some barrier was not properly handled by the
>> compiler. You could disassemble the function on both toolchains and
>> try to spot the issue.
>>
>
> So I gave it a try.
>
> Below is a part of assembly code (ARM) which corresponds to the last
> lines of the __inet_lookup_established():
>
> C source:
> =========
> found:
> rcu_read_unlock();
> return sk;
> }
>
> assembly for toolchain 4.7:
> ===========================
> c0333bb8: ebf4bb6e bl c0062978 <__rcu_read_unlock>
> c0333bbc: e51b0030 ldr r0, [fp, #-48] ; 0x30
> c0333bc0: e24bd028 sub sp, fp, #40 ; 0x28
> c0333bc4: e89daff0 ldm sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> c0333bc8: e5132018 ldr r2, [r3, #-24]
>
>
> assembly for toolchain 4.8:
> ===========================
> c033ff5c: ebf4927e bl c006495c <__rcu_read_unlock>
> c033ff60: e24bd028 sub sp, fp, #40 ; 0x28
> c033ff64: e51b0030 ldr r0, [fp, #-48] ; 0x30
> c033ff68: e89daff0 ldm sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> c033ff6c: e5113018 ldr r3, [r1, #-24]
>
> What can be seen is that the usage of registers is slightly different,
> and, what is more important, the _order_ of ldr/sub is different.
> Now, if I swap the instructions at offsets c033ff60 and c033ff64
> in the 4.8-generated vmlinux, the problem seems gone! Well, at least
> the binary behaves the same way as the 4.7-generated one.
>
> Here is a _hypothesis_ of what _might_ be happening:
>
> The function in question puts its return value in the register r0.
> In both cases the return value is fetched from a memory location
> relative #-48 to what the frame pointer points to. However,
> in the 4.7-generated binary the ldr executes in the branch delay slot,
> whereas in the 4.8-generated binary it is the sub which executes
> in the branch delay slot. That way, in the 4.7-generated binary the return
> value is fetched before __rcu_read_unlock begins, but in the
> 4.8-generated binary it is fetched some time later. Which might be
> enough for someone else to schedule in and break the data to be
> copied to r0 and returned from the function.
>
> As I said, this is just a hypothesis.
>
Please disregard what I have written.
There is no delay slot on ARM :O
A nice hypothesis, though ;)
AP
next prev parent reply other threads:[~2014-01-17 12:47 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-09 11:47 [PATCH] net: sk == 0xffffffff fix - not for commit Andrzej Pietrasiewicz
2013-12-09 15:31 ` Eric Dumazet
[not found] ` <1386603119.30495.331.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2013-12-10 6:55 ` Andrzej Pietrasiewicz
[not found] ` <52A6BAC6.4090407-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2013-12-10 14:25 ` Eric Dumazet
2014-01-16 15:21 ` Andrzej Pietrasiewicz
2014-01-16 16:29 ` Eric Dumazet
2014-01-17 12:18 ` Andrzej Pietrasiewicz
2014-01-17 12:47 ` Andrzej Pietrasiewicz [this message]
[not found] ` <1386589672-5830-1-git-send-email-andrzej.p-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2014-01-17 0:01 ` Andrew Ruder
2014-01-24 13:38 ` Andrew Ruder
2014-01-24 14:15 ` Andrew Ruder
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52D92660.5060308@samsung.com \
--to=andrzej.p@samsung.com \
--cc=balbi@ti.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=jmorris@namei.org \
--cc=kaber@trash.net \
--cc=kuznet@ms2.inr.ac.ru \
--cc=kyungmin.park@samsung.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
--cc=m.szyprowski@samsung.com \
--cc=mina86@mina86.com \
--cc=netdev@vger.kernel.org \
--cc=yoshfuji@linux-ipv6.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).