* Re: J1939 Memory corruption
[not found] <CA+8_ywZWX9ECCvsRM5NOXaD9TvL1Sg=WuSGELfmAhzYRn6Z_GA@mail.gmail.com>
@ 2016-07-20 7:36 ` Kurt Van Dijck
0 siblings, 0 replies; only message in thread
From: Kurt Van Dijck @ 2016-07-20 7:36 UTC (permalink / raw)
To: Maxime Jayat; +Cc: linux-can
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 7091 bytes --]
Hey Maxime,
I added linux-can in CC:
I appreciate your comments.
I will try to provide you with proper answers.
I'll first fresh up my memory, and look into the code.
I hope to come back on this before the weekend.
Kind regards,
Kurt
--- Original message ---
> Date: Tue, 19 Jul 2016 15:15:24 +0200
> From: Maxime Jayat <maxime.jayat@mobile-devices.fr>
> To: kurt@vandijck-laurijssen.be
> Subject: J1939 Memory corruption
>
> Hello Kurt,
> I am a user of your J1939 implementation for linux. Previously I was
> using it with a linux 2.6.32 with success. Recently we moved to a 4.1
> kernel, so I am currently testing the new J1939 implementation (the
> 'direct' one). I merged the linux j1939d-v4.1 branch into my tree and
> it works correctly. Not having to patch and use iproute2 is greatly
> appreciated.
> However, I am able to reliably reproduce a kernel panic, or at least a
> kref warning which leads to a random kernel crash some time later.
> I have attached a C program reproducing the problem, but here is a
> quick description:
> - I bind a J1939 socket to source address 0x80 with name
> 0x7777777777777777
> - I send an address claim for this source address
> - Another "external" ECU sends an address claim for 0x80 with name
> 0x1111111111111111 (which has higher priority, so "wins" the source
> address).
> - I re-bind my J1939 socket to source address 0x81.
> This leads to:
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 861 at include/linux/kref.h:47
> j1939_name_local_get+0x128/0x13c()
> Modules linked in:
> CPU: 0 PID: 861 Comm: j1939 Not tainted 4.1.24-linux4sam_5.3 #64
> Hardware name: Atmel SAMA5
> [<c001721c>] (unwind_backtrace) from [<c00145e8>]
> (show_stack+0x20/0x24)
> [<c00145e8>] (show_stack) from [<c0697db0>] (dump_stack+0x24/0x28)
> [<c0697db0>] (dump_stack) from [<c0027a14>]
> (warn_slowpath_common+0x94/0xc0)
> [<c0027a14>] (warn_slowpath_common) from [<c0027afc>]
> (warn_slowpath_null+0x2c/0x34)
> [<c0027afc>] (warn_slowpath_null) from [<c05a00ec>]
> (j1939_name_local_get+0x128/0x13c)
> [<c05a00ec>] (j1939_name_local_get) from [<c05a42c8>]
> (j1939sk_bind+0x208/0x360)
> [<c05a42c8>] (j1939sk_bind) from [<c049aef4>] (SyS_bind+0x6c/0x9c)
> [<c049aef4>] (SyS_bind) from [<c0010860>] (ret_fast_syscall+0x0/0x60)
> ---[ end trace 93231ae522a8b287 ]---
> FYI, candump during this:
> (029.642505)Â can0Â 18EEFF80Â Â [8]Â 77 77 77 77 77 77 77 77
> (001.001629)Â can0Â 18EEFF80Â Â [8]Â 11 11 11 11 11 11 11 11
> (001.234503)Â can0Â 18EEFF81Â Â [8]Â 77 77 77 77 77 77 77 77
> I tried to understand to problem and arrived to the following (and
> incomplete) conclusions:
> It seems to be a mismatch between the ecu list in priv->ecus and the
> refcounting of the ECUs.
> When the external address claim is received, __j1939_ecu_unregister is
> called by j1939_process_address_claim for my ECU and it is removed from
> the priv->ecus list but there are still some references to it, so it is
> not freed.
> Then, when re-binding the socket, j1939_name_local_put is called which
> calls __j1939_ecu_get_register which does not find the name in the
> priv->ecus list, so it reallocates a new ECU and insert it in the
> priv->ecus list. However j1939_name_local_put immediately drops the ECU
> reference, so the ECU is freed, but it is still in the priv->ecus list!
> So the list contains a reference to a freed structure. The previous
> warning appears when kref tries to take a reference while the count is
> already at zero. At this point the kernel panics, either when closing
> the socket, or randomly a few seconds later because the memory is
> corrupted.
> I have trouble following where all the krefs to the ecus are taken and
> if it all makes sense everywhere. __j1939_ecu_get_register can either
> allocate a new ECU and hold a reference (kref_init initializes at 1),
> or return an existing ECU without touching the reference count. How is
> it possible to know if the caller must call put_j1939_ecu when done
> with the ecu?
> By the way, __j1939_ecu_get_register has a unused "create_if_necessary"
> parameter.
> Although I probably could fix these issues with some time, I would
> appreciate any help you can give me.
> Thanks in advance,
> --
> Maxime Jayat
> #include <errno.h>
> #include <inttypes.h>
> #include <net/if.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <sys/socket.h>
> #include <sys/types.h>
> #include <unistd.h>
> /* linux can headers copied into currect directory */
> #include "can.h"
> #include "raw.h"
> #include "j1939.h"
>
>
> static int open_j1939_socket(const char *iface, uint64_t name, uint8_t sa)
> {
> int ret, sock;
> int value;
> struct sockaddr_can saddr = {
> .can_family = AF_CAN,
> .can_addr.j1939 = {
> .name = name,
> .addr = sa,
> .pgn = 0x0ee00,
> },
> };
>
> sock = ret = socket(PF_CAN, SOCK_DGRAM | SOCK_CLOEXEC, CAN_J1939);
> if (ret < 0) {
> goto fail;
> }
>
> ret = setsockopt(sock, SOL_SOCKET, SO_BINDTODEVICE,
> iface, strnlen(iface, 40));
> if (ret < 0) {
> goto fail;
> }
>
> ret = bind(sock, (void *)&saddr, sizeof(saddr));
> if (ret < 0) {
> goto fail;
> }
> return sock;
>
> fail:
> close(sock);
> return -1;
> }
>
> static int open_raw_socket(const char *iface)
> {
> int s;
> struct sockaddr_can addr;
> struct ifreq ifr;
> s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
>
> strcpy(ifr.ifr_name, iface);
> if (ioctl(s, SIOCGIFINDEX, &ifr) < 0) {
> perror("SIOCGIFINDEX");
> return 1;
> }
> addr.can_family = AF_CAN;
> addr.can_ifindex = ifr.ifr_ifindex;
>
> if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
> perror("bind");
> return 1;
> }
>
> return s;
> }
>
>
> int main(void)
> {
> int s1939, sraw;
> uint64_t name = 0x7777777777777777ll;
> uint8_t buf[8];
>
> s1939 = open_j1939_socket("can0", name, /* src address */ 0x80);
> sraw = open_raw_socket("can0");
>
> memcpy(buf, &name, 8);
> /* Send address claim for src address 0x80 */
> send(s1939, buf, 8, 0);
> sleep(1);
> {
> /* Simulate an ecu claiming src address 0x80 with smaller NAME */
> struct can_frame frame = {
> .can_id = 0x18eeff80 | CAN_EFF_FLAG,
> .can_dlc = 8,
> .data = { 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11 }
> };
> write(sraw, &frame, sizeof(frame));
> }
> sleep(1);
> /* Rebind to src address 0x81 */
> struct sockaddr_can saddr = {
> .can_family = AF_CAN,
> .can_addr.j1939 = {
> .name = name,
> .addr = 0x81,
> .pgn = 0x0ee00,
> },
> };
> bind(s1939, (void *)&saddr, sizeof(saddr));
> /* Memory corrupted */
>
> close(s1939);
> close(sraw);
> }
^ permalink raw reply [flat|nested] only message in thread