From: Andrew <nitr0@seti.kr.ua>
To: Alexander Duyck <alexander.duyck@gmail.com>, netdev@vger.kernel.org
Subject: Re: Kernel 4.1.12 crash
Date: Wed, 25 Nov 2015 00:59:52 +0200
Message-ID: <5654EBE8.9030705@seti.kr.ua>
In-Reply-To: <56514FF5.7060906@gmail.com>

Hi.

I tried to reproduce the errors in a virtual environment (several VMs on
my notebook).

I tried to create 1000 client PPPoE sessions from this box with the
following script:
for i in $(seq 1 1000); do
    pppd plugin rp-pppoe.so user test password test nodefaultroute \
        maxfail 0 persist holdoff 1 noauth eth0
done

On the VM used as the client I got strange random crashes (they occur
only when the server is online, so they are network-related):

http://postimg.org/image/ohr2mu3rj/ - crash is here:
(gdb) list *process_one_work+0x32
0xc10607b2 is in process_one_work 
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/workqueue.c:1952).
1947    __releases(&pool->lock)
1948    __acquires(&pool->lock)
1949    {
1950        struct pool_workqueue *pwq = get_work_pwq(work);
1951        struct worker_pool *pool = worker->pool;
1952        bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
1953        int work_color;
1954        struct worker *collision;
1955    #ifdef CONFIG_LOCKDEP
1956        /*


http://postimg.org/image/x9mychssx/ - crash is here (seen twice):
0xc10658bf is in kthread_data 
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:136).
131     * The caller is responsible for ensuring the validity of @task when
132     * calling this function.
133     */
134    void *kthread_data(struct task_struct *task)
135    {
136        return to_kthread(task)->data;
137    }

which is preceded in the trace by a strange entry:
(gdb) list *kthread_create_on_node+0x120
0xc1065340 is in kthread 
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:176).
171    {
172        __kthread_parkme(to_kthread(current));
173    }
174
175    static int kthread(void *_create)
176    {
177        /* Copy data: it's on kthread's stack */
178        struct kthread_create_info *create = _create;
179        int (*threadfn)(void *data) = create->threadfn;
180        void *data = create->data;

And earlier:
(gdb) list *ret_from_kernel_thread+0x21
0xc13bb181 is at 
/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/arch/x86/kernel/entry_32.S:312.
307        popl_cfi %eax
308        pushl_cfi $0x0202        # Reset kernel eflags
309        popfl_cfi
310        movl PT_EBP(%esp),%eax
311        call *PT_EBX(%esp)
312        movl $0,PT_EAX(%esp)
313        jmp syscall_exit
314        CFI_ENDPROC
315    ENDPROC(ret_from_kernel_thread)
316
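
As a sanity check on the three lookups above, each symbol+offset
expression can be turned back into a function start address with plain
arithmetic. This is only a sketch: sym_base is a helper name made up for
this example (it is not a gdb or kernel tool), and the addresses are the
ones gdb printed above.

```shell
# sym_base ADDR OFFSET: print the start address of the function that
# contains ADDR, given the +OFFSET part of a "symbol+offset" expression.
# (sym_base is a hypothetical helper introduced for illustration.)
sym_base() {
    printf '0x%x\n' "$(( $1 - $2 ))"
}

sym_base 0xc10607b2 0x32    # process_one_work+0x32          -> 0xc1060780
sym_base 0xc1065340 0x120   # kthread_create_on_node+0x120   -> 0xc1065220
sym_base 0xc13bb181 0x21    # ret_from_kernel_thread+0x21    -> 0xc13bb160
```

For the second lookup, gdb places 0xc1065340 at the opening brace of
kthread() (kthread.c:176), so that trace entry may simply be the address
of kthread itself rather than a return address inside
kthread_create_on_node; that reading is a guess, not something the trace
proves.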

Stack corruption?

I'll try to set up a test environment on real hardware, and I'll try
testing with older kernels.

On 22.11.2015 07:17, Alexander Duyck wrote:
> On 11/21/2015 12:16 AM, Andrew wrote:
>> Memory corruption, if it happens, IMHO shouldn't be hardware-related:
>> almost all of these boxes, except the H61M-based box from the 1st log,
>> have worked for a long time with uptimes of more than a year, and only
>> the software was changed on them; the H61M-based box ran memtest86 for
>> tens of hours w/o any error. If it were caused by hardware, they
>> should have crashed even earlier.
>
> I wasn't saying it was hardware related.  My thought is that it could 
> be some sort of use after free or double free type issue. Basically 
> what you end up with is the memory getting corrupted by software that 
> is accessing regions it shouldn't be.
>
>> Rarely, on different servers, I saw 'zram decompression error' messages
>> (in this case I got such a message on the H61M-based box).
>>
>> Also, other people who use accel-ppp as BRAS software have
>> different kernel panics/bugs/oopses on fresh kernels.
>>
>> I'll try to apply these patches, and I'll try to switch back to 
>> kernels that were stable on some boxes.
>
> If you could bisect this it would be useful.  Basically we just need 
> to determine where in the git history these issues started popping up 
> so that we can then narrow down on the root cause.
>
> - Alex
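
The bisect suggested above could be started roughly as follows. This is
a sketch under assumptions, not a tested procedure: the GOOD tag is only
a placeholder (the thread does not say which older kernel was stable),
the tree is assumed to be a linux-stable checkout that contains the
v4.1.12 tag, and GIT, bisect_begin and bisect_mark are names introduced
here so the commands can be dry-run with echo.

```shell
# Assumed values; override from the environment as needed.
GIT=${GIT:-git}          # set GIT=echo to print the commands instead
BAD=${BAD:-v4.1.12}      # the crashing kernel from this thread
GOOD=${GOOD:-v3.18}      # PLACEHOLDER: substitute a kernel known to be stable

# Start a bisect between the bad and the (assumed) good revision.
# "git bisect start <bad> <good>" marks both endpoints in one step.
bisect_begin() {
    $GIT bisect start "$BAD" "$GOOD"
}

# Record the result for the kernel that git just checked out.
# Use "skip" when a revision cannot be built or booted.
bisect_mark() {
    case "$1" in
        good|bad|skip) $GIT bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad|skip" >&2; return 1 ;;
    esac
}
```

After bisect_begin, each round would be: build and boot the checked-out
kernel, run the PPPoE session test, then call bisect_mark bad if it
crashes or bisect_mark good if it survives. "git bisect reset" returns
the tree to its original state when done.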
