netdev.vger.kernel.org archive mirror
From: Andrew <nitr0@seti.kr.ua>
To: Alexander Duyck <alexander.duyck@gmail.com>, netdev@vger.kernel.org
Subject: Re: Kernel 4.1.12 crash
Date: Wed, 25 Nov 2015 11:35:44 +0200
Message-ID: <565580F0.9010307@seti.kr.ua>
In-Reply-To: <5654EBE8.9030705@seti.kr.ua>

Hm, the older image with 3.10.57 looks stable in the same test case, so at
least one of the bugs should be fairly easy to bisect. I'll try downgrading
the kernel while keeping the same userland, and then bisect to find the
buggy commit.
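
For reference, a minimal sketch of how such a bisect could be automated with
`git bisect run`. The helper name `bisect_verdict`, the script name
`bisect-step.sh`, and the endpoints v3.10 / v4.1 (the mainline releases that
3.10.57 and 4.1.12 are based on) are assumptions for illustration, not
something already used in this thread. Only the exit-code contract of
`git bisect run` is taken as given: 0 marks a commit good, 125 skips it, and
any other code from 1 to 127 marks it bad.

```shell
#!/bin/sh
# Hypothetical helper for a `git bisect run` step script.
# Arguments:
#   $1 - exit status of the kernel build (0 = built fine)
#   $2 - 1 if the PPPoE test case crashed the client, 0 otherwise
# Prints the exit code the step script should return to git bisect.
bisect_verdict() {
    build_rc=$1
    crashed=$2
    if [ "$build_rc" -ne 0 ]; then
        # Commit cannot be built or booted: ask git bisect to skip it.
        echo 125
    elif [ "$crashed" -ne 0 ]; then
        # Crash reproduced: this commit is bad.
        echo 1
    else
        # Test case survived: this commit is good.
        echo 0
    fi
}

# Assumed session in a kernel tree (not executed here):
#   git bisect start
#   git bisect bad  v4.1
#   git bisect good v3.10
#   git bisect run ./bisect-step.sh   # script ends with: exit "$(bisect_verdict ...)"
```

Since the crashes are random, a single clean run does not prove a commit is
good; repeating the test case several times per step before reporting "good"
would make the result more trustworthy.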

25.11.2015 00:59, Andrew wrote:
> Hi.
>
> I tried to reproduce the errors in a virtual environment (a few VMs on my
> notebook).
>
> I tried to create 1000 client PPPoE sessions from this box with a script:
> for i in `seq 1 1000`; do pppd plugin rp-pppoe.so user test password \
>   test nodefaultroute maxfail 0 persist nodefaultroute holdoff 1 noauth \
>   eth0; done
>
> On the VM used as the client I got strange random crashes (they occur
> only when the server is online, so they are network-related):
>
> http://postimg.org/image/ohr2mu3rj/ - crash is here:
> (gdb) list *process_one_work+0x32
> 0xc10607b2 is in process_one_work 
> (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/workqueue.c:1952).
> 1947    __releases(&pool->lock)
> 1948    __acquires(&pool->lock)
> 1949    {
> 1950        struct pool_workqueue *pwq = get_work_pwq(work);
> 1951        struct worker_pool *pool = worker->pool;
> 1952        bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
> 1953        int work_color;
> 1954        struct worker *collision;
> 1955    #ifdef CONFIG_LOCKDEP
> 1956        /*
>
>
> http://postimg.org/image/x9mychssx/ - crash is here (noticed twice):
> 0xc10658bf is in kthread_data 
> (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:136).
> 131     * The caller is responsible for ensuring the validity of @task when
> 132     * calling this function.
> 133     */
> 134    void *kthread_data(struct task_struct *task)
> 135    {
> 136        return to_kthread(task)->data;
> 137    }
>
> which is preceded in the trace by a strange location:
> (gdb) list *kthread_create_on_node+0x120
> 0xc1065340 is in kthread 
> (/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:176).
> 171    {
> 172        __kthread_parkme(to_kthread(current));
> 173    }
> 174
> 175    static int kthread(void *_create)
> 176    {
> 177        /* Copy data: it's on kthread's stack */
> 178        struct kthread_create_info *create = _create;
> 179        int (*threadfn)(void *data) = create->threadfn;
> 180        void *data = create->data;
>
> And earlier:
> (gdb) list *ret_from_kernel_thread+0x21
> 0xc13bb181 is at 
> /var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/arch/x86/kernel/entry_32.S:312.
> 307        popl_cfi %eax
> 308        pushl_cfi $0x0202        # Reset kernel eflags
> 309        popfl_cfi
> 310        movl PT_EBP(%esp),%eax
> 311        call *PT_EBX(%esp)
> 312        movl $0,PT_EAX(%esp)
> 313        jmp syscall_exit
> 314        CFI_ENDPROC
> 315    ENDPROC(ret_from_kernel_thread)
> 316
>
> Stack corruption?..
>
> I'll try to set up a test environment on real hardware, and I'll also
> test with older kernels.
>
> 22.11.2015 07:17, Alexander Duyck wrote:
>> On 11/21/2015 12:16 AM, Andrew wrote:
>>> Memory corruption, if it is happening, IMHO shouldn't be
>>> hardware-related: almost all of these boxes, except the H61M-based box
>>> from the 1st log, have worked for a long time with uptimes of more than
>>> a year, and only the software was changed on them; the H61M-based box
>>> ran memtest86 for tens of hours w/o any error. If it were caused by
>>> hardware, they should have crashed even earlier.
>>
>> I wasn't saying it was hardware related.  My thought is that it could 
>> be some sort of use after free or double free type issue. Basically 
>> what you end up with is the memory getting corrupted by software that 
>> is accessing regions it shouldn't be.
>>
>>> Rarely, on different servers, I saw 'zram decompression error'
>>> messages (in this case I got such a message on the H61M-based box).
>>>
>>> Also, other people who use accel-ppp as BRAS software get various
>>> kernel panics/bugs/oopses on recent kernels.
>>>
>>> I'll try to apply these patches, and I'll try to switch back to 
>>> kernels that were stable on some boxes.
>>
>> If you could bisect this it would be useful.  Basically we just need 
>> to determine where in the git history these issues started popping up 
>> so that we can then narrow down on the root cause.
>>
>> - Alex
>

Thread overview: 14+ messages
2015-11-20 13:58 Kernel 4.1.12 crash Andrew
2015-11-20 23:13 ` Alexander Duyck
2015-11-21  8:16   ` Andrew
2015-11-22  5:17     ` Alexander Duyck
2015-11-22 10:45       ` Andrew
2015-11-24 22:59       ` Andrew
2015-11-25  9:35         ` Andrew [this message]
2015-11-25 14:10         ` Guillaume Nault
     [not found]           ` <5655CCAE.6000300@seti.kr.ua>
2015-11-26 16:44             ` Guillaume Nault
     [not found]               ` <565B7699.8030105@seti.kr.ua>
2015-11-30 15:03                 ` Guillaume Nault
2015-11-30 20:42                   ` Guillaume Nault
2015-12-02 17:23                     ` Guillaume Nault
2015-12-03 15:35                       ` Guillaume Nault
2015-12-03 21:09                         ` Andrew
