From: Andrew <nitr0@seti.kr.ua>
To: Alexander Duyck <alexander.duyck@gmail.com>, netdev@vger.kernel.org
Subject: Re: Kernel 4.1.12 crash
Date: Wed, 25 Nov 2015 00:59:52 +0200
Message-ID: <5654EBE8.9030705@seti.kr.ua>
In-Reply-To: <56514FF5.7060906@gmail.com>
Hi.
I tried to reproduce the errors in a virtual environment (a few VMs on my
notebook).
I tried to create 1000 client PPPoE sessions from this box with a script:
for i in $(seq 1 1000); do
    pppd plugin rp-pppoe.so user test password test nodefaultroute \
        maxfail 0 persist holdoff 1 noauth eth0
done
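Here is the same loop as a small reusable function. This is a sketch: the session count and interface become arguments. PPPD and LAUNCH_DELAY are hypothetical knobs added only for this example. Setting PPPD=echo prints the pppd command lines instead of starting sessions, and LAUNCH_DELAY spaces out the launches (default 0 seconds). The pppd options are the ones used above.

```shell
#!/bin/sh
# start_pppoe_sessions COUNT IFACE
# Launches COUNT client pppd instances over rp-pppoe on IFACE with the
# options from the loop above ("nodefaultroute" given only once).
# Hypothetical knobs for this sketch only:
#   PPPD=echo         print the commands instead of running pppd
#   LAUNCH_DELAY=N    sleep N seconds between launches (default 0)
start_pppoe_sessions() {
    count=$1
    iface=$2
    i=1
    while [ "$i" -le "$count" ]; do
        ${PPPD:-pppd} plugin rp-pppoe.so user test password test \
            nodefaultroute maxfail 0 persist holdoff 1 noauth "$iface"
        sleep "${LAUNCH_DELAY:-0}"
        i=$((i + 1))
    done
}
```

For this test it would be called as `start_pppoe_sessions 1000 eth0`. A non-zero LAUNCH_DELAY may help tell whether the crashes depend on how fast sessions come up.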
On the VM used as the client I got strange random crashes (they occur
only while the server is online, so they are network-related):
http://postimg.org/image/ohr2mu3rj/ - the crash is here:
(gdb) list *process_one_work+0x32
0xc10607b2 is in process_one_work
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/workqueue.c:1952).
1947 __releases(&pool->lock)
1948 __acquires(&pool->lock)
1949 {
1950 struct pool_workqueue *pwq = get_work_pwq(work);
1951 struct worker_pool *pool = worker->pool;
1952 bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
1953 int work_color;
1954 struct worker *collision;
1955 #ifdef CONFIG_LOCKDEP
1956 /*
http://postimg.org/image/x9mychssx/ - the crash is here (seen twice):
0xc10658bf is in kthread_data
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:136).
131 * The caller is responsible for ensuring the validity of @task when
132 * calling this function.
133 */
134 void *kthread_data(struct task_struct *task)
135 {
136 return to_kthread(task)->data;
137 }
which is reached from a strange place:
(gdb) list *kthread_create_on_node+0x120
0xc1065340 is in kthread
(/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/kernel/kthread.c:176).
171 {
172 __kthread_parkme(to_kthread(current));
173 }
174
175 static int kthread(void *_create)
176 {
177 /* Copy data: it's on kthread's stack */
178 struct kthread_create_info *create = _create;
179 int (*threadfn)(void *data) = create->threadfn;
180 void *data = create->data;
And earlier:
(gdb) list *ret_from_kernel_thread+0x21
0xc13bb181 is at
/var/testpoint/LEAF/source/i486-unknown-linux-uclibc/linux/linux-4.1/arch/x86/kernel/entry_32.S:312.
307 popl_cfi %eax
308 pushl_cfi $0x0202 # Reset kernel eflags
309 popfl_cfi
310 movl PT_EBP(%esp),%eax
311 call *PT_EBX(%esp)
312 movl $0,PT_EAX(%esp)
313 jmp syscall_exit
314 CFI_ENDPROC
315 ENDPROC(ret_from_kernel_thread)
316
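Each address above was resolved by hand at the (gdb) prompt. The same lookups can be run non-interactively for every address in an oops. This is a minimal sketch. It assumes a vmlinux with debug info built from the same tree as the crashing kernel. GDB is a hypothetical override used only for dry runs.

```shell
#!/bin/sh
# resolve_oops_addrs VMLINUX ADDR...
# Runs one "list *ADDR" per address in a single batch gdb session.
# ADDR may be "function+0xoffset" or a raw address such as 0xc10658bf.
# VMLINUX must be the unstripped image (CONFIG_DEBUG_INFO=y) of the exact
# kernel that crashed, or the listed source lines will not match.
# GDB=echo (hypothetical override) prints the gdb arguments instead.
resolve_oops_addrs() {
    vmlinux=$1
    shift
    naddrs=$#
    # The loop's word list is expanded once, before the first iteration,
    # so appending to the positional parameters inside it is safe.
    for addr in "$@"; do
        set -- "$@" -ex "list *$addr"
    done
    # Drop the original addresses, keeping only the generated -ex pairs.
    shift "$naddrs"
    ${GDB:-gdb} -batch "$@" "$vmlinux"
}
```

For the traces in this message the call would be:
    resolve_oops_addrs vmlinux process_one_work+0x32 0xc10658bf \
        kthread_create_on_node+0x120 ret_from_kernel_thread+0x21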
Stack corruption?
I'll try to set up a test environment on real hardware, and I'll also try
to test with older kernels.
On 22.11.2015 07:17, Alexander Duyck wrote:
> On 11/21/2015 12:16 AM, Andrew wrote:
>> Memory corruption, if it happens, IMHO shouldn't be hardware-related:
>> almost all of these boxes, except the H61M-based box from the 1st log,
>> have been running for a long time with uptimes of more than a year,
>> and only the software was changed on them; the H61M-based box ran
>> memtest86 for tens of hours without any error. If the crashes were
>> caused by hardware, they should have started even earlier.
>
> I wasn't saying it was hardware related. My thought is that it could
> be some sort of use after free or double free type issue. Basically
> what you end up with is the memory getting corrupted by software that
> is accessing regions it shouldn't be.
>
>> Occasionally, on different servers, I saw 'zram decompression error'
>> messages (in this case I got such a message on the H61M-based box).
>>
>> Also, other people who use accel-ppp as BRAS software have seen
>> different kernel panics/bugs/oopses on fresh kernels.
>>
>> I'll try to apply these patches, and I'll try to switch back to
>> kernels that were stable on some boxes.
>
> If you could bisect this it would be useful. Basically we just need
> to determine where in the git history these issues started popping up
> so that we can then narrow down on the root cause.
>
> - Alex
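Below is a sketch of the bisect Alex suggests, for a linux-stable checkout (the v4.1.12 tag lives in the stable tree). The "good" revision is a placeholder for whichever older kernel was stable on these boxes. The function names are hypothetical helpers for this example, and the step count is only the usual rough estimate, since each bisect step halves the remaining range.

```shell
#!/bin/sh
# bisect_steps N: rough number of kernels to build and test when
# bisecting N commits, i.e. ceil(log2(N)), because each step halves
# the range that still contains the first bad commit.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# start_kernel_bisect BAD GOOD (hypothetical helper)
# BAD:  the crashing kernel, e.g. v4.1.12
# GOOD: placeholder for an older tag that was stable on these boxes
start_kernel_bisect() {
    bad=$1
    good=$2
    git bisect start "$bad" "$good" || return 1
    commits=$(git rev-list --count "$good..$bad")
    echo "$commits commits in range, about $(bisect_steps "$commits") test kernels"
}
```

After each step, build and boot the revision git has checked out and rerun the PPPoE loop. Mark it with `git bisect bad` if it crashes or `git bisect good` if it survives. `git bisect reset` returns to the original branch when done.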