From: "aluno3@poczta.onet.pl" <aluno3@poczta.onet.pl>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: device-mapper development <dm-devel@redhat.com>
Subject: Re: Calltrace in dm-snapshot in 2.6.27 kernel
Date: Wed, 19 Nov 2008 09:31:52 +0100
Message-ID: <4923CEF8.5060103@poczta.onet.pl>
In-Reply-To: <Pine.LNX.4.64.0810230938280.3939@hs20-bc2-1.build.redhat.com>
Hi,
I tested kernel 2.6.27.6 with the patches from 2.6.28-rc (wait for chunks in destructor, fix register_snapshot deadlock) and found another problem in the kernel's dm code, although it is very hard to reproduce. I got this call trace:
Pid: 26230, comm: kcopyd Not tainted (2.6.27.6 #36)
EIP: 0060:[<c044d485>] EFLAGS: 00010282 CPU: 1
EIP is at remove_exception+0x5/0x20
EAX: ca3b5908 EBX: ca3b5908 ECX: 00200200 EDX: 00100100
ESI: f7b489f8 EDI: e92ad980 EBP: 00000000 ESP: f29c7ec0
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kcopyd (pid: 26230, ti=f29c6000 task=e8512430 task.ti=f29c6000)
Stack: c044e03f 0000000d 00000000 c85948c0 00000000 c044f2e7 0009bc30 00000000
       0000e705 00000000 e8e41288 e92ad980 00000000 c044e0f0 e8e41288 c7800ec8
       00000000 c0449224 00000000 c7800fb4 00000400 00000000 00000000 f2bdfbb0
Call Trace:
[<c044e03f>] pending_complete+0x9f/0x110
[<c044f2e7>] persistent_commit+0xc7/0x110
[<c044e0f0>] copy_callback+0x30/0x40
[<c0449224>] segment_complete+0x154/0x1d0
[<c0448e55>] run_complete_job+0x45/0x80
[<c04490d0>] segment_complete+0x0/0x1d0
[<c0448e10>] run_complete_job+0x0/0x80
[<c0449014>] process_jobs+0x14/0x70
[<c0449070>] do_work+0x0/0x40
[<c0449086>] do_work+0x16/0x40
[<c013502d>] run_workqueue+0x4d/0xf0
[<c013514d>] worker_thread+0x7d/0xc0
[<c01382e0>] autoremove_wake_function+0x0/0x30
[<c0526583>] __sched_text_start+0x1e3/0x4a0
[<c01382e0>] autoremove_wake_function+0x0/0x30
[<c0121a2b>] complete+0x2b/0x40
[<c01350d0>] worker_thread+0x0/0xc0
[<c0137db4>] kthread+0x44/0x70
[<c0137d70>] kthread+0x0/0x70
[<c0104c57>] kernel_thread_helper+0x7/0x10
=======================
Code: 4b 0c e8 cf ff ff ff 8b 56 08 8d 04 c2 8b 10 89 13 89 18 89 5a 04 89 43 04 5b 5e c3 8d 76 00 8d bc 27 00 00 00 00 8b 48 04 8b 10 <89> 11 89 4a 04 c7 00 00 01 10 00 c7 40 04 00 02 20 00 c3 90 8d
EIP: [<c044d485>] remove_exception+0x5/0x20 SS:ESP 0068:f29c7ec0
---[ end trace 834a1d3742a1be05 ]---
addr2line returned include/linux/list.h:93 for EIP c044d485:
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
	next->prev = prev;
	prev->next = next; //line 93
}
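A note on reading the register dump above: ECX (00200200) and EDX (00100100) match the values that list_del() writes into an entry after unlinking it (LIST_POISON2 and LIST_POISON1 on 32-bit kernels of this era). That is consistent with remove_exception() calling list_del() on an entry that had already been removed once, although the trace alone does not prove it. The following is a minimal userspace sketch of that poisoning behaviour, not the kernel code itself; list_entry_is_poisoned() and example_poison_after_delete() are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Poison values assumed here: those of 32-bit kernels from this period. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Same shape as the kernel's list_del(): unlink, then poison the entry. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/* Hypothetical helper: true if the entry looks like it was already deleted. */
static int list_entry_is_poisoned(const struct list_head *entry)
{
	return entry->next == LIST_POISON1 && entry->prev == LIST_POISON2;
}

/* Hypothetical example: after one list_del() the entry holds both poisons. */
static void example_poison_after_delete(void)
{
	struct list_head head, e;

	list_init(&head);
	list_add(&e, &head);
	list_del(&e);
	assert(list_entry_is_poisoned(&e));
	/* A second list_del(&e) would now write through 0x00200200 or
	 * 0x00100100, which is the kind of fault the trace shows. */
}
```

Calling example_poison_after_delete() from a test program shows the state an entry is left in; the sketch deliberately never performs the second delete, since that would dereference the poison addresses.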
A few weeks ago I got a similar call trace with a plain 2.6.27 kernel plus
the patches from this mail thread:
BUG: unable to handle kernel paging request at 00200200
IP: [<c044bf65>] remove_exception+0x5/0x20
*pdpt = 0000000029acc001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
Modules linked in: iscsi_trgt mptctl mptbase st sg drbd bonding iscsi_tcp libiscsi scsi_transport_iscsi aacraid sata_nv forcedeth button ftdi_sio usbserial
Pid: 31375, comm: kcopyd Not tainted (2.6.27 #21)
EIP: 0060:[<c044bf65>] EFLAGS: 00010282 CPU: 1
EIP is at remove_exception+0x5/0x20
EAX: f276da88 EBX: f276da88 ECX: 00200200 EDX: 00100100
ESI: c79a4a58 EDI: c9268cc0 EBP: 00000000 ESP: ecbcbec0
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kcopyd (pid: 31375, ti=ecbca000 task=e6d9d220 task.ti=ecbca000)
Stack: c044cb1f 0000000e 00000000 c916b480 00000000 c044dde3 00018f47 00000000
       00002870 00000000 c70cba48 c9268cc0 00000000 c044cbd0 c70cba48 c720aec8
       00000000 c0447d04 00000000 c720afb4 00000400 00000000 00000000 efc65580
Call Trace:
[<c044cb1f>] pending_complete+0x9f/0x110
[<c044dde3>] persistent_commit+0xe3/0x110
[<c044cbd0>] copy_callback+0x30/0x40
[<c0447d04>] segment_complete+0x154/0x1d0
[<c0447935>] run_complete_job+0x45/0x80
[<c0447bb0>] segment_complete+0x0/0x1d0
[<c04478f0>] run_complete_job+0x0/0x80
[<c0447af4>] process_jobs+0x14/0x70
[<c0447b50>] do_work+0x0/0x40
[<c0447b66>] do_work+0x16/0x40
[<c013509d>] run_workqueue+0x4d/0xf0
[<c01351bd>] worker_thread+0x7d/0xc0
[<c0138350>] autoremove_wake_function+0x0/0x30
[<c0524f2c>] __sched_text_start+0x1ec/0x4b0
[<c0138350>] autoremove_wake_function+0x0/0x30
[<c0121a9b>] complete+0x2b/0x40
[<c0135140>] worker_thread+0x0/0xc0
[<c0137e24>] kthread+0x44/0x70
[<c0137de0>] kthread+0x0/0x70
[<c0104c57>] kernel_thread_helper+0x7/0x10
=======================
Code: 4b 0c e8 cf ff ff ff 8b 56 08 8d 04 c2 8b 10 89 13 89 18 89 5a 04 89 43 04 5b 5e c3 8d 76 00 8d bc 27 00 00 00 00 8b 48 04 8b 10 <89> 11 89 4a 04 c7 00 00 01 10 00 c7 40 04 00 02 20 00 c3 90 8d
EIP: [<c044bf65>] remove_exception+0x5/0x20 SS:ESP 0068:ecbcbec0
---[ end trace 25afcedfe7eb0a2b ]---
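For readers decoding the trace above: on x86 the number after "Oops:" is the page-fault error code, so 0002 reads as a write from kernel mode to a page that is not present, which agrees with the zero *pde and the faulting address 00200200. A small sketch of that decoding follows; it only covers the three low bits, and the struct and function names are hypothetical, chosen for this illustration.

```c
#include <assert.h>

/* Low bits of the x86 page-fault error code. */
#define PF_PROT  0x1UL /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2UL /* 0: read access,      1: write access         */
#define PF_USER  0x4UL /* 0: kernel mode,      1: user mode            */

/* Hypothetical container for the decoded bits. */
struct pf_decoded {
	int page_present;
	int is_write;
	int from_user;
};

static struct pf_decoded decode_pf_error(unsigned long code)
{
	struct pf_decoded d;

	d.page_present = (code & PF_PROT) != 0;
	d.is_write = (code & PF_WRITE) != 0;
	d.from_user = (code & PF_USER) != 0;
	return d;
}

/* Hypothetical example: decode the "Oops: 0002" value from the trace. */
static void example_oops_0002(void)
{
	struct pf_decoded d = decode_pf_error(0x0002);

	assert(!d.page_present); /* no page mapped at the address */
	assert(d.is_write);      /* the faulting access was a store */
	assert(!d.from_user);    /* raised while running kernel code */
}
```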
Is this a known problem or something new? Thanks.
Mikulas Patocka wrote:
> On Thu, 23 Oct 2008, aluno3@poczta.onet.pl wrote:
>
>
>> I used dm-snapshot-fix-primary-pe-race.patch and the last patch related to
>> pending_exception. After the same test and workload, everything has worked
>> correctly so far. Is this the final patch?
>>
>
> Yes, these two patches are expected to be the final fix. Thanks for the
> testing. If you get some more crashes even with these two, write about
> them.
>
> Mikulas
>
>
>> best and thanks
>>
>>
>> Mikulas Patocka wrote:
>>
>>> Oh, sorry for this "struct struct" in the patch in free_pending_exception,
>>> replace it just with one "struct". I forgot to refresh the patch before
>>> sending it.
>>>
>>> Mikulas
>>>
>>> On Wed, 22 Oct 2008, Mikulas Patocka wrote:
>>>
>>>
>>>
>>>> On Wed, 22 Oct 2008, aluno3@poczta.onet.pl wrote:
>>>>
>>>>
>>>>
>>>>> Hi
>>>>>
>>>>> I used your patch and ran the test with the same workload. After a few hours
>>>>> of testing, everything is OK. Is that possible? The test is still running. If I get
>>>>> anything wrong from the kernel, I will write to you again.
>>>>>
>>>>>
>>>> Hi
>>>>
>>>> That's good that it works. So try this. Keep the first patch (it is this
>>>> one ---
>>>> http://people.redhat.com/mpatocka/patches/kernel/2.6.27/dm-snapshot-fix-primary-pe-race.patch
>>>> --- I think Milan already sent it to you and you have it applied). Undo
>>>> the second patch (that one that hides deallocation with /* */ ). And apply
>>>> this. Run the same test.
>>>>
>>>> Mikulas
>>>>
>>>> ---
>>>> drivers/md/dm-snap.c | 10 +++++++++-
>>>> drivers/md/dm-snap.h | 2 ++
>>>> 2 files changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> Index: linux-2.6.27-clean/drivers/md/dm-snap.c
>>>> ===================================================================
>>>> --- linux-2.6.27-clean.orig/drivers/md/dm-snap.c 2008-10-22 15:41:24.000000000 +0200
>>>> +++ linux-2.6.27-clean/drivers/md/dm-snap.c 2008-10-22 15:51:33.000000000 +0200
>>>> @@ -368,6 +368,7 @@ static struct dm_snap_pending_exception
>>>> struct dm_snap_pending_exception *pe = mempool_alloc(s->pending_pool,
>>>> GFP_NOIO);
>>>>
>>>> + atomic_inc(&s->n_pending_exceptions);
>>>> pe->snap = s;
>>>>
>>>> return pe;
>>>> @@ -375,7 +376,10 @@ static struct dm_snap_pending_exception
>>>>
>>>> static void free_pending_exception(struct dm_snap_pending_exception *pe)
>>>> {
>>>> - mempool_free(pe, pe->snap->pending_pool);
>>>> + struct struct dm_snapshot *s = pe->snap;
>>>> + mempool_free(pe, s->pending_pool);
>>>> + smp_mb__before_atomic_dec();
>>>> + atomic_dec(&s->n_pending_exceptions);
>>>> }
>>>>
>>>> static void insert_completed_exception(struct dm_snapshot *s,
>>>> @@ -601,6 +605,7 @@ static int snapshot_ctr(struct dm_target
>>>> s->valid = 1;
>>>> s->active = 0;
>>>> s->last_percent = 0;
>>>> + atomic_set(&s->n_pending_exceptions, 0);
>>>> init_rwsem(&s->lock);
>>>> spin_lock_init(&s->pe_lock);
>>>> s->ti = ti;
>>>> @@ -727,6 +732,9 @@ static void snapshot_dtr(struct dm_targe
>>>> /* After this returns there can be no new kcopyd jobs. */
>>>> unregister_snapshot(s);
>>>>
>>>> + while (atomic_read(&s->n_pending_exceptions))
>>>> + yield();
>>>> +
>>>> #ifdef CONFIG_DM_DEBUG
>>>> for (i = 0; i < DM_TRACKED_CHUNK_HASH_SIZE; i++)
>>>> BUG_ON(!hlist_empty(&s->tracked_chunk_hash[i]));
>>>> Index: linux-2.6.27-clean/drivers/md/dm-snap.h
>>>> ===================================================================
>>>> --- linux-2.6.27-clean.orig/drivers/md/dm-snap.h 2008-10-22 15:45:08.000000000 +0200
>>>> +++ linux-2.6.27-clean/drivers/md/dm-snap.h 2008-10-22 15:46:49.000000000 +0200
>>>> @@ -163,6 +163,8 @@ struct dm_snapshot {
>>>>
>>>> mempool_t *pending_pool;
>>>>
>>>> + atomic_t n_pending_exceptions;
>>>> +
>>>> struct exception_table pending;
>>>> struct exception_table complete;
>>>>
>>>>
>
>
>
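The quoted patch above counts outstanding pending exceptions and makes the destructor wait until the count drops to zero. A rough userspace analogue of that idea is sketched below, assuming C11 atomics and POSIX sched_yield() in place of the kernel's atomic_t and yield(), and malloc()/free() in place of the mempool. All type and function names here are hypothetical stand-ins, not the dm-snap.c ones.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dm_snapshot. */
struct snapshot {
	atomic_int n_pending; /* allocated but not yet freed exceptions */
};

/* Hypothetical stand-in for struct dm_snap_pending_exception. */
struct pending_exception {
	struct snapshot *snap;
};

static struct pending_exception *alloc_pending(struct snapshot *s)
{
	struct pending_exception *pe = malloc(sizeof(*pe));

	if (!pe)
		return NULL; /* unlike the kernel mempool, malloc() can fail */
	atomic_fetch_add(&s->n_pending, 1);
	pe->snap = s;
	return pe;
}

static void free_pending(struct pending_exception *pe)
{
	struct snapshot *s = pe->snap; /* read the owner before freeing pe */

	free(pe);
	/* Sequentially consistent decrement: the free above is ordered
	 * before the counter update that a waiting destructor observes. */
	atomic_fetch_sub(&s->n_pending, 1);
}

/* Destructor side: do not tear down until every exception is freed. */
static void snapshot_wait_for_pending(struct snapshot *s)
{
	while (atomic_load(&s->n_pending) != 0)
		sched_yield();
}

/* Hypothetical single-threaded example of the counting. */
static void example_pending_count(void)
{
	struct snapshot s;
	struct pending_exception *pe;

	atomic_init(&s.n_pending, 0);
	pe = alloc_pending(&s);
	assert(pe != NULL);
	assert(atomic_load(&s.n_pending) == 1);
	free_pending(pe);
	snapshot_wait_for_pending(&s); /* returns at once, count is 0 */
}
```

The important detail carried over from the patch is the order inside free_pending(): the object is released first and the counter is decremented last, so the waiter cannot proceed while the object still refers to the snapshot.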