* [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted)
@ 2009-03-11 12:19 Melchior FRANZ
2009-03-11 20:42 ` Milan Broz
0 siblings, 1 reply; 4+ messages in thread
From: Melchior FRANZ @ 2009-03-11 12:19 UTC (permalink / raw)
To: linux-crypto
Hi,
a few days ago I got an oops in the crypt module. This is with an
nvidia-tainted kernel, so I didn't file a bugzilla report, but
Geert and Milan suggested that I post it here if the oops isn't
related to the nvidia driver. Of course, I can't be sure, but
I've been using both nvidia-drivers and dm/crypt since years
and never got this problem, and there's no indication in the bt
that nvidia is any way involved.
I couldn't reproduce the problem with or without the nvidia
driver (173.14.17 for an FX5500), nor can I say which of my
input exactly caused it (done by a script), but it was when
"destroying" a loop-mounted ~4GB file, ext3 formatted and aes
encrypted, so one of these:
umount ...
cryptsetup remove ...
losetup -d ...
This has worked for years, so it's probably not some recent
user error. The next time I mounted/unmounted the file it
worked flawlessly, as it used to. System:
Intel P4, 2.4G
Linux 2.6.28.5 (vanilla)
gcc 4.3.1 (openSuSE 11.0)
libc 2.8
All modules compiled in, except for the nvidia module and
scsi_wait_scan (for historic reasons).
One thing might be of interest: I had the (desktop-)system
hibernated a few times in a row (2 or 3 times?). Or maybe
it was just bitflipping due to cosmic rays? :-)
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c0158a75>] mempool_free+0xd/0x9a
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/virtual/block/dm-1/range
Modules linked in: nvidia(P) scsi_wait_scan
Pid: 19181, comm: loop1 Tainted: P (2.6.28.5 #4) MS-6788
EIP: 0060:[<c0158a75>] EFLAGS: 00010286 CPU: 0
EIP is at mempool_free+0xd/0x9a
EAX: d36af0d4 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: d36af0d4 EDI: 00000000 EBP: 00000000 ESP: f621bf58
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process loop1 (pid: 19181, ti=f621a000 task=d98537b0 task.ti=f621a000)
Stack:
c03b3a94 c03b3bb6 00000000 c0198dfb f71a13f0 c032ca0a 00336000 00000000
00000000 f723dd08 f723dc00 c72c1780 00001000 00000001 00000001 00000000
c032c570 00000000 00000000 00000000 f621bfb4 08005000 00000000 f723dc00
Call Trace:
[<c03b3a94>] crypt_dec_pending+0x28/0x4e
[<c03b3bb6>] crypt_endio+0x0/0xde
[<c0198dfb>] bio_endio+0x19/0x30
[<c032ca0a>] loop_thread+0x2f6/0x42d
[<c032c570>] do_lo_send_aops+0x0/0x1a4
[<c032c714>] loop_thread+0x0/0x42d
[<c0131c38>] kthread+0x37/0x5f
[<c0131c01>] kthread+0x0/0x5f
[<c0103d93>] kernel_thread_helper+0x7/0x14
Code: 08 08 74 05 e8 79 45 37 00 8b 04 24 8b 50 0c 89 f0 8b 0c 24 ff 51 14
31 c0 e9 90 fe ff ff 56 53 83 ec 04 89 c6 89 d3 85 c0 74 54 <8b> 42
04 3b 02 7d 6b 9c 59 fa 89 e0 25 00 e0 ff ff 83 40 14 01×
EIP: [<c0158a75>] mempool_free+0xd/0x9a SS:ESP 0068:f621bf58
---[ end trace 916d6e0bbba638c2 ]---
m.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted) 2009-03-11 12:19 [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted) Melchior FRANZ @ 2009-03-11 20:42 ` Milan Broz 2009-03-13 13:49 ` Alasdair G Kergon 0 siblings, 1 reply; 4+ messages in thread From: Milan Broz @ 2009-03-11 20:42 UTC (permalink / raw) To: Melchior FRANZ; +Cc: linux-crypto, Alasdair G Kergon Melchior FRANZ wrote: > > umount ... > cryptsetup remove ... > losetup -d ... > > BUG: unable to handle kernel NULL pointer dereference at 00000004 > IP: [<c0158a75>] mempool_free+0xd/0x9a ... Yes, I know about that problem and I am almost sure that it is fixed by patch I sent there http://lkml.org/lkml/2009/3/5/63 Something changed in timing so probability of hitting that bug apparently increased in the last kernel (the problem was there always:-) I had asked Alasdair to queue that to dm queue for upstream... Milan -- mbroz@redhat.com ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted) 2009-03-11 20:42 ` Milan Broz @ 2009-03-13 13:49 ` Alasdair G Kergon 2009-03-16 13:38 ` Melchior FRANZ 0 siblings, 1 reply; 4+ messages in thread From: Alasdair G Kergon @ 2009-03-13 13:49 UTC (permalink / raw) To: Melchior FRANZ, linux-crypto; +Cc: Milan Broz Latest version of this patch - please give this some testing! Alasdair From: Milan Broz <mbroz@redhat.com> The following oops has been reported when dm-crypt runs over a loop device. ... [ 70.381058] Process loop0 (pid: 4268, ti=cf3b2000 task=cf1cc1f0 task.ti=cf3b2000) ... [ 70.381058] Call Trace: [ 70.381058] [<d0d76601>] ? crypt_dec_pending+0x5e/0x62 [dm_crypt] [ 70.381058] [<d0d767b8>] ? crypt_endio+0xa2/0xaa [dm_crypt] [ 70.381058] [<d0d76716>] ? crypt_endio+0x0/0xaa [dm_crypt] [ 70.381058] [<c01a2f24>] ? bio_endio+0x2b/0x2e [ 70.381058] [<d0806530>] ? dec_pending+0x224/0x23b [dm_mod] [ 70.381058] [<d08066e4>] ? clone_endio+0x79/0xa4 [dm_mod] [ 70.381058] [<d080666b>] ? clone_endio+0x0/0xa4 [dm_mod] [ 70.381058] [<c01a2f24>] ? bio_endio+0x2b/0x2e [ 70.381058] [<c02bad86>] ? loop_thread+0x380/0x3b7 [ 70.381058] [<c02ba8a1>] ? do_lo_send_aops+0x0/0x165 [ 70.381058] [<c013754f>] ? autoremove_wake_function+0x0/0x33 [ 70.381058] [<c02baa06>] ? loop_thread+0x0/0x3b7 When a table is being replaced, it waits for I/O to complete before destroying the mempool, but the endio function doesn't call mempool_free() until after completing the bio. Fix it by swapping the order of those two operations. The same problem occurs in dm.c with referenced after dec_pending. Again, we swap the order. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> --- drivers/md/dm-crypt.c | 17 ++++++++++------- drivers/md/dm.c | 32 +++++++++++++++++++------------- 2 files changed, 29 insertions(+), 20 deletions(-) Index: linux-2.6.29-rc7/drivers/md/dm-crypt.c =================================================================== --- linux-2.6.29-rc7.orig/drivers/md/dm-crypt.c 2009-03-13 11:06:18.000000000 +0000 +++ linux-2.6.29-rc7/drivers/md/dm-crypt.c 2009-03-13 11:07:46.000000000 +0000 @@ -568,19 +568,22 @@ static void crypt_inc_pending(struct dm_ static void crypt_dec_pending(struct dm_crypt_io *io) { struct crypt_config *cc = io->target->private; + struct bio *base_bio = io->base_bio; + struct dm_crypt_io *base_io = io->base_io; + int error = io->error; if (!atomic_dec_and_test(&io->pending)) return; - if (likely(!io->base_io)) - bio_endio(io->base_bio, io->error); + mempool_free(io, cc->io_pool); + + if (likely(!base_io)) + bio_endio(base_bio, error); else { - if (io->error && !io->base_io->error) - io->base_io->error = io->error; - crypt_dec_pending(io->base_io); + if (error && !base_io->error) + base_io->error = error; + crypt_dec_pending(base_io); } - - mempool_free(io, cc->io_pool); } /* Index: linux-2.6.29-rc7/drivers/md/dm.c =================================================================== --- linux-2.6.29-rc7.orig/drivers/md/dm.c 2009-03-09 22:12:09.000000000 +0000 +++ linux-2.6.29-rc7/drivers/md/dm.c 2009-03-13 12:34:19.000000000 +0000 @@ -525,9 +525,12 @@ static int __noflush_suspending(struct m static void dec_pending(struct dm_io *io, int error) { unsigned long flags; + int io_error; + struct bio *bio; + struct mapped_device *md = io->md; /* Push-back supersedes any I/O errors */ - if (error && !(io->error > 0 && __noflush_suspending(io->md))) + if (error && !(io->error > 0 && __noflush_suspending(md))) io->error = error; if (atomic_dec_and_test(&io->io_count)) { @@ -537,24 +540,27 @@ static void dec_pending(struct dm_io *io * This must be handled before the sleeper on * suspend queue merges the pushback list. */ - spin_lock_irqsave(&io->md->pushback_lock, flags); - if (__noflush_suspending(io->md)) - bio_list_add(&io->md->pushback, io->bio); + spin_lock_irqsave(&md->pushback_lock, flags); + if (__noflush_suspending(md)) + bio_list_add(&md->pushback, io->bio); else /* noflush suspend was interrupted. */ io->error = -EIO; - spin_unlock_irqrestore(&io->md->pushback_lock, flags); + spin_unlock_irqrestore(&md->pushback_lock, flags); } end_io_acct(io); - if (io->error != DM_ENDIO_REQUEUE) { - trace_block_bio_complete(io->md->queue, io->bio); + io_error = io->error; + bio = io->bio; - bio_endio(io->bio, io->error); - } + free_io(md, io); - free_io(io->md, io); + if (io_error != DM_ENDIO_REQUEUE) { + trace_block_bio_complete(md->queue, bio); + + bio_endio(bio, io_error); + } } } @@ -562,6 +568,7 @@ static void clone_endio(struct bio *bio, { int r = 0; struct dm_target_io *tio = bio->bi_private; + struct dm_io *io = tio->io; struct mapped_device *md = tio->io->md; dm_endio_fn endio = tio->ti->type->end_io; @@ -585,15 +592,14 @@ static void clone_endio(struct bio *bio, } } - dec_pending(tio->io, error); ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted) 2009-03-13 13:49 ` Alasdair G Kergon @ 2009-03-16 13:38 ` Melchior FRANZ 0 siblings, 0 replies; 4+ messages in thread From: Melchior FRANZ @ 2009-03-16 13:38 UTC (permalink / raw) To: Alasdair G Kergon; +Cc: linux-crypto, Milan Broz * Alasdair G Kergon -- Friday 13 March 2009: > Latest version of this patch - please give this some testing! Thanks. I've applied it to 2.6.29-rc8-git2, and can confirm that my loop-mounted crypto file still works. Of course, I can't say if it fixes the oops bug, as I only ever got that once in a few years, and it would take another few years to be reasonably sure. Unfortunately, it looks as if 2.6.29-rc8-git2 doesn't run all USB devices that I depend on, so I can't even test the patch for a day. (Yes, I'll file an USB bug report once I know more about that.) m. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-03-16 13:38 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-03-11 12:19 [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted) Melchior FRANZ 2009-03-11 20:42 ` Milan Broz 2009-03-13 13:49 ` Alasdair G Kergon 2009-03-16 13:38 ` Melchior FRANZ
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.