[2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted)

* [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted)
From: Melchior FRANZ @ 2009-03-11 12:19 UTC
  To: linux-crypto

Hi,

A few days ago I got an oops in the crypt module. This is with an
nvidia-tainted kernel, so I didn't file a bugzilla report, but
Geert and Milan suggested that I post it here if the oops isn't
related to the nvidia driver. Of course, I can't be sure, but
I've been using both the nvidia drivers and dm/crypt for years
without ever hitting this problem, and nothing in the backtrace
indicates that nvidia is in any way involved.

I couldn't reproduce the problem with or without the nvidia
driver (173.14.17 for an FX5500), nor can I say exactly which of
my commands caused it (they were run by a script), but it happened
while "destroying" a loop-mounted ~4GB file, ext3 formatted and
aes encrypted, so it was one of these:

  umount ...
  cryptsetup remove ...
  losetup -d ...

This has worked for years, so it's probably not some recent
user error. The next time I mounted/unmounted the file it
worked flawlessly, as it used to. System:

  Intel P4, 2.4G
  Linux 2.6.28.5  (vanilla)
  gcc 4.3.1  (openSuSE 11.0)
  libc 2.8

All modules compiled in, except for the nvidia module and
scsi_wait_scan (for historic reasons).

One thing might be of interest: I had the (desktop-)system
hibernated a few times in a row (2 or 3 times?). Or maybe
it was just bitflipping due to cosmic rays?  :-)



BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c0158a75>] mempool_free+0xd/0x9a
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/virtual/block/dm-1/range
Modules linked in: nvidia(P) scsi_wait_scan

Pid: 19181, comm: loop1 Tainted: P           (2.6.28.5 #4) MS-6788
EIP: 0060:[<c0158a75>] EFLAGS: 00010286 CPU: 0
EIP is at mempool_free+0xd/0x9a
EAX: d36af0d4 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: d36af0d4 EDI: 00000000 EBP: 00000000 ESP: f621bf58
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process loop1 (pid: 19181, ti=f621a000 task=d98537b0 task.ti=f621a000)
Stack:
 c03b3a94 c03b3bb6 00000000 c0198dfb f71a13f0 c032ca0a 00336000 00000000
 00000000 f723dd08 f723dc00 c72c1780 00001000 00000001 00000001 00000000
 c032c570 00000000 00000000 00000000 f621bfb4 08005000 00000000 f723dc00
Call Trace:
 [<c03b3a94>] crypt_dec_pending+0x28/0x4e
 [<c03b3bb6>] crypt_endio+0x0/0xde
 [<c0198dfb>] bio_endio+0x19/0x30
 [<c032ca0a>] loop_thread+0x2f6/0x42d
 [<c032c570>] do_lo_send_aops+0x0/0x1a4
 [<c032c714>] loop_thread+0x0/0x42d
 [<c0131c38>] kthread+0x37/0x5f
 [<c0131c01>] kthread+0x0/0x5f
 [<c0103d93>] kernel_thread_helper+0x7/0x14
Code: 08 08 74 05 e8 79 45 37 00 8b 04 24 8b 50 0c 89 f0 8b 0c 24 ff 51 14
      31 c0 e9 90 fe ff ff 56 53 83 ec 04 89 c6 89 d3 85 c0 74 54 <8b> 42
      04 3b 02 7d 6b 9c 59 fa 89 e0 25 00 e0 ff ff 83 40 14 01
EIP: [<c0158a75>] mempool_free+0xd/0x9a SS:ESP 0068:f621bf58
---[ end trace 916d6e0bbba638c2 ]---
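
Editorial note on reading the fault address: the marked bytes `<8b> 42 04` in the Code: line decode to a 32-bit load from 4(%edx), and the register dump shows EDX: 00000000. That would be consistent with mempool_free() being handed a NULL pool pointer and reading a field 4 bytes into it, which is where the address 00000004 comes from. The sketch below only illustrates that arithmetic. `struct example_pool` and both functions are hypothetical and are not the kernel's mempool_t definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; NOT the kernel's mempool_t. */
struct example_pool {
    int32_t min_nr;    /* offset 0 */
    int32_t curr_nr;   /* offset 4 */
};

/* Address the CPU touches when reading a field at `offset` through `base`. */
static uintptr_t field_address(uintptr_t base, size_t offset)
{
    return base + offset;
}

/* Address that a read of curr_nr through a NULL pointer would touch. */
uintptr_t null_curr_nr_address(void)
{
    /* The assumed layout puts the second 32-bit field 4 bytes in. */
    assert(offsetof(struct example_pool, curr_nr) == 4);
    return field_address(0, offsetof(struct example_pool, curr_nr));
}
```

With this assumed layout, `null_curr_nr_address()` evaluates to 4, matching the faulting address reported by the oops.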

m.


* Re: [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted)
From: Milan Broz @ 2009-03-11 20:42 UTC
  To: Melchior FRANZ; +Cc: linux-crypto, Alasdair G Kergon

Melchior FRANZ wrote:
> 
>   umount ...
>   cryptsetup remove ...
>   losetup -d ...
> 
> BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c0158a75>] mempool_free+0xd/0x9a
...

Yes, I know about that problem and I am almost sure that
it is fixed by the patch I sent here: http://lkml.org/lkml/2009/3/5/63

Something changed in the timing, so the probability of hitting that
bug has apparently increased in the latest kernel (the problem has
always been there :-)

I have asked Alasdair to queue it in the dm queue for upstream...

Milan
--
mbroz@redhat.com


* Re: [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted)
From: Alasdair G Kergon @ 2009-03-13 13:49 UTC
  To: Melchior FRANZ, linux-crypto; +Cc: Milan Broz

Latest version of this patch - please give this some testing!

Alasdair


From: Milan Broz <mbroz@redhat.com>

The following oops has been reported when dm-crypt runs over a loop device.

...
[   70.381058] Process loop0 (pid: 4268, ti=cf3b2000 task=cf1cc1f0 task.ti=cf3b2000)
...
[   70.381058] Call Trace:
[   70.381058]  [<d0d76601>] ? crypt_dec_pending+0x5e/0x62 [dm_crypt]
[   70.381058]  [<d0d767b8>] ? crypt_endio+0xa2/0xaa [dm_crypt]
[   70.381058]  [<d0d76716>] ? crypt_endio+0x0/0xaa [dm_crypt]
[   70.381058]  [<c01a2f24>] ? bio_endio+0x2b/0x2e
[   70.381058]  [<d0806530>] ? dec_pending+0x224/0x23b [dm_mod]
[   70.381058]  [<d08066e4>] ? clone_endio+0x79/0xa4 [dm_mod]
[   70.381058]  [<d080666b>] ? clone_endio+0x0/0xa4 [dm_mod]
[   70.381058]  [<c01a2f24>] ? bio_endio+0x2b/0x2e
[   70.381058]  [<c02bad86>] ? loop_thread+0x380/0x3b7
[   70.381058]  [<c02ba8a1>] ? do_lo_send_aops+0x0/0x165
[   70.381058]  [<c013754f>] ? autoremove_wake_function+0x0/0x33
[   70.381058]  [<c02baa06>] ? loop_thread+0x0/0x3b7

When a table is being replaced, device-mapper waits for I/O to
complete before destroying the mempool, but the endio function
doesn't call mempool_free() until after completing the bio, so the
pool may already be gone by the time mempool_free() runs.

Fix it by swapping the order of those two operations.

The same problem occurs in dm.c with md referenced after dec_pending.
Again, we swap the order.
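
Editorial sketch of the reordering described above, in plain user-space C: copy the fields that completion needs into locals, return the object to its pool, and only then signal completion. Every name here (`struct pool`, `struct request`, `request_complete()`, `run_demo()`) is a hypothetical stand-in for illustration. None of it is kernel code, and the counter only models the "nothing outstanding" condition that the waiter relies on before tearing the pool down.

```c
#include <assert.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not kernel APIs. */

struct pool {
    int outstanding;    /* objects handed out and not yet returned */
};

typedef void (*done_fn)(void *ctx, int error);

struct request {
    done_fn done;       /* completion callback of the parent I/O */
    void *ctx;          /* opaque argument for the callback */
    int error;          /* status to report */
};

static struct request *request_alloc(struct pool *p, done_fn done,
                                     void *ctx, int error)
{
    struct request *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    r->done = done;
    r->ctx = ctx;
    r->error = error;
    p->outstanding++;
    return r;
}

static void request_free(struct request *r, struct pool *p)
{
    free(r);
    p->outstanding--;
}

/*
 * Fixed ordering: snapshot what completion needs, give the object
 * back to the pool, then signal completion. Neither r nor p is
 * touched after done() has been called.
 */
static void request_complete(struct request *r, struct pool *p)
{
    done_fn done = r->done;
    void *ctx = r->ctx;
    int error = r->error;

    request_free(r, p);
    done(ctx, error);
}

/* Models the waiter: records the pool state when completion fires. */
struct observer {
    struct pool *pool;
    int outstanding_at_completion;
    int error_seen;
};

static void observe_done(void *ctx, int error)
{
    struct observer *obs = ctx;

    obs->outstanding_at_completion = obs->pool->outstanding;
    obs->error_seen = error;
}

/* Returns the outstanding count seen at completion, or -1 if malloc fails. */
int run_demo(void)
{
    struct pool p = { 0 };
    struct observer obs = { &p, -1, 0 };
    struct request *r = request_alloc(&p, observe_done, &obs, -5);

    if (!r)
        return -1;
    request_complete(r, &p);
    return obs.outstanding_at_completion;
}
```

Calling `run_demo()` from a `main()` shows the point of the ordering: by the time the completion callback runs, the pool already has nothing outstanding, so a waiter that destroys the pool after completion cannot race with a late free.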

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm-crypt.c |   17 ++++++++++-------
 drivers/md/dm.c       |   32 +++++++++++++++++++-------------
 2 files changed, 29 insertions(+), 20 deletions(-)

Index: linux-2.6.29-rc7/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.29-rc7.orig/drivers/md/dm-crypt.c	2009-03-13 11:06:18.000000000 +0000
+++ linux-2.6.29-rc7/drivers/md/dm-crypt.c	2009-03-13 11:07:46.000000000 +0000
@@ -568,19 +568,22 @@ static void crypt_inc_pending(struct dm_
 static void crypt_dec_pending(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->target->private;
+	struct bio *base_bio = io->base_bio;
+	struct dm_crypt_io *base_io = io->base_io;
+	int error = io->error;
 
 	if (!atomic_dec_and_test(&io->pending))
 		return;
 
-	if (likely(!io->base_io))
-		bio_endio(io->base_bio, io->error);
+	mempool_free(io, cc->io_pool);
+
+	if (likely(!base_io))
+		bio_endio(base_bio, error);
 	else {
-		if (io->error && !io->base_io->error)
-			io->base_io->error = io->error;
-		crypt_dec_pending(io->base_io);
+		if (error && !base_io->error)
+			base_io->error = error;
+		crypt_dec_pending(base_io);
 	}
-
-	mempool_free(io, cc->io_pool);
 }
 
 /*
Index: linux-2.6.29-rc7/drivers/md/dm.c
===================================================================
--- linux-2.6.29-rc7.orig/drivers/md/dm.c	2009-03-09 22:12:09.000000000 +0000
+++ linux-2.6.29-rc7/drivers/md/dm.c	2009-03-13 12:34:19.000000000 +0000
@@ -525,9 +525,12 @@ static int __noflush_suspending(struct m
 static void dec_pending(struct dm_io *io, int error)
 {
 	unsigned long flags;
+	int io_error;
+	struct bio *bio;
+	struct mapped_device *md = io->md;
 
 	/* Push-back supersedes any I/O errors */
-	if (error && !(io->error > 0 && __noflush_suspending(io->md)))
+	if (error && !(io->error > 0 && __noflush_suspending(md)))
 		io->error = error;
 
 	if (atomic_dec_and_test(&io->io_count)) {
@@ -537,24 +540,27 @@ static void dec_pending(struct dm_io *io
 			 * This must be handled before the sleeper on
 			 * suspend queue merges the pushback list.
 			 */
-			spin_lock_irqsave(&io->md->pushback_lock, flags);
-			if (__noflush_suspending(io->md))
-				bio_list_add(&io->md->pushback, io->bio);
+			spin_lock_irqsave(&md->pushback_lock, flags);
+			if (__noflush_suspending(md))
+				bio_list_add(&md->pushback, io->bio);
 			else
 				/* noflush suspend was interrupted. */
 				io->error = -EIO;
-			spin_unlock_irqrestore(&io->md->pushback_lock, flags);
+			spin_unlock_irqrestore(&md->pushback_lock, flags);
 		}
 
 		end_io_acct(io);
 
-		if (io->error != DM_ENDIO_REQUEUE) {
-			trace_block_bio_complete(io->md->queue, io->bio);
+		io_error = io->error;
+		bio = io->bio;
 
-			bio_endio(io->bio, io->error);
-		}
+		free_io(md, io);
 
-		free_io(io->md, io);
+		if (io_error != DM_ENDIO_REQUEUE) {
+			trace_block_bio_complete(md->queue, bio);
+
+			bio_endio(bio, io_error);
+		}
 	}
 }
 
@@ -562,6 +568,7 @@ static void clone_endio(struct bio *bio,
 {
 	int r = 0;
 	struct dm_target_io *tio = bio->bi_private;
+	struct dm_io *io = tio->io;
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
@@ -585,15 +592,14 @@ static void clone_endio(struct bio *bio,
 		}
 	}
 
-	dec_pending(tio->io, error);


* Re: [2.6.28.5] dm/crypt oops: "unable to handle kernel NULL pointer dereference" (tainted)
From: Melchior FRANZ @ 2009-03-16 13:38 UTC
  To: Alasdair G Kergon; +Cc: linux-crypto, Milan Broz

* Alasdair G Kergon -- Friday 13 March 2009:
> Latest version of this patch - please give this some testing!

Thanks. I've applied it to 2.6.29-rc8-git2, and can confirm that
my loop-mounted crypto file still works. Of course, I can't say
if it fixes the oops bug, as I only ever got that once in a few
years, and it would take another few years to be reasonably sure.

Unfortunately, it looks as if 2.6.29-rc8-git2 doesn't work with all
the USB devices that I depend on, so I can't even test the patch
for a day. (Yes, I'll file a USB bug report once I know more
about that.)

m.
