From: Kent Overstreet
Subject: Re: Latest kernel NULL pointer deref when running mke2fs
Date: Mon, 10 Feb 2014 15:08:18 -0800
Message-ID: <20140210230818.GD2362@kmo>
References: <20140204160910.GA7373@redhat.com> <52F11D9E.9020601@fb.com>
Cc: "Richard W.M. Jones", Linux FS Devel, axboe@fb.com, neilb@suse.de
To: Chris Mason
In-Reply-To: <52F11D9E.9020601@fb.com>

On Tue, Feb 04, 2014 at 12:04:30PM -0500, Chris Mason wrote:
>
> [ + Kent, Jens, Neil ]
>
> On 02/04/2014 11:09 AM, Richard W.M. Jones wrote:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1061339
> >
> > It seems to happen when mke2fs issues an ioctl, looks like it might
> > be related to TRIM/discard.
> >
> > This is under virtualization. The disk is backed by virtio-scsi.
> >
> > mke2fs -t ext2 -F -b 4096 /dev/VG/LV1
> > mke2fs 1.42.9 (28-Dec-2013)
> > [ 44.142483] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > [ 44.142483] IP: [] bio_trim+0x1a/0x40
> > [ 44.142483] PGD 1d193067 PUD 1d1c1067 PMD 0
> > [ 44.142483] Oops: 0000 [#1] SMP
> > [ 44.142483] Modules linked in: raid1 kvm_amd snd_pcsp snd_pcm kvm snd_timer snd soundcore serio_raw ata_generic pata_acpi virtio_balloon virtio_pci virtio_mmio virtio_net virtio_scsi virtio_blk virtio_console virtio_rng virtio_ring virtio ideapad_laptop sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc32 crc_itu_t libcrc32c megaraid megaraid_sas megaraid_mbox megaraid_mm
> > [ 44.142483] CPU: 0 PID: 229 Comm: mke2fs Tainted: G W 3.14.0-0.rc1.git0.1.fc21.x86_64 #1
> > [ 44.142483] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 44.142483] task: ffff88001c100000 ti: ffff88001c0e4000 task.ti: ffff88001c0e4000
> > [ 44.142483] RIP: 0010:[] [] bio_trim+0x1a/0x40
> > [ 44.142483] RSP: 0018:ffff88001c0e5b88 EFLAGS: 00000246
> > [ 44.142483] RAX: ffff88001d13f020 RBX: 0000000000000000 RCX: 000000000000b690
> > [ 44.142483] RDX: 0000000000008000 RSI: 0000000000000000 RDI: 0000000000000000
> > [ 44.142483] RBP: ffff88001c0e5b98 R08: 00000000000174a0 R09: ffff88001f0174a0
> > [ 44.142483] R10: 0000000000000000 R11: ffffea0000744fc0 R12: 0000000001000000
> > [ 44.142483] R13: 0000000000000000 R14: ffff88001c0bfe80 R15: ffff88001d16df00
> > [ 44.142483] FS: 00007fe89c7817c0(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000
> > [ 44.142483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 44.142483] CR2: 0000000000000028 CR3: 000000001c0e7000 CR4: 00000000000006f0
> > [ 44.142483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 44.142483] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> > [ 44.142483] Stack:
> > [ 44.142483] 0000000000000001 0000000000000000 ffff88001c0e5c80 ffffffffa01923f3
> > [ 44.142483] ffff88001c0e5c50 ffffc90000125040 0000000000008000 ffff88001d16df60
> > [ 44.142483] 0000000000003000 ffff88001c0e5c18 ffffffff00008000 0000000000000001
> > [ 44.142483] Call Trace:
> > [ 44.142483] [] make_request+0x4c3/0xcd0 [raid1]
>
> Based on the oops, we're passing a NULL bio to bio_trim from the MD raid1 make_request.
>
> Not really sure how we get this far, but my guess is it happens here:
>
>         mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
>         bio_trim(mbio, r1_bio->sector - bio->bi_iter.bi_sector, max_sectors);
>
> Guessing mbio is NULL because bio_clone is trying to count the iovecs.
> bio_for_each_segment expects the bvs to be setup, and since this is a
> discard bio, they are not.

Sorry for the delay, just got back. Your analysis looks correct to me -
mailing out a patch shortly