All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jack Wang <jinpu.wang@profitbricks.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0
Date: Mon, 06 Jan 2014 09:45:08 +0100	[thread overview]
Message-ID: <52CA6D14.4040300@profitbricks.com> (raw)
In-Reply-To: <20140104060925.GF10323@ZenIV.linux.org.uk>

On 01/04/2014 07:09 AM, Al Viro wrote:
> On Thu, Jan 02, 2014 at 10:36:30AM +0100, Jack Wang wrote:
> 
>>> Bug happened at line 1486, looks disk->fops is NULL here for some
>>> reason, is it reasonable to add a check like:
>>>
>>> if (disk->fops)
>>> 	if (disk->fops->release)
>>> 		ret = disk->fops->release(disk, mode);
>>>
>>>
>>> Happy New Year and Best regards:)
>>> Jack
>>>
>>
>> Ping, could you share opnions on this, attached with patch I proposaled.
> 
> Sorry, had been sick since mid-December ;-/  The patch is not a good idea -
> in the best case it's papering over a bug (and insufficiently so, at that,
> since there are other places where disk->fops->some_method is checked).
> 
> gendisk->fops should never be assigned NULL; it starts life with NULL
> ->fops, but that should be assigned a non-NULL value (and never modified
> afterwards) before anyone can see it.  Moreover, even if some driver has
> fscked up and forgot to initialize the damn thing, get_gendisk() would've
> refused to return such a thing to any callers (including __blkdev_get()).
> Note that __blkdev_get() would oops on such a thing if get_gendisk()
> somehow returned it.
> 
> Looks like something is shitting over bdev->bd_disk or bdev->bd_disk->fops.
> The offsets in the disassembled code are all wrong (including that from
> beginning of function to oopsing instruction), but the code match is good,
> so I agree that we are hitting bdev->bd_disk->fops == NULL here.  The
> question is how it has happened - that's where the real bug is...
> 
> How reproducible it is?  And which kernel, while we are at it?  This area
> didn't get a lot of changes lately, but still...
> 
Thanks Al for reply, and look into this.
We're using 3.4.71, and this happened in production, we can not
reproduce it yet. What I could see is: before this happened, we saw scsi
devices offlined, and multipath failed path, raid1 failed member device.

Possible the bug lies in drivers md-raid1, dm-multipath or sd? How could
I narrow it down? Could you teach me?

Thanks, wish you happy and healthy!

Jack

      reply	other threads:[~2014-01-06  8:45 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-30 15:55 [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0 Jack Wang
2014-01-02  9:36 ` Jack Wang
2014-01-04  6:09   ` Al Viro
2014-01-06  8:45     ` Jack Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52CA6D14.4040300@profitbricks.com \
    --to=jinpu.wang@profitbricks.com \
    --cc=axboe@kernel.dk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.