linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: yebin <yebin@huaweicloud.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ye Bin <yebin10@huawei.com>
Subject: Re: [PATCH] block: bio-integrity: fix potential null-ptr-deref in bio_integrity_free
Date: Fri, 7 Jun 2024 09:32:29 +0800	[thread overview]
Message-ID: <6662632D.7020000@huaweicloud.com> (raw)
In-Reply-To: <ZmJQwvBXfm3zw+Xs@fedora>



On 2024/6/7 8:13, Ming Lei wrote:
> On Thu, Jun 06, 2024 at 02:26:55PM +0800, Ye Bin wrote:
>> From: Ye Bin <yebin10@huawei.com>
>>
>> There's a issue as follows when do format NVME with IO:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> PGD 101727f067 P4D 1011fae067 PUD fbed78067 PMD 0
>> Oops: 0000 [#1] SMP NOPTI
>> RIP: 0010:kfree+0x4f/0x160
>> RSP: 0018:ff705a800912b910 EFLAGS: 00010247
>> RAX: 0000000000000000 RBX: 0d06d30000000000 RCX: ff4fb320260ad990
>> RDX: ff4fb30ee7acba40 RSI: 0000000000000000 RDI: 00b04cff80000000
>> RBP: ff4fb30ee7acba40 R08: 0000000000000200 R09: ff705a800912bb60
>> R10: 0000000000000000 R11: ff4fb3103b67c750 R12: ffffffff9a62d566
>> R13: ff4fb30aa0530000 R14: 0000000000000000 R15: 000000000000000a
>> FS:  00007f4399b6b700(0000) GS:ff4fb31040140000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000008 CR3: 0000001014cd4002 CR4: 0000000000761ee0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
>> PKRU: 55555554
>> Call Trace:
>>   bio_integrity_free+0xa6/0xb0
>>   __bio_integrity_endio+0x8c/0xa0
>>   bio_endio+0x2b/0x130
>>   blk_update_request+0x78/0x2b0
>>   blk_mq_end_request+0x1a/0x140
>>   blk_mq_try_issue_directly+0x5d/0xc0
>>   blk_mq_make_request+0x46b/0x540
>>   generic_make_request+0x121/0x300
>>   submit_bio+0x6c/0x140
>>   __blkdev_direct_IO_simple+0x1ca/0x3a0
>>   blkdev_direct_IO+0x3d9/0x460
>>   generic_file_read_iter+0xb4/0xc60
>>   new_sync_read+0x121/0x170
>>   vfs_read+0x89/0x130
>>   ksys_read+0x52/0xc0
>>   do_syscall_64+0x5d/0x1d0
>>   entry_SYSCALL_64_after_hwframe+0x65/0xca
>>
>> Assuming a 512 byte directIO is issued, the initial logical block size of
>> the state block device is 512 bytes, and then modified to 4096 bytes.
>> Above issue may happen as follows:
>>           Direct read                    format NVME
>> __blkdev_direct_IO_simple(iocb, iter, nr_pages);
>>    if ((pos | iov_iter_alignment(iter)) & (bdev_logical_block_size(bdev) - 1))
>> 	-->The logical block size is 512, and the IO issued is 512 bytes,
>> 	   which can be checked
>>      return -EINVAL;
>>    submit_bio(&bio);
>>                                        nvme_dev_ioctl
>>                                          case NVME_IOCTL_RESCAN:
>>                                            nvme_queue_scan(ctrl);
>>                                               ...
>>                                              nvme_update_disk_info(disk, ns, id);
>>                                                blk_queue_logical_block_size(disk->queue, bs);
>>                                                  --> 512->4096
>>       blk_queue_enter(q, flags)
>>       blk_mq_make_request(q, bio)
>>         bio_integrity_prep(bio)
>> 	 len = bio_integrity_bytes(bi, bio_sectors(bio));
>> 	   -->At this point, because the logical block size has increased to
>> 	      4096 bytes, the calculated 'len' here is 0
>>           buf = kmalloc(len, GFP_NOIO | q->bounce_gfp);
>> 	   -->Passed in len=0 and returned buf=16
>>           end = (((unsigned long) buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>           start = ((unsigned long) buf) >> PAGE_SHIFT;
>>           nr_pages = end - start;  -->nr_pages == 1
>>           bip->bip_flags |= BIP_BLOCK_INTEGRITY;
>>           for (i = 0 ; i < nr_pages ; i++) {
>>             if (len <= 0)
>>                -->Not initializing the bip_vec of bio_integrity, will result
>> 		 in null pointer access during subsequent releases. Even if
>> 		 initialized, it will still cause subsequent releases access
>> 		 null pointer because the buffer address is incorrect.
>>               break;
>>
>> Firstly, it is unreasonable to format NVME in the presence of IO. It is also
>> possible to see IO smaller than the logical block size in the block layer for
>> this type of concurrency. It is expected that this type of IO device will
>> return an error, so exception handling should also be done for this type of
>> IO to prevent null pointer access from causing system crashes.
> Actually unaligned IO handling is one mess for nvme hardware. Yes, IO may fail,
> but it is observed that meta buffer is overwrite by DMA in read IO.
>
> Ye and Yi, can you test the following patch in your 'nvme format' & IO workload?
>
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 82c3ae22d76d..a41ab4a3a398 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -336,6 +336,19 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>   	return 0;
>   }
>   
> +static bool bio_unaligned(struct bio *bio)
> +{
> +	unsigned int bs = bdev_logical_block_size(bio->bi_bdev);
> +
> +	if (bio->bi_iter.bi_size & (bs - 1))
> +	        return true;
> +
> +	if ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))
> +	        return true;
> +
> +	return false;
> +}
I think this judgment is a bit incorrect. It should not be sufficient to 
only determine whether
the length and starting sector are logically block aligned.
> +
>   int __bio_queue_enter(struct request_queue *q, struct bio *bio)
>   {
>   	while (!blk_try_enter_queue(q, false)) {
> @@ -362,6 +375,15 @@ int __bio_queue_enter(struct request_queue *q, struct bio *bio)
>   			   test_bit(GD_DEAD, &disk->state));
>   		if (test_bit(GD_DEAD, &disk->state))
>   			goto dead;
> +		/*
> +		 * Not like other queue limits, logical block size is one
> +		 * fundamental limit which can't be covered by bio split.
> +		 *
> +		 * Device reconfiguration may happen and logical block size
> +		 * is changed, so fail the IO if that is true.
> +		 */
> +		if (bio_unaligned(bio))
> +			goto dead;
>   	}
>   
>   	return 0;
>
> Thanks,
> Ming
>
>


  reply	other threads:[~2024-06-07  1:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-06  6:26 [PATCH] block: bio-integrity: fix potential null-ptr-deref in bio_integrity_free Ye Bin
2024-06-06  6:44 ` Christoph Hellwig
2024-06-06 11:34   ` yebin
2024-06-06 14:30     ` Ming Lei
2024-06-06 14:52       ` Christoph Hellwig
2024-06-06 15:48         ` Keith Busch
2024-06-06 23:46         ` Ming Lei
2024-06-07  0:13 ` Ming Lei
2024-06-07  1:32   ` yebin [this message]
2024-06-07  1:35     ` Ming Lei
2024-06-11  2:48       ` yebin (H)
2024-06-11  3:29         ` Ming Lei
2024-06-17  3:49           ` yebin (H)
2024-06-18  1:43             ` Ming Lei
2024-06-11  9:15 ` Markus Elfring

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6662632D.7020000@huaweicloud.com \
    --to=yebin@huaweicloud.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=yebin10@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).