From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linuxfoundation.org ([140.211.169.12]:45748 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932931AbcILRNJ (ORCPT ); Mon, 12 Sep 2016 13:13:09 -0400 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Sebastian Roesner , Eric Wheeler , Shaohua Li , Kent Overstreet , Ming Lei , Jens Axboe Subject: [PATCH 4.4 153/192] block: make sure a big bio is split into at most 256 bvecs Date: Mon, 12 Sep 2016 19:01:02 +0200 Message-Id: <20160912152205.466641065@linuxfoundation.org> In-Reply-To: <20160912152158.855601725@linuxfoundation.org> References: <20160912152158.855601725@linuxfoundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: stable-owner@vger.kernel.org List-ID: 4.4-stable review patch. If anyone has any objections, please let me know. ------------------ From: Ming Lei commit 4d70dca4eadf2f95abe389116ac02b8439c2d16c upstream. After arbitrary bio size was introduced, the incoming bio may be very big. We have to split the bio into small bios so that each holds at most BIO_MAX_PAGES bvecs for safety reason, such as bio_clone(). This patch fixes the following kernel crash: > [ 172.660142] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 > [ 172.660229] IP: [] bio_trim+0xf/0x2a > [ 172.660289] PGD 7faf3e067 PUD 7f9279067 PMD 0 > [ 172.660399] Oops: 0000 [#1] SMP > [...] > [ 172.664780] Call Trace: > [ 172.664813] [] ? raid1_make_request+0x2e8/0xad7 [raid1] > [ 172.664846] [] ? blk_queue_split+0x377/0x3d4 > [ 172.664880] [] ? md_make_request+0xf6/0x1e9 [md_mod] > [ 172.664912] [] ? generic_make_request+0xb5/0x155 > [ 172.664947] [] ? prio_io+0x85/0x95 [bcache] > [ 172.664981] [] ? register_cache_set+0x355/0x8d0 [bcache] > [ 172.665016] [] ? register_bcache+0x1006/0x1174 [bcache] The issue can be reproduced by the following steps: - create one raid1 over two virtio-blk - build bcache device over the above raid1 and another cache device and bucket size is set as 2Mbytes - set cache mode as writeback - run random write over ext4 on the bcache device Fixes: 54efd50(block: make generic_make_request handle arbitrarily sized bios) Reported-by: Sebastian Roesner Reported-by: Eric Wheeler Cc: Shaohua Li Acked-by: Kent Overstreet Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- block/blk-merge.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -92,9 +92,31 @@ static struct bio *blk_bio_segment_split bool do_split = true; struct bio *new = NULL; const unsigned max_sectors = get_max_io_size(q, bio); + unsigned bvecs = 0; bio_for_each_segment(bv, bio, iter) { /* + * With arbitrary bio size, the incoming bio may be very + * big. We have to split the bio into small bios so that + * each holds at most BIO_MAX_PAGES bvecs because + * bio_clone() can fail to allocate big bvecs. + * + * It should have been better to apply the limit per + * request queue in which bio_clone() is involved, + * instead of globally. The biggest blocker is the + * bio_clone() in bio bounce. + * + * If bio is splitted by this reason, we should have + * allowed to continue bios merging, but don't do + * that now for making the change simple. + * + * TODO: deal with bio bounce's bio_clone() gracefully + * and convert the global limit into per-queue limit. + */ + if (bvecs++ >= BIO_MAX_PAGES) + goto split; + + /* * If the queue doesn't support SG gaps and adding this * offset would create a gap, disallow it. */