linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Coly Li <colyli@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	linux-block@vger.kernel.org, Hannes Reinecke <hare@suse.com>,
	Xiao Ni <xni@redhat.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>
Subject: Re: [PATCH] block: loop: set discard granularity and alignment for block device backed loop
Date: Tue, 4 Aug 2020 16:14:52 +0800	[thread overview]
Message-ID: <20200804081452.GA1957653@T590> (raw)
In-Reply-To: <79b1d428-8897-5dd1-47ca-c61512547045@suse.de>

On Tue, Aug 04, 2020 at 03:50:32PM +0800, Coly Li wrote:
> On 2020/8/4 14:01, Ming Lei wrote:
> > On Tue, Aug 04, 2020 at 12:29:07PM +0800, Coly Li wrote:
> >> On 2020/8/4 12:15, Ming Lei wrote:
> >>> In case of block device backend, if the backend supports discard, the
> >>> loop device will set queue flag of QUEUE_FLAG_DISCARD.
> >>>
> >>> However, limits.discard_granularity isn't setup, and this way is wrong,
> >>> see the following description in Documentation/ABI/testing/sysfs-block:
> >>>
> >>> 	A discard_granularity of 0 means that the device does not support
> >>> 	discard functionality.
> >>>
> >>> Especially 9b15d109a6b2 ("block: improve discard bio alignment in
> >>> __blkdev_issue_discard()") starts to take q->limits.discard_granularity
> >>> for computing max discard sectors. And zero discard granularity causes
> >>> kernel oops[1].
> >>>
> >>> Fix the issue by set up discard granularity and alignment.
> >>>
> >>> [1] kernel oops when use block disk as loop backend:
> >>>
> >>> [   33.405947] ------------[ cut here ]------------
> >>> [   33.405948] kernel BUG at block/blk-mq.c:563!
> >>> [   33.407504] invalid opcode: 0000 [#1] SMP PTI
> >>> [   33.408245] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.8.0-rc2_update_rq+ #17
> >>> [   33.409466] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx2.fedor4
> >>> [   33.411546] RIP: 0010:blk_mq_end_request+0x1c/0x2a
> >>> [   33.412354] Code: 48 89 df 5b 5d 41 5c 41 5d e9 2d fe ff ff 0f 1f 44 00 00 55 48 89 fd 53 40 0f b6 de 8b 57 5
> >>> [   33.415724] RSP: 0018:ffffc900019ccf48 EFLAGS: 00010202
> >>> [   33.416688] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff88826f2600c0
> >>> [   33.417990] RDX: 0000000000200042 RSI: 0000000000000001 RDI: ffff888270c13100
> >>> [   33.419286] RBP: ffff888270c13100 R08: 0000000000000001 R09: ffffffff81345144
> >>> [   33.420584] R10: ffff88826f260600 R11: 0000000000000000 R12: ffff888270c13100
> >>> [   33.421881] R13: ffffffff820050c0 R14: 0000000000000004 R15: 0000000000000004
> >>> [   33.423053] FS:  0000000000000000(0000) GS:ffff888277d80000(0000) knlGS:0000000000000000
> >>> [   33.424360] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [   33.425416] CR2: 00005581d7903f08 CR3: 00000001e9a16002 CR4: 0000000000760ee0
> >>> [   33.426580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> [   33.427706] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> [   33.428910] PKRU: 55555554
> >>> [   33.429412] Call Trace:
> >>> [   33.429858]  <IRQ>
> >>> [   33.430189]  blk_done_softirq+0x84/0xa4
> >>> [   33.430884]  __do_softirq+0x11a/0x278
> >>> [   33.431567]  asm_call_on_stack+0x12/0x20
> >>> [   33.432283]  </IRQ>
> >>> [   33.432673]  do_softirq_own_stack+0x31/0x40
> >>> [   33.433448]  __irq_exit_rcu+0x46/0xae
> >>> [   33.434132]  sysvec_call_function_single+0x7d/0x8b
> >>> [   33.435021]  asm_sysvec_call_function_single+0x12/0x20
> >>> [   33.435965] RIP: 0010:native_safe_halt+0x7/0x8
> >>> [   33.436795] Code: 25 c0 7b 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b 00 a8 08 74 0b 65 81 25 db 38 95 7e b
> >>> [   33.440179] RSP: 0018:ffffc9000191fee0 EFLAGS: 00000246
> >>> [   33.441143] RAX: ffffffff816c4118 RBX: 0000000000000000 RCX: ffff888276631a00
> >>> [   33.442442] RDX: 000000000000ddfe RSI: 0000000000000003 RDI: 0000000000000001
> >>> [   33.443742] RBP: 0000000000000000 R08: 00000000f461e6ac R09: 0000000000000004
> >>> [   33.445046] R10: 8080808080808080 R11: fefefefefefefeff R12: 0000000000000000
> >>> [   33.446352] R13: 0000000000000003 R14: ffffffffffffffff R15: 0000000000000000
> >>> [   33.447450]  ? ldsem_down_write+0x1f5/0x1f5
> >>> [   33.448158]  default_idle+0x1b/0x2c
> >>> [   33.448809]  do_idle+0xf9/0x248
> >>> [   33.449392]  cpu_startup_entry+0x1d/0x1f
> >>> [   33.450118]  start_secondary+0x168/0x185
> >>> [   33.450843]  secondary_startup_64+0xa4/0xb0
> >>> [   33.451612] Modules linked in: scsi_debug iTCO_wdt iTCO_vendor_support nvme nvme_core i2c_i801 lpc_ich virtis
> >>> [   33.454292] Dumping ftrace buffer:
> >>> [   33.454951]    (ftrace buffer empty)
> >>> [   33.455626] ---[ end trace beece202d663d38f ]---
> >>>
> >>> Fixes: 9b15d109a6b2 ("block: improve discard bio alignment in __blkdev_issue_discard()")
> >>> Cc: Coly Li <colyli@suse.de>
> >>> Cc: Hannes Reinecke <hare@suse.com>
> >>> Cc: Xiao Ni <xni@redhat.com>
> >>> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> >>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >>> ---
> >>>  drivers/block/loop.c | 5 +++++
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> >>> index d18160146226..6102370bee35 100644
> >>> --- a/drivers/block/loop.c
> >>> +++ b/drivers/block/loop.c
> >>> @@ -890,6 +890,11 @@ static void loop_config_discard(struct loop_device *lo)
> >>>  		struct request_queue *backingq;
> >>>  
> >>>  		backingq = bdev_get_queue(inode->i_bdev);
> >>> +
> >>> +		q->limits.discard_granularity =
> >>> +			queue_physical_block_size(backingq);
> >>> +		q->limits.discard_alignment = 0;
> >>> +
> >>>  		blk_queue_max_discard_sectors(q,
> >>>  			backingq->limits.max_write_zeroes_sectors);
> >>>  
> >>>
> >>
> >> Hi Ming,
> >>
> >> I did similar change, it can avoid the panic or 0 length discard bio.
> >> But yesterday I realize the discard request to loop device should not go
> >> into __blkdev_issue_discard(). As commit c52abf563049 ("loop: Better
> >> discard support for block devices") mentioned it should go into
> >> blkdev_issue_zeroout(), this is why in loop_config_discard() the
> >> max_discard_sectors is set to backingq->limits.max_write_zeroes_sectors.
> > 
> > That commit meant REQ_OP_DISCARD on a loop device is translated into
> > blkdev_issue_zeroout(), because REQ_OP_DISCARD is handled as
> > file->f_op->fallocate(FALLOC_FL_PUNCH_HOLE), which will cause
> > "subsequent reads from this range will return zeroes".
> > 
> >>
> >> Now I am looking at the problem why discard request on loop device
> >> doesn't go into blkdev_issue_zeroout().
> > 
> > No, that is correct behavior, since loop can support discard or zeroout.
> > 
> > If QUEUE_FLAG_DISCARD is set, either discard_granularity or max discard
> > sectors shouldn't be zero.
> > 
> > This patch shouldn't set discard_granularity if limits.max_write_zeroes_sectors
> > is zero, will fix it in V2.
> > 
> >>
> >> With the above change, the discard is very slow on loop device with
> >> backing device. In my testing, mkfs.xfs on /dev/loop0 does not complete
> >> in 20 minutes, each discard request is only 4097 sectors.
> > 
> > I'd suggest you to check the discard related queue limits, and see why
> > each discard request just sends 4097 sectors.
> > 
> > Or we need to mirror underlying queue's discard_granularity too? Can you
> > try the following patch?
> 
> Can I know the exact command line to reproduce the panic ?

modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1

losetup -f /dev/sdc --direct-io=on   #suppose /dev/sdc is the scsi_debug LUN

mkfs.ext4 /dev/loop0				#suppose loop0 is the loop backed by /dev/sdc

mount /dev/loop0 /mnt

cd /mnt
dbench -t 20 -s 64


> 
> I try to use blkdiscard, or dbench -D /mount_point_to_loop0, and see all
> the discard requests go into blkdev_fallocate()=>blkdev_issue_zeroout().
> No request goes into __blkdev_issue_discard().

The issue isn't related with backend queue, and it is triggered on
loop's request.


Thanks,
Ming


  reply	other threads:[~2020-08-04  8:15 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-04  4:15 [PATCH] block: loop: set discard granularity and alignment for block device backed loop Ming Lei
2020-08-04  4:29 ` Coly Li
2020-08-04  6:01   ` Ming Lei
2020-08-04  7:50     ` Coly Li
2020-08-04  8:14       ` Ming Lei [this message]
2020-08-04 14:30         ` Coly Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200804081452.GA1957653@T590 \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=colyli@suse.de \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).