All of lore.kernel.org
 help / color / mirror / Atom feed
* snapshot-origin with no snapshot may lead to BUG() in bio_split()
@ 2019-07-20  9:26 Cédric Delmas
  2019-07-29 14:38 ` Mike Snitzer
  0 siblings, 1 reply; 3+ messages in thread
From: Cédric Delmas @ 2019-07-20  9:26 UTC (permalink / raw)
  To: dm-devel@redhat.com

Hello,

I encountered a bug while working with DM snapshot targets: having a snapshot-origin target with all snapshots removed may lead to BUG_ON(sectors <= 0) in function bio_split() (file block/bio.c).

/proc/version: Linux version 4.19.59 (root@test-dm) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Fri Jul 19 15:26:13 CEST 2019
dmesg:
[  856.268151] ------------[ cut here ]------------
[  856.268154] kernel BUG at block/bio.c:1803!
[  856.268230] invalid opcode: 0000 [#1] SMP PTI
[  856.268294] CPU: 3 PID: 677 Comm: e2fsck Tainted: G            E     4.19.59 #1
[  856.268373] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
[  856.268462] RIP: 0010:bio_split+0x7a/0x80
[  856.268526] Code: 41 8b 74 24 30 48 89 ef e8 63 ec ff ff c7 45 38 00 00 00 00 f6 45 15 04 74 08 66 41 81 4c 24 14 00 04 4c 89 e0 5b 5d 41 5c c3 <0f> 0b 0f 0b 66 90 66 66 66 66 90 41 56 45 31 f6 41 55 41 54 55 53
[  856.268866] RSP: 0018:ffffb274012aba58 EFLAGS: 00010246
[  856.268934] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff979a62d4ae90
[  856.269030] RDX: 0000000000600000 RSI: 0000000000000000 RDI: ffff979a3488c100
[  856.269108] RBP: ffff979a76068000 R08: ffff979a770489c0 R09: ffff979a35625858
[  856.269184] R10: 0000000000000000 R11: ffff979a35625858 R12: ffff979a3488c100
[  856.269261] R13: 0000000000000000 R14: ffffffffc076b580 R15: ffffb274012abe48
[  856.269339] FS:  00007fc6c6d76780(0000) GS:ffff979a77b80000(0000) knlGS:0000000000000000
[  856.269423] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  856.269492] CR2: 000055ff1b8b1278 CR3: 0000000235026002 CR4: 0000000000060ee0
[  856.269573] Call Trace:
[  856.269648]  __split_and_process_bio+0xe4/0x1a0 [dm_mod]
[  856.269724]  __dm_make_request.isra.37+0x3f/0xa0 [dm_mod]
[  856.269796]  generic_make_request+0x1a4/0x400
[  856.269861]  submit_bio+0x45/0x140
[  856.269919]  ? guard_bio_eod+0x32/0x100
[  856.269979]  submit_bh_wbc+0x163/0x190
[  856.270039]  __block_write_full_page+0x22b/0x440
[  856.270103]  ? touch_buffer+0x60/0x60
[  856.270162]  ? check_disk_change+0x60/0x60
[  856.270224]  __writepage+0x19/0x50
[  856.270282]  write_cache_pages+0x1e1/0x470
[  856.270343]  ? __wb_calc_thresh+0x130/0x130
[  856.270405]  ? __block_commit_write.isra.39+0x4c/0xa0
[  856.272244]  ? block_write_end+0x2f/0x80
[  856.272764]  ? iov_iter_copy_from_user_atomic+0xbe/0x310
[  856.273298]  ? blkdev_write_end+0x1d/0x80
[  856.273805]  ? balance_dirty_pages_ratelimited+0x1f5/0x3c0
[  856.274313]  generic_writepages+0x56/0x90
[  856.274823]  do_writepages+0x41/0xd0
[  856.275326]  ? blkdev_write_iter+0xb0/0x120
[  856.275826]  ? io_schedule_timeout+0x19/0x40
[  856.276319]  __filemap_fdatawrite_range+0xbe/0xf0
[  856.276806]  file_write_and_wait_range+0x4c/0xa0
[  856.277295]  blkdev_fsync+0x16/0x40
[  856.277768]  do_fsync+0x38/0x70
[  856.278237]  __x64_sys_fsync+0x10/0x20
[  856.278700]  do_syscall_64+0x55/0xf0
[  856.279156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  856.279617] RIP: 0033:0x7fc6c6e8b214
[  856.280075] Code: 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 e9 f4 0c 00 8b 00 85 c0 75 13 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 53 89 fb 48 83 ec 10 e8 24 55
[  856.281131] RSP: 002b:00007fffd9870c18 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[  856.282181] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6c6e8b214
[  856.282690] RDX: 0000000000000000 RSI: 000055ff1b8ab130 RDI: 0000000000000003
[  856.283179] RBP: 000055ff1b8ab130 R08: 0000000000000000 R09: 000055ff1b8ad2c4
[  856.283663] R10: 00007fc6c7011200 R11: 0000000000000246 R12: 0000000000000200
[  856.284129] R13: 000055ff1b8ad2e0 R14: 0000000000000000 R15: 000055ff1b8aaef0
[  856.284583] Modules linked in: dm_snapshot(E) dm_bufio(E) dm_mod(E) loop(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) snd_pcm(E) aesni_intel(E) bochs_drm(E) snd_timer(E) aes_x86_64(E) crypto_simd(E) ttm(E) snd(E) cryptd(E) glue_helper(E) 9pnet_virtio(E) joydev(E) evdev(E) soundcore(E) drm_kms_helper(E) sg(E) hid_generic(E) serio_raw(E) 9pnet(E) pcspkr(E) virtio_balloon(E) iTCO_wdt(E) iTCO_vendor_support(E) drm(E) qemu_fw_cfg(E) button(E) virtio_rng(E) rng_core(E) ip_tables(E) x_tables(E) autofs4(E) usbhid(E) hid(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sr_mod(E) cdrom(E) virtio_blk(E) virtio_console(E) virtio_net(E) net_failover(E) failover(E) ahci(E) libahci(E) libata(E) xhci_pci(E) crc32c_intel(E) xhci_hcd(E)
[  856.288356]  scsi_mod(E) psmouse(E) lpc_ich(E) i2c_i801(E) usbcore(E) virtio_pci(E) virtio_ring(E) virtio(E)
[  856.289169] ---[ end trace 54a02b5e2590a1fe ]---


Environment:
- Kernel version: 4.19.59 (same problem with versions 5.2.1, latest 4.19 from Debian Buster and latest 4.9 from Debian Stretch)
- Distribution: Debian 10
- Software versions:
GNU Make            	4.2.1
Binutils            	2.31.1
Util-linux          	2.33.1
Mount               	2.33.1
Module-init-tools   	26
E2fsprogs           	1.44.5
Linux C Library     	2.28
Dynamic linker (ldd)	2.28
Linux C++ Library   	6.0.25
Procps              	3.3.15
Kbd                 	2.0.4
Console-tools       	2.0.4
Sh-utils            	8.30
Udev                	241
Modules Loaded      	9pnet 9pnet_virtio aesni_intel aes_x86_64 ahci autofs4 bochs_drm button cdrom crc16 crc32c_generic crc32c_intel crc32_pclmul crct10dif_pclmul cryptd crypto_simd drm drm_kms_helper evdev ext4 failover fscrypto ghash_clmulni_intel glue_helper hid hid_generic i2c_i801 ip_tables irqbypass iTCO_vendor_support iTCO_wdt jbd2 joydev kvm kvm_intel libahci libata lpc_ich mbcache net_failover pcbc pcspkr psmouse qemu_fw_cfg rng_core scsi_mod serio_raw sg snd snd_pcm snd_timer soundcore sr_mod ttm usbcore usbhid virtio virtio_balloon virtio_blk virtio_console virtio_net virtio_pci virtio_ring virtio_rng xhci_hcd xhci_pci x_tables


Steps to reproduce:
truncate -s 500M origin.bin
truncate -s 50M snapshot.bin
losetup /dev/loop0 origin.bin
losetup /dev/loop1 snapshot.bin
mkfs.ext4 /dev/loop0
dmsetup create snap --table "0 $(blockdev --getsz /dev/loop0) snapshot /dev/loop0 /dev/loop1 N 256"
dmsetup create orig --table "0 $(blockdev --getsz /dev/loop0) snapshot-origin /dev/loop0"
# use /dev/mapper/snap and /dev/mapper/orig then unmount them
dmsetup suspend orig
dmsetup remove snap
dmsetup resume orig
e2fsck /dev/mapper/orig
# BUG in bio_split()

Steps to reproduce (the express way):
truncate -s 500M origin.bin
losetup /dev/loop0 origin.bin
mkfs.ext4 /dev/loop0
dmsetup create orig --table "0 $(blockdev --getsz /dev/loop0) snapshot-origin /dev/loop0"
e2fsck /dev/mapper/orig
# BUG in bio_split()


I looked at the code and to my opinion the problem comes from function origin_map (file drivers/md/dm-snap.c). In the following code:

static int origin_map(struct dm_target *ti, struct bio *bio)
{
	struct dm_origin *o = ti->private;
	unsigned available_sectors;
...
	available_sectors = o->split_boundary -
		((unsigned)bio->bi_iter.bi_sector & (o->split_boundary - 1));

	if (bio_sectors(bio) > available_sectors)
		dm_accept_partial_bio(bio, available_sectors);
...

when there is no snapshot, split_boundary is 0 so available_sectors gets an invalid value.
The problem no more appears if the function origin_map early exits using the following patch:
--- a/drivers/md/dm-snap.c      2019-07-14 08:11:23.000000000 +0200
+++ b/drivers/md/dm-snap.c      2019-07-19 17:50:15.876000000 +0200
@@ -2328,6 +2328,9 @@ static int origin_map(struct dm_target *
        if (bio_data_dir(bio) != WRITE)
                return DM_MAPIO_REMAPPED;
 
+       if (unlikely(!o->split_boundary))
+               return do_origin(o->dev, bio);
+
        available_sectors = o->split_boundary -
                ((unsigned)bio->bi_iter.bi_sector & (o->split_boundary - 1));
 

Best regards,

Cédric Delmas

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: snapshot-origin with no snapshot may lead to BUG() in bio_split()
  2019-07-20  9:26 snapshot-origin with no snapshot may lead to BUG() in bio_split() Cédric Delmas
@ 2019-07-29 14:38 ` Mike Snitzer
  2019-07-29 20:10   ` Cédric Delmas
  0 siblings, 1 reply; 3+ messages in thread
From: Mike Snitzer @ 2019-07-29 14:38 UTC (permalink / raw)
  To: Cédric Delmas; +Cc: dm-devel@redhat.com

On Sat, Jul 20 2019 at  5:26am -0400,
Cédric Delmas <cedricde@outlook.fr> wrote:

> Hello,
> 
> I encountered a bug while working with DM snapshot targets: having a
> snapshot-origin target with all snapshots removed may lead to
> BUG_ON(sectors <= 0) in function bio_split() (file block/bio.c).

...
 
> Steps to reproduce:
> truncate -s 500M origin.bin
> truncate -s 50M snapshot.bin
> losetup /dev/loop0 origin.bin
> losetup /dev/loop1 snapshot.bin
> mkfs.ext4 /dev/loop0
> dmsetup create snap --table "0 $(blockdev --getsz /dev/loop0) snapshot /dev/loop0 /dev/loop1 N 256"
> dmsetup create orig --table "0 $(blockdev --getsz /dev/loop0) snapshot-origin /dev/loop0"
> # use /dev/mapper/snap and /dev/mapper/orig then unmount them
> dmsetup suspend orig
> dmsetup remove snap
> dmsetup resume orig
> e2fsck /dev/mapper/orig
> # BUG in bio_split()
> 
> Steps to reproduce (the express way):
> truncate -s 500M origin.bin
> losetup /dev/loop0 origin.bin
> mkfs.ext4 /dev/loop0
> dmsetup create orig --table "0 $(blockdev --getsz /dev/loop0) snapshot-origin /dev/loop0"
> e2fsck /dev/mapper/orig
> # BUG in bio_split()
> 
> 
> I looked at the code and to my opinion the problem comes from function origin_map (file drivers/md/dm-snap.c). In the following code:
> 
> static int origin_map(struct dm_target *ti, struct bio *bio)
> {
> 	struct dm_origin *o = ti->private;
> 	unsigned available_sectors;
> ...
> 	available_sectors = o->split_boundary -
> 		((unsigned)bio->bi_iter.bi_sector & (o->split_boundary - 1));
> 
> 	if (bio_sectors(bio) > available_sectors)
> 		dm_accept_partial_bio(bio, available_sectors);
> ...
> 
> when there is no snapshot, split_boundary is 0 so available_sectors gets an invalid value.
> The problem no more appears if the function origin_map early exits using the following patch:
> --- a/drivers/md/dm-snap.c      2019-07-14 08:11:23.000000000 +0200
> +++ b/drivers/md/dm-snap.c      2019-07-19 17:50:15.876000000 +0200
> @@ -2328,6 +2328,9 @@ static int origin_map(struct dm_target *
>         if (bio_data_dir(bio) != WRITE)
>                 return DM_MAPIO_REMAPPED;
>  
> +       if (unlikely(!o->split_boundary))
> +               return do_origin(o->dev, bio);
> +
>         available_sectors = o->split_boundary -
>                 ((unsigned)bio->bi_iter.bi_sector & (o->split_boundary - 1));
>  

When there is no snapshot snapshot-origin shouldn't be used.

So your patch may fix the BUG() you hit but it doesn't go far enough
with warning the user that they've entered "unsupported" territory.

Rather than call do_origin() I'm inclined to
DMERR_LIMIT("... unsupported ...") and error the IO.

What are your reasons for wanting to silently allow this unsupported
usecase?

Mike

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: snapshot-origin with no snapshot may lead to BUG() in bio_split()
  2019-07-29 14:38 ` Mike Snitzer
@ 2019-07-29 20:10   ` Cédric Delmas
  0 siblings, 0 replies; 3+ messages in thread
From: Cédric Delmas @ 2019-07-29 20:10 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel@redhat.com

Le 29/07/2019 à 16:38, Mike Snitzer a écrit :
> On Sat, Jul 20 2019 at  5:26am -0400,
> Cédric Delmas <cedricde@outlook.fr> wrote:
> 
>> Hello,
>>
>> I encountered a bug while working with DM snapshot targets: having a
>> snapshot-origin target with all snapshots removed may lead to
>> BUG_ON(sectors <= 0) in function bio_split() (file block/bio.c).
> 
> ...
>   
>> Steps to reproduce:
>> truncate -s 500M origin.bin
>> truncate -s 50M snapshot.bin
>> losetup /dev/loop0 origin.bin
>> losetup /dev/loop1 snapshot.bin
>> mkfs.ext4 /dev/loop0
>> dmsetup create snap --table "0 $(blockdev --getsz /dev/loop0) snapshot /dev/loop0 /dev/loop1 N 256"
>> dmsetup create orig --table "0 $(blockdev --getsz /dev/loop0) snapshot-origin /dev/loop0"
>> # use /dev/mapper/snap and /dev/mapper/orig then unmount them
>> dmsetup suspend orig
>> dmsetup remove snap
>> dmsetup resume orig
>> e2fsck /dev/mapper/orig
>> # BUG in bio_split()
>>
>> Steps to reproduce (the express way):
>> truncate -s 500M origin.bin
>> losetup /dev/loop0 origin.bin
>> mkfs.ext4 /dev/loop0
>> dmsetup create orig --table "0 $(blockdev --getsz /dev/loop0) snapshot-origin /dev/loop0"
>> e2fsck /dev/mapper/orig
>> # BUG in bio_split()
>>
>>
>> I looked at the code and to my opinion the problem comes from function origin_map (file drivers/md/dm-snap.c). In the following code:
>>
>> static int origin_map(struct dm_target *ti, struct bio *bio)
>> {
>> 	struct dm_origin *o = ti->private;
>> 	unsigned available_sectors;
>> ...
>> 	available_sectors = o->split_boundary -
>> 		((unsigned)bio->bi_iter.bi_sector & (o->split_boundary - 1));
>>
>> 	if (bio_sectors(bio) > available_sectors)
>> 		dm_accept_partial_bio(bio, available_sectors);
>> ...
>>
>> when there is no snapshot, split_boundary is 0 so available_sectors gets an invalid value.
>> The problem no more appears if the function origin_map early exits using the following patch:
>> --- a/drivers/md/dm-snap.c      2019-07-14 08:11:23.000000000 +0200
>> +++ b/drivers/md/dm-snap.c      2019-07-19 17:50:15.876000000 +0200
>> @@ -2328,6 +2328,9 @@ static int origin_map(struct dm_target *
>>          if (bio_data_dir(bio) != WRITE)
>>                  return DM_MAPIO_REMAPPED;
>>   
>> +       if (unlikely(!o->split_boundary))
>> +               return do_origin(o->dev, bio);
>> +
>>          available_sectors = o->split_boundary -
>>                  ((unsigned)bio->bi_iter.bi_sector & (o->split_boundary - 1));
>>   
> 
> When there is no snapshot snapshot-origin shouldn't be used.
> 
> So your patch may fix the BUG() you hit but it doesn't go far enough
> with warning the user that they've entered "unsupported" territory.
> 
> Rather than call do_origin() I'm inclined to
> DMERR_LIMIT("... unsupported ...") and error the IO.
> 
> What are your reasons for wanting to silently allow this unsupported
> usecase?
> 
> Mike
> 

I didn't know that this usecase is unsupported, 
Documentation/device-mapper/snapshot.txt lets me think that even if the 
origin device should have one or more snapshots based on it, it is not 
mandatory. If this configuration is not supported, you are absolutely 
right, it is better to raise an error.

I think it could be nice to be able to permanently use a snapshot-origin 
device and to create snapshots only on demand (without forgetting to 
suspend the origin device during snapshot creation) however any 
correction or error notification is OK for me.

Cédric

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-07-29 20:10 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-07-20  9:26 snapshot-origin with no snapshot may lead to BUG() in bio_split() Cédric Delmas
2019-07-29 14:38 ` Mike Snitzer
2019-07-29 20:10   ` Cédric Delmas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.