* [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
@ 2012-01-24 23:45 Luzipher McLeod
  2012-01-25  2:14 ` Mandeep Singh Baines
  0 siblings, 1 reply; 11+ messages in thread
From: Luzipher McLeod @ 2012-01-24 23:45 UTC (permalink / raw)
  To: dm-crypt

Hi :-)

A few days ago I encountered a kernel bug while copying files to an encrypted filesystem. The specific stack for the filesystem is: btrfs-on-crypt-on-mdraid. Vast amounts of data copied without problems (about 6.3TB with 1.1 TB remaining), but when copying a certain directory, the kernel bug surfaces. I repeatedly deleted the affected directory and tried to re-copy it, but it always fails at the same point (or close to it). More recent tests showed that I could copy a few more files to a different directory on the filesystem, but it very quickly failed there as well (a few megabytes later).
After talking to the btrfs devs on freenode (as btrfs is the most experimental thing in the stack), they came to the conclusion that it's most probably the crypto layer.

Some details:
gentoo kernel 3.2.1 (custom config and ubuntu config)
mdraid: linear, 4 disks, each 2TB (total 8TB)
crypt: setup via cryptsetup -c aes-xts-plain64 -h plain -s 512 -d - create tempraid /dev/md/tempraid_lin

I'd appreciate any help with this and would be happy to test patches or provide more debug info.

Thanks and Regards,
Luzipher




The kernel bug output retrieved by netconsole (also at http://pastebin.com/sjJy7QE4 ):
    [  294.538422] netconsole: local port 6666
    [  333.423583] SysRq : Changing Loglevel
    [  333.423609] Loglevel set to 9
    [  424.248405] ------------[ cut here ]------------
    [  424.248447] kernel BUG at fs/bio.c:1499!
    [  424.248476] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    [  424.248558] CPU 3
    [  424.248577] Modules linked in: netconsole configfs reiserfs f71882fg coretemp raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx linear i915 snd_pcm snd_timer mpt2sas snd drm_kms_helper firewire_ohci soundcore usblp snd_page_alloc tpm_tis drm firewire_core scsi_transport_sas pcspkr crc_itu_t pata_jmicron raid_class iTCO_wdt r8169 iTCO_vendor_support i2c_i801 i2c_algo_bit mei(C) video
    [  424.250095]
    [  424.250119] Pid: 18, comm: kworker/3:0 Tainted: G         C   3.2.1-gentoo #9 MSI MS-7637/H55-GD65 (MS-7637)  
    [  424.250202] RIP: 0010:[<ffffffff811b114f>]  [<ffffffff811b114f>] bio_split+0x2bf/0x2d0
    [  424.250254] RSP: 0018:ffff88022e305c20  EFLAGS: 00010206
    [  424.250282] RAX: ffff88022b7f19e0 RBX: ffff88009fbb2200 RCX: 000000010027000e
    [  424.250313] RDX: ffff8800a77416f8 RSI: 0000000000000001 RDI: 0000000000000282
    [  424.250344] RBP: ffff88022e305c70 R08: ffff8800a77465b8 R09: ffff880233007640
    [  424.250375] R10: ffffea00029dd020 R11: 00000000000000ff R12: 0000000000000080
    [  424.250407] R13: ffff8800a77416f8 R14: 0000000000000001 R15: ffffea0002b0bfc0
    [  424.250438] FS:  0000000000000000(0000) GS:ffff88023bcc0000(0000) knlGS:0000000000000000
    [  424.250475] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [  424.250502] CR2: 0000000001df3000 CR3: 0000000001c05000 CR4: 00000000000006e0
    [  424.250533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  424.250564] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [  424.250595] Process kworker/3:0 (pid: 18, threadinfo ffff88022e304000, task ffff88022e2fb040)
    [  424.250629] Stack:
    [  424.250647]  ffff880219905050 0000000000001000 ffff88022b7f19e0 baa1821019905070
    [  424.250729]  0003120200001000 ffff88009fbb2200 ffff88022a524f38 000000000001f000
    [  424.250813]  0000000000000001 ffffea0002b0bfc0 ffff88022e305cd0 ffffffffa006e626
    [  424.250896] Call Trace:
    [  424.250918]  [<ffffffffa006e626>] linear_make_request+0x106/0x190 [linear]
    [  424.250953]  [<ffffffff8137576b>] ? generic_make_request_checks+0x1eb/0x370
    [  424.250988]  [<ffffffff815637e6>] md_make_request+0xc6/0x200
    [  424.251017]  [<ffffffff813759b7>] generic_make_request+0xc7/0x100
    [  424.251050]  [<ffffffff8157ebe4>] kcryptd_crypt_write_io_submit+0x44/0xc0
    [  424.251082]  [<ffffffff8157f160>] kcryptd_crypt+0x280/0x3d0
    [  424.251112]  [<ffffffff8157eee0>] ? crypt_convert_init.isra.17+0x60/0x60
    [  424.251146]  [<ffffffff8108dfda>] process_one_work+0x11a/0x480
    [  424.251176]  [<ffffffff8108ed64>] worker_thread+0x164/0x370
    [  424.251205]  [<ffffffff8108ec00>] ? manage_workers.isra.30+0x230/0x230
    [  424.251889]  [<ffffffff810934dc>] kthread+0x8c/0xa0
    [  424.251977]  [<ffffffff816d8e34>] kernel_thread_helper+0x4/0x10
    [  424.252009]  [<ffffffff81093450>] ? flush_kthread_worker+0xa0/0xa0
    [  424.252040]  [<ffffffff816d8e30>] ? gs_change+0x13/0x13
    [  424.252068] Code: 48 89 da 49 83 c6 10 8b 4d cc 48 8b 75 c0 ff d0 48 8b 55 b8 4c 89 f0 4c 29 f8 48 8b 44 02 f0 48 85 c0 75 d8 e9 9f fd ff ff 0f 0b <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5
    [  424.253585] RIP  [<ffffffff811b114f>] bio_split+0x2bf/0x2d0
    [  424.253626]  RSP <ffff88022e305c20>
    [  424.312992] ---[ end trace c8048857547cd8da ]---
    [  424.313134] BUG: unable to handle kernel paging request at fffffffffffffff8
    [  424.313225] IP: [<ffffffff81093971>] kthread_data+0x11/0x20
    [  424.313279] PGD 1c07067 PUD 1c08067 PMD 0
    [  424.313357] Oops: 0000 [#2] SMP DEBUG_PAGEALLOC
    [  424.313437] CPU 3
    [  424.313459] Modules linked in: netconsole configfs reiserfs f71882fg coretemp raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx linear i915 snd_pcm snd_timer mpt2sas snd drm_kms_helper firewire_ohci soundcore usblp snd_page_alloc tpm_tis drm firewire_core scsi_transport_sas pcspkr crc_itu_t pata_jmicron




* Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
  2012-01-24 23:45 [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device Luzipher McLeod
@ 2012-01-25  2:14 ` Mandeep Singh Baines
  2012-01-25  8:06   ` Luzipher McLeod
  0 siblings, 1 reply; 11+ messages in thread
From: Mandeep Singh Baines @ 2012-01-25  2:14 UTC (permalink / raw)
  To: Luzipher McLeod; +Cc: dm-crypt, NeilBrown

Luzipher McLeod (luziphermcleod@yahoo.ie) wrote:
> Hi :-)
> 
> A few days ago I encountered a kernel bug while copying files to an encrypted filesystem. The specific stack for the filesystem is: btrfs-on-crypt-on-mdraid. Vast amounts of data copied without problems (about 6.3TB with 1.1 TB remaining), but when copying a certain directory, the kernel bug surfaces. I repeatedly deleted the affected directory and tried to re-copy it, but it always fails at the same point (or close to it). More recent tests showed that I could copy a few more files to a different directory on the filesystem, but it very quickly failed there as well (a few megabytes later).
> After talking to the btrfs devs on freenode (as btrfs is the most experimental thing in the stack), they came to the conclusion that it's most probably the crypto layer.
> 
> Some details:
> gentoo kernel 3.2.1 (custom config and ubuntu config)
> mdraid: linear, 4 disks, each 2TB (total 8TB)
> crypt: setup via cryptsetup -c aes-xts-plain64 -h plain -s 512 -d - create tempraid /dev/md/tempraid_lin
> 
> I'd appreciate any help with this and would be happy to test patches or provide more debug info.
> 
> Thanks and Regards,
> Luzipher
> 
> 
> 
> 
> The kernel bug output retrieved by netconsole (also at http://pastebin.com/sjJy7QE4 ):
>     [  294.538422] netconsole: local port 6666
>     [  333.423583] SysRq : Changing Loglevel
>     [  333.423609] Loglevel set to 9
>     [  424.248405] ------------[ cut here ]------------
>     [  424.248447] kernel BUG at fs/bio.c:1499!

Hi Luzipher,

Looks like the BUG is because bio_split only works on single-page iovecs.

I see a relevant (old) patch from Neil Brown here:

https://lkml.org/lkml/2007/7/30/496

Regards,
Mandeep
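
For readers following along, here is a minimal userspace sketch of the restriction described above. It is not kernel code: `struct model_bio` and `model_can_bio_split()` are hypothetical names, and the two fields are assumed to stand in for the segment count and current segment index that the 3.2-era bio_split checks before it splits anything. A bio built from more than one page fails the check, which in the kernel would correspond to hitting the BUG.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the two struct bio fields of interest. */
struct model_bio {
    unsigned short vcnt; /* number of segments (one page each in this model) */
    unsigned short idx;  /* index of the segment currently being processed */
};

/*
 * Returns true only for a bio that the single-segment split path could
 * handle: exactly one segment, not yet partially processed.
 * Assumption: this mirrors the preconditions of the old bio_split.
 */
static bool model_can_bio_split(const struct model_bio *bio)
{
    return bio != NULL && bio->vcnt == 1 && bio->idx == 0;
}
```

A single-page bio passes the check; a bio carrying, say, 31 pages does not, so any caller that needs to cut such a bio in two has no safe way to do it with this helper.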

>     [  424.248476] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
>     [  424.248558] CPU 3
>     [  424.248577] Modules linked in: netconsole configfs reiserfs f71882fg coretemp raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx linear i915 snd_pcm snd_timer mpt2sas snd drm_kms_helper firewire_ohci soundcore usblp snd_page_alloc tpm_tis drm firewire_core scsi_transport_sas pcspkr crc_itu_t pata_jmicron raid_class iTCO_wdt r8169 iTCO_vendor_support i2c_i801 i2c_algo_bit mei(C) video
>     [  424.250095]
>     [  424.250119] Pid: 18, comm: kworker/3:0 Tainted: G         C   3.2.1-gentoo #9 MSI MS-7637/H55-GD65 (MS-7637)  
>     [  424.250202] RIP: 0010:[<ffffffff811b114f>]  [<ffffffff811b114f>] bio_split+0x2bf/0x2d0
>     [  424.250254] RSP: 0018:ffff88022e305c20  EFLAGS: 00010206
>     [  424.250282] RAX: ffff88022b7f19e0 RBX: ffff88009fbb2200 RCX: 000000010027000e
>     [  424.250313] RDX: ffff8800a77416f8 RSI: 0000000000000001 RDI: 0000000000000282
>     [  424.250344] RBP: ffff88022e305c70 R08: ffff8800a77465b8 R09: ffff880233007640
>     [  424.250375] R10: ffffea00029dd020 R11: 00000000000000ff R12: 0000000000000080
>     [  424.250407] R13: ffff8800a77416f8 R14: 0000000000000001 R15: ffffea0002b0bfc0
>     [  424.250438] FS:  0000000000000000(0000) GS:ffff88023bcc0000(0000) knlGS:0000000000000000
>     [  424.250475] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>     [  424.250502] CR2: 0000000001df3000 CR3: 0000000001c05000 CR4: 00000000000006e0
>     [  424.250533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>     [  424.250564] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>     [  424.250595] Process kworker/3:0 (pid: 18, threadinfo ffff88022e304000, task ffff88022e2fb040)
>     [  424.250629] Stack:
>     [  424.250647]  ffff880219905050 0000000000001000 ffff88022b7f19e0 baa1821019905070
>     [  424.250729]  0003120200001000 ffff88009fbb2200 ffff88022a524f38 000000000001f000
>     [  424.250813]  0000000000000001 ffffea0002b0bfc0 ffff88022e305cd0 ffffffffa006e626
>     [  424.250896] Call Trace:
>     [  424.250918]  [<ffffffffa006e626>] linear_make_request+0x106/0x190 [linear]
>     [  424.250953]  [<ffffffff8137576b>] ? generic_make_request_checks+0x1eb/0x370
>     [  424.250988]  [<ffffffff815637e6>] md_make_request+0xc6/0x200
>     [  424.251017]  [<ffffffff813759b7>] generic_make_request+0xc7/0x100
>     [  424.251050]  [<ffffffff8157ebe4>] kcryptd_crypt_write_io_submit+0x44/0xc0
>     [  424.251082]  [<ffffffff8157f160>] kcryptd_crypt+0x280/0x3d0
>     [  424.251112]  [<ffffffff8157eee0>] ? crypt_convert_init.isra.17+0x60/0x60
>     [  424.251146]  [<ffffffff8108dfda>] process_one_work+0x11a/0x480
>     [  424.251176]  [<ffffffff8108ed64>] worker_thread+0x164/0x370
>     [  424.251205]  [<ffffffff8108ec00>] ? manage_workers.isra.30+0x230/0x230
>     [  424.251889]  [<ffffffff810934dc>] kthread+0x8c/0xa0
>     [  424.251977]  [<ffffffff816d8e34>] kernel_thread_helper+0x4/0x10
>     [  424.252009]  [<ffffffff81093450>] ? flush_kthread_worker+0xa0/0xa0
>     [  424.252040]  [<ffffffff816d8e30>] ? gs_change+0x13/0x13
>     [  424.252068] Code: 48 89 da 49 83 c6 10 8b 4d cc 48 8b 75 c0 ff d0 48 8b 55 b8 4c 89 f0 4c 29 f8 48 8b 44 02 f0 48 85 c0 75 d8 e9 9f fd ff ff 0f 0b <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5
>     [  424.253585] RIP  [<ffffffff811b114f>] bio_split+0x2bf/0x2d0
>     [  424.253626]  RSP <ffff88022e305c20>
>     [  424.312992] ---[ end trace c8048857547cd8da ]---
>     [  424.313134] BUG: unable to handle kernel paging request at fffffffffffffff8
>     [  424.313225] IP: [<ffffffff81093971>] kthread_data+0x11/0x20
>     [  424.313279] PGD 1c07067 PUD 1c08067 PMD 0
>     [  424.313357] Oops: 0000 [#2] SMP DEBUG_PAGEALLOC
>     [  424.313437] CPU 3
>     [  424.313459] Modules linked in: netconsole configfs reiserfs f71882fg coretemp raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx linear i915 snd_pcm snd_timer mpt2sas snd drm_kms_helper firewire_ohci soundcore usblp snd_page_alloc tpm_tis drm firewire_core scsi_transport_sas pcspkr crc_itu_t pata_jmicron
> 
> 
> _______________________________________________
> dm-crypt mailing list
> dm-crypt@saout.de
> http://www.saout.de/mailman/listinfo/dm-crypt


* Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
  2012-01-25  2:14 ` Mandeep Singh Baines
@ 2012-01-25  8:06   ` Luzipher McLeod
  2012-01-25  9:08     ` Milan Broz
  2012-01-25 18:20     ` Mandeep Singh Baines
  0 siblings, 2 replies; 11+ messages in thread
From: Luzipher McLeod @ 2012-01-25  8:06 UTC (permalink / raw)
  To: Mandeep Singh Baines; +Cc: dm-crypt

Hi Mandeep,

Thanks for your quick answer. So, what can be done about this? Should I try to apply the patch you linked to? (I guess a patch from 2007 won't apply cleanly, though ...)

Regards,
Luzipher



--- On Wed, 25/1/12, Mandeep Singh Baines <msb@chromium.org> wrote:

> From: Mandeep Singh Baines <msb@chromium.org>
> Subject: Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
> To: "Luzipher McLeod" <luziphermcleod@yahoo.ie>
> Cc: dm-crypt@saout.de, "NeilBrown" <neilb@suse.de>
> Date: Wednesday, 25 January, 2012, 2:14
> Luzipher McLeod (luziphermcleod@yahoo.ie)
> wrote:
> > Hi :-)
> > 
> > A few days ago I encountered a kernel bug while copying
> files to an encrypted filesystem. The specific stack for the
> filesystem is: btrfs-on-crypt-on-mdraid. Vast amounts of
> data copied without problems (about 6.3TB with 1.1 TB
> remaining), but when copying a certain directory, the kernel
> bug surfaces. I repeatedly deleted the affected directory
> and tried to re-copy it, but it always fails at the same
> point (or close to that). More recent test showed that I
> could copy a few more files to the filesystem to a different
> directory, but it very quickly failed there as well (a few
> megabytes later).
> > After talking to the btrfs devs on freenode (as btrfs
> is the most experimental thing in the stack, they came to
> the conclusion that it's most probably the crypto layer.
> > 
> > Some details:
> > gentoo kernel 3.2.1 (custom config and ubuntu config)
> > mdraid: linear, 4 disks, each 2TB (total 8TB)
> > crypt: setup via cryptsetup -c aes-xts-plain64 -h plain
> -s 512 -d - create tempraid /dev/md/tempraid_lin
> > 
> > I'd appreciate any help with this and would be happy to
> test patches or provide more debug info.
> > 
> > Thanks and Regards,
> > Luzipher
> > 
> > 
> > 
> > 
> > The kernel bug output retrieved by netconsole (also at
> http://pastebin.com/sjJy7QE4 ):
> >     [  294.538422] netconsole:
> local port 6666
> >     [  333.423583] SysRq :
> Changing Loglevel
> >     [  333.423609] Loglevel
> set to 9
> >     [  424.248405]
> ------------[ cut here ]------------
> >     [  424.248447] kernel BUG
> at fs/bio.c:1499!
> 
> Hi Luzipher,
> 
> Looks like the BUG is because bio_split only works on
> single-page iovecs.
> 
> I see a relevant (old) patch from Neil Brown here:
> 
> https://lkml.org/lkml/2007/7/30/496
> 
> Regards,
> Mandeep
> 



* Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
  2012-01-25  8:06   ` Luzipher McLeod
@ 2012-01-25  9:08     ` Milan Broz
  2012-01-25 18:20     ` Mandeep Singh Baines
  1 sibling, 0 replies; 11+ messages in thread
From: Milan Broz @ 2012-01-25  9:08 UTC (permalink / raw)
  To: Luzipher McLeod; +Cc: dm-crypt, Mandeep Singh Baines

On 01/25/2012 09:06 AM, Luzipher McLeod wrote:
> Hi Mandeep,
>
> Thanks for your quick answer. So, what can be done about this?
> Should I try to apply the patch you linked to? (I guess a patch
> from 2007 won't apply cleanly, though ...)

bio_split is called from linear_make_request, which is MD raid code.
(MD linear is very rarely used IMHO; device-mapper & LVM usually work
better in this situation. Can you try to reproduce it using DM?)

Anyway, this seems like an MD bug; please forward it to the MD list,
linux-raid@vger.kernel.org

(Please cc me; if it uncovers some bug in dmcrypt, I'll fix it.)

Milan

p.s.
> After talking to the btrfs devs on freenode (as btrfs is the most
> experimental thing in the stack), they came to the conclusion
> that it's most probably the crypto layer.

They have been repeating this for years, every time they see dmcrypt in the
stack, even though they have not been able to provide a reliable reproducer.
We have no such reports for ext3, ext4, xfs or any other fs.


* Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
  2012-01-25  8:06   ` Luzipher McLeod
  2012-01-25  9:08     ` Milan Broz
@ 2012-01-25 18:20     ` Mandeep Singh Baines
  2012-01-25 23:46         ` Mandeep Singh Baines
  1 sibling, 1 reply; 11+ messages in thread
From: Mandeep Singh Baines @ 2012-01-25 18:20 UTC (permalink / raw)
  To: Luzipher McLeod; +Cc: dm-crypt, Mandeep Singh Baines

Luzipher McLeod (luziphermcleod@yahoo.ie) wrote:
> Hi Mandeep,
> 
> Thanks for your quick answer. So, what can be done about this? Should I try to apply the patch you linked to? (I guess a patch from 2007 won't apply cleanly, though ...)
> 

Hi Luzipher,

I wouldn't apply the patch directly. Just copy bio_multi_split (might
need to do some forward porting) and then modify linear_make_request to
use bio_multi_split instead of bio_split.

But I'm not really an expert on this particular code. I'm hoping someone
else will confirm that this is in fact the bug and not a side effect of
something else. It seems reasonable that you could get a bio that is
multi-page and falls on a boundary (spans two or more devices). So I
suspect this is the bug.

Regards,
Mandeep
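
The boundary case described above can be sketched in userspace as follows. This is an illustrative model, not the MD source: `model_crosses_boundary()` and `model_first_sectors()` are hypothetical helpers, sectors are assumed to be 512 bytes, and `dev_end_sector` is assumed to be the first sector past the member device that holds the start of the request.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when a request starting at start_sector and covering size_bytes
 * extends past the end of the member device, i.e. it would have to be
 * split before being passed down. Assumes 512-byte sectors.
 */
static bool model_crosses_boundary(uint64_t start_sector,
                                   uint32_t size_bytes,
                                   uint64_t dev_end_sector)
{
    return start_sector + (size_bytes >> 9) > dev_end_sector;
}

/*
 * Number of sectors that still fit on the first member device; the
 * remainder would go to the next device in the linear array.
 */
static uint64_t model_first_sectors(uint64_t start_sector,
                                    uint64_t dev_end_sector)
{
    return dev_end_sector - start_sector;
}
```

With a member device ending at sector 1000, a 4 KiB request at sector 992 ends exactly on the boundary and needs no split, while the same request at sector 996 crosses it, leaving 4 sectors on the first device. A multi-page request that crosses in this way is the situation suspected here.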

> Regards,
> Luzipher
> 
> 
> 
> --- On Wed, 25/1/12, Mandeep Singh Baines <msb@chromium.org> wrote:
> 
> > From: Mandeep Singh Baines <msb@chromium.org>
> > Subject: Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
> > To: "Luzipher McLeod" <luziphermcleod@yahoo.ie>
> > Cc: dm-crypt@saout.de, "NeilBrown" <neilb@suse.de>
> > Date: Wednesday, 25 January, 2012, 2:14
> > Luzipher McLeod (luziphermcleod@yahoo.ie)
> > wrote:
> > > Hi :-)
> > > 
> > > A few days ago I encountered a kernel bug while copying
> > files to an encrypted filesystem. The specific stack for the
> > filesystem is: btrfs-on-crypt-on-mdraid. Vast amounts of
> > data copied without problems (about 6.3TB with 1.1 TB
> > remaining), but when copying a certain directory, the kernel
> > bug surfaces. I repeatedly deleted the affected directory
> > and tried to re-copy it, but it always fails at the same
> > point (or close to that). More recent test showed that I
> > could copy a few more files to the filesystem to a different
> > directory, but it very quickly failed there as well (a few
> > megabytes later).
> > > After talking to the btrfs devs on freenode (as btrfs
> > is the most experimental thing in the stack, they came to
> > the conclusion that it's most probably the crypto layer.
> > > 
> > > Some details:
> > > gentoo kernel 3.2.1 (custom config and ubuntu config)
> > > mdraid: linear, 4 disks, each 2TB (total 8TB)
> > > crypt: setup via cryptsetup -c aes-xts-plain64 -h plain
> > -s 512 -d - create tempraid /dev/md/tempraid_lin
> > > 
> > > I'd appreciate any help with this and would be happy to
> > test patches or provide more debug info.
> > > 
> > > Thanks and Regards,
> > > Luzipher
> > > 
> > > 
> > > 
> > > 
> > > The kernel bug output retrieved by netconsole (also at
> > http://pastebin.com/sjJy7QE4 ):
> > >     [  294.538422] netconsole:
> > local port 6666
> > >     [  333.423583] SysRq :
> > Changing Loglevel
> > >     [  333.423609] Loglevel
> > set to 9
> > >     [  424.248405]
> > ------------[ cut here ]------------
> > >     [  424.248447] kernel BUG
> > at fs/bio.c:1499!
> > 
> > Hi Luzipher,
> > 
> > Looks like the BUG is because bio_split only works on
> > single-page iovecs.
> > 
> > I see a relevant (old) patch from Neil Brown here:
> > 
> > https://lkml.org/lkml/2007/7/30/496
> > 
> > Regards,
> > Mandeep
> > 
> 


* Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
  2012-01-25 18:20     ` Mandeep Singh Baines
@ 2012-01-25 23:46         ` Mandeep Singh Baines
  0 siblings, 0 replies; 11+ messages in thread
From: Mandeep Singh Baines @ 2012-01-25 23:46 UTC (permalink / raw)
  To: Mandeep Singh Baines; +Cc: dm-crypt, linux-raid, Luzipher McLeod

+cc linux-raid

Mandeep Singh Baines (msb@chromium.org) wrote:
> Luzipher McLeod (luziphermcleod@yahoo.ie) wrote:
> > Hi Mandeep,
> > 
> > Thanks for your quick answer. So, what can be done about this? Should I try to apply the patch you linked to? (I guess a patch from 2007 won't apply cleanly, though ...)
> > 
> 
> Hi Luzipher,
> 
> I wouldn't apply the patch directly. Just copy bio_multi_split (might
> need to do some forward porting) and then modify linear_make_request to
> use bio_multi_split instead of bio_split.
> 
> But I'm not really an expert on this particular code. I'm hoping someone
> else will confirm that this is in fact the bug and not a side effect of
> something else. It seems reasonable that you could get a bio that is
> multi-page and falls on a boundary (spans two or more devices). So I
> suspect this is the bug.
> 
> Regards,
> Mandeep
> 
> > Regards,
> > Luzipher
> > 
> > 
> > 
> > --- On Wed, 25/1/12, Mandeep Singh Baines <msb@chromium.org> wrote:
> > 
> > > From: Mandeep Singh Baines <msb@chromium.org>
> > > Subject: Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
> > > To: "Luzipher McLeod" <luziphermcleod@yahoo.ie>
> > > Cc: dm-crypt@saout.de, "NeilBrown" <neilb@suse.de>
> > > Date: Wednesday, 25 January, 2012, 2:14
> > > Luzipher McLeod (luziphermcleod@yahoo.ie)
> > > wrote:
> > > > Hi :-)
> > > > 
> > > > A few days ago I encountered a kernel bug while copying
> > > files to an encrypted filesystem. The specific stack for the
> > > filesystem is: btrfs-on-crypt-on-mdraid. Vast amounts of
> > > data copied without problems (about 6.3TB with 1.1 TB
> > > remaining), but when copying a certain directory, the kernel
> > > bug surfaces. I repeatedly deleted the affected directory
> > > and tried to re-copy it, but it always fails at the same
> > > point (or close to that). More recent test showed that I
> > > could copy a few more files to the filesystem to a different
> > > directory, but it very quickly failed there as well (a few
> > > megabytes later).
> > > > After talking to the btrfs devs on freenode (as btrfs
> > > is the most experimental thing in the stack, they came to
> > > the conclusion that it's most probably the crypto layer.
> > > > 
> > > > Some details:
> > > > gentoo kernel 3.2.1 (custom config and ubuntu config)
> > > > mdraid: linear, 4 disks, each 2TB (total 8TB)
> > > > crypt: setup via cryptsetup -c aes-xts-plain64 -h plain
> > > -s 512 -d - create tempraid /dev/md/tempraid_lin
> > > > 
> > > > I'd appreciate any help with this and would be happy to
> > > test patches or provide more debug info.
> > > > 
> > > > Thanks and Regards,
> > > > Luzipher
> > > > 
> > > > 
> > > > 
> > > > 
> > > > The kernel bug output retrieved by netconsole (also at
> > > http://pastebin.com/sjJy7QE4 ):
> > > >     [  294.538422] netconsole:
> > > local port 6666
> > > >     [  333.423583] SysRq :
> > > Changing Loglevel
> > > >     [  333.423609] Loglevel
> > > set to 9
> > > >     [  424.248405]
> > > ------------[ cut here ]------------
> > > >     [  424.248447] kernel BUG
> > > at fs/bio.c:1499!
> > > 
> > > Hi Luzipher,
> > > 
> > > Looks like the BUG is because bio_split only works on
> > > single-page iovecs.
> > > 
> > > I see a relevant (old) patch from Neil Brown here:
> > > 
> > > https://lkml.org/lkml/2007/7/30/496
> > > 
> > > Regards,
> > > Mandeep
> > > 
> > 



* Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
@ 2012-01-30  0:46           ` Luzipher McLeod
  0 siblings, 0 replies; 11+ messages in thread
From: Luzipher McLeod @ 2012-01-30  0:46 UTC (permalink / raw)
  To: Mandeep Singh Baines; +Cc: dm-crypt, linux-raid

Hi Mandeep,

Sorry for the long delay; I couldn't get around to trying this earlier.
I did as you suggested and copied the bio_multi_split function into bio.c, and added its signature to bio.h (as done by the original patch by Neil Brown). Then I replaced the call to bio_split with bio_multi_split in linear_make_request.
Unfortunately, I failed at the forward-porting. Compilation results in:

fs/bio.c: In function 'bio_multi_split':
fs/bio.c:1585:23: error: 'struct bio' has no member named 'bi_iocnt'
fs/bio.c:1591:8: error: 'struct bio' has no member named 'bi_offset'
fs/bio.c:1592:15: error: 'struct bio' has no member named 'bi_offset'
fs/bio.c:1593:5: error: 'struct bio' has no member named 'bi_offset'
fs/bio.c:1598:12: error: 'struct bio' has no member named 'bi_offset'
make[1]: *** [fs/bio.o] Error 1
make: *** [fs] Error 2

Further examination reveals that 'struct bio', as defined in blk_types.h, really doesn't have those members anymore, and I can't determine an easy replacement for them. The original patch as a whole also doesn't touch 'struct bio', so I'm at a loss as to what to do.

Thanks for any pointers or hints on how to resolve this!

Regards,
Luzipher




--- On Wed, 25/1/12, Mandeep Singh Baines <msb@chromium.org> wrote:

> From: Mandeep Singh Baines <msb@chromium.org>
> Subject: Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
> To: "Mandeep Singh Baines" <msb@chromium.org>
> Cc: "Luzipher McLeod" <luziphermcleod@yahoo.ie>, dm-crypt@saout.de, linux-raid@vger.kernel.org
> Date: Wednesday, 25 January, 2012, 23:46
> +cc linux-raid
> 
> Mandeep Singh Baines (msb@chromium.org)
> wrote:
> > Luzipher McLeod (luziphermcleod@yahoo.ie)
> wrote:
> > > Hi Mandeep,
> > > 
> > > Thanks for your quick answer. So, what can be done
> about this? Should I try to apply the patch you linked to?
> (I guess a patch from 2007 won't apply cleanly, though ...)
> > > 
> > 
> > Hi Luzipher,
> > 
> > I wouldn't apply the patch directly. Just copy
> bio_multi_split (might
> > need to do some forward porting) and then modify
> linear_make_request to
> > use bio_multi_split instead of bio_split.
> > 
> > But I'm not really an expert on this particular code.
> I'm hoping someone
> > else will confirm that this is in fact the bug and not
> a side effect of
> > something else. It seems reasonable that you could get a
> bio that is
> > multi-page and falls on a boundary (spans two or more
> devices). So I
> > suspect this is the bug.
> > 
> > Regards,
> > Mandeep
> > 
> > > Regards,
> > > Luzipher
> > > 
> > > 
> > > 
> > > --- On Wed, 25/1/12, Mandeep Singh Baines <msb@chromium.org>
> wrote:
> > > 
> > > > From: Mandeep Singh Baines <msb@chromium.org>
> > > > Subject: Re: [dm-crypt] Kernel BUG
> (fs/bio.c:1499) when copying more files to an encrypted
> device
> > > > To: "Luzipher McLeod" <luziphermcleod@yahoo.ie>
> > > > Cc: dm-crypt@saout.de,
> "NeilBrown" <neilb@suse.de>
> > > > Date: Wednesday, 25 January, 2012, 2:14
> > > > Luzipher McLeod (luziphermcleod@yahoo.ie)
> > > > wrote:
> > > > > Hi :-)
> > > > > 
> > > > > A few days ago I encountered a kernel
> bug while copying
> > > > files to an encrypted filesystem. The
> specific stack for the
> > > > filesystem is: btrfs-on-crypt-on-mdraid.
> Vast amounts of
> > > > data copied without problems (about 6.3TB
> with 1.1 TB
> > > > remaining), but when copying a certain
> directory, the kernel
> > > > bug surfaces. I repeatedly deleted the
> affected directory
> > > > and tried to re-copy it, but it always fails
> at the same
> > > > point (or close to that). More recent test
> showed that I
> > > > could copy a few more files to the filesystem
> to a different
> > > > directory, but it very quickly failed there
> as well (a few
> > > > megabytes later).
> > > > > After talking to the btrfs devs on
> freenode (as btrfs
> > > > is the most experimental thing in the stack,
> they came to
> > > > the conclusion that it's most probably the
> crypto layer.
> > > > > 
> > > > > Some details:
> > > > > gentoo kernel 3.2.1 (custom config and
> ubuntu config)
> > > > > mdraid: linear, 4 disks, each 2TB (total
> 8TB)
> > > > > crypt: setup via cryptsetup -c
> aes-xts-plain64 -h plain
> > > > -s 512 -d - create tempraid
> /dev/md/tempraid_lin
> > > > > 
> > > > > I'd appreciate any help with this and
> would be happy to
> > > > test patches or provide more debug info.
> > > > > 
> > > > > Thanks and Regards,
> > > > > Luzipher
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > The kernel bug output retrieved by
> netconsole (also at
> > > > http://pastebin.com/sjJy7QE4 ):
> > > > >     [  294.538422] netconsole:
> > > > local port 6666
> > > > >     [  333.423583] SysRq :
> > > > Changing Loglevel
> > > > >     [  333.423609] Loglevel
> > > > set to 9
> > > > >     [  424.248405]
> > > > ------------[ cut here ]------------
> > > > >     [  424.248447] kernel BUG
> > > > at fs/bio.c:1499!
> > > > 
> > > > Hi Luzipher,
> > > > 
> > > > Looks like the BUG is because bio_split only
> works on
> > > > single-page iovecs.
> > > > 
> > > > I see a relevant (old) patch from Neil Brown
> here:
> > > > 
> > > > https://lkml.org/lkml/2007/7/30/496
> > > > 
> > > > Regards,
> > > > Mandeep
> > > > 
> > > 
>


* Re: Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
@ 2012-01-30  0:46           ` Luzipher McLeod
  0 siblings, 0 replies; 11+ messages in thread
From: Luzipher McLeod @ 2012-01-30  0:46 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: dm-crypt-4q3lyFh4P1g, linux-raid-u79uwXL29TY76Z2rM5mHXA

Hi Mandeep,

Sorry for the long delay; I couldn't get around to trying this earlier.
I did as you suggested: I copied the bio_multi_split function into bio.c and added its prototype to bio.h (as Neil Brown's original patch does), then replaced the call to bio_split with bio_multi_split in linear_make_request.
Unfortunately, my forward port fails to compile:

fs/bio.c: In function 'bio_multi_split':
fs/bio.c:1585:23: error: 'struct bio' has no member named 'bi_iocnt'
fs/bio.c:1591:8: error: 'struct bio' has no member named 'bi_offset'
fs/bio.c:1592:15: error: 'struct bio' has no member named 'bi_offset'
fs/bio.c:1593:5: error: 'struct bio' has no member named 'bi_offset'
fs/bio.c:1598:12: error: 'struct bio' has no member named 'bi_offset'
make[1]: *** [fs/bio.o] Error 1
make: *** [fs] Error 2

On closer examination, 'struct bio' as defined in blk_types.h really no longer has those members, and I can't find an obvious replacement for them. The original patch doesn't touch 'struct bio' either, so I'm at a loss as to what to do.
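
To make the splitting problem concrete, here is a minimal user-space sketch in C; it is not kernel code. It assumes a request is simply an array of (offset, length) segments and cuts it at a byte boundary, dividing any segment that straddles the cut. The names `struct seg` and `split_segments` are made up for illustration: they are not the kernel's structures or Neil Brown's bio_multi_split, and the sketch says nothing about what replaced 'bi_iocnt' or 'bi_offset'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one I/O segment (NOT a kernel structure). */
struct seg {
    size_t offset; /* byte offset inside its page */
    size_t len;    /* number of bytes in this segment */
};

/*
 * Split the request in[0..n) at byte position split_at, counted from the
 * start of the request. Whole segments before the boundary go to front,
 * whole segments after it go to back, and a segment that straddles the
 * boundary is cut in two. front and back must each have room for n
 * entries. Returns the number of segments written to front and stores
 * the number written to back in *n_back.
 */
size_t split_segments(const struct seg *in, size_t n, size_t split_at,
                      struct seg *front, struct seg *back, size_t *n_back)
{
    size_t nf = 0, nb = 0, pos = 0;

    assert(in && front && back && n_back);

    for (size_t i = 0; i < n; i++) {
        size_t end = pos + in[i].len;

        if (end <= split_at) {
            /* Entirely before the boundary. */
            front[nf++] = in[i];
        } else if (pos >= split_at) {
            /* Entirely after the boundary. */
            back[nb++] = in[i];
        } else {
            /* Straddles the boundary: cut the segment in two. */
            size_t head = split_at - pos;

            front[nf].offset = in[i].offset;
            front[nf].len = head;
            nf++;

            back[nb].offset = in[i].offset + head;
            back[nb].len = in[i].len - head;
            nb++;
        }
        pos = end;
    }

    *n_back = nb;
    return nf;
}
```

For three 4096-byte segments split at byte 6144, the front part gets one full segment plus 2048 bytes, and the back part gets the remaining 2048 bytes plus one full segment.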

Thanks for any pointers or hints on how to resolve this!

Regards,
Luzipher






* Re: [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device
  2012-01-30  0:46           ` Luzipher McLeod
@ 2012-01-31 23:26             ` Luzipher McLeod
  -1 siblings, 0 replies; 11+ messages in thread
From: Luzipher McLeod @ 2012-01-31 23:26 UTC (permalink / raw)
  To: linux-raid; +Cc: dm-crypt, Mandeep Singh Baines

Hi list,

I just wanted to know whether there is interest in solving this bug (described earlier in this thread). I'd need some help creating a working patch for bio_multi_split so that I can test whether it solves my problem.
If nobody has any further suggestions, I'll just use LVM instead of linear, as it's only a temporary backup anyway.

Thanks and regards,
Luzipher






end of thread, other threads:[~2012-01-31 23:26 UTC | newest]

Thread overview: 11+ messages
2012-01-24 23:45 [dm-crypt] Kernel BUG (fs/bio.c:1499) when copying more files to an encrypted device Luzipher McLeod
2012-01-25  2:14 ` Mandeep Singh Baines
2012-01-25  8:06   ` Luzipher McLeod
2012-01-25  9:08     ` Milan Broz
2012-01-25 18:20     ` Mandeep Singh Baines
2012-01-25 23:46       ` Mandeep Singh Baines
2012-01-25 23:46         ` Mandeep Singh Baines
2012-01-30  0:46         ` Luzipher McLeod
2012-01-30  0:46           ` Luzipher McLeod
2012-01-31 23:26           ` [dm-crypt] " Luzipher McLeod
2012-01-31 23:26             ` Luzipher McLeod
