* Raid 5/10 discard support broken in 3.8.2
@ 2013-03-06 4:20 Dave Cundiff
From: Dave Cundiff @ 2013-03-06 4:20 UTC (permalink / raw)
To: Linux MDADM Raid
Hi all,
It appears the Raid 5/10 discard support does not work in the mainline kernel.
I've been trying to backport it to a RHEL 6 kernel without success. I
finally managed to set up a mainline dev box and discovered it doesn't
work there either!
I'm now testing on a stock 3.8.2 kernel. The drives I'm using are
Samsung 840 Pros hanging off an LSI 9211-8i. No backplane, and each
drive has a dedicated channel. No RAID on the LSI; it's just an HBA.
I added a few kprints to blk-lib.c:blkdev_issue_discard to see a few
variables that I thought were the issue. The version string says el6;
however, no Red Hat patches are applied.
--- linux-3.8.2-1.el6.x86_64.orig/block/blk-lib.c 2013-03-03
17:04:08.000000000 -0500
+++ linux-3.8.2-1.el6.x86_64/block/blk-lib.c 2013-03-05
22:05:38.181591562 -0500
@@ -58,17 +58,21 @@
/* Zero-sector (unknown) and one-sector granularities are the same. */
granularity = max(q->limits.discard_granularity >> 9, 1U);
+ printk("granularity: %d\n", (int)granularity);
alignment = bdev_discard_alignment(bdev) >> 9;
alignment = sector_div(alignment, granularity);
-
+ printk("alignment: %d\n", (int)alignment);
/*
* Ensure that max_discard_sectors is of the proper
* granularity, so that requests stay aligned after a split.
*/
max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+ printk("max_discard_sectors: %d\n", (int)max_discard_sectors);
sector_div(max_discard_sectors, granularity);
max_discard_sectors *= granularity;
+ printk("max_discard_sectors: %d\n", (int)max_discard_sectors);
if (unlikely(!max_discard_sectors)) {
+ printk("Discard disabled\n");
/* Avoid infinite loop below. Being cautious never hurts. */
return -EOPNOTSUPP;
}
My tests were done by running mkfs.ext4 /dev/md126. On a device that
supports discard, it should first discard the whole device and then
format. In most of the tests it either did not attempt the discard or
the kernel crashed.
This Raid10 does not discard:
mdadm -C /dev/md126 -n6 -l10 -c512 --assume-clean /dev/sda3 /dev/sdb3
/dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
My kprints output:
granularity: 65535
alignment: 52284
max_discard_sectors: 1024
max_discard_sectors: 0
Discard disabled
max_discard_sectors ends up zero and support is disabled.
max_discard_sectors seems to be equal to the chunk size, and since it
is rounded down to a multiple of the granularity, anything smaller
than the granularity becomes 0. So I doubled the chunk size until
discard started working.
This Raid10 does discard (note the huge chunk size):
mdadm -C /dev/md126 -n6 -l10 -c65536 --assume-clean /dev/sda3
/dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
My kprints scroll the following, since the discards seem to make it
all the way to the disks:
granularity: 65535
alignment: 52284
max_discard_sectors: 131072
max_discard_sectors: 131070
It appears max_discard_sectors is set from
q->limits.max_discard_sectors, which itself is set at line 3570 of
raid10.c:
blk_queue_max_discard_sectors(mddev->queue,
mddev->chunk_sectors);
From the little I think I know, I believe it needs to be a multiple of
chunk_sectors but no greater than the device size. If a large discard
comes in, won't the raid10 code simply split it into smaller bios?
As for Raid5, that just explodes on a BUG.
This Raid5:
mdadm -C /dev/md126 -n6 -l5 --assume-clean /dev/sda3 /dev/sdb3
/dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
outputs two sets of kprints:
granularity: 65535
alignment: 42966
max_discard_sectors: 8388607
max_discard_sectors: 8388480
granularity: 65535
alignment: 42966
max_discard_sectors: 8388607
max_discard_sectors: 8388480
and then dies on a BUG
------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi_lib.c:1028!
invalid opcode: 0000 [#1] SMP
Modules linked in: raid456 async_raid6_recov async_pq raid6_pq
async_xor xor async_memcpy async_tx xt_REDIRECT ipt_MASQUERADE
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_DSCP iptable_mangle iptable_filter nf_conntrack_ftp
nf_conntrack_irc xt_TCPMSS xt_owner xt_mac xt_length xt_ecn xt_LOG
xt_recent xt_limit xt_multiport xt_conntrack ipt_ULOG ipt_REJECT
ip_tables sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
nf_conntrack ip6table_filter ip6_tables ext3 jbd dm_mod gpio_ich
iTCO_wdt iTCO_vendor_support coretemp hwmon acpi_cpufreq freq_table
mperf kvm_intel kvm microcode serio_raw pcspkr i2c_i801 lpc_ich
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm
snd_timer snd soundcore snd_page_alloc ioatdma dca i7core_edac
edac_core sg ext4 mbcache jbd2 raid1 raid10 sd_mod crc_t10dif
crc32c_intel pata_acpi ata_generic ata_piix e1000e mpt2sas
scsi_transport_sas raid_class mgag200 ttm drm_kms_helper be2iscsi
bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio
libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi
CPU 7
Pid: 6993, comm: md127_raid5 Not tainted 3.8.2-1.el6.x86_64 #2
Supermicro X8DTL/X8DTL
RIP: 0010:[<ffffffff813fe5e2>] [<ffffffff813fe5e2>] scsi_init_sgtable+0x62/0x70
RSP: 0018:ffff88032d9e5a98 EFLAGS: 00010006
RAX: 000000000000007f RBX: ffff88062bbd0d90 RCX: ffff88032ccc1808
RDX: ffff8805618ed080 RSI: ffffea000b202540 RDI: 0000000000000000
RBP: ffff88032d9e5aa8 R08: 0000160000000000 R09: 000000032df23000
R10: 000000032dc18000 R11: 0000000000000000 R12: ffff88062bbf1518
R13: 0000000000000000 R14: 0000000000000020 R15: 000000000007f000
FS: 0000000000000000(0000) GS:ffff88063fc60000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002024360 CR3: 000000032ed69000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process md127_raid5 (pid: 6993, threadinfo ffff88032d9e4000, task
ffff88032c30e040)
Stack:
ffff88062bbf14c0 ffff88062bbd0d90 ffff88032d9e5af8 ffffffff813fe89d
ffff88032cdbe800 0000000000000086 ffff88032d9e5af8 ffff88062bbd0d90
ffff88062bbf14c0 0000000000000000 ffff88032cdbe800 000000000007f000
Call Trace:
[<ffffffff813fe89d>] scsi_init_io+0x3d/0x170
[<ffffffff813feb44>] scsi_setup_blk_pc_cmnd+0x94/0x180
[<ffffffffa023d1f2>] sd_setup_discard_cmnd+0x182/0x270 [sd_mod]
[<ffffffffa023d378>] sd_prep_fn+0x98/0xbd0 [sd_mod]
[<ffffffff8129ae00>] ? list_sort+0x1b0/0x3c0
[<ffffffff8126ba1e>] blk_peek_request+0xce/0x220
[<ffffffff813fddd0>] scsi_request_fn+0x60/0x540
[<ffffffff8126a5e7>] __blk_run_queue+0x37/0x50
[<ffffffff8126abae>] queue_unplugged+0x4e/0xb0
[<ffffffff8126bcf6>] blk_flush_plug_list+0x156/0x230
[<ffffffff8126bde8>] blk_finish_plug+0x18/0x50
[<ffffffffa067b602>] raid5d+0x282/0x2a0 [raid456]
[<ffffffff8149d1f7>] md_thread+0x117/0x150
[<ffffffff8107bfd0>] ? wake_up_bit+0x40/0x40
[<ffffffff8149d0e0>] ? md_rdev_init+0x110/0x110
[<ffffffff8107b73e>] kthread+0xce/0xe0
[<ffffffff8107b670>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815dbeec>] ret_from_fork+0x7c/0xb0
[<ffffffff8107b670>] ? kthread_freezable_should_stop+0x70/0x70
Code: 49 8b 14 24 e8 f0 31 e7 ff 41 3b 44 24 08 77 1b 41 89 44 24 08
8b 43 54 41 89 44 24 10 31 c0 5b 41 5c c9 c3 b8 02 00 00 00 eb f4 <0f>
0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66
RIP [<ffffffff813fe5e2>] scsi_init_sgtable+0x62/0x70
RSP <ffff88032d9e5a98>
---[ end trace 5aea2a41495b91fc ]---
Kernel panic - not syncing: Fatal exception
That BUG is in
/*
* Next, walk the list, and fill in the addresses and sizes of
* each segment.
*/
count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
BUG_ON(count > sdb->table.nents);
sdb->table.nents = count;
sdb->length = blk_rq_bytes(req);
return BLKPREP_OK;
WAAAY over my head.
So at this point I'm unsure how to continue. My total time in kernel
code numbers in hours (maybe days). :)
My backport to RHEL works if I increase the chunk size to 65536 as
well. I could go with that, but I'm fairly certain such huge chunks
would cause I/O issues even on a crazy fast SSD array.
--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
* Re: Raid 5/10 discard support broken in 3.8.2
From: Roy Sigurd Karlsbakk @ 2013-03-07 10:02 UTC (permalink / raw)
To: Dave Cundiff; +Cc: Linux MDADM Raid
> It appears the Raid 5/10 discard support does not work in the mainline
> kernel.
>
> I've been trying to backport it to a RHEL 6 kernel without success. I
> finally managed to setup a mainline dev box and discovered it doesn't
> work on it either!
>
> I'm now testing on a stock 3.8.2 kernel. The drives I'm using are
> Samsung 840 Pro's hanging off an LSI 9211-8i. No backplane and each
> drive has a dedicated channel. No RAID on the LSI, its just an HBA.
I'm not sure TRIM/UNMAP is supported on that controller, even in HBA mode (and with IT firmware). I've also seen scterc fail on that controller while working with an ordinary "dumb" controller on the same disk.
Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Raid 5/10 discard support broken in 3.8.2
From: Dave Cundiff @ 2013-03-07 17:08 UTC (permalink / raw)
To: Roy Sigurd Karlsbakk; +Cc: Linux MDADM Raid
On Thu, Mar 7, 2013 at 5:02 AM, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>> It appears the Raid 5/10 discard support does not work in the mainline
>> kernel.
>>
>> I've been trying to backport it to a RHEL 6 kernel without success. I
>> finally managed to setup a mainline dev box and discovered it doesn't
>> work on it either!
>>
>> I'm now testing on a stock 3.8.2 kernel. The drives I'm using are
>> Samsung 840 Pro's hanging off an LSI 9211-8i. No backplane and each
>> drive has a dedicated channel. No RAID on the LSI, its just an HBA.
>
> I'm not sure if TRIM/UNMAP is supported on that controller, even in HBA mode (and with IT firmware). I've also seen scterc fail to work on that controller as well, while working with an ordinary "stupid" controller on the same disk.
>
I had heard that as well. I believe it may have applied only to older
firmware, or maybe just to people trying to TRIM through a hardware
RAID. I'm running recent firmware, and the controller appears to
execute the command correctly against a single disk.
[root@a2simage:etc]$ mkfs.ext4 /dev/sda3
mke2fs 1.41.12 (17-May-2010)
Discarding device blocks: ^C825792/54624256
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda] Send:
Mar 7 11:53:08 66 kernel: 0xffff8805edd86c80
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda] CDB:
Mar 7 11:53:08 66 kernel: Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
Mar 7 11:53:08 66 kernel: buffer = 0xffff88062fe4a280, bufflen = 24,
queuecommand 0xffffffffa01d27e0
Mar 7 11:53:08 66 kernel: leaving scsi_dispatch_cmnd()
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda] Done:
Mar 7 11:53:08 66 kernel: 0xffff8805edd86c80 SUCCESS
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda]
Mar 7 11:53:08 66 kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda] CDB:
Mar 7 11:53:08 66 kernel: Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda] scsi host busy 1 failed 0
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: Notifying upper driver of
completion (result 0)
Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda]
Mar 7 11:53:08 66 kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Mar 7 11:53:08 66 kernel: 4194304 sectors total, -2147483648 bytes done.
That negative bytes-done number concerned me a little, but it looks
like it's just a type issue in the printk: it's printing an unsigned
int as signed.
--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
* Re: Raid 5/10 discard support broken in 3.8.2
From: Roy Sigurd Karlsbakk @ 2013-03-07 18:26 UTC (permalink / raw)
To: Dave Cundiff; +Cc: Linux MDADM Raid
> > I'm not sure if TRIM/UNMAP is supported on that controller, even in
> > HBA mode (and with IT firmware). I've also seen scterc fail to work
> > on that controller as well, while working with an ordinary "stupid"
> > controller on the same disk.
> >
>
> I had heard that as well. I believe it may have been only older
> firmware or maybe just people trying to TRIM against a RAID. I'm
> running recent firmware and it appears to execute the command
> correctly against a single disk.
>
> [root@a2simage:etc]$ mkfs.ext4 /dev/sda3
> mke2fs 1.41.12 (17-May-2010)
> Discarding device blocks: ^C825792/54624256
>
>
> Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda] Send:
> Mar 7 11:53:08 66 kernel: 0xffff8805edd86c80
> Mar 7 11:53:08 66 kernel: sd 0:0:0:0: [sda] CDB:
> Mar 7 11:53:08 66 kernel: Unmap/Read sub-channel: 42 00 00 00 00 00 00
> 00 18 00
> Mar 7 11:53:08 66 kernel: buffer = 0xffff88062fe4a280, bufflen = 24,
> queuecommand 0xffffffffa01d27e0
Could it be as simple as the controller trying to send UNMAP to an (S)ATA device? TRIM and UNMAP are different commands for ATA and SCSI, respectively.
Vennlige hilsener / Best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
* Re: Raid 5/10 discard support broken in 3.8.2
From: Brad Campbell @ 2013-03-09 5:26 UTC (permalink / raw)
To: Dave Cundiff; +Cc: Linux MDADM Raid
On 06/03/13 12:20, Dave Cundiff wrote:
> Hi all,
>
> It appears the Raid 5/10 discard support does not work in the mainline kernel.
>
> I've been trying to backport it to a RHEL 6 kernel without success. I
> finally managed to setup a mainline dev box and discovered it doesn't
> work on it either!
>
> I'm now testing on a stock 3.8.2 kernel. The drives I'm using are
> Samsung 840 Pro's hanging off an LSI 9211-8i. No backplane and each
> drive has a dedicated channel. No RAID on the LSI, its just an HBA.
>
I'd be interested if you could test it against 3.7.9.
I have a 6-drive RAID10. Three drives are on the on-board AHCI, and
the other three are on an LSI 9240-8i (SAS2008 with the mpt2sas
module).
TRIM is being passed down the stack, as after an fstrim of the ext4
filesystem on the RAID the mismatch count goes through the roof.
The three drives on the LSI are Intel 330s (which support
deterministic read after trim), while the drives on the AHCI are
Samsung 830s (which are *not* deterministic after trim).
My RAID is six drives in an n2 configuration, so three pairs of
mirrors, each pair one Samsung and one Intel.
I used dd to pull off the first 2G of the first pair and ran vbindiff
over them.
The differences are easily spotted where the Samsungs are returning data
and the Intels are returning zero. So the trim is most certainly making
its way down through the RAID10 and to the Intel 330 drives on the LSI
controller on kernel 3.7.9.
It's a production machine, so I can't really pull it down to upgrade it
to 3.8.2.
Hope this helps.
Regards,
Brad
* Re: Raid 5/10 discard support broken in 3.8.2
From: Dave Cundiff @ 2013-03-09 19:56 UTC (permalink / raw)
To: Brad Campbell; +Cc: Linux MDADM Raid
On Sat, Mar 9, 2013 at 12:26 AM, Brad Campbell
<lists2009@fnarfbargle.com> wrote:
>
> I'd be interested if you could test it against 3.7.9.
>
> I have a 6 drive RAID10. 3 Drives are on the on-board AHCI, and the other
> three are on an LSI 9240-8i (SAS2008 with mpt2sas module).
>
> TRIM is being passed down the stack as after an fstrim of the ext4 fs on the
> RAID the mismatch count goes through the roof.
>
> The 3 drives on the LSI are Intel 330 (which support deterministic read
> after trim) while the drives on the AHCI are Samsung 830 (which are *not*
> deterministic after trim).
>
> My raid is 6 drives in an n2 configuration, so three pairs of mirrors. Each
> one a Samsung and an Intel.
>
While I set this up, could you let me know what you have in the
following sysfs files?
/sys/block/[mddevice]/discard_alignment
/sys/block/[mddevice]/queue/discard_granularity
/sys/block/[mddevice]/queue/discard_max_bytes
Also what chunk size is your array set to?
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com
* Re: Raid 5/10 discard support broken in 3.8.2
From: Brad Campbell @ 2013-03-10 4:49 UTC (permalink / raw)
To: Dave Cundiff; +Cc: Linux MDADM Raid
On 10/03/13 03:56, Dave Cundiff wrote:
> On Sat, Mar 9, 2013 at 12:26 AM, Brad Campbell
> <lists2009@fnarfbargle.com> wrote:
>>
>> I'd be interested if you could test it against 3.7.9.
>>
>> I have a 6 drive RAID10. 3 Drives are on the on-board AHCI, and the other
>> three are on an LSI 9240-8i (SAS2008 with mpt2sas module).
>>
>> TRIM is being passed down the stack as after an fstrim of the ext4 fs on the
>> RAID the mismatch count goes through the roof.
>>
>> The 3 drives on the LSI are Intel 330 (which support deterministic read
>> after trim) while the drives on the AHCI are Samsung 830 (which are *not*
>> deterministic after trim).
>>
>> My raid is 6 drives in an n2 configuration, so three pairs of mirrors. Each
>> one a Samsung and an Intel.
>>
>
> While I set this up could you let me know what you have in the
> following sys files?
>
> /sys/block/[mddevice]/discard_alignment
> /sys/block/[mddevice]/queue/discard_granularity
> /sys/block/[mddevice]/queue/discard_max_bytes
>
> Also what chunk size is your array set to?
brad@srv:~$ cat /sys/block/md2/discard_alignment
32504832
brad@srv:~$ cat /sys/block/md2/queue/discard_granularity
33553920
brad@srv:~$ cat /sys/block/md2/queue/discard_max_bytes
65536
/dev/md2:
Version : 1.2
Creation Time : Fri Nov 9 15:49:44 2012
Raid Level : raid10
Array Size : 628752000 (599.62 GiB 643.84 GB)
Used Dev Size : 209584000 (199.87 GiB 214.61 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Mar 10 12:49:41 2013
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 64K
Name : unset:2
UUID : 5a6bc3c5:f37768c6:909783aa:2c3fbd28
Events : 23015
Number Major Minor RaidDevice State
0 8 2 0 active sync set-A /dev/sda2
1 65 66 1 active sync set-B /dev/sdu2
2 8 18 2 active sync set-A /dev/sdb2
3 65 34 3 active sync set-B /dev/sds2
4 8 50 4 active sync set-A /dev/sdd2
5 65 50 5 active sync set-B /dev/sdt2
Regards,
Brad
* Re: Raid 5/10 discard support broken in 3.8.2
From: Kevin Liao @ 2013-04-09 14:14 UTC (permalink / raw)
To: Dave Cundiff; +Cc: Linux MDADM Raid
2013/3/6 Dave Cundiff <syshackmin@gmail.com>
>
> Hi all,
>
> It appears the Raid 5/10 discard support does not work in the mainline kernel.
>
> I've been trying to backport it to a RHEL 6 kernel without success. I
> finally managed to setup a mainline dev box and discovered it doesn't
> work on it either!
>
> I'm now testing on a stock 3.8.2 kernel. The drives I'm using are
> Samsung 840 Pro's hanging off an LSI 9211-8i. No backplane and each
> drive has a dedicated channel. No RAID on the LSI, its just an HBA.
>
> As for Raid5 that just explodes on a BUG.
>
> This Raid5:
> mdadm -C /dev/md126 -n6 -l5 --assume-clean /dev/sda3 /dev/sdb3
> /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
>
> Outputs 2 sets of kprints
>
> granularity: 65535
> alignment: 42966
> max_discard_sectors: 8388607
> max_discard_sectors: 8388480
> granularity: 65535
> alignment: 42966
> max_discard_sectors: 8388607
> max_discard_sectors: 8388480
>
> and then dies on a BUG
>
> [quoted kernel BUG trace snipped; identical to the original message above]
> WAAAY over my head.
>
> So at this point I'm unsure how to continue. My total time in kernel
> code numbers in hours(maybe days). :)
>
> My Backport to RHEL works if I increase the chunk size to 65536 as
> well. I could go with that but I'm fairly certain such huge chunks may
> cause an IO issue even on a crazy fast SSD array.
>
Hi Dave and all,
May I ask about the status of this problem? I am trying to backport
discard support to kernel 3.4 and am hitting almost the same kernel
BUG and error message. Thanks a lot.
Regards,
Kevin