* sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
@ 2014-06-19 7:02 Stefan Priebe - Profihost AG
2014-06-20 3:08 ` Martin K. Petersen
0 siblings, 1 reply; 12+ messages in thread
From: Stefan Priebe - Profihost AG @ 2014-06-19 7:02 UTC (permalink / raw)
To: NeilBrown
Cc: linux-raid, linux-scsi, JBottomley, Jens Axboe, konrad.wilk,
elder, Josh Durgin, Greg KH, Lars Ellenberg
Hi,
while using vanilla 3.10.44 with drbd on top of a md raid1.
I'm pretty often hitting the following kernel bug.
It reminds me of:
http://lists.openwall.net/linux-kernel/2014/02/19/428
But I don't use bcache.
[sched_delayed] sched: RT throttling activated
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8128105c>] blk_add_request_payload+0xc/0x90
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: vhost_net tun macvtap macvlan dlm sctp netconsole
drbd lru_cache xt_multiport iptable_filter ip_tables x_tables iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss oid_registry
bonding ext2 8021q garp fuse acpi_cpufreq mperf coretemp kvm_intel kvm
microcode i2c_i801 i7core_edac edac_core button dm_mod raid1 md_mod
usb_storage ohci_hcd usbhid sg sd_mod uhci_hcd ehci_pci ehci_hcd ahci
libahci usbcore ixgbe(O) usb_common igb i2c_algo_bit mpt2sas i2c_core
raid_class ptp scsi_transport_sas pps_core
CPU: 0 PID: 636 Comm: md124_raid1 Tainted: G O 3.10.41+76-ph #1
Hardware name: Supermicro X8DT3/X8DT3, BIOS 2.1 03/17/2012
task: ffff880811e16400 ti: ffff88081324c000 task.ti: ffff88081324c000
RIP: 0010:[<ffffffff8128105c>] [<ffffffff8128105c>]
blk_add_request_payload+0xc/0x90
RSP: 0018:ffff88081324dac8 EFLAGS: 00010086
RAX: ffff88080b686b40 RBX: ffff88080af19028 RCX: 0000000000000000
RDX: 0000000000000018 RSI: ffffea00202b7200 RDI: ffff88080af19028
RBP: ffff88081324dac8 R08: ffffea00202b7200 R09: 00000000006f4a62
R10: 000000000000243d R11: 0000000000000000 R12: ffff8808145b9400
R13: ffff8810139a6000 R14: 0000000004400000 R15: ffff88080adc8000
FS: 0000000000000000(0000) GS:ffff88081fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88081324db18 ffffffffa01b515f ffffea00202b7200 000000000b637800
ffff88081324db18 ffff88080af19028 ffff880812f18818 ffff88081324dca8
ffff8810139a6000 ffff8808145b9800 ffff88081324db88 ffffffffa01b5318
Call Trace:
[<ffffffffa01b515f>] sd_setup_discard_cmnd+0x13f/0x260 [sd_mod]
[<ffffffffa01b5318>] sd_prep_fn+0x98/0xbb0 [sd_mod]
[<ffffffff812b1750>] ? merge+0x50/0xa0
[<ffffffff81281180>] ? blk_start_plug+0x50/0x50
[<ffffffff81286eb2>] blk_peek_request+0x132/0x250
[<ffffffff8138ae96>] scsi_request_fn+0x46/0x520
[<ffffffff81285a62>] __blk_run_queue+0x32/0x40
[<ffffffff81285b34>] queue_unplugged+0x34/0xb0
[<ffffffff81287343>] blk_flush_plug_list+0x183/0x220
[<ffffffff812873f3>] blk_finish_plug+0x13/0x50
[<ffffffffa008ac56>] raid1d+0x706/0xee0 [raid1]
[<ffffffff81057ada>] ? try_to_del_timer_sync+0x4a/0x60
[<ffffffff81057b42>] ? del_timer_sync+0x52/0x60
[<ffffffff81056e10>] ? usleep_range+0x40/0x40
[<ffffffffa00d5abd>] md_thread+0x11d/0x170 [md_mod]
[<ffffffff8106ca20>] ? finish_wait+0x80/0x80
Stefan
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-19 7:02 sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null) Stefan Priebe - Profihost AG
@ 2014-06-20 3:08 ` Martin K. Petersen
2014-06-20 15:53 ` Lars Ellenberg
0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2014-06-20 3:08 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: NeilBrown, linux-raid, linux-scsi, JBottomley, Jens Axboe,
konrad.wilk, elder, Josh Durgin, Greg KH, Lars Ellenberg
>>>>> "Stefan" == Stefan Priebe - Profihost AG <s.priebe@profihost.ag> writes:
Stefan> Hi, while using vanilla 3.10.44 with drbd on top of a md raid1.
Stefan> I'm pretty often hitting the following kernel bug.
Stefan> [<ffffffff8128105c>] blk_add_request_payload+0xc/0x90
That's really messed up. This means we received a request with no bio.
Does this happen with later kernels?
--
Martin K. Petersen Oracle Linux Engineering
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-20 3:08 ` Martin K. Petersen
@ 2014-06-20 15:53 ` Lars Ellenberg
2014-06-20 16:49 ` Martin K. Petersen
0 siblings, 1 reply; 12+ messages in thread
From: Lars Ellenberg @ 2014-06-20 15:53 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Stefan Priebe - Profihost AG, NeilBrown, linux-raid, linux-scsi,
JBottomley, Jens Axboe, konrad.wilk, elder, Josh Durgin, Greg KH
On Thu, Jun 19, 2014 at 11:08:22PM -0400, Martin K. Petersen wrote:
> >>>>> "Stefan" == Stefan Priebe - Profihost AG <s.priebe@profihost.ag> writes:
>
> Stefan> Hi, while using vanilla 3.10.44 with drbd on top of a md raid1.
>
> Stefan> I'm pretty often hitting the following kernel bug.
>
> Stefan> [<ffffffff8128105c>] blk_add_request_payload+0xc/0x90
>
> That's really messed up. This means we received a request with no bio.
No.
That means you received a bio that has been allocated with
bio_alloc(... , nr_iovecs = 0);
thus bio->bi_io_vec is NULL,
but blk_add_request_payload insists on using it anyways.
Even though it also requires that bio->bi_vcnt = 0
(because it then explicitly sets that to 1).
This is some subtlety with discard requests that has bitten some
stacking drivers now.
Any bio allocated that will be passed down with REQ_DISCARD
has to be allocated with nr_iovecs = 1 (at least),
even though it must not contain any bio_vec payload.
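The failure mode described above can be modelled in user space. The following is only a sketch: `model_bio`, `model_bio_alloc()` and `model_add_payload()` are hypothetical stand-ins, not the kernel's `struct bio`, `bio_alloc()` or `blk_add_request_payload()`. It assumes, as stated above, that allocating with zero vecs leaves the vec pointer NULL, and it returns an error where the kernel helper in the trace would write through that NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-ins for illustration; these are NOT the kernel structures. */
struct model_bio_vec {
	void *page;
	unsigned int len;
	unsigned int offset;
};

struct model_bio {
	struct model_bio_vec *io_vec;	/* stays NULL when allocated with 0 vecs */
	unsigned short vcnt;		/* vecs currently in use */
	unsigned short max_vecs;	/* vecs allocated */
};

/* Allocate a model bio; nr_iovecs == 0 leaves io_vec NULL, as described above. */
struct model_bio *model_bio_alloc(unsigned short nr_iovecs)
{
	struct model_bio *bio = calloc(1, sizeof(*bio));

	if (!bio)
		return NULL;
	if (nr_iovecs) {
		bio->io_vec = calloc(nr_iovecs, sizeof(*bio->io_vec));
		if (!bio->io_vec) {
			free(bio);
			return NULL;
		}
	}
	bio->max_vecs = nr_iovecs;
	return bio;
}

void model_bio_free(struct model_bio *bio)
{
	if (bio) {
		free(bio->io_vec);
		free(bio);
	}
}

/*
 * Attach the single internal payload vec. Unlike the helper in the trace,
 * this checks for a missing vec and returns -1 instead of writing
 * through a NULL pointer. It also requires vcnt == 0 before setting it to 1.
 */
int model_add_payload(struct model_bio *bio, void *page, unsigned int len)
{
	assert(bio != NULL);
	if (!bio->io_vec || bio->vcnt != 0)
		return -1;
	bio->io_vec[0].page = page;
	bio->io_vec[0].offset = 0;
	bio->io_vec[0].len = len;
	bio->vcnt = 1;
	return 0;
}
```

In this model a bio from `model_bio_alloc(0)` is rejected by `model_add_payload()`, while one from `model_bio_alloc(1)` accepts exactly one internal payload.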
Though DRBD in 3.10 is not supposed to accept discard requests.
So I'm not sure how it manages to pass them down?
Lars
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-20 15:53 ` Lars Ellenberg
@ 2014-06-20 16:49 ` Martin K. Petersen
2014-06-20 18:29 ` Lars Ellenberg
0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2014-06-20 16:49 UTC (permalink / raw)
To: Lars Ellenberg
Cc: Martin K. Petersen, Stefan Priebe - Profihost AG, NeilBrown,
linux-raid, linux-scsi, JBottomley, Jens Axboe, konrad.wilk,
elder, Josh Durgin, Greg KH
>>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
Lars,
Lars> Any bio allocated that will be passed down with REQ_DISCARD has to
Lars> be allocated with nr_iovecs = 1 (at least), even though it must
Lars> not contain any bio_vec payload.
True. Although the correct answer is: Any discard request must be issued
by blkdev_issue_discard(). That's the interface.
The hacks we do to carry the information inside the bio constitute an
internal interface that is subject to change (it is just about to,
actually).
Lars> Though DRBD in 3.10 is not supposed to accept discard requests.
Lars> So I'm not sure how it manages to pass them down?
drbd_receiver.c:
static unsigned long wire_flags_to_bio(struct drbd_conf *mdev, u32 dpf)
{
	return  (dpf & DP_RW_SYNC ? REQ_SYNC : 0) |
		(dpf & DP_FUA ? REQ_FUA : 0) |
		(dpf & DP_FLUSH ? REQ_FLUSH : 0) |
		(dpf & DP_DISCARD ? REQ_DISCARD : 0);
}
[...]
/* mirrored write */
static int receive_Data(struct drbd_tconn *tconn, struct packet_info *pi)
{
[...]
	dp_flags = be32_to_cpu(p->dp_flags);
	rw |= wire_flags_to_bio(mdev, dp_flags);
[...]
That's pretty busticated. I suggest you simply remove REQ_DISCARD from
that helper for now.
It's also a good idea to disable discard and write same on the client
side when you set up the request queue:
blk_queue_max_discard_sectors(q, 0);
blk_queue_max_write_same_sectors(q, 0);
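For illustration, the suggestion to drop REQ_DISCARD from the helper could look roughly like the user-space sketch below. All `DEMO_*` names and bit values are made up for this example; the real DP_* and REQ_* constants come from the DRBD and kernel headers and are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical wire-protocol flag bits (illustrative values only). */
#define DEMO_DP_RW_SYNC	(1u << 0)
#define DEMO_DP_FUA	(1u << 1)
#define DEMO_DP_FLUSH	(1u << 2)
#define DEMO_DP_DISCARD	(1u << 3)

/* Hypothetical block-layer request flag bits (illustrative values only). */
#define DEMO_REQ_SYNC		(1ul << 4)
#define DEMO_REQ_FUA		(1ul << 5)
#define DEMO_REQ_FLUSH		(1ul << 6)
#define DEMO_REQ_DISCARD	(1ul << 7)

/*
 * Translate wire flags into request flags. DEMO_DP_DISCARD is deliberately
 * not translated, so a peer cannot cause a discard to be submitted locally.
 */
unsigned long demo_wire_flags_to_bio(uint32_t dpf)
{
	return  (dpf & DEMO_DP_RW_SYNC ? DEMO_REQ_SYNC : 0) |
		(dpf & DEMO_DP_FUA ? DEMO_REQ_FUA : 0) |
		(dpf & DEMO_DP_FLUSH ? DEMO_REQ_FLUSH : 0);
}
```

With this variant, a packet carrying only the discard bit maps to no request flags at all, and the other bits are translated as before.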
--
Martin K. Petersen Oracle Linux Engineering
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-20 16:49 ` Martin K. Petersen
@ 2014-06-20 18:29 ` Lars Ellenberg
2014-06-21 17:48 ` Stefan Priebe
2014-06-23 19:37 ` Martin K. Petersen
0 siblings, 2 replies; 12+ messages in thread
From: Lars Ellenberg @ 2014-06-20 18:29 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Stefan Priebe - Profihost AG, NeilBrown, linux-raid, linux-scsi,
JBottomley, Jens Axboe, konrad.wilk, elder, Josh Durgin, Greg KH
On Fri, Jun 20, 2014 at 12:49:39PM -0400, Martin K. Petersen wrote:
> >>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
>
> Lars,
>
> Lars> Any bio allocated that will be passed down with REQ_DISCARD has to
> Lars> be allocated with nr_iovecs = 1 (at least), even though it must
> Lars> not contain any bio_vec payload.
>
> True. Although the correct answer is: Any discard request must be issued
> by blkdev_issue_discard(). That's the interface.
>
> The hacks we do to carry the information inside the bio constitute an
> internal interface that is subject to change (it is just about to,
> actually).
>
> Lars> Though DRBD in 3.10 is not supposed to accept discard requests.
> Lars> So I'm not sure how it manages to pass them down?
>
> drbd_receiver.c:
>
> static unsigned long wire_flags_to_bio(struct drbd_conf *mdev, u32 dpf)
> {
> return (dpf & DP_RW_SYNC ? REQ_SYNC : 0) |
> (dpf & DP_FUA ? REQ_FUA : 0) |
> (dpf & DP_FLUSH ? REQ_FLUSH : 0) |
> (dpf & DP_DISCARD ? REQ_DISCARD : 0);
> }
>
> [...]
>
> /* mirrored write */
> static int receive_Data(struct drbd_tconn *tconn, struct packet_info
> *pi)
> {
> [...]
> dp_flags = be32_to_cpu(p->dp_flags);
> rw |= wire_flags_to_bio(mdev, dp_flags);
> [...]
>
> That's pretty busticated. I suggest you simply remove REQ_DISCARD from
> that helper for now.
>
> It's also a good idea to disable discard and write same on the client
> side when you set up the request queue:
>
> blk_queue_max_discard_sectors(q, 0);
> blk_queue_max_write_same_sectors(q, 0);
Our main development still happens out-of-tree,
trying to be compatible to a large range of kernel versions.
linux upstream DRBD is supposed to handle discards "correctly"
(even though not using the proper interface blkdev_issue_discard).
But it does not, because one fix apparently slipped through
when preparing the pull request.
So linux upstream needs:
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index b6c8aaf..5b17ec8 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1337,8 +1337,11 @@ int drbd_submit_peer_request(struct drbd_device *device,
return 0;
}
+ /* Discards don't have any payload.
+ * But the scsi layer still expects a bio_vec it can use internally,
+ * see sd_setup_discard_cmnd() and blk_add_request_payload(). */
if (peer_req->flags & EE_IS_TRIM)
- nr_pages = 0; /* discards don't have any payload. */
+ nr_pages = 1;
/* In most cases, we will only need one bio. But in case the lower
* level restrictions happen to be different at this offset on this
I'll prepare a proper patch with commit message later.
linux upstream DRBD also does blk_queue_max_write_same_sectors(q, 0)
and blk_queue_max_discard_sectors(q, DRBD_MAX_DISCARD_SECTORS)
-------
For linux 3.10, things are different.
DRBD in linux 3.10 does not set QUEUE_FLAG_DISCARD,
and does not announce discard capabilities in any other way,
even though it already contains some preparation steps
(those pieces your grep foo managed to find above...)
DRBD does a handshake, and if there is no discard capability announced,
the peer is supposed to never send discards (and stop announcing them
on his side), even if the peer's DRBD version already supports
and announces discard capabilities.
So I'm still not really seeing how discard requests would be issued
by that version of DRBD.
The local submit path should not allow them (no QUEUE_FLAG_DISCARD set)
and the remote submit path should not allow them either,
for the same reason, and because the DRBD handshake does not allow them.
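The handshake rule described above can be pictured as a capability mask that both sides intersect. This is a sketch under assumptions: the `DEMO_FEAT_*` bits and both functions are hypothetical, and DRBD's actual handshake and on-wire format are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define DEMO_FEAT_DISCARD	(1u << 0)
#define DEMO_FEAT_OTHER		(1u << 1)

/* Only capabilities announced by both nodes count as agreed. */
uint32_t demo_agreed_features(uint32_t local, uint32_t peer)
{
	return local & peer;
}

/* A node may send discards only if the agreed set contains the discard bit. */
bool demo_may_send_discard(uint32_t agreed)
{
	return (agreed & DEMO_FEAT_DISCARD) != 0;
}
```

In this model, a node that announces no discard capability keeps its peer from sending discards, even if the peer itself supports and announces them.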
So my current guess would be that Stefan prepared a 3.10.44
+ "upstream DRBD", but unfortunately not upstream enough?
Stefan, please give more details how to trigger this,
with which exact DRBD versions on the peers, and what action.
Lars
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-20 18:29 ` Lars Ellenberg
@ 2014-06-21 17:48 ` Stefan Priebe
2014-06-23 13:38 ` Lars Ellenberg
2014-06-23 19:37 ` Martin K. Petersen
1 sibling, 1 reply; 12+ messages in thread
From: Stefan Priebe @ 2014-06-21 17:48 UTC (permalink / raw)
To: Lars Ellenberg, Martin K. Petersen
Cc: NeilBrown, linux-raid, linux-scsi, JBottomley, Jens Axboe,
konrad.wilk, elder, Josh Durgin, Greg KH
Hi Lars,
On 20.06.2014 20:29, Lars Ellenberg wrote:
> On Fri, Jun 20, 2014 at 12:49:39PM -0400, Martin K. Petersen wrote:
>>>>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
>>
>> Lars,
>>
>> Lars> Any bio allocated that will be passed down with REQ_DISCARD has to
>> Lars> be allocated with nr_iovecs = 1 (at least), even though it must
>> Lars> not contain any bio_vec payload.
>>
>> True. Although the correct answer is: Any discard request must be issued
>> by blkdev_issue_discard(). That's the interface.
>>
>> The hacks we do to carry the information inside the bio constitute an
>> internal interface that is subject to change (it is just about to,
>> actually).
>>
>> Lars> Though DRBD in 3.10 is not supposed to accept discard requests.
>> Lars> So I'm not sure how it manages to pass them down?
You're absolutely right - a colleague installed drbd 8.4.4 as a module.
I didn't know that. Sorry.
So your attached patch will fix it?
>> drbd_receiver.c:
>>
>> static unsigned long wire_flags_to_bio(struct drbd_conf *mdev, u32 dpf)
>> {
>> return (dpf & DP_RW_SYNC ? REQ_SYNC : 0) |
>> (dpf & DP_FUA ? REQ_FUA : 0) |
>> (dpf & DP_FLUSH ? REQ_FLUSH : 0) |
>> (dpf & DP_DISCARD ? REQ_DISCARD : 0);
>> }
>>
>> [...]
>>
>> /* mirrored write */
>> static int receive_Data(struct drbd_tconn *tconn, struct packet_info
>> *pi)
>> {
>> [...]
>> dp_flags = be32_to_cpu(p->dp_flags);
>> rw |= wire_flags_to_bio(mdev, dp_flags);
>> [...]
>>
>> That's pretty busticated. I suggest you simply remove REQ_DISCARD from
>> that helper for now.
>>
>> It's also a good idea to disable discard and write same on the client
>> side when you set up the request queue:
>>
>> blk_queue_max_discard_sectors(q, 0);
>> blk_queue_max_write_same_sectors(q, 0);
>
> Our main development still happens out-of-tree,
> trying to be compatible to a large range of kernel versions.
>
> linux upstream DRBD is supposed to handle discards "correctly"
> (even though not using the proper interface blkdev_issue_discard).
>
> But it does not, because one fix apparently slipped through
> when preparing the pull request.
>
> So linux upstream needs:
> diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
> index b6c8aaf..5b17ec8 100644
> --- a/drivers/block/drbd/drbd_receiver.c
> +++ b/drivers/block/drbd/drbd_receiver.c
> @@ -1337,8 +1337,11 @@ int drbd_submit_peer_request(struct drbd_device *device,
> return 0;
> }
>
> + /* Discards don't have any payload.
> + * But the scsi layer still expects a bio_vec it can use internally,
> + * see sd_setup_discard_cmnd() and blk_add_request_payload(). */
> if (peer_req->flags & EE_IS_TRIM)
> - nr_pages = 0; /* discards don't have any payload. */
> + nr_pages = 1;
>
> /* In most cases, we will only need one bio. But in case the lower
> * level restrictions happen to be different at this offset on this
>
> I'll prepare a proper patch with commit message later.
>
> linux upstream DRBD also does blk_queue_max_write_same_sectors(q, 0)
> and blk_queue_max_discard_sectors(q, DRBD_MAX_DISCARD_SECTORS)
>
> -------
> For linux 3.10, things are different.
>
> DRBD in linux 3.10 does not set QUEUE_FLAG_DISCARD,
> and does not announce discard capabilities in any other way,
> even though it already contains some preparation steps
> (those pieces your grep foo managed to find above...)
>
> DRBD does a handshake, and if there is no discard capability announced,
> the peer is supposed to never send discards (and stop announcing them
> on his side), even if the peer's DRBD version already supports
> and announces discard capabilities.
>
> So I'm still not really seeing how discard requests would be issued
> by that version of DRBD.
> The local submit path should not allow them (no QUEUE_FLAG_DISCARD set)
> and the remote submit path should not allow them either,
> for the same reason, and because the DRBD handshake does not allow them.
>
> So my current guess would be that Stefan prepared a 3.10.44
> + "upstream DRBD", but unfortunately not upstream enough?
>
> Stefan, please give more details how to trigger this,
> with which exact DRBD versions on the peers, and what action.
>
> Lars
>
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-21 17:48 ` Stefan Priebe
@ 2014-06-23 13:38 ` Lars Ellenberg
0 siblings, 0 replies; 12+ messages in thread
From: Lars Ellenberg @ 2014-06-23 13:38 UTC (permalink / raw)
To: Stefan Priebe
Cc: Martin K. Petersen, NeilBrown, linux-raid, linux-scsi, JBottomley,
Jens Axboe, konrad.wilk, elder, Josh Durgin, Greg KH
On Sat, Jun 21, 2014 at 07:48:22PM +0200, Stefan Priebe wrote:
> Hi Lars,
> On 20.06.2014 20:29, Lars Ellenberg wrote:
> >On Fri, Jun 20, 2014 at 12:49:39PM -0400, Martin K. Petersen wrote:
> >>>>>>>"Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
> >>
> >>Lars,
> >>
> >>Lars> Any bio allocated that will be passed down with REQ_DISCARD has to
> >>Lars> be allocated with nr_iovecs = 1 (at least), even though it must
> >>Lars> not contain any bio_vec payload.
> >>
> >>True. Although the correct answer is: Any discard request must be issued
> >>by blkdev_issue_discard(). That's the interface.
> >>
> >>The hacks we do to carry the information inside the bio constitute an
> >>internal interface that is subject to change (it is just about to,
> >>actually).
> >>
> >>Lars> Though DRBD in 3.10 is not supposed to accept discard requests.
> >>Lars> So I'm not sure how it manages to pass them down?
>
> You're absolutely right - a colleague installed drbd 8.4.4 as a
> module. I didn't know that. Sorry.
That is (again) incorrect/incomplete.
Your original post:
> while using vanilla 3.10.44 with drbd on top of a md raid1.
...
> CPU: 0 PID: 636 Comm: md124_raid1 Tainted: G O 3.10.41+76-ph #1
> Modules linked in: ... drbd ...
So it's not vanilla, it's not 3.10.44, and it's not 3.10.41 either,
and it's not even a "clean" external module.
But it's "something" based on 3.10.41,
where you added your own patches or "backports",
and now complain to the upstream maintainers that it explodes,
and don't bother to tell them that it is modified code.
> So your attached patch will fix it?
No.
For the out-of-tree module it is fixed.
You just need to upgrade.
This is for the 3.16-rc1 and later in-tree DRBD,
where this fix apparently slipped through when preparing the pull request.
It has not even been in a released mainline kernel yet.
But thanks anyway for reporting it;
otherwise it might have gone unnoticed into 3.16.
Lars
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-20 18:29 ` Lars Ellenberg
2014-06-21 17:48 ` Stefan Priebe
@ 2014-06-23 19:37 ` Martin K. Petersen
2014-06-24 11:53 ` Lars Ellenberg
1 sibling, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2014-06-23 19:37 UTC (permalink / raw)
To: Lars Ellenberg
Cc: Martin K. Petersen, Stefan Priebe - Profihost AG, NeilBrown,
linux-raid, linux-scsi, JBottomley, Jens Axboe, konrad.wilk,
elder, Josh Durgin, Greg KH
>>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
Lars,
Thanks for fixing this.
I'd still like to see you use the lib call instead like you do for
zeroout. I have some patches in the pipeline for multi-range discard
support and things are going to break for drbd if you manually roll
bios.
Lars> linux upstream DRBD also does blk_queue_max_write_same_sectors(q,
Lars> 0) and blk_queue_max_discard_sectors(q, DRBD_MAX_DISCARD_SECTORS)
Great!
--
Martin K. Petersen Oracle Linux Engineering
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-23 19:37 ` Martin K. Petersen
@ 2014-06-24 11:53 ` Lars Ellenberg
2014-06-24 23:11 ` Martin K. Petersen
0 siblings, 1 reply; 12+ messages in thread
From: Lars Ellenberg @ 2014-06-24 11:53 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Stefan Priebe - Profihost AG, NeilBrown, linux-raid, linux-scsi,
JBottomley, Jens Axboe, konrad.wilk, elder, Josh Durgin, Greg KH
On Mon, Jun 23, 2014 at 03:37:03PM -0400, Martin K. Petersen wrote:
> >>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
>
> Lars,
>
> Thanks for fixing this.
>
> I'd still like to see you use the lib call instead like you do for
> zeroout. I have some patches in the pipeline for multi-range discard
> support and things are going to break for drbd if you manually roll
> bios.
Okay, thanks for the heads up.
I think we just did this so we would not
have to use a synchronous interface there.
We are receiving (from network) and submitting (to lower level IO stack)
in the same context and would like the submit to be async.
Do you intend to provide an asynchronous interface?
Lars
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-24 11:53 ` Lars Ellenberg
@ 2014-06-24 23:11 ` Martin K. Petersen
2014-06-25 10:14 ` Lars Ellenberg
0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2014-06-24 23:11 UTC (permalink / raw)
To: Lars Ellenberg
Cc: Martin K. Petersen, Stefan Priebe - Profihost AG, NeilBrown,
linux-raid, linux-scsi, JBottomley, Jens Axboe, konrad.wilk,
elder, Josh Durgin, Greg KH
>>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
Lars> We are receiving (from network) and submitting (to lower level IO
Lars> stack) in the same context and would like the submit to be async.
Lars> Do you intend to provide an asynchronous interface?
I guess we can look into that if there is a need.
Do different clients share that context? I.e. does a synchronous discard
block other clients from accessing the drbd server?
--
Martin K. Petersen Oracle Linux Engineering
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-24 23:11 ` Martin K. Petersen
@ 2014-06-25 10:14 ` Lars Ellenberg
2014-06-26 1:44 ` Martin K. Petersen
0 siblings, 1 reply; 12+ messages in thread
From: Lars Ellenberg @ 2014-06-25 10:14 UTC (permalink / raw)
To: Martin K. Petersen
Cc: NeilBrown, linux-raid, linux-scsi, JBottomley, Jens Axboe,
konrad.wilk, elder, Josh Durgin, Greg KH
On Tue, Jun 24, 2014 at 07:11:47PM -0400, Martin K. Petersen wrote:
> >>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
>
> Lars> We are receiving (from network) and submitting (to lower level IO
> Lars> stack) in the same context and would like the submit to be async.
>
> Lars> Do you intend to provide an asynchronous interface?
>
> I guess we can look into that if there is a need.
>
> Do different clients share that context? I.e. does a synchronous discard
> block other clients from accessing the drbd server?
Uhm, it's not exactly like that, really.
Because of the way we do some internal bookkeeping,
we announce a max discard of 4 MiB.
So if some user on the "active" (Primary) DRBD
does large discards, you will end up submitting
lots of bios, and these are async.
Bios are the entry point to DRBD.
So DRBD ships these discard-bios over to the peer,
which then right now submits them as bios, again async.
So we do some pipelining, may have a number of discard bios in flight,
and effectively the latency will be increased by something in the order
of the network rtt.
If we now have to use the synchronous interface on the peer
for each discard bio, there is no longer any pipelining,
and the overall latency of a single "user" level discard
(that ends up doing many discard bios) will noticeably increase.
Also, since the "receiver" is blocked in "submit",
we cannot meanwhile interleave other, "normal" BIOs,
so a larger discard will block all writes (and, depending on configuration
and current state, also reads) within that DRBD resource (which again may
be one or more DRBD minor devices or "volumes").
I don't have real-life numbers on how much that may hurt.
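No measurements are implied here, but the number of bios involved is plain arithmetic. The sketch below counts how many discard bios a single large discard is split into for a given maximum discard size. It assumes 512-byte sectors; `demo_discard_bio_count()` and `DEMO_MIB_TO_SECTORS()` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 1 MiB is 2048 sectors of 512 bytes (assumed sector size). */
#define DEMO_MIB_TO_SECTORS(mib)	((uint64_t)(mib) << 11)

/*
 * Number of discard bios needed to cover nr_sectors when a single bio
 * may carry at most max_sectors (rounded up).
 */
uint64_t demo_discard_bio_count(uint64_t nr_sectors, uint64_t max_sectors)
{
	assert(max_sectors > 0);
	return (nr_sectors + max_sectors - 1) / max_sectors;
}
```

With the announced maximum of 4 MiB, a 1 GiB discard becomes `demo_discard_bio_count(DEMO_MIB_TO_SECTORS(1024), DEMO_MIB_TO_SECTORS(4))`, that is 256 bios. If each of these had to complete before the next one could be submitted, their individual latencies would add up rather than overlap.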
Similar for the WRITE_SAME interface (which we do not properly support
on the DRBD protocol level yet -- backward compatibility concerns -- but
intend to support "soon").
If we only have a synchronous interface,
we will probably have to either add some "async wrapper",
or defer such submissions to worker threads.
I'd prefer to have an async submit path.
Lars
* Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)
2014-06-25 10:14 ` Lars Ellenberg
@ 2014-06-26 1:44 ` Martin K. Petersen
0 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2014-06-26 1:44 UTC (permalink / raw)
To: Lars Ellenberg
Cc: Martin K. Petersen, NeilBrown, linux-raid, linux-scsi, JBottomley,
Jens Axboe, konrad.wilk, elder, Josh Durgin, Greg KH
>>>>> "Lars" == Lars Ellenberg <lars.ellenberg@linbit.com> writes:
Lars> Because of the way we do some internal bookkeeping, we announce a max
Lars> discard of 4 MiB.
Yikes!
Lars> Similar for the WRITE_SAME interface (which we do not properly
Lars> support on the DRBD protocol level yet -- backward compatibility
Lars> concerns -- but intend to support "soon").
Yeah, WRITE SAME can take a long time to complete too. However, we're
typically issuing them 32 megs at a time.
Lars> If we only have a synchronous interface, we will probably have to
Lars> either add some "async wrapper", or defer such submissions to
Lars> worker threads. I'd prefer to have an async submit path.
OK, I'll chew on it.
--
Martin K. Petersen Oracle Linux Engineering