* Compare And Write against unwritten ranges
From: David Disseldorp @ 2016-07-26 12:14 UTC
  To: Mike Christie; +Cc: ceph-devel@vger.kernel.org

Hi Mike,

Returning to the OSD cmpext functionality in
https://github.com/ceph/ceph/pull/8911 , I'm wondering how such
requests should be handled against unwritten ranges.

Currently the OSD returns -EINVAL to the client, because the short read
is caught here:
https://github.com/ceph/ceph/pull/8911/commits/440895ea9f2604756c9f3c81e5c4ec5ca40401d7#diff-72747d40a424e7b5404366b557ff12a3R3722
As a result, krbd returns an error for the corresponding client I/O.

For read requests, rbd_img_obj_request_read_callback() handles
zero-filling of read buffers that cover unwritten RBD ranges. For SCSI
Compare And Write, however, the OSD is responsible for atomicity, so
zero-filling on the client side is problematic.

One option could be to add a truncate/zero operation to the Compare And
Write compound request, or to support optional truncate_seq and
truncate_size parameters in cmpext. Any thoughts or suggestions on the
approach here?

FWIW, I'm using the following test to trigger this from the client side:
https://github.com/sahlberg/libiscsi/pull/213/commits/e6f2ce306cf8706b666ec088fc74219bc52ea8cf

Cheers, David

* Re: Compare And Write against unwritten ranges
From: Mike Christie @ 2016-07-26 18:29 UTC
  To: David Disseldorp; +Cc: ceph-devel@vger.kernel.org

On 07/26/2016 07:14 AM, David Disseldorp wrote:
> Hi Mike,
> 
> Returning to the OSD cmpext functionality in
> https://github.com/ceph/ceph/pull/8911 , I'm wondering how such
> requests should be handled against unwritten ranges.
> 
> Currently an OSD will return -EINVAL to the client, as the short read
> will be caught via:
> https://github.com/ceph/ceph/pull/8911/commits/440895ea9f2604756c9f3c81e5c4ec5ca40401d7#diff-72747d40a424e7b5404366b557ff12a3R3722
> -EINVAL then means that krbd will return an error for the corresponding
> client I/O.
> 
> For read requests, rbd_img_obj_request_read_callback() handles
> zero-filling read buffers that cover unwritten RBD ranges. For SCSI
> Compare And Write the OSD is responsible for atomicity, so zero-filling
> on the client side is problematic.
> One potential option could be to add a truncate/zero operation to the
> Compare And Write compound request, or optionally support truncate_seq
> and truncate_size parameters in cmpext. Any thoughts/suggestions on the
> approach here?

We have a similar problem if the data needs to be copyup'd, right? I
think the multi-op route might be nice because it could work for both cases.

Did you already try the multi-op zero/truncate approach? Did you have to
make changes to the OSD code too?

A long while back I was working on the copyup part of the problem, but I
hit another issue. It was something like: the copyup's write would
succeed, but the cmpext op's read would still fail. If I sent it down as
a multi-op, some other bits/structs on the OSD side needed to be updated
before the cmpext op could run. I cannot find the patches, and I never
submitted them because I had just hacked them in for testing. Did you
hit something similar?

* Re: Compare And Write against unwritten ranges
From: David Disseldorp @ 2016-07-27 12:57 UTC
  To: Mike Christie; +Cc: ceph-devel@vger.kernel.org

Thanks for the feedback, Mike...

On Tue, 26 Jul 2016 13:29:03 -0500, Mike Christie wrote:

> On 07/26/2016 07:14 AM, David Disseldorp wrote:
> > [...]
> 
> We have a similar problem if the data needs to be copyup'd, right? I

Similar, but different: copyup should be handled via the
rbd_img_obj_parent_read_full() logic in rbd_img_obj_request_submit().

> think the multi-op route might be nice because it could work for both cases.
> 
> Did you already try the multi-op zero/truncate approach? Did you have to
> make changes to the OSD code too?

I'm working on a multi-op prototype now and will send you the patches
when it's done. I don't expect to need any changes on the OSD side.

> A long while back I was working on the copyup part of the problem, but I
> hit another issue. It was something like: the copyup's write would
> succeed, but the cmpext op's read would still fail. If I sent it down as
> a multi-op, some other bits/structs on the OSD side needed to be updated
> before the cmpext op could run. I cannot find the patches, and I never
> submitted them because I had just hacked them in for testing. Did you
> hit something similar?

Yeah, I've seen issues with copyup+cmpext+write, but I'm treating that
as a separate problem for now.

Cheers, David

* Re: Compare And Write against unwritten ranges
From: David Disseldorp @ 2016-07-29 11:09 UTC
  To: Mike Christie; +Cc: ceph-devel@vger.kernel.org, Josh Durgin

On Tue, 26 Jul 2016 13:29:03 -0500, Mike Christie wrote:

> Did you already try the multi-op zero/truncate approach? Did you have to
> make changes to the OSD code too?

I'm a little stumped by the OSD handling of these requests once truncate
is added to the mix.

As mentioned, with set-alloc-hint+cmpext(512~512)+write(512~512), the
cmpext/sync_read gets an empty read buffer for the unwritten range.

With set-alloc-hint+truncate(4194304)+cmpext(512~512)+write(512~512),
the cmpext/sync_read instead gets ENOENT from the filestore. The truncate
immediately before it doesn't appear to reach the filestore; vstart logs
below.

Cheers, David

7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_op 0:841a7acf:::rbd_data.100e74b0dc51.0000000000000000:head [set-alloc-hint object_size 4194304 write_size 4194304,truncate 4194304,cmpext 512~512,write 512~512] ov 0'0 av 11'1 snapc 0=[] snapset 0=[]:[]
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean]  taking ondisk_read_lock
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_osd_op 0:841a7acf:::rbd_data.100e74b0dc51.0000000000000000:head [set-alloc-hint object_size 4194304 write_size 4194304,truncate 4194304,cmpext 512~512,write 512~512]
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_osd_op  set-alloc-hint object_size 4194304 write_size 4194304
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_osd_op  truncate 4194304
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_osd_op  cmpext 512~512
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_osd_op 0:841a7acf:::rbd_data.100e74b0dc51.0000000000000000:head [sync_read 512~512]
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_osd_op  sync_read 512~512
7fd9bbfff700 15 filestore(/home/ddiss/isms/ceph/src/dev/osd2) read 0.1_head/#0:841a7acf:::rbd_data.100e74b0dc51.0000000000000000:head# 512~512
7fd9bbfff700 10 filestore(/home/ddiss/isms/ceph/src/dev/osd2) error opening file /home/ddiss/isms/ceph/src/dev/osd2/current/0.1_head/rbd\udata.100e74b0dc51.0000000000000000__head_F35E5821__0 with flags=2: (2) No such file or directory
7fd9bbfff700 10 filestore(/home/ddiss/isms/ceph/src/dev/osd2) FileStore::read(0.1_head/#0:841a7acf:::rbd_data.100e74b0dc51.0000000000000000:head#) open error: (2) No such file or directory
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean]  read got -2 / 0 bytes from obj 0:841a7acf:::rbd_data.100e74b0dc51.0000000000000000:head
7fd9bbfff700 -1 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean] do_extent_cmp do_osd_ops failed -2
7fd9bbfff700 10 osd.2 pg_epoch: 11 pg[0.1( empty local-les=9 n=0 ec=1 les/c/f 9/9/0 8/8/8) [2,0,1] r=0 lpr=8 crt=0'0 mlcod 0'0 active+clean]  dropping ondisk_read_lock
7fd9bbfff700  1 -- 192.168.155.1:6808/7807 --> 192.168.155.101:0/3185149525 -- osd_op_reply(78 rbd_data.100e74b0dc51.0000000000000000 [set-alloc-hint object_size 4194304 write_size 4194304,truncate 4194304,cmpext 512~512,write 512~512] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v7 -- ?+0 0x7fd9d002bde0 con 0x7fda08006230

