diff for duplicates of <1489006194.3098.12.camel@primarydata.com> diff --git a/a/1.txt b/N1/1.txt index 0410a14..ea74290 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -1,82 +1,125 @@ -T24gV2VkLCAyMDE3LTAzLTA4IGF0IDE1OjMyIC0wNTAwLCBiZmllbGRzQGZpZWxkc2VzLm9yZyB3 -cm90ZToNCj4gT24gV2VkLCBNYXIgMDgsIDIwMTcgYXQgMDg6MTg6MzFQTSArMDAwMCwgVHJvbmQg -TXlrbGVidXN0IHdyb3RlOg0KPiA+IE9uIFdlZCwgMjAxNy0wMy0wOCBhdCAxNTowMCAtMDUwMCwg -T2xnYSBLb3JuaWV2c2thaWEgd3JvdGU6DQo+ID4gPiA+IE9uIE1hciA4LCAyMDE3LCBhdCAyOjUz -IFBNLCBKLiBCcnVjZSBGaWVsZHMgPGJmaWVsZHNAZmllbGRzZXMubw0KPiA+ID4gPiByZz4NCj4g -PiA+ID4gd3JvdGU6DQo+ID4gPiA+IA0KPiA+ID4gPiBPbiBXZWQsIE1hciAwOCwgMjAxNyBhdCAx -MjozMjoxMlBNIC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYQ0KPiA+ID4gPiB3cm90ZToNCj4gPiA+ -ID4gPiANCj4gPiA+ID4gPiA+IE9uIE1hciA4LCAyMDE3LCBhdCAxMjoyNSBQTSwgQ2hyaXN0b3Bo -IEhlbGx3aWcgPGhjaEBpbmZyYWRlDQo+ID4gPiA+ID4gPiBhZC5vDQo+ID4gPiA+ID4gPiByZz4N -Cj4gPiA+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiBPbiBXZWQsIE1h -ciAwOCwgMjAxNyBhdCAxMjowNToyMVBNIC0wNTAwLCBKLiBCcnVjZSBGaWVsZHMNCj4gPiA+ID4g -PiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gPiBTaW5jZSBjb3B5IGlzbid0IGF0b21pYyB0aGF0IGNo -ZWNrIGlzIG5ldmVyIGdvaW5nIHRvIGJlDQo+ID4gPiA+ID4gPiA+IHJlbGlhYmxlLg0KPiA+ID4g -PiA+ID4gDQo+ID4gPiA+ID4gPiBUaGF0J3MgdHJ1ZSBmb3IgZXZlcnl0aGluZyB0aGF0IENPUFkg -ZG9lcy7CoMKgQnkgdGhhdCBsb2dpYw0KPiA+ID4gPiA+ID4gd2UNCj4gPiA+ID4gPiA+IHNob3Vs -ZA0KPiA+ID4gPiA+ID4gbm90IGltcGxlbWVudCBpdCBhdCBhbGwgKGEgbG9naWMgdGhhdCBJJ2Qg -ZnVsbHkgc3VwcG9ydCkNCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBJZiB5b3Ugd2VyZSB0byBvbmx5 -IGtlZXAgQ0xPTkUgdGhlbiB5b3XigJlkIGxvc2UgYSBodWdlDQo+ID4gPiA+ID4gcGVyZm9ybWFu -Y2UNCj4gPiA+ID4gPiBnYWluDQo+ID4gPiA+ID4geW91IGdldCBmcm9tIHNlcnZlci10by1zZXJ2 -ZXIgQ09QWS7CoA0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLsKgwqBBbHNvLCBJIHRoaW5rIGNvcHkt -bGlrZSBjb3B5IGltcGxlbWVudGF0aW9ucyBoYXZlDQo+ID4gPiA+IHJlYXNvbmFibGUNCj4gPiA+ -ID4gc2VtYW50aWNzIHRoYXQgYXJlIGJhc2ljYWxseSB0aGUgc2FtZSBhcyByZWFkOg0KPiA+ID4g -PiANCj4gPiA+ID4gCS0gY29weSBjYW4gcmV0dXJuIHN1Y2Nlc3NmdWxseSB3aXRoIGxlc3MgY29w 
-aWVkIHRoYW4NCj4gPiA+ID4gcmVxdWVzdGVkLg0KPiA+ID4gPiAJLSBpdCdzIGZpbmUgZm9yIHRo -ZSBjb3BpZWQgcmFuZ2UgdG8gc3RhcnQgYW5kL29yIGVuZA0KPiA+ID4gPiBwYXN0IGVuZA0KPiA+ -ID4gPiBvZg0KPiA+ID4gPiAJwqDCoGZpbGUsIGl0J2xsIGp1c3QgcmV0dXJuIGEgc2hvcnQgcmVh -ZC4NCj4gPiA+ID4gCS0gQSBjb3B5IG9mIG1vcmUgdGhhbiAwIGJ5dGVzIHJldHVybmluZyAwIG1l -YW5zIHlvdSdyZQ0KPiA+ID4gPiBhdCBlbmQNCj4gPiA+ID4gb2YNCj4gPiA+ID4gCcKgwqBmaWxl -Lg0KPiA+ID4gPiANCj4gPiA+ID4gVGhlIHBhcnRpY3VsYXIgcHJvYmxlbSBoZXJlIGlzIHRoYXQg -dGhhdCBkb2Vzbid0IGZpdCBob3cgY2xvbmUNCj4gPiA+ID4gd29ya3MgYXQNCj4gPiA+ID4gYWxs -Lg0KPiA+ID4gPiANCj4gPiA+ID4gSXQgZmVlbHMgbGlrZSB3aGF0IGhhcHBlbmVkIGlzIHRoYXQg -Y29weV9maWxlX3JhbmdlKCkgd2FzIG1hZGUNCj4gPiA+ID4gbWFpbmx5DQo+ID4gPiA+IGZvciB0 -aGUgY2xvbmUgY2FzZSwgd2l0aCB0aGUgaWRlYSB0aGF0IGNvcHkgbWlnaHQgYmUNCj4gPiA+ID4g -cmVsdWN0YW50bHkNCj4gPiA+ID4gYWNjZXB0ZWQgYXMgYSBzZWNvbmQtY2xhc3MgaW1wbGVtZW50 -YXRpb24uDQo+ID4gDQo+ID4gSGlzdG9yaWNhbGx5PyBOby4uLiBDaHJpc3RvcGggYWRkZWQgY2xv -bmUgYXMgYSB2YWxpZCBpbXBsZW1lbnRhdGlvbg0KPiA+IG9mDQo+ID4gY29weV9maWxlX3Jhbmdl -KCkgYWxtb3N0IGEgeWVhciBhZnRlciBaYWNoIGFuZCBBbm5hIGRlZmluZWQgdGhlDQo+ID4gc2Vt -YW50aWNzIG9mIHZmc19jb3B5X2ZpbGVfcmFuZ2UoKS4gZ2l0IGJsYW1lIGlzIHlvdXIgZnJpZW5k -Li4uDQo+IA0KPiBZZWFoLCBJIGtub3cuwqDCoEl0IHN0aWxsIGZlZWxzIHRvIG1lIGxpa2UgdGhl -IGludGVyZmFjZSB3YXMgb3JpZ2luYWxseQ0KPiBkZXNpZ25lZCB3aXRoIGNsb25lIGluIG1pbmQs -IGJ1dCB0aGF0J3MgbXkgdmFndWUgaW1wcmVzc2lvbiBmcm9tIHRoZQ0KPiBtYW4NCj4gcGFnZXMg -YW5kIGhhbGYtcmVtZW1iZXJlZCBjb252ZXJzYXRpb25zLg0KPiANCj4gVGhvdWdoIHRoZSBsYWNr -IG9mIGEgImp1c3QgY29weSB0aGUgd2hvbGUgZmlsZSByZWdhcmRsZXNzIG9mIHNpemUiDQo+IGNh -c2UNCj4gaXMgd2VpcmQgZm9yIGNsb25lLsKgwqBBbGwgeW91IGNhbiBkbyBpcyBzdGF0IHRoZSBm -aWxlIGFuZCB0aGVuIGhvcGUgaXQNCj4gZG9lc24ndCBjaGFuZ2UgYmVmb3JlIHlvdSBpc3N1ZSB0 -aGUgY29weV9maWxlX3JhbmdlLsKgwqBCdXQgSSdkIHRoaW5rDQo+IGl0J2QNCj4gYmUgZWFzeSBm -b3IgYW4gYXRvbWljIGNsb25lIGltcGxlbWVudGF0aW9uIHRvIGhhbmRsZSwgc2F5LCBnZXR0aW5n -IGENCj4gc25hcHNob3Qgb2YgYSBsb2cgZmlsZSB3aGlsZSBpdCdzIGdldHRpbmcgY29udGludW91 
-c2x5IGFwcGVuZGVkIHRvLg0KDQpJdCByZWFsbHkgaXNuJ3QgdGhhdCBpbnRlcmVzdGluZyBpbiB0 -aGUgY29udGludW91c2x5IGFwcGVuZGVkIGNhc2UNCih3aGF0IGRpZmZlcmVuY2UgZG9lcyBpdCBt -YWtlIGlmIHlvdSBvbmx5IGdldCBkYXRhIGZyb20ganVzdCBhIGZldw0KbW9tZW50cyBhZ28pLCBi -dXQgSSBjYW4gc2VlIGl0IGJlaW5nIGFuIGlzc3VlIGluIHRoZSBjYXNlIG9mIHJhbmRvbQ0Kd3Jp -dGVzIHdoZXJlIHRoZSBmaWxlIHNpemUgaXMgYmVpbmcgZXh0ZW5kZWQuDQoNClRoZSB0aGluZyBp -cyB0aGF0IGluIGJvdGggdGhvc2UgY2FzZXMsIHRoZSBjb3B5X2ZpbGVfcmFuZ2UoKSBzZW1hbnRp -Y3MNCmFyZSB3b3JzZSwgc2luY2UgdGhleSBkb24ndCBldmVuIGd1YXJhbnRlZSBhIHRpbWUtY29u -c2lzdGVudCBjb3B5Lg0KDQo+ID4gPiA+IEJ1dCB0aGUgcGVyZm9ybWFuY2UgZ2FpbiBvZiBjb3B5 -IG9mZmxvYWQgaXMgdG9vIGJpZyB0byBqdXN0DQo+ID4gPiA+IGlnbm9yZSwNCj4gPiA+ID4gYW5k -DQo+ID4gPiA+IGluIGZhY3QgaXQncyB3aGF0IGNvcHlfZmlsZV9yYW5nZSBkb2VzIG9uIGV2ZXJ5 -IGZpbGVzeXN0ZW0gYnV0DQo+ID4gPiA+IGJ0cmZzIGFuZA0KPiA+ID4gPiBvY2ZzMiAoYW5kIG1h -eWJlIGNpZnM/KSwgc28gSSBkb24ndCB0aGluayB3ZSBjYW4ganVzdCBpZ25vcmUNCj4gPiA+ID4g -aXQuDQo+ID4gPiA+IA0KPiA+ID4gPiBJZiB3ZSBoYWQgc2VwYXJhdGUgY29weV9maWxlX3Jhbmdl -IGFuZCBjbG9uZV9maWxlX3JhbmdlLCBJDQo+ID4gPiA+ICp0aGluayoNCj4gPiA+ID4gaXQNCj4g -PiA+ID4gY291bGQgYWxsIGJlIG1hZGUgc2Vuc2libGUuwqDCoEFtIEkgbWlzc2luZyBzb21ldGhp -bmc/DQo+ID4gPiA+IA0KPiA+ID4gDQo+ID4gPiBIb3cgd291bGQgdGhlIGFwcGxpY2F0aW9uIChj -cCkga25vdyB3aGVuIHRvIGNhbGwgdGhlDQo+ID4gPiBjbG9uZV9maWxlX3JhbmdlDQo+ID4gPiBh -bmQgd2hlbiB0byBjYWxsIGNvcHlfZmlsZV9yYW5nZT8NCj4gPiANCj4gPiBjcCBjYW4gcHJvYmFi -bHkgY2FsbCBjb3B5X2ZpbGVfcmFuZ2UoKSwgYnV0IGFueSBhcHBsaWNhdGlvbiB0aGF0DQo+ID4g -bmVlZHMNCj4gPiBhdG9taWMgc2VtYW50aWNzIChpLmUuIGEgYmluYXJ5IG9wZXJhdGlvbiBzdWNj -ZXNzL2ZhaWwpIG11c3QgY2FsbA0KPiA+IGNsb25lX2ZpbGVfcmFuZ2UoKS4NCj4gDQo+IEkgZG9u -J3QgYmVsaWV2ZSB0aGVyZSdzIGEgY2xvbmVfZmlsZV9yYW5nZSgpLsKgwqBJIHNlZSB0aGUgdmZz -DQo+IGludGVyZmFjZSwNCj4gYnV0IG5vIHN5c3RlbSBjYWxsLg0KDQpUaGVyZSBpcyBhIHN0YW5k -YXJkIEZJQ0xPTkVSQU5HRSBpb2N0bCgpIHRoYXQgY2FuIGJlIHVzZWQgb24gYWxsDQpmaWxlc3lz -dGVtcyB0aGF0IHN1cHBvcnQgdGhlIHZmcyBpbnRlcmZhY2UuDQoNCj4gQW5kIGltcGxlbWVudGlu 
-ZyBhIHNpbXBsZSBjcCBpcyBoYXJkZXIgdGhhbiBpdCBzaG91bGQgYmUgd2hlbiB5b3UNCj4gZG9u -J3QNCj4ga25vdyB3aGV0aGVyIGl0J3MgaW1wbGVtZW50ZWQgYXMgY29weSBvciBjbG9uZS7CoMKg -WW91IGhhdmUgdG8gc3RhdCBmb3INCj4gdGhlIGZpbGUgc2l6ZSBmaXJzdCwgcmV0cnkgaWYgeW91 -IGdvdCBpdCB3cm9uZywgYW5kIGFsc28gcmV0cnkgaWYgeW91DQo+IGdldCBhIHNob3J0IHJlYWQu -wqDCoFRoZSBleGFtcGxlIGluIHRoZSBjbG9uZV9maWxlX3JhbmdlKCkgbWFuIHBhZ2UgaXMNCj4g -aW5jb21wbGV0ZS4NCg0KQXMgSSBzYWlkLCB5b3Ugc2hvdWxkbid0IGJlIHVzaW5nIGNvcHlfZmls -ZV9yYW5nZSgpIGVpdGhlciBpbiB0aGUgY2FzZQ0Kd2hlcmUgdGhlIGZpbGUgaXMgYmVpbmcgbW9k -aWZpZWQuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWlu -ZXIsIFByaW1hcnlEYXRhDQp0cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29tDQo= +On Wed, 2017-03-08 at 15:32 -0500, bfields@fieldses.org wrote: +> On Wed, Mar 08, 2017 at 08:18:31PM +0000, Trond Myklebust wrote: +> > On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote: +> > > > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.o +> > > > rg> +> > > > wrote: +> > > > +> > > > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia +> > > > wrote: +> > > > > +> > > > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infrade +> > > > > > ad.o +> > > > > > rg> +> > > > > > wrote: +> > > > > > +> > > > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. Bruce Fields +> > > > > > wrote: +> > > > > > > Since copy isn't atomic that check is never going to be +> > > > > > > reliable. +> > > > > > +> > > > > > That's true for everything that COPY does. By that logic +> > > > > > we +> > > > > > should +> > > > > > not implement it at all (a logic that I'd fully support) +> > > > > +> > > > > If you were to only keep CLONE then you’d lose a huge +> > > > > performance +> > > > > gain +> > > > > you get from server-to-server COPY. +> > > > +> > > > Yes. 
Also, I think copy-like copy implementations have +> > > > reasonable +> > > > semantics that are basically the same as read: +> > > > +> > > > - copy can return successfully with less copied than +> > > > requested. +> > > > - it's fine for the copied range to start and/or end +> > > > past end +> > > > of +> > > > file, it'll just return a short read. +> > > > - A copy of more than 0 bytes returning 0 means you're +> > > > at end +> > > > of +> > > > file. +> > > > +> > > > The particular problem here is that that doesn't fit how clone +> > > > works at +> > > > all. +> > > > +> > > > It feels like what happened is that copy_file_range() was made +> > > > mainly +> > > > for the clone case, with the idea that copy might be +> > > > reluctantly +> > > > accepted as a second-class implementation. +> > +> > Historically? No... Christoph added clone as a valid implementation +> > of +> > copy_file_range() almost a year after Zach and Anna defined the +> > semantics of vfs_copy_file_range(). git blame is your friend... +> +> Yeah, I know. It still feels to me like the interface was originally +> designed with clone in mind, but that's my vague impression from the +> man +> pages and half-remembered conversations. +> +> Though the lack of a "just copy the whole file regardless of size" +> case +> is weird for clone. All you can do is stat the file and then hope it +> doesn't change before you issue the copy_file_range. But I'd think +> it'd +> be easy for an atomic clone implementation to handle, say, getting a +> snapshot of a log file while it's getting continuously appended to. + +It really isn't that interesting in the continuously appended case +(what difference does it make if you only get data from just a few +moments ago), but I can see it being an issue in the case of random +writes where the file size is being extended. + +The thing is that in both those cases, the copy_file_range() semantics +are worse, since they don't even guarantee a time-consistent copy. 
+ +> > > > But the performance gain of copy offload is too big to just +> > > > ignore, +> > > > and +> > > > in fact it's what copy_file_range does on every filesystem but +> > > > btrfs and +> > > > ocfs2 (and maybe cifs?), so I don't think we can just ignore +> > > > it. +> > > > +> > > > If we had separate copy_file_range and clone_file_range, I +> > > > *think* +> > > > it +> > > > could all be made sensible. Am I missing something? +> > > > +> > > +> > > How would the application (cp) know when to call the +> > > clone_file_range +> > > and when to call copy_file_range? +> > +> > cp can probably call copy_file_range(), but any application that +> > needs +> > atomic semantics (i.e. a binary operation success/fail) must call +> > clone_file_range(). +> +> I don't believe there's a clone_file_range(). I see the vfs +> interface, +> but no system call. + +There is a standard FICLONERANGE ioctl() that can be used on all +filesystems that support the vfs interface. + +> And implementing a simple cp is harder than it should be when you +> don't +> know whether it's implemented as copy or clone. You have to stat for +> the file size first, retry if you got it wrong, and also retry if you +> get a short read. The example in the clone_file_range() man page is +> incomplete. + +As I said, you shouldn't be using copy_file_range() either in the case +where the file is being modified. 
+ +-- +Trond Myklebust +Linux NFS client maintainer, PrimaryData +trond.myklebust@primarydata.com diff --git a/a/content_digest b/N1/content_digest index f9920d3..804835b 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -19,87 +19,130 @@ " linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>\0" "\00:1\0" "b\0" - "T24gV2VkLCAyMDE3LTAzLTA4IGF0IDE1OjMyIC0wNTAwLCBiZmllbGRzQGZpZWxkc2VzLm9yZyB3\n" - "cm90ZToNCj4gT24gV2VkLCBNYXIgMDgsIDIwMTcgYXQgMDg6MTg6MzFQTSArMDAwMCwgVHJvbmQg\n" - "TXlrbGVidXN0IHdyb3RlOg0KPiA+IE9uIFdlZCwgMjAxNy0wMy0wOCBhdCAxNTowMCAtMDUwMCwg\n" - "T2xnYSBLb3JuaWV2c2thaWEgd3JvdGU6DQo+ID4gPiA+IE9uIE1hciA4LCAyMDE3LCBhdCAyOjUz\n" - "IFBNLCBKLiBCcnVjZSBGaWVsZHMgPGJmaWVsZHNAZmllbGRzZXMubw0KPiA+ID4gPiByZz4NCj4g\n" - "PiA+ID4gd3JvdGU6DQo+ID4gPiA+IA0KPiA+ID4gPiBPbiBXZWQsIE1hciAwOCwgMjAxNyBhdCAx\n" - "MjozMjoxMlBNIC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYQ0KPiA+ID4gPiB3cm90ZToNCj4gPiA+\n" - "ID4gPiANCj4gPiA+ID4gPiA+IE9uIE1hciA4LCAyMDE3LCBhdCAxMjoyNSBQTSwgQ2hyaXN0b3Bo\n" - "IEhlbGx3aWcgPGhjaEBpbmZyYWRlDQo+ID4gPiA+ID4gPiBhZC5vDQo+ID4gPiA+ID4gPiByZz4N\n" - "Cj4gPiA+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiBPbiBXZWQsIE1h\n" - "ciAwOCwgMjAxNyBhdCAxMjowNToyMVBNIC0wNTAwLCBKLiBCcnVjZSBGaWVsZHMNCj4gPiA+ID4g\n" - "PiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gPiBTaW5jZSBjb3B5IGlzbid0IGF0b21pYyB0aGF0IGNo\n" - "ZWNrIGlzIG5ldmVyIGdvaW5nIHRvIGJlDQo+ID4gPiA+ID4gPiA+IHJlbGlhYmxlLg0KPiA+ID4g\n" - "PiA+ID4gDQo+ID4gPiA+ID4gPiBUaGF0J3MgdHJ1ZSBmb3IgZXZlcnl0aGluZyB0aGF0IENPUFkg\n" - "ZG9lcy7CoMKgQnkgdGhhdCBsb2dpYw0KPiA+ID4gPiA+ID4gd2UNCj4gPiA+ID4gPiA+IHNob3Vs\n" - "ZA0KPiA+ID4gPiA+ID4gbm90IGltcGxlbWVudCBpdCBhdCBhbGwgKGEgbG9naWMgdGhhdCBJJ2Qg\n" - "ZnVsbHkgc3VwcG9ydCkNCj4gPiA+ID4gPiANCj4gPiA+ID4gPiBJZiB5b3Ugd2VyZSB0byBvbmx5\n" - "IGtlZXAgQ0xPTkUgdGhlbiB5b3XigJlkIGxvc2UgYSBodWdlDQo+ID4gPiA+ID4gcGVyZm9ybWFu\n" - "Y2UNCj4gPiA+ID4gPiBnYWluDQo+ID4gPiA+ID4geW91IGdldCBmcm9tIHNlcnZlci10by1zZXJ2\n" - "ZXIgQ09QWS7CoA0KPiA+ID4gPiANCj4gPiA+ID4gWWVzLsKgwqBBbHNvLCBJIHRoaW5rIGNvcHkt\n" - 
"bGlrZSBjb3B5IGltcGxlbWVudGF0aW9ucyBoYXZlDQo+ID4gPiA+IHJlYXNvbmFibGUNCj4gPiA+\n" - "ID4gc2VtYW50aWNzIHRoYXQgYXJlIGJhc2ljYWxseSB0aGUgc2FtZSBhcyByZWFkOg0KPiA+ID4g\n" - "PiANCj4gPiA+ID4gCS0gY29weSBjYW4gcmV0dXJuIHN1Y2Nlc3NmdWxseSB3aXRoIGxlc3MgY29w\n" - "aWVkIHRoYW4NCj4gPiA+ID4gcmVxdWVzdGVkLg0KPiA+ID4gPiAJLSBpdCdzIGZpbmUgZm9yIHRo\n" - "ZSBjb3BpZWQgcmFuZ2UgdG8gc3RhcnQgYW5kL29yIGVuZA0KPiA+ID4gPiBwYXN0IGVuZA0KPiA+\n" - "ID4gPiBvZg0KPiA+ID4gPiAJwqDCoGZpbGUsIGl0J2xsIGp1c3QgcmV0dXJuIGEgc2hvcnQgcmVh\n" - "ZC4NCj4gPiA+ID4gCS0gQSBjb3B5IG9mIG1vcmUgdGhhbiAwIGJ5dGVzIHJldHVybmluZyAwIG1l\n" - "YW5zIHlvdSdyZQ0KPiA+ID4gPiBhdCBlbmQNCj4gPiA+ID4gb2YNCj4gPiA+ID4gCcKgwqBmaWxl\n" - "Lg0KPiA+ID4gPiANCj4gPiA+ID4gVGhlIHBhcnRpY3VsYXIgcHJvYmxlbSBoZXJlIGlzIHRoYXQg\n" - "dGhhdCBkb2Vzbid0IGZpdCBob3cgY2xvbmUNCj4gPiA+ID4gd29ya3MgYXQNCj4gPiA+ID4gYWxs\n" - "Lg0KPiA+ID4gPiANCj4gPiA+ID4gSXQgZmVlbHMgbGlrZSB3aGF0IGhhcHBlbmVkIGlzIHRoYXQg\n" - "Y29weV9maWxlX3JhbmdlKCkgd2FzIG1hZGUNCj4gPiA+ID4gbWFpbmx5DQo+ID4gPiA+IGZvciB0\n" - "aGUgY2xvbmUgY2FzZSwgd2l0aCB0aGUgaWRlYSB0aGF0IGNvcHkgbWlnaHQgYmUNCj4gPiA+ID4g\n" - "cmVsdWN0YW50bHkNCj4gPiA+ID4gYWNjZXB0ZWQgYXMgYSBzZWNvbmQtY2xhc3MgaW1wbGVtZW50\n" - "YXRpb24uDQo+ID4gDQo+ID4gSGlzdG9yaWNhbGx5PyBOby4uLiBDaHJpc3RvcGggYWRkZWQgY2xv\n" - "bmUgYXMgYSB2YWxpZCBpbXBsZW1lbnRhdGlvbg0KPiA+IG9mDQo+ID4gY29weV9maWxlX3Jhbmdl\n" - "KCkgYWxtb3N0IGEgeWVhciBhZnRlciBaYWNoIGFuZCBBbm5hIGRlZmluZWQgdGhlDQo+ID4gc2Vt\n" - "YW50aWNzIG9mIHZmc19jb3B5X2ZpbGVfcmFuZ2UoKS4gZ2l0IGJsYW1lIGlzIHlvdXIgZnJpZW5k\n" - "Li4uDQo+IA0KPiBZZWFoLCBJIGtub3cuwqDCoEl0IHN0aWxsIGZlZWxzIHRvIG1lIGxpa2UgdGhl\n" - "IGludGVyZmFjZSB3YXMgb3JpZ2luYWxseQ0KPiBkZXNpZ25lZCB3aXRoIGNsb25lIGluIG1pbmQs\n" - "IGJ1dCB0aGF0J3MgbXkgdmFndWUgaW1wcmVzc2lvbiBmcm9tIHRoZQ0KPiBtYW4NCj4gcGFnZXMg\n" - "YW5kIGhhbGYtcmVtZW1iZXJlZCBjb252ZXJzYXRpb25zLg0KPiANCj4gVGhvdWdoIHRoZSBsYWNr\n" - "IG9mIGEgImp1c3QgY29weSB0aGUgd2hvbGUgZmlsZSByZWdhcmRsZXNzIG9mIHNpemUiDQo+IGNh\n" - "c2UNCj4gaXMgd2VpcmQgZm9yIGNsb25lLsKgwqBBbGwgeW91IGNhbiBkbyBpcyBzdGF0IHRoZSBm\n" - 
"aWxlIGFuZCB0aGVuIGhvcGUgaXQNCj4gZG9lc24ndCBjaGFuZ2UgYmVmb3JlIHlvdSBpc3N1ZSB0\n" - "aGUgY29weV9maWxlX3JhbmdlLsKgwqBCdXQgSSdkIHRoaW5rDQo+IGl0J2QNCj4gYmUgZWFzeSBm\n" - "b3IgYW4gYXRvbWljIGNsb25lIGltcGxlbWVudGF0aW9uIHRvIGhhbmRsZSwgc2F5LCBnZXR0aW5n\n" - "IGENCj4gc25hcHNob3Qgb2YgYSBsb2cgZmlsZSB3aGlsZSBpdCdzIGdldHRpbmcgY29udGludW91\n" - "c2x5IGFwcGVuZGVkIHRvLg0KDQpJdCByZWFsbHkgaXNuJ3QgdGhhdCBpbnRlcmVzdGluZyBpbiB0\n" - "aGUgY29udGludW91c2x5IGFwcGVuZGVkIGNhc2UNCih3aGF0IGRpZmZlcmVuY2UgZG9lcyBpdCBt\n" - "YWtlIGlmIHlvdSBvbmx5IGdldCBkYXRhIGZyb20ganVzdCBhIGZldw0KbW9tZW50cyBhZ28pLCBi\n" - "dXQgSSBjYW4gc2VlIGl0IGJlaW5nIGFuIGlzc3VlIGluIHRoZSBjYXNlIG9mIHJhbmRvbQ0Kd3Jp\n" - "dGVzIHdoZXJlIHRoZSBmaWxlIHNpemUgaXMgYmVpbmcgZXh0ZW5kZWQuDQoNClRoZSB0aGluZyBp\n" - "cyB0aGF0IGluIGJvdGggdGhvc2UgY2FzZXMsIHRoZSBjb3B5X2ZpbGVfcmFuZ2UoKSBzZW1hbnRp\n" - "Y3MNCmFyZSB3b3JzZSwgc2luY2UgdGhleSBkb24ndCBldmVuIGd1YXJhbnRlZSBhIHRpbWUtY29u\n" - "c2lzdGVudCBjb3B5Lg0KDQo+ID4gPiA+IEJ1dCB0aGUgcGVyZm9ybWFuY2UgZ2FpbiBvZiBjb3B5\n" - "IG9mZmxvYWQgaXMgdG9vIGJpZyB0byBqdXN0DQo+ID4gPiA+IGlnbm9yZSwNCj4gPiA+ID4gYW5k\n" - "DQo+ID4gPiA+IGluIGZhY3QgaXQncyB3aGF0IGNvcHlfZmlsZV9yYW5nZSBkb2VzIG9uIGV2ZXJ5\n" - "IGZpbGVzeXN0ZW0gYnV0DQo+ID4gPiA+IGJ0cmZzIGFuZA0KPiA+ID4gPiBvY2ZzMiAoYW5kIG1h\n" - "eWJlIGNpZnM/KSwgc28gSSBkb24ndCB0aGluayB3ZSBjYW4ganVzdCBpZ25vcmUNCj4gPiA+ID4g\n" - "aXQuDQo+ID4gPiA+IA0KPiA+ID4gPiBJZiB3ZSBoYWQgc2VwYXJhdGUgY29weV9maWxlX3Jhbmdl\n" - "IGFuZCBjbG9uZV9maWxlX3JhbmdlLCBJDQo+ID4gPiA+ICp0aGluayoNCj4gPiA+ID4gaXQNCj4g\n" - "PiA+ID4gY291bGQgYWxsIGJlIG1hZGUgc2Vuc2libGUuwqDCoEFtIEkgbWlzc2luZyBzb21ldGhp\n" - "bmc/DQo+ID4gPiA+IA0KPiA+ID4gDQo+ID4gPiBIb3cgd291bGQgdGhlIGFwcGxpY2F0aW9uIChj\n" - "cCkga25vdyB3aGVuIHRvIGNhbGwgdGhlDQo+ID4gPiBjbG9uZV9maWxlX3JhbmdlDQo+ID4gPiBh\n" - "bmQgd2hlbiB0byBjYWxsIGNvcHlfZmlsZV9yYW5nZT8NCj4gPiANCj4gPiBjcCBjYW4gcHJvYmFi\n" - "bHkgY2FsbCBjb3B5X2ZpbGVfcmFuZ2UoKSwgYnV0IGFueSBhcHBsaWNhdGlvbiB0aGF0DQo+ID4g\n" - "bmVlZHMNCj4gPiBhdG9taWMgc2VtYW50aWNzIChpLmUuIGEgYmluYXJ5IG9wZXJhdGlvbiBzdWNj\n" - 
"ZXNzL2ZhaWwpIG11c3QgY2FsbA0KPiA+IGNsb25lX2ZpbGVfcmFuZ2UoKS4NCj4gDQo+IEkgZG9u\n" - "J3QgYmVsaWV2ZSB0aGVyZSdzIGEgY2xvbmVfZmlsZV9yYW5nZSgpLsKgwqBJIHNlZSB0aGUgdmZz\n" - "DQo+IGludGVyZmFjZSwNCj4gYnV0IG5vIHN5c3RlbSBjYWxsLg0KDQpUaGVyZSBpcyBhIHN0YW5k\n" - "YXJkIEZJQ0xPTkVSQU5HRSBpb2N0bCgpIHRoYXQgY2FuIGJlIHVzZWQgb24gYWxsDQpmaWxlc3lz\n" - "dGVtcyB0aGF0IHN1cHBvcnQgdGhlIHZmcyBpbnRlcmZhY2UuDQoNCj4gQW5kIGltcGxlbWVudGlu\n" - "ZyBhIHNpbXBsZSBjcCBpcyBoYXJkZXIgdGhhbiBpdCBzaG91bGQgYmUgd2hlbiB5b3UNCj4gZG9u\n" - "J3QNCj4ga25vdyB3aGV0aGVyIGl0J3MgaW1wbGVtZW50ZWQgYXMgY29weSBvciBjbG9uZS7CoMKg\n" - "WW91IGhhdmUgdG8gc3RhdCBmb3INCj4gdGhlIGZpbGUgc2l6ZSBmaXJzdCwgcmV0cnkgaWYgeW91\n" - "IGdvdCBpdCB3cm9uZywgYW5kIGFsc28gcmV0cnkgaWYgeW91DQo+IGdldCBhIHNob3J0IHJlYWQu\n" - "wqDCoFRoZSBleGFtcGxlIGluIHRoZSBjbG9uZV9maWxlX3JhbmdlKCkgbWFuIHBhZ2UgaXMNCj4g\n" - "aW5jb21wbGV0ZS4NCg0KQXMgSSBzYWlkLCB5b3Ugc2hvdWxkbid0IGJlIHVzaW5nIGNvcHlfZmls\n" - "ZV9yYW5nZSgpIGVpdGhlciBpbiB0aGUgY2FzZQ0Kd2hlcmUgdGhlIGZpbGUgaXMgYmVpbmcgbW9k\n" - "aWZpZWQuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWlu\n" - ZXIsIFByaW1hcnlEYXRhDQp0cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29tDQo= + "On Wed, 2017-03-08 at 15:32 -0500, bfields@fieldses.org wrote:\n" + "> On Wed, Mar 08, 2017 at 08:18:31PM +0000, Trond Myklebust wrote:\n" + "> > On Wed, 2017-03-08 at 15:00 -0500, Olga Kornievskaia wrote:\n" + "> > > > On Mar 8, 2017, at 2:53 PM, J. Bruce Fields <bfields@fieldses.o\n" + "> > > > rg>\n" + "> > > > wrote:\n" + "> > > > \n" + "> > > > On Wed, Mar 08, 2017 at 12:32:12PM -0500, Olga Kornievskaia\n" + "> > > > wrote:\n" + "> > > > > \n" + "> > > > > > On Mar 8, 2017, at 12:25 PM, Christoph Hellwig <hch@infrade\n" + "> > > > > > ad.o\n" + "> > > > > > rg>\n" + "> > > > > > wrote:\n" + "> > > > > > \n" + "> > > > > > On Wed, Mar 08, 2017 at 12:05:21PM -0500, J. 
Bruce Fields\n" + "> > > > > > wrote:\n" + "> > > > > > > Since copy isn't atomic that check is never going to be\n" + "> > > > > > > reliable.\n" + "> > > > > > \n" + "> > > > > > That's true for everything that COPY does.\302\240\302\240By that logic\n" + "> > > > > > we\n" + "> > > > > > should\n" + "> > > > > > not implement it at all (a logic that I'd fully support)\n" + "> > > > > \n" + "> > > > > If you were to only keep CLONE then you\342\200\231d lose a huge\n" + "> > > > > performance\n" + "> > > > > gain\n" + "> > > > > you get from server-to-server COPY.\302\240\n" + "> > > > \n" + "> > > > Yes.\302\240\302\240Also, I think copy-like copy implementations have\n" + "> > > > reasonable\n" + "> > > > semantics that are basically the same as read:\n" + "> > > > \n" + "> > > > \t- copy can return successfully with less copied than\n" + "> > > > requested.\n" + "> > > > \t- it's fine for the copied range to start and/or end\n" + "> > > > past end\n" + "> > > > of\n" + "> > > > \t\302\240\302\240file, it'll just return a short read.\n" + "> > > > \t- A copy of more than 0 bytes returning 0 means you're\n" + "> > > > at end\n" + "> > > > of\n" + "> > > > \t\302\240\302\240file.\n" + "> > > > \n" + "> > > > The particular problem here is that that doesn't fit how clone\n" + "> > > > works at\n" + "> > > > all.\n" + "> > > > \n" + "> > > > It feels like what happened is that copy_file_range() was made\n" + "> > > > mainly\n" + "> > > > for the clone case, with the idea that copy might be\n" + "> > > > reluctantly\n" + "> > > > accepted as a second-class implementation.\n" + "> > \n" + "> > Historically? No... Christoph added clone as a valid implementation\n" + "> > of\n" + "> > copy_file_range() almost a year after Zach and Anna defined the\n" + "> > semantics of vfs_copy_file_range(). 
git blame is your friend...\n" + "> \n" + "> Yeah, I know.\302\240\302\240It still feels to me like the interface was originally\n" + "> designed with clone in mind, but that's my vague impression from the\n" + "> man\n" + "> pages and half-remembered conversations.\n" + "> \n" + "> Though the lack of a \"just copy the whole file regardless of size\"\n" + "> case\n" + "> is weird for clone.\302\240\302\240All you can do is stat the file and then hope it\n" + "> doesn't change before you issue the copy_file_range.\302\240\302\240But I'd think\n" + "> it'd\n" + "> be easy for an atomic clone implementation to handle, say, getting a\n" + "> snapshot of a log file while it's getting continuously appended to.\n" + "\n" + "It really isn't that interesting in the continuously appended case\n" + "(what difference does it make if you only get data from just a few\n" + "moments ago), but I can see it being an issue in the case of random\n" + "writes where the file size is being extended.\n" + "\n" + "The thing is that in both those cases, the copy_file_range() semantics\n" + "are worse, since they don't even guarantee a time-consistent copy.\n" + "\n" + "> > > > But the performance gain of copy offload is too big to just\n" + "> > > > ignore,\n" + "> > > > and\n" + "> > > > in fact it's what copy_file_range does on every filesystem but\n" + "> > > > btrfs and\n" + "> > > > ocfs2 (and maybe cifs?), so I don't think we can just ignore\n" + "> > > > it.\n" + "> > > > \n" + "> > > > If we had separate copy_file_range and clone_file_range, I\n" + "> > > > *think*\n" + "> > > > it\n" + "> > > > could all be made sensible.\302\240\302\240Am I missing something?\n" + "> > > > \n" + "> > > \n" + "> > > How would the application (cp) know when to call the\n" + "> > > clone_file_range\n" + "> > > and when to call copy_file_range?\n" + "> > \n" + "> > cp can probably call copy_file_range(), but any application that\n" + "> > needs\n" + "> > atomic semantics (i.e. 
a binary operation success/fail) must call\n" + "> > clone_file_range().\n" + "> \n" + "> I don't believe there's a clone_file_range().\302\240\302\240I see the vfs\n" + "> interface,\n" + "> but no system call.\n" + "\n" + "There is a standard FICLONERANGE ioctl() that can be used on all\n" + "filesystems that support the vfs interface.\n" + "\n" + "> And implementing a simple cp is harder than it should be when you\n" + "> don't\n" + "> know whether it's implemented as copy or clone.\302\240\302\240You have to stat for\n" + "> the file size first, retry if you got it wrong, and also retry if you\n" + "> get a short read.\302\240\302\240The example in the clone_file_range() man page is\n" + "> incomplete.\n" + "\n" + "As I said, you shouldn't be using copy_file_range() either in the case\n" + "where the file is being modified.\n" + "\n" + "-- \n" + "Trond Myklebust\n" + "Linux NFS client maintainer, PrimaryData\n" + trond.myklebust@primarydata.com -015d054f725d5a4891f1cd784d888e0b050ea016c699579057bc66804098cb72 +f5d59714012746448afffa2ccd362311d15e0dc88d287102e74291b13b871716
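The read-like semantics described in the quoted thread (a copy may return fewer bytes than requested, and a return of 0 means end of file) can be sketched as a simple loop. This is only an illustration, not code from the thread: the helper name `copy_whole_file`, the chunk size, and the fallback to an ordinary read/write copy are assumptions made here. It relies on Python's `os.copy_file_range`, which is available on Linux with Python 3.8 or later. As the thread points out, nothing in this loop gives a time-consistent copy if the source is modified while it runs.

```python
import errno
import os
import shutil
import tempfile


def copy_whole_file(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path, looping until copy_file_range returns 0.

    Hypothetical helper for illustration only; it is not atomic.
    Returns the number of bytes copied.
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            while True:
                # May copy fewer bytes than requested (a "short" copy).
                n = os.copy_file_range(src.fileno(), dst.fileno(), chunk)
                if n == 0:
                    # Asked for more than 0 bytes and got 0: end of file.
                    break
                total += n
        except (AttributeError, OSError) as exc:
            # Assumption: treat these errno values as "not supported here".
            unsupported = (errno.ENOSYS, errno.EXDEV,
                           errno.EINVAL, errno.EOPNOTSUPP)
            if total or (isinstance(exc, OSError)
                         and exc.errno not in unsupported):
                raise
            # Nothing copied yet: fall back to a plain read/write copy.
            shutil.copyfileobj(src, dst)
            total = dst.tell()
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 100_000)
        print(copy_whole_file(src, dst, chunk=4096))
```

Looping until a zero return avoids the stat-then-hope pattern for the copy case, since the file size is never consulted up front. It does not help callers that need all-or-nothing behaviour; for those the thread points to the FICLONERANGE ioctl() instead.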