From: Ulrich Drepper
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Tue, 05 Dec 2006 15:55:46 -0800
Message-ID: <45760702.6040805@redhat.com>
To: Christoph Hellwig
Cc: Rob Ross, Trond Myklebust, Andreas Dilger, Sage Weil, Brad Boyer, Anton Altaparmakov, Gary Grider, linux-fsdevel@vger.kernel.org
In-Reply-To: <20061205220538.GA1988@infradead.org>
References: <1164984094.5761.86.camel@lade.trondhjem.org>
 <20061203015203.GA5656@schatzie.adilger.int>
 <20061204073200.GB5637@schatzie.adilger.int>
 <1165245336.711.176.camel@lade.trondhjem.org>
 <4574C48A.8030007@mcs.anl.gov>
 <1165298200.5776.26.camel@lade.trondhjem.org>
 <20061205100748.GC5871@infradead.org>
 <4575E9B0.3060908@mcs.anl.gov>
 <20061205220538.GA1988@infradead.org>

Christoph Hellwig wrote:
> Ulrich, this is in reply to these API proposals:

I know the documents. The HECWG was actually supposed to submit an actual draft to the OpenGroup-internal working group, but I haven't seen anything yet. I'm not opposed to getting real-world experience first.

>> So other than this "lite" version of the readdirplus() call, and this
>> idea of making the flags indicate validity rather than accuracy, are
>> there other comments on the directory-related calls? I understand that
>> they might or might not ever make it in, but assuming they did, what
>> other changes would you like to see?

I don't think an accuracy flag is useful at all. Programs don't want to use fuzzy information.
If you want a fast 'ls -l', then add a mode which doesn't print the fields that are not provided. Don't provide outdated information. The same goes for other programs.

> statlite needs to separate the flag for valid fields from the actual
> stat structure and reuse the existing stat(64) structure. stat lite
> needs to at least get a better name, even better be folded into *statat*,
> either by having a new AT_VALID_MASK flag that enables a new
> unsigned int valid argument or by folding the valid flags into the AT_
> flags.

Yes, this is also my pet peeve with this interface. I don't want to have another data structure, especially since programs might want to store the value in places where normal stat results are returned.

And also yes on 'statat'. I strongly suggest defining only a statat variant. In the standards group I'll vehemently oppose the introduction of yet another superfluous non-*at interface.

As for reusing the existing statat interface and magically adding another parameter through an ellipsis: no. We need to become more type-safe. The userlevel interface needs to be a new one. For the system call there is no such restriction; we can indeed extend the existing syscall. We have appropriate checks on the validity of the flags parameter in place, which make such calls backward compatible.

> I think having a stat lite variant is pretty much consensus, we just need
> to fine tune the actual API - and of course get a reference implementation.
> So if you want to get this going try to implement it based on
> http://marc.theaimsgroup.com/?l=linux-fsdevel&m=115487991724607&w=2.
> Bonus points for actually making use of the flags in some filesystems.

I don't like that approach. The flag parameter should be exclusively an output parameter. By default the kernel should fill in all the fields it has access to. If access is not easily possible, then set the bit and clear the field.
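To make the output-only semantics concrete, here is a minimal user-space sketch in C, assuming a Linux/POSIX system. It reuses the ordinary struct stat, returns the mask through a separate, typed pointer instead of an ellipsis argument, and follows the rule stated above: fill in everything available, and for a field that cannot be obtained cheaply, set its bit and clear the field. statat_lite(), format_size(), the LITE_* bits and the `expensive` parameter are all hypothetical (the last one only simulates what a filesystem cannot supply cheaply); only fstatat() and struct stat are real. Fields such as st_dev, st_ino and st_mode are always filled in, in the spirit of the always-present fields in the proposed man page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>     /* assert */
#include <fcntl.h>      /* AT_FDCWD */
#include <stdio.h>      /* snprintf */
#include <string.h>     /* memset */
#include <sys/stat.h>   /* struct stat, fstatat, S_ISDIR */

/* HYPOTHETICAL output-only bits; not a real kernel or libc API.
   A set bit means: this field was not provided and has been cleared. */
#define LITE_SIZE   0x1u
#define LITE_BLOCKS 0x2u
#define LITE_ATIME  0x4u
#define LITE_MTIME  0x8u

/* HYPOTHETICAL statat-style call. `missing` is purely an output parameter.
   `expensive` only simulates which fields a filesystem cannot provide
   cheaply; a real implementation would make that decision in the kernel. */
int statat_lite(int dirfd, const char *path, struct stat *buf,
                unsigned int *missing, int flags, unsigned int expensive)
{
    assert(buf != NULL && missing != NULL);
    *missing = 0;
    if (fstatat(dirfd, path, buf, flags) != 0)
        return -1;                      /* errno is set by fstatat() */

    /* Always-present fields (st_dev, st_ino, st_mode, ...) are kept.
       For anything not cheaply available: clear the field, set its bit. */
    if (expensive & LITE_SIZE)   { buf->st_size = 0;   *missing |= LITE_SIZE; }
    if (expensive & LITE_BLOCKS) { buf->st_blocks = 0; *missing |= LITE_BLOCKS; }
    if (expensive & LITE_ATIME) {
        memset(&buf->st_atim, 0, sizeof buf->st_atim);
        *missing |= LITE_ATIME;
    }
    if (expensive & LITE_MTIME) {
        memset(&buf->st_mtim, 0, sizeof buf->st_mtim);
        *missing |= LITE_MTIME;
    }
    return 0;
}

/* 'ls -l'-style size column: print "?" rather than an outdated value. */
void format_size(char *out, size_t n, const struct stat *st,
                 unsigned int missing)
{
    if (missing & LITE_SIZE)
        snprintf(out, n, "?");
    else
        snprintf(out, n, "%lld", (long long)st->st_size);
}
```

Because the result is an ordinary struct stat, code that already stores stat results can store the lite result in the same place, which is the reason given above for not introducing another data structure.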
There are, of course, certain fields which should always be included. In the proposed man page these are already identified (i.e., those before the st_litemask member).

> At the actual C prototype level I would rename d_stat_err to d_stat_errno
> for consistency and maybe drop the readdirplus() entry point in favour of
> readdirplus_r only - there is no point in introducing new non-reentrant
> APIs today.

No, readdirplus should be kept (and yes, readdirplus_r must be added). The reason is that the readdir_r interface is only needed if multiple threads use the _same_ DIR stream. This is hardly ever the case. Forcing everybody to use the _r variant means that we unconditionally have to copy the data into the user-provided buffer. With readdir there is the possibility to just pass back a pointer into the internal buffer read into by getdents. This is how readdir works for most kernel/arch combinations.

This requires that the dirent_plus structure matches the layout the kernel writes into that buffer, so it's important to get it right. I'm not comfortable with the current proposal. Yes, having the ordinary dirent and stat structures in there is a plus. But we have overlap:

- d_ino and st_ino
- d_type and parts of st_mode

And we have superfluous information:

- st_dev, the same for all entries, at least this is what readdir assumes

I haven't made up my mind yet whether this is enough reason to introduce a new type which isn't made up of the two structures.

And one last point: I haven't seen any discussion of why readdirplus should do the equivalent of stat and why there is no 'statlite' variant. Are all places where readdir is used non-critical for performance, or do they depend on accurate information?

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc.
➧ 444 Castro St ➧ Mountain View, CA ❖
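The readdir versus readdir_r argument above can be illustrated with a small, self-contained C simulation. Nothing here talks to the kernel: struct dplus_stream stands in for a DIR stream whose array `buf` plays the role of the buffer getdents fills, and dplus_read(), dplus_read_r() and dplus_consistent() are hypothetical names. The struct dirent_plus layout only approximates the proposal (an ordinary struct dirent, an ordinary struct stat, and the d_stat_errno name suggested in the quoted text); it is not the proposal's actual definition. The readdir-style call returns a pointer into the internal buffer, while the _r-style call has to copy every entry into caller storage. Locking of the stream is omitted.

```c
#include <assert.h>
#include <dirent.h>     /* struct dirent */
#include <stddef.h>     /* size_t, NULL */
#include <string.h>     /* strcmp, strcpy */
#include <sys/stat.h>   /* struct stat */

/* HYPOTHETICAL record made of the two existing structures. Overlap:
   d.d_ino duplicates st.st_ino and d.d_type duplicates part of
   st.st_mode; st.st_dev is the same for every entry of one directory. */
struct dirent_plus {
    struct dirent d;
    struct stat   st;
    int           d_stat_errno;   /* 0 if `st` is valid */
};

#define DPLUS_NBUF 8

/* HYPOTHETICAL stand-in for a DIR stream; `buf` plays the role of the
   internal buffer that getdents() would fill. */
struct dplus_stream {
    struct dirent_plus buf[DPLUS_NBUF];
    size_t count, pos;
};

/* readdir-style: return a pointer INTO the internal buffer, no copy.
   Only safe if no other thread reads from the same stream. */
struct dirent_plus *dplus_read(struct dplus_stream *s)
{
    return s->pos < s->count ? &s->buf[s->pos++] : NULL;
}

/* readdir_r-style: always copy the entry into caller-provided storage. */
int dplus_read_r(struct dplus_stream *s, struct dirent_plus *entry,
                 struct dirent_plus **result)
{
    struct dirent_plus *p = dplus_read(s);
    if (p != NULL) {
        *entry = *p;            /* the unconditional copy */
        *result = entry;
    } else {
        *result = NULL;
    }
    return 0;
}

/* The redundancy in practice: for a valid entry both copies must agree. */
int dplus_consistent(const struct dirent_plus *e)
{
    return e->d_stat_errno != 0 || e->d.d_ino == e->st.st_ino;
}
```

The check in dplus_consistent() makes the overlap concrete: the inode number is stored twice in every entry, once in each embedded structure.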