* FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL @ 2019-12-28 14:36 Pali Rohár 2019-12-28 19:14 ` Andreas Dilger 2019-12-31 22:54 ` Eric Sandeen 0 siblings, 2 replies; 9+ messages in thread From: Pali Rohár @ 2019-12-28 14:36 UTC (permalink / raw) To: Eric Sandeen, Andreas Dilger, David Sterba, Darrick J. Wong, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 558 bytes --] Hello! I see that you have introduced in commit 62750d0 two new IOCTLs for filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL. I would like to ask, are these two new ioctls mean to be generic way for userspace to get or set fs label independently of which filesystem is used? Or are they only for btrfs? Because I was not able to find any documentation for it, what is format of passed buffer... null-term string? fixed-length? and in which encoding? utf-8? latin1? utf-16? or filesystem dependent? -- Pali Rohár pali.rohar@gmail.com [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2019-12-28 14:36 FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL Pali Rohár @ 2019-12-28 19:14 ` Andreas Dilger 2019-12-28 20:05 ` Pali Rohár 2019-12-31 22:54 ` Eric Sandeen 1 sibling, 1 reply; 9+ messages in thread From: Andreas Dilger @ 2019-12-28 19:14 UTC (permalink / raw) To: Pali Rohár Cc: Eric Sandeen, David Sterba, Darrick J. Wong, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1634 bytes --] > On Dec 28, 2019, at 7:36 AM, Pali Rohár <pali.rohar@gmail.com> wrote: > > Hello! > > I see that you have introduced in commit 62750d0 two new IOCTLs for > filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL. > > I would like to ask, are these two new ioctls mean to be generic way for > userspace to get or set fs label independently of which filesystem is > used? Or are they only for btrfs? > > Because I was not able to find any documentation for it, what is format > of passed buffer... null-term string? fixed-length? and in which > encoding? utf-8? latin1? utf-16? or filesystem dependent? It seems that SETFSLABEL is supported by BtrFS and XFS, and GETFSLABEL also by GFS2. We were just discussing recently about adding it to ext4, so if you wanted to submit a patch for that it would be welcome. It looks like the label is a NUL-terminated string, up to the length allowed by the various filesystems. That said, it seems like a bit of a bug that the kernel will return -EFAULT if a string shorter than the maximum size is supplied (256 chars for btrfs). The copy_{to,from}_user() function will (I think) return the number of bytes remaining to be copied, so IMHO it would make sense that this is compared to the string length of the label, and not return -EFAULT if the buffer is large enough to hold the actual label. Otherwise, the caller has to magically know the maximum label size that is returned from any filesystem and/or allocate specific buffer sizes for different filesystem types, which makes it not very useful as a generic interface. Cheers, Andreas [-- Attachment #2: Message signed with OpenPGP --] [-- Type: application/pgp-signature, Size: 873 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2019-12-28 19:14 ` Andreas Dilger @ 2019-12-28 20:05 ` Pali Rohár 0 siblings, 0 replies; 9+ messages in thread From: Pali Rohár @ 2019-12-28 20:05 UTC (permalink / raw) To: Andreas Dilger Cc: Eric Sandeen, David Sterba, Darrick J. Wong, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 5889 bytes --] On Saturday 28 December 2019 12:14:57 Andreas Dilger wrote: > > On Dec 28, 2019, at 7:36 AM, Pali Rohár <pali.rohar@gmail.com> wrote: > > > > Hello! > > > > I see that you have introduced in commit 62750d0 two new IOCTLs for > > filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL. > > > > I would like to ask, are these two new ioctls mean to be generic way for > > userspace to get or set fs label independently of which filesystem is > > used? Or are they only for btrfs? > > > > Because I was not able to find any documentation for it, what is format > > of passed buffer... null-term string? fixed-length? and in which > > encoding? utf-8? latin1? utf-16? or filesystem dependent? > > It seems that SETFSLABEL is supported by BtrFS and XFS, and GETFSLABEL > also by GFS2. We were just discussing recently about adding it to ext4, > so if you wanted to submit a patch for that it would be welcome. I have in-progress patches for FAT with following ioctls: #define FAT_IOCTL_GET_VOLUME_LABEL(len) _IOC(_IOC_READ, 'r', 0x14, len) /* variable length */ #define FAT_IOCTL_SET_VOLUME_LABEL _IOC(_IOC_WRITE, 'r', 0x15, 0) /* variable length, nul term string */ GET ioctl macro takes length of buffer and pass it via standard ioctl bits into kernel would know how many bytes can copy to userspace. And via SET ioctl is passed null term string, so kernel copy from userspace whole null term string (with some upper limit to page size or 1024 bytes). I plan to finish FAT patches in next month. > It looks like the label is a NUL-terminated string, up to the length > allowed by the various filesystems. That said, it seems like a bit of > a bug that the kernel will return -EFAULT if a string shorter than the > maximum size is supplied (256 chars for btrfs). FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls now seems like a nice general / unified approach, but it has problems which I already mention in questions in previous email. There are basically two kinds of filesystems when looking how filename / label should be represented. First group: file name / label is sequence of bytes (possible null termined) without any representation. It can be in any encoding, it is up to the user. There is no mapping to Unicode. Examples: ext4 Second group: file name / label is stored in exact encoding (defined internally or externally by e.g. code page). There is exact translate function to Unicode. For these file system there is either iocharset= or nls= mount option which maps name / label to bytes. Examples: vfat (UTF-16LE), ntfs (UTF-16LE), udf (Latin1 and UTF-16BE), iso9660 with joliet (UTF-16LE), apfs (UTF-8). All mentioned file systems by you are in first group, so you probably have not hit this "implementation" problem for FS_IOC_GETFSLABEL yet. And question is, what should FS_IOC_GETFSLABEL returns for file systems from second group? In my implementation for FAT_IOCTL_GET_VOLUME_LABEL I'm reading label buffer from filesyetem, then converting this buffer from encoding specified in codepage= mount option to Unicode and then from Unicode I'm converting it to encoding specified in iocharset= mount option. Why? Because same procedure is used for reading file names from vfat when corresponding file name does not have long UTF-16LE vfat entry (label on vfat really is not in UTF-16LE but in OEM code page). And I think that volume label should be handled in same way like file names. Is above behavior "label in ioctl would be retrieved in same encoding as file names" correct / expected also for FS_IOC_GETFSLABEL? Or do you have better idea how should filesystems from second group implement FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL? (E.g. always in UTF-8 independently of mount options? Or in filesystem native encoding, so sometimes in UTF-16LE, sometimes in UTF-16BE, sometimes in DOS OEM code page, sometimes in Latin1? Or something different?) > The copy_{to,from}_user() function will (I think) return the number of > bytes remaining to be copied, so IMHO it would make sense that this is > compared to the string length of the label, and not return -EFAULT if > the buffer is large enough to hold the actual label. Otherwise, the > caller has to magically know the maximum label size that is returned > from any filesystem and/or allocate specific buffer sizes for different > filesystem types, which makes it not very useful as a generic interface. I think that my approach with specifying length in ioctl bits for GET ioctl is more flexible. But I do not know if it is even possible to change as API is already in kernel. Also I see there more important bug in that API: #define FSLABEL_MAX 256 /* Max chars for the interface; each fs may differ */ git show 62750d040bd13 "This retains 256 chars as the maximum size through the interface, which is the btrfs limit and AFAIK exceeds any other filesystem's maximum label size." It has some artificial limit for volume label length. Which filesystems was checked that maximal label length cannot be larger? E.g. on UDF filesystem (supported by Linux kernel for a long time) is label stored in 128bit buffer. First byte specifies Unicode encoding, last byte specifies used length of buffer. Two possible encodings are currently defined: Latin1 and UTF-16BE. Therefore if FS_IOC_GETFSLABEL for UDF would use UTF-8 encoding, then maximal length for label is: (128-2)*2 = 252 plus null byte (*2 because all non-ASCII Latin1 characters which are one-byte are encoded in UTF-8 by two bytes). Which is almost upper limit of API. And this is just first filesytem which I checked. I would not be surprised if there is already filesystem which maximal length exceed current limit defined by kernel API. -- Pali Rohár pali.rohar@gmail.com [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2019-12-28 14:36 FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL Pali Rohár 2019-12-28 19:14 ` Andreas Dilger @ 2019-12-31 22:54 ` Eric Sandeen 2020-01-01 18:10 ` Theodore Y. Ts'o 1 sibling, 1 reply; 9+ messages in thread From: Eric Sandeen @ 2019-12-31 22:54 UTC (permalink / raw) To: Pali Rohár, Andreas Dilger, David Sterba, Darrick J. Wong, linux-fsdevel, linux-kernel On 12/28/19 6:36 AM, Pali Rohár wrote: > Hello! > > I see that you have introduced in commit 62750d0 two new IOCTLs for > filesyetems: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL. > > I would like to ask, are these two new ioctls mean to be generic way for > userspace to get or set fs label independently of which filesystem is > used? Or are they only for btrfs? The reason it was lifted out of btrfs to the vfs is so that other filesystems can use the same interface. However, it is up to each filesystem to implement it (and to interpret what's been written to or read from disk.) > Because I was not able to find any documentation for it, what is format > of passed buffer... null-term string? fixed-length? and in which > encoding? utf-8? latin1? utf-16? or filesystem dependent? It simply copies the bits from the memory location you pass in, it knows nothing of encodings. For the most part it's up to the filesystem's own utilities to do any interpretation of the resulting bits on disk, null-terminating maximal-length label strings, etc. -Eric ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2019-12-31 22:54 ` Eric Sandeen @ 2020-01-01 18:10 ` Theodore Y. Ts'o 2020-01-01 18:39 ` Pali Rohár 0 siblings, 1 reply; 9+ messages in thread From: Theodore Y. Ts'o @ 2020-01-01 18:10 UTC (permalink / raw) To: Eric Sandeen Cc: Pali Rohár, Andreas Dilger, David Sterba, Darrick J. Wong, linux-fsdevel, linux-kernel On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote: > > Because I was not able to find any documentation for it, what is format > > of passed buffer... null-term string? fixed-length? and in which > > encoding? utf-8? latin1? utf-16? or filesystem dependent? > > It simply copies the bits from the memory location you pass in, it knows > nothing of encodings. > > For the most part it's up to the filesystem's own utilities to do any > interpretation of the resulting bits on disk, null-terminating maximal-length > label strings, etc. I'm not sure this is going to be the best API design choice. The blkid library interprets the on disk format for each file syustem knowing what is the "native" format for that particular file system. This is mainly an issue only for the non-Linux file systems; for the Linux file system, the party line has historically been that we don't get involved with character encoding, but in practice, what that has evolved into is that userspace has standardized on UTF-8, and that's what we pass into the kernel from userspace by convention. But the problem is that if the goal is to make FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL work without the calling program knowing what file system type a particular pathname happens to be, then it would be easist for the userspace program if it can expect that it can always pass in a null-terminated UTF-8 string, and get back a null-terminated UTF-8. I bet that in practice, that is what most userspace programs are going to be do anyway, since it works that way for all other file system syscalls. So for a file system which is a non-Linux-native file system, if it happens to store the its label using utf-16, or some other Windows-system-silliness, it would work a lot better if it assumed that it was passed in utf-8, and stored in the the Windows file system using whatever crazy encoding Windows wants to use. Otherwise, why bother uplifting the ioctl to one which is file system independent, if the paramters are defined to be file system *dependent*? - Ted ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2020-01-01 18:10 ` Theodore Y. Ts'o @ 2020-01-01 18:39 ` Pali Rohár 2020-01-02 21:57 ` Darrick J. Wong 0 siblings, 1 reply; 9+ messages in thread From: Pali Rohár @ 2020-01-01 18:39 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Eric Sandeen, Andreas Dilger, David Sterba, Darrick J. Wong, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2583 bytes --] On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote: > On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote: > > > Because I was not able to find any documentation for it, what is format > > > of passed buffer... null-term string? fixed-length? and in which > > > encoding? utf-8? latin1? utf-16? or filesystem dependent? > > > > It simply copies the bits from the memory location you pass in, it knows > > nothing of encodings. > > > > For the most part it's up to the filesystem's own utilities to do any > > interpretation of the resulting bits on disk, null-terminating maximal-length > > label strings, etc. > > I'm not sure this is going to be the best API design choice. The > blkid library interprets the on disk format for each file syustem > knowing what is the "native" format for that particular file system. > This is mainly an issue only for the non-Linux file systems; for the > Linux file system, the party line has historically been that we don't > get involved with character encoding, but in practice, what that has > evolved into is that userspace has standardized on UTF-8, and that's > what we pass into the kernel from userspace by convention. > > But the problem is that if the goal is to make FS_IOC_GETFSLABEL and > FS_IOC_SETFSLABEL work without the calling program knowing what file > system type a particular pathname happens to be, then it would be > easist for the userspace program if it can expect that it can always > pass in a null-terminated UTF-8 string, and get back a null-terminated > UTF-8. I bet that in practice, that is what most userspace programs > are going to be do anyway, since it works that way for all other file > system syscalls. > > So for a file system which is a non-Linux-native file system, if it > happens to store the its label using utf-16, or some other > Windows-system-silliness, it would work a lot better if it assumed > that it was passed in utf-8, and stored in the the Windows file system > using whatever crazy encoding Windows wants to use. Otherwise, why > bother uplifting the ioctl to one which is file system independent, if > the paramters are defined to be file system *dependent*? Exactly. In another email I wrote that for those non-Linux-native filesystem could be used encoding specified in iocharset= mount parameter. I think it is better as usage of one fixing encoding (e.g. UTF-8) if other filesystem strings are propagated to userspace in other encoding (as specified by iocharset=). -- Pali Rohár pali.rohar@gmail.com [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2020-01-01 18:39 ` Pali Rohár @ 2020-01-02 21:57 ` Darrick J. Wong 2020-01-02 22:08 ` Pali Rohár 0 siblings, 1 reply; 9+ messages in thread From: Darrick J. Wong @ 2020-01-02 21:57 UTC (permalink / raw) To: Pali Rohár Cc: Theodore Y. Ts'o, Eric Sandeen, Andreas Dilger, David Sterba, linux-fsdevel, linux-kernel On Wed, Jan 01, 2020 at 07:39:20PM +0100, Pali Rohár wrote: > On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote: > > On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote: > > > > Because I was not able to find any documentation for it, what is format > > > > of passed buffer... null-term string? fixed-length? and in which > > > > encoding? utf-8? latin1? utf-16? or filesystem dependent? > > > > > > It simply copies the bits from the memory location you pass in, it knows > > > nothing of encodings. > > > > > > For the most part it's up to the filesystem's own utilities to do any > > > interpretation of the resulting bits on disk, null-terminating maximal-length > > > label strings, etc. > > > > I'm not sure this is going to be the best API design choice. The > > blkid library interprets the on disk format for each file syustem > > knowing what is the "native" format for that particular file system. > > This is mainly an issue only for the non-Linux file systems; for the > > Linux file system, the party line has historically been that we don't > > get involved with character encoding, but in practice, what that has > > evolved into is that userspace has standardized on UTF-8, and that's > > what we pass into the kernel from userspace by convention. > > > > But the problem is that if the goal is to make FS_IOC_GETFSLABEL and > > FS_IOC_SETFSLABEL work without the calling program knowing what file > > system type a particular pathname happens to be, then it would be > > easist for the userspace program if it can expect that it can always > > pass in a null-terminated UTF-8 string, and get back a null-terminated > > UTF-8. I bet that in practice, that is what most userspace programs > > are going to be do anyway, since it works that way for all other file > > system syscalls. "Null terminated sequence of bytes*" is more or less what xfsprogs do, and it looks like btrfs does that as well. (* with the idiotic exception that if the label is exactly 256 bytes long then the array is not required to have a null terminator, because btrfs encoded that quirk of their ondisk format into the API. <grumble>) So for VFAT, I think you can use the same code that does the name encoding transformations for iocharset= to handle labels, right? > > So for a file system which is a non-Linux-native file system, if it > > happens to store the its label using utf-16, or some other > > Windows-system-silliness, it would work a lot better if it assumed > > that it was passed in utf-8, and stored in the the Windows file system > > using whatever crazy encoding Windows wants to use. Otherwise, why > > bother uplifting the ioctl to one which is file system independent, if > > the paramters are defined to be file system *dependent*? > > Exactly. In another email I wrote that for those non-Linux-native > filesystem could be used encoding specified in iocharset= mount > parameter. I think it is better as usage of one fixing encoding (e.g. > UTF-8) if other filesystem strings are propagated to userspace in other > encoding (as specified by iocharset=). I'm confused by this statement... but I think we're saying the same thing? --D > > -- > Pali Rohár > pali.rohar@gmail.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2020-01-02 21:57 ` Darrick J. Wong @ 2020-01-02 22:08 ` Pali Rohár 2020-01-07 23:15 ` Darrick J. Wong 0 siblings, 1 reply; 9+ messages in thread From: Pali Rohár @ 2020-01-02 22:08 UTC (permalink / raw) To: Darrick J. Wong Cc: Theodore Y. Ts'o, Eric Sandeen, Andreas Dilger, David Sterba, linux-fsdevel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3975 bytes --] On Thursday 02 January 2020 13:57:54 Darrick J. Wong wrote: > On Wed, Jan 01, 2020 at 07:39:20PM +0100, Pali Rohár wrote: > > On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote: > > > On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote: > > > > > Because I was not able to find any documentation for it, what is format > > > > > of passed buffer... null-term string? fixed-length? and in which > > > > > encoding? utf-8? latin1? utf-16? or filesystem dependent? > > > > > > > > It simply copies the bits from the memory location you pass in, it knows > > > > nothing of encodings. > > > > > > > > For the most part it's up to the filesystem's own utilities to do any > > > > interpretation of the resulting bits on disk, null-terminating maximal-length > > > > label strings, etc. > > > > > > I'm not sure this is going to be the best API design choice. The > > > blkid library interprets the on disk format for each file syustem > > > knowing what is the "native" format for that particular file system. > > > This is mainly an issue only for the non-Linux file systems; for the > > > Linux file system, the party line has historically been that we don't > > > get involved with character encoding, but in practice, what that has > > > evolved into is that userspace has standardized on UTF-8, and that's > > > what we pass into the kernel from userspace by convention. > > > > > > But the problem is that if the goal is to make FS_IOC_GETFSLABEL and > > > FS_IOC_SETFSLABEL work without the calling program knowing what file > > > system type a particular pathname happens to be, then it would be > > > easist for the userspace program if it can expect that it can always > > > pass in a null-terminated UTF-8 string, and get back a null-terminated > > > UTF-8. I bet that in practice, that is what most userspace programs > > > are going to be do anyway, since it works that way for all other file > > > system syscalls. > > "Null terminated sequence of bytes*" is more or less what xfsprogs do, > and it looks like btrfs does that as well. > > (* with the idiotic exception that if the label is exactly 256 bytes long > then the array is not required to have a null terminator, because btrfs > encoded that quirk of their ondisk format into the API. <grumble>) > > So for VFAT, I think you can use the same code that does the name > encoding transformations for iocharset= to handle labels, right? Yes I can! But I need to process also codepage= transformation (details in email <20191228200523.eaxpwxkpswzuihow@pali>). And I already have this implementation in progress. > > > So for a file system which is a non-Linux-native file system, if it > > > happens to store the its label using utf-16, or some other > > > Windows-system-silliness, it would work a lot better if it assumed > > > that it was passed in utf-8, and stored in the the Windows file system > > > using whatever crazy encoding Windows wants to use. Otherwise, why > > > bother uplifting the ioctl to one which is file system independent, if > > > the paramters are defined to be file system *dependent*? > > > > Exactly. In another email I wrote that for those non-Linux-native > > filesystem could be used encoding specified in iocharset= mount > > parameter. I think it is better as usage of one fixing encoding (e.g. > > UTF-8) if other filesystem strings are propagated to userspace in other > > encoding (as specified by iocharset=). > > I'm confused by this statement... but I think we're saying the same > thing? Theodore suggested to use UTF-8 encoding for FS_IOC_GETFSLABEL. And I suggested to use iocharset= encoding for FS_IOC_GETFSLABEL. You said to use for VFAT "same code that does the name encoding", so if I'm understanding correctly, yes it is the same thing (as VFAT use iocharset= and codepage= mount options for name encoding). Right? -- Pali Rohár pali.rohar@gmail.com [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL 2020-01-02 22:08 ` Pali Rohár @ 2020-01-07 23:15 ` Darrick J. Wong 0 siblings, 0 replies; 9+ messages in thread From: Darrick J. Wong @ 2020-01-07 23:15 UTC (permalink / raw) To: Pali Rohár Cc: Theodore Y. Ts'o, Eric Sandeen, Andreas Dilger, David Sterba, linux-fsdevel, linux-kernel On Thu, Jan 02, 2020 at 11:08:00PM +0100, Pali Rohár wrote: > On Thursday 02 January 2020 13:57:54 Darrick J. Wong wrote: > > On Wed, Jan 01, 2020 at 07:39:20PM +0100, Pali Rohár wrote: > > > On Wednesday 01 January 2020 13:10:54 Theodore Y. Ts'o wrote: > > > > On Tue, Dec 31, 2019 at 04:54:18PM -0600, Eric Sandeen wrote: > > > > > > Because I was not able to find any documentation for it, what is format > > > > > > of passed buffer... null-term string? fixed-length? and in which > > > > > > encoding? utf-8? latin1? utf-16? or filesystem dependent? > > > > > > > > > > It simply copies the bits from the memory location you pass in, it knows > > > > > nothing of encodings. > > > > > > > > > > For the most part it's up to the filesystem's own utilities to do any > > > > > interpretation of the resulting bits on disk, null-terminating maximal-length > > > > > label strings, etc. > > > > > > > > I'm not sure this is going to be the best API design choice. The > > > > blkid library interprets the on disk format for each file syustem > > > > knowing what is the "native" format for that particular file system. > > > > This is mainly an issue only for the non-Linux file systems; for the > > > > Linux file system, the party line has historically been that we don't > > > > get involved with character encoding, but in practice, what that has > > > > evolved into is that userspace has standardized on UTF-8, and that's > > > > what we pass into the kernel from userspace by convention. > > > > > > > > But the problem is that if the goal is to make FS_IOC_GETFSLABEL and > > > > FS_IOC_SETFSLABEL work without the calling program knowing what file > > > > system type a particular pathname happens to be, then it would be > > > > easist for the userspace program if it can expect that it can always > > > > pass in a null-terminated UTF-8 string, and get back a null-terminated > > > > UTF-8. I bet that in practice, that is what most userspace programs > > > > are going to be do anyway, since it works that way for all other file > > > > system syscalls. > > > > "Null terminated sequence of bytes*" is more or less what xfsprogs do, > > and it looks like btrfs does that as well. > > > > (* with the idiotic exception that if the label is exactly 256 bytes long > > then the array is not required to have a null terminator, because btrfs > > encoded that quirk of their ondisk format into the API. <grumble>) > > > > So for VFAT, I think you can use the same code that does the name > > encoding transformations for iocharset= to handle labels, right? > > Yes I can! But I need to process also codepage= transformation (details > in email <20191228200523.eaxpwxkpswzuihow@pali>). And I already have > this implementation in progress. <nod> > > > > So for a file system which is a non-Linux-native file system, if it > > > > happens to store the its label using utf-16, or some other > > > > Windows-system-silliness, it would work a lot better if it assumed > > > > that it was passed in utf-8, and stored in the the Windows file system > > > > using whatever crazy encoding Windows wants to use. Otherwise, why > > > > bother uplifting the ioctl to one which is file system independent, if > > > > the paramters are defined to be file system *dependent*? > > > > > > Exactly. In another email I wrote that for those non-Linux-native > > > filesystem could be used encoding specified in iocharset= mount > > > parameter. I think it is better as usage of one fixing encoding (e.g. > > > UTF-8) if other filesystem strings are propagated to userspace in other > > > encoding (as specified by iocharset=). > > > > I'm confused by this statement... but I think we're saying the same > > thing? > > Theodore suggested to use UTF-8 encoding for FS_IOC_GETFSLABEL. And I > suggested to use iocharset= encoding for FS_IOC_GETFSLABEL. You said to > use for VFAT "same code that does the name encoding", so if I'm > understanding correctly, yes it is the same thing (as VFAT use > iocharset= and codepage= mount options for name encoding). Right? Right. --D > -- > Pali Rohár > pali.rohar@gmail.com ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-01-07 23:15 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-12-28 14:36 FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL Pali Rohár 2019-12-28 19:14 ` Andreas Dilger 2019-12-28 20:05 ` Pali Rohár 2019-12-31 22:54 ` Eric Sandeen 2020-01-01 18:10 ` Theodore Y. Ts'o 2020-01-01 18:39 ` Pali Rohár 2020-01-02 21:57 ` Darrick J. Wong 2020-01-02 22:08 ` Pali Rohár 2020-01-07 23:15 ` Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).