[PATCH] Two bugs in fs/isofs

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] Two bugs in fs/isofs
@ 2015-10-21 11:00 Thomas Schmitt
  2015-10-21 12:51 ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Schmitt @ 2015-10-21 11:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, jack

Hi,

during regression tests with libisofs i found two bugs
in fs/isofs, which i reported to Debian
  
  "fs/isofs/util.c iso_date() will map years >= 2028 to 1970"

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800627

  "fs/isofs/rock.c coarsely truncates file names of 254 or 255 bytes length"

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=798300


I got the advise to mail to the addresses i am using now.
My apologies for not being used to the submission customs of
Linux kernel development. Please give me instructions if i made
mistakes which hamper processing.

I have tested my change proposals in a VM with several ISO
images as virtual CD-ROM. No negative effects to see.

Bugs in detail (with patches):

===============================================================
"fs/isofs/util.c iso_date() will map years >= 2028 to 1970"
---------------------------------------------------------------

Reproduce by:

  # File "/bin/true" is assumed to exist and be readable.
  # Else use any other readable file instead.

  xorriso -outdev test.iso \
          -blank as_needed \
          -map /bin/true /victim \
          -alter_date m 'Oct 01 22:06:12 2030' /victim --

  mount -o loop test.iso /mnt/iso

  ls -l /mnt/iso/victim

shows

  -rwxr-xr-x 1 root root 27080 Jan  1  1970 /mnt/iso/victim

--------------- Source analysis:

The function iso_date() interprets the unsigned bytes of ISO 9660
7-byte timestamps as 7 chars. By
	if (year < 0) {
		crtime = 0;
it maps years 1900+128 ... 1900+255 to 1970, if type char is signed.

iso_date() sends int parameters where mktime64() expects unsigned int.
Further it does not make sure that the parameters are in the ranges
expected by mktime64().

Finally, if year would get interpreted as unsigned char, the return
type of iso_date() would roll over in year 2038, unless int becomes
64 bit until then.

--------------- Remedy proposal:

* Repair isofs function iso_date():
  + Change return type to time64_t.
  + Interpret bytes from ISO 9660 with appropriate signedness.
  + Map illegal byte values to defaults.
  + Remove test and handling for (year < 0), which is no longer possible.

--- linux-source-4.1/fs/isofs/isofs.h	2015-08-17 05:52:51.000000000 +0200
+++ linux-4.1.6/fs/isofs/isofs.h	2015-10-02 17:28:37.984990113 +0200
@@ -103,7 +103,7 @@ static inline unsigned int isonum_733(ch
 	/* Ignore bigendian datum due to broken mastering programs */
 	return get_unaligned_le32(p);
 }
-extern int iso_date(char *, int);
+extern time64_t iso_date(char *, int);
 
 struct inode;		/* To make gcc happy */
 
--- linux-source-4.1/fs/isofs/util.c	2015-08-17 05:52:51.000000000 +0200
+++ linux-4.1.6/fs/isofs/util.c	2015-10-08 16:21:58.675894476 +0200
@@ -15,56 +15,59 @@
  * to GMT.  Thus  we should always be correct.
  */
 
-int iso_date(char * p, int flag)
+time64_t iso_date(char * p, int flag)
 {
-	int year, month, day, hour, minute, second, tz;
-	int crtime;
+	unsigned int year, month, day, hour, minute, second;
+	int tz;
+	time64_t crtime;
+
+	year = isonum_711(p + 0);
+	month = isonum_711(p + 1);
+	if (month > 12) month = 12;
+	day = isonum_711(p + 2);
+	if (day > 31) day = 31;
+	hour = isonum_711(p + 3);
+	if (hour > 23) hour = 23;
+	minute = isonum_711(p + 4);
+	if (minute > 59) minute = 59;
+	second = isonum_711(p + 5);
+	if (second > 59) second = 59;
+	if (flag == 0) tz = isonum_712(p + 6);
+	else tz = 0; /* High sierra has no time zone */
+
+	crtime = mktime64(year+1900, month, day, hour, minute, second);
+
+	/* sign extend */
+	if (tz & 0x80)
+		tz |= (-1 << 8);
+
+	/* 
+	 * The timezone offset is unreliable on some disks,
+	 * so we make a sanity check.  In no case is it ever
+	 * more than 13 hours from GMT, which is 52*15min.
+	 * The time is always stored in localtime with the
+	 * timezone offset being what get added to GMT to
+	 * get to localtime.  Thus we need to subtract the offset
+	 * to get to true GMT, which is what we store the time
+	 * as internally.  On the local system, the user may set
+	 * their timezone any way they wish, of course, so GMT
+	 * gets converted back to localtime on the receiving
+	 * system.
+	 *
+	 * NOTE: mkisofs in versions prior to mkisofs-1.10 had
+	 * the sign wrong on the timezone offset.  This has now
+	 * been corrected there too, but if you are getting screwy
+	 * results this may be the explanation.  If enough people
+	 * complain, a user configuration option could be added
+	 * to add the timezone offset in with the wrong sign
+	 * for 'compatibility' with older discs, but I cannot see how
+	 * it will matter that much.
+	 *
+	 * Thanks to kuhlmav@elec.canterbury.ac.nz (Volker Kuhlmann)
+	 * for pointing out the sign error.
+	 */
+	if (-52 <= tz && tz <= 52)
+		crtime -= tz * 15 * 60;
 
-	year = p[0];
-	month = p[1];
-	day = p[2];
-	hour = p[3];
-	minute = p[4];
-	second = p[5];
-	if (flag == 0) tz = p[6]; /* High sierra has no time zone */
-	else tz = 0;
-	
-	if (year < 0) {
-		crtime = 0;
-	} else {
-		crtime = mktime64(year+1900, month, day, hour, minute, second);
-
-		/* sign extend */
-		if (tz & 0x80)
-			tz |= (-1 << 8);
-		
-		/* 
-		 * The timezone offset is unreliable on some disks,
-		 * so we make a sanity check.  In no case is it ever
-		 * more than 13 hours from GMT, which is 52*15min.
-		 * The time is always stored in localtime with the
-		 * timezone offset being what get added to GMT to
-		 * get to localtime.  Thus we need to subtract the offset
-		 * to get to true GMT, which is what we store the time
-		 * as internally.  On the local system, the user may set
-		 * their timezone any way they wish, of course, so GMT
-		 * gets converted back to localtime on the receiving
-		 * system.
-		 *
-		 * NOTE: mkisofs in versions prior to mkisofs-1.10 had
-		 * the sign wrong on the timezone offset.  This has now
-		 * been corrected there too, but if you are getting screwy
-		 * results this may be the explanation.  If enough people
-		 * complain, a user configuration option could be added
-		 * to add the timezone offset in with the wrong sign
-		 * for 'compatibility' with older discs, but I cannot see how
-		 * it will matter that much.
-		 *
-		 * Thanks to kuhlmav@elec.canterbury.ac.nz (Volker Kuhlmann)
-		 * for pointing out the sign error.
-		 */
-		if (-52 <= tz && tz <= 52)
-			crtime -= tz * 15 * 60;
-	}
 	return crtime;
 }		


===============================================================
"fs/isofs/rock.c coarsely truncates file names of 254 or 255 bytes length"
---------------------------------------------------------------

Reproduce by:

  # Create Rock Ridge file name of 254 bytes length.
  # File "/bin/true" is assumed to exist and be readable.
  # Else use any other readable file instead.

  genisoimage -R -o test.iso -graft-points /12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234=/bin/true

  mount -o loop test.iso /mnt/iso

  ls /mnt/iso

shows

  12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456

That's 146 characters only.
If you do the same with xorriso you get other truncation lengths.

--------------- Source analysis:

There is a deliberate limit of < 254 in fs/isofs/rock.c

                       if ((strlen(retname) + rr->len - 5) >= 254) {
                                truncate = 1;
                                break;
                        }

It is not clear to me why there should be such a limit.
In the Debian bug report i discuss my findings about clients of
the Rock Ridge name reader.
Length up to 255 should be ok.
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=798300
  
The truncation drops the whole second NM entry which contains
the rest of the file name bytes. In above example it contained
the 108 missing digits.

Truncation nowadays has to take into respect that UTF-8 may
consist of multiple bytes and should avoid to leave incomplete
byte sequences.
(Does the kernel have a function for this ?)

The truncated names are not necessarily unique within the
directory. (There is few chance to check this when the name
gets composed from NM entries.)

--------------- Remedy proposal:

I implemented truncation in libisofs this way (from man xorriso):

                   Path components which are longer than the given
  number [64 to 255] will get truncated and have their last 33 bytes
  overwritten by a colon ':' and the hex representation of the MD5
  of the first 4095 bytes of the whole oversized  name.  Potential
  incomplete   UTF-8  characters  will  get  their  leading  bytes
  replaced by '_'.

This gives hope for unique truncated names and an opportunity for
implementing lookup via the oversized untruncated name.
The situation in libisofs gets a bit more complicated by the number
being adjustable (for Linux kernels which will be old in future).

If there is interest, i would try to port the truncation and the
lookup algorithms from libisofs to Linux. But i guess that
i am not the first one who needs to truncate UTF-8. So possibly
there are better ways than my userland code.

In my test kernel i only implemented and tested protection of the
innocent for now:

* Allow Rock Ridge names of 254 and 255 bytes length.

--- linux-source-4.1/fs/isofs/rock.c	2015-08-17 05:52:51.000000000 +0200
+++ linux-4.1.6/fs/isofs/rock.c	2015-10-07 21:53:13.281654707 +0200
@@ -267,7 +267,7 @@ repeat:
 					rr->u.NM.flags);
 				break;
 			}
-			if ((strlen(retname) + rr->len - 5) >= 254) {
+			if ((strlen(retname) + rr->len - 5) >= 256) {
 				truncate = 1;
 				break;
 			}

===============================================================

In https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800627
i mentioned a "third bug".

That one turned out to be a cooperation of Linux function
isofs_normalize_block_and_offset() and FreeBSD's program makefs,
which puts different timestamp equipment into '.' and the directory
record which actually holds the directory name.
(I have posted FreeBSD PR 203531 about four flaws of makefs produced
 ISO filesystems.)


Have a nice day :)

Thomas


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Two bugs in fs/isofs
  2015-10-21 11:00 Thomas Schmitt
@ 2015-10-21 12:51 ` Jan Kara
  0 siblings, 0 replies; 3+ messages in thread
From: Jan Kara @ 2015-10-21 12:51 UTC (permalink / raw)
  To: Thomas Schmitt; +Cc: linux-kernel, viro, jack

  Hi,

  thanks for detailed reports. For now I did some research on the case of
file name truncation.

> ===============================================================
> "fs/isofs/rock.c coarsely truncates file names of 254 or 255 bytes length"
> ---------------------------------------------------------------
> 
> Reproduce by:
> 
>   # Create Rock Ridge file name of 254 bytes length.
>   # File "/bin/true" is assumed to exist and be readable.
>   # Else use any other readable file instead.
> 
>   genisoimage -R -o test.iso -graft-points /12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234=/bin/true
> 
>   mount -o loop test.iso /mnt/iso
> 
>   ls /mnt/iso
> 
> shows
> 
>   12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456
> 
> That's 146 characters only.
> If you do the same with xorriso you get other truncation lengths.
> 
> --------------- Source analysis:
> 
> There is a deliberate limit of < 254 in fs/isofs/rock.c
> 
>                        if ((strlen(retname) + rr->len - 5) >= 254) {
>                                 truncate = 1;
>                                 break;
>                         }
> 
> It is not clear to me why there should be such a limit.
> In the Debian bug report i discuss my findings about clients of
> the Rock Ridge name reader.
> Length up to 255 should be ok.
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=798300

Looking into the code, we should be fine with names upto 1023 characters
long (that's the buffer space we have available in that code). However I
guess there's no point in supporting more than NAME_MAX (255). So I agree
with you here the limit of 254 seems to be off by two. I've tried to dig in
history but that code seems to predate even BitKeeper times...

> The truncation drops the whole second NM entry which contains
> the rest of the file name bytes. In above example it contained
> the 108 missing digits.

Yeah, that should be reasonably easy to fix.

> Truncation nowadays has to take into respect that UTF-8 may
> consist of multiple bytes and should avoid to leave incomplete
> byte sequences.
> (Does the kernel have a function for this ?)

Well, such truncation function would have to be specific to encoding the fs
uses.

> The truncated names are not necessarily unique within the
> directory. (There is few chance to check this when the name
> gets composed from NM entries.)

Well, true but is it worth the bother? I mean realistically, do people use
media with more than 255 characters in a file name or is it mostly a
theoretical concern?

								Honza

> --------------- Remedy proposal:
> 
> I implemented truncation in libisofs this way (from man xorriso):
> 
>                    Path components which are longer than the given
>   number [64 to 255] will get truncated and have their last 33 bytes
>   overwritten by a colon ':' and the hex representation of the MD5
>   of the first 4095 bytes of the whole oversized  name.  Potential
>   incomplete   UTF-8  characters  will  get  their  leading  bytes
>   replaced by '_'.
> 
> This gives hope for unique truncated names and an opportunity for
> implementing lookup via the oversized untruncated name.
> The situation in libisofs gets a bit more complicated by the number
> being adjustable (for Linux kernels which will be old in future).
> 
> If there is interest, i would try to port the truncation and the
> lookup algorithms from libisofs to Linux. But i guess that
> i am not the first one who needs to truncate UTF-8. So possibly
> there are better ways than my userland code.
> 
> In my test kernel i only implemented and tested protection of the
> innocent for now:
> 
> * Allow Rock Ridge names of 254 and 255 bytes length.
> 
> --- linux-source-4.1/fs/isofs/rock.c	2015-08-17 05:52:51.000000000 +0200
> +++ linux-4.1.6/fs/isofs/rock.c	2015-10-07 21:53:13.281654707 +0200
> @@ -267,7 +267,7 @@ repeat:
>  					rr->u.NM.flags);
>  				break;
>  			}
> -			if ((strlen(retname) + rr->len - 5) >= 254) {
> +			if ((strlen(retname) + rr->len - 5) >= 256) {
>  				truncate = 1;
>  				break;
>  			}
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Two bugs in fs/isofs
@ 2015-10-21 14:01 Thomas Schmitt
  0 siblings, 0 replies; 3+ messages in thread
From: Thomas Schmitt @ 2015-10-21 14:01 UTC (permalink / raw)
  To: jack; +Cc: viro, linux-kernel

Hi,

i wrote:
> > Truncation nowadays has to take into respect that UTF-8 may
> > consist of multiple bytes and should avoid to leave incomplete
> > byte sequences.
> > (Does the kernel have a function for this ?)

Jan Kara wrote:
> Well, such truncation function would have to be specific to encoding the fs
> uses.

But the problem of truncating a string that may contain multi-byte
UTF-8 characters is generic.

Rock Ridge gives no clue about the character set used with the names.
(libsofs can do via its SUSP protocol AAIP.)
Nowadays most unixly systems use UTF-8 anyways.
So if we truncate then we should avoid byte sequences which demand
more bytes to follow if interpreted as UTF-8.

> > The truncated names are not necessarily unique within the
> > directory.

> Well, true but is it worth the bother? I mean realistically, do people use
> media with more than 255 characters in a file name or is it mostly a
> theoretical concern?

One can easily produce such names with genisoimage.
libisofs refuses to produce more than 255 bytes name length.

It depends on the local filesystems whether such names can be
present in backup situations. Home user backup is my motivation
to care for ISO 9660 and optical drives.
So i had to implement qualified truncation in order to get the
minimum fidelity needed for backups.

I doubt anybody toggles 250+ bytes by hand. But in the three-byte
UTF-8 range, we get to the limit with less than 90 characters.
Also there may be automats with insane ideas about file naming.

The problem is that there will be no method to access the second
file of an identical name pair. One can study the behavior now
with two names of length 254 which differ only by bytes near
their end. The heavy truncation helps to create non-unique names.

One could use libisofs, e.g. via xorriso, to copy such inaccessible
files out of the ISO onto hard disk. (Provided my truncation method
is as good as i hope.)

The most simplistic way to get unique names would be mount(8) option
"norock". Then you get to see Joliet names or ISO 9660 names of
harmless length. But guessing the original name from an ISO 9660
name can then be an adventure of its own.
The MD5 suffix of libisofs would allow to compute the truncated
name from the known original name.

Have a nice day :)

Thomas

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-10-21 13:59 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-21 14:01 [PATCH] Two bugs in fs/isofs Thomas Schmitt
  -- strict thread matches above, loose matches on Subject: below --
2015-10-21 11:00 Thomas Schmitt
2015-10-21 12:51 ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).