linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] utimensat() non-conformances and fixes -- version 2
@ 2008-05-16  8:31 Michael Kerrisk
       [not found] ` <482D4665.4050401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-16  8:31 UTC (permalink / raw)
  To: Miklos Szeredi, Ulrich Drepper, Al Viro
  Cc: Andrew Morton, lkml, linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Miklos, Al, Ulrich,

Could you please review the following patch.

This is a revised version of my earlier
(http://article.gmane.org/gmane.linux.man/129/ ) patch to fix
non-conformances in the utimensat() implementation.  The patch is
against 2.6.22-rc2.  Miklos wrote a patch that is already in
2.6.22-rc2 to fix the security issues that he saw from my earlier
mail.  Miklos's patch also introduced a few spec non-conformances,
but provided me with some pointers to how to improve this version
of my patch.  The following paragraphs summarize the rules that
this patch implements:

Historical permissions rules for target file (utime(), utimes()):

[a] The EACCES error (only) occurs if times is NULL:
       The times argument is a null pointer and the effective user
       ID of the process does not match the owner of the file and
       write access is denied.

[b] The EPERM error (only) occurs if times is not NULL (i.e., both
    times are being changed):
       The times argument is not a null pointer, the calling
       process' effective user ID does not match the owner of the
       file, and the calling process does not have appropriate
       privileges.
       (As noted in a thread on the security@ list, the current spec
       for utimensat() needlessly makes mention of "write access"
       for the EPERM error.  I've raise a bug against the spec, and
       it's recognized as something that needs to be fixed.)

My summary of the rules from the draft spec for utimensat() in
the upcoming POSIX.1 revision:

[c] No error needs to be generated if
    times == {UTIME_OMIT,UTIME_OMIT}.

[d] The times == {UTIMES_NOW,UTIMES_NOW} case should be treated
    like times == NULL.

[e] The times == {UTIMES_NOW,UTIMES_OMIT}
    and times == {UTIMES_OMIT,UTIMES_NOW}
    cases should be treated like times == {m,n}.

Further historical Linux rules, for files with "ext2" extended file
attributes (see chattr(1)).

[f] Append-only attribute set: If times == NULL, and we own the
    file, then the call should succeed.

[g] Immutable attribute set: the call should fail, but the error
    depends on the value in times.

The following table shows the expected results for various cases,
and indicates places where 2.6.26-rc2 deviates from those results.
The key for the columns is shown at the end of the table.  All of
these cases (as well as many others) have been tested with my
patch, and conform to the rules above.  (See
http://article.gmane.org/gmane.linux.man/129/ for the test
program used.)

====================================================
times  Owner?    File        Expected     2.6.26-rc2
 arg           Writable?      Result      deviation

N       o         w          success
N       o         !w         success
N       !o        w          success
!N-n-n  !o        w          success
N       !o        !w         EACCES  [1]
!N-n-n  !o        !w         EACCES  [1a]

!N      o         w          success
!N      o         !w         success
!N      !o        w          EPERM   [2]
!N-o-n  !o        w          EPERM   [2a]  success
!N      !o        !w         EPERM   [3]
!N-o-n  !o        !w         EPERM   [3a]  EACCES

(Append-only attribute set, file writable)
N       o         append     success [4]
!N-n-n  o         append     success [4a]  EPERM
!N-n-o  o         append     EPERM   [5]

(Immutable attribute set, file writable)
N       o         immutable  EACCES  [6]
!N-n-n  o         immutable  EACCES  [6a]  EPERM
!N-n-o  o         immutable  EPERM   [7]
====================================================

Key to table columns:
times arg:
  N       times is NULL
  !N      times is not NULL, and at least one of the fields is a
          value other than UTIME_OMIT or UTIME_NOW
  !N-n-n  times == {UTIME_NOW,UTIME_NOW}
  !N-n-o  times == {UTIME_NOW,UTIME_OMIT} ||
          times == {UTIME_OMIT,UTIME_NOW}

Owner:
  o       Process EUID is owner of file
  !o      Process EUID is not owner of file

File writable
  w       Process has permission to write to file
  !w      Process does not have permission to write to file

The "Expected result" column shows either "success" or expected
value in errno after failure

Labels in [] are provided for my own reference, and in case anyone wants to
refer to specific cases in discussing the patch.

Cheers,

Michael


Signed-off-by: Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

--- linux-2.6.26-rc2/fs/utimes.c	2008-05-15 10:33:20.000000000 +0200
+++ linux-2.6.26-rc2-mtk/fs/utimes.c	2008-05-16 07:33:02.000000000 +0200
@@ -40,14 +40,9 @@

 #endif

-static bool nsec_special(long nsec)
-{
-	return nsec == UTIME_OMIT || nsec == UTIME_NOW;
-}
-
 static bool nsec_valid(long nsec)
 {
-	if (nsec_special(nsec))
+	if (nsec == UTIME_OMIT || nsec == UTIME_NOW)
 		return true;

 	return nsec >= 0 && nsec <= 999999999;
@@ -106,7 +101,9 @@
 	newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
 	if (times) {
 		error = -EPERM;
-                if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		if (!(times[0].tv_nsec == UTIME_NOW &&
+		      times[1].tv_nsec == UTIME_NOW) &&
+		    (IS_IMMUTABLE(inode) || IS_APPEND(inode)))
 			goto mnt_drop_write_and_out;

 		if (times[0].tv_nsec == UTIME_OMIT)
@@ -131,9 +128,10 @@
 	 * UTIME_NOW, then need to check permissions, because
 	 * inode_change_ok() won't do it.
 	 */
-	if (!times || (nsec_special(times[0].tv_nsec) &&
-		       nsec_special(times[1].tv_nsec))) {
+	if (!times || (times[0].tv_nsec == UTIME_NOW &&
+		       times[1].tv_nsec == UTIME_NOW)) {
 		error = -EACCES;
+
                 if (IS_IMMUTABLE(inode))
 			goto mnt_drop_write_and_out;

@@ -147,7 +145,20 @@
 					goto mnt_drop_write_and_out;
 			}
 		}
+	} else if (times && ((times[0].tv_nsec == UTIME_NOW &&
+			      times[1].tv_nsec == UTIME_OMIT)
+			     ||
+			     (times[0].tv_nsec == UTIME_OMIT &&
+			      times[1].tv_nsec == UTIME_NOW))) {
+		error = -EPERM;
+
+		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
+			goto mnt_drop_write_and_out;
+
+		if (!is_owner_or_cap(inode))
+			goto mnt_drop_write_and_out;
 	}
+
 	mutex_lock(&inode->i_mutex);
 	error = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes -- version 2
       [not found] ` <482D4665.4050401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2008-05-16  8:34   ` Michael Kerrisk
  2008-05-16 16:59   ` Miklos Szeredi
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-16  8:34 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Miklos Szeredi, Ulrich Drepper, Al Viro, Andrew Morton, lkml,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

For completeness here's the test program again:

/* t_utimensat.c

   Copyright (C) 2008, Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

   Licensed under the GPLv2 or later.

   A command-line interface for testing the utimensat() system
   call.

   17 Mar 2008  Initial creation
*/
#define _GNU_SOURCE
#define _ATFILE_SOURCE
#include <stdio.h>
#include <time.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)


#define __NR_utimensat          320     /* x86 syscall number */

# define UTIME_NOW      ((1l << 30) - 1l)
# define UTIME_OMIT     ((1l << 30) - 2l)

static inline int
utimensat(int dirfd, const char *pathname,
          const struct timespec times[2], int flags)
{
    return syscall(__NR_utimensat, dirfd, pathname, times, flags);
}

static void
usageError(char *progName)
{
    fprintf(stderr, "Usage: %s pathname [atime-sec "
            "atime-nsec mtime-sec mtime-nsec]\n\n", progName);
    fprintf(stderr, "Permitted options are:\n");
    fprintf(stderr, "    [-d path] "
            "open a directory file descriptor"
            " (instead of using AT_FDCWD)\n");
    fprintf(stderr, "    -w        Open directory file "
            "descriptor with O_RDWR (instead of O_RDONLY)\n");
    fprintf(stderr, "    -n        Use AT_SYMLINK_NOFOLLOW\n");
    fprintf(stderr, "\n");

    fprintf(stderr, "pathname can be \"NULL\" to use NULL "
            "argument in call\n");
    fprintf(stderr, "\n");

    fprintf(stderr, "Either nsec field can be\n");
    fprintf(stderr, "    'n' for UTIME_NOW\n");
    fprintf(stderr, "    'o' for UTIME_OMIT\n");
    fprintf(stderr, "\n");

    fprintf(stderr, "If the time fields are omitted, "
            "then a NULL 'times' argument is used\n");
    fprintf(stderr, "\n");

    exit(EXIT_FAILURE);
}

int
main(int argc, char *argv[])
{
    int flags, dirfd, opt, oflag;
    struct timespec ts[2];
    struct timespec *tsp;
    char *pathname, *dirfdPath;
    struct stat sb;

    flags = 0;
    dirfd = AT_FDCWD;
    dirfdPath = NULL;
    oflag = O_RDONLY;

    while ((opt = getopt(argc, argv, "d:nw")) != -1) {
        switch (opt) {
        case 'd':
            dirfdPath = optarg;
            break;

        case 'n':
            flags |= AT_SYMLINK_NOFOLLOW;
            printf("Not following symbolic links\n");
            break;

        case 'w':
            oflag = O_RDWR;
            break;

        default:
            usageError(argv[0]);
        }
    }

    if ((optind + 5 != argc) && (optind + 1 != argc))
        usageError(argv[0]);

    if (dirfdPath != NULL) {
        dirfd = open(dirfdPath, oflag);
        if (dirfd == -1) errExit("open");

        printf("Opened dirfd");
        printf(" O_RDWR");
        printf(": %s\n", dirfdPath);
    }

    pathname = (strcmp(argv[optind], "NULL") == 0) ?
                        NULL : argv[optind];

    if (argc == optind + 1) {
        tsp = NULL;

    } else {
        ts[0].tv_sec = atoi(argv[optind + 1]);
        if (argv[optind + 2][0] == 'n') {
            ts[0].tv_nsec = UTIME_NOW;
        } else if (argv[optind + 2][0] == 'o') {
            ts[0].tv_nsec = UTIME_OMIT;
        } else {
            ts[0].tv_nsec = atoi(argv[optind + 2]);
        }

        ts[1].tv_sec = atoi(argv[optind + 3]);
        if (argv[optind + 4][0] == 'n') {
            ts[1].tv_nsec = UTIME_NOW;
        } else if (argv[optind + 4][0] == 'o') {
            ts[1].tv_nsec = UTIME_OMIT;
        } else {
            ts[1].tv_nsec = atoi(argv[optind + 4]);
        }

        tsp = ts;
    }

    /* For testing purposes, it may have been useful to run this
       program as set-user-ID-root so that a directory file
       descriptor could be opened as root.  Now we reset to the
       real UID before making the utimensat() call, so that the
       permission checking is performed under that UID. */

    if (geteuid() == 0) {
        uid_t u;

        u = getuid();

        printf("Resettng UIDs to %ld\n", (long) u);

        if (setresuid(u, u, u) == -1)
            errExit("setresuid");
    }

    printf("dirfd is %d\n", dirfd);
    printf("pathname is %s\n", pathname);
    printf("tsp is %p", tsp);
    if (tsp != NULL) {
        printf("; struct  = { %ld, %ld } { %ld, %ld }",
                (long) tsp[0].tv_sec, (long) tsp[0].tv_nsec,
                (long) tsp[1].tv_sec, (long) tsp[1].tv_nsec);
    }
    printf("\n");
    printf("flags is %d\n", flags);

    if (utimensat(dirfd, pathname, tsp, flags) == -1) {
        if (errno == EPERM)
            printf("utimensat failed with EPERM\n");
        else if (errno == EACCES)
            printf("utimensat failed with EACCES\n");
        else
            perror("utimensat");
        exit(EXIT_FAILURE);
    }

    printf("utimensat() succeeded\n");

    if (stat(pathname, &sb) == -1) errExit("stat");
    printf("Last file access:         %s", ctime(&sb.st_atime));
    printf("Last file modification:   %s", ctime(&sb.st_mtime));
    printf("Last status change:       %s", ctime(&sb.st_ctime));

    exit(EXIT_SUCCESS);
}

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes -- version 2
       [not found] ` <482D4665.4050401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2008-05-16  8:34   ` Michael Kerrisk
@ 2008-05-16 16:59   ` Miklos Szeredi
       [not found]     ` <E1Jx3Gw-0002eA-55-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-16 16:59 UTC (permalink / raw)
  To: mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg
  Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

> This is a revised version of my earlier
> (http://article.gmane.org/gmane.linux.man/129/ ) patch to fix
> non-conformances in the utimensat() implementation.

The patch looks functionally correct.  But there are several things I
don't really like...

> --- linux-2.6.26-rc2/fs/utimes.c	2008-05-15 10:33:20.000000000 +0200
> +++ linux-2.6.26-rc2-mtk/fs/utimes.c	2008-05-16 07:33:02.000000000 +0200
> @@ -40,14 +40,9 @@
> 
>  #endif
> 
> -static bool nsec_special(long nsec)
> -{
> -	return nsec == UTIME_OMIT || nsec == UTIME_NOW;
> -}
> -
>  static bool nsec_valid(long nsec)
>  {
> -	if (nsec_special(nsec))
> +	if (nsec == UTIME_OMIT || nsec == UTIME_NOW)
>  		return true;
> 
>  	return nsec >= 0 && nsec <= 999999999;
> @@ -106,7 +101,9 @@
>  	newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
>  	if (times) {
>  		error = -EPERM;
> -                if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> +		if (!(times[0].tv_nsec == UTIME_NOW &&
> +		      times[1].tv_nsec == UTIME_NOW) &&
> +		    (IS_IMMUTABLE(inode) || IS_APPEND(inode)))
>  			goto mnt_drop_write_and_out;
> 
>  		if (times[0].tv_nsec == UTIME_OMIT)
> @@ -131,9 +128,10 @@
>  	 * UTIME_NOW, then need to check permissions, because
>  	 * inode_change_ok() won't do it.
>  	 */
> -	if (!times || (nsec_special(times[0].tv_nsec) &&
> -		       nsec_special(times[1].tv_nsec))) {
> +	if (!times || (times[0].tv_nsec == UTIME_NOW &&
> +		       times[1].tv_nsec == UTIME_NOW)) {
>  		error = -EACCES;
> +

How about explicitly turning UTIME_NOW/UTIME_NOW into times = NULL at
the beginning of the function?  That would both simplify things and
also make it absolutely sure that the two cases are handled the same
way (which makes sense, and is also what the standard wants).

>                  if (IS_IMMUTABLE(inode))
>  			goto mnt_drop_write_and_out;
> 
> @@ -147,7 +145,20 @@
>  					goto mnt_drop_write_and_out;
>  			}
>  		}
> +	} else if (times && ((times[0].tv_nsec == UTIME_NOW &&
> +			      times[1].tv_nsec == UTIME_OMIT)
> +			     ||
> +			     (times[0].tv_nsec == UTIME_OMIT &&
> +			      times[1].tv_nsec == UTIME_NOW))) {
> +		error = -EPERM;
> +
> +		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
> +			goto mnt_drop_write_and_out;
> +
> +		if (!is_owner_or_cap(inode))
> +			goto mnt_drop_write_and_out;

I don't like adding _more_ owner checks to this function.  It would be
better if we were removing them: some weird filesystems want to do
their own permission checking and so the owner checks should really be
moved into inode_change_ok().

One way to do that could be to add a pseudo attribute flag,
e.g. ATTR_TIMES_UPDATE, that tells the permission checking code to
check the owner, even when neither ATTR_[AM]TIME_SET flag is set.

Even the check for the owner in the !times case could be removed, by
adding ATTR_TIMES_UPDATE only if we don't have write permission on the
file.  That's a cleanup I'd really be happy with.

All this may also be done with the ATTR_FORCE flag, but that would
mean:

  - modifying lots of call sites
  - making it impossible to selectively check the permission if
    multiple attributes are being modified (don't know if any callers
    want that though).

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes -- version 2
       [not found]     ` <E1Jx3Gw-0002eA-55-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
@ 2008-05-17 19:57       ` Michael Kerrisk
  2008-05-19  9:50         ` Miklos Szeredi
  2008-05-30 15:34       ` [PATCH] utimensat() non-conformances and fixes [v3] Michael Kerrisk
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-17 19:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg,
	drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Miklos,

Thanks for the comments.

Miklos Szeredi wrote:
>> This is a revised version of my earlier
>> (http://article.gmane.org/gmane.linux.man/129/ ) patch to fix
>> non-conformances in the utimensat() implementation.
>
> The patch looks functionally correct.  But there are several things I
> don't really like...

[..]

>> -	if (!times || (nsec_special(times[0].tv_nsec) &&
>> -		       nsec_special(times[1].tv_nsec))) {
>> +	if (!times || (times[0].tv_nsec == UTIME_NOW &&
>> +		       times[1].tv_nsec == UTIME_NOW)) {
>>  		error = -EACCES;
>> +
>
> How about explicitly turning UTIME_NOW/UTIME_NOW into times = NULL at
> the beginning of the function?  That would both simplify things and
> also make it absolutely sure that the two cases are handled the same
> way (which makes sense, and is also what the standard wants).

Yes, simple, and sensible.

>>                  if (IS_IMMUTABLE(inode))
>>  			goto mnt_drop_write_and_out;
>>
>> @@ -147,7 +145,20 @@
>>  					goto mnt_drop_write_and_out;
>>  			}
>>  		}
>> +	} else if (times && ((times[0].tv_nsec == UTIME_NOW &&
>> +			      times[1].tv_nsec == UTIME_OMIT)
>> +			     ||
>> +			     (times[0].tv_nsec == UTIME_OMIT &&
>> +			      times[1].tv_nsec == UTIME_NOW))) {
>> +		error = -EPERM;
>> +
>> +		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
>> +			goto mnt_drop_write_and_out;
>> +
>> +		if (!is_owner_or_cap(inode))
>> +			goto mnt_drop_write_and_out;
>
> I don't like adding _more_ owner checks to this function.  It would be
> better if we were removing them: some weird filesystems want to do
> their own permission checking and so the owner checks should really be
> moved into inode_change_ok().
>
> One way to do that could be to add a pseudo attribute flag,
> e.g. ATTR_TIMES_UPDATE, that tells the permission checking code to
> check the owner, even when neither ATTR_[AM]TIME_SET flag is set.
>
> Even the check for the owner in the !times case could be removed, by
> adding ATTR_TIMES_UPDATE only if we don't have write permission on the
> file.  That's a cleanup I'd really be happy with.

See below.

> All this may also be done with the ATTR_FORCE flag, but that would
> mean:
>
>   - modifying lots of call sites
>   - making it impossible to selectively check the permission if
>     multiple attributes are being modified (don't know if any callers
>     want that though).

Okay, I won't go that route...

Regarding your suggestions above, are you meaning something like the
patch below?

The patch is a little less pretty than I'd like because of the need to
return EACCES or EPERM depending on whether (times == NULL).  In
particular, these lines in utimes.c:

+	if (!times && error == -EPERM)
+		error = -EACCES;

seem a little fragile (but maybe I worry too much).

Cheers,

Michael

========================

diff -ru linux-2.6.26-rc2/fs/attr.c linux-2.6.26-rc2-utimensat-fix/fs/attr.c
--- linux-2.6.26-rc2/fs/attr.c	2008-01-24 23:58:37.000000000 +0100
+++ linux-2.6.26-rc2-utimensat-fix/fs/attr.c	2008-05-16 21:56:51.000000000 +0200
@@ -51,7 +51,8 @@
 	}

 	/* Check for setting the inode time. */
-	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET)) {
+	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET |
+			ATTR_TIMES_UPDATE)) {
 		if (!is_owner_or_cap(inode))
 			goto error;
 	}
diff -ru linux-2.6.26-rc2/fs/utimes.c linux-2.6.26-rc2-utimensat-fix/fs/utimes.c
--- linux-2.6.26-rc2/fs/utimes.c	2008-05-15 10:33:20.000000000 +0200
+++ linux-2.6.26-rc2-utimensat-fix/fs/utimes.c	2008-05-16 22:14:31.000000000 +0200
@@ -40,14 +40,9 @@

 #endif

-static bool nsec_special(long nsec)
-{
-	return nsec == UTIME_OMIT || nsec == UTIME_NOW;
-}
-
 static bool nsec_valid(long nsec)
 {
-	if (nsec_special(nsec))
+	if (nsec == UTIME_OMIT || nsec == UTIME_NOW)
 		return true;

 	return nsec >= 0 && nsec <= 999999999;
@@ -102,11 +97,15 @@
 	if (error)
 		goto dput_and_out;

+	if (times && times[0].tv_nsec == UTIME_NOW &&
+		     times[1].tv_nsec == UTIME_NOW)
+		times = NULL;
+
 	/* Don't worry, the checks are done in inode_change_ok() */
 	newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
 	if (times) {
 		error = -EPERM;
-                if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
 			goto mnt_drop_write_and_out;

 		if (times[0].tv_nsec == UTIME_OMIT)
@@ -131,25 +130,39 @@
 	 * UTIME_NOW, then need to check permissions, because
 	 * inode_change_ok() won't do it.
 	 */
-	if (!times || (nsec_special(times[0].tv_nsec) &&
-		       nsec_special(times[1].tv_nsec))) {
+	if (!times) {
 		error = -EACCES;
                 if (IS_IMMUTABLE(inode))
 			goto mnt_drop_write_and_out;

-		if (!is_owner_or_cap(inode)) {
-			if (f) {
-				if (!(f->f_mode & FMODE_WRITE))
-					goto mnt_drop_write_and_out;
-			} else {
-				error = vfs_permission(&nd, MAY_WRITE);
-				if (error)
-					goto mnt_drop_write_and_out;
-			}
+		if (f) {
+			if (!(f->f_mode & FMODE_WRITE))
+				newattrs.ia_valid |= ATTR_TIMES_UPDATE;
+		} else {
+			error = vfs_permission(&nd, MAY_WRITE);
+			if (error == -EACCES)
+				newattrs.ia_valid |= ATTR_TIMES_UPDATE;
+			else if (error)
+				goto mnt_drop_write_and_out;
 		}
+	} else if (times && ((times[0].tv_nsec == UTIME_NOW &&
+			      times[1].tv_nsec == UTIME_OMIT)
+			     ||
+			     (times[0].tv_nsec == UTIME_OMIT &&
+			      times[1].tv_nsec == UTIME_NOW))) {
+		error = -EPERM;
+
+		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
+			goto mnt_drop_write_and_out;
+
+		newattrs.ia_valid |= ATTR_TIMES_UPDATE;
 	}
+
 	mutex_lock(&inode->i_mutex);
 	error = notify_change(dentry, &newattrs);
+	if (!times && error == -EPERM)
+		error = -EACCES;
+		
 	mutex_unlock(&inode->i_mutex);
 mnt_drop_write_and_out:
 	mnt_drop_write(mnt);
diff -ru linux-2.6.26-rc2/include/linux/fs.h linux-2.6.26-rc2-utimensat-fix/include/linux/fs.h
--- linux-2.6.26-rc2/include/linux/fs.h	2008-05-15 10:33:25.000000000 +0200
+++ linux-2.6.26-rc2-utimensat-fix/include/linux/fs.h	2008-05-16 21:39:24.000000000 +0200
@@ -317,22 +317,23 @@
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
  */
-#define ATTR_MODE	1
-#define ATTR_UID	2
-#define ATTR_GID	4
-#define ATTR_SIZE	8
-#define ATTR_ATIME	16
-#define ATTR_MTIME	32
-#define ATTR_CTIME	64
-#define ATTR_ATIME_SET	128
-#define ATTR_MTIME_SET	256
-#define ATTR_FORCE	512	/* Not a change, but a change it */
-#define ATTR_ATTR_FLAG	1024
-#define ATTR_KILL_SUID	2048
-#define ATTR_KILL_SGID	4096
-#define ATTR_FILE	8192
-#define ATTR_KILL_PRIV	16384
-#define ATTR_OPEN	32768	/* Truncating from open(O_TRUNC) */
+#define ATTR_MODE		1
+#define ATTR_UID		2
+#define ATTR_GID		4
+#define ATTR_SIZE		8
+#define ATTR_ATIME		16
+#define ATTR_MTIME		32
+#define ATTR_CTIME		64
+#define ATTR_ATIME_SET		128
+#define ATTR_MTIME_SET		256
+#define ATTR_FORCE		512	/* Not a change, but a change it */
+#define ATTR_ATTR_FLAG		1024
+#define ATTR_KILL_SUID		2048
+#define ATTR_KILL_SGID		4096
+#define ATTR_FILE		8192
+#define ATTR_KILL_PRIV		16384
+#define ATTR_OPEN		32768	/* Truncating from open(O_TRUNC) */
+#define ATTR_TIMES_UPDATE	65536

 /*
  * This is the Inode Attributes structure, used for notify_change().  It
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes -- version 2
  2008-05-17 19:57       ` Michael Kerrisk
@ 2008-05-19  9:50         ` Miklos Szeredi
  2008-05-19 10:12           ` Miklos Szeredi
  2008-05-19 12:24           ` Michael Kerrisk
  0 siblings, 2 replies; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-19  9:50 UTC (permalink / raw)
  To: mtk.manpages
  Cc: miklos, mtk.manpages, drepper, viro, akpm, linux-kernel,
	linux-man, linux-fsdevel

> Regarding your suggestions above, are you meaning something like the
> patch below?

Yes.

> The patch is a little less pretty than I'd like because of the need to
> return EACCES or EPERM depending on whether (times == NULL).  In
> particular, these lines in utimes.c:
> 
> +	if (!times && error == -EPERM)
> +		error = -EACCES;
> 
> seem a little fragile (but maybe I worry too much).

It's not only fragile, it's ugly as sin.  I'd rather do it this way:

- initialize error to zero
- if no write access then set error, and the ATTR_TIMES_UPDATE(*) flag
- set error2 from result of notify_change()
- if error is zero then return error2, otherwise return error

(*) I've been mulling over the name and perhaps ATTR_OWNER_CHECK would
be better, or something that implies that it's not really about
updating the times, but about checking the owner.

Also could you do the patch against the

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git vfs-cleanups

tree, which does big structural cleanups to do_utimes?

Thanks,
Miklos

> ========================
> 
> diff -ru linux-2.6.26-rc2/fs/attr.c linux-2.6.26-rc2-utimensat-fix/fs/attr.c
> --- linux-2.6.26-rc2/fs/attr.c	2008-01-24 23:58:37.000000000 +0100
> +++ linux-2.6.26-rc2-utimensat-fix/fs/attr.c	2008-05-16 21:56:51.000000000 +0200
> @@ -51,7 +51,8 @@
>  	}
> 
>  	/* Check for setting the inode time. */
> -	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET)) {
> +	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET |
> +			ATTR_TIMES_UPDATE)) {
>  		if (!is_owner_or_cap(inode))
>  			goto error;
>  	}
> diff -ru linux-2.6.26-rc2/fs/utimes.c linux-2.6.26-rc2-utimensat-fix/fs/utimes.c
> --- linux-2.6.26-rc2/fs/utimes.c	2008-05-15 10:33:20.000000000 +0200
> +++ linux-2.6.26-rc2-utimensat-fix/fs/utimes.c	2008-05-16 22:14:31.000000000 +0200
> @@ -40,14 +40,9 @@
> 
>  #endif
> 
> -static bool nsec_special(long nsec)
> -{
> -	return nsec == UTIME_OMIT || nsec == UTIME_NOW;
> -}
> -
>  static bool nsec_valid(long nsec)
>  {
> -	if (nsec_special(nsec))
> +	if (nsec == UTIME_OMIT || nsec == UTIME_NOW)
>  		return true;
> 
>  	return nsec >= 0 && nsec <= 999999999;
> @@ -102,11 +97,15 @@
>  	if (error)
>  		goto dput_and_out;
> 
> +	if (times && times[0].tv_nsec == UTIME_NOW &&
> +		     times[1].tv_nsec == UTIME_NOW)
> +		times = NULL;
> +
>  	/* Don't worry, the checks are done in inode_change_ok() */
>  	newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
>  	if (times) {
>  		error = -EPERM;
> -                if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> +		if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
>  			goto mnt_drop_write_and_out;
> 
>  		if (times[0].tv_nsec == UTIME_OMIT)
> @@ -131,25 +130,39 @@
>  	 * UTIME_NOW, then need to check permissions, because
>  	 * inode_change_ok() won't do it.
>  	 */
> -	if (!times || (nsec_special(times[0].tv_nsec) &&
> -		       nsec_special(times[1].tv_nsec))) {
> +	if (!times) {
>  		error = -EACCES;
>                  if (IS_IMMUTABLE(inode))
>  			goto mnt_drop_write_and_out;
> 
> -		if (!is_owner_or_cap(inode)) {
> -			if (f) {
> -				if (!(f->f_mode & FMODE_WRITE))
> -					goto mnt_drop_write_and_out;
> -			} else {
> -				error = vfs_permission(&nd, MAY_WRITE);
> -				if (error)
> -					goto mnt_drop_write_and_out;
> -			}
> +		if (f) {
> +			if (!(f->f_mode & FMODE_WRITE))
> +				newattrs.ia_valid |= ATTR_TIMES_UPDATE;
> +		} else {
> +			error = vfs_permission(&nd, MAY_WRITE);
> +			if (error == -EACCES)
> +				newattrs.ia_valid |= ATTR_TIMES_UPDATE;
> +			else if (error)
> +				goto mnt_drop_write_and_out;
>  		}
> +	} else if (times && ((times[0].tv_nsec == UTIME_NOW &&
> +			      times[1].tv_nsec == UTIME_OMIT)
> +			     ||
> +			     (times[0].tv_nsec == UTIME_OMIT &&
> +			      times[1].tv_nsec == UTIME_NOW))) {
> +		error = -EPERM;
> +
> +		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
> +			goto mnt_drop_write_and_out;
> +
> +		newattrs.ia_valid |= ATTR_TIMES_UPDATE;
>  	}
> +
>  	mutex_lock(&inode->i_mutex);
>  	error = notify_change(dentry, &newattrs);
> +	if (!times && error == -EPERM)
> +		error = -EACCES;
> +		
>  	mutex_unlock(&inode->i_mutex);
>  mnt_drop_write_and_out:
>  	mnt_drop_write(mnt);
> diff -ru linux-2.6.26-rc2/include/linux/fs.h linux-2.6.26-rc2-utimensat-fix/include/linux/fs.h
> --- linux-2.6.26-rc2/include/linux/fs.h	2008-05-15 10:33:25.000000000 +0200
> +++ linux-2.6.26-rc2-utimensat-fix/include/linux/fs.h	2008-05-16 21:39:24.000000000 +0200
> @@ -317,22 +317,23 @@
>   * Attribute flags.  These should be or-ed together to figure out what
>   * has been changed!
>   */
> -#define ATTR_MODE	1
> -#define ATTR_UID	2
> -#define ATTR_GID	4
> -#define ATTR_SIZE	8
> -#define ATTR_ATIME	16
> -#define ATTR_MTIME	32
> -#define ATTR_CTIME	64
> -#define ATTR_ATIME_SET	128
> -#define ATTR_MTIME_SET	256
> -#define ATTR_FORCE	512	/* Not a change, but a change it */
> -#define ATTR_ATTR_FLAG	1024
> -#define ATTR_KILL_SUID	2048
> -#define ATTR_KILL_SGID	4096
> -#define ATTR_FILE	8192
> -#define ATTR_KILL_PRIV	16384
> -#define ATTR_OPEN	32768	/* Truncating from open(O_TRUNC) */
> +#define ATTR_MODE		1
> +#define ATTR_UID		2
> +#define ATTR_GID		4
> +#define ATTR_SIZE		8
> +#define ATTR_ATIME		16
> +#define ATTR_MTIME		32
> +#define ATTR_CTIME		64
> +#define ATTR_ATIME_SET		128
> +#define ATTR_MTIME_SET		256
> +#define ATTR_FORCE		512	/* Not a change, but a change it */
> +#define ATTR_ATTR_FLAG		1024
> +#define ATTR_KILL_SUID		2048
> +#define ATTR_KILL_SGID		4096
> +#define ATTR_FILE		8192
> +#define ATTR_KILL_PRIV		16384
> +#define ATTR_OPEN		32768	/* Truncating from open(O_TRUNC) */
> +#define ATTR_TIMES_UPDATE	65536
> 
>  /*
>   * This is the Inode Attributes structure, used for notify_change().  It
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes -- version 2
  2008-05-19  9:50         ` Miklos Szeredi
@ 2008-05-19 10:12           ` Miklos Szeredi
  2008-05-19 12:24           ` Michael Kerrisk
  1 sibling, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-19 10:12 UTC (permalink / raw)
  To: mtk.manpages; +Cc: drepper, viro, akpm, linux-kernel, linux-man, linux-fsdevel

> It's not only fragile, it's ugly as sin.  I'd rather do it this way:
> 
> - initialize error to zero
> - if no write access then set error, and the ATTR_TIMES_UPDATE(*) flag
> - set error2 from result of notify_change()
> - if error is zero then return error2, otherwise return error

Something like this (haven't thought it through, totally untested,
etc...)

Miklos

---
 fs/utimes.c |   48 ++++++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 26 deletions(-)

Index: linux-2.6/fs/utimes.c
===================================================================
--- linux-2.6.orig/fs/utimes.c	2008-05-17 08:50:01.000000000 +0200
+++ linux-2.6/fs/utimes.c	2008-05-19 12:08:18.000000000 +0200
@@ -53,7 +53,8 @@ static bool nsec_valid(long nsec)
 	return nsec >= 0 && nsec <= 999999999;
 }
 
-static int utimes_common(struct path *path, struct timespec *times)
+static int utimes_common(struct path *path, struct timespec *times,
+			 int write_error)
 {
 	int error;
 	struct iattr newattrs;
@@ -76,11 +77,18 @@ static int utimes_common(struct path *pa
 			newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
 			newattrs.ia_valid |= ATTR_MTIME_SET;
 		}
+		newattrs.ia_valid |= ATTR_OWNER_CHECK;
+	} else if (write_error) {
+		newattrs.ia_valid |= ATTR_OWNER_CHECK;
 	}
+
 	mutex_lock(&path->dentry->d_inode->i_mutex);
 	error = path_setattr(path, &newattrs);
 	mutex_unlock(&path->dentry->d_inode->i_mutex);
 
+	if (write_error && error)
+		error = write_error;
+
 	return error;
 }
 
@@ -97,21 +105,16 @@ static bool utimes_need_permission(struc
 static int do_futimes(int fd, struct timespec *times)
 {
 	int error;
+	int write_error = 0;
 	struct file *file = fget(fd);
 
 	if (!file)
 		return -EBADF;
 
-	if (utimes_need_permission(times)) {
-		struct inode *inode = file->f_path.dentry->d_inode;
+	if (!times && !(file->f_mode & FMODE_WRITE))
+		write_error = -EACCES;
 
-		error = -EACCES;
-		if (!is_owner_or_cap(inode) && !(file->f_mode & FMODE_WRITE))
-			goto out_fput;
-	}
-	error = utimes_common(&file->f_path, times);
-
- out_fput:
+	error = utimes_common(&file->f_path, times, write_error);
 	fput(file);
 
 	return error;
@@ -121,6 +124,7 @@ static int do_utimes_name(int dfd, char 
 			  struct timespec *times, int flags)
 {
 	int error;
+	int write_error = 0;
 	struct nameidata nd;
 	int lookup_flags;
 
@@ -132,23 +136,10 @@ static int do_utimes_name(int dfd, char 
 	if (error)
 		return error;
 
+	if (!times)
+		write_error = vfs_permission(&nd, MAY_WRITE);
 
-	if (utimes_need_permission(times)) {
-		struct inode *inode = nd.path.dentry->d_inode;
-
-		error = -EACCES;
-		if (IS_IMMUTABLE(inode))
-			goto out_path_put;
-
-		if (!is_owner_or_cap(inode)) {
-			error = vfs_permission(&nd, MAY_WRITE);
-			if (error)
-				goto out_path_put;
-		}
-	}
-	error = utimes_common(&nd.path, times);
-
- out_path_put:
+	error = utimes_common(&nd.path, times, write_error);
 	path_put(&nd.path);
 
 	return error;
@@ -177,6 +168,11 @@ int do_utimes(int dfd, char __user *file
 		return -EINVAL;
 	}
 
+	if (times && times[0].tv_nsec == UTIME_NOW &&
+	    times[1].tv_nsec == UTIME_NOW) {
+		times = NULL;
+	}
+
 	if (filename == NULL && dfd != AT_FDCWD) {
 		if (flags)
 			return -EINVAL;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes -- version 2
  2008-05-19  9:50         ` Miklos Szeredi
  2008-05-19 10:12           ` Miklos Szeredi
@ 2008-05-19 12:24           ` Michael Kerrisk
  2008-05-19 13:17             ` Miklos Szeredi
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-19 12:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: drepper, viro, akpm, linux-kernel, linux-man, linux-fsdevel

Miklos,

On Mon, May 19, 2008 at 11:50 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> Regarding your suggestions above, are you meaning something like the
>> patch below?
>
> Yes.
>
>> The patch is a little less pretty than I'd like because of the need to
>> return EACCES or EPERM depending on whether (times == NULL).  In
>> particular, these lines in utimes.c:
>>
>> +     if (!times && error == -EPERM)
>> +             error = -EACCES;
>>
>> seem a little fragile (but maybe I worry too much).
>
> It's not only fragile, it's ugly as sin.  I'd rather do it this way:
>
> - initialize error to zero
> - if no write access then set error, and the ATTR_TIMES_UPDATE(*) flag
> - set error2 from result of notify_change()
> - if error is zero then return error2, otherwise return error
>
> (*) I've been mulling over the name and perhaps ATTR_OWNER_CHECK would
> be better, or something that implies that it's not really about
> updating the times, but about checking the owner.

This all makes sense, but...

> Also could you do the patch against the
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git vfs-cleanups
>
> tree, which does big structural cleanups to do_utimes?

You keep moving the goalposts here...  First, it was provide an
obvious correct fix (for the non-conformances); then: can you cleanup
the owner checks; then: can you rewrite against my git tree...  My
time at the moment is fairly limited, and I have a problem: currently,
I'm not a git user.  That'll change soon, but I don't have the time to
change it now.  Can I just download a snapshot tarball of your git
changes somehwere?  Alternatively, when do you expect your changes to
make it into an rc?

Cheers,

Michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes -- version 2
  2008-05-19 12:24           ` Michael Kerrisk
@ 2008-05-19 13:17             ` Miklos Szeredi
  0 siblings, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-19 13:17 UTC (permalink / raw)
  To: mtk.manpages
  Cc: miklos, drepper, viro, akpm, linux-kernel, linux-man,
	linux-fsdevel

> > Also could you do the patch against the
> >
> >  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git vfs-cleanups
> >
> > tree, which does big structural cleanups to do_utimes?
> 
> You keep moving the goalposts here...  First, it was provide an
> obvious correct fix (for the non-conformances); then: can you cleanup
> the owner checks;

Sorry, that was just an idea, but since it's not as simple as it
should be, I guess we should leave that till later.  My main objection
was against introducing more is_owner_or_cap() checks.  Just doing the
times == NULL case with ATTR_OWNER_CHECK should be fine.

That reminds me, one more comment about the patch: if you are
reindenting the ATTR_* definitions anyway, why not also change them to
the cleaner (1 << N) format?

>  then: can you rewrite against my git tree...  My
> time at the moment is fairly limited, and I have a problem: currently,
> I'm not a git user.  That'll change soon, but I don't have the time to
> change it now.  Can I just download a snapshot tarball of your git
> changes somehwere?  Alternatively, when do you expect your changes to
> make it into an rc?

Here's the relevant part (dfe9b50d..aeb1fe4b) of that tree as a single
patch.  I hope it compiles.

Thanks,
Miklos



diff --git a/fs/attr.c b/fs/attr.c
index 966b73e..e8bd11b 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -108,6 +108,12 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
 	struct timespec now;
 	unsigned int ia_valid = attr->ia_valid;
 
+	if (ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID |
+			ATTR_ATIME_SET | ATTR_MTIME_SET)) {
+		if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
+			return -EPERM;
+	}
+
 	now = current_fs_time(inode->i_sb);
 
 	attr->ia_ctime = now;
diff --git a/fs/exec.c b/fs/exec.c
index 1f8a24a..b68682a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1763,7 +1763,7 @@ int do_coredump(long signr, int exit_code, struct pt_regs * regs)
 		goto close_fail;
 	if (!file->f_op->write)
 		goto close_fail;
-	if (!ispipe && do_truncate(file->f_path.dentry, 0, 0, file) != 0)
+	if (!ispipe && file_truncate(file, 0, 0) != 0)
 		goto close_fail;
 
 	retval = binfmt->core_dump(signr, regs, file, core_limit);
diff --git a/fs/open.c b/fs/open.c
index a145008..bb604a6 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -195,6 +195,13 @@ out:
 	return error;
 }
 
+/*
+ * do_truncate - truncate (or extend) an inode
+ * @dentry: the dentry to truncate
+ * @length: the new length
+ * @time_attrs: file times to be updated (e.g. ATTR_MTIME|ATTR_CTIME)
+ * @filp: an open file or NULL (see file_truncate() as well)
+ */
 int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 	struct file *filp)
 {
@@ -221,6 +228,17 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 	return err;
 }
 
+/*
+ * file_truncate - truncate (or extend) an open file
+ * @filp: the open file
+ * @length: the new length
+ * @time_attrs: file times to be updated (e.g. ATTR_MTIME|ATTR_CTIME)
+ */
+int file_truncate(struct file *filp, loff_t length, unsigned int time_attrs)
+{
+	return do_truncate(filp->f_path.dentry, length, time_attrs, filp);
+}
+
 static long do_sys_truncate(const char __user * path, loff_t length)
 {
 	struct nameidata nd;
@@ -254,7 +272,7 @@ static long do_sys_truncate(const char __user * path, loff_t length)
 		goto mnt_drop_write_and_out;
 
 	error = -EPERM;
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
+	if (IS_APPEND(inode))
 		goto mnt_drop_write_and_out;
 
 	error = get_write_access(inode);
@@ -294,7 +312,6 @@ asmlinkage long sys_truncate(const char __user * path, unsigned long length)
 static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 {
 	struct inode * inode;
-	struct dentry *dentry;
 	struct file * file;
 	int error;
 
@@ -310,8 +327,7 @@ static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 	if (file->f_flags & O_LARGEFILE)
 		small = 0;
 
-	dentry = file->f_path.dentry;
-	inode = dentry->d_inode;
+	inode = file->f_path.dentry->d_inode;
 	error = -EINVAL;
 	if (!S_ISREG(inode->i_mode) || !(file->f_mode & FMODE_WRITE))
 		goto out_putf;
@@ -327,7 +343,7 @@ static long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
 
 	error = locks_verify_truncate(inode, file, length);
 	if (!error)
-		error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
+		error = file_truncate(file, length, ATTR_MTIME|ATTR_CTIME);
 out_putf:
 	fput(file);
 out:
@@ -582,9 +598,6 @@ asmlinkage long sys_fchmod(unsigned int fd, mode_t mode)
 	err = mnt_want_write(file->f_path.mnt);
 	if (err)
 		goto out_putf;
-	err = -EPERM;
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out_drop_write;
 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
 		mode = inode->i_mode;
@@ -592,8 +605,6 @@ asmlinkage long sys_fchmod(unsigned int fd, mode_t mode)
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
 	err = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
-
-out_drop_write:
 	mnt_drop_write(file->f_path.mnt);
 out_putf:
 	fput(file);
@@ -617,11 +628,6 @@ asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto dput_and_out;
-
-	error = -EPERM;
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out_drop_write;
-
 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
 		mode = inode->i_mode;
@@ -629,8 +635,6 @@ asmlinkage long sys_fchmodat(int dfd, const char __user *filename,
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
 	error = notify_change(nd.path.dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
-
-out_drop_write:
 	mnt_drop_write(nd.path.mnt);
 dput_and_out:
 	path_put(&nd.path);
@@ -645,18 +649,10 @@ asmlinkage long sys_chmod(const char __user *filename, mode_t mode)
 
 static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
 {
-	struct inode * inode;
+	struct inode *inode = dentry->d_inode;
 	int error;
 	struct iattr newattrs;
 
-	error = -ENOENT;
-	if (!(inode = dentry->d_inode)) {
-		printk(KERN_ERR "chown_common: NULL inode\n");
-		goto out;
-	}
-	error = -EPERM;
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out;
 	newattrs.ia_valid =  ATTR_CTIME;
 	if (user != (uid_t) -1) {
 		newattrs.ia_valid |= ATTR_UID;
@@ -672,7 +668,7 @@ static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
 	mutex_lock(&inode->i_mutex);
 	error = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
-out:
+
 	return error;
 }
 
diff --git a/fs/utimes.c b/fs/utimes.c
index af059d5..cccefe4 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -53,62 +53,14 @@ static bool nsec_valid(long nsec)
 	return nsec >= 0 && nsec <= 999999999;
 }
 
-/* If times==NULL, set access and modification to current time,
- * must be owner or have write permission.
- * Else, update from *times, must be owner or super user.
- */
-long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags)
+static int utimes_common(struct path *path, struct timespec *times)
 {
 	int error;
-	struct nameidata nd;
-	struct dentry *dentry;
-	struct inode *inode;
 	struct iattr newattrs;
-	struct file *f = NULL;
-	struct vfsmount *mnt;
-
-	error = -EINVAL;
-	if (times && (!nsec_valid(times[0].tv_nsec) ||
-		      !nsec_valid(times[1].tv_nsec))) {
-		goto out;
-	}
-
-	if (flags & ~AT_SYMLINK_NOFOLLOW)
-		goto out;
-
-	if (filename == NULL && dfd != AT_FDCWD) {
-		error = -EINVAL;
-		if (flags & AT_SYMLINK_NOFOLLOW)
-			goto out;
-
-		error = -EBADF;
-		f = fget(dfd);
-		if (!f)
-			goto out;
-		dentry = f->f_path.dentry;
-		mnt = f->f_path.mnt;
-	} else {
-		error = __user_walk_fd(dfd, filename, (flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW, &nd);
-		if (error)
-			goto out;
-
-		dentry = nd.path.dentry;
-		mnt = nd.path.mnt;
-	}
-
-	inode = dentry->d_inode;
-
-	error = mnt_want_write(mnt);
-	if (error)
-		goto dput_and_out;
 
 	/* Don't worry, the checks are done in inode_change_ok() */
 	newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
 	if (times) {
-		error = -EPERM;
-                if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
-			goto mnt_drop_write_and_out;
-
 		if (times[0].tv_nsec == UTIME_OMIT)
 			newattrs.ia_valid &= ~ATTR_ATIME;
 		else if (times[0].tv_nsec != UTIME_NOW) {
@@ -125,41 +77,118 @@ long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags
 			newattrs.ia_valid |= ATTR_MTIME_SET;
 		}
 	}
+	mutex_lock(&path->dentry->d_inode->i_mutex);
+	error = mnt_want_write(path->mnt);
+	if (!error) {
+		error = notify_change(path->dentry, &newattrs);
+		mnt_drop_write(path->mnt);
+	}
+	mutex_unlock(&path->dentry->d_inode->i_mutex);
+
+	return error;
+}
+
+/*
+ * If times is NULL or both times are either UTIME_OMIT or UTIME_NOW, then
+ * need to check permissions, because inode_change_ok() won't do it.
+ */
+static bool utimes_need_permission(struct timespec *times)
+{
+	return !times || (nsec_special(times[0].tv_nsec) &&
+			  nsec_special(times[1].tv_nsec));
+}
+
+static int do_futimes(int fd, struct timespec *times)
+{
+	int error;
+	struct file *file = fget(fd);
+
+	if (!file)
+		return -EBADF;
+
+	if (utimes_need_permission(times)) {
+		struct inode *inode = file->f_path.dentry->d_inode;
+
+		error = -EACCES;
+		if (!is_owner_or_cap(inode) && !(file->f_mode & FMODE_WRITE))
+			goto out_fput;
+	}
+	error = utimes_common(&file->f_path, times);
+
+ out_fput:
+	fput(file);
+
+	return error;
+}
+
+static int do_utimes_name(int dfd, char __user *filename,
+			  struct timespec *times, int flags)
+{
+	int error;
+	struct nameidata nd;
+	int lookup_flags;
+
+	if (flags & ~AT_SYMLINK_NOFOLLOW)
+		return -EINVAL;
+
+	lookup_flags = (flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW;
+	error = __user_walk_fd(dfd, filename, lookup_flags, &nd);
+	if (error)
+		return error;
+
+
+	if (utimes_need_permission(times)) {
+		struct inode *inode = nd.path.dentry->d_inode;
 
-	/*
-	 * If times is NULL or both times are either UTIME_OMIT or
-	 * UTIME_NOW, then need to check permissions, because
-	 * inode_change_ok() won't do it.
-	 */
-	if (!times || (nsec_special(times[0].tv_nsec) &&
-		       nsec_special(times[1].tv_nsec))) {
 		error = -EACCES;
-                if (IS_IMMUTABLE(inode))
-			goto mnt_drop_write_and_out;
+		if (IS_IMMUTABLE(inode))
+			goto out_path_put;
 
 		if (!is_owner_or_cap(inode)) {
-			if (f) {
-				if (!(f->f_mode & FMODE_WRITE))
-					goto mnt_drop_write_and_out;
-			} else {
-				error = vfs_permission(&nd, MAY_WRITE);
-				if (error)
-					goto mnt_drop_write_and_out;
-			}
+			error = vfs_permission(&nd, MAY_WRITE);
+			if (error)
+				goto out_path_put;
 		}
 	}
-	mutex_lock(&inode->i_mutex);
-	error = notify_change(dentry, &newattrs);
-	mutex_unlock(&inode->i_mutex);
-mnt_drop_write_and_out:
-	mnt_drop_write(mnt);
-dput_and_out:
-	if (f)
-		fput(f);
-	else
-		path_put(&nd.path);
-out:
+	error = utimes_common(&nd.path, times);
+
+ out_path_put:
+	path_put(&nd.path);
+
 	return error;
+
+}
+
+/*
+ * do_utimes - change times on filename or file descriptor
+ * @dfd: open file descriptor, -1 or AT_FDCWD
+ * @filename: path name or NULL
+ * @times: new times or NULL
+ * @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
+ *
+ * If filename is NULL and dfd refers to an open file, then operate on
+ * the file.  Otherwise look up filename, possibly using dfd as a
+ * starting point.
+ *
+ * If times==NULL, set access and modification to current time,
+ * must be owner or have write permission.
+ * Else, update from *times, must be owner or super user.
+ */
+int do_utimes(int dfd, char __user *filename, struct timespec *times, int flags)
+{
+	if (times && (!nsec_valid(times[0].tv_nsec) ||
+		      !nsec_valid(times[1].tv_nsec))) {
+		return -EINVAL;
+	}
+
+	if (filename == NULL && dfd != AT_FDCWD) {
+		if (flags)
+			return -EINVAL;
+
+		return do_futimes(dfd, times);
+	} else {
+		return do_utimes_name(dfd, filename, times, flags);
+	}
 }
 
 asmlinkage long sys_utimensat(int dfd, char __user *filename, struct timespec __user *utimes, int flags)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f413085..5b382ca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1600,6 +1600,8 @@ static inline int break_lease(struct inode *inode, unsigned int mode)
 
 extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
 		       struct file *filp);
+extern int file_truncate(struct file *filp, loff_t start,
+			 unsigned int time_attrs);
 extern long do_sys_open(int dfd, const char __user *filename, int flags,
 			int mode);
 extern struct file *filp_open(const char *, int, int);
diff --git a/include/linux/time.h b/include/linux/time.h
index d32ef0a..d757ffc 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -109,7 +109,8 @@ extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
 extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz);
 #define do_posix_clock_monotonic_gettime(ts) ktime_get_ts(ts)
-extern long do_utimes(int dfd, char __user *filename, struct timespec *times, int flags);
+extern int do_utimes(int dfd, char __user *filename, struct timespec *times,
+		     int flags);
 struct itimerval;
 extern int do_setitimer(int which, struct itimerval *value,
 			struct itimerval *ovalue);
diff --git a/mm/tiny-shmem.c b/mm/tiny-shmem.c
index ae532f5..65226ce 100644
--- a/mm/tiny-shmem.c
+++ b/mm/tiny-shmem.c
@@ -80,7 +80,7 @@ struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags)
 	inode->i_nlink = 0;	/* It is unlinked */
 
 	/* notify everyone as to the change of file size */
-	error = do_truncate(dentry, size, 0, file);
+	error = file_truncate(file, size, 0);
 	if (error < 0)
 		goto close_file;
 

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]     ` <E1Jx3Gw-0002eA-55-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
  2008-05-17 19:57       ` Michael Kerrisk
@ 2008-05-30 15:34       ` Michael Kerrisk
       [not found]         ` <48401E7E.9090304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2008-06-03 11:05         ` Michael Kerrisk
  1 sibling, 2 replies; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-30 15:34 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Hi Miklos,

Here's a further version of the patch, against 2.6.26rc2, with the
2008-05-19 git changes you sent me applied.  This patch is based on
the draft patch you sent me.  I've tested this version of the patch,
and it works as for all cases except the one mentioned below.  But
note the following points:

1) I didn't make use of the code in notify_change() that checks
IS_IMMUTABLE() and IS_APPEND() (i.e., I did not add
ATTR_OWNER_CHECK to the mask in the controlling if statement).
Doing this can't easily be made to work for the
do_futimes() case without reworking the arguments passed to
notify_change().  Actually, I'm inclined to doubt whether it
is a good idea to try to roll that check into notify_change() --
at least for utimensat() it seems simpler to not do so.

2) I've found yet another divergence from the spec -- but this
was in the original implementation, rather than being
something that has been introduced.  In do_futimes() there is

        if (!times && !(file->f_mode & FMODE_WRITE))
                write_error = -EACCES;

However, the check here should not be against the f_mode (file access
mode), but the against actual permission of the file referred to by
the underlying descriptor.  This means that for the do_futimes() +
times==NULL case, a set-user-ID root program could open a file
descriptor O_RDWR/O_WRONLY for which the real UID does not have write
access, and then even after reverting the the effective UID, the real
user could still update file.

I'm not sure of the correct way to get the required nameidata (to do a
vfs_permission() call) from the file descriptor.  Can you give me a
tip there?

Cheers,

Michael

--- linux-2.6.26-rc2-miklos.git-080519/fs/utimes.c	2008-05-19 17:40:37.000000000 +0200
+++ linux-2.6.26-rc2-miklos.git-080519-utimensat-fix-v3/fs/utimes.c	2008-05-30 16:29:00.000000000 +0200
@@ -53,14 +53,19 @@
 	return nsec >= 0 && nsec <= 999999999;
 }

-static int utimes_common(struct path *path, struct timespec *times)
+static int utimes_common(struct path *path, struct timespec *times,
+			 int write_error)
 {
 	int error;
 	struct iattr newattrs;
+	struct inode *inode = path->dentry->d_inode;

 	/* Don't worry, the checks are done in inode_change_ok() */
 	newattrs.ia_valid = ATTR_CTIME | ATTR_MTIME | ATTR_ATIME;
 	if (times) {
+		if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+			return -EPERM;
+
 		if (times[0].tv_nsec == UTIME_OMIT)
 			newattrs.ia_valid &= ~ATTR_ATIME;
 		else if (times[0].tv_nsec != UTIME_NOW) {
@@ -76,7 +81,15 @@
 			newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
 			newattrs.ia_valid |= ATTR_MTIME_SET;
 		}
+		newattrs.ia_valid |= ATTR_OWNER_CHECK;
+	} else {
+                if (IS_IMMUTABLE(inode))
+			return -EACCES;
+
+		if (write_error)
+			newattrs.ia_valid |= ATTR_OWNER_CHECK;
 	}
+
 	mutex_lock(&path->dentry->d_inode->i_mutex);
 	error = mnt_want_write(path->mnt);
 	if (!error) {
@@ -85,37 +98,26 @@
 	}
 	mutex_unlock(&path->dentry->d_inode->i_mutex);

-	return error;
-}
+	if (write_error && error)
+		error = write_error;

-/*
- * If times is NULL or both times are either UTIME_OMIT or UTIME_NOW, then
- * need to check permissions, because inode_change_ok() won't do it.
- */
-static bool utimes_need_permission(struct timespec *times)
-{
-	return !times || (nsec_special(times[0].tv_nsec) &&
-			  nsec_special(times[1].tv_nsec));
+	return error;
 }

 static int do_futimes(int fd, struct timespec *times)
 {
 	int error;
+	int write_error = 0;
 	struct file *file = fget(fd);

 	if (!file)
 		return -EBADF;

-	if (utimes_need_permission(times)) {
-		struct inode *inode = file->f_path.dentry->d_inode;
+	if (!times && !(file->f_mode & FMODE_WRITE))
+		write_error = -EACCES;

-		error = -EACCES;
-		if (!is_owner_or_cap(inode) && !(file->f_mode & FMODE_WRITE))
-			goto out_fput;
-	}
-	error = utimes_common(&file->f_path, times);
+	error = utimes_common(&file->f_path, times, write_error);

- out_fput:
 	fput(file);

 	return error;
@@ -125,6 +127,7 @@
 			  struct timespec *times, int flags)
 {
 	int error;
+	int write_error = 0;
 	struct nameidata nd;
 	int lookup_flags;

@@ -136,23 +139,10 @@
 	if (error)
 		return error;

+	if (!times)
+		write_error = vfs_permission(&nd, MAY_WRITE);

-	if (utimes_need_permission(times)) {
-		struct inode *inode = nd.path.dentry->d_inode;
-
-		error = -EACCES;
-		if (IS_IMMUTABLE(inode))
-			goto out_path_put;
-
-		if (!is_owner_or_cap(inode)) {
-			error = vfs_permission(&nd, MAY_WRITE);
-			if (error)
-				goto out_path_put;
-		}
-	}
-	error = utimes_common(&nd.path, times);
-
- out_path_put:
+	error = utimes_common(&nd.path, times, write_error);
 	path_put(&nd.path);

 	return error;
@@ -181,6 +171,10 @@
 		return -EINVAL;
 	}

+	if (times && times[0].tv_nsec == UTIME_NOW &&
+		     times[1].tv_nsec == UTIME_NOW)
+		times = NULL;
+
 	if (filename == NULL && dfd != AT_FDCWD) {
 		if (flags)
 			return -EINVAL;
--- linux-2.6.26-rc2-miklos.git-080519/fs/attr.c	2008-05-19 17:40:37.000000000 +0200
+++ linux-2.6.26-rc2-miklos.git-080519-utimensat-fix-v3/fs/attr.c	2008-05-30 15:33:21.000000000 +0200
@@ -51,7 +51,8 @@
 	}

 	/* Check for setting the inode time. */
-	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET)) {
+	if (ia_valid & (ATTR_MTIME_SET | ATTR_ATIME_SET |
+			ATTR_OWNER_CHECK)) {
 		if (!is_owner_or_cap(inode))
 			goto error;
 	}
--- linux-2.6.26-rc2-miklos.git-080519/include/linux/fs.h	2008-05-19 17:40:37.000000000 +0200
+++ linux-2.6.26-rc2-miklos.git-080519-utimensat-fix-v3/include/linux/fs.h	2008-05-29 07:08:50.000000000 +0200
@@ -317,22 +317,23 @@
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
  */
-#define ATTR_MODE	1
-#define ATTR_UID	2
-#define ATTR_GID	4
-#define ATTR_SIZE	8
-#define ATTR_ATIME	16
-#define ATTR_MTIME	32
-#define ATTR_CTIME	64
-#define ATTR_ATIME_SET	128
-#define ATTR_MTIME_SET	256
-#define ATTR_FORCE	512	/* Not a change, but a change it */
-#define ATTR_ATTR_FLAG	1024
-#define ATTR_KILL_SUID	2048
-#define ATTR_KILL_SGID	4096
-#define ATTR_FILE	8192
-#define ATTR_KILL_PRIV	16384
-#define ATTR_OPEN	32768	/* Truncating from open(O_TRUNC) */
+#define ATTR_MODE	    (1 << 0)
+#define ATTR_UID	    (1 << 1)
+#define ATTR_GID	    (1 << 2)
+#define ATTR_SIZE	    (1 << 3)
+#define ATTR_ATIME	    (1 << 4)
+#define ATTR_MTIME	    (1 << 5)
+#define ATTR_CTIME	    (1 << 6)
+#define ATTR_ATIME_SET	    (1 << 7)
+#define ATTR_MTIME_SET	    (1 << 8)
+#define ATTR_FORCE	    (1 << 9)	/* Not a change, but a change it */
+#define ATTR_ATTR_FLAG	    (1 << 10)
+#define ATTR_KILL_SUID	    (1 << 11)
+#define ATTR_KILL_SGID	    (1 << 12)
+#define ATTR_FILE	    (1 << 13)
+#define ATTR_KILL_PRIV	    (1 << 14)
+#define ATTR_OPEN	    (1 << 15)	/* Truncating from open(O_TRUNC) */
+#define ATTR_OWNER_CHECK    (1 << 16)

 /*
  * This is the Inode Attributes structure, used for notify_change().  It


--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]         ` <48401E7E.9090304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2008-05-30 16:37           ` Miklos Szeredi
  2008-05-30 18:24             ` Michael Kerrisk
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-30 16:37 UTC (permalink / raw)
  To: mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg
  Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

> Here's a further version of the patch, against 2.6.26rc2, with the
> 2008-05-19 git changes you sent me applied.  This patch is based on
> the draft patch you sent me.  I've tested this version of the patch,
> and it works as for all cases except the one mentioned below.  But
> note the following points:
> 
> 1) I didn't make use of the code in notify_change() that checks
> IS_IMMUTABLE() and IS_APPEND() (i.e., I did not add
> ATTR_OWNER_CHECK to the mask in the controlling if statement).
> Doing this can't easily be made to work for the
> do_futimes() case without reworking the arguments passed to
> notify_change().  Actually, I'm inclined to doubt whether it
> is a good idea to try to roll that check into notify_change() --
> at least for utimensat() it seems simpler to not do so.

Ugh...  Could we just omit this part (the if !times and write error
then check owner)?  I know it was my idea, but

 a) my ideas are often stupid
 b) one patch should ideally do just one thing

After we fixed the original issue, we can still think about this other
thing :)

> 2) I've found yet another divergence from the spec -- but this
> was in the original implementation, rather than being
> something that has been introduced.  In do_futimes() there is
> 
>         if (!times && !(file->f_mode & FMODE_WRITE))
>                 write_error = -EACCES;
> 
> However, the check here should not be against the f_mode (file access
> mode), but the against actual permission of the file referred to by
> the underlying descriptor.  This means that for the do_futimes() +
> times==NULL case, a set-user-ID root program could open a file
> descriptor O_RDWR/O_WRONLY for which the real UID does not have write
> access, and then even after reverting the the effective UID, the real
> user could still update file.

Sure, but so could a write(2), so that doesn't seem such a big
problem.

I think we should leave it this way, since changing it would affect
not just utimensat() and futimesat() but utime() and utimes() as well,
which are well established, old interfaces.  Shanging them could in
theory break userspace, which we try to avoid if possible.

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-05-30 16:37           ` Miklos Szeredi
@ 2008-05-30 18:24             ` Michael Kerrisk
  2008-05-30 19:22               ` Miklos Szeredi
                                 ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-30 18:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: drepper, viro, akpm, linux-kernel, linux-man, linux-fsdevel

Miklos,

On Fri, May 30, 2008 at 6:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> Here's a further version of the patch, against 2.6.26rc2, with the
>> 2008-05-19 git changes you sent me applied.  This patch is based on
>> the draft patch you sent me.  I've tested this version of the patch,
>> and it works as for all cases except the one mentioned below.  But
>> note the following points:
>>
>> 1) I didn't make use of the code in notify_change() that checks
>> IS_IMMUTABLE() and IS_APPEND() (i.e., I did not add
>> ATTR_OWNER_CHECK to the mask in the controlling if statement).
>> Doing this can't easily be made to work for the
>> do_futimes() case without reworking the arguments passed to
>> notify_change().  Actually, I'm inclined to doubt whether it
>> is a good idea to try to roll that check into notify_change() --
>> at least for utimensat() it seems simpler to not do so.
>
> Ugh...  Could we just omit this part (the if !times and write error
> then check owner)?  I know it was my idea, but
>
>  a) my ideas are often stupid
>  b) one patch should ideally do just one thing
>
> After we fixed the original issue, we can still think about this other
> thing :)

Okay, by now quite a bit of my time has been wasted, and my patience
is starting to get a little thin.

I already fixed most of the isues with utimensat() in my previous
version of the patch several days back, and that patch (probably
still) applies against current mainline.  The one issue that wasn't
fixed by my earlier patch was the one below, which I've only just
today noticed.

>> 2) I've found yet another divergence from the spec -- but this
>> was in the original implementation, rather than being
>> something that has been introduced.  In do_futimes() there is
>>
>>         if (!times && !(file->f_mode & FMODE_WRITE))
>>                 write_error = -EACCES;
>>
>> However, the check here should not be against the f_mode (file access
>> mode), but the against actual permission of the file referred to by
>> the underlying descriptor.  This means that for the do_futimes() +
>> times==NULL case, a set-user-ID root program could open a file
>> descriptor O_RDWR/O_WRONLY for which the real UID does not have write
>> access, and then even after reverting the the effective UID, the real
>> user could still update file.
>
> Sure, but so could a write(2), so that doesn't seem such a big
> problem.
>
> I think we should leave it this way, since changing it would affect
> not just utimensat() and futimesat() but utime() and utimes() as well,
> which are well established, old interfaces.  Shanging them could in
> theory break userspace, which we try to avoid if possible.

It is a problem, because every portable user program that uses this
interfaces, and relies on this corner of behavior, will have to
special case for Linux.  Can you tell me one good reason why we should
do that?  (And preserving bugs because fixing them would break the ABI
is not a good reason.)

The relevant interfaces here are:

utimensat()
futimesat()
utime()
utimes()
futimens() -- because implemented in glibc via utimensat() with path==NULL.

* utime() and utimes() can't be affected by this point: they don't use
file descriptors.
* futimesat() doesn't matter: it is a non-standad interface that was
prematurely added to the kernel, and then promptly replaced with
utimensat().  No other OS will add this interface, and no-one will
ever use it on Linux.  Its manual page will very soon say as much.
* utimensat() and futimens() matter, because they are currently broken
on this point (as well as a number of others).

This is a bug.  It is one of *several* bugs in the original
implementation of the utimensat()/futimens() interface.  All of them
should be fixed.  I have by now provided fixes for most of them.  (Not
point 2 above, but with a little help that should be quickly fixed as
well.)  At this point, I think you need to explain why you think those
fixes shouldn't be applied.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-05-30 18:24             ` Michael Kerrisk
@ 2008-05-30 19:22               ` Miklos Szeredi
       [not found]                 ` <E1K2ABK-0002ck-UT-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
       [not found]               ` <cfd18e0f0805301124o5f217dden10726b268d05d81a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2008-05-30 20:17               ` Andrew Morton
  2 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-30 19:22 UTC (permalink / raw)
  To: mtk.manpages
  Cc: miklos, drepper, viro, akpm, linux-kernel, linux-man,
	linux-fsdevel

> >> Here's a further version of the patch, against 2.6.26rc2, with the
> >> 2008-05-19 git changes you sent me applied.  This patch is based on
> >> the draft patch you sent me.  I've tested this version of the patch,
> >> and it works as for all cases except the one mentioned below.  But
> >> note the following points:
> >>
> >> 1) I didn't make use of the code in notify_change() that checks
> >> IS_IMMUTABLE() and IS_APPEND() (i.e., I did not add
> >> ATTR_OWNER_CHECK to the mask in the controlling if statement).
> >> Doing this can't easily be made to work for the
> >> do_futimes() case without reworking the arguments passed to
> >> notify_change().  Actually, I'm inclined to doubt whether it
> >> is a good idea to try to roll that check into notify_change() --
> >> at least for utimensat() it seems simpler to not do so.
> >
> > Ugh...  Could we just omit this part (the if !times and write error
> > then check owner)?  I know it was my idea, but
> >
> >  a) my ideas are often stupid
> >  b) one patch should ideally do just one thing
> >
> > After we fixed the original issue, we can still think about this other
> > thing :)
> 
> Okay, by now quite a bit of my time has been wasted, and my patience
> is starting to get a little thin.

I understand your frustration, but actually I did say this the last
time as well:

  "Sorry, that was just an idea, but since it's not as simple as it
   should be, I guess we should leave that till later.  My main
   objection was against introducing more is_owner_or_cap() checks.
   Just doing the times == NULL case with ATTR_OWNER_CHECK should be
   fine."

Yeah, I may have been more explicit...

> The relevant interfaces here are:
> 
> utimensat()
> futimesat()
> utime()
> utimes()
> futimens() -- because implemented in glibc via utimensat() with path==NULL.
> 
> * utime() and utimes() can't be affected by this point: they don't use
> file descriptors.

OK, you're right.  I've overlooked this point.

So as long as we believe that nobody is using the futime*() interfaces
in the way you described, which is highly probable, then we can fix
that as well.  Which is actually a nice thing, because it means the
permission checking for the times == NULL case can move from both
do_utimes_name() and do_futimes() into utimes_common().

So let's make two patches, and let's forget about the write_error
thing for now:

1)
  - turn UTIME_NOW/UTIME_NOW into times = NULL
  - for times != NULL set ATTR_OWNER_CHECK

2)
  - move times == NULL permission check into utimes_common.

For 2) you can just use permission() instead of vfs_permission() which
is exactly the same in this case (and consolidated into a common
function later in the vfs-cleanups tree).

OK?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                 ` <E1K2ABK-0002ck-UT-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
@ 2008-05-30 19:32                   ` Matthew Wilcox
       [not found]                     ` <20080530193207.GB28074-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Matthew Wilcox @ 2008-05-30 19:32 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg,
	drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Fri, May 30, 2008 at 09:22:30PM +0200, Miklos Szeredi wrote:
> For 2) you can just use permission() instead of vfs_permission() which
> is exactly the same in this case (and consolidated into a common
> function later in the vfs-cleanups tree).

Miklos, don't delude people into thinking the vfs-cleanups tree is ever
going to get merged in its current state.  Al's NAKed the path_* stuff,
Christoph's NAKed it too.  Ignoring them and putting up a VFS tree of
your own is not going to help matters.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]               ` <cfd18e0f0805301124o5f217dden10726b268d05d81a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-05-30 19:43                 ` Michael Kerrisk
       [not found]                   ` <cfd18e0f0805301243h7d862963o8320a2c1f48942ce-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-30 19:43 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Miklos,

You omitted to answer my question, the last sentence below:

On Fri, May 30, 2008 at 8:24 PM, Michael Kerrisk
<mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> Miklos,
>
> I already fixed most of the isues with utimensat() in my previous
> version of the patch several days back, and that patch (probably
> still) applies against current mainline.
[...]
> This is a bug.  It is one of *several* bugs in the original
> implementation of the utimensat()/futimens() interface.  All of them
> should be fixed.  I have by now provided fixes for most of them.  (Not
> point 2 above, but with a little help that should be quickly fixed as
> well.)  At this point, I think you need to explain why you think those
> fixes shouldn't be applied.

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                     ` <20080530193207.GB28074-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
@ 2008-05-30 20:08                       ` Miklos Szeredi
  0 siblings, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-30 20:08 UTC (permalink / raw)
  To: matthew-Ztpu424NOJ8
  Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA,
	mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg,
	drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

> > For 2) you can just use permission() instead of vfs_permission() which
> > is exactly the same in this case (and consolidated into a common
> > function later in the vfs-cleanups tree).
> 
> Miklos, don't delude people into thinking the vfs-cleanups tree is ever
> going to get merged in its current state.  Al's NAKed the path_* stuff,
> Christoph's NAKed it too.

Actually, the cleanup patches which I'm asking Michael to base his
patches on were not NAKed.

And Michael is not even working against the vfs-cleanups tree, but
rather just a couple of these utimes cleanup patches, none of which
are controversial like the path_* stuff.  But even if he was working
against the vfs-cleanups tree it wouldn't matter, since porting
patches to/from the path_* API is trivial.

>  Ignoring them and putting up a VFS tree of your own is not going to
> help matters.

Do please tell me how else should I work?  I post patches for review,
gradually, not as a 100 patch series, when we manage to submit
apparmor.  That far easier for people to handle, and sure enough I get
comments are testing for this stuff.

As for Al's NAK: the only alternative suggestion he has been able to
come up is to move the security hooks into callers.  And that has been
NAKed (not surprisingly) by the selinux folks.  And there's nothing
actually _wrong_ with those patches, they don't add complexity
(actually they remove complexity), they don't slow down the kernel,
and they even fix some bugs.  Now what the hell more do we need?

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-05-30 18:24             ` Michael Kerrisk
  2008-05-30 19:22               ` Miklos Szeredi
       [not found]               ` <cfd18e0f0805301124o5f217dden10726b268d05d81a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-05-30 20:17               ` Andrew Morton
  2008-05-31  5:44                 ` Michael Kerrisk
  2 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2008-05-30 20:17 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: miklos, drepper, viro, linux-kernel, linux-man, linux-fsdevel

On Fri, 30 May 2008 20:24:22 +0200
"Michael Kerrisk" <mtk.manpages@googlemail.com> wrote:

> > After we fixed the original issue, we can still think about this other
> > thing :)
> 
> Okay, by now quite a bit of my time has been wasted, and my patience
> is starting to get a little thin.

Well yeah.  This has been dragging on for ages.  I was going to just
apply this patch and stomp off again but this:

  "against 2.6.26rc2, with the 2008-05-19 git changes you sent me
   applied"

stymied me.

Please, if poss, just send a standalone, fully-changelogged,
not-referential-to-some-random-email-discussion patch and let's get
going.

(Where's Ulrich, btw?)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                   ` <cfd18e0f0805301243h7d862963o8320a2c1f48942ce-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-05-30 20:17                     ` Miklos Szeredi
       [not found]                       ` <E1K2B2k-0002kS-Cz-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2008-05-30 20:17 UTC (permalink / raw)
  To: mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg
  Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Michael,

> You omitted to answer my question, the last sentence below:

Oh, come on.  Where in my last mail did I say that these fixes
shouldn't be applied?  I was only objecting to putting a fix and an
independent cleanup (which by now I realized was not such a good idea)
into a single patch.

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                       ` <E1K2B2k-0002kS-Cz-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
@ 2008-05-31  5:28                         ` Michael Kerrisk
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-31  5:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Fri, May 30, 2008 at 10:17 PM, Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> wrote:
> Michael,
>
>> You omitted to answer my question, the last sentence below:
>
> Oh, come on.  Where in my last mail did I say that these fixes
> shouldn't be applied?

I didn't say you did.  However, while saying "I understand your
frustration", you simply ignored this question, which I would say was
fairly important in the context, and asked me to write another version
of the patch against your tree.  My takeaway from this is that you
attach a higher priority to seeing that your tree isn't broken, than
to seeing that these bugs get fixed.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-05-30 20:17               ` Andrew Morton
@ 2008-05-31  5:44                 ` Michael Kerrisk
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Kerrisk @ 2008-05-31  5:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: miklos, drepper, viro, linux-kernel, linux-man, linux-fsdevel

Andrew,

On Fri, May 30, 2008 at 10:17 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Fri, 30 May 2008 20:24:22 +0200
> "Michael Kerrisk" <mtk.manpages@googlemail.com> wrote:
>
>> > After we fixed the original issue, we can still think about this other
>> > thing :)
>>
>> Okay, by now quite a bit of my time has been wasted, and my patience
>> is starting to get a little thin.
>
> Well yeah.  This has been dragging on for ages.  I was going to just
> apply this patch and stomp off again but this:
>
>  "against 2.6.26rc2, with the 2008-05-19 git changes you sent me
>   applied"
>
> stymied me.
>
> Please, if poss, just send a standalone, fully-changelogged,
> not-referential-to-some-random-email-discussion patch and let's get
> going.

Yes, I'll send it to you by Tuesday latest.  Sorry I probably can't
manage sooner.  Am travelling part of Monday, and trying to spend
(most of) the weekend with my baby daughter.

> (Where's Ulrich, btw?)

That's what I'd like to know....

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-05-30 15:34       ` [PATCH] utimensat() non-conformances and fixes [v3] Michael Kerrisk
       [not found]         ` <48401E7E.9090304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2008-06-03 11:05         ` Michael Kerrisk
       [not found]           ` <cfd18e0f0806030405u1c32b114pa0fdd979f36f87fb-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Kerrisk @ 2008-06-03 11:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: drepper, viro, akpm, linux-kernel, linux-man, linux-fsdevel

Hi Miklos,

> 2) I've found yet another divergence from the spec -- but this
> was in the original implementation, rather than being
> something that has been introduced.  In do_futimes() there is
>
>        if (!times && !(file->f_mode & FMODE_WRITE))
>                write_error = -EACCES;
>
> However, the check here should not be against the f_mode (file access
> mode), but the against actual permission of the file referred to by
> the underlying descriptor.  This means that for the do_futimes() +
> times==NULL case, a set-user-ID root program could open a file
> descriptor O_RDWR/O_WRONLY for which the real UID does not have write
> access, and then even after reverting the the effective UID, the real
> user could still update file.
>
> I'm not sure of the correct way to get the required nameidata (to do a
> vfs_permission() call) from the file descriptor.  Can you give me a
> tip there?

Could you point me at the right way of doing this?

Cheers,

Michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]           ` <cfd18e0f0806030405u1c32b114pa0fdd979f36f87fb-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-06-03 11:13             ` Miklos Szeredi
  2008-06-03 11:22               ` Al Viro
  2008-06-03 11:52               ` Michael Kerrisk
  0 siblings, 2 replies; 33+ messages in thread
From: Miklos Szeredi @ 2008-06-03 11:13 UTC (permalink / raw)
  To: mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg
  Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

> > I'm not sure of the correct way to get the required nameidata (to do a
> > vfs_permission() call) from the file descriptor.  Can you give me a
> > tip there?
> 
> Could you point me at the right way of doing this?

You don't need nameidata for this at all.  Just call permission() with
a NULL nameidata.

Ugly API?  Yes, will be cleaned up if we manage to find some common
ground with the VFS maintainers.

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-06-03 11:13             ` Miklos Szeredi
@ 2008-06-03 11:22               ` Al Viro
  2008-06-03 11:27                 ` Michael Kerrisk
       [not found]                 ` <20080603112221.GW28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  2008-06-03 11:52               ` Michael Kerrisk
  1 sibling, 2 replies; 33+ messages in thread
From: Al Viro @ 2008-06-03 11:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: mtk.manpages, drepper, akpm, linux-kernel, linux-man,
	linux-fsdevel

On Tue, Jun 03, 2008 at 01:13:00PM +0200, Miklos Szeredi wrote:
> > > I'm not sure of the correct way to get the required nameidata (to do a
> > > vfs_permission() call) from the file descriptor.  Can you give me a
> > > tip there?
> > 
> > Could you point me at the right way of doing this?
> 
> You don't need nameidata for this at all.  Just call permission() with
> a NULL nameidata.
> 
> Ugly API?  Yes, will be cleaned up if we manage to find some common
> ground with the VFS maintainers.

As soon as I'm done with sysctls...

FWIW, I very much doubt that you are right wrt required permissions, though.
AFAICS, intent here is "if you can write to file, you can touch the timestamps
anyway" and having descriptor opened for write gives that, current permissions
be damned.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-06-03 11:22               ` Al Viro
@ 2008-06-03 11:27                 ` Michael Kerrisk
  2008-06-03 11:30                   ` Jamie Lokier
       [not found]                 ` <20080603112221.GW28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Kerrisk @ 2008-06-03 11:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, drepper, akpm, linux-kernel, linux-man,
	linux-fsdevel

On Tue, Jun 3, 2008 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jun 03, 2008 at 01:13:00PM +0200, Miklos Szeredi wrote:
>> > > I'm not sure of the correct way to get the required nameidata (to do a
>> > > vfs_permission() call) from the file descriptor.  Can you give me a
>> > > tip there?
>> >
>> > Could you point me at the right way of doing this?
>>
>> You don't need nameidata for this at all.  Just call permission() with
>> a NULL nameidata.
>>
>> Ugly API?  Yes, will be cleaned up if we manage to find some common
>> ground with the VFS maintainers.
>
> As soon as I'm done with sysctls...
>
> FWIW, I very much doubt that you are right wrt required permissions, though.
> AFAICS, intent here is "if you can write to file, you can touch the timestamps
> anyway" and having descriptor opened for write gives that, current permissions
> be damned.

The standard is pretty clear on this point:

[[
Only a process with the effective user ID equal to the user ID of the
file, or with write access to the file, or with appropriate privileges
may use futimens( ) or utimensat( ) with a null pointer as the times
argument or with both tv_nsec fields set to the special value
UTIME_NOW.
]]

The crucial words here are "a process ... with write access to the
file" -- in other words, the permissions are determined by the
process's credentials, not by the access mode of the file descriptor.
I was not 100% sure on that to start with, so I did check it out with
one of the folk at The Open Group, to make sure of my understanding.

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-06-03 11:27                 ` Michael Kerrisk
@ 2008-06-03 11:30                   ` Jamie Lokier
       [not found]                     ` <20080603113018.GA27955-yetKDKU6eevNLxjTenLetw@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jamie Lokier @ 2008-06-03 11:30 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Al Viro, Miklos Szeredi, drepper, akpm, linux-kernel, linux-man,
	linux-fsdevel

Michael Kerrisk wrote:
> > FWIW, I very much doubt that you are right wrt required
> > permissions, though.  AFAICS, intent here is "if you can write to
> > file, you can touch the timestamps anyway" and having descriptor
> > opened for write gives that, current permissions be damned.
> 
> The standard is pretty clear on this point:
> 
> [[
> Only a process with the effective user ID equal to the user ID of the
> file, or with write access to the file, or with appropriate privileges
> may use futimens( ) or utimensat( ) with a null pointer as the times
> argument or with both tv_nsec fields set to the special value
> UTIME_NOW.
> ]]
> 
> The crucial words here are "a process ... with write access to the
> file" -- in other words, the permissions are determined by the
> process's credentials, not by the access mode of the file descriptor.
> I was not 100% sure on that to start with, so I did check it out with
> one of the folk at The Open Group, to make sure of my understanding.

Is there anything else where the file descriptor's access mode allows
doing things on Linux, but the standard requires a permissions check
each time?

-- Jamie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                     ` <20080603113018.GA27955-yetKDKU6eevNLxjTenLetw@public.gmane.org>
@ 2008-06-03 11:39                       ` Michael Kerrisk
  2008-06-03 11:49                         ` Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Kerrisk @ 2008-06-03 11:39 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Al Viro, Miklos Szeredi, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Tue, Jun 3, 2008 at 1:30 PM, Jamie Lokier <jamie-yetKDKU6eevNLxjTenLetw@public.gmane.org> wrote:
> Michael Kerrisk wrote:
>> > FWIW, I very much doubt that you are right wrt required
>> > permissions, though.  AFAICS, intent here is "if you can write to
>> > file, you can touch the timestamps anyway" and having descriptor
>> > opened for write gives that, current permissions be damned.
>>
>> The standard is pretty clear on this point:
>>
>> [[
>> Only a process with the effective user ID equal to the user ID of the
>> file, or with write access to the file, or with appropriate privileges
>> may use futimens( ) or utimensat( ) with a null pointer as the times
>> argument or with both tv_nsec fields set to the special value
>> UTIME_NOW.
>> ]]
>>
>> The crucial words here are "a process ... with write access to the
>> file" -- in other words, the permissions are determined by the
>> process's credentials, not by the access mode of the file descriptor.
>> I was not 100% sure on that to start with, so I did check it out with
>> one of the folk at The Open Group, to make sure of my understanding.
>
> Is there anything else where the file descriptor's access mode allows
> doing things on Linux, but the standard requires a permissions check
> each time?

Jamie,

I can't think of examples offhand -- but I'm also not quite sure what
your question is about.  Could you say a little more?

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-06-03 11:39                       ` Michael Kerrisk
@ 2008-06-03 11:49                         ` Al Viro
       [not found]                           ` <20080603114921.GX28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  2008-06-03 12:01                           ` Jamie Lokier
  0 siblings, 2 replies; 33+ messages in thread
From: Al Viro @ 2008-06-03 11:49 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Jamie Lokier, Miklos Szeredi, drepper, akpm, linux-kernel,
	linux-man, linux-fsdevel

On Tue, Jun 03, 2008 at 01:39:07PM +0200, Michael Kerrisk wrote:

> > Is there anything else where the file descriptor's access mode allows
> > doing things on Linux, but the standard requires a permissions check
> > each time?
> 
> Jamie,
> 
> I can't think of examples offhand -- but I'm also not quite sure what
> your question is about.  Could you say a little more?

"Is anything else equally stupid?", I suspect...  AFAICS, behaviour in
question is inherited from futimes(2) in one of the *BSD - nothing to
do about that now (at least 10 years too late).  It's rather inconsistent
with a lot of things, starting with "why utimes(2) has weaker requirements
with NULL argument", but we are far too late to fix that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-06-03 11:13             ` Miklos Szeredi
  2008-06-03 11:22               ` Al Viro
@ 2008-06-03 11:52               ` Michael Kerrisk
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Kerrisk @ 2008-06-03 11:52 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: drepper, viro, akpm, linux-kernel, linux-man, linux-fsdevel

On Tue, Jun 3, 2008 at 1:13 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> > I'm not sure of the correct way to get the required nameidata (to do a
>> > vfs_permission() call) from the file descriptor.  Can you give me a
>> > tip there?
>>
>> Could you point me at the right way of doing this?
>
> You don't need nameidata for this at all.  Just call permission() with
> a NULL nameidata.

Thanks Miklos!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                           ` <20080603114921.GX28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2008-06-03 11:58                             ` Al Viro
  0 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2008-06-03 11:58 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Jamie Lokier, Miklos Szeredi, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Tue, Jun 03, 2008 at 12:49:21PM +0100, Al Viro wrote:
> On Tue, Jun 03, 2008 at 01:39:07PM +0200, Michael Kerrisk wrote:
> 
> > > Is there anything else where the file descriptor's access mode allows
> > > doing things on Linux, but the standard requires a permissions check
> > > each time?
> > 
> > Jamie,
> > 
> > I can't think of examples offhand -- but I'm also not quite sure what
> > your question is about.  Could you say a little more?
> 
> "Is anything else equally stupid?", I suspect...  AFAICS, behaviour in
> question is inherited from futimes(2) in one of the *BSD - nothing to
> do about that now (at least 10 years too late).  It's rather inconsistent
> with a lot of things, starting with "why utimes(2) has weaker requirements
> with NULL argument", but we are far too late to fix that.

PS: as far as I can reconstruct what had happened there, they've got
these checks buried directly in ufs_setattr() and its ilk, which worked
for utimes(2), but had bitten them when they tried to do descriptor-based
analog...
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-06-03 11:49                         ` Al Viro
       [not found]                           ` <20080603114921.GX28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2008-06-03 12:01                           ` Jamie Lokier
       [not found]                             ` <20080603120135.GA28905-yetKDKU6eevNLxjTenLetw@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Jamie Lokier @ 2008-06-03 12:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Michael Kerrisk, Miklos Szeredi, drepper, akpm, linux-kernel,
	linux-man, linux-fsdevel

Al Viro wrote:
> On Tue, Jun 03, 2008 at 01:39:07PM +0200, Michael Kerrisk wrote:
> 
> > > Is there anything else where the file descriptor's access mode allows
> > > doing things on Linux, but the standard requires a permissions check
> > > each time?
> > 
> > Jamie,
> > 
> > I can't think of examples offhand -- but I'm also not quite sure what
> > your question is about.  Could you say a little more?
> 
> "Is anything else equally stupid?", I suspect...  AFAICS, behaviour in
> question is inherited from futimes(2) in one of the *BSD - nothing to
> do about that now (at least 10 years too late).  It's rather inconsistent
> with a lot of things, starting with "why utimes(2) has weaker requirements
> with NULL argument", but we are far too late to fix that.

To be fair, having a writable file descriptor only lets you change the
mtime to "now", and having a readable file descriptor only lets you
change the atime to "now".

Changing the times _in general_ can be seen as over-reaching those
capabilities and arguably justifies more strict checks.

E.g. setting times in the past, you can break some caching systems,
Make, etc.  Setting times to "now" will not break those things.

(A bit analogous with O_APPEND vs. O_WRITE.  Someone hands you an
O_APPEND descriptor and they can continue to assume you won't clobber
earlier records in their file.)

-- Jamie

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                             ` <20080603120135.GA28905-yetKDKU6eevNLxjTenLetw@public.gmane.org>
@ 2008-06-03 12:08                               ` Al Viro
       [not found]                                 ` <20080603120850.GZ28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2008-06-03 12:08 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Michael Kerrisk, Miklos Szeredi, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Tue, Jun 03, 2008 at 01:01:35PM +0100, Jamie Lokier wrote:
> > with a lot of things, starting with "why utimes(2) has weaker requirements
> > with NULL argument", but we are far too late to fix that.
> 
> To be fair, having a writable file descriptor only lets you change the
> mtime to "now", and having a readable file descriptor only lets you
> change the atime to "now".
> 
> Changing the times _in general_ can be seen as over-reaching those
> capabilities and arguably justifies more strict checks.

Which is what all questions about writability apply only to NULL case
anyway...
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                                 ` <20080603120850.GZ28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2008-06-03 12:10                                   ` Jamie Lokier
  0 siblings, 0 replies; 33+ messages in thread
From: Jamie Lokier @ 2008-06-03 12:10 UTC (permalink / raw)
  To: Al Viro
  Cc: Michael Kerrisk, Miklos Szeredi, drepper-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Al Viro wrote:
> On Tue, Jun 03, 2008 at 01:01:35PM +0100, Jamie Lokier wrote:
> > > with a lot of things, starting with "why utimes(2) has weaker requirements
> > > with NULL argument", but we are far too late to fix that.
> > 
> > To be fair, having a writable file descriptor only lets you change the
> > mtime to "now", and having a readable file descriptor only lets you
> > change the atime to "now".
> > 
> > Changing the times _in general_ can be seen as over-reaching those
> > capabilities and arguably justifies more strict checks.
> 
> Which is what all questions about writability apply only to NULL case
> anyway...

Oh!

So if I have the file descriptor, I can just as well change the mtime
by read a byte and write it back.  Or even do a zero-length write, on
some OSes.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
       [not found]                 ` <20080603112221.GW28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2008-06-03 12:16                   ` Miklos Szeredi
  2008-06-03 13:05                     ` Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2008-06-03 12:16 UTC (permalink / raw)
  To: viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn
  Cc: miklos-sUDqSbJrdHQHWmgEVkV9KA,
	mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg,
	drepper-H+wXaHxf7aLQT0dZR+AlfA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

> > > > I'm not sure of the correct way to get the required nameidata (to do a
> > > > vfs_permission() call) from the file descriptor.  Can you give me a
> > > > tip there?
> > > 
> > > Could you point me at the right way of doing this?
> > 
> > You don't need nameidata for this at all.  Just call permission() with
> > a NULL nameidata.
> > 
> > Ugly API?  Yes, will be cleaned up if we manage to find some common
> > ground with the VFS maintainers.
> 
> As soon as I'm done with sysctls...

Can't you just do that independently (for now just put a
d_find_alias() in there and be done with it)?  If you fix every piece
of horrid code that you come across, it'll never be done...

Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] utimensat() non-conformances and fixes [v3]
  2008-06-03 12:16                   ` Miklos Szeredi
@ 2008-06-03 13:05                     ` Al Viro
  0 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2008-06-03 13:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: mtk.manpages, drepper, akpm, linux-kernel, linux-man,
	linux-fsdevel

On Tue, Jun 03, 2008 at 02:16:33PM +0200, Miklos Szeredi wrote:
> > > > > I'm not sure of the correct way to get the required nameidata (to do a
> > > > > vfs_permission() call) from the file descriptor.  Can you give me a
> > > > > tip there?
> > > > 
> > > > Could you point me at the right way of doing this?
> > > 
> > > You don't need nameidata for this at all.  Just call permission() with
> > > a NULL nameidata.
> > > 
> > > Ugly API?  Yes, will be cleaned up if we manage to find some common
> > > ground with the VFS maintainers.
> > 
> > As soon as I'm done with sysctls...
> 
> Can't you just do that independently (for now just put a
> d_find_alias() in there and be done with it)?  If you fix every piece
> of horrid code that you come across, it'll never be done...

There's not much left to do, actually...  FWIW, solution goes like this:
	* introduce structure on the classes of sysctls
(currently - root and per-network-namespace).  Namely "X is parent of Y",
with "if task T sees Y, it also sees X" as defining property.
	* when adding a sysctl table, find a "parent" one.  Which is to say,
find the deepest node on its stem that already is present in one of the
tables from our class or its ancestor classes.  That table will be our
parent and that node in it - attachment point.
	* delay freeing the table headers; have them refcounted and instead
of unconditionally freeing the sucker on unregistration just drop the refcount.

Now we can keep a pair (reference to header, pointer to ctl table entry)
as long as we hold refcount on header.  It won't affect unregistration
in any way.  And at any point we can try to acquire "active" (use) reference
to header.  If that succeeds, we know that
	+ unregistration hadn't been started
	+ unregistration won't be finished until we unuse the sucker
	+ table entry is alive and will stay alive until then.

So we can hold references to those puppies from inodes under /proc/sys
without blocking unregistration, etc.

What's more, we can associate such pair with each node in sysctl tree.
For non-directories that's obvious.  For directories, take the tree such
that directory belongs to tree \setminus parent of tree.

That's pretty much it.  Filesystem side is simple - we keep a pointer to
class of tree responsible for a node (see directly above) in dentry.
And ->d_compare() checks that class of candidate match should be visible
for task doing the lookup.  ->lookup() tries finding an entry with requested
name in sysctl table (found by directory inode) and in case of miss it goes
through the list of tables attached at that node, searching in those that
ought to be visible to us.

As the result, we have direct access to sysctl table entry right from inode,
maintain these references accross lookups without going through the contortions
done by current code and we do *NOT* use the same dentry for flipping between
unrelated sysctl nodes with different visibility...

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2008-06-03 13:05 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-16  8:31 [PATCH] utimensat() non-conformances and fixes -- version 2 Michael Kerrisk
     [not found] ` <482D4665.4050401-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2008-05-16  8:34   ` Michael Kerrisk
2008-05-16 16:59   ` Miklos Szeredi
     [not found]     ` <E1Jx3Gw-0002eA-55-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
2008-05-17 19:57       ` Michael Kerrisk
2008-05-19  9:50         ` Miklos Szeredi
2008-05-19 10:12           ` Miklos Szeredi
2008-05-19 12:24           ` Michael Kerrisk
2008-05-19 13:17             ` Miklos Szeredi
2008-05-30 15:34       ` [PATCH] utimensat() non-conformances and fixes [v3] Michael Kerrisk
     [not found]         ` <48401E7E.9090304-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2008-05-30 16:37           ` Miklos Szeredi
2008-05-30 18:24             ` Michael Kerrisk
2008-05-30 19:22               ` Miklos Szeredi
     [not found]                 ` <E1K2ABK-0002ck-UT-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
2008-05-30 19:32                   ` Matthew Wilcox
     [not found]                     ` <20080530193207.GB28074-6jwH94ZQLHl74goWV3ctuw@public.gmane.org>
2008-05-30 20:08                       ` Miklos Szeredi
     [not found]               ` <cfd18e0f0805301124o5f217dden10726b268d05d81a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-05-30 19:43                 ` Michael Kerrisk
     [not found]                   ` <cfd18e0f0805301243h7d862963o8320a2c1f48942ce-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-05-30 20:17                     ` Miklos Szeredi
     [not found]                       ` <E1K2B2k-0002kS-Cz-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
2008-05-31  5:28                         ` Michael Kerrisk
2008-05-30 20:17               ` Andrew Morton
2008-05-31  5:44                 ` Michael Kerrisk
2008-06-03 11:05         ` Michael Kerrisk
     [not found]           ` <cfd18e0f0806030405u1c32b114pa0fdd979f36f87fb-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-03 11:13             ` Miklos Szeredi
2008-06-03 11:22               ` Al Viro
2008-06-03 11:27                 ` Michael Kerrisk
2008-06-03 11:30                   ` Jamie Lokier
     [not found]                     ` <20080603113018.GA27955-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2008-06-03 11:39                       ` Michael Kerrisk
2008-06-03 11:49                         ` Al Viro
     [not found]                           ` <20080603114921.GX28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-06-03 11:58                             ` Al Viro
2008-06-03 12:01                           ` Jamie Lokier
     [not found]                             ` <20080603120135.GA28905-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2008-06-03 12:08                               ` Al Viro
     [not found]                                 ` <20080603120850.GZ28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-06-03 12:10                                   ` Jamie Lokier
     [not found]                 ` <20080603112221.GW28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-06-03 12:16                   ` Miklos Szeredi
2008-06-03 13:05                     ` Al Viro
2008-06-03 11:52               ` Michael Kerrisk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).