linux-api.vger.kernel.org archive mirror
* Request for comments: reserving a value for O_SEARCH and O_EXEC
@ 2013-08-05 22:25 Rich Felker
  2013-08-06  5:54 ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2013-08-05 22:25 UTC
  To: linux-api; +Cc: Joseph S. Myers, libc-alpha

Hi,

[I'm resending this to linux-api instead of linux-kernel on the advice
of Joseph Myers on libc-alpha. Please see the link to the libc-alpha
thread at the bottom of this message for discussion that has already
taken place.]

At present, one of the few interface-level conformance issues for
Linux against POSIX 2008 is lack of O_SEARCH and O_EXEC. I am trying
to get full, conforming support for them both into musl libc (for
which I am the maintainer) and glibc (see the libc-alpha post[1]).
At this point, I believe it is possible to do so with no changes at
the kernel level, using O_PATH and a moderate amount of
userspace-level emulation where O_PATH semantics are lacking. What
we're missing, however, is a reserved O_ACCMODE value for O_SEARCH and
O_EXEC (it can be the same for both). Using O_PATH directly is not an
option because the semantics for O_PATH|O_NOFOLLOW differ from the
POSIX semantics for O_SEARCH|O_NOFOLLOW and O_EXEC|O_NOFOLLOW:

- Linux O_PATH|O_NOFOLLOW opens a file descriptor referring to the
  symlink inode itself.

- POSIX O_NOFOLLOW with O_SEARCH or O_EXEC forces failure if the
  pathname refers to a symlink.

Both are important functionality to support - the former for features
and the latter for security. We can't just fstat and reject symbolic
links in userspace when O_PATH gets one or we would break access to
the Linux-specific O_PATH functionality, which is useful. So there
needs to be a way for open (the library function) to detect whether
the caller requested O_PATH or O_SEARCH/O_EXEC.
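To make the first point concrete, here is a minimal sketch (opath_nofollow_gets_link() is a hypothetical helper, not part of musl or glibc). It shows that on Linux an O_PATH|O_NOFOLLOW open of a symlink yields a descriptor for the link inode itself. It assumes a kernel where fstat() works on O_PATH descriptors (3.5/3.6 or later, as noted below), and a libc that exposes O_PATH under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if O_PATH|O_NOFOLLOW on `path` yields
 * a descriptor for a symlink inode, 0 if it yields some other file
 * type, and -1 on error. */
static int opath_nofollow_gets_link(const char *path)
{
    int fd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    int r = fstat(fd, &st);   /* needs fstat() support for O_PATH fds */
    close(fd);
    if (r < 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

On a typical Linux system, passing "/proc/self" (itself a symlink) should return 1, and passing "/" should return 0.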

We could chord O_PATH with another flag such as O_EXCL where the
behavior would otherwise be undefined, but I don't want to conflict
with future such use by the kernel; that would be a compatibility
disaster.

My preference would be to use the value 3 for O_SEARCH and O_EXEC, so
that the O_ACCMODE mask would not even need to change. But doing this
requires (even more so than chording) agreement with the kernel
community that this value will not be used for something else in the
future. Looking back, I see that it's been accepted by the kernel for
a long time (at least since 2.6.32) and treated as "no access" (reads
and writes result in EBADF, like O_PATH) but still does not let you
open files you don't have permissions to, or directories. However I'm
not clear if this is a documented (or undocumented, but stable :)
interface that should be left with its current behavior. Taking the
value 3 for O_SEARCH and O_EXEC would mean having open (the library
function) automatically apply O_PATH before passing it to the kernel
and rejecting the resulting fd if it's a symbolic link.
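The value-3 approach could look roughly like the following inside the library's open(). This is only a sketch under stated assumptions. SKETCH_O_SEARCH and sketch_open() are hypothetical names, and O_CREAT and pre-O_PATH kernels are ignored. It also assumes fstat() works on O_PATH descriptors.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical O_SEARCH/O_EXEC value from the proposal; not defined
 * by any real kernel or libc header. */
#define SKETCH_O_SEARCH 3

/* Sketch of a library open() that maps access mode 3 to O_PATH and
 * then enforces POSIX O_NOFOLLOW semantics itself. */
static int sketch_open(const char *path, int flags, mode_t mode)
{
    if ((flags & O_ACCMODE) != SKETCH_O_SEARCH)
        return open(path, flags, mode);

    /* Swap access mode 3 for O_PATH before calling the kernel. */
    int fd = open(path, (flags & ~O_ACCMODE) | O_PATH);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    /* A symlink inode only comes back if O_NOFOLLOW was given and the
     * final component is a symlink; POSIX wants ELOOP there. */
    if (S_ISLNK(st.st_mode)) {
        close(fd);
        errno = ELOOP;
        return -1;
    }
    return fd;
}
```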

An alternative, less graceful but perhaps more compatible, approach
would be to use O_PATH|3 for O_SEARCH and O_EXEC. Then open could just
look for the low bits of flags (which should be 0 when using O_PATH
for the Linux semantics, no?) and reject symbolic links if they are
set.
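Under the O_PATH|3 alternative, open only has to look at the access-mode bits to tell the two cases apart. Here is a minimal sketch. The SKETCH_O_SEARCH_ALT macro and the classify_open_flags() helper are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/* Hypothetical encoding: O_SEARCH and O_EXEC share O_PATH|3. */
#define SKETCH_O_SEARCH_ALT (O_PATH | 3)

enum open_kind {
    OPEN_KIND_NORMAL,        /* ordinary O_RDONLY/O_WRONLY/O_RDWR open */
    OPEN_KIND_LINUX_O_PATH,  /* Linux O_PATH; O_NOFOLLOW opens the link */
    OPEN_KIND_SEARCH_EXEC    /* POSIX O_SEARCH/O_EXEC; reject symlinks */
};

static enum open_kind classify_open_flags(int flags)
{
    if (!(flags & O_PATH))
        return OPEN_KIND_NORMAL;
    /* Callers wanting Linux O_PATH semantics leave the low access-mode
     * bits at 0; any nonzero value there marks O_SEARCH/O_EXEC. */
    return (flags & O_ACCMODE) ? OPEN_KIND_SEARCH_EXEC
                               : OPEN_KIND_LINUX_O_PATH;
}
```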

Whatever approach we settle on, it would be nice if it has the
property that the kernel could eventually provide the full O_SEARCH
and O_EXEC semantics itself and eliminate the need for userspace
emulation. The current emulations we need are:

- fchmod and fchown (still not supported for O_PATH) fall back to
  calling chmod or chown on the pseudo-symlink in /proc/self/fd.

- fchdir and fstat (not supported prior to 3.5/3.6) fall back to
  calling chdir or stat.

- open checks whether it obtained a symlink and if so closes it and
  reports ELOOP.

- fcntl, depending on the value chosen for O_SEARCH/O_EXEC, may have
  to map the flags from F_GETFL to the right value.

There may be others I'm missing, but emulation generally follows the
same pattern.
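As an illustration of the first emulation, a fallback fchmod might look like this. It is a sketch under several assumptions. sketch_fchmod() is a hypothetical name, and /proc must be mounted. The sketch also treats EBADF from fchmod() on a descriptor that is still open as the O_PATH case.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int sketch_fchmod(int fd, mode_t mode)
{
    if (fchmod(fd, mode) == 0)
        return 0;
    /* fchmod() rejects O_PATH descriptors with EBADF. Make sure fd is
     * really open (F_GETFD fails with EBADF otherwise) before falling
     * back to the /proc path. */
    if (errno != EBADF || fcntl(fd, F_GETFD) < 0)
        return -1;

    char buf[32];
    snprintf(buf, sizeof buf, "/proc/self/fd/%d", fd);
    /* chmod() follows the /proc magic link to the file fd refers to. */
    return chmod(buf, mode);
}
```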

Opinions? Please keep me CC'd on replies since I am not on the list.


Thanks,

Rich





[1] http://www.sourceware.org/ml/libc-alpha/2013-08/msg00016.html


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
  2013-08-05 22:25 Rich Felker
@ 2013-08-06  5:54 ` Christoph Hellwig
       [not found]   ` <20130806055425.GA9280-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2013-08-06  5:54 UTC
  To: Rich Felker; +Cc: linux-api, Joseph S. Myers, libc-alpha

On Mon, Aug 05, 2013 at 06:25:44PM -0400, Rich Felker wrote:
> Hi,
> 
> [I'm resending this to linux-api instead of linux-kernel on the advice
> of Joseph Myers on libc-alpha. Please see the link to the libc-alpha
> thread at the bottom of this message for discussion that has already
> taken place.]

As I told you earlier on linux-kernel, just send a patch with your semantics
to lkml.  We're not going to reserve a value for a namespace that is
reserved for the kernel to implement something that should better
be done in kernel space.


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]   ` <20130806055425.GA9280-jcswGhMUV9g@public.gmane.org>
@ 2013-08-06 13:42     ` Rich Felker
  2013-08-06 14:03       ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2013-08-06 13:42 UTC
  To: Christoph Hellwig
  Cc: linux-api, Joseph S. Myers, libc-alpha

On Tue, Aug 06, 2013 at 07:54:25AM +0200, Christoph Hellwig wrote:
> On Mon, Aug 05, 2013 at 06:25:44PM -0400, Rich Felker wrote:
> > Hi,
> > 
> > [I'm resending this to linux-api instead of linux-kernel on the advice
> > of Joseph Myers on libc-alpha. Please see the link to the libc-alpha
> > thread at the bottom of this message for discussion that has already
> > taken place.]
> 
> As I told you earlier on linux-kernel, just send a patch with your semantics

Apologies, I did not see the reply, and I'm still looking for it. I
should have put the request to CC me more prominently in the email...

> to lkml.  We're not going to reserve a value for a namespace that is
> reserved for the kernel to implement something that should better
> be done in kernel space.

Did you mean "that should better be done in user space"?

Whether O_SEARCH and O_EXEC are provided fully natively by the kernel
or handled by userspace, either way a reserved value in the open flags
must be set aside. Otherwise any value used by the userspace
implementation would risk conflicting with future kernel features
using the same bit(s).

I fully understand that the kernel folks may not want to put O_SEARCH
and O_EXEC specific semantics in the kernel when O_PATH can, with some
trivial additional code in userspace, provide what's needed already.
This is why, at this time, I'm requesting a reserved value and not
trying to push code into the kernel.

Rich


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
  2013-08-06 13:42     ` Rich Felker
@ 2013-08-06 14:03       ` Christoph Hellwig
       [not found]         ` <20130806140321.GA4421-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2013-08-06 14:03 UTC
  To: Rich Felker; +Cc: Christoph Hellwig, linux-api, Joseph S. Myers, libc-alpha

On Tue, Aug 06, 2013 at 09:42:54AM -0400, Rich Felker wrote:
> > As I told you earlier on linux-kernel, just send a patch with your semantics
> 
> Apologies, I did not see the reply, and I'm still looking for it. I
> should have put the request to CC me more prominently in the email...

Sorry, it actually was libc-alpha that I replied to.  I didn't notice
you sent two slightly different messages instead of having a cross-posted
discussion, which would have been more useful.

> 
> > to lkml.  We're not going to reserve a value for a namespace that is
> > reserved for the kernel to implement something that should better
> > be done in kernel space.
> 
> Did you mean "that should better be done in user space"?

No.  It should be done in kernelspace, just like all other O_ flags.

> 
> Whether O_SEARCH and O_EXEC are provided fully natively by the kernel
> or handled by userspace, either way a reserved value in the open flags
> must be set aside. Otherwise any value used by the userspace
> implementation would risk conflicting with future kernel features
> using the same bit(s).

No flag is going to get reserved without a proper (kernel-level)
implementation.


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]         ` <20130806140321.GA4421-jcswGhMUV9g@public.gmane.org>
@ 2013-08-06 14:36           ` Rich Felker
       [not found]             ` <20130806143609.GV221-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2013-08-06 14:36 UTC
  To: Christoph Hellwig
  Cc: linux-api, Joseph S. Myers, libc-alpha

On Tue, Aug 06, 2013 at 04:03:21PM +0200, Christoph Hellwig wrote:
> On Tue, Aug 06, 2013 at 09:42:54AM -0400, Rich Felker wrote:
> > > As I told you earlier on linux-kernel, just send a patch with your semantics
> > 
> > Apologies, I did not see the reply, and I'm still looking for it. I
> > should have put the request to CC me more prominently in the email...
> 
> Sorry, it actually was libc-alpha that I replied to.  I didn't notice
> you sent two slightly different messages instead of having a cross-posted
> discussion, which would have been more useful.

I agree totally. That's why I cross-posted this new thread.

> > > to lkml.  We're not going to reserve a value for a namespace that is
> > > reserved for the kernel to implement something that should better
> > > be done in kernel space.
> > 
> > Did you mean "that should better be done in user space"?
> 
> No.  It should be done in kernelspace, just like all other O_ flags.

OK, I was just confused by your wording.

> > Whether O_SEARCH and O_EXEC are provided fully natively by the kernel
> > or handled by userspace, either way a reserved value in the open flags
> > must be set aside. Otherwise any value used by the userspace
> > implementation would risk conflicting with future kernel features
> > using the same bit(s).
> 
> No flag is going to get reserved without a proper (kernel-level)
> implementation.

This is frustrating because early on in the O_PATH discussions on LKML
when it was first added, there were requests for O_SEARCH and O_EXEC
semantics in the kernel, and these requests were rejected with the
response being roughly "you can do it in userspace using the more
general O_PATH approach". So we have two contradictory conditions:

- O_SEARCH/O_EXEC semantics won't be added in the kernel because you
  can do it in userspace with O_PATH.

- O_SEARCH/O_EXEC can't be added in userspace because they can't be
  assigned a value without having an implementation in kernelspace.

If there's a willingness to override/drop that previous decision
(which I believe Linus was in on, but I'd have to search for the old
threads again) then I can propose a patch. As far as I can tell, the
simplest implementation would be to follow the O_PATH code path but
include a check for this new mode and fail at the point of opening a
symlink where O_NOFOLLOW is processed. I am not sufficiently familiar
with this code to write the patch yet, but I can try to learn it. My
guess is that the patch would be less than 20 lines, half of it being
a change for the top-level O_PATH logic in openat that strips other
flags when O_PATH is present and half of it being 

If I do this, do you have a recommendation on the value to use? My
guess for the best choice would be O_PATH|3, so that O_PATH, O_SEARCH,
O_EXEC, O_RDONLY, O_WRONLY, and O_RDWR can all fall under O_ACCMODE
without adding more than one bit to O_ACCMODE. If we do it this way,
the patch should also make it so the extra bits (bits 0 and 1) set at
open time should be preserved when fcntl(F_GETFL) is called so that
the application correctly sees the access mode it requested.
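One way to see why the kernel side matters for F_GETFL is to open a path with the proposed O_PATH|3 encoding and inspect the access-mode bits that fcntl(F_GETFL) reports. The sketch below does this with a hypothetical helper named reported_accmode(). As far as current kernels go, legacy open() masks the flags down to O_PATH and a few companions (such as O_DIRECTORY and O_NOFOLLOW) when O_PATH is set. The result should therefore be 0 rather than 3, so the requested mode is lost. Preserving those bits is exactly what the proposed patch would change.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` with O_PATH|3 and returns the
 * access-mode bits fcntl(F_GETFL) reports, or -1 on error. */
static int reported_accmode(const char *path)
{
    int fd = open(path, O_PATH | 3 | O_CLOEXEC);
    if (fd < 0)
        return -1;
    int fl = fcntl(fd, F_GETFL);   /* F_GETFL is allowed on O_PATH fds */
    close(fd);
    if (fl < 0)
        return -1;
    return fl & O_ACCMODE;
}
```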

Really, my preference would be if O_PATH could be changed to honor
O_NOFOLLOW just like other open types, and a new O_SYMLINK could be
added to open the link itself, but this would be an incompatible
change in the kernel API and I fully agree that would not be
appropriate.

Rich


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]             ` <20130806143609.GV221-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
@ 2013-08-06 14:51               ` Christoph Hellwig
       [not found]                 ` <20130806145159.GA8192-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2013-08-06 14:51 UTC
  To: Rich Felker
  Cc: linux-api, Joseph S. Myers, libc-alpha

On Tue, Aug 06, 2013 at 10:36:10AM -0400, Rich Felker wrote:
> This is frustrating because early on in the O_PATH discussions on LKML
> when it was first added, there were requests for O_SEARCH and O_EXEC
> semantics in the kernel, and these requests were rejected with the
> response being roughly "you can do it in userspace using the more
> general O_PATH approach". So we have two contradictory conditions:
> 
> - O_SEARCH/O_EXEC semantics won't be added in the kernel because you
>   can do it in userspace with O_PATH.
> 
> - O_SEARCH/O_EXEC can't be added in userspace because they can't be
>   assigned a value without having an implementation in kernelspace.
> 
> If there's a willingness to override/drop that previous decision
> (which I believe Linus was in on, but I'd have to search for the old
> threads again)

Yes, Linus has complained about it.  Probably rightly so because the
O_EXEC and O_SEARCH semantics don't seem overly useful.

> then I can propose a patch. As far as I can tell, the
> simplest implementation would be to follow the O_PATH code path but
> include a check for this new mode and fail at the point of opening a
> symlink where O_NOFOLLOW is processed. I am not sufficiently familiar
> with this code to write the patch yet, but I can try to learn it. My
> guess is that the patch would be less than 20 lines, half of it being
> a change for the top-level O_PATH logic in openat that strips other
> flags when O_PATH is present and half of it being

<text missing here>


Besides the symlink semantics, I think we should really get a narrow
implementation of it, that is, really forbid everything but executing
it (if S_ISREG) or performing openat on it (if S_ISDIR).

For that we'd also want to move fexec(ve) into the kernel space.

> If I do this, do you have a recommendation on the value to use? My
> guess for the best choice would be O_PATH|3, so that O_PATH, O_SEARCH,
> O_EXEC, O_RDONLY, O_WRONLY, and O_RDWR can all fall under O_ACCMODE
> without adding more than one bit to O_ACCMODE. If we do it this way,
> the patch should also make it so the extra bits (bits 0 and 1) set at
> open time should be preserved when fcntl(F_GETFL) is called so that
> the application correctly sees the access mode it requested.

Note that "3" already has a magic meaning on Linux:

"Linux reserves the special, nonstandard access mode 3 (binary 11) in flags
 to mean: check for read and write permission on the file and return a
 descriptor that can't be used for reading or writing. This nonstandard
 access mode is used by some Linux drivers to return a descriptor that
 is to be used only for device-specific ioctl(2) operations."

Given that it's limited to device nodes and a somewhat similar limitation
to O_SEARCH and O_EXEC, it doesn't sound too bad.
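The documented mode-3 behavior can be observed directly. This sketch uses a hypothetical helper, read_errno_after_mode3(), that opens a path with access mode 3 and reports the errno from a subsequent read(). On Linux, opening a regular file that the caller can both read and write should succeed, and the read should then fail with EBADF. Opening a directory this way should fail outright, because mode 3 asks for write permission.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno from read() on a mode-3
 * descriptor, 0 if read() unexpectedly succeeded, or -1 if open()
 * itself failed. */
static int read_errno_after_mode3(const char *path)
{
    int fd = open(path, 3 | O_CLOEXEC);
    if (fd < 0)
        return -1;
    char c;
    int result = 0;
    if (read(fd, &c, 1) < 0)
        result = errno;
    close(fd);
    return result;
}
```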


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]                 ` <20130806145159.GA8192-jcswGhMUV9g@public.gmane.org>
@ 2013-08-06 15:23                   ` Rich Felker
       [not found]                     ` <20130806152316.GW221-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2013-08-06 15:23 UTC
  To: Christoph Hellwig
  Cc: linux-api, Joseph S. Myers, libc-alpha

On Tue, Aug 06, 2013 at 04:51:59PM +0200, Christoph Hellwig wrote:
> On Tue, Aug 06, 2013 at 10:36:10AM -0400, Rich Felker wrote:
> > This is frustrating because early on in the O_PATH discussions on LKML
> > when it was first added, there were requests for O_SEARCH and O_EXEC
> > semantics in the kernel, and these requests were rejected with the
> > response being roughly "you can do it in userspace using the more
> > general O_PATH approach". So we have two contradictory conditions:
> > 
> > - O_SEARCH/O_EXEC semantics won't be added in the kernel because you
> >   can do it in userspace with O_PATH.
> > 
> > - O_SEARCH/O_EXEC can't be added in userspace because they can't be
> >   assigned a value without having an implementation in kernelspace.
> > 
> > If there's a willingness to override/drop that previous decision
> > (which I believe Linus was in on, but I'd have to search for the old
> > threads again)
> 
> Yes, Linus has complained about it.  Probably rightly so because the
> O_EXEC and O_SEARCH semantics don't seem overly useful.

I really don't want to have the argument over whether they're useful.
I just want to be able to provide them to applications since they're
required by the standard and useful to applications as the only
portable way to achieve things that you could otherwise not achieve
without Linux-specific tricks. And I don't want the implementation I
provide to have security bugs (which it does without an ability to
give O_NOFOLLOW the POSIX semantics). I'm perfectly happy with
accepting a judgement from Linus that they don't belong in the kernel,
as long as there's a way to implement it in userspace without clashing
with future use of flag/mode bits by the kernel for other purposes.

> Besides the symlink semantics, I think we should really get a narrow
> implementation of it, that is, really forbid everything but executing
> it (if S_ISREG) or performing openat on it (if S_ISDIR).

This is non-conforming. POSIX makes no provision that fchmod, fchown,
fstat, fchdir, etc. can or must fail on descriptors opened with
O_SEARCH or O_EXEC. The mode used for opening is generally irrelevant
to these functions (for example, whether you can fchmod is a function
of the file's ownership and the process's privileges, not whether the
file was opened with write access) and, unless specified
otherwise, the same principle applies to these new access modes.

> For that we'd also want to move fexec(ve) into the kernel space.

I agree this would be useful, but it's a separate issue.

> > If I do this, do you have a recommendation on the value to use? My
> > guess for the best choice would be O_PATH|3, so that O_PATH, O_SEARCH,
> > O_EXEC, O_RDONLY, O_WRONLY, and O_RDWR can all fall under O_ACCMODE
> > without adding more than one bit to O_ACCMODE. If we do it this way,
> > the patch should also make it so the extra bits (bits 0 and 1) set at
> > open time should be preserved when fcntl(F_GETFL) is called so that
> > the application correctly sees the access mode it requested.
> 
> Note that "3" already has a magic meaning on Linux:
> 
> "Linux reserves the special, nonstandard access mode 3 (binary 11) in flags
>  to mean: check for read and write permission on the file and return a
>  descriptor that can't be used for reading or writing. This nonstandard
>  access mode is used by some Linux drivers to return a descriptor that
>  is to be used only for device-specific ioctl(2) operations."
> 
> Given that it's limited to device nodes and a somewhat similar limitation
> to O_SEARCH and O_EXEC, it doesn't sound too bad.

Thanks for digging this up. I observed the behavior but couldn't find
anywhere it was documented. I wasn't aware that it was checking for
both read and write permission, though, and assumed it was just
checking for read. This is somewhat unfortunate, as on old kernels
without O_PATH, O_PATH|3 would fail to open directories, whereas plain
O_PATH succeeds as long as you have read permission, thus providing an
acceptable "low quality fallback implementation" of O_SEARCH and
O_EXEC on old kernels.

Of course, the userspace fallback code could detect such failures and
retry with O_RDONLY, so maybe it's not such a big issue. With a
working O_PATH, open should never fail with EISDIR or EACCES, so these
errors could be used as a condition to retry.
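The retry logic suggested above might be sketched as follows. SKETCH_O_SEARCH_ALT and open_search_with_fallback() are hypothetical names. The symlink check for O_NOFOLLOW is omitted for brevity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/* Hypothetical O_SEARCH/O_EXEC encoding discussed in this thread. */
#define SKETCH_O_SEARCH_ALT (O_PATH | 3)

/* Try the O_PATH-based encoding first. If it fails the way an old
 * kernel would (one that ignores O_PATH and sees access mode 3, which
 * needs read and write permission), retry as a read-only open. */
static int open_search_with_fallback(const char *path, int extra_flags)
{
    int fd = open(path, SKETCH_O_SEARCH_ALT | extra_flags);
    if (fd >= 0 || (errno != EISDIR && errno != EACCES))
        return fd;
    return open(path, O_RDONLY | extra_flags);
}
```

If EACCES instead comes from a non-final path component lacking search permission, the O_RDONLY retry should simply fail again with EACCES. The retry then costs an extra syscall but does not change the outcome.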

Rich


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]                     ` <20130806152316.GW221-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
@ 2013-08-06 15:53                       ` Joseph S. Myers
  2013-08-06 15:54                       ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Joseph S. Myers @ 2013-08-06 15:53 UTC
  To: Rich Felker
  Cc: Christoph Hellwig, linux-api, libc-alpha

On Tue, 6 Aug 2013, Rich Felker wrote:

> Of course, the userspace fallback code could detect such failures and
> retry with O_RDONLY, so maybe it's not such a big issue. With a
> working O_PATH, open should never fail with EISDIR or EACCES, so these
> errors could be used as a condition to retry.

Surely you'll still get EACCES when some component in the specified path, 
not the last one, lacks search permission?

-- 
Joseph S. Myers
joseph-qD8j1LwMmJjtCj0u4l0SBw@public.gmane.org


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]                     ` <20130806152316.GW221-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
  2013-08-06 15:53                       ` Joseph S. Myers
@ 2013-08-06 15:54                       ` Christoph Hellwig
       [not found]                         ` <20130806155415.GA12926-jcswGhMUV9g@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2013-08-06 15:54 UTC
  To: Rich Felker
  Cc: linux-api, Joseph S. Myers, libc-alpha

On Tue, Aug 06, 2013 at 11:23:16AM -0400, Rich Felker wrote:
> > For that we'd also want to move fexec(ve) into the kernel space.
> 
> I agree this would be useful, but it's a separate issue.

I don't think it is.  The whole point of O_EXEC is to support fexecve.

Without moving it to kernel I can't see how you can make it strictly
conforming to this requirement in Posix that the file descriptor 
must be valid for executing.

Fortunately enough the kernel implementation is trivial.


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]                         ` <20130806155415.GA12926-jcswGhMUV9g@public.gmane.org>
@ 2013-08-06 16:30                           ` Rich Felker
  0 siblings, 0 replies; 12+ messages in thread
From: Rich Felker @ 2013-08-06 16:30 UTC
  To: Christoph Hellwig
  Cc: linux-api, Joseph S. Myers, libc-alpha

On Tue, Aug 06, 2013 at 05:54:15PM +0200, Christoph Hellwig wrote:
> On Tue, Aug 06, 2013 at 11:23:16AM -0400, Rich Felker wrote:
> > > For that we'd also want to move fexec(ve) into the kernel space.
> > 
> > I agree this would be useful, but it's a separate issue.
> 
> I don't think it is.  The whole point of O_EXEC is to support fexecve.
> 
> Without moving it to kernel I can't see how you can make it strictly
> conforming to this requirement in Posix that the file descriptor 
> must be valid for executing.

POSIX is contradictory on this issue. I should probably file a request
for an interpretation. On the one hand, fexecve "shall fail" if:

  "The fd argument is not a valid file descriptor open for executing."

On the other hand, the application usage text (non-normative, if I'm
not mistaken) reads:

  "Since execute permission is checked by fexecve(), the file
  description fd need not have been opened with the O_EXEC flag.
  However, if the file to be executed denies read and write permission
  for the process preparing to do the exec, the only way to provide
  the fd to fexecve() will be to use the O_EXEC flag when opening fd.
  In this case, the application will not be able to perform a checksum
  test since it will not be able to read the contents of the file."

This suggests that file descriptors opened in other modes could be
used, and that O_EXEC only exists so that you can open a file you
would otherwise not have permissions to. Moreover, early in the
normative ("description") text, it reads:

  "The fexecve() function shall be equivalent to the execve() function
  except that the file to be executed is determined by the file
  descriptor fd instead of a pathname. The file offset of fd is
  ignored."

Anyway, if failure when O_ACCMODE is not O_EXEC is required, it's
trivial to do in userspace: just call fcntl(F_GETFL) on the file
descriptor and generate an artificial EBADF if O_PATH is not set.
(Naturally this check would be omitted on pre-O_PATH kernels where
full conformance is not possible.)
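The userspace check described in the previous paragraph could be sketched like this. fexec_fd_check() is a hypothetical helper. It assumes the library implements O_EXEC on top of O_PATH, so that an O_PATH flag in F_GETFL stands in for "opened for executing".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/* Hypothetical pre-check for a userspace fexecve(): fail with EBADF
 * unless fd is valid and was opened with O_PATH. */
static int fexec_fd_check(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return -1;            /* errno is already EBADF for a bad fd */
    if (!(fl & O_PATH)) {
        errno = EBADF;        /* artificial EBADF, as proposed above */
        return -1;
    }
    return 0;
}
```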

Rich


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found] ` <20130803024808.GA26932-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
@ 2013-08-12 17:42   ` Andy Lutomirski
       [not found]     ` <52091E6B.10800-3s7WtUTddSA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2013-08-12 17:42 UTC
  To: Rich Felker
  Cc: linux-kernel, linux-api

[cc: linux-api]


On 08/02/2013 07:48 PM, Rich Felker wrote:
> Hi,
> 
> At present, one of the few interface-level conformance issues for
> Linux against POSIX 2008 is lack of O_SEARCH and O_EXEC. I am trying
> to get full, conforming support for them both into musl libc (for
> which I am the maintainer) and glibc (see the libc-alpha post[1]).
> At this point, I believe it is possible to do so with no changes at
> the kernel level, using O_PATH and a moderate amount of
> userspace-level emulation where O_PATH semantics are lacking. What
> we're missing, however, is a reserved O_ACCMODE value for O_SEARCH and
> O_EXEC (it can be the same for both). Using O_PATH directly is not an
> option because the semantics for O_PATH|O_NOFOLLOW differ from the
> POSIX semantics for O_SEARCH|O_NOFOLLOW and O_EXEC|O_NOFOLLOW:
> 
> - Linux O_PATH|O_NOFOLLOW opens a file descriptor referring to the
>   symlink inode itself.
> 
> - POSIX O_NOFOLLOW with O_SEARCH or O_EXEC forces failure if the
>   pathname refers to a symlink.
> 
> Both are important functionality to support - the former for features
> and the latter for security. We can't just fstat and reject symbolic
> links in userspace when O_PATH gets one or we would break access to
> the Linux-specific O_PATH functionality, which is useful. So there
> needs to be a way for open (the library function) to detect whether
> the caller requested O_PATH or O_SEARCH/O_EXEC.
> 
> We could chord O_PATH with another flag such as O_EXCL where the
> behavior would otherwise be undefined, but I don't want to conflict
> with future such use by the kernel; that would be a compatibility
> disaster.
> 
> My preference would be to use the value 3 for O_SEARCH and O_EXEC, so
> that the O_ACCMODE mask would not even need to change. But doing this
> requires (even more so than chording) agreement with the kernel
> community that this value will not be used for something else in the
> future. Looking back, I see that it's been accepted by the kernel for
> a long time (at least since 2.6.32) and treated as "no access" (reads
> and writes result in EBADF, like O_PATH) but still does not let you
> open files you don't have permissions to, or directories. However I'm
> not clear if this is a documented (or undocumented, but stable :)
> interface that should be left with its current behavior. Taking the
> value 3 for O_SEARCH and O_EXEC would mean having open (the library
> function) automatically apply O_PATH before passing it to the kernel
> and rejecting the resulting fd if it's a symbolic link.
> 
> An alternative, less graceful but perhaps more compatible, approach
> would be to use O_PATH|3 for O_SEARCH and O_EXEC. Then open could just
> look for the low bits of flags (which should be 0 when using O_PATH
> for the Linux semantics, no?) and reject symbolic links if they are
> set.
> 
> Whatever approach we settle on, it would be nice if it has the
> property that the kernel could eventually provide the full O_SEARCH
> and O_EXEC semantics itself and eliminate the need for userspace
> emulation. The current emulations we need are:
> 
> - fchmod and fchown (still not supported for O_PATH) fall back to
>   calling chmod or chown on the pseudo-symlink in /proc/self/fd.
> 
> - fchdir and fstat (not supported prior to 3.5/3.6) fall back to
>   calling chdir or stat.
> 
> - open checks whether it obtained a symlink and if so closes it and
>   reports ELOOP.
> 
> - fcntl, depending on the value chosen for O_SEARCH/O_EXEC, may have
>   to map the flags from F_GETFL to the right value.
> 
> There may be others I'm missing, but emulation generally follows the
> same pattern.
> 
> Opinions? Please keep me CC'd on replies since I am not on the list.

You'll have the same problem that O_TMPFILE had: the kernel currently
ignores unrecognized flags.  I wonder if it's time to add a new syscall
(or syscalls) with more sensible semantics.

--Andy


* Re: Request for comments: reserving a value for O_SEARCH and O_EXEC
       [not found]     ` <52091E6B.10800-3s7WtUTddSA@public.gmane.org>
@ 2013-08-13  3:22       ` Rich Felker
  0 siblings, 0 replies; 12+ messages in thread
From: Rich Felker @ 2013-08-13  3:22 UTC
  To: Andy Lutomirski
  Cc: linux-kernel, linux-api

On Mon, Aug 12, 2013 at 10:42:03AM -0700, Andy Lutomirski wrote:
> You'll have the same problem that O_TMPFILE had: the kernel currently
> ignores unrecognized flags.  I wonder if it's time to add a new syscall
> (or syscalls) with more sensible semantics.

That's not a problem here. In fact, in the case where O_PATH is not
supported by the kernel, the best possible behavior for O_SEARCH and
O_EXEC would be for them to be the same as O_RDONLY, since this gives
conforming behavior in all ways except that it will fail if you don't
have read access to the file.

Some folks have raised the issue that it would be "dangerous" because
certain devices have side effects on open, even open for read, but
POSIX does not specify that opening for search or exec suppresses such
side effects anyway. It's only applications directly using O_PATH and
expecting the Linux semantics that would be thrown off by getting
O_RDONLY semantics instead. In any case, there are many reasons it's
unsafe for a privileged process to open an untrusted pathname already.

Anyway, the whole point of this discussion is about choosing a value
that has the best fallback behavior on old kernels. O_PATH alone would
meet that requirement almost perfectly, but it has the unfortunate
issue that O_NOFOLLOW is interpreted in a special way with O_PATH: it
causes the symlink itself to be opened, rather than causing open to fail
when encountering a symlink. So we need a new flag by which the kernel
could detect and reject symlinks with O_PATH, _or_ the kernel could
just ignore this new flag, since userspace will have to check (to
support older kernels) that it did not get a symlink, and if so,
simulate failure.
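Userspace can also tell at runtime whether the kernel honored O_PATH or silently ignored it. One possible probe is sketched below as a hypothetical helper, kernel_supports_o_path(). It relies on two documented behaviors. read() on an O_PATH descriptor fails with EBADF, while read() on a directory opened O_RDONLY fails with EISDIR.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if the kernel honors O_PATH, 0 if it
 * appears to ignore it (so the open acted like O_RDONLY), -1 if the
 * result is inconclusive. */
static int kernel_supports_o_path(void)
{
    int fd = open("/", O_PATH | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    char c;
    ssize_t n = read(fd, &c, 1);
    int err = errno;
    close(fd);
    if (n >= 0)
        return -1;
    if (err == EBADF)      /* O_PATH descriptors reject read() */
        return 1;
    if (err == EISDIR)     /* plain read-only directory descriptor */
        return 0;
    return -1;
}
```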

Rich


end of thread, other threads:[~2013-08-13  3:22 UTC | newest]

Thread overview: 12+ messages
     [not found] <20130803024808.GA26932@brightrain.aerifal.cx>
     [not found] ` <20130803024808.GA26932-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
2013-08-12 17:42   ` Request for comments: reserving a value for O_SEARCH and O_EXEC Andy Lutomirski
     [not found]     ` <52091E6B.10800-3s7WtUTddSA@public.gmane.org>
2013-08-13  3:22       ` Rich Felker
2013-08-05 22:25 Rich Felker
2013-08-06  5:54 ` Christoph Hellwig
     [not found]   ` <20130806055425.GA9280-jcswGhMUV9g@public.gmane.org>
2013-08-06 13:42     ` Rich Felker
2013-08-06 14:03       ` Christoph Hellwig
     [not found]         ` <20130806140321.GA4421-jcswGhMUV9g@public.gmane.org>
2013-08-06 14:36           ` Rich Felker
     [not found]             ` <20130806143609.GV221-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
2013-08-06 14:51               ` Christoph Hellwig
     [not found]                 ` <20130806145159.GA8192-jcswGhMUV9g@public.gmane.org>
2013-08-06 15:23                   ` Rich Felker
     [not found]                     ` <20130806152316.GW221-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
2013-08-06 15:53                       ` Joseph S. Myers
2013-08-06 15:54                       ` Christoph Hellwig
     [not found]                         ` <20130806155415.GA12926-jcswGhMUV9g@public.gmane.org>
2013-08-06 16:30                           ` Rich Felker
