linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
@ 2013-10-09  0:15 Al Viro
  2013-10-09  0:52 ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2013-10-09  0:15 UTC (permalink / raw)
  To: torvalds; +Cc: linux-fsdevel, linux-kernel


... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/coredump.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 319f973..478ebad 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -696,13 +696,20 @@ EXPORT_SYMBOL(dump_write);
 int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
 {
 	struct file *file = cprm->file;
-	if (dump_interrupted() || !access_ok(VERIFY_READ, addr, nr))
-		return 0;
+	loff_t pos = file->f_pos;
+	ssize_t n;
 	if (cprm->written + nr > cprm->limit)
 		return 0;
-	if (file->f_op->write(file, addr, nr, &file->f_pos) != nr)
-		return 0;
-	cprm->written += nr;
+	while (nr) {
+		if (dump_interrupted())
+			return 0;
+		n = vfs_write(file, addr, nr, &pos);
+		if (n < 0)
+			return 0;
+		file->f_pos = pos;
+		cprm->written += n;
+		nr -= n;
+	}
 	return 1;
 }
 EXPORT_SYMBOL(dump_emit);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
  2013-10-09  0:15 [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly Al Viro
@ 2013-10-09  0:52 ` Linus Torvalds
  2013-10-09  1:18   ` Al Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2013-10-09  0:52 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linux Kernel Mailing List

On Tue, Oct 8, 2013 at 5:15 PM, Al Viro <viro@ftp.linux.org.uk> wrote:
>
> ... and deal with short writes properly

.. except you don't.

> +       while (nr) {
> +               if (dump_interrupted())
> +                       return 0;
> +               n = vfs_write(file, addr, nr, &pos);
> +               if (n < 0)
> +                       return 0;
> +               file->f_pos = pos;
> +               cprm->written += n;
> +               nr -= n;
> +       }

Please handle 'n == 0' too. Maybe it never happens (ie you get EPIPE
or ENOSPC), but write returning zero is actually possible and a valid
return value and traditional for "end of media". Looping forever is
not a good idea.

               Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
  2013-10-09  0:52 ` Linus Torvalds
@ 2013-10-09  1:18   ` Al Viro
  2013-10-09  1:20     ` Al Viro
  2013-10-09  1:38     ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Al Viro @ 2013-10-09  1:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, Linux Kernel Mailing List

On Tue, Oct 08, 2013 at 05:52:42PM -0700, Linus Torvalds wrote:
> On Tue, Oct 8, 2013 at 5:15 PM, Al Viro <viro@ftp.linux.org.uk> wrote:
> >
> > ... and deal with short writes properly
> 
> .. except you don't.
> 
> > +       while (nr) {
> > +               if (dump_interrupted())
> > +                       return 0;
> > +               n = vfs_write(file, addr, nr, &pos);
> > +               if (n < 0)
> > +                       return 0;
> > +               file->f_pos = pos;
> > +               cprm->written += n;
> > +               nr -= n;
> > +       }
> 
> Please handle 'n == 0' too. Maybe it never happens (ie you get EPIPE
> or ENOSPC), but write returning zero is actually possible and a valid
> return value and traditional for "end of media". Looping forever is
> not a good idea.

Point, but I would argue that we should yell very loud if we get 0 from
vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
to return that, but a lot of userland code won't be expecting that and
won't be able to cope...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
  2013-10-09  1:18   ` Al Viro
@ 2013-10-09  1:20     ` Al Viro
  2013-10-09  1:38     ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Al Viro @ 2013-10-09  1:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, Linux Kernel Mailing List

On Wed, Oct 09, 2013 at 02:18:33AM +0100, Al Viro wrote:

> Point, but I would argue that we should yell very loud if we get 0 from
> vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
> to return that, but a lot of userland code won't be expecting that and
> won't be able to cope...

PS: I agree that we should abort that loop if we get nr == 0, of course,
but I believe that we should report the pathname of the offending file,
at the very least.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
  2013-10-09  1:18   ` Al Viro
  2013-10-09  1:20     ` Al Viro
@ 2013-10-09  1:38     ` Linus Torvalds
  2013-10-09  2:06       ` Al Viro
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2013-10-09  1:38 UTC (permalink / raw)
  To: Al Viro; +Cc: Al Viro, linux-fsdevel, Linux Kernel Mailing List

On Tue, Oct 8, 2013 at 6:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Point, but I would argue that we should yell very loud if we get 0 from
> vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
> to return that, but a lot of userland code won't be expecting that and
> won't be able to cope...

Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
a possible cause, in addition to zero-sized writes themselves, of
course.

Also, writing to (but not past) the end of a block device returns 0
for "end of device", iirc.

                Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
  2013-10-09  1:38     ` Linus Torvalds
@ 2013-10-09  2:06       ` Al Viro
  2013-10-09  2:27         ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Al Viro @ 2013-10-09  2:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel, Linux Kernel Mailing List

On Tue, Oct 08, 2013 at 06:38:47PM -0700, Linus Torvalds wrote:
> On Tue, Oct 8, 2013 at 6:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Point, but I would argue that we should yell very loud if we get 0 from
> > vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
> > to return that, but a lot of userland code won't be expecting that and
> > won't be able to cope...
> 
> Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
> a possible cause, in addition to zero-sized writes themselves, of
> course.

Umm...  What it says is "If some data can be written without blocking the
thread, write() shall write what it can and return the number of bytes
written. Otherwise, it shall return -1 and set errno to EAGAIN."  For
sockets EWOULDBLOCK is also allowed as a possible errno value.  I hadn't
dug through the streams-related part, but we don't have that mess anyway.
 
> Also, writing to (but not past) the end of a block device returns 0
> for "end of device", iirc.

What do you mean?  If the starting position is below the end of device,
we get a non-zero length write, not exceeding the end.  If it's at
the end of device, we get -ENOSPC.  It's out of scope for POSIX, but
Linux is definitely acting that way...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
  2013-10-09  2:06       ` Al Viro
@ 2013-10-09  2:27         ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2013-10-09  2:27 UTC (permalink / raw)
  To: Al Viro; +Cc: Al Viro, linux-fsdevel, Linux Kernel Mailing List

On Tue, Oct 8, 2013 at 7:06 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Oct 08, 2013 at 06:38:47PM -0700, Linus Torvalds wrote:
>> On Tue, Oct 8, 2013 at 6:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> >
>> > Point, but I would argue that we should yell very loud if we get 0 from
>> > vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
>> > to return that, but a lot of userland code won't be expecting that and
>> > won't be able to cope...
>>
>> Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
>> a possible cause, in addition to zero-sized writes themselves, of
>> course.
>
> Umm...  What it says is "If some data can be written without blocking the
> thread, write() shall write what it can and return the number of bytes
> written. Otherwise, it shall return -1 and set errno to EAGAIN."

Look closer.

  ".. most historical implementations return zero (with the O_NDELAY
flag set, which is the historical predecessor of O_NONBLOCK .."

>> Also, writing to (but not past) the end of a block device returns 0
>> for "end of device", iirc.
>
> What do you mean?  If the starting position is below the end of device,
> we get a non-zero length write, not exceeding the end.  If it's at
> the end of device, we get -ENOSPC.  It's out of scope for POSIX, but
> Linux is definitely acting that way...

Hmm. I'm pretty sure I've seen zero returns for EOF somewhere..

         Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-10-09  2:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-09  0:15 [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly Al Viro
2013-10-09  0:52 ` Linus Torvalds
2013-10-09  1:18   ` Al Viro
2013-10-09  1:20     ` Al Viro
2013-10-09  1:38     ` Linus Torvalds
2013-10-09  2:06       ` Al Viro
2013-10-09  2:27         ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).