Re: jffs garbage collection and signal handling (fwd)

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: jffs garbage collection and signal handling (fwd)
@ 2002-10-16 12:11 David Woodhouse
  0 siblings, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2002-10-16 12:11 UTC (permalink / raw)
  To: linux-fsdevel


Someone please tell me I don't need to do this -- at least not this way :)

-- 
dwmw2

---------- Forwarded message ----------
Date: Mon, 14 Oct 2002 10:32:45 +0100
From: David Woodhouse <dwmw2@infradead.org>
To: Peter 'p2' De Schrijver <p2@mind.be>
Cc: jffs-dev@axis.com
Subject: Re: jffs garbage collection and signal handling 


p2@mind.be said:
>  While debugging an application which heavily uses signals, I noticed
> that write calls sometimes returned -EINTR even though the signal
> handlers were registered with the SA_RESTART flag. 

	( s/-EINTR/-ERESTARTSYS/ patch omitted )

Hmmm. That's sane but still leaves you with the possibility of a short write
when a write is interrupted. Which is permitted by POSIX and all code should
deal with that, but there's a lot of broken code out there which doesn't --
so much so that we should really try to avoid breaking it, unfortunately.

I suspect I need to find a barf-bag and try to bring myself to commit 
something like the following:

Index: fs/jffs2/write.c
===================================================================
RCS file: /home/cvs/mtd/fs/jffs2/write.c,v
retrieving revision 1.60
diff -u -p -r1.60 write.c
--- fs/jffs2/write.c	9 Sep 2002 16:29:08 -0000	1.60
+++ fs/jffs2/write.c	14 Oct 2002 09:30:01 -0000
@@ -270,7 +270,9 @@ int jffs2_write_inode_range(struct jffs2
 {
 	int ret = 0;
 	uint32_t writtenlen = 0;
-
+#ifndef USER_SPACE_PROGRAMMERS_DONT_SUCK_ARSE_ANY_MORE
+	int need_recalc_current_sigpending = 0;
+#endif
        	D1(printk(KERN_DEBUG "jffs2_write_inode_range(): Ino #%u, ofs 0x%x, len 0x%x\n",
 		  f->inocache->ino, offset, writelen));
 		
@@ -286,6 +288,23 @@ int jffs2_write_inode_range(struct jffs2
 		ret = jffs2_reserve_space(c, sizeof(*ri) + JFFS2_MIN_DATA_LEN, &phys_ofs, &alloclen, ALLOC_NORMAL);
 		if (ret) {
 			D1(printk(KERN_DEBUG "jffs2_reserve_space returned %d\n", ret));
+			if (ret == -EINTR) {
+				if (!writtenlen) {
+					/* Nobody expects the Spanish Inquisition. */
+					ret = -ERESTARTNOINTR;
+#ifndef USER_SPACE_PROGRAMMERS_DONT_SUCK_ARSE_ANY_MORE
+				} else {
+					/* Lots of broken software out there which can't deal
+					   with partial writes. So we copy this vomit-inducing
+					   crap from fs/lockd/svc.c to attempt to work around it. */
+					D1(printk(KERN_DEBUG "jffs2_write_inode_range(): Ignoring signals, write already started\n"));
+					current->sigpending = 0;
+					need_recalc_current_sigpending = 1;
+					continue;
+#endif
+
+				}
+			}
 			break;
 		}
 		down(&f->sem);
@@ -363,6 +382,13 @@ int jffs2_write_inode_range(struct jffs2
 		writelen -= datalen;
 		buf += datalen;
 	}
+#ifndef USER_SPACE_PROGRAMMERS_DONT_SUCK_ARSE_ANY_MORE
+	if (need_recalc_current_sigpending) {
+		spin_lock_irq(&current->sigmask_lock);
+		recalc_sigpending(current);
+		spin_unlock_irq(&current->sigmask_lock);
+	}
+#endif
 	*retlen = writtenlen;
 	return ret;
 }


--
dwmw2



To unsubscribe from this list: send the line "unsubscribe jffs-dev" in
the body of a message to majordomo@axis.com



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: jffs garbage collection and signal handling (fwd)
@ 2002-10-16 18:45 Bryan Henderson
  2002-10-16 19:43 ` David Woodhouse
  0 siblings, 1 reply; 5+ messages in thread
From: Bryan Henderson @ 2002-10-16 18:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: linux-fsdevel

>Someone please tell me I don't need to do this -- at least not this way :)

There's not enough context in the email or the diff for me to know what the
question is, but let me take a stab:  You're concerned that a user won't
properly handle a short write, so the code change is intended to make
writes not interruptible.  If a signal is sent while a write() system call
is in progress, the write() system call completes for the entire requested
count.

(A short write is where you make a write system call and say to write 10
bytes, but the system call writes only 5 bytes and tells you it did so.
The response that is usually indicated is to do another write() system call
for the remaining 5 bytes.  If the system call doesn't write any bytes at
all, that's not a short write; it fails with ERESTARTSYS.  But that
generally indicates the same redrive response).

I'm skeptical about how prevalent it is not expect a short write from a
write() system call to a regular file.  The C library fwrite() would do the
additional system calls to finish the write, wouldn't it?  Probably not the
write() C library function, though?  Does anyone have any information on
the severity of this problem?

Assuming you want to make writes not interruptible, I think it would be
cleaner to block all signals while the write is happening than to push a
signal back once it occurs and retry.

>...still leaves you with the possibility of a short write
>when a write is interrupted. Which is permitted by POSIX and all code
should
>deal with that, but there's a lot of broken code out there which doesn't
--
>so much so that we should really try to avoid breaking it, unfortunately.

+                                    if (ret == -EINTR) {
+                                                if (!writtenlen) {
...
+                                                } else {
+                                                            /* Lots of
broken software out there which can't deal
+                                                               with
partial writes. So we copy this vomit-inducing
+                                                               crap from
fs/lockd/svc.c to attempt to work around it. */
+
D1(printk(KERN_DEBUG "jffs2_write_inode_range():
                                                              Ignoring
signals, write already started\n"));
+
current->sigpending = 0;
+
need_recalc_current_sigpending = 1;
+                                                            continue;
+

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: jffs garbage collection and signal handling (fwd)
  2002-10-16 18:45 Bryan Henderson
@ 2002-10-16 19:43 ` David Woodhouse
  0 siblings, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2002-10-16 19:43 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel

hbryan@us.ibm.com said:
> There's not enough context in the email or the diff for me to know
> what the question is, but let me take a stab:  

Yes there was -- you are of course correct:

> You're concerned that a user won't properly handle a short write, so 
> the code change is intended to make writes not interruptible.  If a 
> signal is sent while a write() system call is in progress, the write()
> system call completes for the entire requested count.

> (A short write is where you make a write system call and say to write
> 10 bytes, but the system call writes only 5 bytes and tells you it did
> so. The response that is usually indicated is to do another write()
> system call for the remaining 5 bytes.  If the system call doesn't
> write any bytes at all, that's not a short write; it fails with
> ERESTARTSYS.  But that generally indicates the same redrive response).

Hence ERESTARTNOINTR in the case where nothing's been written yet --
although I note now that that should be 'if nothing's been written yet in
this call to generic_file_write()', not '...in this call to commit_write()'
as demonstrated in the patch I included.

> I'm skeptical about how prevalent it is not expect a short write from
> a write() system call to a regular file. The C library fwrite() would
> do the additional system calls to finish the write, wouldn't it?
> Probably not the write() C library function, though?  Does anyone have
> any information on the severity of this problem? 

It's actually fairly common, apparently. My original inclination was to let
such broken userspace software remain broken -- but Linus has declared
otherwise and I'm not intending to revive that particular thread here. For
the purpose of this discussion, we shall assume that it is not permitted for
a file system to return short writes except on error or EOF.

> Assuming you want to make writes not interruptible, I think it would
> be cleaner to block all signals while the write is happening than to
> push a signal back once it occurs and retry. 

That's a useful suggestion, and would perhaps allow the signal to be queued
to a different thread of the group if appropriate. Unfortunately I believe
it doesn't actually work -- even if you mask all signals I think it's still
possible for sigpending to get set. Things like ptrace make it this little
more, erm, 'interesting' than one might otherwise expect.

--
dwmw2

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: jffs garbage collection and signal handling (fwd)
@ 2002-10-18 17:55 Bryan Henderson
  2002-10-18 18:57 ` David Woodhouse
  0 siblings, 1 reply; 5+ messages in thread
From: Bryan Henderson @ 2002-10-18 17:55 UTC (permalink / raw)
  To: David Woodhouse; +Cc: David Woodhouse, linux-fsdevel

>>...to block all signals while the write is happening...
>
>Unfortunately I believe
>it doesn't actually work -- even if you mask all signals I think it's
still
>possible for sigpending to get set. Things like ptrace make it this little
>more, erm, 'interesting' than one might otherwise expect.

I'd love to have more solid information this.  I just went through this
interruptible write exercise about a month ago and came to the conclusion
that blocking signals does work for processing that has to happen
atomically.  I have code based on that.  I don't remember considering
ptrace specifically, though.

I don't know what it means for "sigpending to get set" (I don't remember
any of this on a variable name basis), but I know that signals can be
pending or pending delivery, and that in order for a pending signal to
become pending delivery, it must be in a class that is not blocked.  And
pending delivery is the problem because it prevents interruptible sleeps
from sleeping.  I looked at Linux 2.4.

I know if a process is tracing signals, ignored signals can't be thrown out
as cavlierly as otherwise, but I can't think of any effect it would have on
blocking signals - you trace a signal as it's delivered, not when it
becomes pending, right?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: jffs garbage collection and signal handling (fwd)
  2002-10-18 17:55 jffs garbage collection and signal handling (fwd) Bryan Henderson
@ 2002-10-18 18:57 ` David Woodhouse
  0 siblings, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2002-10-18 18:57 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, dhowells

hbryan@us.ibm.com said:
> >>...to block all signals while the write is happening... 

> > Unfortunately I believe it doesn't actually work -- even if you mask
> > all signals I think it's still possible for sigpending to get set.
> > Things like ptrace make it this little more, erm, 'interesting' than
> > one might otherwise expect.

> I'd love to have more solid information this.  I just went through
> this interruptible write exercise about a month ago and came to the
> conclusion that blocking signals does work for processing that has to
> happen atomically.  I have code based on that.  I don't remember
> considering ptrace specifically, though.

I don't recall the details. I have a vague recollection of it being just 
another of the OpenAFS brokennesses, which means it was probably dhowells 
who was muttering nasty words about it. David?

>  I don't know what it means for "sigpending to get set" (I don't
> remember any of this on a variable name basis), but I know that
> signals can be pending or pending delivery, and that in order for a
> pending signal to become pending delivery, it must be in a class that
> is not blocked.  And pending delivery is the problem because it
> prevents interruptible sleeps from sleeping.  I looked at Linux 2.4.

I mean the variable current->sigpending, which if just a (recalculateable)
flag to indicate that there is a signal pending delivery for the current
process. My original patch simply cleared that flag in the rare case that an
operation was interrupted, and restarted the offending operation (which in
this case is perfectly restartable). Then if this had happened, the correct
value of the flag would be recalculated when the write had completed.

Although I'm not wonderfully impressed with that option, I think it's
slightly less messy and more reliable than changing the signal mask.

In the long term, I suspect that the 'interruptible_count' suggestion which 
I made on linux-kernel a day or so might prove to be a better answer. 
That's not going to happen before 2.7, if at all, though.

--
dwmw2

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2002-10-18 18:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-10-18 17:55 jffs garbage collection and signal handling (fwd) Bryan Henderson
2002-10-18 18:57 ` David Woodhouse
  -- strict thread matches above, loose matches on Subject: below --
2002-10-16 18:45 Bryan Henderson
2002-10-16 19:43 ` David Woodhouse
2002-10-16 12:11 David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).