public inbox for linux-kernel@vger.kernel.org
* manipulating sigmask from filesystems and drivers
@ 2002-07-31 11:52 David Howells
  2002-07-31 11:58 ` Alan Cox
  2002-08-01 19:09 ` Linus Torvalds
  0 siblings, 2 replies; 37+ messages in thread
From: David Howells @ 2002-07-31 11:52 UTC
  To: torvalds, alan; +Cc: linux-kernel, dhowells


Hi Linus, Alan,

Can you confirm that this is A Bad Thing(TM)? I've been poking around in the
OpenAFS filesystem driver, and it tries to achieve uninterruptible I/O waiting
by the following means:

	/* CV_WAIT and CV_TIMEDWAIT rely on the fact that the Linux kernel has
	 * a global lock. Thus we can safely drop our locks before calling the
	 * kernel sleep services.
	 */
	static inline int CV_WAIT(afs_kcondvar_t *cv, afs_kmutex_t *l)
	{
	    int isAFSGlocked = ISAFS_GLOCK(); 
	    sigset_t saved_set;
	#ifdef DECLARE_WAITQUEUE
	    DECLARE_WAITQUEUE(wait, current);
	#else
	    struct wait_queue wait = { current, NULL };
	#endif

	    add_wait_queue((wait_queue_head_t *)cv, &wait);
	    set_current_state(TASK_INTERRUPTIBLE);

	    if (isAFSGlocked) AFS_GUNLOCK();
	    MUTEX_EXIT(l);

	    spin_lock_irq(&current->sigmask_lock);
	    saved_set = current->blocked;
	    sigfillset(&current->blocked);
	    recalc_sigpending(current);
	    spin_unlock_irq(&current->sigmask_lock);

	    schedule();
	    remove_wait_queue(cv, &wait);

	    spin_lock_irq(&current->sigmask_lock);
	    current->blocked = saved_set;
	    recalc_sigpending(current);
	    spin_unlock_irq(&current->sigmask_lock);

	    if (isAFSGlocked) AFS_GLOCK();
	    MUTEX_ENTER(l);

	    return 0;
	}

The reason for them doing this is so that they can get the process to appear
in the "S" state and thus avoid increasing the load average.

What I'm concerned about is that they wait for an event to happen by blocking
all signals (by accessing the process's signal masks directly) and then
sitting in TASK_INTERRUPTIBLE (which _mostly_ works, but ptrace(PTRACE_KILL)
can interrupt).

Can you comment on whether a driver is allowed to block signals like this, and
whether they should be waiting in TASK_UNINTERRUPTIBLE?
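
For readers who want to experiment with the save/fill/restore pattern used in
CV_WAIT above, here is a rough user-space analogue. It is only a sketch under
stated assumptions, not kernel code: sigprocmask() stands in for the direct
manipulation of current->blocked, and the names run_with_signals_blocked,
signal_is_blocked, probe_mask and struct mask_probe are hypothetical. One
difference from the in-kernel version is worth noting: user space cannot block
SIGKILL or SIGSTOP, and on Linux they do not show up in the mask that is read
back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical record of what a callback observed about the signal mask. */
struct mask_probe {
    int sigusr1_blocked;
    int sigkill_blocked;
};

/* Returns 1 if signo is in the caller's current mask, 0 if not, -1 on error. */
int signal_is_blocked(int signo)
{
    sigset_t cur;

    /* With a NULL new set, sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, signo);
}

/* Example callback: record whether SIGUSR1 and SIGKILL are blocked right now. */
void probe_mask(void *arg)
{
    struct mask_probe *p = arg;

    p->sigusr1_blocked = signal_is_blocked(SIGUSR1);
    p->sigkill_blocked = signal_is_blocked(SIGKILL);
}

/* Hypothetical helper: run fn(arg) with every blockable signal masked, then
 * restore the mask that was in effect on entry. Returns 0 on success and -1
 * if the mask could not be changed or restored.
 */
int run_with_signals_blocked(void (*fn)(void *), void *arg)
{
    sigset_t all, saved;

    assert(fn != NULL);

    sigfillset(&all);
    if (sigprocmask(SIG_SETMASK, &all, &saved) != 0)
        return -1;

    fn(arg);    /* the "wait" would go here */

    return sigprocmask(SIG_SETMASK, &saved, NULL);
}
```

Calling run_with_signals_blocked(probe_mask, &p) fills in p while the mask is
in place, and the mask in effect before the call is back once it returns.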

Cheers,
David

[parent not found: <0C01A29FBAE24448A792F5C68F5EA47D2D3E2B@nasdaq.ms.ensim.com>]
* Re: manipulating sigmask from filesystems and drivers
@ 2002-08-02 18:24 Jesse Pollard
  0 siblings, 0 replies; 37+ messages in thread
From: Jesse Pollard @ 2002-08-02 18:24 UTC
  To: linux-kernel

Linus Torvalds <torvalds@transmeta.com>:
>On Fri, 2 Aug 2002, Jamie Lokier wrote:
>>
>> Linus Torvalds wrote:
>> > Sending somebody a SIGKILL (or any signal that kills the process) is
>> > different (in my opinion) from a signal that interrupts a system call in
>> > order to run a signal handler.
>>
>> So it's ok to have truncated log entries (or more realistically,
>> truncated simple database entries) if the logging program is killed?
>
>This is why I said
>
> "Which is what we want in generic_file_read() (and _probably_
>  generic_file_write() as well, but that's slightly more debatable)"
>
>The "slightly more debatable" comes exactly from the thing you mention.
>
>The thing is, "read()" on a file doesn't have any side effects outside the
>process that does it, so if you kill the process, doing a partial read is
>always ok (yeah, you can come up with thread examples etc where you can
>see the state, but I think those are so contrived as to not really merit
>much worry and certainly have no existing programs issues).
>
>With write(), you have to make a judgement call. Unlike read, a truncated
>write _is_ visible outside the killed process. But exactly like read()
>there _are_ system management reasons why you may really need to kill
>writers. So the debatable point comes from whether you want to consider a
>killing signal to be "exceptional enough" to warrant the partial write.
>
>I can see both sides. I personally think I'd prefer the "if I kill a
>process, I want it dead _now_" approach, but this definitely _is_ up for
>discussion (unlike the signal handler case).
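
To make the distinction in the quoted text concrete from the application side:
when a write() is interrupted so that a signal handler can run, the caller may
see EINTR or a short count and can simply carry on, whereas after a killing
signal there is nobody left to finish the job and the partial data stays
visible. A minimal user-space sketch of the "carry on" half follows; the name
write_all is hypothetical, and this is an illustration, not anything taken
from the kernel's generic_file_write().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes of buf have
 * been written to fd. Restarts after EINTR and continues after short writes.
 * Returns len on success, or -1 with errno set on any other error (some data
 * may already have been written in that case).
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* a handler ran; just retry */
            return -1;          /* real error; caller decides what to do */
        }
        done += (size_t)n;      /* short write: loop for the remainder */
    }
    return (ssize_t)done;
}
```

A process that is killed part-way through this loop leaves exactly the kind of
truncated record being argued about; no user-space retry logic can help there.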

There have been cases (and systems) in the past that have provided BOTH
interpretations:

1. current kill -9 action:

	terminates the process as soon as it returns, or is in the
	process of returning, to user mode. This is normal, and prevents
	most partial writes. This is applicable to things like database
	servers, log servers, and journaling processes.

2. Kill, and abort outstanding I/O.

	This causes partial log writes, corrupts databases (usually), and will
	cause any process to terminate.

When is #2 used:
	a. real time systems where the device handling MUST be terminated now.
	b. system shutdown for emergencies (this allows filesystems to
	   finish flushing, but user processes may be stuck writing to an
	   audio/parallel device... procedure is to use kill -15, wait a
	   second or two, kill -9, wait a second or two, KILL UNCONDITIONALLY,
	   and then shut down anyway).
	   Other uses:
		b1. fire, flood, power failure (act of god)
		b2. system overtemp (loss of AC cooling...)
		b3. disk drive failures (to stop writing to a drive, abort
		    DMA actions, controller failure detection - no need to
		    propagate errors to a raid...)
		b4. safety related aborts in time critical applications

Item b3 allows a system with some pretty catastrophic hardware
failures to actually do something and shutdown/clean up as much as possible
without just hanging - which will also introduce partial log writes...
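
The first two steps of the shutdown procedure in item b (kill -15, wait, then
kill -9) can be sketched from user space roughly as follows. This is only an
illustration under assumptions: the names spawn_child, reap_within and
terminate_child are hypothetical, the grace period is a parameter rather than
"a second or two", and the third step (KILL UNCONDITIONALLY, aborting
outstanding I/O) is not covered because there is no user-space call for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical test helper: fork a child that just sleeps. If ignore_term is
 * non-zero the child ignores SIGTERM; the disposition is set before fork()
 * so there is no window in which the child could still be killed by it.
 */
pid_t spawn_child(int ignore_term)
{
    void (*old)(int) = SIG_DFL;
    pid_t pid;

    if (ignore_term)
        old = signal(SIGTERM, SIG_IGN);
    pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    if (ignore_term)
        signal(SIGTERM, old);   /* parent gets its old disposition back */
    return pid;
}

/* Poll for up to grace_ms milliseconds. Returns 1 if the child was reaped
 * (status filled in), 0 if it is still running, -1 on error.
 */
static int reap_within(pid_t pid, int grace_ms, int *status)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int waited;

    for (waited = 0; ; waited += 10) {
        pid_t r = waitpid(pid, status, WNOHANG);

        if (r == pid)
            return 1;
        if (r < 0)
            return -1;
        if (waited >= grace_ms)
            return 0;
        nanosleep(&step, NULL);
    }
}

/* Send SIGTERM, wait up to grace_ms, then fall back to SIGKILL. Returns the
 * number of the signal that ended the child, 0 if it exited on its own, or
 * -1 on error.
 */
int terminate_child(pid_t pid, int grace_ms)
{
    int status = 0;
    int r;

    assert(pid > 0);    /* never signal a process group or "everyone" */

    if (kill(pid, SIGTERM) != 0)
        return -1;
    r = reap_within(pid, grace_ms, &status);
    if (r < 0)
        return -1;
    if (r == 0) {
        if (kill(pid, SIGKILL) != 0)
            return -1;
        if (waitpid(pid, &status, 0) != pid)
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A child stuck in an uninterruptible wait inside a driver would not be reaped
by the final waitpid() until that wait ends, which is the situation the
unconditional kill is meant to handle.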

I worked on one system that determined the main disk controller was failing,
and proceeded to request a power cycle on all disk drives attached to that
particular controller to attempt to clear the failure. All user processes
were killed, a detailed diagnostic was provided, then the system shut itself
off.

In realtime underwater survey systems we used such an abort to cancel
expensive operations that were already in progress (expensive if it
finished - setting off remote explosives via an external controller).

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.


end of thread, other threads:[~2002-10-17  8:26 UTC | newest]

Thread overview: 37+ messages
2002-07-31 11:52 manipulating sigmask from filesystems and drivers David Howells
2002-07-31 11:58 ` Alan Cox
2002-08-01 19:09 ` Linus Torvalds
2002-08-01 20:10   ` David Woodhouse
2002-08-01 20:21     ` Linus Torvalds
2002-08-01 20:47       ` Roman Zippel
2002-08-01 20:51         ` Linus Torvalds
2002-08-01 21:15           ` Roman Zippel
2002-08-01 21:42             ` Linus Torvalds
2002-08-01 22:29               ` David Woodhouse
2002-08-01 22:40                 ` Linus Torvalds
2002-08-01 22:50                   ` David Woodhouse
2002-08-02 15:59                   ` yodaiken
2002-08-01 22:35               ` Roman Zippel
2002-08-01 23:30                 ` Linus Torvalds
2002-08-02  0:31                   ` Olivier Galibert
2002-08-02  8:00                     ` Kai Henningsen
2002-08-02 10:02                   ` Roman Zippel
2002-08-02 12:38                     ` Ryan Anderson
2002-08-02 15:39                     ` Linus Torvalds
2002-08-02 16:00                       ` Benjamin LaHaise
2002-08-02 16:27                         ` Linus Torvalds
2002-08-02 17:13                           ` Jamie Lokier
2002-08-02 17:29                             ` Linus Torvalds
2002-08-02 17:57                               ` Trond Myklebust
2002-08-02 18:10                                 ` Linus Torvalds
2002-08-02 17:33                           ` Oliver Neukum
2002-08-03 18:27                             ` David Woodhouse
2002-10-17  8:32                           ` David Woodhouse
2002-08-02 19:27                       ` Roman Zippel
2002-08-02  7:31                 ` Giuliano Pochini
     [not found]   ` <mailman.1028232841.11555.linux-kernel2news@redhat.com>
2002-08-01 23:37     ` Pete Zaitcev
2002-08-01 23:46       ` David Woodhouse
     [not found] <0C01A29FBAE24448A792F5C68F5EA47D2D3E2B@nasdaq.ms.ensim.com>
2002-08-02 17:57 ` Paul Menage
2002-08-02 23:25   ` Ryan Anderson
2002-08-02 23:30     ` Paul Menage
  -- strict thread matches above, loose matches on Subject: below --
2002-08-02 18:24 Jesse Pollard
