* [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
@ 2005-01-06 18:31 Steve Dickson
2005-01-07 17:14 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Steve Dickson @ 2005-01-06 18:31 UTC (permalink / raw)
To: nfs
[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]
Here is a 2.4.28 patch that allows core dumps to be written on
filesystems mounted with the 'intr' option.
By setting the PF_DUMPCORE bit in current->flags (via
do_coredump()), I was able to signal the RPC and NFS code that
the current task is trying to dump a core. So the trick was to have
the RPC code (temporarily) ignore the fact that the task got signalled()
and let the write/commit path complete....
I do this by checking the PF_DUMPCORE bit in both __rpc_execute()
and nfs_wait_event(). One side effect is that the process dropping core
becomes uninterruptible.... I combat this by only allowing the process
three retries before giving up...
Finally, there are two places in the NFS code where I don't ignore
the signal and do fail the writing of the core: one is in
nfs_create_request() when there are no available pages, and the other is
in nfs3_rpc_wrapper() when the server returns EJUKEBOX. But I did place
printks so it will be clear as to why it failed....
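The flag test described above can be sketched in userspace roughly as
follows. This is only a model, not kernel code: struct toy_task,
toy_start_coredump() and toy_core_dumping() are hypothetical stand-ins for
current->flags, the line added to do_coredump(), and the
task_core_dumping() macro, and the bit value is illustrative.

```c
#include <stdbool.h>

/* Illustrative bit value; the real constant comes from the kernel headers. */
#define TOY_PF_DUMPCORE 0x00000200UL

/* Hypothetical stand-in for the part of the task struct that matters here. */
struct toy_task {
	unsigned long flags;
};

/* Models the line the patch adds to do_coredump(): mark the task as dumping. */
static void toy_start_coredump(struct toy_task *t)
{
	t->flags |= TOY_PF_DUMPCORE;
}

/* Models task_core_dumping(): true once the dump flag has been set. */
static bool toy_core_dumping(const struct toy_task *t)
{
	return (t->flags & TOY_PF_DUMPCORE) != 0;
}
```

Setting the bit leaves the other flag bits untouched, so any code path that
can see the task's flags can ask whether a dump is in progress.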
Comments... Suggestions... Crimes against humanity that I'm committing???
steved.
[-- Attachment #2: linux-2.4.28-nfs-dropcore.patch --]
[-- Type: text/x-patch, Size: 3642 bytes --]
--- linux-2.4.28/fs/nfs/pagelist.c.orig 2004-04-14 09:05:40.000000000 -0400
+++ linux-2.4.28/fs/nfs/pagelist.c 2005-01-06 12:27:21.053600000 -0500
@@ -94,8 +94,13 @@ nfs_create_request(struct rpc_cred *cred
*/
if (nfs_try_to_free_pages(server))
continue;
- if (signalled() && (server->flags & NFS_MOUNT_INTR))
+
+ if (signalled() && (server->flags & NFS_MOUNT_INTR)) {
+ if (task_core_dumping())
+ printk(KERN_WARNING \
+ "NFS: Core Dump Aborted due to lack of resources\n");
return ERR_PTR(-ERESTARTSYS);
+ }
yield();
}
--- linux-2.4.28/fs/nfs/nfs3proc.c.orig 2003-11-28 13:26:21.000000000 -0500
+++ linux-2.4.28/fs/nfs/nfs3proc.c 2005-01-06 12:27:21.063600000 -0500
@@ -31,6 +31,9 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt,
set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
res = -ERESTARTSYS;
+ if (signalled() && task_core_dumping())
+ printk(KERN_WARNING \
+ "NFS: Core Dump Aborted due to lack of resources\n");
} while (!signalled());
rpc_clnt_sigunmask(clnt, &oldset);
return res;
--- linux-2.4.28/fs/exec.c.orig 2004-02-18 08:36:31.000000000 -0500
+++ linux-2.4.28/fs/exec.c 2005-01-06 12:27:21.082600000 -0500
@@ -1125,6 +1125,7 @@ int do_coredump(long signr, struct pt_re
if (current->rlim[RLIMIT_CORE].rlim_cur < binfmt->min_coredump)
goto fail;
+ current->flags |= PF_DUMPCORE;
format_corename(corename, core_pattern, signr);
file = filp_open(corename, O_CREAT | 2 | O_NOFOLLOW, 0600);
if (IS_ERR(file))
--- linux-2.4.28/include/linux/sunrpc/clnt.h.orig 2002-11-28 18:53:15.000000000 -0500
+++ linux-2.4.28/include/linux/sunrpc/clnt.h 2005-01-06 12:35:12.701368000 -0500
@@ -108,6 +108,8 @@ struct rpc_procinfo {
#ifdef __KERNEL__
+#define task_core_dumping() (current->flags & PF_DUMPCORE)
+
struct rpc_clnt *rpc_create_client(struct rpc_xprt *xprt, char *servname,
struct rpc_program *info,
u32 version, int authflavor);
--- linux-2.4.28/include/linux/nfs_fs.h.orig 2004-04-14 09:05:40.000000000 -0400
+++ linux-2.4.28/include/linux/nfs_fs.h 2005-01-06 12:35:12.718369000 -0500
@@ -345,7 +345,7 @@ extern void * nfs_root_data(void);
#define nfs_wait_event(clnt, wq, condition) \
({ \
int __retval = 0; \
- if (clnt->cl_intr) { \
+ if (clnt->cl_intr && !task_core_dumping()) { \
sigset_t oldmask; \
rpc_clnt_sigmask(clnt, &oldmask); \
__retval = wait_event_interruptible(wq, condition); \
--- linux-2.4.28/net/sunrpc/sched.c.orig 2003-06-13 10:51:39.000000000 -0400
+++ linux-2.4.28/net/sunrpc/sched.c 2005-01-06 12:27:21.137600000 -0500
@@ -498,7 +498,7 @@ __rpc_atrun(struct rpc_task *task)
static int
__rpc_execute(struct rpc_task *task)
{
- int status = 0;
+ int status = 0, core_retry = 0;
dprintk("RPC: %4d rpc_execute flgs %x\n",
task->tk_pid, task->tk_flags);
@@ -572,9 +572,20 @@ __rpc_execute(struct rpc_task *task)
* -ERESTARTSYS. In order to catch any callbacks that
* clean up after sleeping on some queue, we don't
* break the loop here, but go around once more.
- */
+ */
if (task->tk_client->cl_intr && signalled()) {
dprintk("RPC: %4d got signal\n", task->tk_pid);
+ /*
+ * If we are dropping a core and 'intr' is set,
+ * we want to make an attempt at writing
+ * out the core, but not get hung up in it.
+ */
+ if (task_core_dumping()) {
+ dprintk("RPC: %4d dropping core retries %d\n",
+ task->tk_pid, core_retry);
+ if (++core_retry < 3)
+ continue;
+ }
task->tk_flags |= RPC_TASK_KILLED;
rpc_exit(task, -ERESTARTSYS);
rpc_wake_up_task(task);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-06 18:31 [PATCH] -o intr mount option prevents core dumps on 2.4 kernel Steve Dickson
@ 2005-01-07 17:14 ` Trond Myklebust
2005-01-11 19:36 ` Steve Dickson
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2005-01-07 17:14 UTC (permalink / raw)
To: Steve Dickson; +Cc: nfs
On Thu, 06.01.2005 at 13:31 (-0500), Steve Dickson wrote:
> Comments... Suggestions... Crimes against humanity that I'm committing???
Firstly, how is "cl_intr" preventing core dumps? Normally, we should be
masking all signals except SIGKILL, SIGINT and SIGQUIT. I can see that
masking SIGINT and SIGQUIT might be useful here, but not SIGKILL...
Secondly: Why not do all this in rpc_clnt_sigmask()?
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-07 17:14 ` Trond Myklebust
@ 2005-01-11 19:36 ` Steve Dickson
2005-01-11 20:10 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Steve Dickson @ 2005-01-11 19:36 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Trond Myklebust wrote:
>On Thu, 06.01.2005 at 13:31 (-0500), Steve Dickson wrote:
>
>
>
>>Comments... Suggestions... Crimes against humanity that I'm committing???
>>
>>
>
>
>Firstly, how is "cl_intr" preventing core dumps? Normally, we should be
>masking all signals except SIGKILL, SIGINT and SIGQUIT. I can see that
>masking SIGINT and SIGQUIT might be useful here, but not SIGKILL...
>
>
Because __rpc_execute() notices that signalled() is true and aborts the
write by returning -ERESTARTSYS, which makes the core dumping code stop
dumping.
>Secondly: Why not do all this in rpc_clnt_sigmask()?
>
>
I looked at that.... but I really was not sure if I could hack it up
to turn off signalled(), plus I was not sure how I could turn
signalled() back on at the end of the write.... The way I did it
just seemed less risky and seems to work...
steved.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-11 19:36 ` Steve Dickson
@ 2005-01-11 20:10 ` Trond Myklebust
2005-01-11 20:58 ` Steve Dickson
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2005-01-11 20:10 UTC (permalink / raw)
To: Steve Dickson; +Cc: nfs
On Tue, 11.01.2005 at 14:36 (-0500), Steve Dickson wrote:
> >Firstly, how is "cl_intr" preventing core dumps? Normally, we should be
> >masking all signals except SIGKILL, SIGINT and SIGQUIT. I can see that
> >masking SIGINT and SIGQUIT might be useful here, but not SIGKILL...
> >
> >
> Because __rpc_execute() notices that signalled() is true and aborts the
> write by returning -ERESTARTSYS, which makes the core dumping code stop
> dumping.
Err... That's what the sigmask is for. signalled() only looks at those
signals that are enabled by the sigmask.
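A userspace analogy of this point, using only POSIX calls, might look like
the sketch below. It is not the kernel's signalled() logic: the helper
toy_blocked_signal_is_deferred() and the handler are hypothetical names
introduced here. The idea shown is that a signal sent while blocked stays
pending and is not acted upon until the mask lets it through.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t toy_delivered;

/* Records that the signal was actually delivered to the process. */
static void toy_handler(int signo)
{
	(void)signo;
	toy_delivered = 1;
}

/*
 * Returns true if signo stayed pending (and undelivered) while it was
 * blocked, and was delivered once the old mask was restored.
 */
static bool toy_blocked_signal_is_deferred(int signo)
{
	struct sigaction sa, old_sa;
	sigset_t block, old_mask, pending;
	bool pending_while_blocked, seen_while_blocked, seen_after_unblock;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = toy_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, &old_sa) < 0)
		return false;

	sigemptyset(&block);
	sigaddset(&block, signo);
	if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0) {
		sigaction(signo, &old_sa, NULL);
		return false;
	}

	/* Send the signal to ourselves while it is masked. */
	toy_delivered = 0;
	raise(signo);
	sigemptyset(&pending);
	sigpending(&pending);
	pending_while_blocked = (sigismember(&pending, signo) == 1);
	seen_while_blocked = (toy_delivered != 0);

	/* Restoring the old mask lets the pending signal through. */
	sigprocmask(SIG_SETMASK, &old_mask, NULL);
	seen_after_unblock = (toy_delivered != 0);

	sigaction(signo, &old_sa, NULL);
	return pending_while_blocked && !seen_while_blocked && seen_after_unblock;
}
```

Calling toy_blocked_signal_is_deferred(SIGUSR1) from a process that does not
already block SIGUSR1 is expected to return true.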
> >Secondly: Why not do all this in rpc_clnt_sigmask()?
> >
> >
> I looked at that.... but I really was not sure if I could hack it up
> to turn off signalled(), plus I was not sure how I could turn
> signalled() back on at the end of the write.... The way I did it
> just seemed less risky and seems to work...
See above.
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-11 20:10 ` Trond Myklebust
@ 2005-01-11 20:58 ` Steve Dickson
2005-01-11 21:59 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Steve Dickson @ 2005-01-11 20:58 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Trond Myklebust wrote:
>On Tue, 11.01.2005 at 14:36 (-0500), Steve Dickson wrote:
>
>
>
>>>Firstly, how is "cl_intr" preventing core dumps? Normally, we should be
>>>masking all signals except SIGKILL, SIGINT and SIGQUIT. I can see that
>>>masking SIGINT and SIGQUIT might be useful here, but not SIGKILL...
>>>
>>>
>>>
>>>
>>Because __rpc_execute() notices that signalled() is true and aborts the
>>write by returning -ERESTARTSYS, which makes the core dumping code stop
>>dumping.
>>
>>
>
>Err... That's what the sigmask is for. signalled() only looks at those
>signals that are enabled by the sigmask.
>
>
hmm... I'll admit kernel signal processing is not an area I've ventured
into much, so correct me if I'm wrong.... But if SIGKILL is masked off,
would that stop the process from getting the signal and in turn stop
the core dropping as well???
steved.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-11 20:58 ` Steve Dickson
@ 2005-01-11 21:59 ` Trond Myklebust
2005-01-12 18:18 ` Steve Dickson
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2005-01-11 21:59 UTC (permalink / raw)
To: Steve Dickson; +Cc: nfs
On Tue, 11.01.2005 at 15:58 (-0500), Steve Dickson wrote:
> hmm... I'll admit kernel signal processing is not an area I've ventured
> into much, so correct me if I'm wrong.... But if SIGKILL is masked off,
> would that stop the process from getting the signal and in turn stop
> the core dropping as well???
Yes, but I'm not sure that it makes sense to mask off SIGKILL as that
makes core dumping completely uninterruptible. People do expect "kill
-9" to always work when the "intr" flag is set.
You should, however, mask off all other signals, including SIGINT and
SIGQUIT. The latter two (+ sigkill) are the only ones that
rpc_clnt_sigmask() does not currently mask out when "intr" is set.
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-11 21:59 ` Trond Myklebust
@ 2005-01-12 18:18 ` Steve Dickson
2005-01-12 18:25 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Steve Dickson @ 2005-01-12 18:18 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
[-- Attachment #1: Type: text/plain, Size: 532 bytes --]
Trond Myklebust wrote:
>Yes, but I'm not sure that it makes sense to mask off SIGKILL as that
>makes core dumping completely uninterruptible. People do expect "kill
>-9" to always work when the "intr" flag is set.
>
>
True... and it will continue to work...
>You should, however, mask off all other signals, including SIGINT and
>SIGQUIT. The latter two (+ sigkill) are the only ones that
>rpc_clnt_sigmask() does not currently mask out when "intr" is set.
>
>
Good point... the attached patch does just that....
steved
[-- Attachment #2: linux-2.4.28-nfs-dropcore2.patch --]
[-- Type: text/x-patch, Size: 4119 bytes --]
--- linux-2.4.28/fs/nfs/pagelist.c.orig 2005-01-12 12:37:13.100184000 -0500
+++ linux-2.4.28/fs/nfs/pagelist.c 2005-01-12 12:40:43.770864000 -0500
@@ -94,8 +94,13 @@ nfs_create_request(struct rpc_cred *cred
*/
if (nfs_try_to_free_pages(server))
continue;
- if (signalled() && (server->flags & NFS_MOUNT_INTR))
+
+ if (signalled() && (server->flags & NFS_MOUNT_INTR)) {
+ if (task_core_dumping())
+ printk(KERN_WARNING \
+ "NFS: Core Dump Aborted due to lack of resources\n");
return ERR_PTR(-ERESTARTSYS);
+ }
yield();
}
--- linux-2.4.28/fs/nfs/nfs3proc.c.orig 2005-01-12 12:37:13.110184000 -0500
+++ linux-2.4.28/fs/nfs/nfs3proc.c 2005-01-12 12:40:43.786864000 -0500
@@ -31,6 +31,9 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt,
set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
res = -ERESTARTSYS;
+ if (signalled() && task_core_dumping())
+ printk(KERN_WARNING \
+ "NFS: Core Dump Aborted due to lack of resources\n");
} while (!signalled());
rpc_clnt_sigunmask(clnt, &oldset);
return res;
--- linux-2.4.28/fs/exec.c.orig 2005-01-12 12:37:13.120184000 -0500
+++ linux-2.4.28/fs/exec.c 2005-01-12 12:40:43.798864000 -0500
@@ -1125,6 +1125,7 @@ int do_coredump(long signr, struct pt_re
if (current->rlim[RLIMIT_CORE].rlim_cur < binfmt->min_coredump)
goto fail;
+ current->flags |= PF_DUMPCORE;
format_corename(corename, core_pattern, signr);
file = filp_open(corename, O_CREAT | 2 | O_NOFOLLOW, 0600);
if (IS_ERR(file))
--- linux-2.4.28/include/linux/sunrpc/clnt.h.orig 2005-01-12 12:37:13.129184000 -0500
+++ linux-2.4.28/include/linux/sunrpc/clnt.h 2005-01-12 12:40:43.804865000 -0500
@@ -108,6 +108,8 @@ struct rpc_procinfo {
#ifdef __KERNEL__
+#define task_core_dumping() (current->flags & PF_DUMPCORE)
+
struct rpc_clnt *rpc_create_client(struct rpc_xprt *xprt, char *servname,
struct rpc_program *info,
u32 version, int authflavor);
--- linux-2.4.28/include/linux/nfs_fs.h.orig 2005-01-12 12:37:13.137184000 -0500
+++ linux-2.4.28/include/linux/nfs_fs.h 2005-01-12 12:40:43.823864000 -0500
@@ -345,7 +345,7 @@ extern void * nfs_root_data(void);
#define nfs_wait_event(clnt, wq, condition) \
({ \
int __retval = 0; \
- if (clnt->cl_intr) { \
+ if (clnt->cl_intr && !task_core_dumping()) { \
sigset_t oldmask; \
rpc_clnt_sigmask(clnt, &oldmask); \
__retval = wait_event_interruptible(wq, condition); \
--- linux-2.4.28/net/sunrpc/sched.c.orig 2005-01-12 12:37:13.145184000 -0500
+++ linux-2.4.28/net/sunrpc/sched.c 2005-01-12 12:40:43.834864000 -0500
@@ -498,7 +498,7 @@ __rpc_atrun(struct rpc_task *task)
static int
__rpc_execute(struct rpc_task *task)
{
- int status = 0;
+ int status = 0, core_retry = 0;
dprintk("RPC: %4d rpc_execute flgs %x\n",
task->tk_pid, task->tk_flags);
@@ -572,9 +572,20 @@ __rpc_execute(struct rpc_task *task)
* -ERESTARTSYS. In order to catch any callbacks that
* clean up after sleeping on some queue, we don't
* break the loop here, but go around once more.
- */
+ */
if (task->tk_client->cl_intr && signalled()) {
dprintk("RPC: %4d got signal\n", task->tk_pid);
+ /*
+ * If we are dropping a core and 'intr' is set,
+ * we want to make an attempt at writing
+ * out the core, but not get hung up in it.
+ */
+ if (task_core_dumping()) {
+ dprintk("RPC: %4d dropping core retries %d\n",
+ task->tk_pid, core_retry);
+ if (++core_retry < 3)
+ continue;
+ }
task->tk_flags |= RPC_TASK_KILLED;
rpc_exit(task, -ERESTARTSYS);
rpc_wake_up_task(task);
--- linux-2.4.28/net/sunrpc/clnt.c.orig 2003-11-28 13:26:21.000000000 -0500
+++ linux-2.4.28/net/sunrpc/clnt.c 2005-01-12 12:41:25.647200000 -0500
@@ -209,7 +209,7 @@ void rpc_clnt_sigmask(struct rpc_clnt *c
unsigned long irqflags;
/* Turn off various signals */
- if (clnt->cl_intr) {
+ if (clnt->cl_intr && !task_core_dumping()) {
struct k_sigaction *action = current->sig->action;
if (action[SIGINT-1].sa.sa_handler == SIG_DFL)
sigallow |= sigmask(SIGINT);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-12 18:18 ` Steve Dickson
@ 2005-01-12 18:25 ` Trond Myklebust
2005-01-12 19:11 ` Steve Dickson
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2005-01-12 18:25 UTC (permalink / raw)
To: Steve Dickson; +Cc: nfs
[-- Attachment #1: Type: text/plain, Size: 442 bytes --]
On Wed, 12.01.2005 at 13:18 (-0500), Steve Dickson wrote:
> >You should, however, mask off all other signals, including SIGINT and
> >SIGQUIT. The latter two (+ sigkill) are the only ones that
> >rpc_clnt_sigmask() does not currently mask out when "intr" is set.
> >
> >
> Good point... the attached patch does just that....
I meant something more like the following.
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
[-- Attachment #2: fix_coredump.dif --]
[-- Type: text/plain, Size: 1121 bytes --]
fs/exec.c | 1 +
net/sunrpc/clnt.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.4.28-rc1/fs/exec.c
===================================================================
--- linux-2.4.28-rc1.orig/fs/exec.c
+++ linux-2.4.28-rc1/fs/exec.c
@@ -1125,6 +1125,7 @@ int do_coredump(long signr, struct pt_re
if (current->rlim[RLIMIT_CORE].rlim_cur < binfmt->min_coredump)
goto fail;
+ current->flags |= PF_DUMPCORE;
format_corename(corename, core_pattern, signr);
file = filp_open(corename, O_CREAT | 2 | O_NOFOLLOW, 0600);
if (IS_ERR(file))
Index: linux-2.4.28-rc1/net/sunrpc/clnt.c
===================================================================
--- linux-2.4.28-rc1.orig/net/sunrpc/clnt.c
+++ linux-2.4.28-rc1/net/sunrpc/clnt.c
@@ -209,7 +209,7 @@ void rpc_clnt_sigmask(struct rpc_clnt *c
unsigned long irqflags;
/* Turn off various signals */
- if (clnt->cl_intr) {
+ if (clnt->cl_intr && !(current->flags & PF_DUMPCORE)) {
struct k_sigaction *action = current->sig->action;
if (action[SIGINT-1].sa.sa_handler == SIG_DFL)
sigallow |= sigmask(SIGINT);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-12 18:25 ` Trond Myklebust
@ 2005-01-12 19:11 ` Steve Dickson
2005-01-13 0:38 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Steve Dickson @ 2005-01-12 19:11 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Trond Myklebust wrote:
>===================================================================
>--- linux-2.4.28-rc1.orig/net/sunrpc/clnt.c
>+++ linux-2.4.28-rc1/net/sunrpc/clnt.c
>@@ -209,7 +209,7 @@ void rpc_clnt_sigmask(struct rpc_clnt *c
> unsigned long irqflags;
>
> /* Turn off various signals */
>- if (clnt->cl_intr) {
>+ if (clnt->cl_intr && !(current->flags & PF_DUMPCORE)) {
> struct k_sigaction *action = current->sig->action;
> if (action[SIGINT-1].sa.sa_handler == SIG_DFL)
> sigallow |= sigmask(SIGINT);
>
>
Well did you try just this? ;-)
Here is what I found.... with just the PF_DUMPCORE check added to
rpc_clnt_sigmask(), no core was dropped because __rpc_execute() returned
-ERESTARTSYS due to signalled() == TRUE.
When I added back the PF_DUMPCORE check to __rpc_execute(), only the
header of the core was dropped because nfs_wait_event() returned
-ERESTARTSYS because signalled() == TRUE.
When I added back the PF_DUMPCORE check to nfs_wait_event(), the entire
core was dropped.
So it appears to me that you need both checks.....
steved
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-12 19:11 ` Steve Dickson
@ 2005-01-13 0:38 ` Trond Myklebust
2005-01-13 14:01 ` Steve Dickson
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2005-01-13 0:38 UTC (permalink / raw)
To: Steve Dickson; +Cc: nfs
On Wed, 12.01.2005 at 14:11 (-0500), Steve Dickson wrote:
> Here is what I found.... with just the PF_DUMPCORE check added to
> rpc_clnt_sigmask(), no core was dropped because __rpc_execute() returned
> -ERESTARTSYS due to signalled() == TRUE.
>
> When I added back the PF_DUMPCORE check to __rpc_execute(), only the
> header of the core was dropped because nfs_wait_event() returned
> -ERESTARTSYS because signalled() == TRUE.
>
> When I added back the PF_DUMPCORE check to nfs_wait_event(), the entire
> core was dropped.
>
> So it appears to me that you need both checks.....
No! Those extra checks are neither necessary, nor are they even correct!
I repeat what I said in my earlier mail:
- The change to nfs_wait_event() converts it into an
unconditional uninterruptible sleep. That means you can never
"kill -9" out of waiting for the core dump to finish in case the
server crashes!
- The loop you add to __rpc_execute() is pointless: The signal
that caused your process to wake up is neither cleared nor
masked, so when it later tries to block (such as when waiting
for a reply from the server), that process will just find itself
immediately woken up again by the same signal.
IOW: any mileage you are getting out of this change will be 100%
timing-dependent. Not being able to wait for replies from the
server, you are depending entirely on the server to reply so
quickly that it appears to be "instantaneous".
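The second point above can be illustrated with a toy model in plain C. The
names struct toy_sigstate, toy_signalled(), toy_wait_for_reply() and
toy_retry_wait() are hypothetical, and -EINTR stands in for the kernel's
-ERESTARTSYS. Under the assumption that an interruptible wait fails at once
whenever an unmasked signal is pending, retrying without masking or clearing
the signal cannot make progress:

```c
#include <errno.h>
#include <stdbool.h>

/* Bit n-1 represents signal number n in both masks. */
#define TOY_SIGBIT(n) (1UL << ((n) - 1))

struct toy_sigstate {
	unsigned long pending;	/* signals that have been sent */
	unsigned long blocked;	/* signals that are currently masked */
};

/* Models signalled(): only pending signals that are not masked count. */
static bool toy_signalled(const struct toy_sigstate *s)
{
	return (s->pending & ~s->blocked) != 0;
}

/* Models an interruptible wait for a server reply. */
static int toy_wait_for_reply(const struct toy_sigstate *s)
{
	return toy_signalled(s) ? -EINTR : 0;
}

/*
 * Retry the wait up to max_tries times without touching the signal
 * state. Returns the attempt number that succeeded, or -EINTR if every
 * attempt was interrupted.
 */
static int toy_retry_wait(const struct toy_sigstate *s, int max_tries)
{
	int tries;

	for (tries = 1; tries <= max_tries; tries++) {
		if (toy_wait_for_reply(s) == 0)
			return tries;
	}
	return -EINTR;
}
```

In this model every retry fails while the signal is pending and unmasked,
whereas masking it lets the very first wait succeed.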
...
As for myself, I'm getting full coredumps when I apply the 2 line patch
I sent you. The appended testcase works every time:
[trondmy@trondheim trondmy]$ uname -a
Linux trondheim.citi.umich.edu 2.4.29-rc2 #3 on jan 12 19:12:57 EST 2005 i686 athlon i386 GNU/Linux
[trondmy@trondheim trondmy]$ ./a.out
Quit (core dumped)
[trondmy@trondheim trondmy]$ ls -l core.1657
-rw------- 1 trondmy users 16953344 jan 12 19:36 core.1657
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
---------------------------------------
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
#define CORESIZE (1024*1024*16)
int main(void)
{
	void *ptr;
	struct rlimit rlimit = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY
	};
	/* Ensure we are set to create core dump files */
	if (setrlimit(RLIMIT_CORE, &rlimit) < 0) {
		perror("setrlimit failed!");
		exit(1);
	}
	/* Ensure coredump is large! */
	ptr = malloc(CORESIZE);
	if (ptr == NULL) {
		perror("malloc failed!");
		exit(1);
	}
	/* Touch every page so the memory really ends up in the core file */
	memset(ptr, '0', CORESIZE);
	/* raise hell */
	raise(SIGQUIT);
	return 0;
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-13 0:38 ` Trond Myklebust
@ 2005-01-13 14:01 ` Steve Dickson
2005-01-13 21:37 ` Steve Dickson
0 siblings, 1 reply; 14+ messages in thread
From: Steve Dickson @ 2005-01-13 14:01 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Trond Myklebust wrote:
>On Wed, 12.01.2005 at 14:11 (-0500), Steve Dickson wrote:
>
>
>
>>Here is what I found.... with just the PF_DUMPCORE check added to
>>rpc_clnt_sigmask(), no core was dropped because __rpc_execute() returned
>>-ERESTARTSYS due to signalled() == TRUE.
>>
>>When I added back the PF_DUMPCORE check to __rpc_execute(), only the
>>header of the core was dropped because nfs_wait_event() returned
>>-ERESTARTSYS because signalled() == TRUE.
>>
>>When I added back the PF_DUMPCORE check to nfs_wait_event(), the entire
>>core was dropped.
>>
>>So it appears to me that you need both checks.....
>>
>>
>
>No! Those extra checks are neither necessary, nor are they even correct!
>I repeat what I said in my earlier mail:
>
> - The change to nfs_wait_event() converts it into an
> unconditional uninterruptible sleep. That means you can never
> "kill -9" out of waiting for the core dump to finish in case the
> server crashes!
>
>
By adding the PF_DUMPCORE check to rpc_clnt_sigmask() (as you suggested)
I thought I had taken care of this problem... but now I realize the
check in nfs_wait_event() is the real issue... sorry for making you
swing that clue bat twice!! :)
> - The loop you add to __rpc_execute() is pointless: The signal
> that caused your process to wake up is neither cleared nor
> masked, so when it later tries to block (such as when waiting
> for a reply from the server), that process will just find itself
> immediately woken up again by the same signal.
>
>
I do understand this point.... and I *thought* I had addressed it by
making __rpc_execute() temporarily ignore signals and having
nfs_wait_event() call wait_event() instead of
wait_event_interruptible().....
>As for myself, I'm getting full coredumps when I apply the 2 line patch
>I sent you. The appended testcase works every time:
>
>
hmm.... something is amiss.... because I'm definitely not seeing this....
but I'll keep digging...
thanks!
steved.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-13 14:01 ` Steve Dickson
@ 2005-01-13 21:37 ` Steve Dickson
2005-01-14 2:57 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Steve Dickson @ 2005-01-13 21:37 UTC (permalink / raw)
To: nfs
Steve Dickson wrote:
>
>> As for myself, I'm getting full coredumps when I apply the 2 line patch
>> I sent you. The appended testcase works every time:
>>
>>
> hmm.... something is amiss.... because I'm definitely not seeing this....
> but I'll keep digging...
after further review.... it seems the reason your patch works on a
2.4.28 kernel is because signalled() never returns true, even when the
process is sent the SIGSEGV signal. Why signalled() never returns true,
I don't know.... but the reason your patch did not work on a RHEL3 kernel
is because signalled() does return true and calls to recalc_sigpending()
do not reset that pending signal.... which seems to be a bug in the
NPTL-related code.... code that is not in the 2.4.28 kernel....
steved.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-13 21:37 ` Steve Dickson
@ 2005-01-14 2:57 ` Trond Myklebust
2005-01-14 19:52 ` Steve Dickson
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2005-01-14 2:57 UTC (permalink / raw)
To: Steve Dickson; +Cc: nfs
On Thu, 13.01.2005 at 16:37 (-0500), Steve Dickson wrote:
> after further review.... it seems the reason your patch works on a
> 2.4.28 kernel is because signalled() never returns true, even when the
> process is sent the SIGSEGV signal. Why signalled() never returns true,
> I don't know.... but the reason your patch did not work on a RHEL3 kernel
> is because signalled() does return true and calls to recalc_sigpending()
> do not reset that pending signal.... which seems to be a bug in the
> NPTL-related code.... code that is not in the 2.4.28 kernel....
recalc_sigpending() failing to honour the sigmask sounds like a pretty
major bug! Is there no way you can backport some of the 2.6 signal code
or something like that in order to work around it?
It sounds to me as if that bug will affect more than just coredumps...
Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] -o intr mount option prevents core dumps on 2.4 kernel
2005-01-14 2:57 ` Trond Myklebust
@ 2005-01-14 19:52 ` Steve Dickson
0 siblings, 0 replies; 14+ messages in thread
From: Steve Dickson @ 2005-01-14 19:52 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Trond Myklebust wrote:
>On Thu, 13.01.2005 at 16:37 (-0500), Steve Dickson wrote:
>
>
>
>>after further review.... it seems the reason your patch works on a
>>2.4.28 kernel is because signalled() never returns true, even when the
>>process is sent the SIGSEGV signal. Why signalled() never returns true,
>>I don't know.... but the reason your patch did not work on a RHEL3 kernel
>>is because signalled() does return true and calls to recalc_sigpending()
>>do not reset that pending signal.... which seems to be a bug in the
>>NPTL-related code.... code that is not in the 2.4.28 kernel....
>>
>>
>
>recalc_sigpending() failing to honour the sigmask sounds like a pretty major bug!
>
>
Well, it turns out it was a bug in the kernel signaling code which was causing
this issue.... which means there are no changes needed to the NFS code,
which is always a good thing... :)
See bz 132162 if interested....
steved.
^ permalink raw reply [flat|nested] 14+ messages in thread
End of thread (newest message: 2005-01-14 19:52 UTC)
Thread overview: 14+ messages
2005-01-06 18:31 [PATCH] -o intr mount option prevents core dumps on 2.4 kernel Steve Dickson
2005-01-07 17:14 ` Trond Myklebust
2005-01-11 19:36 ` Steve Dickson
2005-01-11 20:10 ` Trond Myklebust
2005-01-11 20:58 ` Steve Dickson
2005-01-11 21:59 ` Trond Myklebust
2005-01-12 18:18 ` Steve Dickson
2005-01-12 18:25 ` Trond Myklebust
2005-01-12 19:11 ` Steve Dickson
2005-01-13 0:38 ` Trond Myklebust
2005-01-13 14:01 ` Steve Dickson
2005-01-13 21:37 ` Steve Dickson
2005-01-14 2:57 ` Trond Myklebust
2005-01-14 19:52 ` Steve Dickson