* [lustre-devel] lnet_upcall on LBUG & LU-8418
From: Patrick Farrell @ 2016-08-30 15:16 UTC
  To: lustre-devel

Hello,

Currently, on LBUG, Lustre tries to call a usermode helper at 
'/usr/lib/lustre/lnet_upcall'.  This is for some sort of binary that a 
user would like executed before the LBUG itself (by default, a panic) 
happens.  Lustre does not include an lnet_upcall script, so by default, 
the call fails.

Unfortunately, in extremely low memory situations, the attempt to make 
this call can hang, resulting in a node which is in an invalid state but 
will not actually panic.  This is quite problematic as it can, for 
example, prevent failover or dump collection (for debugging purposes), 
depending on how a system is configured.
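
For readers unfamiliar with the mechanism, here is a minimal userspace sketch in Python of the ordering described above: attempt the helper first, then take the LBUG action. It is only a model, not Lustre's kernel code. The function names, the "LBUG" argument, and the `panic` callback are hypothetical; only the helper path comes from this message.

```python
import subprocess

# Helper path named in this message; everything else here is illustrative.
UPCALL_PATH = "/usr/lib/lustre/lnet_upcall"


def run_upcall(path=UPCALL_PATH, args=("LBUG",)):
    """Try to run the helper.

    Returns its exit status, or None if it could not be started
    (for example because no helper is installed at that path).
    """
    try:
        return subprocess.run([path, *args], check=False).returncode
    except OSError:
        # Missing or non-executable helper: the call simply fails.
        return None


def on_lbug(path=UPCALL_PATH, panic=lambda: "panic"):
    """Model of the ordering: upcall attempt first, then the LBUG action."""
    status = run_upcall(path)
    # The panic step is reached only after the upcall attempt returns,
    # so an attempt that never returns would block it.
    return status, panic()
```

With no helper installed, `run_upcall()` returns `None` and `on_lbug()` goes straight on to the panic step. The sketch does not reproduce the low-memory hang; it only shows why an upcall attempt that never returns would keep the panic from happening.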

LU-8418 <https://jira.hpdd.intel.com/browse/LU-8418> (from Alexander 
Zarochentsev) is looking to disable this by default.  As Andreas Dilger 
pointed out in the patch review 
(http://review.whamcloud.com/#/c/21440/), this would break any existing 
users who had put their script in that location.

But I suspect no one is actually using this feature.

So:
Do you use (or know of anyone using) the lnet_upcall feature to call a 
binary before LBUG?  (I'm looking for end user uses; if a developer is 
using it, I think it's reasonable to ask them to set it manually.)

- Patrick

* [lustre-devel] lnet_upcall on LBUG & LU-8418
From: Oleg Drokin @ 2016-09-01 17:37 UTC
  To: lustre-devel


On Aug 30, 2016, at 11:16 AM, Patrick Farrell wrote:

> Hello,
> 
> Currently, on LBUG, Lustre tries to call a usermode helper at '/usr/lib/lustre/lnet_upcall'.  This is for some sort of binary that a user would like executed before the LBUG itself (by default, a panic) happens.  Lustre does not include an lnet_upcall script, so by default, the call fails.

I think this is a throwback to prehistoric times, when BUG did not cause a panic
by default, meant to copy stuff off a node with no local storage
before it's killed.
Modern crash dumping has more or less superseded it.

> Unfortunately, in extremely low memory situations, the attempt to make this call can hang, resulting in a node which is in an invalid state but will not actually panic.  This is quite problematic as it can, for example, prevent failover or dump collection (for debugging purposes), depending on how a system is configured.
> 
> LU-8418 (from Alexander Zarochentsev) is looking to disable this by default.  As Andreas Dilger pointed out in the patch review (http://review.whamcloud.com/#/c/21440/), this would break any existing users who had put their script in that location.
> 
> But I suspect no one is actually using this feature.
> 
> So:
> Do you use (or know of anyone using) the lnet_upcall feature to call a binary before LBUG?  (I'm looking for end user uses; if a developer is using it, I think it's reasonable to ask them to set it manually.)

I think it's unused so it should be safe to kill it, but let's see if anybody shows up
indeed.

Bye,
    Oleg


* [lustre-devel] lnet_upcall on LBUG & LU-8418
From: James Simmons @ 2016-09-20 17:04 UTC
  To: lustre-devel

> On Aug 30, 2016, at 11:16 AM, Patrick Farrell wrote:
> 
> > Hello,
> > 
> > Currently, on LBUG, Lustre tries to call a usermode helper at '/usr/lib/lustre/lnet_upcall'.  This is for some sort of binary that a user would like executed before the LBUG itself (by default, a panic) happens.  Lustre does not include an lnet_upcall script, so by default, the call fails.
> 
> I think this is a throwback to prehistoric times, when BUG did not cause a panic
> by default, meant to copy stuff off a node with no local storage
> before it's killed.
> Modern crash dumping has more or less superseded it.

Sorry I didn't answer earlier. I agree with Oleg. Let's just remove it.
This is a case where the lack of documentation, which meant no one knew
about the feature, worked to our advantage.
 
> > Unfortunately, in extremely low memory situations, the attempt to make this call can hang, resulting in a node which is in an invalid state but will not actually panic.  This is quite problematic as it can, for example, prevent failover or dump collection (for debugging purposes), depending on how a system is configured.
> > 
> > LU-8418 (from Alexander Zarochentsev) is looking to disable this by default.  As Andreas Dilger pointed out in the patch review (http://review.whamcloud.com/#/c/21440/), this would break any existing users who had put their script in that location.
> > 
> > But I suspect no one is actually using this feature.
> > 
> > So:
> > Do you use (or know of anyone using) the lnet_upcall feature to call a binary before LBUG?  (I'm looking for end user uses; if a developer is using it, I think it's reasonable to ask them to set it manually.)
> 
> I think it's unused so it should be safe to kill it, but let's see if anybody shows up
> indeed.
> 
> Bye,
>     Oleg
