[lustre-devel] Changelogs and RH

* [lustre-devel] Changelogs and RH
@ 2015-05-12 18:27 Nathan Rutman
  2015-05-13  7:54 ` DOREAU Henri
  0 siblings, 1 reply; 3+ messages in thread
From: Nathan Rutman @ 2015-05-12 18:27 UTC (permalink / raw)
  To: lustre-devel

Someone sent me a link to this:
http://arxiv.org/pdf/1505.02656v1.pdf
Very cool. We'll need to start using that.

This reminded me to send my changelog/robinhood/HSM concerns that I brought
up at LUG to you guys for your thoughts.

1. What should happen when the changelog on an MDS fills up? Maybe LCAP
helps with the processing rate, but fundamentally the issue might still
happen if nobody consumes due to various software or comms errors. We
should either stop recording records and risk losing change tracking, or
stop MDS processing. (I believe at the moment this will just crash the
MDS.) We probably need a high water mark.

2. There should be some kind of rate limiting for HSM requests (RH to MDS),
so that the number of HSM requests queued up in the coordinator doesn't
grow without bound.  Probably we need a -EAGAIN return code to RH at some
point.

3. It feels like there needs to be some feedback from the backend HSM
storage to RH, in particular to pass back a "backend full" message. We can
presumably pass a backend ENOSPC from the copytool back to the Coordinator,
but how can that message get back to Robinhood? I guess coordinator could
start returning ENOSPC for subsequent archive requests from RH, but then we
have to clear that response if the backend condition clears.
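
To make points 2 and 3 concrete, here is a minimal Python sketch of a
pending-request queue with a high water mark. It is not Lustre code:
CoordinatorQueue, submit, set_backend_full and the limit value are all
hypothetical names chosen for illustration. Only the -EAGAIN and ENOSPC
return codes come from the discussion above.

```python
import errno
from collections import deque


class CoordinatorQueue:
    """Hypothetical model of a coordinator's pending HSM request queue."""

    def __init__(self, high_water_mark):
        self.high_water_mark = high_water_mark  # assumed tunable
        self.pending = deque()
        self.backend_full = False

    def set_backend_full(self, full):
        # Would be driven by a copytool reporting (or clearing) ENOSPC.
        self.backend_full = full

    def submit(self, request):
        """Return 0 on success or a negative errno, kernel style."""
        if self.backend_full:
            return -errno.ENOSPC
        if len(self.pending) >= self.high_water_mark:
            return -errno.EAGAIN
        self.pending.append(request)
        return 0

    def complete_one(self):
        """Remove and return the oldest pending request, or None if empty."""
        return self.pending.popleft() if self.pending else None
```

In this sketch the caller (RH) sees -EAGAIN as "retry later" and -ENOSPC
as "stop archiving until the backend condition clears", which is the
distinction the two points are asking for.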

-- 
Nathan Rutman · Principal Systems Architect · Seagate Technology
+1 503 877-9507 · GMT-8

* [lustre-devel] Changelogs and RH
@ 2015-05-13  4:33 Ulka Vaze
  0 siblings, 0 replies; 3+ messages in thread
From: Ulka Vaze @ 2015-05-13  4:33 UTC (permalink / raw)
  To: lustre-devel

Hello Nathan,

I was going through your questions and wondering: would it be possible
to have an SNMP-trap-like mechanism in the MDS?

Every policy engine would register for traps (events) from the MDS.
Traps could signal conditions such as "changelog full" or "disk full".
When the MDS reaches its high water mark, it would send a trap to RH,
and RH would then buffer its requests until it receives the next trap.
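
The idea above could be sketched roughly as follows. This is only an
illustration in Python, not an existing Lustre or Robinhood interface:
EventSource, PolicyEngine and the event names "high_water" and
"low_water" are hypothetical.

```python
from collections import defaultdict


class EventSource:
    """Hypothetical stand-in for an MDS that emits trap-like events."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event):
        for handler in self._handlers[event]:
            handler(event)


class PolicyEngine:
    """Hypothetical consumer that buffers requests while the MDS is busy."""

    def __init__(self, source, send):
        self._send = send  # callable that forwards one request
        self._paused = False
        self._buffer = []
        source.register("high_water", self._on_event)
        source.register("low_water", self._on_event)

    def _on_event(self, event):
        self._paused = (event == "high_water")
        if not self._paused:
            # Flush everything held back while paused, in order.
            buffered, self._buffer = self._buffer, []
            for request in buffered:
                self._send(request)

    def request(self, item):
        if self._paused:
            self._buffer.append(item)
        else:
            self._send(item)
```

The buffer here is unbounded, so a real design would still need a limit
on the policy engine side.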

I am not aware of the architectural complexity or the amount of change
this would need, though. I am new to Lustre, so apologies if this has
already been discussed or does not make sense in this context. It is
just a thought I wanted to share to get your views.








-- 
Thanks & Regards,

Ulka Vaze
Principal System Software Engineer,

Clogeny Technologies

Ulka.vaze at clogeny.com

+91-989-032-3754

* [lustre-devel] Changelogs and RH
  2015-05-12 18:27 [lustre-devel] Changelogs and RH Nathan Rutman
@ 2015-05-13  7:54 ` DOREAU Henri
  0 siblings, 0 replies; 3+ messages in thread
From: DOREAU Henri @ 2015-05-13  7:54 UTC (permalink / raw)
  To: lustre-devel

On 12/05/2015 20:27, Nathan Rutman wrote:
> Someone sent me a link to this:
> http://arxiv.org/pdf/1505.02656v1.pdf
> Very cool. We'll need to start using that.
>
> This reminded me to send my changelog/robinhood/HSM concerns that I 
> brought up at LUG to you guys for your thoughts.
>
> 1. What should happen when the changelog on an MDS fills up? Maybe 
> LCAP helps with the processing rate, but fundamentally the issue might 
> still happen if nobody consumes due to various software or comms 
> errors. We should either stop recording records and risk losing change 
> tracking, or stop MDS processing. (I believe at the moment this will 
> just crash the MDS.) We probably need a high water mark.
>
> 2. There should be some kind of rate limiting for HSM requests (RH to 
> MDS), so that the number of HSM requests queued up in the coordinator 
> doesn't grow without bound. Probably we need a -EAGAIN return code to 
> RH at some point.
>
> 3. It feels like there needs to be some feedback from the backend HSM 
> storage to RH, in particular to pass back a "backend full" message. We 
> can presumably pass a backend ENOSPC from the copytool back to the 
> Coordinator, but how can that message get back to Robinhood? I guess 
> coordinator could start returning ENOSPC for subsequent archive 
> requests from RH, but then we have to clear that response if the 
> backend condition clears.

Hello Nathan,

1: When the changelog catalog is full (4B entries IIRC), Lustre should 
either automatically clear the catalog or turn the FS read-only 
(tunable, of course). I want to propose a patch for this but don't have 
it yet.

2: Right, there is no limitation at the moment. I think what is needed 
there is a high watermark on the number of pending requests rather than 
rate limiting. Note that on the robinhood side you can set limits on 
the number of active requests.

3: As you say, the copytools can propagate error messages back to the 
coordinator, indicating whether they are retryable or not. Non-retryable 
errors would cause the requests to fail. Lustre could then either emit a 
changelog record for failed requests (which is at the edge of what 
changelogs are for, though...) or we could add some mechanism to rbh to 
let it react when it detects that too many requests have failed. That 
said, a large number of failed requests is probably something that 
monitoring systems should detect and handle. Avoiding overly tight 
coupling between HSM components is desirable.
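
As a rough illustration of the tunable in point 1, the two behaviours
could look like the Python sketch below. This is not the patch mentioned
above and not Lustre code: ChangelogCatalog, FullPolicy and the tiny
capacity are hypothetical, chosen only to show the difference between
clearing the catalog and going read-only.

```python
from collections import deque
from enum import Enum


class FullPolicy(Enum):
    CLEAR = "clear"          # drop existing records to make room
    READ_ONLY = "read_only"  # refuse further changes


class ChangelogCatalog:
    """Hypothetical model of a bounded changelog catalog."""

    def __init__(self, capacity, policy):
        self.capacity = capacity
        self.policy = policy
        self.records = deque()
        self.read_only = False
        self.dropped = 0  # records lost to automatic clearing

    def record(self, entry):
        """Try to log a change; return True if the change may proceed."""
        if self.read_only:
            return False
        if len(self.records) >= self.capacity:
            if self.policy is FullPolicy.READ_ONLY:
                self.read_only = True
                return False
            # CLEAR policy: change tracking is lost, the FS keeps running.
            self.dropped += len(self.records)
            self.records.clear()
        self.records.append(entry)
        return True

    def consume(self, count):
        """Purge up to `count` oldest records, as a reader acknowledging them."""
        for _ in range(min(count, len(self.records))):
            self.records.popleft()
        if len(self.records) < self.capacity:
            self.read_only = False
```

With CLEAR the consumer has to notice the gap (here via the dropped
counter) and rescan; with READ_ONLY nothing is lost but writers are
blocked until a consumer catches up.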

Regards

-- 
Henri