* [Lustre-devel] async write and abort_recov
@ 2010-07-12 10:10 Aurelien Degremont
  2010-07-12 20:16 ` Andreas Dilger
  0 siblings, 1 reply; 6+ messages in thread
From: Aurelien Degremont @ 2010-07-12 10:10 UTC (permalink / raw)
  To: lustre-devel

Hello

I'm wondering how the Lustre client handles recovery when an OST restarts with the abort_recov flag set.

Let's say a client has pages to flush to an OST, but the OST is stopped and then restarted with -o abort_recov.
There is no recovery, so either:
1- the client retakes its extent locks and then retries flushing its pages
or
2- the client can no longer flush, drops the I/O, and returns an error to the caller.


If #2, what if the process has already closed the file?
What if the file is still open and the process tries to do another I/O? Will it get an error for the earlier failed I/O?


Is abort_recov used only at first start, or does the OST keep using this flag until it is stopped, for any other
recovery-like mechanisms?


Thanks

-- 
Aurelien Degremont


* [Lustre-devel] async write and abort_recov
  2010-07-12 10:10 [Lustre-devel] async write and abort_recov Aurelien Degremont
@ 2010-07-12 20:16 ` Andreas Dilger
  2010-07-15  8:05   ` Aurelien Degremont
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2010-07-12 20:16 UTC (permalink / raw)
  To: lustre-devel

On 2010-07-12, at 04:10, Aurelien Degremont wrote:
> I'm wondering how the Lustre client handles recovery when an OST restarts with the abort_recov flag set.
> 
> Let's say a client has pages to flush to an OST, but the OST is stopped and then restarted with -o abort_recov.  There is no recovery, so either:
> 1- the client retakes its extent locks and then retries flushing its pages
> or
> 2- the client can no longer flush, drops the I/O, and returns an error to the caller.

When the client is evicted, it drops all of its locks for that OST, and any unwritten pages for those files are discarded.  While I know Lustre will save errors from async write RPCs into the file descriptor (for later write calls or fsync), I don't know if we save any I/O error into the file descriptor if we discard pages due to eviction.  I think only errors due to currently in-flight RPCs that are aborted due to client eviction are returned.

This is the same for "-o abort_recov" or if the client is evicted for other reasons (failed lock callbacks, or failed recovery even if abort_recovery is not used).

> If #2, what if the process has already closed the file?
> What if the file is still open and the process tries to do another I/O? Will it get an error for the earlier failed I/O?

If the file is not closed yet, then fsync or a later write will return an earlier error.  If the file descriptor is closed then there is no way to return that error.  That is true for local filesystems as well.
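
The point above (a saved error can only be returned while the descriptor is still open) suggests checking fsync() and close() as well as write(). The following is a minimal sketch of that pattern using only POSIX calls; the helper name write_durably is hypothetical and is not part of Lustre, and which errors a given Lustre version actually reports at fsync() or close() is exactly what this thread is discussing.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a Lustre API): write a buffer to a file and
 * return 0 only if write(), fsync() and close() all succeeded.
 * On failure, return the negated errno of the first failing call.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int rc = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -errno;

	/* write() may be partial, so loop until everything is accepted. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			rc = -errno;
			break;
		}
		p += n;
		len -= (size_t)n;
	}

	/*
	 * write() usually only copies into the page cache. fsync() is the
	 * last point where an error saved from an earlier asynchronous
	 * write can still be returned through this descriptor.
	 */
	if (rc == 0 && fsync(fd) < 0)
		rc = -errno;

	/* close() can fail too; once it returns, the descriptor is gone. */
	if (close(fd) < 0 && rc == 0)
		rc = -errno;

	return rc;
}
```

A caller that treats a nonzero return from write_durably() as "data may be lost" avoids the case where the file is closed before anything has had a chance to report the error.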

> Is abort_recov used only at first start, or does the OST keep using this flag until it is stopped, for any other recovery-like mechanisms?

The "abort_recov" mount option is equivalent to:

	lctl --device {ost dev} abort_recovery

It affects only the initial startup recovery and is ignored afterward.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.


* [Lustre-devel] async write and abort_recov
  2010-07-12 20:16 ` Andreas Dilger
@ 2010-07-15  8:05   ` Aurelien Degremont
  2010-07-15 16:19     ` Andreas Dilger
  0 siblings, 1 reply; 6+ messages in thread
From: Aurelien Degremont @ 2010-07-15  8:05 UTC (permalink / raw)
  To: lustre-devel

Andreas Dilger wrote:
> While I know Lustre will save errors from async write RPCs into the file descriptor
> (for later write calls or fsync), I don't know if we save any IO error into the file
> descriptor if we discard pages due to eviction.  I think only errors due to currently
> in-flight RPCs that are aborted due to client eviction are returned.

Sounds like a bug to me?
That means that if a process writes data on a client, the data goes to the page cache,
and not yet to the OST if there is no local memory pressure. If the client is evicted at that moment, those pages are dropped.
Then the client reconnects and the process writes other data. Those I/Os succeed, so the client never notices that some earlier I/O
failed?

Am I correct?

-- 
Aurelien Degremont
CEA


* [Lustre-devel] async write and abort_recov
  2010-07-15  8:05   ` Aurelien Degremont
@ 2010-07-15 16:19     ` Andreas Dilger
  2010-07-15 18:57       ` John Hammond
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2010-07-15 16:19 UTC (permalink / raw)
  To: lustre-devel

On 2010-07-15, at 02:05, Aurelien Degremont wrote:
> Andreas Dilger wrote:
>> While I know Lustre will save errors from async write RPCs into the file descriptor (for later write calls or fsync), I don't know if we save any IO error into the file descriptor if we discard pages due to eviction.  I think only errors due to currently in-flight RPCs that are aborted due to client eviction are returned.
> 
> Sounds like a bug to me?  That means that if a process writes data on a client, the data goes to the page cache, and not yet to the OST if there is no local memory pressure. If the client is evicted at that moment, those pages are dropped. Then the client reconnects and the process writes other data. Those I/Os succeed, so the client never notices that some earlier I/O failed?

I would agree.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.


* [Lustre-devel] async write and abort_recov
  2010-07-15 16:19     ` Andreas Dilger
@ 2010-07-15 18:57       ` John Hammond
  2010-07-15 19:25         ` Alexey Lyashkov
  0 siblings, 1 reply; 6+ messages in thread
From: John Hammond @ 2010-07-15 18:57 UTC (permalink / raw)
  To: lustre-devel

On 07/15/2010 11:19 AM, Andreas Dilger wrote:
> On 2010-07-15, at 02:05, Aurelien Degremont wrote:
>> Andreas Dilger wrote:
>>> While I know Lustre will save errors from async write RPCs into
>>> the file descriptor (for later write calls or fsync), I don't know
>>> if we save any IO error into the file descriptor if we discard
>>> pages due to eviction.  I think only errors due to currently
>>> in-flight RPCs that are aborted due to client eviction are
>>> returned.

If the async write fails due to eviction then writepage() will store 
-ESHUTDOWN in the inode info's lli_async_rc member.

>> Sounds like a bug to me?  That means that if a process writes data
>> on a client, the data goes to the page cache, and not yet to the OST
>> if there is no local memory pressure. If the client is evicted at
>> that moment, those pages are dropped. Then the client reconnects and
>> the process writes other data. Those I/Os succeed, so the client
>> never notices that some earlier I/O failed?

I filed a bug because the async errors weren't being reported, see 
https://bugzilla.lustre.org/show_bug.cgi?id=22360.  It looks like this 
is addressed in 1.8.4.  Thereafter they should be reported on the next 
call to close() for that inode; but note that the error need not go to 
the processes whose writes were lost.  Too bad!

-- 
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu
(512) 471-9304


* [Lustre-devel] async write and abort_recov
  2010-07-15 18:57       ` John Hammond
@ 2010-07-15 19:25         ` Alexey Lyashkov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexey Lyashkov @ 2010-07-15 19:25 UTC (permalink / raw)
  To: lustre-devel


On Jul 15, 2010, at 21:57, John Hammond wrote:

> On 07/15/2010 11:19 AM, Andreas Dilger wrote:
>> On 2010-07-15, at 02:05, Aurelien Degremont wrote:
>>> Andreas Dilger wrote:
>>>> While I know Lustre will save errors from async write RPCs into
>>>> the file descriptor (for later write calls or fsync), I don't know
>>>> if we save any IO error into the file descriptor if we discard
>>>> pages due to eviction.  I think only errors due to currently
>>>> in-flight RPCs that are aborted due to client eviction are
>>>> returned.
> 
> If the async write fails due to eviction then writepage() will store 
> -ESHUTDOWN in the inode info's lli_async_rc member.
I'm not so sure. Look at ll_ap_completion for the correct error reporting:

        } else {
                if (cmd & OBD_BRW_READ) {
                        llap->llap_defer_uptodate = 0;
                }
                SetPageError(page);
                if (rc == -ENOSPC)
                        set_bit(AS_ENOSPC, &page->mapping->flags);
                else
                        set_bit(AS_EIO, &page->mapping->flags);
        }

But that code path is never called if the client has dirty data for which async I/O has not started yet.
In that case, the client cancels the locks it owns with the local and discard flags set, so ll_page_removal_cb is called with the discard flag set and the error bit is not set in the mapping.
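
The difference between the two paths described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it is not Lustre or kernel code, every toy_* name is hypothetical, and the choice to map any non-ENOSPC failure to -EIO simply mirrors the excerpt above. It shows why a later check finds a latched error after a failed write completion but finds nothing after dirty pages are discarded.

```c
#include <errno.h>

/* Toy userspace model, NOT Lustre or kernel code; all names hypothetical. */
enum {
	TOY_AS_EIO    = 1u << 0,
	TOY_AS_ENOSPC = 1u << 1,
};

struct toy_mapping {
	unsigned int flags;	/* latched error bits */
	unsigned int dirty;	/* dirty pages whose write has not completed */
};

/* Models a write completion: a failure latches an error bit. */
void toy_write_completion(struct toy_mapping *m, int rc)
{
	if (m->dirty > 0)
		m->dirty--;
	if (rc == 0)
		return;
	m->flags |= (rc == -ENOSPC) ? TOY_AS_ENOSPC : TOY_AS_EIO;
}

/*
 * Models lock cancellation with the discard flag: dirty pages are
 * dropped and no error bit is latched. Returns how many pages were lost.
 */
unsigned int toy_discard_dirty(struct toy_mapping *m)
{
	unsigned int lost = m->dirty;

	m->dirty = 0;
	return lost;
}

/* Models a later fsync-style check: report and clear any latched error. */
int toy_check_and_clear(struct toy_mapping *m)
{
	int rc = 0;

	if (m->flags & TOY_AS_ENOSPC)
		rc = -ENOSPC;
	else if (m->flags & TOY_AS_EIO)
		rc = -EIO;
	m->flags = 0;
	return rc;
}
```

In this model, a mapping that goes through toy_write_completion() with a failure makes the next toy_check_and_clear() return an error once, while a mapping that only goes through toy_discard_dirty() loses its pages and the next check still returns 0.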



--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com

