Re: [PoC 0/7] Kobjectify filesystem


From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: linux-fsdevel@vger.kernel.org, Goldwyn Rodrigues <rgoldwyn@suse.com>
Subject: Re: [PoC 0/7] Kobjectify filesystem
Date: Fri, 29 Apr 2016 17:09:44 -0500
Message-ID: <5723DBA8.8060802@suse.de>
In-Reply-To: <1461960974.2619.17.camel@slavad-ubuntu-14.04>



On 04/29/2016 03:16 PM, Viacheslav Dubeyko wrote:
> On Fri, 2016-04-29 at 13:28 -0500, Goldwyn Rodrigues wrote:
>>
>> On 04/29/2016 12:32 PM, Viacheslav Dubeyko wrote:
>>>
>>> You can register any attributes in sysfs. So, what do you ultimately
>>> suggest? What common scheme do you suggest all file systems use?
>>> Perhaps I didn't catch the idea. Did you invent sysfs itself? Could you
>>> describe your vision more clearly? The idea looks obscure right now,
>>> from my point of view.
>>>
>> Yes, you can register any attribute in sysfs, and most filesystems are
>> doing exactly that. They maintain the kobject in their <fs>_super_block
>> struct and use it to create /sys/fs/<fstype>/<id> entries. So what I
>> propose is this:
>>
>> 1. Move the kobject from the individual filesystems' super_block
>> structs into struct super_block and initialize it when the filesystem
>> mounts. Filesystems that prefer to manage it on their own could opt out
>> of this "feature" with explicit flags.
>>
>> 2. Add a kset to file_system_type to create the /sys/fs/<fstype> entry.
>> Again, most filesystems are doing this anyway.
>>
>
> OK. I see your point. Sounds reasonable to me.
>
>> 3. Provide super_block_attribute structures for filesystems that want
>> to export their attributes via sysfs. Individual filesystems would
>> write their own <name>_show() and <name>_store() functions to define
>> what they export and how. These are purely helper functions.
>>
>> 4. (Primary objective: improve the availability of filesystems) Use the
>> kobject in the super_block to generate filesystem uevents, which could
>> be used to communicate errors. The idea is to provide an
>> errors=continue option so that filesystems keep running when they hit
>> errors. The process encountering the error is still terminated with,
>> say, EIO, but an error code is also delivered through a uevent, which
>> can be picked up by udev scripts. Since the filesystem module is also
>> listed in the event, each filesystem should provide a utility to fix
>> the problem online on a best-effort basis, or to take backups, inform
>> the administrator, etc.
>> For a fix, the uevent would deliver the necessary information, say the
>> inode number or file path, which could be used to fix the problem. The
>> filesystem utility would use this to fix the error online. Once fixed,
>> the program which caused the error can be restarted. Another possible
>> use is reporting ENOSPC as an event (interrupt-based) instead of the
>> polling that applications rely on today.
>
> Are you sure that an error code or inode number will be enough to
> recover file system consistency? Usually, a file system has to be
> unmounted for recovery.

Perhaps not, but filesystems would have the information required to
focus on the error and hence the fix, and individual filesystems should
provide it in the uevent. This is an effort to improve availability in
case of errors and to avoid an unmount (i.e. downtime) for fsck, or at
least postpone it to a later time.

> Are you sure that the application will be
> able to survive during or after fsck utility activity?

No, the application will not survive. The application will get an EIO
or some other error. However, as soon as the online fsck has done its
patching (successfully), the application can be restarted immediately
and keep running until downtime can be scheduled.

Consider this approach as an EMS (Emergency Medical Service) as opposed
to a full hospital recovery. A paramedic will provide the best possible
first aid until the patient can be taken to a hospital for a full
recovery (offline fsck recovery). Depending on the kind of failure, the
paramedic may be able to handle everything required to get the patient
back on his feet, or may not be able to fix the problem at all.

> How do you
> imagine the whole workflow? Again, not every file system has an fsck
> utility and not every corruption can be fixed by fsck.

This would hopefully provide an ecosystem for filesystem developers to
come up with ideas and utilities which can fix the error as soon as
possible. This can/would be automated through udev scripts. Yes,
filesystems don't have such utilities yet, but I am hoping this effort
will lead to them. If the error cannot be solved by an offline fsck, I
doubt an online fsck would be able to solve it.


>
>> Currently, ocfs2 performs an online fix, albeit at a basic inode block
>> level. You write the faulty inode number into a sysfs file and it tries
>> to fix the inode block.
>> Please note, this is a trade-off between availability and
>> consistency/integrity in the system. While such a system would keep the
>> system alive and running during peak hours, a complete fsck may still be
>> required when the administrators are more at peace (off-peak hours).
>>
>
> So, what generalized functionality are you ready to provide for other
> file systems? And what generalized interface should file systems use?
> Could you share your preliminary vision?

The fs layer would provide a function, report_event(), for filesystems
to use. When a filesystem encounters an error, it reports it to
userspace via uevents/udev in the form of "VARIABLE=VALUE" arrays. A
udev script interprets this information and takes the necessary action,
which could be a user-defined script. Users are creative enough to do
what they want. It could be any of:
	- Mailing the administrator
	- Fixing the error through online checks
	- Something else the user wants to do.

I know xfs has been working on online filesystem checks for a while:
http://xfs.org/index.php/Reliable_Detection_and_Repair_of_Metadata_Corruption

-- 
Goldwyn


Thread overview: 17+ messages
2016-04-29  2:01 [PoC 0/7] Kobjectify filesystem Goldwyn Rodrigues
2016-04-29  2:01 ` [PoC 1/7] Add kset to file_system_type Goldwyn Rodrigues
2016-04-29  2:01 ` [PoC 2/7] Add kobject to super_block Goldwyn Rodrigues
2016-04-29  2:26   ` Al Viro
2016-04-29 19:09     ` Goldwyn Rodrigues
2016-04-29  2:01 ` [PoC 3/7] Create sysfs files under sb Goldwyn Rodrigues
2016-04-29  2:01 ` [PoC 4/7] Report file system events Goldwyn Rodrigues
2016-04-29  2:01 ` [PoC 5/7] ocfs2: Use the sb's kset Goldwyn Rodrigues
2016-04-29  2:01 ` [PoC 6/7] ocfs2: create filecheck files Goldwyn Rodrigues
2016-04-29  2:01 ` [PoC 7/7] ocfs2: report inode errors to userspace Goldwyn Rodrigues
2016-04-29 17:32 ` [PoC 0/7] Kobjectify filesystem Viacheslav Dubeyko
2016-04-29 18:28   ` Goldwyn Rodrigues
2016-04-29 20:16     ` Viacheslav Dubeyko
2016-04-29 22:09       ` Goldwyn Rodrigues [this message]
2016-04-29 22:31     ` Al Viro
2016-04-29 22:45       ` Goldwyn Rodrigues
2016-05-04 22:20       ` Dave Chinner
