Re: RFC swap over raid1

From: Doug Ledford <dledford@redhat.com>
To: Roberto Spadim <roberto@spadim.com.br>
Cc: Stan Hoeppner <stan@hardwarefreak.com>,
	Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: RFC swap over raid1
Date: Fri, 02 Aug 2013 11:40:12 -0400
Message-ID: <51FBD2DC.3010900@redhat.com>
In-Reply-To: <CAH3kUhE2ys1nGetLy0p=9rj2BxajAvBdWACFXHMe4TP81zNh=w@mail.gmail.com>

On 08/02/2013 10:21 AM, Roberto Spadim wrote:
> hmm, wow, this is all very new information to me...
> 
> so today linux doesn't have a generic badblock remap?! is that right?
> for example... ext2,3,4, xfs, reiserfs, zfs, and other filesystems, they
> handle badblocks by themselves? right?

No, not really.  I think back in the day, badblock support mattered, but
it was really just limited to having a list of known bad sectors that
the filesystem would never use because once the sector failed, it was
toast.  But that was when the firmware on the disk controllers didn't
have a pool of spare sectors that were available to remap bad blocks.

Nowadays, drives automatically remap bad sectors into their own
internal spare sector pool.  The only time the OS sees a bad block is
when a sector went bad without the drive noticing in time, so the data
couldn't be read and remapped before it was lost.

In that case, you just rewrite something to the bad sector.  Generally
the drive firmware has remap-on-write-error enabled, and generally a
sector that fails to be read will also fail to be written, so the drive
will remap the bad sector to a spare as long as it has spares available.

It's for this reason that, with modern drives, a failed read is somewhat
acceptable, as it will likely be fixed simply by writing back to the same
sector.  But if that sector persists in being bad even after a write,
then you know that the drive's internal pool of spare sectors is fully
allocated, and so all future failures on the drive will be permanent
failures.  It's at that point that you need to replace the drive ASAP.
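
To make the rewrite-then-check idea concrete, here is a toy Python model.
It is only a sketch under simplifying assumptions, not real drive firmware
or kernel code: ToyDrive, fail_sector and rewrite_and_check are names made
up for illustration, and the drive is assumed to remap a bad sector on
write whenever it still has a spare.

```python
class ToyDrive:
    """Toy model of a drive with a firmware-managed spare sector pool."""

    def __init__(self, sectors, spares):
        self.data = {lba: b"\0" for lba in range(sectors)}
        self.bad = set()      # sectors whose physical location has failed
        self.spares = spares  # spare sectors not yet allocated

    def fail_sector(self, lba):
        """Simulate a sector going bad by surprise (its data is lost)."""
        self.bad.add(lba)

    def read(self, lba):
        if lba in self.bad:
            raise OSError(f"read error at sector {lba}")
        return self.data[lba]

    def write(self, lba, payload):
        if lba in self.bad:
            # Assumed remap-on-write-error: use a spare if one is left.
            if self.spares == 0:
                raise OSError(f"write error at sector {lba}: no spares left")
            self.spares -= 1
            self.bad.discard(lba)
        self.data[lba] = payload


def rewrite_and_check(drive, lba, payload):
    """Rewrite a failed sector; False means the drive should be replaced."""
    try:
        drive.write(lba, payload)
        return drive.read(lba) == payload
    except OSError:
        return False
```

With one spare, the first failed sector is repaired by the rewrite; the
second one stays bad, which is the "replace the drive" signal described
above.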

However, filesystems don't keep two copies of their data lying around
in order to rewrite bad sectors.  The md raid layer does (when using a
redundant level, of course).  Basically, badblock management by
filesystems has always just been to mark a sector as bad and work around
it (with a possibly corrupted file as a result).  Badblock management by
the raid subsystem is to try to get the drive to reallocate the sector
by rewriting the correct data to that sector.
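
As a rough illustration of that difference, the following Python sketch
shows a mirrored read that repairs a failed leg from the good copy.  It is
a toy model and not how md is actually implemented: Leg and raid1_read are
hypothetical names, and a write is simply assumed to make the sector
readable again (i.e. the drive remaps it).

```python
class Leg:
    """One mirror leg: sector data plus a set of currently unreadable sectors."""

    def __init__(self, data):
        self.data = dict(data)
        self.unreadable = set()

    def read(self, lba):
        if lba in self.unreadable:
            raise OSError(f"read error at sector {lba}")
        return self.data[lba]

    def write(self, lba, payload):
        # Assumption: the drive remaps on write, so the sector reads fine again.
        self.unreadable.discard(lba)
        self.data[lba] = payload


def raid1_read(legs, lba):
    """Return data from the first readable leg and rewrite the legs that failed."""
    failed = []
    for leg in legs:
        try:
            payload = leg.read(lba)
        except OSError:
            failed.append(leg)
            continue
        # A good copy exists, so push it back to every leg that failed.
        for bad_leg in failed:
            bad_leg.write(lba, payload)
        return payload
    raise OSError(f"sector {lba} unreadable on every leg")
```

A lone filesystem has no second leg to read from, which is why all it can
do is mark the sector bad and move on.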

> that's nice information, i never thought about a layer only for
> badblock reallocation. i was reading/writing on this linux-raid list
> when they started the badblock development, some time near the raid1
> write multithread work
> 
> today is the raid1 badblock handling embedded in the source? or is it
> easy to implement a new layer just for the badblock realloc logic?
> 
> about "mkswap -c", it just shows information, like you said. i'm a bit
> surprised that swap has no badblock handling, that information is new
> to me. i will read about other OSes (freebsd, reactos, etc) to check
> how they handle this
> 
> I'm now rethinking swap as a file in a filesystem, could this
> increase security, or is another solution better?

It used to be that a swapfile on a filesystem was slower than swap on
its own partition.  I think they cleared that up some time ago so that
the speed difference is mostly negligible.  But having it as a file on
the filesystem makes management of partitions easier, so that's
something in the swapfile's favor.

