Running check and e2fsck simultaneously

linux-raid.vger.kernel.org archive mirror

* Running check and e2fsck simultaneously
@ 2013-11-10 16:06 Ivan Lezhnjov IV
  2013-11-10 18:08 ` Stan Hoeppner
  0 siblings, 1 reply; 17+ messages in thread
From: Ivan Lezhnjov IV @ 2013-11-10 16:06 UTC
  To: linux-raid@vger.kernel.org

Is it a good idea to run a check and e2fsck on a RAID1 array simultaneously? I'm leaning towards a definite no, but then again I don't really know.

Ivan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Running check and e2fsck simultaneously
From: Stan Hoeppner @ 2013-11-10 18:08 UTC
  To: Ivan Lezhnjov IV, linux-raid@vger.kernel.org

On 11/10/2013 10:06 AM, Ivan Lezhnjov IV wrote:
> Is it a good idea to run a check and e2fsck on a RAID1 array simultaneously? I'm leaning towards a definite no, but then again I don't really know.

A more critical question is what current circumstances have prompted you
to consider this?

-- 
Stan



* Re: Running check and e2fsck simultaneously
From: Ivan Lezhnjov IV @ 2013-11-10 18:12 UTC
  To: stan; +Cc: linux-raid@vger.kernel.org

Love for optimization :) I'm going to run a check via a cron job, and I thought, why not run e2fsck on the same day so that all the maintenance gets done at once? (In my configuration, a check of this 2TB RAID1 array takes almost 48 hours while the filesystem is mounted, but it can run in the background and the results can be examined later, whereas e2fsck obviously requires some attention.)
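For reference, a cron-driven check along the lines Ivan describes could call a small wrapper like the one below. It is a minimal sketch, assuming the standard md sysfs file `/sys/block/<dev>/md/sync_action` (the same file used with `echo check` later in this thread). The function name `md_start_check` and the `MD_SYSFS_ROOT` override are hypothetical; the override only exists so the logic can be tried against a fake directory.

```shell
# Hypothetical helper: start an md "check" (scrub) only if the array is idle.
# MD_SYSFS_ROOT is an illustrative override (default /sys), not an md feature.
md_start_check() {
    dev=$1                                             # e.g. md0
    action_file="${MD_SYSFS_ROOT:-/sys}/block/$dev/md/sync_action"

    if [ ! -w "$action_file" ]; then
        echo "md_start_check: cannot write $action_file" >&2
        return 2
    fi

    # sync_action reads "idle" when no resync, recovery or check is running.
    current=$(cat "$action_file")
    if [ "$current" != "idle" ]; then
        echo "md_start_check: $dev is busy ($current), not starting" >&2
        return 1
    fi

    echo check > "$action_file"
}
```

Run as root, e.g. `md_start_check md0`. Some distributions already ship a scheduled check (Debian's mdadm package includes a `checkarray` script run from cron), so it is worth looking for one before adding your own.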

On Nov 10, 2013, at 8:08 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:

> On 11/10/2013 10:06 AM, Ivan Lezhnjov IV wrote:
>> Is it a good idea to run a check and e2fsck on a RAID1 array simultaneously? I'm leaning towards a definite no, but then again I don't really know.
> 
> A more critical question is what current circumstances have prompted you
> to consider this?
> 
> -- 
> Stan
> 


* Re: Running check and e2fsck simultaneously
From: Stan Hoeppner @ 2013-11-10 19:17 UTC
  To: Ivan Lezhnjov IV; +Cc: linux-raid@vger.kernel.org

On 11/10/2013 12:12 PM, Ivan Lezhnjov IV wrote:
> Love for optimization :) I'm going to run a check via a cron job, and I thought, why not run e2fsck on the same day so that all the maintenance gets done at once? (In my configuration, a check of this 2TB RAID1 array takes almost 48 hours while the filesystem is mounted, but it can run in the background and the results can be examined later, whereas e2fsck obviously requires some attention.)

This is not optimization.  This is unnecessary duplication of sanity
checking of on-disk data structures.

No journaling filesystem requires scheduled "preemptive" metadata
structure checking, not EXT3/4, XFS, nor JFS.  If there is a problem
they will alert you in the logs before your scheduled check runs.  Then
you run a check/repair manually.  You mentioned e2fsck so I assume you
have EXT3 or 4.

Also, I see little/no value in running a scheduled mdadm check on a
RAID1 array.  Any problems with RAID1 will be due to one of the disks
beginning to fail in some mode, usually requiring sector relocation.
Most drives do this automatically until they run out of spare sectors,
at which point md will throw write errors.  Monitoring SMART data and/or
running SMART self analysis on a schedule is much more effective here,
as you will become aware of a problem sooner, and have the opportunity
to correct it before it shows up in md.
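Stan's SMART suggestion can be partly scripted. The attributes most directly tied to the sector relocation he describes are usually reported by `smartctl -A` as `Reallocated_Sector_Ct` (ID 5), `Current_Pending_Sector` (197) and `Offline_Uncorrectable` (198), with the raw count in the last column; exact names vary by vendor. A minimal sketch that reads that table on stdin (the function names are hypothetical):

```shell
# Hypothetical helpers: extract sector-relocation counters from `smartctl -A`
# output read on stdin. Data rows look like:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# so for these attributes the raw count is field 10.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}

# Succeeds (status 0) only if every counter found is zero
# (also succeeds, vacuously, if none of the attributes is present).
smart_sectors_clean() {
    smart_sector_counts | awk -F= '$2 + 0 > 0 { bad = 1 } END { exit bad }'
}
```

For example, `smartctl -A /dev/sdb | smart_sectors_clean || echo "sdb: check SMART"`; `smartctl -t long /dev/sdb` starts the extended self-test. Drives behind USB bridges often need `-d sat` before smartctl can see SMART data at all.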

> On Nov 10, 2013, at 8:08 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> 
>> On 11/10/2013 10:06 AM, Ivan Lezhnjov IV wrote:
>>> Is it a good idea to run a check and e2fsck on a RAID1 array simultaneously? I'm leaning towards a definite no, but then again I don't really know.
>>
>> A more critical question is what current circumstances have prompted you
>> to consider this?
>>
>> -- 
>> Stan


* Re: Running check and e2fsck simultaneously
From: Ivan Lezhnjov IV @ 2013-11-10 19:35 UTC
  To: stan; +Cc: linux-raid@vger.kernel.org

On Nov 10, 2013, at 9:17 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:

> On 11/10/2013 12:12 PM, Ivan Lezhnjov IV wrote:
>> Love for optimization :) I'm going to run a check via a cron job, and I thought, why not run e2fsck on the same day so that all the maintenance gets done at once? (In my configuration, a check of this 2TB RAID1 array takes almost 48 hours while the filesystem is mounted, but it can run in the background and the results can be examined later, whereas e2fsck obviously requires some attention.)
> 
> This is not optimization.  This is unnecessary duplication of sanity
> checking of on-disk data structures.
> 
> No journaling filesystem requires scheduled "preemptive" metadata
> structure checking, not EXT3/4, XFS, nor JFS.  If there is a problem
> they will alert you in the logs before your scheduled check runs.  Then
> you run a check/repair manually.  You mentioned e2fsck so I assume you
> have EXT3 or 4.
> 

While this is true, it may help to understand where I'm coming from with this idea. My array is connected over USB to a laptop. I have no intention of frequently disconnecting the drives, but I run a hybrid desktop/server Linux system: the laptop runs X, and I connect to it over VNC and so on, but it also runs some typical server services such as FTP, HTTP, Samba, NFS, etc. I send this computer to sleep every night and resume it the next morning, and pm-utils does not always work as expected. I think this adversely affects any externally connected storage device, which may sometimes go to sleep without being properly unmounted (in those rare cases when something goes wrong with the system; it is rare, but it does happen). So it is reasonable to run e2fsck from time to time to catch those not-so-obvious failures and correct any possible damage.

> Also, I see little/no value in running a scheduled mdadm check on a
> RAID1 array.  Any problems with RAID1 will be due to one of the disks
> beginning to fail in some mode, usually requiring sector relocation.
> Most drives do this automatically until they run out of spare sectors,
> at which point md will throw write errors.  Monitoring SMART data and/or
> running SMART self analysis on a schedule is much more effective here,
> as you will become aware of a problem sooner, and have the opportunity
> to correct it before it shows up in md.

Bear with me; I know very little about how RAID works, so I may sometimes make totally absurd statements. That being said, I intend to monitor SMART values, and now I'm wondering why it makes sense to run a check on other types of RAID. I assume mostly 5/6/10?

I'm also wondering whether it is advisable to run a check with the filesystem mounted and in use, or unmounted?

Ivan


* Re: Running check and e2fsck simultaneously
From: Stan Hoeppner @ 2013-11-10 20:12 UTC
  To: Ivan Lezhnjov IV; +Cc: linux-raid@vger.kernel.org

On 11/10/2013 1:35 PM, Ivan Lezhnjov IV wrote:
> 
> On Nov 10, 2013, at 9:17 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> 
>> On 11/10/2013 12:12 PM, Ivan Lezhnjov IV wrote:
>>> Love for optimization :) I'm going to run a check via a cron job, and I thought, why not run e2fsck on the same day so that all the maintenance gets done at once? (In my configuration, a check of this 2TB RAID1 array takes almost 48 hours while the filesystem is mounted, but it can run in the background and the results can be examined later, whereas e2fsck obviously requires some attention.)
>>
>> This is not optimization.  This is unnecessary duplication of sanity
>> checking of on-disk data structures.
>>
>> No journaling filesystem requires scheduled "preemptive" metadata
>> structure checking, not EXT3/4, XFS, nor JFS.  If there is a problem
>> they will alert you in the logs before your scheduled check runs.  Then
>> you run a check/repair manually.  You mentioned e2fsck so I assume you
>> have EXT3 or 4.
>>
> 
> While this is true, it may help to understand where I'm coming from with this idea. My array is connected over USB to a laptop. I have no intention of frequently disconnecting the drives, but I run a hybrid desktop/server Linux system: the laptop runs X, and I connect to it over VNC and so on, but it also runs some typical server services such as FTP, HTTP, Samba, NFS, etc. I send this computer to sleep every night and resume it the next morning, and pm-utils does not always work as expected. I think this adversely affects any externally connected storage device, which may sometimes go to sleep without being properly unmounted (in those rare cases when something goes wrong with the system; it is rare, but it does happen). So it is reasonable to run e2fsck from time to time to catch those not-so-obvious failures and correct any possible damage.

Now you know why I asked for context.  Your original post suggested you
were doing something very different, out of the ordinary.  And you most
certainly are.

USB is not a storage protocol.  USB devices often disconnect/reconnect
for no apparent reason.  We see this frequently with the little vendor
USB disk drives (Seagate/WD) and also generic disk enclosures.  USB is
not a proper protocol for md/RAID storage.  You may have continual
problems with this setup.

If the laptop has an eSATA port use eSATA.  If not, drop in an eSATA
PCMCIA card.  This should be much more reliable than USB for this
application.

>> Also, I see little/no value in running a scheduled mdadm check on a
>> RAID1 array.  Any problems with RAID1 will be due to one of the disks
>> beginning to fail in some mode, usually requiring sector relocation.
>> Most drives do this automatically until they run out of spare sectors,
>> at which point md will throw write errors.  Monitoring SMART data and/or
>> running SMART self analysis on a schedule is much more effective here,
>> as you will become aware of a problem sooner, and have the opportunity
>> to correct it before it shows up in md.
> 
> Bear with me; I know very little about how RAID works, so I may sometimes make totally absurd statements. That being said, I intend to monitor SMART values, and now I'm wondering why it makes sense to run a check on other types of RAID. I assume mostly 5/6/10?
>
> I'm also wondering whether it is advisable to run a check with the filesystem mounted and in use, or unmounted?

Instead of using a connection method known to cause problems with
storage, and then attempting to mitigate such damage with array/fs
checks after the fact, why not simply avoid the problem in the first
place?  Use eSATA, or build/buy a little NFS/Samba NAS filer.

-- 
Stan



* Re: Running check and e2fsck simultaneously
From: NeilBrown @ 2013-11-10 20:34 UTC
  To: stan; +Cc: Ivan Lezhnjov IV, linux-raid@vger.kernel.org


On Sun, 10 Nov 2013 13:17:21 -0600 Stan Hoeppner <stan@hardwarefreak.com>
wrote:

> Also, I see little/no value in running a scheduled mdadm check on a
> RAID1 array.  Any problems with RAID1 will be due to one of the disks
> beginning to fail in some mode, usually requiring sector relocation.

I think scrubbing has value on any RAID with redundancy.
The firmware can only relocate a sector if it reads it when it is marginal
but not yet completely lost.  If a sector is not read for a long time and
during that time the media degraded beyond recovery the firmware cannot do
anything.  But RAID1 can - it can get it from the other device.
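Once a scrub of the kind Neil describes has finished, its outcome can be read back from sysfs. A minimal sketch, assuming the md sysfs files `sync_action` and `mismatch_cnt` under `/sys/block/<dev>/md/`; `md_check_report` and the `MD_SYSFS_ROOT` test override are hypothetical names.

```shell
# Hypothetical helper: summarise a finished md check.
# MD_SYSFS_ROOT is an illustrative override (default /sys) for testing.
# Exit status: 0 = idle with no mismatches, 1 = not idle or mismatches found,
# 2 = sysfs files unreadable.
md_check_report() {
    dir="${MD_SYSFS_ROOT:-/sys}/block/$1/md"
    action=$(cat "$dir/sync_action") || return 2
    if [ "$action" != "idle" ]; then
        echo "$1: not idle ($action)"
        return 1
    fi
    # mismatch_cnt is reset when a scrub starts and counts sectors whose
    # mirrored copies read back successfully but differed. Unreadable sectors
    # found during the scrub are instead rewritten from the other mirror by
    # md's normal read-error handling and show up in the kernel log.
    mismatches=$(cat "$dir/mismatch_cnt") || return 2
    echo "$1: mismatch_cnt=$mismatches"
    [ "$mismatches" -eq 0 ]
}
```

Per the kernel's md documentation, mismatch_cnt is counted in md's IO units rather than single sectors, and on RAID1 a non-zero value does not necessarily mean corrupted data (a swap area on the array is a common benign cause).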

NeilBrown


* Re: Running check and e2fsck simultaneously
From: Stan Hoeppner @ 2013-11-10 22:36 UTC
  To: NeilBrown; +Cc: Ivan Lezhnjov IV, linux-raid@vger.kernel.org

On 11/10/2013 2:34 PM, NeilBrown wrote:
> On Sun, 10 Nov 2013 13:17:21 -0600 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
> 
>> Also, I see little/no value in running a scheduled mdadm check on a
>> RAID1 array.  Any problems with RAID1 will be due to one of the disks
>> beginning to fail in some mode, usually requiring sector relocation.
> 
> I think scrubbing has value on any RAID with redundancy.

That's a bit... redundant, Neil. :)

> The firmware can only relocate a sector if it reads it when it is marginal
> but not yet completely lost.  If a sector is not read for a long time and
> during that time the media degraded beyond recovery the firmware cannot do
> anything.  But RAID1 can - it can get it from the other device.

But is a scrub required for this?  Isn't this exactly what occurs during
normal operation with md/RAID1?  I.e. a read fails with disk error, so
we grab the sector from the mirror?  So what advantage is there to
scrubbing md/RAID1?

-- 
Stan


* Re: Running check and e2fsck simultaneously
From: NeilBrown @ 2013-11-10 22:51 UTC
  To: stan; +Cc: Ivan Lezhnjov IV, linux-raid@vger.kernel.org


On Sun, 10 Nov 2013 16:36:35 -0600 Stan Hoeppner <stan@hardwarefreak.com>
wrote:

> On 11/10/2013 2:34 PM, NeilBrown wrote:
> > On Sun, 10 Nov 2013 13:17:21 -0600 Stan Hoeppner <stan@hardwarefreak.com>
> > wrote:
> > 
> >> Also, I see little/no value in running a scheduled mdadm check on a
> >> RAID1 array.  Any problems with RAID1 will be due to one of the disks
> >> beginning to fail in some mode, usually requiring sector relocation.
> > 
> > I think scrubbing has value on any RAID with redundancy.
> 
> That's a bit... redundant, Neil. :)

RAID0.  Sadly a name that is used, even though it is an oxymoron.

> 
> > The firmware can only relocate a sector if it reads it when it is marginal
> > but not yet completely lost.  If a sector is not read for a long time and
> > during that time the media degraded beyond recovery the firmware cannot do
> > anything.  But RAID1 can - it can get it from the other device.
> 
> But is a scrub required for this?  Isn't this exactly what occurs during
> normal operation with md/RAID1?  I.e. a read fails with disk error, so
> we grab the sector from the mirror?  So what advantage is there to
> scrubbing md/RAID1?
> 

If scrubbing finds and repairs a (rarely accessed) bad sector on one drive
before the other drive dies completely, that is a win.

NeilBrown


* Re: Running check and e2fsck simultaneously
From: Adam Goryachev @ 2013-11-10 22:54 UTC
  To: stan; +Cc: linux-raid@vger.kernel.org

On 11/11/13 09:36, Stan Hoeppner wrote:
> On 11/10/2013 2:34 PM, NeilBrown wrote:
>> The firmware can only relocate a sector if it reads it when it is marginal
>> but not yet completely lost.  If a sector is not read for a long time and
>> during that time the media degraded beyond recovery the firmware cannot do
>> anything.  But RAID1 can - it can get it from the other device.
> But is a scrub required for this?  Isn't this exactly what occurs during
> normal operation with md/RAID1?  I.e. a read fails with disk error, so
> we grab the sector from the mirror?  So what advantage is there to
> scrubbing md/RAID1?
Wouldn't a check of the raid cause each member to be read in full, 
therefore helping the disk to notice that the sector is marginal, and/or 
the RAID layer to notice that the sector is no longer readable and 
therefore read from the other member, and re-write the sector. Consider 
a sector that is very rarely accessed...

Or are you suggesting that a smart command issued to the underlying 
devices can solve both of those scenarios?

Regards,
Adam

-- 
Adam Goryachev Website Managers www.websitemanagers.com.au


* Re: Running check and e2fsck simultaneously
From: Ivan Lezhnjov IV @ 2013-11-10 23:08 UTC
  To: stan; +Cc: linux-raid@vger.kernel.org

On Nov 10, 2013, at 10:12 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:

> USB is not a storage protocol.  USB devices often disconnect/reconnect
> for no apparent reason.  We see this frequently with the little vendor
> USB disk drives (Seagate/WD) and also generic disk enclosures.  USB is
> not a proper protocol for md/RAID storage.  You may have continual
> problems with this setup.
> 
> If the laptop has an eSATA port use eSATA.  If not, drop in an eSATA
> PCMCIA card.  This should be much more reliable than USB for this
> application.

Actually, that's good advice. Now all I need to figure out is whether I can do this with the hardware I've got.

However, I feel compelled to say that my USB drives (I have had several: 4 to be precise, now 5) have been incredibly reliable throughout all these years. No connection problems whatsoever, no flakiness or flapping of any kind. Very solid and reliable for a home setup with a 7-year-old midrange laptop and three 7-year-old drives. I've been using them for all sorts of things, from backups to torrents to storing virtual machine disk images. Very reliable. My only concern is that performance sometimes isn't enough, but by and large that is not a problem for me, and I get by just fine.

Installing an eSATA PCMCIA card is actually a great idea, and I almost facepalmed when I realized I could probably have resolved the performance issues a long time ago; the solution was in front of me all this time. Then again, the problem was far from pressing, so I have been quite content with what I have most of the time.

> 
>>> Also, I see little/no value in running a scheduled mdadm check on a
>>> RAID1 array.  Any problems with RAID1 will be due to one of the disks
>>> beginning to fail in some mode, usually requiring sector relocation.
>>> Most drives do this automatically until they run out of spare sectors,
>>> at which point md will throw write errors.  Monitoring SMART data and/or
>>> running SMART self analysis on a schedule is much more effective here,
>>> as you will become aware of a problem sooner, and have the opportunity
>>> to correct it before it shows up in md.
>> 
>> Bear with me; I know very little about how RAID works, so I may sometimes make totally absurd statements. That being said, I intend to monitor SMART values, and now I'm wondering why it makes sense to run a check on other types of RAID. I assume mostly 5/6/10?
>>
>> I'm also wondering whether it is advisable to run a check with the filesystem mounted and in use, or unmounted?
> 
> Instead of using a connection method known to cause problems with
> storage, and then attempting to mitigate such damage with array/fs
> checks after the fact, why not simply avoid the problem in the first
> place?  Use eSATA, or build/buy a little NFS/Samba NAS filer.
> 

As I said, in my particular configuration it is a pretty solid connection. I have no experience with NAS filers, but I'm definitely looking into this option as well (I've already googled it and am reading a description).

What about filesystem state? Does it matter if a filesystem is mounted when check is run?

Ivan

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Running check and e2fsck simultaneously
From: Ivan Lezhnjov IV @ 2013-11-10 23:11 UTC
  To: NeilBrown; +Cc: stan, linux-raid@vger.kernel.org


On Nov 10, 2013, at 10:34 PM, NeilBrown <neilb@suse.de> wrote:

> On Sun, 10 Nov 2013 13:17:21 -0600 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
> 
>> Also, I see little/no value in running a scheduled mdadm check on a
>> RAID1 array.  Any problems with RAID1 will be due to one of the disks
>> beginning to fail in some mode, usually requiring sector relocation.
> 
> I think scrubbing has value on any RAID with redundancy.
> The firmware can only relocate a sector if it reads it when it is marginal
> but not yet completely lost.  If a sector is not read for a long time and
> during that time the media degraded beyond recovery the firmware cannot do
> anything.  But RAID1 can - it can get it from the other device.
> 
> NeilBrown

I think this is very relevant in my case. I typically offload bulk data to these drives and then only use some parts of it frequently. So it sounds like running check/scrubbing on a schedule (how often is reasonable? Every two weeks, perhaps?) is a good idea after all.
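If a roughly two-week interval is what you settle on (the figure is Ivan's suggestion, not a recommendation from the list), a weekly cron job can simply skip every other ISO week. A minimal sketch with hypothetical function names and an illustrative `MD_SYSFS_ROOT` override (default `/sys`) for testing; the optional second argument only exists so the week can be fixed in a test.

```shell
# Hypothetical helpers: scrub an md array on even ISO weeks only.
# Crontab entry (illustrative path; cron treats "%" specially, which is
# one reason to keep the date logic in a script rather than the crontab):
#   30 2 * * 0  root  /usr/local/sbin/biweekly-md-check md0

is_even_iso_week() {
    week=${1:-$(date +%V)}   # ISO week "01".."53"
    week=${week#0}           # drop a leading zero so "08" is not read as octal
    [ $((week % 2)) -eq 0 ]
}

biweekly_md_check() {
    # Years with 53 ISO weeks produce one three-week gap (53 and 01 are odd).
    is_even_iso_week "$2" || return 0
    echo check > "${MD_SYSFS_ROOT:-/sys}/block/$1/md/sync_action"
}
```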

Ivan


* Re: Running check and e2fsck simultaneously
From: Stan Hoeppner @ 2013-11-11  2:08 UTC
  To: Adam Goryachev; +Cc: linux-raid@vger.kernel.org

On 11/10/2013 4:54 PM, Adam Goryachev wrote:
> On 11/11/13 09:36, Stan Hoeppner wrote:
>> On 11/10/2013 2:34 PM, NeilBrown wrote:
>>> The firmware can only relocate a sector if it reads it when it is
>>> marginal
>>> but not yet completely lost.  If a sector is not read for a long time
>>> and
>>> during that time the media degraded beyond recovery the firmware
>>> cannot do
>>> anything.  But RAID1 can - it can get it from the other device.
>> But is a scrub required for this?  Isn't this exactly what occurs during
>> normal operation with md/RAID1?  I.e. a read fails with disk error, so
>> we grab the sector from the mirror?  So what advantage is there to
>> scrubbing md/RAID1?
> Wouldn't a check of the raid cause each member to be read in full,
> therefore helping the disk to notice that the sector is marginal, and/or
> the RAID layer to notice that the sector is no longer readable and
> therefore read from the other member, and re-write the sector. Consider
> a sector that is very rarely accessed...
> 
> Or are you suggesting that a smart command issued to the underlying
> devices can solve both of those scenarios?

No, what I suggest is that drive instrumentation will often alert one to
drive problems before you see a read error at the kernel.  Assuming this
is true then scrubbing isn't necessary.

What Neil describes is a case where a sector is written once and read
very infrequently, or possibly years after the write, i.e. long term
archiving.  In this case a scrub may discover a media defect which may
go unnoticed by the drive firmware or normal md array operation.

-- 
Stan


* Re: Running check and e2fsck simultaneously
From: Stan Hoeppner @ 2013-11-11  3:43 UTC
  To: Ivan Lezhnjov IV; +Cc: linux-raid@vger.kernel.org

On 11/10/2013 5:08 PM, Ivan Lezhnjov IV wrote:
> 
> On Nov 10, 2013, at 10:12 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> 
>> USB is not a storage protocol.  USB devices often disconnect/reconnect
>> for no apparent reason.  We see this frequently with the little vendor
>> USB disk drives (Seagate/WD) and also generic disk enclosures.  USB is
>> not a proper protocol for md/RAID storage.  You may have continual
>> problems with this setup.
>>
>> If the laptop has an eSATA port use eSATA.  If not, drop in an eSATA
>> PCMCIA card.  This should be much more reliable than USB for this
>> application.
> 
> Actually, that's good advice. Now all I need to figure out is whether I can do this with the hardware I've got.
>
> However, I feel compelled to say that my USB drives (I have had several: 4 to be precise, now 5) have been incredibly reliable throughout all these years. No connection problems whatsoever, no flakiness or flapping of any kind. Very solid and reliable for a home setup with a 7-year-old midrange laptop and three 7-year-old drives. I've been using them for all sorts of things, from backups to torrents to storing virtual machine disk images. Very reliable. My only concern is that performance sometimes isn't enough, but by and large that is not a problem for me, and I get by just fine.
>
> Installing an eSATA PCMCIA card is actually a great idea, and I almost facepalmed when I realized I could probably have resolved the performance issues a long time ago; the solution was in front of me all this time. Then again, the problem was far from pressing, so I have been quite content with what I have most of the time.
> 
>>
>>>> Also, I see little/no value in running a scheduled mdadm check on a
>>>> RAID1 array.  Any problems with RAID1 will be due to one of the disks
>>>> beginning to fail in some mode, usually requiring sector relocation.
>>>> Most drives do this automatically until they run out of spare sectors,
>>>> at which point md will throw write errors.  Monitoring SMART data and/or
>>>> running SMART self analysis on a schedule is much more effective here,
>>>> as you will become aware of a problem sooner, and have the opportunity
>>>> to correct it before it shows up in md.
>>>
>>> Bear with me; I know very little about how RAID works, so I may sometimes make totally absurd statements. That being said, I intend to monitor SMART values, and now I'm wondering why it makes sense to run a check on other types of RAID. I assume mostly 5/6/10?
>>>
>>> I'm also wondering whether it is advisable to run a check with the filesystem mounted and in use, or unmounted?
>>
>> Instead of using a connection method known to cause problems with
>> storage, and then attempting to mitigate such damage with array/fs
>> checks after the fact, why not simply avoid the problem in the first
>> place?  Use eSATA, or build/buy a little NFS/Samba NAS filer.
>>
> 
> As I said, in my particular configuration it is a pretty solid connection.

You've been lucky so far.  I hope your luck holds.

> I have no experience with NAS filers, but I'm definitely looking into this
> option as well (I've already googled it and am reading a description).

Network Attached Storage, a.k.a. a dedicated file server "appliance".
"NAS" is a catch-all term these days for a dedicated file-serving
device, whether NFS, Samba, Windows CIFS, etc.

I mentioned this option because it gives you a dedicated storage server,
and decouples the sleep mode of your laptop from your bulk storage.
There are DIY kits and full retail NAS boxen on the market that use very
little power.  Here you would configure the drives to spin down, but not
put the system to sleep, as the mainboard uses less than 10W anyway.
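For the drive spin-down Stan mentions, one common tool is `hdparm -S`, whose argument is an encoded timeout rather than a plain number of minutes. A minimal sketch that converts seconds into that encoding; the ranges follow the hdparm man page, the function name is hypothetical, and some drives, USB bridges and NAS firmwares ignore or override this setting.

```shell
# Hypothetical helper: seconds -> value for `hdparm -S`.
#   1..240   -> multiples of 5 s (5 s .. 20 min)
#   241..251 -> (n - 240) * 30 min (30 min .. 5.5 h)
# Durations outside these two encodings are rejected.
hdparm_standby_value() {
    secs=$1
    if [ "$secs" -ge 5 ] && [ "$secs" -le 1200 ] && [ $((secs % 5)) -eq 0 ]; then
        echo $((secs / 5))
    elif [ $((secs % 1800)) -eq 0 ] && [ "$secs" -ge 1800 ] && [ "$secs" -le 19800 ]; then
        echo $((240 + secs / 1800))
    else
        echo "hdparm_standby_value: $secs s is not representable" >&2
        return 1
    fi
}
```

For example, `hdparm -S "$(hdparm_standby_value 600)" /dev/sdb` would request a 10-minute timeout (value 120).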

> What about filesystem state? Does it matter if a filesystem is mounted when check is run?

I'm not an EXT user.  See "man e2fsck".

With XFS you can check while mounted using "xfs_repair -n [device]", but
no repairs will be performed, you simply get a report.  To repair a
damaged XFS filesystem use the same command sans "-n" on the unmounted
filesystem.  xfs_repair will abort if the filesystem is mounted, and
"-n" is not specified.  xfs_repair is not to be automated via
script/cron.  It is only to be run if/when errors are encountered,
usually after a crash, power loss, controller failure, etc.
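The rule that a modifying repair is only run on an unmounted filesystem (the same applies to e2fsck) can be enforced with a small guard. A minimal sketch, assuming the Linux mount table in `/proc/mounts`; `require_unmounted` and the `MOUNTS_FILE` override are hypothetical. It matches the device path literally, so a filesystem mounted via a different name for the same device (a `/dev/disk/by-uuid/...` link, for example) will not be detected.

```shell
# Hypothetical guard: fail if DEVICE appears as a source in the mount table.
# MOUNTS_FILE is an illustrative override (default /proc/mounts) for testing.
# Exit status: 0 = not mounted, 1 = mounted, 2 = mount table unreadable.
require_unmounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit found ? 1 : 0 }' \
        "${MOUNTS_FILE:-/proc/mounts}"
    case $? in
        0) return 0 ;;
        1) echo "$1 is mounted; unmount it before repairing" >&2; return 1 ;;
        *) echo "require_unmounted: cannot read mount table" >&2; return 2 ;;
    esac
}
```

Usage: `require_unmounted /dev/md0 && xfs_repair /dev/md0`.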

-- 
Stan

* Re: Running check and e2fsck simultaneously
From: Ivan Lezhnjov IV @ 2013-11-11  7:52 UTC
  To: stan; +Cc: linux-raid@vger.kernel.org

On Nov 11, 2013, at 5:43 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:

>> What about filesystem state? Does it matter if a filesystem is mounted when check is run?
> 
> I'm not an EXT user.  See "man e2fsck".
> 
> With XFS you can check while mounted using "xfs_repair -n [device]", but
> no repairs will be performed, you simply get a report.  To repair a
> damaged XFS filesystem use the same command sans "-n" on the unmounted
> filesystem.  xfs_repair will abort if the filesystem is mounted, and
> "-n" is not specified.  xfs_repair is not to be automated via
> script/cron.  It is only to be run if/when errors are encountered,
> usually after a crash, power loss, controller failure, etc.

Oh, I meant 'check' as in

echo check > /sys/block/$DEVICE/md/sync_action

That, and I also read on my distro's wiki that 'check' automatically picks up where it left off if the system is rebooted before 'check' completes. Apparently it does not work that way: I was only about 4% into 'check' yesterday evening, and after resuming from sleep, mdstat does not show 'check' running. So, does the distro's wiki contain erroneous information?
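Whether a check is actually running after a resume can also be confirmed from sysfs rather than by reading /proc/mdstat. A minimal sketch, assuming the md sysfs files `sync_action` and `sync_completed` (the latter reads `none` when nothing is running, otherwise `<done> / <total>` in 512-byte sectors); `md_check_progress` and the `MD_SYSFS_ROOT` test override are hypothetical. It only reports state; it does not answer whether an interrupted check should resume.

```shell
# Hypothetical helper: print the current sync action and its progress.
# MD_SYSFS_ROOT is an illustrative override (default /sys) for testing.
md_check_progress() {
    dir="${MD_SYSFS_ROOT:-/sys}/block/$1/md"
    action=$(cat "$dir/sync_action") || return 2
    completed=$(cat "$dir/sync_completed") || return 2
    case $completed in
        *" / "*) ;;                                       # "<done> / <total>"
        *) echo "$1: $action ($completed)"; return 0 ;;   # e.g. "none"
    esac
    set -- "$1" $completed        # $1=device $2=done $3="/" $4=total
    if [ "$4" -gt 0 ]; then
        echo "$1: $action $(( $2 * 100 / $4 ))%"
    else
        echo "$1: $action"
    fi
}
```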

Ivan


* Re: Running check and e2fsck simultaneously
  2013-11-11  7:52               ` Ivan Lezhnjov IV
@ 2013-11-11  8:09                 ` David Brown
  2013-11-11  8:29                   ` Ivan Lezhnjov IV
  0 siblings, 1 reply; 17+ messages in thread
From: David Brown @ 2013-11-11  8:09 UTC
  To: Ivan Lezhnjov IV, stan; +Cc: linux-raid@vger.kernel.org

On 11/11/13 08:52, Ivan Lezhnjov IV wrote:
> 
> On Nov 11, 2013, at 5:43 AM, Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
> 
>>> What about filesystem state? Does it matter if a filesystem is
>>> mounted when check is run?
>> 
>> I'm not an EXT user.  See "man e2fsck".
>> 
>> With XFS you can check while mounted using "xfs_repair -n
>> [device]", but no repairs will be performed, you simply get a
>> report.  To repair a damaged XFS filesystem use the same command
>> sans "-n" on the unmounted filesystem.  xfs_repair will abort if
>> the filesystem is mounted, and "-n" is not specified.  xfs_repair
>> is not to be automated via script/cron.  It is only to be run
>> if/when errors are encountered, usually after a crash, power loss,
>> controller failure, etc.
> 
> Oh, I meant 'check' as in
> 
> echo check > /sys/block/$DEVICE/md/sync_action

In general, you can happily do a raid check while the system is in use
(i.e., the filesystem is mounted).  Of course, if you are using the
system heavily then things will slow down, but it will all work fine.
Raid is designed for maximal uptime - you don't have to stop arrays or
umount filesystems for raid maintenance.
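If a check running alongside normal use makes the machine noticeably sluggish, md lets you cap the check/resync rate for a single array. This is a minimal Python sketch. It assumes the per-array `sync_speed_max` sysfs attribute described in the kernel's md documentation: values are in KiB/s, reading returns e.g. "200000 (system)", and writing `system` reverts to the global `/proc/sys/dev/raid/speed_limit_max`. Any specific number you pass is your own choice, not a recommendation.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_speed_max(md_dir: Path) -> Tuple[int, str]:
    """Parse sync_speed_max, which reads like "200000 (system)" or "20000 (local)"."""
    value, _, source = (md_dir / "sync_speed_max").read_text().strip().partition(" ")
    return int(value), source.strip("()")


def set_speed_max(md_dir: Path, kib_per_sec: Optional[int]) -> None:
    """Cap this array's check/resync rate in KiB/s; None restores the system-wide limit."""
    text = "system" if kib_per_sec is None else str(kib_per_sec)
    (md_dir / "sync_speed_max").write_text(text + "\n")
```

For example, `set_speed_max(Path("/sys/block/md0/md"), 20000)` before a working session and `set_speed_max(..., None)` afterwards. The array name `md0` is hypothetical.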

You can also run raid check simultaneously with e2fsck.

But you can't run e2fsck very well while the filesystem is mounted.  You
/can/ run it in read-only mode, and if nothing writes to the filesystem
(including touching access times) then it will work.  But in general you
will get false error reports, because structures are being written while
e2fsck is checking them.  (Other tricks include re-mounting as read-only,
and doing an lvm snapshot of the volume then fsck'ing that.)
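The LVM snapshot trick can be scripted. This is a minimal Python sketch, assuming the ext filesystem lives on an LVM logical volume. The VG/LV names, the `_fsck` snapshot suffix and the default 1G snapshot size are hypothetical; the snapshot must be large enough to absorb all writes to the origin while the check runs. A snapshot of a mounted filesystem is only crash-consistent, so an un-replayed journal may still produce some spurious reports under `-n`. The function defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def snapshot_fsck_commands(vg: str, lv: str, snap_size: str = "1G") -> List[List[str]]:
    """Build the commands to fsck an LVM snapshot of a (possibly mounted) ext volume."""
    snap = f"{lv}_fsck"  # hypothetical snapshot name
    return [
        # Point-in-time copy of the volume; later writes to the origin don't change it.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, f"{vg}/{lv}"],
        # -f forces a full check even if the fs looks clean; -n opens it read-only
        # and answers "no" to every repair prompt.
        ["e2fsck", "-f", "-n", f"/dev/{vg}/{snap}"],
        # Drop the snapshot without asking for confirmation.
        ["lvremove", "-f", f"{vg}/{snap}"],
    ]


def run_snapshot_fsck(vg: str, lv: str, dry_run: bool = True) -> int:
    """Return e2fsck's exit code (0 = clean); in dry-run mode just print the commands."""
    create, fsck, remove = snapshot_fsck_commands(vg, lv)
    if dry_run:
        for cmd in (create, fsck, remove):
            print(" ".join(cmd))
        return 0
    subprocess.run(create, check=True)
    try:
        # e2fsck exits non-zero when it finds problems, so keep its code rather than raising.
        return subprocess.run(fsck).returncode
    finally:
        subprocess.run(remove, check=True)  # always clean up the snapshot
```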

> 
> That, and I also read on my distro's wiki that 'check' will pick up
> where it was interrupted automatically in those cases when a system
> is rebooted before 'check' is complete. Well, apparently it does not
> work that way because I was only some 4% into 'check' yesterday
> evening, and after resume from sleep mdstat does not show that
> 'check' is running. So, does the distro's wiki contain erroneous
> information?
> 

I can't say anything about that other than general points about "sleep"
being a half-way house between rebooting and running.  It does not
surprise me that you get issues with things not restarting after a sleep
and wakeup.

As far as I understand your posts, you have a laptop that you use as a
normal machine and also as a server.  Presumably you are connected to
mains power rather than a battery.  Why not just let the thing run
overnight?  It's a laptop - it's low power already.  As long as things
like the display turn off (and perhaps the disks too, after some period
without use), the power used will be tiny.  So keep it turned on, or
turn it off - then you don't have any sleep problems.

I don't think you mentioned what distro you are using (I might have
missed it), which makes it difficult to guess what the "distro's wiki" says.

* Re: Running check and e2fsck simultaneously
  2013-11-11  8:09                 ` David Brown
@ 2013-11-11  8:29                   ` Ivan Lezhnjov IV
  0 siblings, 0 replies; 17+ messages in thread
From: Ivan Lezhnjov IV @ 2013-11-11  8:29 UTC
  To: David Brown; +Cc: stan, linux-raid@vger.kernel.org

On Nov 11, 2013, at 10:09 AM, David Brown <david.brown@hesbynett.no> wrote:

>> That, and I also read on my distro's wiki that 'check' will pick up
>> where it was interrupted automatically in those cases when a system
>> is rebooted before 'check' is complete. Well, apparently it does not
>> work that way because I was only some 4% into 'check' yesterday
>> evening, and after resume from sleep mdstat does not show that
>> 'check' is running. So, does the distro's wiki contain erroneous
>> information?
>> 
> 
> I can't say anything about that other than general points about "sleep"
> being a half-way house between rebooting and running.  It does not
> surprise me that you get issues with things not restarting after a sleep
> and wakeup.

I tried echoing 'check' again, but it started from the very beginning. So the progress of the earlier 'check' apparently got reset, or something like that.

> As far as I understand your posts, you have a laptop that you use as a
> normal machine and also as a server.  Presumably you are connected to
> mains power rather than a battery.  Why not just let the thing run
> overnight?  It's a laptop - it's low power already.  As long as things
> like the display turn off (and perhaps the disks too, after some period
> without use), the power used will be tiny.  So keep it turned on, or
> turn it off - then you don't have any sleep problems.
> 
> I don't think you mentioned what distro you are using (I might have
> missed it), which makes it difficult to guess what the "distro's wiki" says.

Yes, it's what I call a hybrid desktop/server laptop. It used to be my Linux workstation years ago, and it has slowly turned into more of a server-type host on my home LAN. I still run X and GNOME and connect via VNC to manage VirtualBox in a GUI and whatnot, but 99% of the time it is more like a server. I enjoy, and frankly need, the ability to sleep and resume this machine because it sits on my table less than 2 meters from my bed, and even when idle the fan noise drives me nuts. So keeping it running at night is absolutely not an option. There's also nowhere else to put it, so sleep mode has been hands down the best solution for me.

Powering it off completely is not an option either: this laptop has the maximum amount of RAM installed, and that somehow causes its BIOS to run a full memory test on every start, so booting takes about 1.5-2 minutes (maybe more; I never measured, but it kind of gets on your nerves when you need to reboot). Compare that to a ~30-second resume from sleep with everything restored to the state it was left in. It is damn convenient and totally worth a little work on pm-utils hooks to get things rolling smoothly. Well, at least totally justified for me, that is ;D

I run Arch Linux on this laptop, and the system has not been updated since mid-2012. That's deliberate; I like it that way lol. The Arch Linux wiki has this note:

> Note: If the system is rebooted after a partial scrub has been suspended, the scrub will start over.

https://wiki.archlinux.org/index.php/RAID

Now that I read it again, I think I was reading it wrong. They talk about suspending the scrub, which I bet is done manually, not by the system when the machine is rebooted or put to sleep. If that is the case, this can easily be handled with some BASH magic in a pm-utils sleep hook.
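The hook logic described here could look roughly like this. It is a minimal Python sketch under stated assumptions, not the Arch wiki's method. Before sleep, it records the checkpoint from `sync_completed` (in sectors) and stops the check with `idle`. After resume, it writes the saved position to `sync_min` and starts `check` again, so the scrub continues from there. `sync_completed` and `sync_min` come from the kernel's md documentation. The 8-sector (4 KiB) rounding is a conservative assumption, since some arrays may require chunk alignment. The state-file path is hypothetical. It is also worth verifying whether your kernel resets `sync_min` once the check completes; if it does not, write 0 back afterwards.

```python
from pathlib import Path
from typing import Optional


def suspend_check(md_dir: Path, state_file: Path) -> Optional[int]:
    """Before sleep: save how far a running check got (in sectors), then stop it."""
    if (md_dir / "sync_action").read_text().strip() != "check":
        return None  # no check running, nothing to save
    fields = (md_dir / "sync_completed").read_text().split("/")
    done = int(fields[0]) if len(fields) == 2 else 0
    (md_dir / "sync_action").write_text("idle\n")  # stop the check
    state_file.write_text(f"{done}\n")
    return done


def resume_check(md_dir: Path, state_file: Path) -> Optional[int]:
    """After resume: restart the check from the saved sector, if one was recorded."""
    if not state_file.exists():
        return None
    if (md_dir / "sync_action").read_text().strip() != "idle":
        return None  # something else is running; keep the saved position for later
    start = int(state_file.read_text()) & ~7  # round down to an 8-sector (4 KiB) boundary
    (md_dir / "sync_min").write_text(f"{start}\n")  # where the next check begins
    (md_dir / "sync_action").write_text("check\n")
    state_file.unlink()
    return start


def pm_hook(event: str, md_dir: Path, state_file: Path) -> Optional[int]:
    """Dispatch on the first argument pm-utils passes to /etc/pm/sleep.d hooks."""
    if event in ("suspend", "hibernate"):
        return suspend_check(md_dir, state_file)
    if event in ("resume", "thaw"):
        return resume_check(md_dir, state_file)
    return None
```

A hook script would call something like `pm_hook(sys.argv[1], Path("/sys/block/md0/md"), Path("/var/lib/md0-check.pos"))`. Both paths are hypothetical.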

Ivan

end of thread (last message: 2013-11-11  8:29 UTC)

Thread overview: 17+ messages
2013-11-10 16:06 Running check and e2fsck simultaneously Ivan Lezhnjov IV
2013-11-10 18:08 ` Stan Hoeppner
2013-11-10 18:12   ` Ivan Lezhnjov IV
2013-11-10 19:17     ` Stan Hoeppner
2013-11-10 19:35       ` Ivan Lezhnjov IV
2013-11-10 20:12         ` Stan Hoeppner
2013-11-10 23:08           ` Ivan Lezhnjov IV
2013-11-11  3:43             ` Stan Hoeppner
2013-11-11  7:52               ` Ivan Lezhnjov IV
2013-11-11  8:09                 ` David Brown
2013-11-11  8:29                   ` Ivan Lezhnjov IV
2013-11-10 20:34       ` NeilBrown
2013-11-10 22:36         ` Stan Hoeppner
2013-11-10 22:51           ` NeilBrown
2013-11-10 22:54           ` Adam Goryachev
2013-11-11  2:08             ` Stan Hoeppner
2013-11-10 23:11         ` Ivan Lezhnjov IV
