From mboxrd@z Thu Jan 1 00:00:00 1970
From: Johannes Truschnigg
Subject: Re: Disk Monitoring
Date: Wed, 28 Jun 2017 12:45:46 +0200
Message-ID: <20170628104545.z7anvliygsjrhqax@vault.lan>
References:
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="v4xfsutwbhk3vzra"
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Gandalf Corvotempesta
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

--v4xfsutwbhk3vzra
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hi Gandalf,

On Wed, Jun 28, 2017 at 12:25:55PM +0200, Gandalf Corvotempesta wrote:
> Hi to all
> I always used hardware raid but with my next server I would like to use mdadm.
>
> Some questions:
>
> 1) all raid controllers have proactive monitoring features, like
> patrol read, consistency check and (more or less) some SMART
> integration.
> Any counterpart in mdadm?

The monitor mode of mdadm(8) (mdadm --monitor) is what you seek; mdmon(8),
despite its name, is only needed for arrays with external metadata. I can
also highly recommend monitoring the kernel ring buffer (dmesg).

> 2) thanks to these features, raid controllers are usually able to detect
> disk issues before they cause data-loss. what about mdadm ?
>
> How and when do you replace disks ? Based on which params? Do you
> always wait for a total failure before replacing the disk?
>
> Is mdadm able to notify some possible bad-things before they happen ?

md doesn't do low-level management of block devices/disks; that's the job
of other parts of the kernel. The block layer will report errors that you
may want to act upon before md itself complains and/or the disk gets kicked
from its array (which renders your array degraded, but otherwise
operational), but that's usually not necessary.

There's generally no need to replace a disk without any indication of
serious problems (like it getting booted from the array due to I/O
timeouts, for instance).
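To make "degraded" a bit more concrete, here is a minimal Python sketch that
lists arrays with missing members. It assumes the usual /proc/mdstat layout,
where each array's status line carries something like "[2/1] [U_]"
(configured devices / devices in use, with "_" for a missing member). The
function name and the SAMPLE text are made up for illustration, and this is
not a replacement for proper monitoring.

```python
import re

# Start of an array block in /proc/mdstat, e.g. "md0 : active raid1 ...".
_ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# Status fragment such as "[2/1] [U_]": configured/in-use counts plus flags.
_STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: flags} for arrays that are missing members.

    Hypothetical helper; it only looks at the first status line that
    follows each array header.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = _STATUS_RE.search(line)
        if current and status:
            configured, in_use, flags = status.groups()
            if int(in_use) < int(configured) or "_" in flags:
                degraded[current] = flags
            current = None
    return degraded


# Illustrative input only; on a real system read /proc/mdstat instead.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1048512 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdc1[0] sdd1[1](F)
      1048512 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""
```

Calling degraded_arrays(SAMPLE) flags md1 only; on a live machine you would
pass it the contents of /proc/mdstat.
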
> Many times in the past our raid controllers forced a bad sector
> reallocation during proactive tasks like patrol read. This saved me
> many times before. I've tried to not replace a disk when this
> reallocation was made (it was a test server) and after some weeks the
> disk failed totally.

You can trigger a scrub of the array data (md calls it a "check") through
the sync_action file in sysfs; see md(4) for details.

-- 
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.

--v4xfsutwbhk3vzra
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEGu9IhkI+7/aKLUWF95W3jMsYfLUFAllTiNkACgkQ95W3jMsY
fLX1JA//WceCgH5c6fckVHHyjvz9RGwdMAHZOObIN4vQvRQK8XkhcthTd3U8xTQe
N3WM9QrYH04HSNwDskzzaW/R8Rbs30QyPQVp5JcO9YxbikFBzeRuqygtDlfsmaNs
afE2ixke0/ONBf+o14M1Ky/i9TSXPgp9lM1n3uYY9B1dCjKErglAz3broeLhWEmP
kIim9fCg4LdFx2ntQSO8afQDh31g0zP60OWhykhD7ubZ7Jv2TZpB6/PAsh++0lIf
itrxfzhucunP42VNCMH3nkXiOwAN0h8TT7+NJkWEUZHbEdcuQ4olL0Vuo4dbMwGu
lNbu+l4qlWLlO/0y4cTe3OHB87WhbBH6jylKZDe34eBq4rI9MhwpPrsjWS5geCSb
jjLWvhHYaNWUbClHySv7IbTkpPsk18Jr8W9+Guj1ZzwPnle9/lOQ+il+vA2c4PrP
5eZiUMFe48KRCsKsOJLXOGIYFyGKtMe4HVkmzaD0w8MjNJ0oSn/ktwgW7o5MSxkE
Ng4fFhoxyfRw8pFhk0qM5PA8ic/cWBL0guAhlZkJ5+wrUBBTvX8BduW7zyQFvjNk
ZamLglXbh0dP0/kksCsoG/ln9hP9Pv1ZdAX2V4QNs6FTrgR8SD4R+WeLncnfHX09
0i0KwYShpoO+ISJpiRIkb0lYY6+lFX92lnPfgxeoK4QVielyZHw=
=X1MR
-----END PGP SIGNATURE-----

--v4xfsutwbhk3vzra--