From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Garzik
Subject: Re: [Patch] chunkd: add self-checking
Date: Tue, 02 Mar 2010 17:05:30 -0500
Message-ID: <4B8D8BAA.1010109@garzik.org>
References: <20100302113156.2f439ae3@redhat.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
        h=domainkey-signature:received:received:sender:message-id:date:from
         :user-agent:mime-version:to:cc:subject:references:in-reply-to
         :content-type:content-transfer-encoding;
        bh=omUgR/cFpQmU1sIQQnViJ9a8QnH8DG2ZHi9vaXTNET0=;
        b=VYk3gzUNfLF9/mktLtDnSJeHSdeh4envKK713pQaLOJfwl83v09KEE5999/2Am7u+b
         dD3eGTnVmB5PNGYRg9PDTqQrQk8BxiEelBeVb6Z8ppOlnuahP1WZJ7qggqN80tAPV1k5
         HsqnQ7xPcWeBH/9TSsVmnA47WGhxNsxtjLkOg=
In-Reply-To: <20100302113156.2f439ae3@redhat.com>
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Pete Zaitcev
Cc: Project Hail List

On 03/02/2010 01:31 PM, Pete Zaitcev wrote:
> This patch adds the self-check to Chunk. With it, the daemon can rescan
> all of its keys and drop those that fail to match their own checksums
> or throw an I/O error. Objects that are found faulty are made
> invisible to applications (back-end files are renamed, so that bad
> blocks are not reused). This is intended to work in concert with
> applications that store redundant copies of their data objects.
>
> This patch includes the part that tracks the active I/O, since the
> self-check is the only user of it, so it made little sense to separate
> the two. We have to track the I/O so that self-check does not mistakenly
> assume a partially stored object to be faulty and kill it.
>
> Running the self-check can adversely affect performance. As a crude
> way to limit the problem, we limit the load to one check thread only.
> Still, as anyone who had mlocate or Beagle running on their desktop
> knows, the biggest issue is not the additional I/O as such, but blowing
> away the page cache and dentries in the kernel. Also, our scheduling is
> between rudimentary and non-existent. We only provide a looping check
> with a randomized delay and an external control to start the check and
> verify that it is running.
>
> Therefore, to avoid surprises with sudden loss of objects and with
> performance anomalies, periodic self-check defaults to off.
>
> Still, self-check is an integral part of the daemon, so we include
> unit tests for both the I/O tracking and self-check itself.
>
> Signed-off-by: Pete Zaitcev
>
> ---
>  doc/setup.txt           |   10 +
>  include/Makefile.am     |    2
>  include/chunk-private.h |    4
>  include/chunk_msg.h     |   21 ++
>  include/chunkc.h        |    4
>  include/objcache.h      |   75 ++++++++
>  lib/chunkdc.c           |   82 +++++++++
>  server/Makefile.am      |    3
>  server/be-fs.c          |  130 ++++++++++++++
>  server/chunkd.h         |   28 +++
>  server/cldu.c           |    1
>  server/config.c         |   18 ++
>  server/objcache.c       |  138 +++++++++++++++
>  server/object.c         |    9 +
>  server/selfcheck.c      |  293 +++++++++++++++++++++++++++++++++
>  server/server.c         |  135 +++++++++++++++
>  test/.gitignore         |    2
>  test/Makefile.am        |    7
>  test/objcache-unit.c    |   64 +++++++
>  test/selfcheck-unit.c   |  334 ++++++++++++++++++++++++++++++++++++++
>  test/test.h             |    2
>  tools/chcli.c           |  116 +++++++++++--
>  22 files changed, 1454 insertions(+), 24 deletions(-)

applied... looks mostly OK. Things like self-check period still need to
be removed, and a few other minor quibbles. But let's go ahead and get
this in, then tackle those things.
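
For anyone reading the archive without the patch at hand, the core idea in the description (rescan every object, skip anything with I/O still in flight, and hide whatever fails its checksum) can be sketched roughly as below. This is not code from the patch. The names struct obj, toy_sum() and selfcheck_pass() are hypothetical, the objects live in memory rather than in back-end files, the "quarantined" flag merely stands in for the rename, and Adler-32 is used only as a convenient example checksum; the daemon's real hash may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one stored object. */
struct obj {
	const char *key;
	const unsigned char *data;
	size_t len;
	uint32_t stored_sum;	/* checksum recorded when the object was written */
	bool io_active;		/* true while a store is still in progress */
	bool quarantined;	/* stand-in for renaming the back-end file */
};

/* Adler-32, used here purely as an example checksum. */
uint32_t toy_sum(const unsigned char *p, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + p[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/*
 * One self-check pass over all objects.  Returns the number of objects
 * newly quarantined.  Objects with I/O in flight are skipped, so that a
 * partially stored object is never mistaken for a corrupt one.
 */
int selfcheck_pass(struct obj *objs, size_t n)
{
	int dropped = 0;

	assert(objs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct obj *o = &objs[i];

		if (o->io_active || o->quarantined)
			continue;
		if (toy_sum(o->data, o->len) != o->stored_sum) {
			o->quarantined = true;	/* hide it from applications */
			dropped++;
		}
	}
	return dropped;
}
```

A caller would presumably run selfcheck_pass() from a single checker thread, sleeping for a randomized interval between passes, which matches the "one check thread, looping with a randomized delay" arrangement in the description. The sketch leaves that loop out, along with any locking around io_active.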