From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ingo Molnar
Subject: Re: [PATCH 5/9] HWPoison: add memory_failure_queue()
Date: Tue, 24 May 2011 04:48:48 +0200
Message-ID: <20110524024848.GA25230@elte.hu>
References: <20110517092620.GI22093@elte.hu> <4DD31C78.6000209@intel.com>
 <20110520115614.GH14745@elte.hu> <20110522100021.GA28177@elte.hu>
 <20110522132515.GA13078@elte.hu> <4DD9C8B9.5070004@intel.com>
 <20110523110151.GD24674@elte.hu> <4DDB1396.7050205@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from mx3.mail.elte.hu ([157.181.1.138]:38270 "EHLO mx3.mail.elte.hu"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752516Ab1EXCwO
 (ORCPT ); Mon, 23 May 2011 22:52:14 -0400
Content-Disposition: inline
In-Reply-To: <4DDB1396.7050205@intel.com>
Sender: linux-acpi-owner@vger.kernel.org
List-Id: linux-acpi@vger.kernel.org
To: Huang Ying
Cc: huang ying, Len Brown, "linux-kernel@vger.kernel.org", Andi Kleen,
 "Luck, Tony", "linux-acpi@vger.kernel.org", Andi Kleen, "Wu, Fengguang",
 Andrew Morton, Linus Torvalds, Peter Zijlstra, Borislav Petkov

* Huang Ying wrote:

> >> - How to deal with ring-buffer overflow?  For example, there is full of
> >> corrected memory error in ring-buffer, and now a recoverable memory error
> >> occurs but it can not be put into perf ring buffer because of ring-buffer
> >> overflow, how to deal with the recoverable memory error?
> >
> > The solution is to make it large enough. With *every* queueing solution there
> > will be some sort of queue size limit.
>
> Another solution could be:
>
> Create two ring-buffer.  One is for logging and will be read by RAS
> daemon; the other is for recovering, the event record will be removed
> from the ring-buffer after all 'active filters' have been run on it.
> Even RAS daemon being restarted or hang, recoverable error can be taken
> cared of.

Well, filters will always be executed since they execute when the event is
inserted - not when it's extracted.

So if you worry about losing *filter* executions (and dependent policy action) -
there should be no loss there, ever.

But yes, the scheme you outline would work as well: a counting-only event with a
filter specified - this will do no buffering at all.

So ... to get the ball rolling in this area one of you guys active in RAS should
really try a first approximation for the active filter approach: add a
test-TRACE_EVENT() for the errors you are interested in and define a convenient
way to register policy action with post-filter events. This should work even
without having the 'active' portion defined at the ABI and filter-string level.

Thanks,

	Ingo
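
[Editor's illustration, not part of the original mail.] The point that filters
run at insertion time, not at extraction time, can be sketched in a small
user-space model. Every name below (struct event_ring, event_insert(),
filter_recoverable(), policy_recover(), RING_SIZE) is hypothetical and chosen
only for illustration; none of it is the perf or tracing API, and the real
mechanism may differ. The sketch assumes a fixed-size buffer that drops new
records once it is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names exist in the kernel. */

enum err_severity { ERR_CORRECTED, ERR_RECOVERABLE };

struct err_event {
	enum err_severity severity;
	unsigned long pfn;	/* page frame number of the affected page */
};

#define RING_SIZE 4		/* deliberately tiny so overflow is easy to hit */

struct event_ring {
	struct err_event slots[RING_SIZE];
	size_t count;		/* records currently buffered */
	size_t dropped;		/* records lost because the buffer was full */
	size_t actions;		/* policy actions taken */
	unsigned long last_pfn;	/* pfn seen by the most recent policy action */
};

/* Filter: match recoverable errors only. */
static bool filter_recoverable(const struct err_event *ev)
{
	return ev->severity == ERR_RECOVERABLE;
}

/* Policy action: stand-in for e.g. handing the page to a recovery path. */
static void policy_recover(struct event_ring *rb, const struct err_event *ev)
{
	rb->actions++;
	rb->last_pfn = ev->pfn;
}

static void event_insert(struct event_ring *rb, const struct err_event *ev)
{
	assert(rb && ev);

	/* The filter and its policy action run first, before any buffering. */
	if (filter_recoverable(ev))
		policy_recover(rb, ev);

	/* Only the log record is subject to the queue size limit. */
	if (rb->count == RING_SIZE) {
		rb->dropped++;
		return;
	}
	rb->slots[rb->count++] = *ev;
}

/* Fill the buffer with corrected errors, then insert one recoverable error. */
static struct event_ring overflow_demo(void)
{
	struct event_ring rb = { .count = 0 };
	struct err_event corrected = { ERR_CORRECTED, 0x1000 };
	struct err_event recoverable = { ERR_RECOVERABLE, 0x2000 };

	for (int i = 0; i < RING_SIZE; i++)
		event_insert(&rb, &corrected);
	event_insert(&rb, &recoverable);
	return rb;
}
```

In this model overflow_demo() ends with dropped == 1 and actions == 1: the
record of the recoverable error is lost to overflow, but the policy action
attached to the filter still ran.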