Date: Sat, 6 Jan 2018 14:34:17 +0100
From: Michal Hocko
To: Sergey Senozhatsky
Cc: Andrew Morton, Tetsuo Handa, Minchan Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: ratelimit end_swap_bio_write() error
Message-ID: <20180106133417.GA23629@dhcp22.suse.cz>
References: <20180106043407.25193-1-sergey.senozhatsky@gmail.com> <20180106094124.GB16576@dhcp22.suse.cz> <20180106100313.GA527@tigerII.localdomain>
In-Reply-To: <20180106100313.GA527@tigerII.localdomain>

On Sat 06-01-18 19:03:13, Sergey Senozhatsky wrote:
> Hello,
>
> On (01/06/18 10:41), Michal Hocko wrote:
> > On Sat 06-01-18 13:34:07, Sergey Senozhatsky wrote:
> > > Use the ratelimited printk() version for swap-device write error
> > > reporting. We can use ZRAM as a swap-device, and the tricky part
> > > here is that zsmalloc() stores compressed objects in memory, thus
> > > it has to allocate pages during swap-out. If the system is short
> > > on memory, then we begin to flood the printk() log buffer with the
> > > same "Write-error on swap-device XXX" error messages and sometimes
> > > simply lock up the system.
> >
> > Should we print an error in such a situation at all? Write-error
> > certainly sounds scary and it suggests something went really wrong.
> > My understanding is that a failed zram swap-out is not critical and
> > therefore the error message is not really useful.
>
> I don't mind getting rid of it. up to you :)

I do not think we can get rid of it for all swap backends.

> > Or what should an admin do when seeing it?
>
> zsmalloc allocation is just one possibility; an error in
> compressing algorithm is another one, yet it is rather unlikely.
> most likely it's OOM which can cause problems. but in any case
> it's sort of unclear what should be done. an error can be a
> temporary one or a fatal one, just like in __swap_writepage()
> case. so maybe both write error printk()-s can be dropped.

Then I would suggest starting by sorting out which of those errors are
critical and which are not, and reporting each error accordingly. I am
sorry to be fuzzy here, but I am not familiar enough with the code to be
more specific.

Anyway, ratelimiting sounds more like papering over the problem than a
real solution. Also, it sounds quite scary that there can be so many
failures that just printing a message about them locks up the system...

-- 
Michal Hocko
SUSE Labs
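For illustration, here is a minimal userspace model, not kernel code, of the interval/burst scheme behind printk_ratelimited(). The struct and function names (rl_state, rl_init, rl_allow, report_write_error) are hypothetical, and time is passed in as abstract ticks so the behaviour is deterministic. It makes the concern above concrete: a flood of identical write errors is cut down to at most `burst` messages per window plus a suppressed count, but every underlying failure still happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of interval/burst ratelimiting, in the
 * spirit of the kernel's printk_ratelimited(). Not the kernel's code.
 */
struct rl_state {
	long interval;  /* length of one window, in ticks */
	int burst;      /* messages allowed per window */
	long begin;     /* start tick of the current window */
	int printed;    /* messages allowed in the current window */
	int missed;     /* messages suppressed in the current window */
	bool started;   /* false until the first call */
};

static void rl_init(struct rl_state *rs, long interval, int burst)
{
	assert(interval > 0 && burst > 0);
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
	rs->missed = 0;
	rs->started = false;
}

/* Returns true if the caller may print at tick `now`. */
static bool rl_allow(struct rl_state *rs, long now)
{
	/* Open a new window on the first call or once the old one expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		if (rs->missed)
			printf("%d messages suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;   /* the failure still happened; only the message is dropped */
	return false;
}

/* Hypothetical stand-in for the swap write-error report; returns 1 if printed. */
static int report_write_error(struct rl_state *rs, long now, int err)
{
	if (!rl_allow(rs, now))
		return 0;
	printf("Write-error on swap-device (err %d)\n", err);
	return 1;
}
```

With interval 5 and burst 10 (mirroring the kernel defaults DEFAULT_RATELIMIT_INTERVAL = 5 * HZ and DEFAULT_RATELIMIT_BURST = 10, with one tick standing in for a second), 1000 errors reported at tick 0 print only 10 lines; the next error at tick 5 first reports the 990 suppressed messages and then prints again. The log stays readable, but nothing about the error classification changes.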