From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755778AbeAHKWn (ORCPT + 1 other);
        Mon, 8 Jan 2018 05:22:43 -0500
Received: from mail-pf0-f169.google.com ([209.85.192.169]:40905 "EHLO
        mail-pf0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755609AbeAHKWk (ORCPT );
        Mon, 8 Jan 2018 05:22:40 -0500
X-Google-Smtp-Source: ACJfBovvxjGnknGtmFjctwus2ocKyyeXlw+M7C1Sk6NYGjcx4FdPUbDrL6SQO7jtZqrBMqrYMJjASQ==
Date: Mon, 8 Jan 2018 19:22:34 +0900
From: Sergey Senozhatsky
To: Michal Hocko
Cc: Sergey Senozhatsky , Sergey Senozhatsky , Andrew Morton ,
        Tetsuo Handa , Minchan Kim , linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: ratelimit end_swap_bio_write() error
Message-ID: <20180108102234.GA818@jagdpanzerIV>
References: <20180106043407.25193-1-sergey.senozhatsky@gmail.com>
        <20180106094124.GB16576@dhcp22.suse.cz>
        <20180106100313.GA527@tigerII.localdomain>
        <20180106133417.GA23629@dhcp22.suse.cz>
        <20180108015818.GA533@jagdpanzerIV>
        <20180108083742.GB5717@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180108083742.GB5717@dhcp22.suse.cz>
User-Agent: Mutt/1.9.2 (2017-12-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On (01/08/18 09:37), Michal Hocko wrote:
[..]
> > the lockup is not the main problem and I'm not really trying to
> > address it here. we simply can fill up the entire kernel logbuf
> > with the same "Write-error on swap-device" errors.
>
> Your changelog is rather modest on the information.

fair point!

> Could you be more specific on how the problem actually happens how
> likely it is?

ok.
so what we have is

	slow_path / swap-out page
	 __zram_bvec_write(page)
	  compressed_page = zcomp_compress(page)
	   zs_malloc(compressed_page)
	    // no available zspage found, need to allocate new
	     alloc_zspage()
	     {
		for (i = 0; i < class->pages_per_zspage; i++) {
			page = alloc_page(gfp);
			if (!page)
				return NULL
		}
	     }

	return -ENOMEM
	...
	printk("Write-error on swap-device...");

zspage-s can consist of up to ->pages_per_zspage normal pages.
if alloc_page() fails then we can't allocate the entire zspage,
so we can't store the swapped out page, so it remains in RAM
and we don't make any progress. so we try to swap another page
and maybe do the whole zs_malloc()->alloc_zspage() again, maybe
not. depending on how bad the OOM situation is, there can be few
or many "Write-error on swap-device" errors.

> And again, I do not think the throttling is an appropriate counter
> measure. We do want to print those messages when a critical situation
> happens. If we have a fallback then simply do not print at all.

sure, but with the ratelimited printk we still print those messages.
we just don't print one for every single page we failed to write
to the device. the existing error messages can (*sometimes*) be noisy
and not very informative - "Write-error on swap-device (%u:%u:%llu)\n";
it's not like 1000 of those tell more than 1 or 10.

	-ss
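
to illustrate the "1000 vs 1 or 10" point, here is a minimal userspace sketch of an interval/burst limiter. it is only a model of the idea and not the kernel's ratelimit code: struct rl_state, rl_init(), rl_allow() and simulate_write_errors() are hypothetical names made up for this example, and time is a caller-supplied tick counter rather than jiffies.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model; NOT the kernel's struct ratelimit_state. */
struct rl_state {
	unsigned long interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* messages suppressed in this window */
	int started;		/* 0 until the first window is opened */
};

void rl_init(struct rl_state *rs, unsigned long interval, unsigned int burst)
{
	assert(burst > 0);
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
	rs->missed = 0;
	rs->started = 0;
}

/* Returns 1 if the caller may print at tick `now`, 0 if suppressed. */
int rl_allow(struct rl_state *rs, unsigned long now)
{
	/* Open a fresh window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = 1;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}

	rs->missed++;
	return 0;
}

/*
 * Pretend `failures` swap writes failed at tick `now`; print only the
 * ones the limiter lets through and return how many were printed.
 */
unsigned int simulate_write_errors(struct rl_state *rs, unsigned long now,
				   unsigned int failures)
{
	unsigned int i, printed = 0;

	for (i = 0; i < failures; i++) {
		if (rl_allow(rs, now)) {
			printf("Write-error on swap-device (simulated, page %u)\n", i);
			printed++;
		}
	}
	return printed;
}
```

with rl_init(&rs, 5, 10), a storm of 1000 failures in one tick prints 10 lines and counts 990 as missed; once 5 ticks have passed the window reopens and messages are printed again. so the error stays visible, it just can't fill the whole logbuf.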