From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx181.postini.com [74.125.245.181]) by kanga.kvack.org (Postfix) with SMTP id E3DD56B0002 for ; Tue, 19 Feb 2013 07:12:09 -0500 (EST) Received: by mail-qa0-f47.google.com with SMTP id j8so1775571qah.13 for ; Tue, 19 Feb 2013 04:12:08 -0800 (PST) Message-ID: <51236C11.1010208@gmail.com> Date: Tue, 19 Feb 2013 20:12:01 +0800 From: Ric Mason MIME-Version: 1.0 Subject: Re: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dan Magenheimer Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton On 02/05/2013 05:28 AM, Dan Magenheimer wrote: >> From: Minchan Kim [mailto:minchan@kernel.org] >> Sent: Sunday, February 03, 2013 7:50 PM >> To: Hugh Dickins >> Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- >> kernel@vger.kernel.org; Andrew Morton >> Subject: Re: Questin about swap_slot free and invalidate page >> >> Hi Hugh, >> >> On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: >>> On Thu, 31 Jan 2013, Minchan Kim wrote: >>> >>>> When I reviewed zswap, I was curious about frontswap_store. >>>> It said following as. >>>> >>>> * If frontswap already contains a page with matching swaptype and >>>> * offset, the frontswap implementation may either overwrite the data and >>>> * return success or invalidate the page from frontswap and return failure. >>>> >>>> It didn't say why it happens. we already have __frontswap_invalidate_page >>>> and call it whenever swap_slot frees. If we don't free swap slot, >>>> scan_swap_map can't find the slot for swap out so I thought overwriting of >>>> data shouldn't happen in frontswap. >>>> >>>> As I looked the code, the curplit is reuse_swap_page. It couldn't free swap >>>> slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page >>>> so data overwriting on frontswap can happen. I'm not sure frontswap guys >>>> already discussed it long time ago. >>>> >>>> If we can fix it, we can remove duplication entry handling logic >>>> in all of backend of frontswap. All of backend should handle it although >>>> it's pretty rare. Of course, zram could be fixed. It might be trivial now >>>> but more there are many backend of frontswap, more it would be a headache. >>>> >>>> If we are trying to fix it in swap layer, we might fix it following as >>>> >>>> int reuse_swap_page(struct page *page) >>>> { >>>> .. >>>> .. >>>> if (count == 1) { >>>> if (!PageWriteback(page)) { >>>> delete_from_swap_cache(page); >>>> SetPageDirty(page); >>>> } else { >>>> frontswap_invalidate_page(); >>>> swap_slot_free_notify(); >>>> } >>>> } >>>> } >>>> >>>> But not sure, it is worth at the moment and there might be other places >>>> to be fixed.(I hope Hugh can point out if we are missing something if he >>>> has a time) >>> I expect you are right that reuse_swap_page() is the only way it would >>> happen for frontswap; but I'm too unfamiliar with frontswap to promise >>> you that - it's better that you insert WARN_ONs in your testing to verify. >>> >>> But I think it's a general tmem property, isn't it? To define what >>> happens if you do give it the same key again. So I doubt it's something >> I am too unfamiliar with tmem property but thing I am seeing is >> EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very >> coupled with swap subsystem. >> >>> that has to be fixed; but if you do find it helpful to fix it, bear in >>> mind that reuse_swap_page() is an odd corner, which may one day give the >>> "stable pages" DIF/DIX people trouble, though they've not yet complained. >>> >>> I'd prefer a patch not specific to frontswap, but along the lines below: >>> I think that's the most robust way to express it, though I don't think >>> the (count == 0) case can actually occur inside that block (whereas >>> count == 0 certainly can occur in the !PageSwapCache case). >>> >>> I believe that I once upon a time took statistics of how often the >>> PageWriteback case happens here, and concluded that it wasn't often >>> enough that refusing to reuse in this case would be likely to slow >>> anyone down noticeably. >> I agree. I had a test about that with zram and that case wasn't common. >> so your patch looks good to me. >> >> I am waiting Dan's reply(He will come in this week) and then, judge what's >> the best. > Hugh is right that handling the possibility of duplicates is > part of the tmem ABI. If there is any possibility of duplicates, > the ABI defines how a backend must handle them to avoid data > coherency issues. > > The kernel implements an in-kernel API which implements the tmem > ABI. If the frontend and backend can always agree that duplicate Which ABI in zcache implement that? > are never possible, I agree that the backend could avoid that > special case. However, duplicates occur rarely enough and the > consequences (data loss) are bad enough that I think the case > should still be checked, at least with a BUG_ON. I also wonder > if it is worth it to make changes to the core swap subsystem > to avoid code to implement a zswap corner case. > > Remember that zswap is an oversimplified special case of tmem > that handles only one frontend (Linux frontswap) and one backend > (zswap). Tmem goes well beyond that and already supports other > more general backends including Xen and ramster, and could also > support other frontends such as a BSD or Solaris equivalent > of frontswap, for example with a Linux ramster/zcache backend. > I'm not sure how wise it is to tear out generic code and replace > it with simplistic code unless there is absolutely no chance that > the generic code will be necessary. > > My two cents, > Dan > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org