From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx166.postini.com [74.125.245.166]) by kanga.kvack.org (Postfix) with SMTP id 34ACB6B0007 for ; Thu, 31 Jan 2013 00:11:43 -0500 (EST) Date: Thu, 31 Jan 2013 14:11:40 +0900 From: Minchan Kim Subject: Questin about swap_slot free and invalidate page Message-ID: <20130131051140.GB23548@blaptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Nitin Gupta , Dan Magenheimer , Seth Jennings , Hugh Dickins , Konrad Rzeszutek Wilk Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton When I reviewed zswap, I was curious about frontswap_store. It said following as. * If frontswap already contains a page with matching swaptype and * offset, the frontswap implementation may either overwrite the data and * return success or invalidate the page from frontswap and return failure. It didn't say why it happens. we already have __frontswap_invalidate_page and call it whenever swap_slot frees. If we don't free swap slot, scan_swap_map can't find the slot for swap out so I thought overwriting of data shouldn't happen in frontswap. As I looked the code, the curplit is reuse_swap_page. It couldn't free swap slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page so data overwriting on frontswap can happen. I'm not sure frontswap guys already discussed it long time ago. If we can fix it, we can remove duplication entry handling logic in all of backend of frontswap. All of backend should handle it although it's pretty rare. Of course, zram could be fixed. It might be trivial now but more there are many backend of frontswap, more it would be a headache. If we are trying to fix it in swap layer, we might fix it following as int reuse_swap_page(struct page *page) { .. .. if (count == 1) { if (!PageWriteback(page)) { delete_from_swap_cache(page); SetPageDirty(page); } else { frontswap_invalidate_page(); swap_slot_free_notify(); } } } But not sure, it is worth at the moment and there might be other places to be fixed.(I hope Hugh can point out if we are missing something if he has a time) If we are reluctant to it, at least, we should write out comment above frontswap_store about that to notice curious guys who spend many time to know WHY and smart guys who are going to fix it with nice way. Mr. Frontswap, What do you think about it? -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx190.postini.com [74.125.245.190]) by kanga.kvack.org (Postfix) with SMTP id 490666B0007 for ; Sun, 3 Feb 2013 20:51:17 -0500 (EST) Received: by mail-pb0-f44.google.com with SMTP id wz12so2905401pbc.31 for ; Sun, 03 Feb 2013 17:51:16 -0800 (PST) Date: Sun, 3 Feb 2013 17:51:14 -0800 (PST) From: Hugh Dickins Subject: Re: Questin about swap_slot free and invalidate page In-Reply-To: <20130131051140.GB23548@blaptop> Message-ID: References: <20130131051140.GB23548@blaptop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Nitin Gupta , Dan Magenheimer , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton On Thu, 31 Jan 2013, Minchan Kim wrote: > When I reviewed zswap, I was curious about frontswap_store. > It said following as. > > * If frontswap already contains a page with matching swaptype and > * offset, the frontswap implementation may either overwrite the data and > * return success or invalidate the page from frontswap and return failure. > > It didn't say why it happens. we already have __frontswap_invalidate_page > and call it whenever swap_slot frees. If we don't free swap slot, > scan_swap_map can't find the slot for swap out so I thought overwriting of > data shouldn't happen in frontswap. > > As I looked the code, the curplit is reuse_swap_page. It couldn't free swap > slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page > so data overwriting on frontswap can happen. I'm not sure frontswap guys > already discussed it long time ago. > > If we can fix it, we can remove duplication entry handling logic > in all of backend of frontswap. All of backend should handle it although > it's pretty rare. Of course, zram could be fixed. It might be trivial now > but more there are many backend of frontswap, more it would be a headache. > > If we are trying to fix it in swap layer, we might fix it following as > > int reuse_swap_page(struct page *page) > { > .. > .. > if (count == 1) { > if (!PageWriteback(page)) { > delete_from_swap_cache(page); > SetPageDirty(page); > } else { > frontswap_invalidate_page(); > swap_slot_free_notify(); > } > } > } > > But not sure, it is worth at the moment and there might be other places > to be fixed.(I hope Hugh can point out if we are missing something if he > has a time) I expect you are right that reuse_swap_page() is the only way it would happen for frontswap; but I'm too unfamiliar with frontswap to promise you that - it's better that you insert WARN_ONs in your testing to verify. But I think it's a general tmem property, isn't it? To define what happens if you do give it the same key again. So I doubt it's something that has to be fixed; but if you do find it helpful to fix it, bear in mind that reuse_swap_page() is an odd corner, which may one day give the "stable pages" DIF/DIX people trouble, though they've not yet complained. I'd prefer a patch not specific to frontswap, but along the lines below: I think that's the most robust way to express it, though I don't think the (count == 0) case can actually occur inside that block (whereas count == 0 certainly can occur in the !PageSwapCache case). I believe that I once upon a time took statistics of how often the PageWriteback case happens here, and concluded that it wasn't often enough that refusing to reuse in this case would be likely to slow anyone down noticeably. > > If we are reluctant to it, at least, we should write out comment above > frontswap_store about that to notice curious guys who spend many > time to know WHY and smart guys who are going to fix it with nice way. > > Mr. Frontswap, What do you think about it? He's not me of course :) Hugh --- 3.8-rc6/mm/swapfile.c 2012-12-22 09:43:27.668015583 -0800 +++ linux/mm/swapfile.c 2013-02-03 17:31:04.148181857 -0800 @@ -637,8 +637,11 @@ int reuse_swap_page(struct page *page) return 0; count = page_mapcount(page); if (count <= 1 && PageSwapCache(page)) { - count += page_swapcount(page); - if (count == 1 && !PageWriteback(page)) { + if (PageWriteback(page)) + count = 2; /* not safe yet to free its swap */ + else + count += page_swapcount(page); + if (count <= 1) { delete_from_swap_cache(page); SetPageDirty(page); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx177.postini.com [74.125.245.177]) by kanga.kvack.org (Postfix) with SMTP id EE3E66B0002 for ; Sun, 3 Feb 2013 21:49:52 -0500 (EST) Date: Mon, 4 Feb 2013 11:49:50 +0900 From: Minchan Kim Subject: Re: Questin about swap_slot free and invalidate page Message-ID: <20130204024950.GD2688@blaptop> References: <20130131051140.GB23548@blaptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Nitin Gupta , Dan Magenheimer , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Hi Hugh, On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > On Thu, 31 Jan 2013, Minchan Kim wrote: > > > When I reviewed zswap, I was curious about frontswap_store. > > It said following as. > > > > * If frontswap already contains a page with matching swaptype and > > * offset, the frontswap implementation may either overwrite the data and > > * return success or invalidate the page from frontswap and return failure. > > > > It didn't say why it happens. we already have __frontswap_invalidate_page > > and call it whenever swap_slot frees. If we don't free swap slot, > > scan_swap_map can't find the slot for swap out so I thought overwriting of > > data shouldn't happen in frontswap. > > > > As I looked the code, the curplit is reuse_swap_page. It couldn't free swap > > slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page > > so data overwriting on frontswap can happen. I'm not sure frontswap guys > > already discussed it long time ago. > > > > If we can fix it, we can remove duplication entry handling logic > > in all of backend of frontswap. All of backend should handle it although > > it's pretty rare. Of course, zram could be fixed. It might be trivial now > > but more there are many backend of frontswap, more it would be a headache. > > > > If we are trying to fix it in swap layer, we might fix it following as > > > > int reuse_swap_page(struct page *page) > > { > > .. > > .. > > if (count == 1) { > > if (!PageWriteback(page)) { > > delete_from_swap_cache(page); > > SetPageDirty(page); > > } else { > > frontswap_invalidate_page(); > > swap_slot_free_notify(); > > } > > } > > } > > > > But not sure, it is worth at the moment and there might be other places > > to be fixed.(I hope Hugh can point out if we are missing something if he > > has a time) > > I expect you are right that reuse_swap_page() is the only way it would > happen for frontswap; but I'm too unfamiliar with frontswap to promise > you that - it's better that you insert WARN_ONs in your testing to verify. > > But I think it's a general tmem property, isn't it? To define what > happens if you do give it the same key again. So I doubt it's something I am too unfamiliar with tmem property but thing I am seeing is EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very coupled with swap subsystem. > that has to be fixed; but if you do find it helpful to fix it, bear in > mind that reuse_swap_page() is an odd corner, which may one day give the > "stable pages" DIF/DIX people trouble, though they've not yet complained. > > I'd prefer a patch not specific to frontswap, but along the lines below: > I think that's the most robust way to express it, though I don't think > the (count == 0) case can actually occur inside that block (whereas > count == 0 certainly can occur in the !PageSwapCache case). > > I believe that I once upon a time took statistics of how often the > PageWriteback case happens here, and concluded that it wasn't often > enough that refusing to reuse in this case would be likely to slow > anyone down noticeably. I agree. I had a test about that with zram and that case wasn't common. so your patch looks good to me. I am waiting Dan's reply(He will come in this week) and then, judge what's the best. Thanks! > > > > > If we are reluctant to it, at least, we should write out comment above > > frontswap_store about that to notice curious guys who spend many > > time to know WHY and smart guys who are going to fix it with nice way. > > > > Mr. Frontswap, What do you think about it? > > He's not me of course :) > > Hugh > > --- 3.8-rc6/mm/swapfile.c 2012-12-22 09:43:27.668015583 -0800 > +++ linux/mm/swapfile.c 2013-02-03 17:31:04.148181857 -0800 > @@ -637,8 +637,11 @@ int reuse_swap_page(struct page *page) > return 0; > count = page_mapcount(page); > if (count <= 1 && PageSwapCache(page)) { > - count += page_swapcount(page); > - if (count == 1 && !PageWriteback(page)) { > + if (PageWriteback(page)) > + count = 2; /* not safe yet to free its swap */ > + else > + count += page_swapcount(page); > + if (count <= 1) { > delete_from_swap_cache(page); > SetPageDirty(page); > } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx206.postini.com [74.125.245.206]) by kanga.kvack.org (Postfix) with SMTP id EE9FD6B0085 for ; Mon, 4 Feb 2013 16:29:05 -0500 (EST) MIME-Version: 1.0 Message-ID: Date: Mon, 4 Feb 2013 13:28:55 -0800 (PST) From: Dan Magenheimer Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> In-Reply-To: <20130204024950.GD2688@blaptop> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim , Hugh Dickins Cc: Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton > From: Minchan Kim [mailto:minchan@kernel.org] > Sent: Sunday, February 03, 2013 7:50 PM > To: Hugh Dickins > Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; l= inux-mm@kvack.org; linux- > kernel@vger.kernel.org; Andrew Morton > Subject: Re: Questin about swap_slot free and invalidate page >=20 > Hi Hugh, >=20 > On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > > On Thu, 31 Jan 2013, Minchan Kim wrote: > > > > > When I reviewed zswap, I was curious about frontswap_store. > > > It said following as. > > > > > > * If frontswap already contains a page with matching swaptype and > > > * offset, the frontswap implementation may either overwrite the data= and > > > * return success or invalidate the page from frontswap and return fa= ilure. > > > > > > It didn't say why it happens. we already have __frontswap_invalidate_= page > > > and call it whenever swap_slot frees. If we don't free swap slot, > > > scan_swap_map can't find the slot for swap out so I thought overwriti= ng of > > > data shouldn't happen in frontswap. > > > > > > As I looked the code, the curplit is reuse_swap_page. It couldn't fre= e swap > > > slot if the page founded is PG_writeback but miss calling frontswap_i= nvalidate_page > > > so data overwriting on frontswap can happen. I'm not sure frontswap g= uys > > > already discussed it long time ago. > > > > > > If we can fix it, we can remove duplication entry handling logic > > > in all of backend of frontswap. All of backend should handle it altho= ugh > > > it's pretty rare. Of course, zram could be fixed. It might be trivial= now > > > but more there are many backend of frontswap, more it would be a head= ache. > > > > > > If we are trying to fix it in swap layer, we might fix it following = as > > > > > > int reuse_swap_page(struct page *page) > > > { > > > .. > > > .. > > > if (count =3D=3D 1) { > > > if (!PageWriteback(page)) { > > > delete_from_swap_cache(page); > > > SetPageDirty(page); > > > } else { > > > frontswap_invalidate_page(); > > > swap_slot_free_notify(); > > > } > > > } > > > } > > > > > > But not sure, it is worth at the moment and there might be other plac= es > > > to be fixed.(I hope Hugh can point out if we are missing something if= he > > > has a time) > > > > I expect you are right that reuse_swap_page() is the only way it would > > happen for frontswap; but I'm too unfamiliar with frontswap to promise > > you that - it's better that you insert WARN_ONs in your testing to veri= fy. > > > > But I think it's a general tmem property, isn't it? To define what > > happens if you do give it the same key again. So I doubt it's somethin= g >=20 > I am too unfamiliar with tmem property but thing I am seeing is > EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly ve= ry > coupled with swap subsystem. >=20 > > that has to be fixed; but if you do find it helpful to fix it, bear in > > mind that reuse_swap_page() is an odd corner, which may one day give th= e > > "stable pages" DIF/DIX people trouble, though they've not yet complaine= d. > > > > I'd prefer a patch not specific to frontswap, but along the lines below= : > > I think that's the most robust way to express it, though I don't think > > the (count =3D=3D 0) case can actually occur inside that block (whereas > > count =3D=3D 0 certainly can occur in the !PageSwapCache case). > > > > I believe that I once upon a time took statistics of how often the > > PageWriteback case happens here, and concluded that it wasn't often > > enough that refusing to reuse in this case would be likely to slow > > anyone down noticeably. >=20 > I agree. I had a test about that with zram and that case wasn't common. > so your patch looks good to me. >=20 > I am waiting Dan's reply(He will come in this week) and then, judge what'= s > the best. Hugh is right that handling the possibility of duplicates is part of the tmem ABI. If there is any possibility of duplicates, the ABI defines how a backend must handle them to avoid data coherency issues. The kernel implements an in-kernel API which implements the tmem ABI. If the frontend and backend can always agree that duplicates are never possible, I agree that the backend could avoid that special case. However, duplicates occur rarely enough and the consequences (data loss) are bad enough that I think the case should still be checked, at least with a BUG_ON. I also wonder if it is worth it to make changes to the core swap subsystem to avoid code to implement a zswap corner case. Remember that zswap is an oversimplified special case of tmem that handles only one frontend (Linux frontswap) and one backend (zswap). Tmem goes well beyond that and already supports other more general backends including Xen and ramster, and could also support other frontends such as a BSD or Solaris equivalent of frontswap, for example with a Linux ramster/zcache backend. I'm not sure how wise it is to tear out generic code and replace it with simplistic code unless there is absolutely no chance that the generic code will be necessary. My two cents, Dan -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx115.postini.com [74.125.245.115]) by kanga.kvack.org (Postfix) with SMTP id C2DBF6B00D1 for ; Mon, 4 Feb 2013 20:24:42 -0500 (EST) Date: Tue, 5 Feb 2013 10:24:40 +0900 From: Minchan Kim Subject: Re: Questin about swap_slot free and invalidate page Message-ID: <20130205012440.GF2610@blaptop> References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Dan Magenheimer Cc: Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton On Mon, Feb 04, 2013 at 01:28:55PM -0800, Dan Magenheimer wrote: > > From: Minchan Kim [mailto:minchan@kernel.org] > > Sent: Sunday, February 03, 2013 7:50 PM > > To: Hugh Dickins > > Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- > > kernel@vger.kernel.org; Andrew Morton > > Subject: Re: Questin about swap_slot free and invalidate page > > > > Hi Hugh, > > > > On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > > > On Thu, 31 Jan 2013, Minchan Kim wrote: > > > > > > > When I reviewed zswap, I was curious about frontswap_store. > > > > It said following as. > > > > > > > > * If frontswap already contains a page with matching swaptype and > > > > * offset, the frontswap implementation may either overwrite the data and > > > > * return success or invalidate the page from frontswap and return failure. > > > > > > > > It didn't say why it happens. we already have __frontswap_invalidate_page > > > > and call it whenever swap_slot frees. If we don't free swap slot, > > > > scan_swap_map can't find the slot for swap out so I thought overwriting of > > > > data shouldn't happen in frontswap. > > > > > > > > As I looked the code, the curplit is reuse_swap_page. It couldn't free swap > > > > slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page > > > > so data overwriting on frontswap can happen. I'm not sure frontswap guys > > > > already discussed it long time ago. > > > > > > > > If we can fix it, we can remove duplication entry handling logic > > > > in all of backend of frontswap. All of backend should handle it although > > > > it's pretty rare. Of course, zram could be fixed. It might be trivial now > > > > but more there are many backend of frontswap, more it would be a headache. > > > > > > > > If we are trying to fix it in swap layer, we might fix it following as > > > > > > > > int reuse_swap_page(struct page *page) > > > > { > > > > .. > > > > .. > > > > if (count == 1) { > > > > if (!PageWriteback(page)) { > > > > delete_from_swap_cache(page); > > > > SetPageDirty(page); > > > > } else { > > > > frontswap_invalidate_page(); > > > > swap_slot_free_notify(); > > > > } > > > > } > > > > } > > > > > > > > But not sure, it is worth at the moment and there might be other places > > > > to be fixed.(I hope Hugh can point out if we are missing something if he > > > > has a time) > > > > > > I expect you are right that reuse_swap_page() is the only way it would > > > happen for frontswap; but I'm too unfamiliar with frontswap to promise > > > you that - it's better that you insert WARN_ONs in your testing to verify. > > > > > > But I think it's a general tmem property, isn't it? To define what > > > happens if you do give it the same key again. So I doubt it's something > > > > I am too unfamiliar with tmem property but thing I am seeing is > > EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very > > coupled with swap subsystem. > > > > > that has to be fixed; but if you do find it helpful to fix it, bear in > > > mind that reuse_swap_page() is an odd corner, which may one day give the > > > "stable pages" DIF/DIX people trouble, though they've not yet complained. > > > > > > I'd prefer a patch not specific to frontswap, but along the lines below: > > > I think that's the most robust way to express it, though I don't think > > > the (count == 0) case can actually occur inside that block (whereas > > > count == 0 certainly can occur in the !PageSwapCache case). > > > > > > I believe that I once upon a time took statistics of how often the > > > PageWriteback case happens here, and concluded that it wasn't often > > > enough that refusing to reuse in this case would be likely to slow > > > anyone down noticeably. > > > > I agree. I had a test about that with zram and that case wasn't common. > > so your patch looks good to me. > > > > I am waiting Dan's reply(He will come in this week) and then, judge what's > > the best. > > Hugh is right that handling the possibility of duplicates is > part of the tmem ABI. If there is any possibility of duplicates, > the ABI defines how a backend must handle them to avoid data > coherency issues. > > The kernel implements an in-kernel API which implements the tmem > ABI. If the frontend and backend can always agree that duplicates > are never possible, I agree that the backend could avoid that > special case. However, duplicates occur rarely enough and the > consequences (data loss) are bad enough that I think the case > should still be checked, at least with a BUG_ON. I also wonder > if it is worth it to make changes to the core swap subsystem > to avoid code to implement a zswap corner case. It wasn't only zswap but it could be applied zram, too. > > Remember that zswap is an oversimplified special case of tmem > that handles only one frontend (Linux frontswap) and one backend > (zswap). Tmem goes well beyond that and already supports other > more general backends including Xen and ramster, and could also > support other frontends such as a BSD or Solaris equivalent > of frontswap, for example with a Linux ramster/zcache backend. > I'm not sure how wise it is to tear out generic code and replace > it with simplistic code unless there is absolutely no chance that > the generic code will be necessary. Fair enough. Thanks for clarifying that, Hugh, Dan. > > My two cents, > Dan > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx181.postini.com [74.125.245.181]) by kanga.kvack.org (Postfix) with SMTP id E3DD56B0002 for ; Tue, 19 Feb 2013 07:12:09 -0500 (EST) Received: by mail-qa0-f47.google.com with SMTP id j8so1775571qah.13 for ; Tue, 19 Feb 2013 04:12:08 -0800 (PST) Message-ID: <51236C11.1010208@gmail.com> Date: Tue, 19 Feb 2013 20:12:01 +0800 From: Ric Mason MIME-Version: 1.0 Subject: Re: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dan Magenheimer Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton On 02/05/2013 05:28 AM, Dan Magenheimer wrote: >> From: Minchan Kim [mailto:minchan@kernel.org] >> Sent: Sunday, February 03, 2013 7:50 PM >> To: Hugh Dickins >> Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- >> kernel@vger.kernel.org; Andrew Morton >> Subject: Re: Questin about swap_slot free and invalidate page >> >> Hi Hugh, >> >> On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: >>> On Thu, 31 Jan 2013, Minchan Kim wrote: >>> >>>> When I reviewed zswap, I was curious about frontswap_store. >>>> It said following as. >>>> >>>> * If frontswap already contains a page with matching swaptype and >>>> * offset, the frontswap implementation may either overwrite the data and >>>> * return success or invalidate the page from frontswap and return failure. >>>> >>>> It didn't say why it happens. we already have __frontswap_invalidate_page >>>> and call it whenever swap_slot frees. If we don't free swap slot, >>>> scan_swap_map can't find the slot for swap out so I thought overwriting of >>>> data shouldn't happen in frontswap. >>>> >>>> As I looked the code, the curplit is reuse_swap_page. It couldn't free swap >>>> slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page >>>> so data overwriting on frontswap can happen. I'm not sure frontswap guys >>>> already discussed it long time ago. >>>> >>>> If we can fix it, we can remove duplication entry handling logic >>>> in all of backend of frontswap. All of backend should handle it although >>>> it's pretty rare. Of course, zram could be fixed. It might be trivial now >>>> but more there are many backend of frontswap, more it would be a headache. >>>> >>>> If we are trying to fix it in swap layer, we might fix it following as >>>> >>>> int reuse_swap_page(struct page *page) >>>> { >>>> .. >>>> .. >>>> if (count == 1) { >>>> if (!PageWriteback(page)) { >>>> delete_from_swap_cache(page); >>>> SetPageDirty(page); >>>> } else { >>>> frontswap_invalidate_page(); >>>> swap_slot_free_notify(); >>>> } >>>> } >>>> } >>>> >>>> But not sure, it is worth at the moment and there might be other places >>>> to be fixed.(I hope Hugh can point out if we are missing something if he >>>> has a time) >>> I expect you are right that reuse_swap_page() is the only way it would >>> happen for frontswap; but I'm too unfamiliar with frontswap to promise >>> you that - it's better that you insert WARN_ONs in your testing to verify. >>> >>> But I think it's a general tmem property, isn't it? To define what >>> happens if you do give it the same key again. So I doubt it's something >> I am too unfamiliar with tmem property but thing I am seeing is >> EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very >> coupled with swap subsystem. >> >>> that has to be fixed; but if you do find it helpful to fix it, bear in >>> mind that reuse_swap_page() is an odd corner, which may one day give the >>> "stable pages" DIF/DIX people trouble, though they've not yet complained. >>> >>> I'd prefer a patch not specific to frontswap, but along the lines below: >>> I think that's the most robust way to express it, though I don't think >>> the (count == 0) case can actually occur inside that block (whereas >>> count == 0 certainly can occur in the !PageSwapCache case). >>> >>> I believe that I once upon a time took statistics of how often the >>> PageWriteback case happens here, and concluded that it wasn't often >>> enough that refusing to reuse in this case would be likely to slow >>> anyone down noticeably. >> I agree. I had a test about that with zram and that case wasn't common. >> so your patch looks good to me. >> >> I am waiting Dan's reply(He will come in this week) and then, judge what's >> the best. > Hugh is right that handling the possibility of duplicates is > part of the tmem ABI. If there is any possibility of duplicates, > the ABI defines how a backend must handle them to avoid data > coherency issues. > > The kernel implements an in-kernel API which implements the tmem > ABI. If the frontend and backend can always agree that duplicate Which ABI in zcache implement that? > are never possible, I agree that the backend could avoid that > special case. However, duplicates occur rarely enough and the > consequences (data loss) are bad enough that I think the case > should still be checked, at least with a BUG_ON. I also wonder > if it is worth it to make changes to the core swap subsystem > to avoid code to implement a zswap corner case. > > Remember that zswap is an oversimplified special case of tmem > that handles only one frontend (Linux frontswap) and one backend > (zswap). Tmem goes well beyond that and already supports other > more general backends including Xen and ramster, and could also > support other frontends such as a BSD or Solaris equivalent > of frontswap, for example with a Linux ramster/zcache backend. > I'm not sure how wise it is to tear out generic code and replace > it with simplistic code unless there is absolutely no chance that > the generic code will be necessary. > > My two cents, > Dan > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx170.postini.com [74.125.245.170]) by kanga.kvack.org (Postfix) with SMTP id 668796B0002 for ; Tue, 19 Feb 2013 10:27:55 -0500 (EST) MIME-Version: 1.0 Message-ID: <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> Date: Tue, 19 Feb 2013 07:27:24 -0800 (PST) From: Dan Magenheimer Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> In-Reply-To: <51236C11.1010208@gmail.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Ric Mason Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton > From: Ric Mason [mailto:ric.masonn@gmail.com] > Sent: Tuesday, February 19, 2013 5:12 AM > To: Dan Magenheimer > Cc: Minchan Kim; Hugh Dickins; Nitin Gupta; Seth Jennings; Konrad Rzeszut= ek Wilk; linux-mm@kvack.org; > linux-kernel@vger.kernel.org; Andrew Morton > Subject: Re: Questin about swap_slot free and invalidate page >=20 > On 02/05/2013 05:28 AM, Dan Magenheimer wrote: > >> From: Minchan Kim [mailto:minchan@kernel.org] > >> Sent: Sunday, February 03, 2013 7:50 PM > >> To: Hugh Dickins > >> Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk= ; linux-mm@kvack.org; linux- > >> kernel@vger.kernel.org; Andrew Morton > >> Subject: Re: Questin about swap_slot free and invalidate page > >> > >> Hi Hugh, > >> > >> On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > >>> On Thu, 31 Jan 2013, Minchan Kim wrote: > >>> > >>>> When I reviewed zswap, I was curious about frontswap_store. > >>>> It said following as. > >>>> > >>>> * If frontswap already contains a page with matching swaptype and > >>>> * offset, the frontswap implementation may either overwrite the da= ta and > >>>> * return success or invalidate the page from frontswap and return = failure. > >>>> > >>>> It didn't say why it happens. we already have __frontswap_invalidate= _page > >>>> and call it whenever swap_slot frees. If we don't free swap slot, > >>>> scan_swap_map can't find the slot for swap out so I thought overwrit= ing of > >>>> data shouldn't happen in frontswap. > >>>> > >> I am waiting Dan's reply(He will come in this week) and then, judge wh= at's > >> the best. > > Hugh is right that handling the possibility of duplicates is > > part of the tmem ABI. If there is any possibility of duplicates, > > the ABI defines how a backend must handle them to avoid data > > coherency issues. > > > > The kernel implements an in-kernel API which implements the tmem > > ABI. If the frontend and backend can always agree that duplicate >=20 > Which ABI in zcache implement that? https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.p= df The in-kernel APIs are frontswap and cleancache. For more information abou= t tmem, see http://lwn.net/Articles/454795/=20 =20 > > are never possible, I agree that the backend could avoid that > > special case. However, duplicates occur rarely enough and the > > consequences (data loss) are bad enough that I think the case > > should still be checked, at least with a BUG_ON. I also wonder > > if it is worth it to make changes to the core swap subsystem > > to avoid code to implement a zswap corner case. > > > > Remember that zswap is an oversimplified special case of tmem > > that handles only one frontend (Linux frontswap) and one backend > > (zswap). Tmem goes well beyond that and already supports other > > more general backends including Xen and ramster, and could also > > support other frontends such as a BSD or Solaris equivalent > > of frontswap, for example with a Linux ramster/zcache backend. > > I'm not sure how wise it is to tear out generic code and replace > > it with simplistic code unless there is absolutely no chance that > > the generic code will be necessary. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx109.postini.com [74.125.245.109]) by kanga.kvack.org (Postfix) with SMTP id DFFF36B0002 for ; Tue, 19 Feb 2013 21:04:04 -0500 (EST) Received: by mail-pb0-f42.google.com with SMTP id xb4so2576378pbc.1 for ; Tue, 19 Feb 2013 18:04:04 -0800 (PST) Message-ID: <51242F0D.4040201@gmail.com> Date: Wed, 20 Feb 2013 10:03:57 +0800 From: Ric Mason MIME-Version: 1.0 Subject: Re: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> In-Reply-To: <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dan Magenheimer Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton On 02/19/2013 11:27 PM, Dan Magenheimer wrote: >> From: Ric Mason [mailto:ric.masonn@gmail.com] >> Sent: Tuesday, February 19, 2013 5:12 AM >> To: Dan Magenheimer >> Cc: Minchan Kim; Hugh Dickins; Nitin Gupta; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; >> linux-kernel@vger.kernel.org; Andrew Morton >> Subject: Re: Questin about swap_slot free and invalidate page >> >> On 02/05/2013 05:28 AM, Dan Magenheimer wrote: >>>> From: Minchan Kim [mailto:minchan@kernel.org] >>>> Sent: Sunday, February 03, 2013 7:50 PM >>>> To: Hugh Dickins >>>> Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- >>>> kernel@vger.kernel.org; Andrew Morton >>>> Subject: Re: Questin about swap_slot free and invalidate page >>>> >>>> Hi Hugh, >>>> >>>> On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: >>>>> On Thu, 31 Jan 2013, Minchan Kim wrote: >>>>> >>>>>> When I reviewed zswap, I was curious about frontswap_store. >>>>>> It said following as. >>>>>> >>>>>> * If frontswap already contains a page with matching swaptype and >>>>>> * offset, the frontswap implementation may either overwrite the data and >>>>>> * return success or invalidate the page from frontswap and return failure. >>>>>> >>>>>> It didn't say why it happens. we already have __frontswap_invalidate_page >>>>>> and call it whenever swap_slot frees. If we don't free swap slot, >>>>>> scan_swap_map can't find the slot for swap out so I thought overwriting of >>>>>> data shouldn't happen in frontswap. >>>>>> >>>> I am waiting Dan's reply(He will come in this week) and then, judge what's >>>> the best. >>> Hugh is right that handling the possibility of duplicates is >>> part of the tmem ABI. If there is any possibility of duplicates, >>> the ABI defines how a backend must handle them to avoid data >>> coherency issues. >>> >>> The kernel implements an in-kernel API which implements the tmem >>> ABI. If the frontend and backend can always agree that duplicate >> Which ABI in zcache implement that? > https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf > > The in-kernel APIs are frontswap and cleancache. For more information about > tmem, see http://lwn.net/Articles/454795/ But you mentioned that you have in-kernel API which can handle duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think they just overwrite instead of optional flush the page on the second(duplicate) put as mentioned in your tmemspec. > >>> are never possible, I agree that the backend could avoid that >>> special case. However, duplicates occur rarely enough and the >>> consequences (data loss) are bad enough that I think the case >>> should still be checked, at least with a BUG_ON. I also wonder >>> if it is worth it to make changes to the core swap subsystem >>> to avoid code to implement a zswap corner case. >>> >>> Remember that zswap is an oversimplified special case of tmem >>> that handles only one frontend (Linux frontswap) and one backend >>> (zswap). Tmem goes well beyond that and already supports other >>> more general backends including Xen and ramster, and could also >>> support other frontends such as a BSD or Solaris equivalent >>> of frontswap, for example with a Linux ramster/zcache backend. >>> I'm not sure how wise it is to tear out generic code and replace >>> it with simplistic code unless there is absolutely no chance that >>> the generic code will be necessary. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx116.postini.com [74.125.245.116]) by kanga.kvack.org (Postfix) with SMTP id 263066B0002 for ; Thu, 21 Feb 2013 16:42:43 -0500 (EST) MIME-Version: 1.0 Message-ID: <7793705b-a076-4c5a-be4d-9572d7560860@default> Date: Thu, 21 Feb 2013 13:42:21 -0800 (PST) From: Dan Magenheimer Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> <51242F0D.4040201@gmail.com> In-Reply-To: <51242F0D.4040201@gmail.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Ric Mason Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton > From: Ric Mason [mailto:ric.masonn@gmail.com] > Subject: Re: Questin about swap_slot free and invalidate page >=20 > On 02/19/2013 11:27 PM, Dan Magenheimer wrote: > >> From: Ric Mason [mailto:ric.masonn@gmail.com] > >>> Hugh is right that handling the possibility of duplicates is > >>> part of the tmem ABI. If there is any possibility of duplicates, > >>> the ABI defines how a backend must handle them to avoid data > >>> coherency issues. > >>> > >>> The kernel implements an in-kernel API which implements the tmem > >>> ABI. If the frontend and backend can always agree that duplicate > >> Which ABI in zcache implement that? > > https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v0= 01.pdf > > > > The in-kernel APIs are frontswap and cleancache. For more information = about > > tmem, see http://lwn.net/Articles/454795/ >=20 > But you mentioned that you have in-kernel API which can handle > duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think > they just overwrite instead of optional flush the page on the > second(duplicate) put as mentioned in your tmemspec. Maybe I am misunderstanding your question... The spec allows overwrite (and return success) OR flush the page (and return failure). Zcache does the latter (flush). The code that implements it is in tmem_put. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx114.postini.com [74.125.245.114]) by kanga.kvack.org (Postfix) with SMTP id 24FDC6B0002 for ; Thu, 21 Feb 2013 22:13:33 -0500 (EST) Received: by mail-qa0-f48.google.com with SMTP id j8so192253qah.0 for ; Thu, 21 Feb 2013 19:13:32 -0800 (PST) Message-ID: <5126E253.2030105@gmail.com> Date: Fri, 22 Feb 2013 11:13:23 +0800 From: Ric Mason MIME-Version: 1.0 Subject: Re: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> <51242F0D.4040201@gmail.com> <7793705b-a076-4c5a-be4d-9572d7560860@default> In-Reply-To: <7793705b-a076-4c5a-be4d-9572d7560860@default> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dan Magenheimer Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton On 02/22/2013 05:42 AM, Dan Magenheimer wrote: >> From: Ric Mason [mailto:ric.masonn@gmail.com] >> Subject: Re: Questin about swap_slot free and invalidate page >> >> On 02/19/2013 11:27 PM, Dan Magenheimer wrote: >>>> From: Ric Mason [mailto:ric.masonn@gmail.com] >>>>> Hugh is right that handling the possibility of duplicates is >>>>> part of the tmem ABI. If there is any possibility of duplicates, >>>>> the ABI defines how a backend must handle them to avoid data >>>>> coherency issues. >>>>> >>>>> The kernel implements an in-kernel API which implements the tmem >>>>> ABI. If the frontend and backend can always agree that duplicate >>>> Which ABI in zcache implement that? >>> https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf >>> >>> The in-kernel APIs are frontswap and cleancache. For more information about >>> tmem, see http://lwn.net/Articles/454795/ >> But you mentioned that you have in-kernel API which can handle >> duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think >> they just overwrite instead of optional flush the page on the >> second(duplicate) put as mentioned in your tmemspec. > Maybe I am misunderstanding your question... The spec allows > overwrite (and return success) OR flush the page (and return > failure). Zcache does the latter (flush). The code that implements > it is in tmem_put. Thanks for your point out. Pers pages can have duplicate put since swap cache page can be reused. Can eph pages also have duplicate put? If yes, when can happen? > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id 349B66B0005 for ; Mon, 25 Feb 2013 12:21:14 -0500 (EST) MIME-Version: 1.0 Message-ID: <622b6670-f311-40d5-9c8d-f5dd3e03633c@default> Date: Mon, 25 Feb 2013 09:20:55 -0800 (PST) From: Dan Magenheimer Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> <51242F0D.4040201@gmail.com> <7793705b-a076-4c5a-be4d-9572d7560860@default> <5126E253.2030105@gmail.com> In-Reply-To: <5126E253.2030105@gmail.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Ric Mason Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton > From: Ric Mason [mailto:ric.masonn@gmail.com] > Subject: Re: Questin about swap_slot free and invalidate page >=20 > On 02/22/2013 05:42 AM, Dan Magenheimer wrote: > >> From: Ric Mason [mailto:ric.masonn@gmail.com] > >> Subject: Re: Questin about swap_slot free and invalidate page > >> > >> On 02/19/2013 11:27 PM, Dan Magenheimer wrote: > >>>> From: Ric Mason [mailto:ric.masonn@gmail.com] > >>>>> Hugh is right that handling the possibility of duplicates is > >>>>> part of the tmem ABI. If there is any possibility of duplicates, > >>>>> the ABI defines how a backend must handle them to avoid data > >>>>> coherency issues. > >>>>> > >>>>> The kernel implements an in-kernel API which implements the tmem > >>>>> ABI. If the frontend and backend can always agree that duplicate > >>>> Which ABI in zcache implement that? > >>> https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-= v001.pdf > >>> > >>> The in-kernel APIs are frontswap and cleancache. For more informatio= n about > >>> tmem, see http://lwn.net/Articles/454795/ > >> But you mentioned that you have in-kernel API which can handle > >> duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think > >> they just overwrite instead of optional flush the page on the > >> second(duplicate) put as mentioned in your tmemspec. > > Maybe I am misunderstanding your question... The spec allows > > overwrite (and return success) OR flush the page (and return > > failure). Zcache does the latter (flush). The code that implements > > it is in tmem_put. >=20 > Thanks for your point out. Pers pages can have duplicate put since swap > cache page can be reused. Can eph pages also have duplicate put? If yes, > when can happen? Yes, I have seen it. I am not sure of the exact circumstances when it happens as I am not an expert in the VFS subsystem. (Chris Mason wrote the VFS cleancache hooks in 2009.) Dan -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752073Ab3AaFLp (ORCPT ); Thu, 31 Jan 2013 00:11:45 -0500 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:47804 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751772Ab3AaFLn (ORCPT ); Thu, 31 Jan 2013 00:11:43 -0500 X-AuditID: 9c93016f-b7b70ae000000e36-a5-5109fd0d4fdc Date: Thu, 31 Jan 2013 14:11:40 +0900 From: Minchan Kim To: Nitin Gupta , Dan Magenheimer , Seth Jennings , Hugh Dickins , Konrad Rzeszutek Wilk Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: Questin about swap_slot free and invalidate page Message-ID: <20130131051140.GB23548@blaptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When I reviewed zswap, I was curious about frontswap_store. It said following as. * If frontswap already contains a page with matching swaptype and * offset, the frontswap implementation may either overwrite the data and * return success or invalidate the page from frontswap and return failure. It didn't say why it happens. we already have __frontswap_invalidate_page and call it whenever swap_slot frees. If we don't free swap slot, scan_swap_map can't find the slot for swap out so I thought overwriting of data shouldn't happen in frontswap. As I looked the code, the curplit is reuse_swap_page. It couldn't free swap slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page so data overwriting on frontswap can happen. I'm not sure frontswap guys already discussed it long time ago. If we can fix it, we can remove duplication entry handling logic in all of backend of frontswap. All of backend should handle it although it's pretty rare. Of course, zram could be fixed. It might be trivial now but more there are many backend of frontswap, more it would be a headache. If we are trying to fix it in swap layer, we might fix it following as int reuse_swap_page(struct page *page) { .. .. if (count == 1) { if (!PageWriteback(page)) { delete_from_swap_cache(page); SetPageDirty(page); } else { frontswap_invalidate_page(); swap_slot_free_notify(); } } } But not sure, it is worth at the moment and there might be other places to be fixed.(I hope Hugh can point out if we are missing something if he has a time) If we are reluctant to it, at least, we should write out comment above frontswap_store about that to notice curious guys who spend many time to know WHY and smart guys who are going to fix it with nice way. Mr. Frontswap, What do you think about it? -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753931Ab3BDBwg (ORCPT ); Sun, 3 Feb 2013 20:52:36 -0500 Received: from mail-da0-f45.google.com ([209.85.210.45]:38363 "EHLO mail-da0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753781Ab3BDBvR (ORCPT ); Sun, 3 Feb 2013 20:51:17 -0500 Date: Sun, 3 Feb 2013 17:51:14 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Minchan Kim cc: Nitin Gupta , Dan Magenheimer , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: Questin about swap_slot free and invalidate page In-Reply-To: <20130131051140.GB23548@blaptop> Message-ID: References: <20130131051140.GB23548@blaptop> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 31 Jan 2013, Minchan Kim wrote: > When I reviewed zswap, I was curious about frontswap_store. > It said following as. > > * If frontswap already contains a page with matching swaptype and > * offset, the frontswap implementation may either overwrite the data and > * return success or invalidate the page from frontswap and return failure. > > It didn't say why it happens. we already have __frontswap_invalidate_page > and call it whenever swap_slot frees. If we don't free swap slot, > scan_swap_map can't find the slot for swap out so I thought overwriting of > data shouldn't happen in frontswap. > > As I looked the code, the curplit is reuse_swap_page. It couldn't free swap > slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page > so data overwriting on frontswap can happen. I'm not sure frontswap guys > already discussed it long time ago. > > If we can fix it, we can remove duplication entry handling logic > in all of backend of frontswap. All of backend should handle it although > it's pretty rare. Of course, zram could be fixed. It might be trivial now > but more there are many backend of frontswap, more it would be a headache. > > If we are trying to fix it in swap layer, we might fix it following as > > int reuse_swap_page(struct page *page) > { > .. > .. > if (count == 1) { > if (!PageWriteback(page)) { > delete_from_swap_cache(page); > SetPageDirty(page); > } else { > frontswap_invalidate_page(); > swap_slot_free_notify(); > } > } > } > > But not sure, it is worth at the moment and there might be other places > to be fixed.(I hope Hugh can point out if we are missing something if he > has a time) I expect you are right that reuse_swap_page() is the only way it would happen for frontswap; but I'm too unfamiliar with frontswap to promise you that - it's better that you insert WARN_ONs in your testing to verify. But I think it's a general tmem property, isn't it? To define what happens if you do give it the same key again. So I doubt it's something that has to be fixed; but if you do find it helpful to fix it, bear in mind that reuse_swap_page() is an odd corner, which may one day give the "stable pages" DIF/DIX people trouble, though they've not yet complained. I'd prefer a patch not specific to frontswap, but along the lines below: I think that's the most robust way to express it, though I don't think the (count == 0) case can actually occur inside that block (whereas count == 0 certainly can occur in the !PageSwapCache case). I believe that I once upon a time took statistics of how often the PageWriteback case happens here, and concluded that it wasn't often enough that refusing to reuse in this case would be likely to slow anyone down noticeably. > > If we are reluctant to it, at least, we should write out comment above > frontswap_store about that to notice curious guys who spend many > time to know WHY and smart guys who are going to fix it with nice way. > > Mr. Frontswap, What do you think about it? He's not me of course :) Hugh --- 3.8-rc6/mm/swapfile.c 2012-12-22 09:43:27.668015583 -0800 +++ linux/mm/swapfile.c 2013-02-03 17:31:04.148181857 -0800 @@ -637,8 +637,11 @@ int reuse_swap_page(struct page *page) return 0; count = page_mapcount(page); if (count <= 1 && PageSwapCache(page)) { - count += page_swapcount(page); - if (count == 1 && !PageWriteback(page)) { + if (PageWriteback(page)) + count = 2; /* not safe yet to free its swap */ + else + count += page_swapcount(page); + if (count <= 1) { delete_from_swap_cache(page); SetPageDirty(page); } From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753905Ab3BDCtz (ORCPT ); Sun, 3 Feb 2013 21:49:55 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:61811 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753761Ab3BDCtx (ORCPT ); Sun, 3 Feb 2013 21:49:53 -0500 X-AuditID: 9c930197-b7ca4ae000006ba8-14-510f21cf1bbb Date: Mon, 4 Feb 2013 11:49:50 +0900 From: Minchan Kim To: Hugh Dickins Cc: Nitin Gupta , Dan Magenheimer , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: Questin about swap_slot free and invalidate page Message-ID: <20130204024950.GD2688@blaptop> References: <20130131051140.GB23548@blaptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Hugh, On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > On Thu, 31 Jan 2013, Minchan Kim wrote: > > > When I reviewed zswap, I was curious about frontswap_store. > > It said following as. > > > > * If frontswap already contains a page with matching swaptype and > > * offset, the frontswap implementation may either overwrite the data and > > * return success or invalidate the page from frontswap and return failure. > > > > It didn't say why it happens. we already have __frontswap_invalidate_page > > and call it whenever swap_slot frees. If we don't free swap slot, > > scan_swap_map can't find the slot for swap out so I thought overwriting of > > data shouldn't happen in frontswap. > > > > As I looked the code, the curplit is reuse_swap_page. It couldn't free swap > > slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page > > so data overwriting on frontswap can happen. I'm not sure frontswap guys > > already discussed it long time ago. > > > > If we can fix it, we can remove duplication entry handling logic > > in all of backend of frontswap. All of backend should handle it although > > it's pretty rare. Of course, zram could be fixed. It might be trivial now > > but more there are many backend of frontswap, more it would be a headache. > > > > If we are trying to fix it in swap layer, we might fix it following as > > > > int reuse_swap_page(struct page *page) > > { > > .. > > .. > > if (count == 1) { > > if (!PageWriteback(page)) { > > delete_from_swap_cache(page); > > SetPageDirty(page); > > } else { > > frontswap_invalidate_page(); > > swap_slot_free_notify(); > > } > > } > > } > > > > But not sure, it is worth at the moment and there might be other places > > to be fixed.(I hope Hugh can point out if we are missing something if he > > has a time) > > I expect you are right that reuse_swap_page() is the only way it would > happen for frontswap; but I'm too unfamiliar with frontswap to promise > you that - it's better that you insert WARN_ONs in your testing to verify. > > But I think it's a general tmem property, isn't it? To define what > happens if you do give it the same key again. So I doubt it's something I am too unfamiliar with tmem property but thing I am seeing is EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very coupled with swap subsystem. > that has to be fixed; but if you do find it helpful to fix it, bear in > mind that reuse_swap_page() is an odd corner, which may one day give the > "stable pages" DIF/DIX people trouble, though they've not yet complained. > > I'd prefer a patch not specific to frontswap, but along the lines below: > I think that's the most robust way to express it, though I don't think > the (count == 0) case can actually occur inside that block (whereas > count == 0 certainly can occur in the !PageSwapCache case). > > I believe that I once upon a time took statistics of how often the > PageWriteback case happens here, and concluded that it wasn't often > enough that refusing to reuse in this case would be likely to slow > anyone down noticeably. I agree. I had a test about that with zram and that case wasn't common. so your patch looks good to me. I am waiting Dan's reply(He will come in this week) and then, judge what's the best. Thanks! > > > > > If we are reluctant to it, at least, we should write out comment above > > frontswap_store about that to notice curious guys who spend many > > time to know WHY and smart guys who are going to fix it with nice way. > > > > Mr. Frontswap, What do you think about it? > > He's not me of course :) > > Hugh > > --- 3.8-rc6/mm/swapfile.c 2012-12-22 09:43:27.668015583 -0800 > +++ linux/mm/swapfile.c 2013-02-03 17:31:04.148181857 -0800 > @@ -637,8 +637,11 @@ int reuse_swap_page(struct page *page) > return 0; > count = page_mapcount(page); > if (count <= 1 && PageSwapCache(page)) { > - count += page_swapcount(page); > - if (count == 1 && !PageWriteback(page)) { > + if (PageWriteback(page)) > + count = 2; /* not safe yet to free its swap */ > + else > + count += page_swapcount(page); > + if (count <= 1) { > delete_from_swap_cache(page); > SetPageDirty(page); > } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755262Ab3BDV3L (ORCPT ); Mon, 4 Feb 2013 16:29:11 -0500 Received: from aserp1040.oracle.com ([141.146.126.69]:17280 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754610Ab3BDV3K convert rfc822-to-8bit (ORCPT ); Mon, 4 Feb 2013 16:29:10 -0500 MIME-Version: 1.0 Message-ID: Date: Mon, 4 Feb 2013 13:28:55 -0800 (PST) From: Dan Magenheimer To: Minchan Kim , Hugh Dickins Cc: Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> In-Reply-To: <20130204024950.GD2688@blaptop> X-Priority: 3 X-Mailer: Oracle Beehive Extensions for Outlook 2.0.1.7 (607090) [OL 12.0.6665.5003 (x86)] Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8BIT X-Source-IP: ucsinet21.oracle.com [156.151.31.93] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > From: Minchan Kim [mailto:minchan@kernel.org] > Sent: Sunday, February 03, 2013 7:50 PM > To: Hugh Dickins > Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- > kernel@vger.kernel.org; Andrew Morton > Subject: Re: Questin about swap_slot free and invalidate page > > Hi Hugh, > > On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > > On Thu, 31 Jan 2013, Minchan Kim wrote: > > > > > When I reviewed zswap, I was curious about frontswap_store. > > > It said following as. > > > > > > * If frontswap already contains a page with matching swaptype and > > > * offset, the frontswap implementation may either overwrite the data and > > > * return success or invalidate the page from frontswap and return failure. > > > > > > It didn't say why it happens. we already have __frontswap_invalidate_page > > > and call it whenever swap_slot frees. If we don't free swap slot, > > > scan_swap_map can't find the slot for swap out so I thought overwriting of > > > data shouldn't happen in frontswap. > > > > > > As I looked the code, the curplit is reuse_swap_page. It couldn't free swap > > > slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page > > > so data overwriting on frontswap can happen. I'm not sure frontswap guys > > > already discussed it long time ago. > > > > > > If we can fix it, we can remove duplication entry handling logic > > > in all of backend of frontswap. All of backend should handle it although > > > it's pretty rare. Of course, zram could be fixed. It might be trivial now > > > but more there are many backend of frontswap, more it would be a headache. > > > > > > If we are trying to fix it in swap layer, we might fix it following as > > > > > > int reuse_swap_page(struct page *page) > > > { > > > .. > > > .. > > > if (count == 1) { > > > if (!PageWriteback(page)) { > > > delete_from_swap_cache(page); > > > SetPageDirty(page); > > > } else { > > > frontswap_invalidate_page(); > > > swap_slot_free_notify(); > > > } > > > } > > > } > > > > > > But not sure, it is worth at the moment and there might be other places > > > to be fixed.(I hope Hugh can point out if we are missing something if he > > > has a time) > > > > I expect you are right that reuse_swap_page() is the only way it would > > happen for frontswap; but I'm too unfamiliar with frontswap to promise > > you that - it's better that you insert WARN_ONs in your testing to verify. > > > > But I think it's a general tmem property, isn't it? To define what > > happens if you do give it the same key again. So I doubt it's something > > I am too unfamiliar with tmem property but thing I am seeing is > EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very > coupled with swap subsystem. > > > that has to be fixed; but if you do find it helpful to fix it, bear in > > mind that reuse_swap_page() is an odd corner, which may one day give the > > "stable pages" DIF/DIX people trouble, though they've not yet complained. > > > > I'd prefer a patch not specific to frontswap, but along the lines below: > > I think that's the most robust way to express it, though I don't think > > the (count == 0) case can actually occur inside that block (whereas > > count == 0 certainly can occur in the !PageSwapCache case). > > > > I believe that I once upon a time took statistics of how often the > > PageWriteback case happens here, and concluded that it wasn't often > > enough that refusing to reuse in this case would be likely to slow > > anyone down noticeably. > > I agree. I had a test about that with zram and that case wasn't common. > so your patch looks good to me. > > I am waiting Dan's reply(He will come in this week) and then, judge what's > the best. Hugh is right that handling the possibility of duplicates is part of the tmem ABI. If there is any possibility of duplicates, the ABI defines how a backend must handle them to avoid data coherency issues. The kernel implements an in-kernel API which implements the tmem ABI. If the frontend and backend can always agree that duplicates are never possible, I agree that the backend could avoid that special case. However, duplicates occur rarely enough and the consequences (data loss) are bad enough that I think the case should still be checked, at least with a BUG_ON. I also wonder if it is worth it to make changes to the core swap subsystem to avoid code to implement a zswap corner case. Remember that zswap is an oversimplified special case of tmem that handles only one frontend (Linux frontswap) and one backend (zswap). Tmem goes well beyond that and already supports other more general backends including Xen and ramster, and could also support other frontends such as a BSD or Solaris equivalent of frontswap, for example with a Linux ramster/zcache backend. I'm not sure how wise it is to tear out generic code and replace it with simplistic code unless there is absolutely no chance that the generic code will be necessary. My two cents, Dan From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755635Ab3BEBYo (ORCPT ); Mon, 4 Feb 2013 20:24:44 -0500 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:49649 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754594Ab3BEBYn (ORCPT ); Mon, 4 Feb 2013 20:24:43 -0500 X-AuditID: 9c930197-b7ca4ae000006ba8-9b-51105f586f90 Date: Tue, 5 Feb 2013 10:24:40 +0900 From: Minchan Kim To: Dan Magenheimer Cc: Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: Questin about swap_slot free and invalidate page Message-ID: <20130205012440.GF2610@blaptop> References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 04, 2013 at 01:28:55PM -0800, Dan Magenheimer wrote: > > From: Minchan Kim [mailto:minchan@kernel.org] > > Sent: Sunday, February 03, 2013 7:50 PM > > To: Hugh Dickins > > Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- > > kernel@vger.kernel.org; Andrew Morton > > Subject: Re: Questin about swap_slot free and invalidate page > > > > Hi Hugh, > > > > On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > > > On Thu, 31 Jan 2013, Minchan Kim wrote: > > > > > > > When I reviewed zswap, I was curious about frontswap_store. > > > > It said following as. > > > > > > > > * If frontswap already contains a page with matching swaptype and > > > > * offset, the frontswap implementation may either overwrite the data and > > > > * return success or invalidate the page from frontswap and return failure. > > > > > > > > It didn't say why it happens. we already have __frontswap_invalidate_page > > > > and call it whenever swap_slot frees. If we don't free swap slot, > > > > scan_swap_map can't find the slot for swap out so I thought overwriting of > > > > data shouldn't happen in frontswap. > > > > > > > > As I looked the code, the curplit is reuse_swap_page. It couldn't free swap > > > > slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page > > > > so data overwriting on frontswap can happen. I'm not sure frontswap guys > > > > already discussed it long time ago. > > > > > > > > If we can fix it, we can remove duplication entry handling logic > > > > in all of backend of frontswap. All of backend should handle it although > > > > it's pretty rare. Of course, zram could be fixed. It might be trivial now > > > > but more there are many backend of frontswap, more it would be a headache. > > > > > > > > If we are trying to fix it in swap layer, we might fix it following as > > > > > > > > int reuse_swap_page(struct page *page) > > > > { > > > > .. > > > > .. > > > > if (count == 1) { > > > > if (!PageWriteback(page)) { > > > > delete_from_swap_cache(page); > > > > SetPageDirty(page); > > > > } else { > > > > frontswap_invalidate_page(); > > > > swap_slot_free_notify(); > > > > } > > > > } > > > > } > > > > > > > > But not sure, it is worth at the moment and there might be other places > > > > to be fixed.(I hope Hugh can point out if we are missing something if he > > > > has a time) > > > > > > I expect you are right that reuse_swap_page() is the only way it would > > > happen for frontswap; but I'm too unfamiliar with frontswap to promise > > > you that - it's better that you insert WARN_ONs in your testing to verify. > > > > > > But I think it's a general tmem property, isn't it? To define what > > > happens if you do give it the same key again. So I doubt it's something > > > > I am too unfamiliar with tmem property but thing I am seeing is > > EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very > > coupled with swap subsystem. > > > > > that has to be fixed; but if you do find it helpful to fix it, bear in > > > mind that reuse_swap_page() is an odd corner, which may one day give the > > > "stable pages" DIF/DIX people trouble, though they've not yet complained. > > > > > > I'd prefer a patch not specific to frontswap, but along the lines below: > > > I think that's the most robust way to express it, though I don't think > > > the (count == 0) case can actually occur inside that block (whereas > > > count == 0 certainly can occur in the !PageSwapCache case). > > > > > > I believe that I once upon a time took statistics of how often the > > > PageWriteback case happens here, and concluded that it wasn't often > > > enough that refusing to reuse in this case would be likely to slow > > > anyone down noticeably. > > > > I agree. I had a test about that with zram and that case wasn't common. > > so your patch looks good to me. > > > > I am waiting Dan's reply(He will come in this week) and then, judge what's > > the best. > > Hugh is right that handling the possibility of duplicates is > part of the tmem ABI. If there is any possibility of duplicates, > the ABI defines how a backend must handle them to avoid data > coherency issues. > > The kernel implements an in-kernel API which implements the tmem > ABI. If the frontend and backend can always agree that duplicates > are never possible, I agree that the backend could avoid that > special case. However, duplicates occur rarely enough and the > consequences (data loss) are bad enough that I think the case > should still be checked, at least with a BUG_ON. I also wonder > if it is worth it to make changes to the core swap subsystem > to avoid code to implement a zswap corner case. It wasn't only zswap but it could be applied zram, too. > > Remember that zswap is an oversimplified special case of tmem > that handles only one frontend (Linux frontswap) and one backend > (zswap). Tmem goes well beyond that and already supports other > more general backends including Xen and ramster, and could also > support other frontends such as a BSD or Solaris equivalent > of frontswap, for example with a Linux ramster/zcache backend. > I'm not sure how wise it is to tear out generic code and replace > it with simplistic code unless there is absolutely no chance that > the generic code will be necessary. Fair enough. Thanks for clarifying that, Hugh, Dan. > > My two cents, > Dan > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932738Ab3BSMS0 (ORCPT ); Tue, 19 Feb 2013 07:18:26 -0500 Received: from mail-qe0-f43.google.com ([209.85.128.43]:51583 "EHLO mail-qe0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932535Ab3BSMSY (ORCPT ); Tue, 19 Feb 2013 07:18:24 -0500 Message-ID: <51236C11.1010208@gmail.com> Date: Tue, 19 Feb 2013 20:12:01 +0800 From: Ric Mason User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2 MIME-Version: 1.0 To: Dan Magenheimer CC: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/05/2013 05:28 AM, Dan Magenheimer wrote: >> From: Minchan Kim [mailto:minchan@kernel.org] >> Sent: Sunday, February 03, 2013 7:50 PM >> To: Hugh Dickins >> Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- >> kernel@vger.kernel.org; Andrew Morton >> Subject: Re: Questin about swap_slot free and invalidate page >> >> Hi Hugh, >> >> On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: >>> On Thu, 31 Jan 2013, Minchan Kim wrote: >>> >>>> When I reviewed zswap, I was curious about frontswap_store. >>>> It said following as. >>>> >>>> * If frontswap already contains a page with matching swaptype and >>>> * offset, the frontswap implementation may either overwrite the data and >>>> * return success or invalidate the page from frontswap and return failure. >>>> >>>> It didn't say why it happens. we already have __frontswap_invalidate_page >>>> and call it whenever swap_slot frees. If we don't free swap slot, >>>> scan_swap_map can't find the slot for swap out so I thought overwriting of >>>> data shouldn't happen in frontswap. >>>> >>>> As I looked the code, the curplit is reuse_swap_page. It couldn't free swap >>>> slot if the page founded is PG_writeback but miss calling frontswap_invalidate_page >>>> so data overwriting on frontswap can happen. I'm not sure frontswap guys >>>> already discussed it long time ago. >>>> >>>> If we can fix it, we can remove duplication entry handling logic >>>> in all of backend of frontswap. All of backend should handle it although >>>> it's pretty rare. Of course, zram could be fixed. It might be trivial now >>>> but more there are many backend of frontswap, more it would be a headache. >>>> >>>> If we are trying to fix it in swap layer, we might fix it following as >>>> >>>> int reuse_swap_page(struct page *page) >>>> { >>>> .. >>>> .. >>>> if (count == 1) { >>>> if (!PageWriteback(page)) { >>>> delete_from_swap_cache(page); >>>> SetPageDirty(page); >>>> } else { >>>> frontswap_invalidate_page(); >>>> swap_slot_free_notify(); >>>> } >>>> } >>>> } >>>> >>>> But not sure, it is worth at the moment and there might be other places >>>> to be fixed.(I hope Hugh can point out if we are missing something if he >>>> has a time) >>> I expect you are right that reuse_swap_page() is the only way it would >>> happen for frontswap; but I'm too unfamiliar with frontswap to promise >>> you that - it's better that you insert WARN_ONs in your testing to verify. >>> >>> But I think it's a general tmem property, isn't it? To define what >>> happens if you do give it the same key again. So I doubt it's something >> I am too unfamiliar with tmem property but thing I am seeing is >> EXPORT_SYMBOL(__frontswap_store). It's a one of frontend and is tighly very >> coupled with swap subsystem. >> >>> that has to be fixed; but if you do find it helpful to fix it, bear in >>> mind that reuse_swap_page() is an odd corner, which may one day give the >>> "stable pages" DIF/DIX people trouble, though they've not yet complained. >>> >>> I'd prefer a patch not specific to frontswap, but along the lines below: >>> I think that's the most robust way to express it, though I don't think >>> the (count == 0) case can actually occur inside that block (whereas >>> count == 0 certainly can occur in the !PageSwapCache case). >>> >>> I believe that I once upon a time took statistics of how often the >>> PageWriteback case happens here, and concluded that it wasn't often >>> enough that refusing to reuse in this case would be likely to slow >>> anyone down noticeably. >> I agree. I had a test about that with zram and that case wasn't common. >> so your patch looks good to me. >> >> I am waiting Dan's reply(He will come in this week) and then, judge what's >> the best. > Hugh is right that handling the possibility of duplicates is > part of the tmem ABI. If there is any possibility of duplicates, > the ABI defines how a backend must handle them to avoid data > coherency issues. > > The kernel implements an in-kernel API which implements the tmem > ABI. If the frontend and backend can always agree that duplicate Which ABI in zcache implement that? > are never possible, I agree that the backend could avoid that > special case. However, duplicates occur rarely enough and the > consequences (data loss) are bad enough that I think the case > should still be checked, at least with a BUG_ON. I also wonder > if it is worth it to make changes to the core swap subsystem > to avoid code to implement a zswap corner case. > > Remember that zswap is an oversimplified special case of tmem > that handles only one frontend (Linux frontswap) and one backend > (zswap). Tmem goes well beyond that and already supports other > more general backends including Xen and ramster, and could also > support other frontends such as a BSD or Solaris equivalent > of frontswap, for example with a Linux ramster/zcache backend. > I'm not sure how wise it is to tear out generic code and replace > it with simplistic code unless there is absolutely no chance that > the generic code will be necessary. > > My two cents, > Dan > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933150Ab3BSPaM (ORCPT ); Tue, 19 Feb 2013 10:30:12 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:19354 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933114Ab3BSPaJ convert rfc822-to-8bit (ORCPT ); Tue, 19 Feb 2013 10:30:09 -0500 MIME-Version: 1.0 Message-ID: <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> Date: Tue, 19 Feb 2013 07:27:24 -0800 (PST) From: Dan Magenheimer To: Ric Mason Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> In-Reply-To: <51236C11.1010208@gmail.com> X-Priority: 3 X-Mailer: Oracle Beehive Extensions for Outlook 2.0.1.7 (607090) [OL 12.0.6665.5003 (x86)] Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8BIT X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > From: Ric Mason [mailto:ric.masonn@gmail.com] > Sent: Tuesday, February 19, 2013 5:12 AM > To: Dan Magenheimer > Cc: Minchan Kim; Hugh Dickins; Nitin Gupta; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; > linux-kernel@vger.kernel.org; Andrew Morton > Subject: Re: Questin about swap_slot free and invalidate page > > On 02/05/2013 05:28 AM, Dan Magenheimer wrote: > >> From: Minchan Kim [mailto:minchan@kernel.org] > >> Sent: Sunday, February 03, 2013 7:50 PM > >> To: Hugh Dickins > >> Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- > >> kernel@vger.kernel.org; Andrew Morton > >> Subject: Re: Questin about swap_slot free and invalidate page > >> > >> Hi Hugh, > >> > >> On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: > >>> On Thu, 31 Jan 2013, Minchan Kim wrote: > >>> > >>>> When I reviewed zswap, I was curious about frontswap_store. > >>>> It said following as. > >>>> > >>>> * If frontswap already contains a page with matching swaptype and > >>>> * offset, the frontswap implementation may either overwrite the data and > >>>> * return success or invalidate the page from frontswap and return failure. > >>>> > >>>> It didn't say why it happens. we already have __frontswap_invalidate_page > >>>> and call it whenever swap_slot frees. If we don't free swap slot, > >>>> scan_swap_map can't find the slot for swap out so I thought overwriting of > >>>> data shouldn't happen in frontswap. > >>>> > >> I am waiting Dan's reply(He will come in this week) and then, judge what's > >> the best. > > Hugh is right that handling the possibility of duplicates is > > part of the tmem ABI. If there is any possibility of duplicates, > > the ABI defines how a backend must handle them to avoid data > > coherency issues. > > > > The kernel implements an in-kernel API which implements the tmem > > ABI. If the frontend and backend can always agree that duplicate > > Which ABI in zcache implement that? https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf The in-kernel APIs are frontswap and cleancache. For more information about tmem, see http://lwn.net/Articles/454795/ > > are never possible, I agree that the backend could avoid that > > special case. However, duplicates occur rarely enough and the > > consequences (data loss) are bad enough that I think the case > > should still be checked, at least with a BUG_ON. I also wonder > > if it is worth it to make changes to the core swap subsystem > > to avoid code to implement a zswap corner case. > > > > Remember that zswap is an oversimplified special case of tmem > > that handles only one frontend (Linux frontswap) and one backend > > (zswap). Tmem goes well beyond that and already supports other > > more general backends including Xen and ramster, and could also > > support other frontends such as a BSD or Solaris equivalent > > of frontswap, for example with a Linux ramster/zcache backend. > > I'm not sure how wise it is to tear out generic code and replace > > it with simplistic code unless there is absolutely no chance that > > the generic code will be necessary. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934404Ab3BTCEG (ORCPT ); Tue, 19 Feb 2013 21:04:06 -0500 Received: from mail-da0-f41.google.com ([209.85.210.41]:55219 "EHLO mail-da0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933498Ab3BTCEF (ORCPT ); Tue, 19 Feb 2013 21:04:05 -0500 Message-ID: <51242F0D.4040201@gmail.com> Date: Wed, 20 Feb 2013 10:03:57 +0800 From: Ric Mason User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2 MIME-Version: 1.0 To: Dan Magenheimer CC: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> In-Reply-To: <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/19/2013 11:27 PM, Dan Magenheimer wrote: >> From: Ric Mason [mailto:ric.masonn@gmail.com] >> Sent: Tuesday, February 19, 2013 5:12 AM >> To: Dan Magenheimer >> Cc: Minchan Kim; Hugh Dickins; Nitin Gupta; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; >> linux-kernel@vger.kernel.org; Andrew Morton >> Subject: Re: Questin about swap_slot free and invalidate page >> >> On 02/05/2013 05:28 AM, Dan Magenheimer wrote: >>>> From: Minchan Kim [mailto:minchan@kernel.org] >>>> Sent: Sunday, February 03, 2013 7:50 PM >>>> To: Hugh Dickins >>>> Cc: Nitin Gupta; Dan Magenheimer; Seth Jennings; Konrad Rzeszutek Wilk; linux-mm@kvack.org; linux- >>>> kernel@vger.kernel.org; Andrew Morton >>>> Subject: Re: Questin about swap_slot free and invalidate page >>>> >>>> Hi Hugh, >>>> >>>> On Sun, Feb 03, 2013 at 05:51:14PM -0800, Hugh Dickins wrote: >>>>> On Thu, 31 Jan 2013, Minchan Kim wrote: >>>>> >>>>>> When I reviewed zswap, I was curious about frontswap_store. >>>>>> It said following as. >>>>>> >>>>>> * If frontswap already contains a page with matching swaptype and >>>>>> * offset, the frontswap implementation may either overwrite the data and >>>>>> * return success or invalidate the page from frontswap and return failure. >>>>>> >>>>>> It didn't say why it happens. we already have __frontswap_invalidate_page >>>>>> and call it whenever swap_slot frees. If we don't free swap slot, >>>>>> scan_swap_map can't find the slot for swap out so I thought overwriting of >>>>>> data shouldn't happen in frontswap. >>>>>> >>>> I am waiting Dan's reply(He will come in this week) and then, judge what's >>>> the best. >>> Hugh is right that handling the possibility of duplicates is >>> part of the tmem ABI. If there is any possibility of duplicates, >>> the ABI defines how a backend must handle them to avoid data >>> coherency issues. >>> >>> The kernel implements an in-kernel API which implements the tmem >>> ABI. If the frontend and backend can always agree that duplicate >> Which ABI in zcache implement that? > https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf > > The in-kernel APIs are frontswap and cleancache. For more information about > tmem, see http://lwn.net/Articles/454795/ But you mentioned that you have in-kernel API which can handle duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think they just overwrite instead of optional flush the page on the second(duplicate) put as mentioned in your tmemspec. > >>> are never possible, I agree that the backend could avoid that >>> special case. However, duplicates occur rarely enough and the >>> consequences (data loss) are bad enough that I think the case >>> should still be checked, at least with a BUG_ON. I also wonder >>> if it is worth it to make changes to the core swap subsystem >>> to avoid code to implement a zswap corner case. >>> >>> Remember that zswap is an oversimplified special case of tmem >>> that handles only one frontend (Linux frontswap) and one backend >>> (zswap). Tmem goes well beyond that and already supports other >>> more general backends including Xen and ramster, and could also >>> support other frontends such as a BSD or Solaris equivalent >>> of frontswap, for example with a Linux ramster/zcache backend. >>> I'm not sure how wise it is to tear out generic code and replace >>> it with simplistic code unless there is absolutely no chance that >>> the generic code will be necessary. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755030Ab3BUVms (ORCPT ); Thu, 21 Feb 2013 16:42:48 -0500 Received: from aserp1040.oracle.com ([141.146.126.69]:33021 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754654Ab3BUVmr convert rfc822-to-8bit (ORCPT ); Thu, 21 Feb 2013 16:42:47 -0500 MIME-Version: 1.0 Message-ID: <7793705b-a076-4c5a-be4d-9572d7560860@default> Date: Thu, 21 Feb 2013 13:42:21 -0800 (PST) From: Dan Magenheimer To: Ric Mason Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> <51242F0D.4040201@gmail.com> In-Reply-To: <51242F0D.4040201@gmail.com> X-Priority: 3 X-Mailer: Oracle Beehive Extensions for Outlook 2.0.1.7 (607090) [OL 12.0.6665.5003 (x86)] Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8BIT X-Source-IP: acsinet21.oracle.com [141.146.126.237] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > From: Ric Mason [mailto:ric.masonn@gmail.com] > Subject: Re: Questin about swap_slot free and invalidate page > > On 02/19/2013 11:27 PM, Dan Magenheimer wrote: > >> From: Ric Mason [mailto:ric.masonn@gmail.com] > >>> Hugh is right that handling the possibility of duplicates is > >>> part of the tmem ABI. If there is any possibility of duplicates, > >>> the ABI defines how a backend must handle them to avoid data > >>> coherency issues. > >>> > >>> The kernel implements an in-kernel API which implements the tmem > >>> ABI. If the frontend and backend can always agree that duplicate > >> Which ABI in zcache implement that? > > https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf > > > > The in-kernel APIs are frontswap and cleancache. For more information about > > tmem, see http://lwn.net/Articles/454795/ > > But you mentioned that you have in-kernel API which can handle > duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think > they just overwrite instead of optional flush the page on the > second(duplicate) put as mentioned in your tmemspec. Maybe I am misunderstanding your question... The spec allows overwrite (and return success) OR flush the page (and return failure). Zcache does the latter (flush). The code that implements it is in tmem_put. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755760Ab3BVDNf (ORCPT ); Thu, 21 Feb 2013 22:13:35 -0500 Received: from mail-qc0-f182.google.com ([209.85.216.182]:49121 "EHLO mail-qc0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753513Ab3BVDNc (ORCPT ); Thu, 21 Feb 2013 22:13:32 -0500 Message-ID: <5126E253.2030105@gmail.com> Date: Fri, 22 Feb 2013 11:13:23 +0800 From: Ric Mason User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106 Thunderbird/17.0.2 MIME-Version: 1.0 To: Dan Magenheimer CC: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> <51242F0D.4040201@gmail.com> <7793705b-a076-4c5a-be4d-9572d7560860@default> In-Reply-To: <7793705b-a076-4c5a-be4d-9572d7560860@default> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/22/2013 05:42 AM, Dan Magenheimer wrote: >> From: Ric Mason [mailto:ric.masonn@gmail.com] >> Subject: Re: Questin about swap_slot free and invalidate page >> >> On 02/19/2013 11:27 PM, Dan Magenheimer wrote: >>>> From: Ric Mason [mailto:ric.masonn@gmail.com] >>>>> Hugh is right that handling the possibility of duplicates is >>>>> part of the tmem ABI. If there is any possibility of duplicates, >>>>> the ABI defines how a backend must handle them to avoid data >>>>> coherency issues. >>>>> >>>>> The kernel implements an in-kernel API which implements the tmem >>>>> ABI. If the frontend and backend can always agree that duplicate >>>> Which ABI in zcache implement that? >>> https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf >>> >>> The in-kernel APIs are frontswap and cleancache. For more information about >>> tmem, see http://lwn.net/Articles/454795/ >> But you mentioned that you have in-kernel API which can handle >> duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think >> they just overwrite instead of optional flush the page on the >> second(duplicate) put as mentioned in your tmemspec. > Maybe I am misunderstanding your question... The spec allows > overwrite (and return success) OR flush the page (and return > failure). Zcache does the latter (flush). The code that implements > it is in tmem_put. Thanks for your point out. Pers pages can have duplicate put since swap cache page can be reused. Can eph pages also have duplicate put? If yes, when can happen? > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758173Ab3BYR1N (ORCPT ); Mon, 25 Feb 2013 12:27:13 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:49552 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753101Ab3BYR1K convert rfc822-to-8bit (ORCPT ); Mon, 25 Feb 2013 12:27:10 -0500 MIME-Version: 1.0 Message-ID: <622b6670-f311-40d5-9c8d-f5dd3e03633c@default> Date: Mon, 25 Feb 2013 09:20:55 -0800 (PST) From: Dan Magenheimer To: Ric Mason Cc: Minchan Kim , Hugh Dickins , Nitin Gupta , Seth Jennings , Konrad Rzeszutek Wilk , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Subject: RE: Questin about swap_slot free and invalidate page References: <20130131051140.GB23548@blaptop> <20130204024950.GD2688@blaptop> <51236C11.1010208@gmail.com> <1f089254-3abe-4c63-a72a-c9e564ae7d0d@default> <51242F0D.4040201@gmail.com> <7793705b-a076-4c5a-be4d-9572d7560860@default> <5126E253.2030105@gmail.com> In-Reply-To: <5126E253.2030105@gmail.com> X-Priority: 3 X-Mailer: Oracle Beehive Extensions for Outlook 2.0.1.7 (607090) [OL 12.0.6665.5003 (x86)] Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8BIT X-Source-IP: ucsinet21.oracle.com [156.151.31.93] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > From: Ric Mason [mailto:ric.masonn@gmail.com] > Subject: Re: Questin about swap_slot free and invalidate page > > On 02/22/2013 05:42 AM, Dan Magenheimer wrote: > >> From: Ric Mason [mailto:ric.masonn@gmail.com] > >> Subject: Re: Questin about swap_slot free and invalidate page > >> > >> On 02/19/2013 11:27 PM, Dan Magenheimer wrote: > >>>> From: Ric Mason [mailto:ric.masonn@gmail.com] > >>>>> Hugh is right that handling the possibility of duplicates is > >>>>> part of the tmem ABI. If there is any possibility of duplicates, > >>>>> the ABI defines how a backend must handle them to avoid data > >>>>> coherency issues. > >>>>> > >>>>> The kernel implements an in-kernel API which implements the tmem > >>>>> ABI. If the frontend and backend can always agree that duplicate > >>>> Which ABI in zcache implement that? > >>> https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf > >>> > >>> The in-kernel APIs are frontswap and cleancache. For more information about > >>> tmem, see http://lwn.net/Articles/454795/ > >> But you mentioned that you have in-kernel API which can handle > >> duplicate. Do you mean zcache_cleancache/frontswap_put_page? I think > >> they just overwrite instead of optional flush the page on the > >> second(duplicate) put as mentioned in your tmemspec. > > Maybe I am misunderstanding your question... The spec allows > > overwrite (and return success) OR flush the page (and return > > failure). Zcache does the latter (flush). The code that implements > > it is in tmem_put. > > Thanks for your point out. Pers pages can have duplicate put since swap > cache page can be reused. Can eph pages also have duplicate put? If yes, > when can happen? Yes, I have seen it. I am not sure of the exact circumstances when it happens as I am not an expert in the VFS subsystem. (Chris Mason wrote the VFS cleancache hooks in 2009.) Dan