* [PATCH] swap: redirty page if page write fails on swap file @ 2013-04-17 12:11 ` Jerome Marchand 0 siblings, 0 replies; 28+ messages in thread From: Jerome Marchand @ 2013-04-17 12:11 UTC (permalink / raw) To: linux-mm@kvack.org; +Cc: linux-kernel, Mel Gorman Since commit 62c230b, swap_writepage() calls direct_IO on swap files. However, in that case page isn't redirtied if I/O fails, and is therefore handled afterwards as if it has been successfully written to the swap file, leading to memory corruption when the page is eventually swapped back in. This patch sets the page dirty when direct_IO() fails. It fixes a memory corruption that happened while using swap-over-NFS. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> --- mm/page_io.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 78eee32..04ca00d 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) if (ret == PAGE_SIZE) { count_vm_event(PSWPOUT); ret = 0; + } else { + set_page_dirty(page); } return ret; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH] swap: redirty page if page write fails on swap file @ 2013-04-17 12:11 ` Jerome Marchand 0 siblings, 0 replies; 28+ messages in thread From: Jerome Marchand @ 2013-04-17 12:11 UTC (permalink / raw) To: linux-mm@kvack.org; +Cc: linux-kernel, Mel Gorman Since commit 62c230b, swap_writepage() calls direct_IO on swap files. However, in that case page isn't redirtied if I/O fails, and is therefore handled afterwards as if it has been successfully written to the swap file, leading to memory corruption when the page is eventually swapped back in. This patch sets the page dirty when direct_IO() fails. It fixes a memory corruption that happened while using swap-over-NFS. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> --- mm/page_io.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 78eee32..04ca00d 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) if (ret == PAGE_SIZE) { count_vm_event(PSWPOUT); ret = 0; + } else { + set_page_dirty(page); } return ret; } ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-04-17 12:11 ` Jerome Marchand @ 2013-04-17 15:07 ` Johannes Weiner -1 siblings, 0 replies; 28+ messages in thread From: Johannes Weiner @ 2013-04-17 15:07 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman On Wed, Apr 17, 2013 at 02:11:55PM +0200, Jerome Marchand wrote: > > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory > corruption that happened while using swap-over-NFS. > > Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-04-17 15:07 ` Johannes Weiner 0 siblings, 0 replies; 28+ messages in thread From: Johannes Weiner @ 2013-04-17 15:07 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman On Wed, Apr 17, 2013 at 02:11:55PM +0200, Jerome Marchand wrote: > > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory > corruption that happened while using swap-over-NFS. > > Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-04-17 12:11 ` Jerome Marchand @ 2013-04-17 15:08 ` Mel Gorman -1 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-04-17 15:08 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Andrew Morton On Wed, Apr 17, 2013 at 02:11:55PM +0200, Jerome Marchand wrote: > > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory > corruption that happened while using swap-over-NFS. > > Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Thanks Jerome. I've added Andrew to the cc and this should also be considered a candidate for 3.8-stable. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-04-17 15:08 ` Mel Gorman 0 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-04-17 15:08 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Andrew Morton On Wed, Apr 17, 2013 at 02:11:55PM +0200, Jerome Marchand wrote: > > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory > corruption that happened while using swap-over-NFS. > > Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Thanks Jerome. I've added Andrew to the cc and this should also be considered a candidate for 3.8-stable. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-04-17 12:11 ` Jerome Marchand @ 2013-04-18 0:13 ` Simon Jeons -1 siblings, 0 replies; 28+ messages in thread From: Simon Jeons @ 2013-04-18 0:13 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman Hi Jerome, On 04/17/2013 08:11 PM, Jerome Marchand wrote: > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory If swapfile has related page cache which cached swapfile in memory? It is not necessary, correct? > corruption that happened while using swap-over-NFS. > > Signed-off-by: Jerome Marchand <jmarchan@redhat.com> > --- > mm/page_io.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/mm/page_io.c b/mm/page_io.c > index 78eee32..04ca00d 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > if (ret == PAGE_SIZE) { > count_vm_event(PSWPOUT); > ret = 0; > + } else { > + set_page_dirty(page); > } > return ret; > } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-04-18 0:13 ` Simon Jeons 0 siblings, 0 replies; 28+ messages in thread From: Simon Jeons @ 2013-04-18 0:13 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman Hi Jerome, On 04/17/2013 08:11 PM, Jerome Marchand wrote: > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory If swapfile has related page cache which cached swapfile in memory? It is not necessary, correct? > corruption that happened while using swap-over-NFS. > > Signed-off-by: Jerome Marchand <jmarchan@redhat.com> > --- > mm/page_io.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/mm/page_io.c b/mm/page_io.c > index 78eee32..04ca00d 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > if (ret == PAGE_SIZE) { > count_vm_event(PSWPOUT); > ret = 0; > + } else { > + set_page_dirty(page); > } > return ret; > } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-04-18 0:13 ` Simon Jeons @ 2013-05-01 7:39 ` Simon Jeons -1 siblings, 0 replies; 28+ messages in thread From: Simon Jeons @ 2013-05-01 7:39 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman Ping, ;-) On 04/18/2013 08:13 AM, Simon Jeons wrote: > Hi Jerome, > On 04/17/2013 08:11 PM, Jerome Marchand wrote: >> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >> However, in that case page isn't redirtied if I/O fails, and is >> therefore >> handled afterwards as if it has been successfully written to the swap >> file, leading to memory corruption when the page is eventually swapped >> back in. >> This patch sets the page dirty when direct_IO() fails. It fixes a memory > > If swapfile has related page cache which cached swapfile in memory? It > is not necessary, correct? > >> corruption that happened while using swap-over-NFS. >> >> Signed-off-by: Jerome Marchand <jmarchan@redhat.com> >> --- >> mm/page_io.c | 2 ++ >> 1 files changed, 2 insertions(+), 0 deletions(-) >> >> diff --git a/mm/page_io.c b/mm/page_io.c >> index 78eee32..04ca00d 100644 >> --- a/mm/page_io.c >> +++ b/mm/page_io.c >> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct >> writeback_control *wbc) >> if (ret == PAGE_SIZE) { >> count_vm_event(PSWPOUT); >> ret = 0; >> + } else { >> + set_page_dirty(page); >> } >> return ret; >> } >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-05-01 7:39 ` Simon Jeons 0 siblings, 0 replies; 28+ messages in thread From: Simon Jeons @ 2013-05-01 7:39 UTC (permalink / raw) To: Jerome Marchand; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman Ping, ;-) On 04/18/2013 08:13 AM, Simon Jeons wrote: > Hi Jerome, > On 04/17/2013 08:11 PM, Jerome Marchand wrote: >> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >> However, in that case page isn't redirtied if I/O fails, and is >> therefore >> handled afterwards as if it has been successfully written to the swap >> file, leading to memory corruption when the page is eventually swapped >> back in. >> This patch sets the page dirty when direct_IO() fails. It fixes a memory > > If swapfile has related page cache which cached swapfile in memory? It > is not necessary, correct? > >> corruption that happened while using swap-over-NFS. >> >> Signed-off-by: Jerome Marchand <jmarchan@redhat.com> >> --- >> mm/page_io.c | 2 ++ >> 1 files changed, 2 insertions(+), 0 deletions(-) >> >> diff --git a/mm/page_io.c b/mm/page_io.c >> index 78eee32..04ca00d 100644 >> --- a/mm/page_io.c >> +++ b/mm/page_io.c >> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct >> writeback_control *wbc) >> if (ret == PAGE_SIZE) { >> count_vm_event(PSWPOUT); >> ret = 0; >> + } else { >> + set_page_dirty(page); >> } >> return ret; >> } >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-05-01 7:39 ` Simon Jeons @ 2013-05-03 9:12 ` Jerome Marchand -1 siblings, 0 replies; 28+ messages in thread From: Jerome Marchand @ 2013-05-03 9:12 UTC (permalink / raw) To: Simon Jeons; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman On 05/01/2013 09:39 AM, Simon Jeons wrote: > Ping, ;-) > On 04/18/2013 08:13 AM, Simon Jeons wrote: >> Hi Jerome, >> On 04/17/2013 08:11 PM, Jerome Marchand wrote: >>> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >>> However, in that case page isn't redirtied if I/O fails, and is >>> therefore >>> handled afterwards as if it has been successfully written to the swap >>> file, leading to memory corruption when the page is eventually swapped >>> back in. >>> This patch sets the page dirty when direct_IO() fails. It fixes a memory >> >> If swapfile has related page cache which cached swapfile in memory? It >> is not necessary, correct? I'm not sure I understand the question. What I can tell you is that it is necessary if the swapfile is located on a fs that uses swap_activate (only NFS so far). Other swap file hasn't been impacted by commit 62c230b. >> >>> corruption that happened while using swap-over-NFS. >>> >>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com> >>> --- >>> mm/page_io.c | 2 ++ >>> 1 files changed, 2 insertions(+), 0 deletions(-) >>> >>> diff --git a/mm/page_io.c b/mm/page_io.c >>> index 78eee32..04ca00d 100644 >>> --- a/mm/page_io.c >>> +++ b/mm/page_io.c >>> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct >>> writeback_control *wbc) >>> if (ret == PAGE_SIZE) { >>> count_vm_event(PSWPOUT); >>> ret = 0; >>> + } else { >>> + set_page_dirty(page); >>> } >>> return ret; >>> } >>> >>> -- >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> the body to majordomo@kvack.org. For more info on Linux MM, >>> see: http://www.linux-mm.org/ . >>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-05-03 9:12 ` Jerome Marchand 0 siblings, 0 replies; 28+ messages in thread From: Jerome Marchand @ 2013-05-03 9:12 UTC (permalink / raw) To: Simon Jeons; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman On 05/01/2013 09:39 AM, Simon Jeons wrote: > Ping, ;-) > On 04/18/2013 08:13 AM, Simon Jeons wrote: >> Hi Jerome, >> On 04/17/2013 08:11 PM, Jerome Marchand wrote: >>> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >>> However, in that case page isn't redirtied if I/O fails, and is >>> therefore >>> handled afterwards as if it has been successfully written to the swap >>> file, leading to memory corruption when the page is eventually swapped >>> back in. >>> This patch sets the page dirty when direct_IO() fails. It fixes a memory >> >> If swapfile has related page cache which cached swapfile in memory? It >> is not necessary, correct? I'm not sure I understand the question. What I can tell you is that it is necessary if the swapfile is located on a fs that uses swap_activate (only NFS so far). Other swap file hasn't been impacted by commit 62c230b. >> >>> corruption that happened while using swap-over-NFS. >>> >>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com> >>> --- >>> mm/page_io.c | 2 ++ >>> 1 files changed, 2 insertions(+), 0 deletions(-) >>> >>> diff --git a/mm/page_io.c b/mm/page_io.c >>> index 78eee32..04ca00d 100644 >>> --- a/mm/page_io.c >>> +++ b/mm/page_io.c >>> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct >>> writeback_control *wbc) >>> if (ret == PAGE_SIZE) { >>> count_vm_event(PSWPOUT); >>> ret = 0; >>> + } else { >>> + set_page_dirty(page); >>> } >>> return ret; >>> } >>> >>> -- >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> the body to majordomo@kvack.org. For more info on Linux MM, >>> see: http://www.linux-mm.org/ . >>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >> > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-04-17 12:11 ` Jerome Marchand @ 2013-04-22 20:37 ` Andrew Morton -1 siblings, 0 replies; 28+ messages in thread From: Andrew Morton @ 2013-04-22 20:37 UTC (permalink / raw) To: Jerome Marchand Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman, Hugh Dickins On Wed, 17 Apr 2013 14:11:55 +0200 Jerome Marchand <jmarchan@redhat.com> wrote: > > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory > corruption that happened while using swap-over-NFS. > > ... > > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > if (ret == PAGE_SIZE) { > count_vm_event(PSWPOUT); > ret = 0; > + } else { > + set_page_dirty(page); > } > return ret; > } So what happens to the page now? It remains dirty and the kernel later tries to write it again? And if that write also fails, the page is effectively leaked until process exit? Aside: Mel, __swap_writepage() is fairly hair-raising. It unlocks the page before doing the IO and doesn't set PageWriteback(). Why such an exception from normal handling? Also, what is protecting the page from concurrent reclaim or exit() during the above swap_writepage()? Seems that the code needs a bunch of fixes or a bunch of comments explaining why it is safe and why it has to be this way. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-04-22 20:37 ` Andrew Morton 0 siblings, 0 replies; 28+ messages in thread From: Andrew Morton @ 2013-04-22 20:37 UTC (permalink / raw) To: Jerome Marchand Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman, Hugh Dickins On Wed, 17 Apr 2013 14:11:55 +0200 Jerome Marchand <jmarchan@redhat.com> wrote: > > Since commit 62c230b, swap_writepage() calls direct_IO on swap files. > However, in that case page isn't redirtied if I/O fails, and is therefore > handled afterwards as if it has been successfully written to the swap > file, leading to memory corruption when the page is eventually swapped > back in. > This patch sets the page dirty when direct_IO() fails. It fixes a memory > corruption that happened while using swap-over-NFS. > > ... > > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > if (ret == PAGE_SIZE) { > count_vm_event(PSWPOUT); > ret = 0; > + } else { > + set_page_dirty(page); > } > return ret; > } So what happens to the page now? It remains dirty and the kernel later tries to write it again? And if that write also fails, the page is effectively leaked until process exit? Aside: Mel, __swap_writepage() is fairly hair-raising. It unlocks the page before doing the IO and doesn't set PageWriteback(). Why such an exception from normal handling? Also, what is protecting the page from concurrent reclaim or exit() during the above swap_writepage()? Seems that the code needs a bunch of fixes or a bunch of comments explaining why it is safe and why it has to be this way. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-04-22 20:37 ` Andrew Morton @ 2013-04-24 9:57 ` Jerome Marchand -1 siblings, 0 replies; 28+ messages in thread From: Jerome Marchand @ 2013-04-24 9:57 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman, Hugh Dickins On 04/22/2013 10:37 PM, Andrew Morton wrote: > On Wed, 17 Apr 2013 14:11:55 +0200 Jerome Marchand <jmarchan@redhat.com> wrote: > >> >> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >> However, in that case page isn't redirtied if I/O fails, and is therefore >> handled afterwards as if it has been successfully written to the swap >> file, leading to memory corruption when the page is eventually swapped >> back in. >> This patch sets the page dirty when direct_IO() fails. It fixes a memory >> corruption that happened while using swap-over-NFS. >> >> ... >> >> --- a/mm/page_io.c >> +++ b/mm/page_io.c >> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) >> if (ret == PAGE_SIZE) { >> count_vm_event(PSWPOUT); >> ret = 0; >> + } else { >> + set_page_dirty(page); >> } >> return ret; >> } > > So what happens to the page now? It remains dirty and the kernel later > tries to write it again? Yes. Also, AS_EIO or AS_ENOSPC is set to the address space flags (in this case, swapper_space). > And if that write also fails, the page is > effectively leaked until process exit? AFAICT, there is no special handling for that page afterwards, so if all subsequent attempts fail, it's indeed going to stay in memory until freed. Jerome > > > Aside: Mel, __swap_writepage() is fairly hair-raising. It unlocks the > page before doing the IO and doesn't set PageWriteback(). Why such an > exception from normal handling? > > Also, what is protecting the page from concurrent reclaim or exit() > during the above swap_writepage()? > > Seems that the code needs a bunch of fixes or a bunch of comments > explaining why it is safe and why it has to be this way. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-04-24 9:57 ` Jerome Marchand 0 siblings, 0 replies; 28+ messages in thread From: Jerome Marchand @ 2013-04-24 9:57 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-mm@kvack.org, linux-kernel, Mel Gorman, Hugh Dickins On 04/22/2013 10:37 PM, Andrew Morton wrote: > On Wed, 17 Apr 2013 14:11:55 +0200 Jerome Marchand <jmarchan@redhat.com> wrote: > >> >> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >> However, in that case page isn't redirtied if I/O fails, and is therefore >> handled afterwards as if it has been successfully written to the swap >> file, leading to memory corruption when the page is eventually swapped >> back in. >> This patch sets the page dirty when direct_IO() fails. It fixes a memory >> corruption that happened while using swap-over-NFS. >> >> ... >> >> --- a/mm/page_io.c >> +++ b/mm/page_io.c >> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) >> if (ret == PAGE_SIZE) { >> count_vm_event(PSWPOUT); >> ret = 0; >> + } else { >> + set_page_dirty(page); >> } >> return ret; >> } > > So what happens to the page now? It remains dirty and the kernel later > tries to write it again? Yes. Also, AS_EIO or AS_ENOSPC is set to the address space flags (in this case, swapper_space). > And if that write also fails, the page is > effectively leaked until process exit? AFAICT, there is no special handling for that page afterwards, so if all subsequent attempts fail, it's indeed going to stay in memory until freed. Jerome > > > Aside: Mel, __swap_writepage() is fairly hair-raising. It unlocks the > page before doing the IO and doesn't set PageWriteback(). Why such an > exception from normal handling? > > Also, what is protecting the page from concurrent reclaim or exit() > during the above swap_writepage()? > > Seems that the code needs a bunch of fixes or a bunch of comments > explaining why it is safe and why it has to be this way. > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file 2013-04-24 9:57 ` Jerome Marchand @ 2013-05-01 7:38 ` Will Huck -1 siblings, 0 replies; 28+ messages in thread From: Will Huck @ 2013-05-01 7:38 UTC (permalink / raw) To: Jerome Marchand Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel, Mel Gorman, Hugh Dickins Hi Jerome, On 04/24/2013 05:57 PM, Jerome Marchand wrote: > On 04/22/2013 10:37 PM, Andrew Morton wrote: >> On Wed, 17 Apr 2013 14:11:55 +0200 Jerome Marchand <jmarchan@redhat.com> wrote: >> >>> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >>> However, in that case page isn't redirtied if I/O fails, and is therefore >>> handled afterwards as if it has been successfully written to the swap >>> file, leading to memory corruption when the page is eventually swapped >>> back in. >>> This patch sets the page dirty when direct_IO() fails. It fixes a memory >>> corruption that happened while using swap-over-NFS. >>> >>> ... >>> >>> --- a/mm/page_io.c >>> +++ b/mm/page_io.c >>> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) >>> if (ret == PAGE_SIZE) { >>> count_vm_event(PSWPOUT); >>> ret = 0; >>> + } else { >>> + set_page_dirty(page); >>> } >>> return ret; >>> } >> So what happens to the page now? It remains dirty and the kernel later >> tries to write it again? > Yes. Also, AS_EIO or AS_ENOSPC is set to the address space flags (in this > case, swapper_space). After set AS_EIO or AS_ENOSPC, we can't touch swapper_space any more, correct? > >> And if that write also fails, the page is >> effectively leaked until process exit? > AFAICT, there is no special handling for that page afterwards, so if all > subsequent attempts fail, it's indeed going to stay in memory until freed. > > Jerome > > >> >> Aside: Mel, __swap_writepage() is fairly hair-raising. It unlocks the >> page before doing the IO and doesn't set PageWriteback(). Why such an >> exception from normal handling? >> >> Also, what is protecting the page from concurrent reclaim or exit() >> during the above swap_writepage()? >> >> Seems that the code needs a bunch of fixes or a bunch of comments >> explaining why it is safe and why it has to be this way. >> > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] swap: redirty page if page write fails on swap file @ 2013-05-01 7:38 ` Will Huck 0 siblings, 0 replies; 28+ messages in thread From: Will Huck @ 2013-05-01 7:38 UTC (permalink / raw) To: Jerome Marchand Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel, Mel Gorman, Hugh Dickins Hi Jerome, On 04/24/2013 05:57 PM, Jerome Marchand wrote: > On 04/22/2013 10:37 PM, Andrew Morton wrote: >> On Wed, 17 Apr 2013 14:11:55 +0200 Jerome Marchand <jmarchan@redhat.com> wrote: >> >>> Since commit 62c230b, swap_writepage() calls direct_IO on swap files. >>> However, in that case page isn't redirtied if I/O fails, and is therefore >>> handled afterwards as if it has been successfully written to the swap >>> file, leading to memory corruption when the page is eventually swapped >>> back in. >>> This patch sets the page dirty when direct_IO() fails. It fixes a memory >>> corruption that happened while using swap-over-NFS. >>> >>> ... >>> >>> --- a/mm/page_io.c >>> +++ b/mm/page_io.c >>> @@ -222,6 +222,8 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) >>> if (ret == PAGE_SIZE) { >>> count_vm_event(PSWPOUT); >>> ret = 0; >>> + } else { >>> + set_page_dirty(page); >>> } >>> return ret; >>> } >> So what happens to the page now? It remains dirty and the kernel later >> tries to write it again? > Yes. Also, AS_EIO or AS_ENOSPC is set to the address space flags (in this > case, swapper_space). After set AS_EIO or AS_ENOSPC, we can't touch swapper_space any more, correct? > >> And if that write also fails, the page is >> effectively leaked until process exit? > AFAICT, there is no special handling for that page afterwards, so if all > subsequent attempts fail, it's indeed going to stay in memory until freed. > > Jerome > > >> >> Aside: Mel, __swap_writepage() is fairly hair-raising. It unlocks the >> page before doing the IO and doesn't set PageWriteback(). Why such an >> exception from normal handling? >> >> Also, what is protecting the page from concurrent reclaim or exit() >> during the above swap_writepage()? >> >> Seems that the code needs a bunch of fixes or a bunch of comments >> explaining why it is safe and why it has to be this way. >> > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO 2013-04-22 20:37 ` Andrew Morton @ 2013-04-24 18:57 ` Mel Gorman -1 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-04-24 18:57 UTC (permalink / raw) To: Andrew Morton Cc: Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting PageWriteback before it is queued for direct IO. While swap pages do not participate in BDI or process dirty accounting and the IO is synchronous, the writeback bit is still required and not setting it in this case was an oversight. swapoff depends on the page writeback to synchronoise all pending writes on a swap page before it is reused. Swapcache freeing and reuse depend on checking the PageWriteback under lock to ensure the page is safe to reuse. Direct IO handlers and the direct IO handler for NFS do not deal with PageWriteback as they are synchronous writes. In the case of NFS, it schedules pages (or a page in the case of swap) for IO and then waits synchronously for IO to complete in nfs_direct_write(). It is recognised that this is a slowdown from normal swap handling which is asynchronous and uses a completion handler. Shoving PageWriteback handling down into direct IO handlers looks like a bad fit to handle the swap case although it may have to be dealt with some day if swap is converted to use direct IO in general and bmap is finally done away with. At that point it will be necessary to refit asynchronous direct IO with completion handlers onto the swap subsystem. As swapcache currently depends on PageWriteback to protect against races, this patch sets PageWriteback under the page lock before queueing it for direct IO. It is cleared when the direct IO handler returns. IO errors are treated similarly to the direct-to-bio case except PageError is not set as in the case of swap-over-NFS, it is likely to be a transient error. It was asked what prevents such a page being reclaimed in parallel. With this patch applied, such a page will now be skipped (most of the time) or blocked until the writeback completes. Reclaim checks PageWriteback under the page lock before calling try_to_free_swap and the page lock should prevent the page being requeued for IO before it is freed. This and Jerome's related patch should considered for -stable as far back as 3.6 when swap-over-NFS was introduced. Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/page_io.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/mm/page_io.c b/mm/page_io.c index 04ca00d..ec04247 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -214,6 +214,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) kiocb.ki_left = PAGE_SIZE; kiocb.ki_nbytes = PAGE_SIZE; + set_page_writeback(page); unlock_page(page); ret = mapping->a_ops->direct_IO(KERNEL_WRITE, &kiocb, &iov, @@ -223,8 +224,24 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) count_vm_event(PSWPOUT); ret = 0; } else { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * rotate_reclaimable_page but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. + */ set_page_dirty(page); + ClearPageReclaim(page); + if (printk_ratelimit()) { + pr_err("Write-error on dio swapfile (%Lu)\n", + (unsigned long long)page_file_offset(page)); + } } + end_page_writeback(page); return ret; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO @ 2013-04-24 18:57 ` Mel Gorman 0 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-04-24 18:57 UTC (permalink / raw) To: Andrew Morton Cc: Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting PageWriteback before it is queued for direct IO. While swap pages do not participate in BDI or process dirty accounting and the IO is synchronous, the writeback bit is still required and not setting it in this case was an oversight. swapoff depends on the page writeback to synchronoise all pending writes on a swap page before it is reused. Swapcache freeing and reuse depend on checking the PageWriteback under lock to ensure the page is safe to reuse. Direct IO handlers and the direct IO handler for NFS do not deal with PageWriteback as they are synchronous writes. In the case of NFS, it schedules pages (or a page in the case of swap) for IO and then waits synchronously for IO to complete in nfs_direct_write(). It is recognised that this is a slowdown from normal swap handling which is asynchronous and uses a completion handler. Shoving PageWriteback handling down into direct IO handlers looks like a bad fit to handle the swap case although it may have to be dealt with some day if swap is converted to use direct IO in general and bmap is finally done away with. At that point it will be necessary to refit asynchronous direct IO with completion handlers onto the swap subsystem. As swapcache currently depends on PageWriteback to protect against races, this patch sets PageWriteback under the page lock before queueing it for direct IO. It is cleared when the direct IO handler returns. IO errors are treated similarly to the direct-to-bio case except PageError is not set as in the case of swap-over-NFS, it is likely to be a transient error. It was asked what prevents such a page being reclaimed in parallel. With this patch applied, such a page will now be skipped (most of the time) or blocked until the writeback completes. Reclaim checks PageWriteback under the page lock before calling try_to_free_swap and the page lock should prevent the page being requeued for IO before it is freed. This and Jerome's related patch should considered for -stable as far back as 3.6 when swap-over-NFS was introduced. Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/page_io.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/mm/page_io.c b/mm/page_io.c index 04ca00d..ec04247 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -214,6 +214,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) kiocb.ki_left = PAGE_SIZE; kiocb.ki_nbytes = PAGE_SIZE; + set_page_writeback(page); unlock_page(page); ret = mapping->a_ops->direct_IO(KERNEL_WRITE, &kiocb, &iov, @@ -223,8 +224,24 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) count_vm_event(PSWPOUT); ret = 0; } else { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * rotate_reclaimable_page but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. + */ set_page_dirty(page); + ClearPageReclaim(page); + if (printk_ratelimit()) { + pr_err("Write-error on dio swapfile (%Lu)\n", + (unsigned long long)page_file_offset(page)); + } } + end_page_writeback(page); return ret; } ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO 2013-04-24 18:57 ` Mel Gorman @ 2013-04-24 19:23 ` Andrew Morton -1 siblings, 0 replies; 28+ messages in thread From: Andrew Morton @ 2013-04-24 19:23 UTC (permalink / raw) To: Mel Gorman Cc: Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins On Wed, 24 Apr 2013 19:57:44 +0100 Mel Gorman <mgorman@suse.de> wrote: > As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting > PageWriteback before it is queued for direct IO. While swap pages do not > participate in BDI or process dirty accounting and the IO is synchronous, > the writeback bit is still required and not setting it in this case was > an oversight. swapoff depends on the page writeback to synchronoise all > pending writes on a swap page before it is reused. Swapcache freeing and > reuse depend on checking the PageWriteback under lock to ensure the page > is safe to reuse. > > Direct IO handlers and the direct IO handler for NFS do not deal with > PageWriteback as they are synchronous writes. In the case of NFS, it > schedules pages (or a page in the case of swap) for IO and then waits > synchronously for IO to complete in nfs_direct_write(). It is recognised > that this is a slowdown from normal swap handling which is asynchronous > and uses a completion handler. Shoving PageWriteback handling down into > direct IO handlers looks like a bad fit to handle the swap case although > it may have to be dealt with some day if swap is converted to use direct > IO in general and bmap is finally done away with. At that point it will > be necessary to refit asynchronous direct IO with completion handlers onto > the swap subsystem. > > As swapcache currently depends on PageWriteback to protect against races, > this patch sets PageWriteback under the page lock before queueing it for > direct IO. It is cleared when the direct IO handler returns. IO errors > are treated similarly to the direct-to-bio case except PageError is not > set as in the case of swap-over-NFS, it is likely to be a transient error. > > It was asked what prevents such a page being reclaimed in parallel. > With this patch applied, such a page will now be skipped (most of the time) > or blocked until the writeback completes. Reclaim checks PageWriteback > under the page lock before calling try_to_free_swap and the page lock > should prevent the page being requeued for IO before it is freed. > > This and Jerome's related patch should considered for -stable as far > back as 3.6 when swap-over-NFS was introduced. Fair enough - PageWriteback should protect the page during the redirty. > --- a/mm/page_io.c > +++ b/mm/page_io.c > > ... > > @@ -223,8 +224,24 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > count_vm_event(PSWPOUT); > ret = 0; > } else { > + /* > + * In the case of swap-over-nfs, this can be a > + * temporary failure if the system has limited > + * memory for allocating transmit buffers. > + * Mark the page dirty and avoid > + * rotate_reclaimable_page but rate-limit the > + * messages but do not flag PageError like > + * the normal direct-to-bio case as it could > + * be temporary. > + */ > set_page_dirty(page); > + ClearPageReclaim(page); > + if (printk_ratelimit()) { > + pr_err("Write-error on dio swapfile (%Lu)\n", > + (unsigned long long)page_file_offset(page)); > + } > } > + end_page_writeback(page); A pox upon printk_ratelimit()! Both its code comment and the checkpatch warning explain why. --- a/mm/page_io.c~mm-swap-mark-swap-pages-writeback-before-queueing-for-direct-io-fix +++ a/mm/page_io.c @@ -244,10 +244,8 @@ int __swap_writepage(struct page *page, */ set_page_dirty(page); ClearPageReclaim(page); - if (printk_ratelimit()) { - pr_err("Write-error on dio swapfile (%Lu)\n", - (unsigned long long)page_file_offset(page)); - } + pr_err_ratelimited("Write error on dio swapfile (%Lu)\n", + (unsigned long long)page_file_offset(page)); } end_page_writeback(page); return ret; Do we need to cast the loff_t? afaict all architectures use long long. I didn't get a warning from sparc64 with the cast removed, and sparc64 is the one which likes to use different underlying types. I think I'll remove it and wait for Fengguang's nastygram. --- a/mm/page_io.c~mm-swap-mark-swap-pages-writeback-before-queueing-for-direct-io-fix-fix +++ a/mm/page_io.c @@ -245,7 +245,7 @@ int __swap_writepage(struct page *page, set_page_dirty(page); ClearPageReclaim(page); pr_err_ratelimited("Write error on dio swapfile (%Lu)\n", - (unsigned long long)page_file_offset(page)); + page_file_offset(page)); } end_page_writeback(page); return ret; _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO @ 2013-04-24 19:23 ` Andrew Morton 0 siblings, 0 replies; 28+ messages in thread From: Andrew Morton @ 2013-04-24 19:23 UTC (permalink / raw) To: Mel Gorman Cc: Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins On Wed, 24 Apr 2013 19:57:44 +0100 Mel Gorman <mgorman@suse.de> wrote: > As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting > PageWriteback before it is queued for direct IO. While swap pages do not > participate in BDI or process dirty accounting and the IO is synchronous, > the writeback bit is still required and not setting it in this case was > an oversight. swapoff depends on the page writeback to synchronoise all > pending writes on a swap page before it is reused. Swapcache freeing and > reuse depend on checking the PageWriteback under lock to ensure the page > is safe to reuse. > > Direct IO handlers and the direct IO handler for NFS do not deal with > PageWriteback as they are synchronous writes. In the case of NFS, it > schedules pages (or a page in the case of swap) for IO and then waits > synchronously for IO to complete in nfs_direct_write(). It is recognised > that this is a slowdown from normal swap handling which is asynchronous > and uses a completion handler. Shoving PageWriteback handling down into > direct IO handlers looks like a bad fit to handle the swap case although > it may have to be dealt with some day if swap is converted to use direct > IO in general and bmap is finally done away with. At that point it will > be necessary to refit asynchronous direct IO with completion handlers onto > the swap subsystem. > > As swapcache currently depends on PageWriteback to protect against races, > this patch sets PageWriteback under the page lock before queueing it for > direct IO. It is cleared when the direct IO handler returns. IO errors > are treated similarly to the direct-to-bio case except PageError is not > set as in the case of swap-over-NFS, it is likely to be a transient error. > > It was asked what prevents such a page being reclaimed in parallel. > With this patch applied, such a page will now be skipped (most of the time) > or blocked until the writeback completes. Reclaim checks PageWriteback > under the page lock before calling try_to_free_swap and the page lock > should prevent the page being requeued for IO before it is freed. > > This and Jerome's related patch should considered for -stable as far > back as 3.6 when swap-over-NFS was introduced. Fair enough - PageWriteback should protect the page during the redirty. > --- a/mm/page_io.c > +++ b/mm/page_io.c > > ... > > @@ -223,8 +224,24 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > count_vm_event(PSWPOUT); > ret = 0; > } else { > + /* > + * In the case of swap-over-nfs, this can be a > + * temporary failure if the system has limited > + * memory for allocating transmit buffers. > + * Mark the page dirty and avoid > + * rotate_reclaimable_page but rate-limit the > + * messages but do not flag PageError like > + * the normal direct-to-bio case as it could > + * be temporary. > + */ > set_page_dirty(page); > + ClearPageReclaim(page); > + if (printk_ratelimit()) { > + pr_err("Write-error on dio swapfile (%Lu)\n", > + (unsigned long long)page_file_offset(page)); > + } > } > + end_page_writeback(page); A pox upon printk_ratelimit()! Both its code comment and the checkpatch warning explain why. --- a/mm/page_io.c~mm-swap-mark-swap-pages-writeback-before-queueing-for-direct-io-fix +++ a/mm/page_io.c @@ -244,10 +244,8 @@ int __swap_writepage(struct page *page, */ set_page_dirty(page); ClearPageReclaim(page); - if (printk_ratelimit()) { - pr_err("Write-error on dio swapfile (%Lu)\n", - (unsigned long long)page_file_offset(page)); - } + pr_err_ratelimited("Write error on dio swapfile (%Lu)\n", + (unsigned long long)page_file_offset(page)); } end_page_writeback(page); return ret; Do we need to cast the loff_t? afaict all architectures use long long. I didn't get a warning from sparc64 with the cast removed, and sparc64 is the one which likes to use different underlying types. I think I'll remove it and wait for Fengguang's nastygram. --- a/mm/page_io.c~mm-swap-mark-swap-pages-writeback-before-queueing-for-direct-io-fix-fix +++ a/mm/page_io.c @@ -245,7 +245,7 @@ int __swap_writepage(struct page *page, set_page_dirty(page); ClearPageReclaim(page); pr_err_ratelimited("Write error on dio swapfile (%Lu)\n", - (unsigned long long)page_file_offset(page)); + page_file_offset(page)); } end_page_writeback(page); return ret; _ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO 2013-04-24 19:23 ` Andrew Morton @ 2013-04-25 8:53 ` Mel Gorman -1 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-04-25 8:53 UTC (permalink / raw) To: Andrew Morton Cc: Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins On Wed, Apr 24, 2013 at 12:23:13PM -0700, Andrew Morton wrote: > > } else { > > + /* > > + * In the case of swap-over-nfs, this can be a > > + * temporary failure if the system has limited > > + * memory for allocating transmit buffers. > > + * Mark the page dirty and avoid > > + * rotate_reclaimable_page but rate-limit the > > + * messages but do not flag PageError like > > + * the normal direct-to-bio case as it could > > + * be temporary. > > + */ > > set_page_dirty(page); > > + ClearPageReclaim(page); > > + if (printk_ratelimit()) { > > + pr_err("Write-error on dio swapfile (%Lu)\n", > > + (unsigned long long)page_file_offset(page)); > > + } > > } > > + end_page_writeback(page); > > A pox upon printk_ratelimit()! Both its code comment and the > checkpatch warning explain why. > Ok. There were few sensible options around dealing with the write errors. swap_writepage() could go to sleep on a waitqueue but it's putting IO rate limiting where it doesn't belong. Retrying silently forever could be difficult to debug if the error really is permanent. > --- a/mm/page_io.c~mm-swap-mark-swap-pages-writeback-before-queueing-for-direct-io-fix > +++ a/mm/page_io.c > @@ -244,10 +244,8 @@ int __swap_writepage(struct page *page, > */ > set_page_dirty(page); > ClearPageReclaim(page); > - if (printk_ratelimit()) { > - pr_err("Write-error on dio swapfile (%Lu)\n", > - (unsigned long long)page_file_offset(page)); > - } > + pr_err_ratelimited("Write error on dio swapfile (%Lu)\n", > + (unsigned long long)page_file_offset(page)); > } > end_page_writeback(page); > return ret; > > Do we need to cast the loff_t? afaict all architectures use long long. > I didn't get a warning from sparc64 with the cast removed, and sparc64 > is the one which likes to use different underlying types. > > I think I'll remove it and wait for Fengguang's nastygram. > Sounds reasonable. I'll get cc'd on the same mails. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO @ 2013-04-25 8:53 ` Mel Gorman 0 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-04-25 8:53 UTC (permalink / raw) To: Andrew Morton Cc: Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins On Wed, Apr 24, 2013 at 12:23:13PM -0700, Andrew Morton wrote: > > } else { > > + /* > > + * In the case of swap-over-nfs, this can be a > > + * temporary failure if the system has limited > > + * memory for allocating transmit buffers. > > + * Mark the page dirty and avoid > > + * rotate_reclaimable_page but rate-limit the > > + * messages but do not flag PageError like > > + * the normal direct-to-bio case as it could > > + * be temporary. > > + */ > > set_page_dirty(page); > > + ClearPageReclaim(page); > > + if (printk_ratelimit()) { > > + pr_err("Write-error on dio swapfile (%Lu)\n", > > + (unsigned long long)page_file_offset(page)); > > + } > > } > > + end_page_writeback(page); > > A pox upon printk_ratelimit()! Both its code comment and the > checkpatch warning explain why. > Ok. There were few sensible options around dealing with the write errors. swap_writepage() could go to sleep on a waitqueue but it's putting IO rate limiting where it doesn't belong. Retrying silently forever could be difficult to debug if the error really is permanent. > --- a/mm/page_io.c~mm-swap-mark-swap-pages-writeback-before-queueing-for-direct-io-fix > +++ a/mm/page_io.c > @@ -244,10 +244,8 @@ int __swap_writepage(struct page *page, > */ > set_page_dirty(page); > ClearPageReclaim(page); > - if (printk_ratelimit()) { > - pr_err("Write-error on dio swapfile (%Lu)\n", > - (unsigned long long)page_file_offset(page)); > - } > + pr_err_ratelimited("Write error on dio swapfile (%Lu)\n", > + (unsigned long long)page_file_offset(page)); > } > end_page_writeback(page); > return ret; > > Do we need to cast the loff_t? afaict all architectures use long long. > I didn't get a warning from sparc64 with the cast removed, and sparc64 > is the one which likes to use different underlying types. > > I think I'll remove it and wait for Fengguang's nastygram. > Sounds reasonable. I'll get cc'd on the same mails. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO 2013-04-24 18:57 ` Mel Gorman @ 2013-05-01 6:58 ` Ric Mason -1 siblings, 0 replies; 28+ messages in thread From: Ric Mason @ 2013-05-01 6:58 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins Hi Mel, On 04/25/2013 02:57 AM, Mel Gorman wrote: > As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting > PageWriteback before it is queued for direct IO. While swap pages do not Before commit commit 62c230bc1 (mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages), swap pages will write to page cache firstly and then writeback? > participate in BDI or process dirty accounting and the IO is synchronous, > the writeback bit is still required and not setting it in this case was > an oversight. swapoff depends on the page writeback to synchronoise all > pending writes on a swap page before it is reused. Swapcache freeing and > reuse depend on checking the PageWriteback under lock to ensure the page > is safe to reuse. > > Direct IO handlers and the direct IO handler for NFS do not deal with > PageWriteback as they are synchronous writes. In the case of NFS, it > schedules pages (or a page in the case of swap) for IO and then waits > synchronously for IO to complete in nfs_direct_write(). It is recognised > that this is a slowdown from normal swap handling which is asynchronous > and uses a completion handler. Shoving PageWriteback handling down into > direct IO handlers looks like a bad fit to handle the swap case although > it may have to be dealt with some day if swap is converted to use direct > IO in general and bmap is finally done away with. At that point it will > be necessary to refit asynchronous direct IO with completion handlers onto > the swap subsystem. > > As swapcache currently depends on PageWriteback to protect against races, > this patch sets PageWriteback under the page lock before queueing it for > direct IO. It is cleared when the direct IO handler returns. IO errors > are treated similarly to the direct-to-bio case except PageError is not > set as in the case of swap-over-NFS, it is likely to be a transient error. > > It was asked what prevents such a page being reclaimed in parallel. > With this patch applied, such a page will now be skipped (most of the time) > or blocked until the writeback completes. Reclaim checks PageWriteback > under the page lock before calling try_to_free_swap and the page lock > should prevent the page being requeued for IO before it is freed. > > This and Jerome's related patch should considered for -stable as far > back as 3.6 when swap-over-NFS was introduced. > > Signed-off-by: Mel Gorman <mgorman@suse.de> > --- > mm/page_io.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/mm/page_io.c b/mm/page_io.c > index 04ca00d..ec04247 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -214,6 +214,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > kiocb.ki_left = PAGE_SIZE; > kiocb.ki_nbytes = PAGE_SIZE; > > + set_page_writeback(page); > unlock_page(page); > ret = mapping->a_ops->direct_IO(KERNEL_WRITE, > &kiocb, &iov, > @@ -223,8 +224,24 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > count_vm_event(PSWPOUT); > ret = 0; > } else { > + /* > + * In the case of swap-over-nfs, this can be a > + * temporary failure if the system has limited > + * memory for allocating transmit buffers. > + * Mark the page dirty and avoid > + * rotate_reclaimable_page but rate-limit the > + * messages but do not flag PageError like > + * the normal direct-to-bio case as it could > + * be temporary. > + */ > set_page_dirty(page); > + ClearPageReclaim(page); > + if (printk_ratelimit()) { > + pr_err("Write-error on dio swapfile (%Lu)\n", > + (unsigned long long)page_file_offset(page)); > + } > } > + end_page_writeback(page); > return ret; > } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO @ 2013-05-01 6:58 ` Ric Mason 0 siblings, 0 replies; 28+ messages in thread From: Ric Mason @ 2013-05-01 6:58 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins Hi Mel, On 04/25/2013 02:57 AM, Mel Gorman wrote: > As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting > PageWriteback before it is queued for direct IO. While swap pages do not Before commit commit 62c230bc1 (mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages), swap pages will write to page cache firstly and then writeback? > participate in BDI or process dirty accounting and the IO is synchronous, > the writeback bit is still required and not setting it in this case was > an oversight. swapoff depends on the page writeback to synchronoise all > pending writes on a swap page before it is reused. Swapcache freeing and > reuse depend on checking the PageWriteback under lock to ensure the page > is safe to reuse. > > Direct IO handlers and the direct IO handler for NFS do not deal with > PageWriteback as they are synchronous writes. In the case of NFS, it > schedules pages (or a page in the case of swap) for IO and then waits > synchronously for IO to complete in nfs_direct_write(). It is recognised > that this is a slowdown from normal swap handling which is asynchronous > and uses a completion handler. Shoving PageWriteback handling down into > direct IO handlers looks like a bad fit to handle the swap case although > it may have to be dealt with some day if swap is converted to use direct > IO in general and bmap is finally done away with. At that point it will > be necessary to refit asynchronous direct IO with completion handlers onto > the swap subsystem. > > As swapcache currently depends on PageWriteback to protect against races, > this patch sets PageWriteback under the page lock before queueing it for > direct IO. It is cleared when the direct IO handler returns. IO errors > are treated similarly to the direct-to-bio case except PageError is not > set as in the case of swap-over-NFS, it is likely to be a transient error. > > It was asked what prevents such a page being reclaimed in parallel. > With this patch applied, such a page will now be skipped (most of the time) > or blocked until the writeback completes. Reclaim checks PageWriteback > under the page lock before calling try_to_free_swap and the page lock > should prevent the page being requeued for IO before it is freed. > > This and Jerome's related patch should considered for -stable as far > back as 3.6 when swap-over-NFS was introduced. > > Signed-off-by: Mel Gorman <mgorman@suse.de> > --- > mm/page_io.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/mm/page_io.c b/mm/page_io.c > index 04ca00d..ec04247 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -214,6 +214,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > kiocb.ki_left = PAGE_SIZE; > kiocb.ki_nbytes = PAGE_SIZE; > > + set_page_writeback(page); > unlock_page(page); > ret = mapping->a_ops->direct_IO(KERNEL_WRITE, > &kiocb, &iov, > @@ -223,8 +224,24 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) > count_vm_event(PSWPOUT); > ret = 0; > } else { > + /* > + * In the case of swap-over-nfs, this can be a > + * temporary failure if the system has limited > + * memory for allocating transmit buffers. > + * Mark the page dirty and avoid > + * rotate_reclaimable_page but rate-limit the > + * messages but do not flag PageError like > + * the normal direct-to-bio case as it could > + * be temporary. > + */ > set_page_dirty(page); > + ClearPageReclaim(page); > + if (printk_ratelimit()) { > + pr_err("Write-error on dio swapfile (%Lu)\n", > + (unsigned long long)page_file_offset(page)); > + } > } > + end_page_writeback(page); > return ret; > } > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO 2013-05-01 6:58 ` Ric Mason @ 2013-05-01 8:20 ` Mel Gorman -1 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-05-01 8:20 UTC (permalink / raw) To: Ric Mason Cc: Andrew Morton, Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins On Wed, May 01, 2013 at 02:58:03PM +0800, Ric Mason wrote: > Hi Mel, > On 04/25/2013 02:57 AM, Mel Gorman wrote: > >As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting > >PageWriteback before it is queued for direct IO. While swap pages do not > > Before commit commit 62c230bc1 (mm: add support for a filesystem to > activate swap files and use direct_IO for writing swap pages), swap > pages will write to page cache firstly and then writeback? > That commit added an *optional* address_space operations method that allowed a filesystem to use their aops->direct_IO method to write to a swapfile. The existing methods for writing swap files are still there so before and after commit 62c230bc1, swap partitions and most swapfiles (backed by filesystems that support bmap) are still the same. Look at swapfile.c, swap_state.c and page_io.c for the details on how swap gets activated, pages are added to swap cache and the writepage method used when aops->writepage is called to write the page to disk respectively. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO @ 2013-05-01 8:20 ` Mel Gorman 0 siblings, 0 replies; 28+ messages in thread From: Mel Gorman @ 2013-05-01 8:20 UTC (permalink / raw) To: Ric Mason Cc: Andrew Morton, Jerome Marchand, linux-mm@kvack.org, linux-kernel, Hugh Dickins On Wed, May 01, 2013 at 02:58:03PM +0800, Ric Mason wrote: > Hi Mel, > On 04/25/2013 02:57 AM, Mel Gorman wrote: > >As pointed out by Andrew Morton, the swap-over-NFS writeback is not setting > >PageWriteback before it is queued for direct IO. While swap pages do not > > Before commit commit 62c230bc1 (mm: add support for a filesystem to > activate swap files and use direct_IO for writing swap pages), swap > pages will write to page cache firstly and then writeback? > That commit added an *optional* address_space operations method that allowed a filesystem to use their aops->direct_IO method to write to a swapfile. The existing methods for writing swap files are still there so before and after commit 62c230bc1, swap partitions and most swapfiles (backed by filesystems that support bmap) are still the same. Look at swapfile.c, swap_state.c and page_io.c for the details on how swap gets activated, pages are added to swap cache and the writepage method used when aops->writepage is called to write the page to disk respectively. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2013-05-03 9:12 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-17 12:11 [PATCH] swap: redirty page if page write fails on swap file Jerome Marchand 2013-04-17 12:11 ` Jerome Marchand 2013-04-17 15:07 ` Johannes Weiner 2013-04-17 15:07 ` Johannes Weiner 2013-04-17 15:08 ` Mel Gorman 2013-04-17 15:08 ` Mel Gorman 2013-04-18 0:13 ` Simon Jeons 2013-04-18 0:13 ` Simon Jeons 2013-05-01 7:39 ` Simon Jeons 2013-05-01 7:39 ` Simon Jeons 2013-05-03 9:12 ` Jerome Marchand 2013-05-03 9:12 ` Jerome Marchand 2013-04-22 20:37 ` Andrew Morton 2013-04-22 20:37 ` Andrew Morton 2013-04-24 9:57 ` Jerome Marchand 2013-04-24 9:57 ` Jerome Marchand 2013-05-01 7:38 ` Will Huck 2013-05-01 7:38 ` Will Huck 2013-04-24 18:57 ` [PATCH] mm: swap: Mark swap pages writeback before queueing for direct IO Mel Gorman 2013-04-24 18:57 ` Mel Gorman 2013-04-24 19:23 ` Andrew Morton 2013-04-24 19:23 ` Andrew Morton 2013-04-25 8:53 ` Mel Gorman 2013-04-25 8:53 ` Mel Gorman 2013-05-01 6:58 ` Ric Mason 2013-05-01 6:58 ` Ric Mason 2013-05-01 8:20 ` Mel Gorman 2013-05-01 8:20 ` Mel Gorman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.