* [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations
@ 2010-07-21 2:44 David Rientjes
2010-07-21 2:44 ` [patch 1/6] sparc: remove dependency on __GFP_NOFAIL David Rientjes
` (5 more replies)
0 siblings, 6 replies; 26+ messages in thread
From: David Rientjes @ 2010-07-21 2:44 UTC (permalink / raw)
To: Andrew Morton, David S. Miller, Steve Wise, Al Viro,
Steven Whitehouse, Jan Kara
Cc: Benjamin Herrenschmidt, Roland Dreier, Jens Axboe, Bob Peterson,
Andreas Dilger, Jiri Kosina, linux-mm
This patchset removes __GFP_NOFAIL from various allocations when those
callers have error handling or the subsystem doesn't absolutely require
success.
This is the first phase of two for the total removal of __GFP_NOFAIL:
this patchset is intended to fix obvious users of __GFP_NOFAIL that are
already failable or otherwise unnecessary. The second phase will replace
__GFP_NOFAIL with a different gfp which will use all of the page
allocator's resources (direct reclaim, compaction, and the oom killer)
to free memory but not infinitely loop in the allocator itself.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 26+ messages in thread* [patch 1/6] sparc: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations David Rientjes @ 2010-07-21 2:44 ` David Rientjes 2010-07-21 3:31 ` David Miller 2010-07-21 2:44 ` [patch 2/6] infiniband: " David Rientjes ` (4 subsequent siblings) 5 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-21 2:44 UTC (permalink / raw) To: David S. Miller, Andrew Morton Cc: Benjamin Herrenschmidt, sparclinux, linux-mm The kmalloc() in mdesc_kmalloc() is failable, so remove __GFP_NOFAIL from its mask. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Rientjes <rientjes@google.com> --- arch/sparc/kernel/mdesc.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c --- a/arch/sparc/kernel/mdesc.c +++ b/arch/sparc/kernel/mdesc.c @@ -134,7 +134,7 @@ static struct mdesc_handle *mdesc_kmalloc(unsigned int mdesc_size) sizeof(struct mdesc_hdr) + mdesc_size); - base = kmalloc(handle_size + 15, GFP_KERNEL | __GFP_NOFAIL); + base = kmalloc(handle_size + 15, GFP_KERNEL); if (base) { struct mdesc_handle *hp; unsigned long addr; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 1/6] sparc: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 ` [patch 1/6] sparc: remove dependency on __GFP_NOFAIL David Rientjes @ 2010-07-21 3:31 ` David Miller 2010-07-21 9:41 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: David Miller @ 2010-07-21 3:31 UTC (permalink / raw) To: rientjes; +Cc: akpm, benh, sparclinux, linux-mm From: David Rientjes <rientjes@google.com> Date: Tue, 20 Jul 2010 19:44:53 -0700 (PDT) > The kmalloc() in mdesc_kmalloc() is failable, so remove __GFP_NOFAIL from > its mask. > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Signed-off-by: David Rientjes <rientjes@google.com> The __GFP_NOFAIL is there intentionally. The code above this, in the cases where the machine description is dynamically updated by the hypervisor at run time, long after boot, has no failure handling. We absolutely must accept the machine descriptor update and fetch it from the hypervisor into a new buffer. Please don't remove this. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 1/6] sparc: remove dependency on __GFP_NOFAIL 2010-07-21 3:31 ` David Miller @ 2010-07-21 9:41 ` David Rientjes 0 siblings, 0 replies; 26+ messages in thread From: David Rientjes @ 2010-07-21 9:41 UTC (permalink / raw) To: David Miller; +Cc: akpm, benh, sparclinux, linux-mm On Tue, 20 Jul 2010, David Miller wrote: > > The kmalloc() in mdesc_kmalloc() is failable, so remove __GFP_NOFAIL from > > its mask. > > > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > > Signed-off-by: David Rientjes <rientjes@google.com> > > The __GFP_NOFAIL is there intentionally. > > The code above this, in the cases where the machine description is > dynamically updated by the hypervisor at run time, long after boot, > has no failure handling. > > We absolutely must accept the machine descriptor update and fetch it > from the hypervisor into a new buffer. > > Please don't remove this. > Ok, fair enough. I was convinced by the error handling in both mdesc_update() and mdesc_kmallloc() that this was a failable allocation, but I understand how mdesc_update() must succeed given your description. We can remove those branches from those two functions, though, since __GFP_NOFAIL will always succeed before returning. I'm planning on replacing __GFP_NOFAIL with a __GFP_KILLABLE flag that will use all of the page allocator's capabilities (direct reclaim, memory compaction for order > 0, and the oom killer) before failing. Then, existing __GFP_NOFAIL users can use do { page = alloc_page(GFP_KERNEL | __GFP_KILLABLE); } while (!page); to remove several branches from the page allocator that we'll no longer need. I'll do this in phase two and make sure to convert this instance to do that. Thanks! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch 2/6] infiniband: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations David Rientjes 2010-07-21 2:44 ` [patch 1/6] sparc: remove dependency on __GFP_NOFAIL David Rientjes @ 2010-07-21 2:44 ` David Rientjes 2010-07-21 3:19 ` Steve Wise 2010-07-21 2:45 ` [patch 3/6] fs: " David Rientjes ` (3 subsequent siblings) 5 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-21 2:44 UTC (permalink / raw) To: Steve Wise, Andrew Morton; +Cc: Roland Dreier, linux-rdma, linux-mm The alloc_skb() in various allocations are failable, so remove __GFP_NOFAIL from their masks. Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: David Rientjes <rientjes@google.com> --- drivers/infiniband/hw/cxgb4/cq.c | 4 ++-- drivers/infiniband/hw/cxgb4/mem.c | 2 +- drivers/infiniband/hw/cxgb4/qp.c | 6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c --- a/drivers/infiniband/hw/cxgb4/cq.c +++ b/drivers/infiniband/hw/cxgb4/cq.c @@ -43,7 +43,7 @@ static int destroy_cq(struct c4iw_rdev *rdev, struct t4_cq *cq, int ret; wr_len = sizeof *res_wr + sizeof *res; - skb = alloc_skb(wr_len, GFP_KERNEL | __GFP_NOFAIL); + skb = alloc_skb(wr_len, GFP_KERNEL); if (!skb) return -ENOMEM; set_wr_txq(skb, CPL_PRIORITY_CONTROL, 0); @@ -118,7 +118,7 @@ static int create_cq(struct c4iw_rdev *rdev, struct t4_cq *cq, /* build fw_ri_res_wr */ wr_len = sizeof *res_wr + sizeof *res; - skb = alloc_skb(wr_len, GFP_KERNEL | __GFP_NOFAIL); + skb = alloc_skb(wr_len, GFP_KERNEL); if (!skb) { ret = -ENOMEM; goto err4; diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -59,7 +59,7 @@ static int write_adapter_mem(struct c4iw_rdev *rdev, u32 addr, u32 len, wr_len = roundup(sizeof *req + sizeof *sc + roundup(copy_len, T4_ULPTX_MIN_IO), 16); - skb = alloc_skb(wr_len, GFP_KERNEL | __GFP_NOFAIL); + skb = alloc_skb(wr_len, GFP_KERNEL); if (!skb) return -ENOMEM; set_wr_txq(skb, CPL_PRIORITY_CONTROL, 0); diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c --- a/drivers/infiniband/hw/cxgb4/qp.c +++ b/drivers/infiniband/hw/cxgb4/qp.c @@ -130,7 +130,7 @@ static int create_qp(struct c4iw_rdev *rdev, struct t4_wq *wq, /* build fw_ri_res_wr */ wr_len = sizeof *res_wr + 2 * sizeof *res; - skb = alloc_skb(wr_len, GFP_KERNEL | __GFP_NOFAIL); + skb = alloc_skb(wr_len, GFP_KERNEL); if (!skb) { ret = -ENOMEM; goto err7; @@ -961,7 +961,7 @@ static int rdma_fini(struct c4iw_dev *rhp, struct c4iw_qp *qhp) PDBG("%s qhp %p qid 0x%x tid %u\n", __func__, qhp, qhp->wq.sq.qid, qhp->ep->hwtid); - skb = alloc_skb(sizeof *wqe, GFP_KERNEL | __GFP_NOFAIL); + skb = alloc_skb(sizeof *wqe, GFP_KERNEL); if (!skb) return -ENOMEM; set_wr_txq(skb, CPL_PRIORITY_DATA, qhp->ep->txq_idx); @@ -1035,7 +1035,7 @@ static int rdma_init(struct c4iw_dev *rhp, struct c4iw_qp *qhp) PDBG("%s qhp %p qid 0x%x tid %u\n", __func__, qhp, qhp->wq.sq.qid, qhp->ep->hwtid); - skb = alloc_skb(sizeof *wqe, GFP_KERNEL | __GFP_NOFAIL); + skb = alloc_skb(sizeof *wqe, GFP_KERNEL); if (!skb) return -ENOMEM; set_wr_txq(skb, CPL_PRIORITY_DATA, qhp->ep->txq_idx); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 2/6] infiniband: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 ` [patch 2/6] infiniband: " David Rientjes @ 2010-07-21 3:19 ` Steve Wise 2010-07-21 17:55 ` Roland Dreier 0 siblings, 1 reply; 26+ messages in thread From: Steve Wise @ 2010-07-21 3:19 UTC (permalink / raw) To: David Rientjes Cc: Steve Wise, Andrew Morton, Roland Dreier, linux-rdma, linux-mm Acked-by: Steve Wise <swise@opengridcomputing.com> David Rientjes wrote: > The alloc_skb() in various allocations are failable, so remove > __GFP_NOFAIL from their masks. > > Cc: Roland Dreier <rolandd@cisco.com> > Signed-off-by: David Rientjes <rientjes@google.com> > --- > drivers/infiniband/hw/cxgb4/cq.c | 4 ++-- > drivers/infiniband/hw/cxgb4/mem.c | 2 +- > drivers/infiniband/hw/cxgb4/qp.c | 6 +++--- > 3 files changed, 6 insertions(+), 6 deletions(-) > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 2/6] infiniband: remove dependency on __GFP_NOFAIL 2010-07-21 3:19 ` Steve Wise @ 2010-07-21 17:55 ` Roland Dreier 0 siblings, 0 replies; 26+ messages in thread From: Roland Dreier @ 2010-07-21 17:55 UTC (permalink / raw) To: Steve Wise Cc: David Rientjes, Steve Wise, Andrew Morton, Roland Dreier, linux-rdma, linux-mm thanks guys, applied -- Roland Dreier <rolandd@cisco.com> || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch 3/6] fs: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations David Rientjes 2010-07-21 2:44 ` [patch 1/6] sparc: remove dependency on __GFP_NOFAIL David Rientjes 2010-07-21 2:44 ` [patch 2/6] infiniband: " David Rientjes @ 2010-07-21 2:45 ` David Rientjes 2010-07-23 19:36 ` Andrew Morton 2010-07-21 2:45 ` [patch 4/6] gfs2: " David Rientjes ` (2 subsequent siblings) 5 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-21 2:45 UTC (permalink / raw) To: Al Viro, Andrew Morton; +Cc: Jens Axboe, linux-fsdevel, linux-mm The kmalloc() in bio_integrity_prep() is failable, so remove __GFP_NOFAIL from its mask. Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David Rientjes <rientjes@google.com> --- fs/bio-integrity.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c --- a/fs/bio-integrity.c +++ b/fs/bio-integrity.c @@ -413,7 +413,7 @@ int bio_integrity_prep(struct bio *bio) /* Allocate kernel buffer for protection data */ len = sectors * blk_integrity_tuple_size(bi); - buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp); + buf = kmalloc(len, GFP_NOIO | q->bounce_gfp); if (unlikely(buf == NULL)) { printk(KERN_ERR "could not allocate integrity buffer\n"); return -EIO; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 3/6] fs: remove dependency on __GFP_NOFAIL 2010-07-21 2:45 ` [patch 3/6] fs: " David Rientjes @ 2010-07-23 19:36 ` Andrew Morton 2010-07-23 19:51 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2010-07-23 19:36 UTC (permalink / raw) To: David Rientjes; +Cc: Al Viro, Jens Axboe, linux-fsdevel, linux-mm On Tue, 20 Jul 2010 19:45:00 -0700 (PDT) David Rientjes <rientjes@google.com> wrote: > The kmalloc() in bio_integrity_prep() is failable, so remove __GFP_NOFAIL > from its mask. > > Cc: Jens Axboe <jens.axboe@oracle.com> > Signed-off-by: David Rientjes <rientjes@google.com> > --- > fs/bio-integrity.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c > --- a/fs/bio-integrity.c > +++ b/fs/bio-integrity.c > @@ -413,7 +413,7 @@ int bio_integrity_prep(struct bio *bio) > > /* Allocate kernel buffer for protection data */ > len = sectors * blk_integrity_tuple_size(bi); > - buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp); > + buf = kmalloc(len, GFP_NOIO | q->bounce_gfp); > if (unlikely(buf == NULL)) { > printk(KERN_ERR "could not allocate integrity buffer\n"); > return -EIO; ^^^ what? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 3/6] fs: remove dependency on __GFP_NOFAIL 2010-07-23 19:36 ` Andrew Morton @ 2010-07-23 19:51 ` David Rientjes 2010-07-23 21:12 ` Pekka Enberg 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-23 19:51 UTC (permalink / raw) To: Andrew Morton; +Cc: Al Viro, Jens Axboe, linux-fsdevel, linux-mm On Fri, 23 Jul 2010, Andrew Morton wrote: > > The kmalloc() in bio_integrity_prep() is failable, so remove __GFP_NOFAIL > > from its mask. > > > > Cc: Jens Axboe <jens.axboe@oracle.com> > > Signed-off-by: David Rientjes <rientjes@google.com> > > --- > > fs/bio-integrity.c | 2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > > > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c > > --- a/fs/bio-integrity.c > > +++ b/fs/bio-integrity.c > > @@ -413,7 +413,7 @@ int bio_integrity_prep(struct bio *bio) > > > > /* Allocate kernel buffer for protection data */ > > len = sectors * blk_integrity_tuple_size(bi); > > - buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp); > > + buf = kmalloc(len, GFP_NOIO | q->bounce_gfp); > > if (unlikely(buf == NULL)) { > > printk(KERN_ERR "could not allocate integrity buffer\n"); > > return -EIO; > > ^^^ what? > Right, I'm not sure why that decision was made, but it looks like it can be changed over to -ENOMEM without harming anything. I'm concerned that the printk will spam the kernel log endlessly, though, if we're really oom and GFP_NOIO has no hope of freeing memory. This code has never been active, so I'd like to wait for some feedback from Al and Jens (now with a corrected email address, jens.axboe@oracle.com bounced) to see if we want to return -ENOMEM, if the printk is really necessary, and if it would be better to just convert this to a loop with a congestion_wait() instead of returning from bio_integrity_prep(). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 3/6] fs: remove dependency on __GFP_NOFAIL 2010-07-23 19:51 ` David Rientjes @ 2010-07-23 21:12 ` Pekka Enberg 0 siblings, 0 replies; 26+ messages in thread From: Pekka Enberg @ 2010-07-23 21:12 UTC (permalink / raw) To: David Rientjes Cc: Andrew Morton, Al Viro, Jens Axboe, linux-fsdevel, linux-mm On Fri, Jul 23, 2010 at 10:51 PM, David Rientjes <rientjes@google.com> wrote: > On Fri, 23 Jul 2010, Andrew Morton wrote: > >> > The kmalloc() in bio_integrity_prep() is failable, so remove __GFP_NOFAIL >> > from its mask. >> > >> > Cc: Jens Axboe <jens.axboe@oracle.com> >> > Signed-off-by: David Rientjes <rientjes@google.com> >> > --- >> > fs/bio-integrity.c | 2 +- >> > 1 files changed, 1 insertions(+), 1 deletions(-) >> > >> > diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c >> > --- a/fs/bio-integrity.c >> > +++ b/fs/bio-integrity.c >> > @@ -413,7 +413,7 @@ int bio_integrity_prep(struct bio *bio) >> > >> > /* Allocate kernel buffer for protection data */ >> > len = sectors * blk_integrity_tuple_size(bi); >> > - buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp); >> > + buf = kmalloc(len, GFP_NOIO | q->bounce_gfp); >> > if (unlikely(buf == NULL)) { >> > printk(KERN_ERR "could not allocate integrity buffer\n"); >> > return -EIO; >> >> ^^^ what? > > Right, I'm not sure why that decision was made, but it looks like it can > be changed over to -ENOMEM without harming anything. I'm concerned that > the printk will spam the kernel log endlessly, though, if we're really oom > and GFP_NOIO has no hope of freeing memory. This code has never been > active, so I'd like to wait for some feedback from Al and Jens (now with a > corrected email address, jens.axboe@oracle.com bounced) to see if we want > to return -ENOMEM, if the printk is really necessary, and if it would be > better to just convert this to a loop with a congestion_wait() instead of > returning from bio_integrity_prep(). Btw, you probably want __GFP_NOWARN here if you expect the allocation to fail under normal conditions. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch 4/6] gfs2: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations David Rientjes ` (2 preceding siblings ...) 2010-07-21 2:45 ` [patch 3/6] fs: " David Rientjes @ 2010-07-21 2:45 ` David Rientjes 2010-07-21 9:24 ` Steven Whitehouse 2010-07-21 2:45 ` [patch 6/6] jbd2: " David Rientjes 2010-07-21 19:26 ` [patch 5/6] jbd: " David Rientjes 5 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-21 2:45 UTC (permalink / raw) To: Steven Whitehouse, Andrew Morton; +Cc: Bob Peterson, cluser-devel, linux-mm The k[mc]allocs in dr_split_leaf() and dir_double_exhash() are failable, so remove __GFP_NOFAIL from their masks. Cc: Bob Peterson <rpeterso@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> --- fs/gfs2/dir.c | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -955,7 +955,12 @@ static int dir_split_leaf(struct inode *inode, const struct qstr *name) /* Change the pointers. Don't bother distinguishing stuffed from non-stuffed. This code is complicated enough already. */ - lp = kmalloc(half_len * sizeof(__be64), GFP_NOFS | __GFP_NOFAIL); + lp = kmalloc(half_len * sizeof(__be64), GFP_NOFS); + if (!lp) { + error = -ENOMEM; + goto fail_brelse; + } + /* Change the pointers */ for (x = 0; x < half_len; x++) lp[x] = cpu_to_be64(bn); @@ -1063,7 +1068,9 @@ static int dir_double_exhash(struct gfs2_inode *dip) /* Allocate both the "from" and "to" buffers in one big chunk */ - buf = kcalloc(3, sdp->sd_hash_bsize, GFP_NOFS | __GFP_NOFAIL); + buf = kcalloc(3, sdp->sd_hash_bsize, GFP_NOFS); + if (!buf) + return -ENOMEM; for (block = dip->i_disksize >> sdp->sd_hash_bsize_shift; block--;) { error = gfs2_dir_read_data(dip, (char *)buf, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 4/6] gfs2: remove dependency on __GFP_NOFAIL 2010-07-21 2:45 ` [patch 4/6] gfs2: " David Rientjes @ 2010-07-21 9:24 ` Steven Whitehouse 2010-07-21 9:31 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Steven Whitehouse @ 2010-07-21 9:24 UTC (permalink / raw) To: David Rientjes; +Cc: Andrew Morton, Bob Peterson, cluster-devel, linux-mm Hi, Looks good to me, I've added it to the -nmw tree. There are a few more GFP_NOFAIL instances in the code that we can probably remove in the future, but these two are pretty easy. Thanks for the patch, Steve. On Tue, 2010-07-20 at 19:45 -0700, David Rientjes wrote: > The k[mc]allocs in dr_split_leaf() and dir_double_exhash() are failable, > so remove __GFP_NOFAIL from their masks. > > Cc: Bob Peterson <rpeterso@redhat.com> > Signed-off-by: David Rientjes <rientjes@google.com> > --- > fs/gfs2/dir.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c > --- a/fs/gfs2/dir.c > +++ b/fs/gfs2/dir.c > @@ -955,7 +955,12 @@ static int dir_split_leaf(struct inode *inode, const struct qstr *name) > /* Change the pointers. > Don't bother distinguishing stuffed from non-stuffed. > This code is complicated enough already. */ > - lp = kmalloc(half_len * sizeof(__be64), GFP_NOFS | __GFP_NOFAIL); > + lp = kmalloc(half_len * sizeof(__be64), GFP_NOFS); > + if (!lp) { > + error = -ENOMEM; > + goto fail_brelse; > + } > + > /* Change the pointers */ > for (x = 0; x < half_len; x++) > lp[x] = cpu_to_be64(bn); > @@ -1063,7 +1068,9 @@ static int dir_double_exhash(struct gfs2_inode *dip) > > /* Allocate both the "from" and "to" buffers in one big chunk */ > > - buf = kcalloc(3, sdp->sd_hash_bsize, GFP_NOFS | __GFP_NOFAIL); > + buf = kcalloc(3, sdp->sd_hash_bsize, GFP_NOFS); > + if (!buf) > + return -ENOMEM; > > for (block = dip->i_disksize >> sdp->sd_hash_bsize_shift; block--;) { > error = gfs2_dir_read_data(dip, (char *)buf, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 4/6] gfs2: remove dependency on __GFP_NOFAIL 2010-07-21 9:24 ` Steven Whitehouse @ 2010-07-21 9:31 ` David Rientjes 0 siblings, 0 replies; 26+ messages in thread From: David Rientjes @ 2010-07-21 9:31 UTC (permalink / raw) To: Steven Whitehouse; +Cc: Andrew Morton, Bob Peterson, cluster-devel, linux-mm On Wed, 21 Jul 2010, Steven Whitehouse wrote: > Hi, > > Looks good to me, I've added it to the -nmw tree. There are a few more > GFP_NOFAIL instances in the code that we can probably remove in the > future, but these two are pretty easy. Thanks for the patch, > Thanks! I'm planning on replacing __GFP_NOFAIL with a different flag that will use all of the page allocator's capabilities (direct reclaim, compaction for order > 0, and oom killer) but not loop forever. Existing __GFP_NOFAIL callers can then do do { page = alloc_page(GFP_KERNEL | __GFP_KILLABLE); } while (!page); to duplicate the behavior of __GFP_NOFAIL until such time as __GFP_KILLABLE can be removed as well. That's what I was planning for the remaining instances in gfs2 during the second phase. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations David Rientjes ` (3 preceding siblings ...) 2010-07-21 2:45 ` [patch 4/6] gfs2: " David Rientjes @ 2010-07-21 2:45 ` David Rientjes 2010-07-22 14:14 ` Ted Ts'o 2010-07-21 19:26 ` [patch 5/6] jbd: " David Rientjes 5 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-21 2:45 UTC (permalink / raw) To: Jan Kara, Andrew Morton; +Cc: Andreas Dilger, Jiri Kosina, linux-mm The kzalloc() in start_this_handle() is failable, so remove __GFP_NOFAIL from its mask. Cc: Andreas Dilger <adilger@sun.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David Rientjes <rientjes@google.com> --- fs/jbd2/transaction.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -102,8 +102,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle) alloc_transaction: if (!journal->j_running_transaction) { - new_transaction = kzalloc(sizeof(*new_transaction), - GFP_NOFS|__GFP_NOFAIL); + new_transaction = kzalloc(sizeof(*new_transaction), GFP_NOFS): if (!new_transaction) { ret = -ENOMEM; goto out; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-21 2:45 ` [patch 6/6] jbd2: " David Rientjes @ 2010-07-22 14:14 ` Ted Ts'o 2010-07-22 18:09 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Ted Ts'o @ 2010-07-22 14:14 UTC (permalink / raw) To: David Rientjes Cc: Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm On Tue, Jul 20, 2010 at 07:45:10PM -0700, David Rientjes wrote: > The kzalloc() in start_this_handle() is failable, so remove __GFP_NOFAIL > from its mask. Unfortunately, while there is error handling in start_this_handle(), there isn't in all of the callers of start_this_handle(), which is why the __GFP_NOFAIL is there. At the moment, if we get an ENOMEM in the delayed writeback code paths, for example, it's a disaster; user data can get lost, as a result. I could add retry code in the places where we really can't fail, and reflect the change back up to the userspace code everywhere else, but that will take a while to do, and even then it means that any VFS syscall (chmod, unlink, rmdir, mkdir, read, close, etc.) would now potentially return ENOMEM. And we know how often application programmers do error checking.... But until we fix up all of the callers of jbd2_journal_start() and jbd2_journal_restart(), I'd prefer keep this __GFP_NOFAIL in place, thanks. - Ted P.S. There may also be some places where we haven't taken locks yet, and so it might be safe in a few locations to omit the GFP_NOFS flag. That would mean adding a new version of jbd2_journal_start() which can take a gfp_mask, but the net result would be a file system that would be a little bit more of a "good citizen" as far as allowing memory reclaim. I am worried about reflecting ENOMEM errors back up to userspace, though, since I'm painfully aware of how much crappy userspace code is out there. What might be nice is a "__GFP_RETRY_HARDER" which is weaker form of __GFP_NOFAIL... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-22 14:14 ` Ted Ts'o @ 2010-07-22 18:09 ` David Rientjes 2010-07-22 23:09 ` Ted Ts'o 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-22 18:09 UTC (permalink / raw) To: Ted Ts'o Cc: Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm On Thu, 22 Jul 2010, Ted Ts'o wrote: > > The kzalloc() in start_this_handle() is failable, so remove __GFP_NOFAIL > > from its mask. > > Unfortunately, while there is error handling in start_this_handle(), > there isn't in all of the callers of start_this_handle(), which is why > the __GFP_NOFAIL is there. At the moment, if we get an ENOMEM in the > delayed writeback code paths, for example, it's a disaster; user data > can get lost, as a result. > I'll change this to do { new_transaction = kzalloc(sizeof(*new_transaction), GFP_NOFS); } while (!new_transaction); in the next phase when I introduce __GFP_KILLABLE (that jbd and jbd2 can't use because they are GFP_NOFS). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-22 18:09 ` David Rientjes @ 2010-07-22 23:09 ` Ted Ts'o 2010-07-22 23:24 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Ted Ts'o @ 2010-07-22 23:09 UTC (permalink / raw) To: David Rientjes Cc: Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm On Thu, Jul 22, 2010 at 11:09:53AM -0700, David Rientjes wrote: > > I'll change this to > > do { > new_transaction = kzalloc(sizeof(*new_transaction), > GFP_NOFS); > } while (!new_transaction); > > in the next phase when I introduce __GFP_KILLABLE (that jbd and jbd2 can't > use because they are GFP_NOFS). OK, I can carry a patch which does this in the ext4 tree to push to linus when the merge window opens shortly, since the goal is you want to get rid of __GFP_NOFAIL altogether, right? - Ted -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-22 23:09 ` Ted Ts'o @ 2010-07-22 23:24 ` David Rientjes 2010-07-23 14:10 ` Ted Ts'o 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-22 23:24 UTC (permalink / raw) To: Ted Ts'o Cc: Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm On Thu, 22 Jul 2010, Ted Ts'o wrote: > > I'll change this to > > > > do { > > new_transaction = kzalloc(sizeof(*new_transaction), > > GFP_NOFS); > > } while (!new_transaction); > > > > in the next phase when I introduce __GFP_KILLABLE (that jbd and jbd2 can't > > use because they are GFP_NOFS). > > OK, I can carry a patch which does this in the ext4 tree to push to > linus when the merge window opens shortly, since the goal is you want > to get rid of __GFP_NOFAIL altogether, right? > Yup, I was trying to do the removal in two phases. First, remove __GFP_NOFAIL from callers that don't seem to need it. I found that they were actually needed in some cases such as jbd, jbd2, and sparc although the reason was specific to those subsystems at a higher level and their error handling was actually unused code since __GFP_NOFAIL cannot return NULL. Second, replace __GFP_NOFAIL with __GFP_KILLABLE which converts existing users of __GFP_NOFAIL into the do-while loop above and adding __GFP_KILLABLE for allocations allowing __GFP_FS which does memory compaction for order > 0, direct reclaim, and the oom killer but does not retry the allocation. That would be the responsibility of the caller. This ends up removing several branches from the page allocator. I didn't think about converting the existing GFP_NOFS | __GFP_NOFAIL callers into the do-while loop above until you mentioned it, thanks. I'll send patches to do that shortly. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-22 23:24 ` David Rientjes @ 2010-07-23 14:10 ` Ted Ts'o 2010-07-23 14:57 ` Jan Kara 0 siblings, 1 reply; 26+ messages in thread From: Ted Ts'o @ 2010-07-23 14:10 UTC (permalink / raw) To: David Rientjes Cc: Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm, linux-ext4 On Thu, Jul 22, 2010 at 04:24:23PM -0700, David Rientjes wrote: > > I didn't think about converting the existing GFP_NOFS | __GFP_NOFAIL > callers into the do-while loop above until you mentioned it, thanks. I'll > send patches to do that shortly. Here's what I'm planning on queueing for the next merge window, along with patches to ext4 to use jbd2__journal_start(..., GFP_KERNEL) in places where we can afford to fail. After doing some analysis, the places where we can afford to fail are also the places where we can use GFP_KERNEL instead of GFP_NOFS, so conveniently, I'm using the lack of __GFP_FS to indicate that we should do the retry loop in start_this_handle(). I also added the congestion_wait() call since there's no point busy-looping the CPU while we're waiting for pages to get swapped or paged out. Comments would be appreciated. - Ted ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-23 14:10 ` Ted Ts'o @ 2010-07-23 14:57 ` Jan Kara 2010-07-23 15:05 ` Ted Ts'o 0 siblings, 1 reply; 26+ messages in thread From: Jan Kara @ 2010-07-23 14:57 UTC (permalink / raw) To: Ted Ts'o Cc: David Rientjes, Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm, linux-ext4 On Fri 23-07-10 10:10:54, Ted Ts'o wrote: > On Thu, Jul 22, 2010 at 04:24:23PM -0700, David Rientjes wrote: > > > > I didn't think about converting the existing GFP_NOFS | __GFP_NOFAIL > > callers into the do-while loop above until you mentioned it, thanks. I'll > > send patches to do that shortly. ... > From 814be805d5e3d12343e590631ff9bc2d65c8f60a Mon Sep 17 00:00:00 2001 > From: Theodore Ts'o <tytso@mit.edu> > Date: Fri, 23 Jul 2010 10:06:53 -0400 > Subject: [PATCH] jbd2: Remove __GFP_NOFAIL from jbd2 layer > > __GFP_NOFAIL is going away, so add our own retry loop. Also add > jbd2__journal_start() and jbd2__journal_restart() which take a gfp > mask, so that file systems can optionally (re)start transaction > handles using GFP_KERNEL. If they do this, then they need to be > prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready > to reflect that error up to userspace. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/jbd2/journal.c | 14 +++++++++-- > fs/jbd2/transaction.c | 60 +++++++++++++++++++++++++++++++++--------------- > include/linux/jbd2.h | 4 ++- > 3 files changed, 55 insertions(+), 23 deletions(-) > > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c > index e214d68..43241c0 100644 > --- a/fs/jbd2/transaction.c > +++ b/fs/jbd2/transaction.c > @@ -83,30 +83,39 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction) > * transaction's buffer credits. > */ > > -static int start_this_handle(journal_t *journal, handle_t *handle) > +static int start_this_handle(journal_t *journal, handle_t *handle, > + int gfp_mask) > { > transaction_t *transaction; > int needed; > int nblocks = handle->h_buffer_credits; > transaction_t *new_transaction = NULL; > - int ret = 0; > unsigned long ts = jiffies; > > if (nblocks > journal->j_max_transaction_buffers) { > printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n", > current->comm, nblocks, > journal->j_max_transaction_buffers); > - ret = -ENOSPC; > - goto out; > + return -ENOSPC; > } > > alloc_transaction: > if (!journal->j_running_transaction) { > - new_transaction = kzalloc(sizeof(*new_transaction), > - GFP_NOFS|__GFP_NOFAIL); > + retry_alloc: > + new_transaction = kzalloc(sizeof(*new_transaction), gfp_mask); > if (!new_transaction) { > - ret = -ENOMEM; > - goto out; > + /* > + * If __GFP_FS is not present, then we may be > + * being called from inside the fs writeback > + * layer, so we MUST NOT fail. Since > + * __GFP_NOFAIL is going away, we will arrange > + * to retry the allocation ourselves. > + */ > + if ((gfp & __GFP_FS) == 0) { > + congestion_wait(BLK_RW_ASYNC, HZ/50); > + goto retry_alloc; You could as well go to alloc_transaction label above... > + } > + return -ENOMEM; > } > } > ... > @@ -278,7 +285,7 @@ static handle_t *new_handle(int nblocks) > * > * Return a pointer to a newly allocated handle, or NULL on failure > */ > -handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > +handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int gfp_mask) > { > handle_t *handle = journal_current_handle(); > int err; > @@ -298,7 +305,7 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > > current->journal_info = handle; > > - err = start_this_handle(journal, handle); > + err = start_this_handle(journal, handle, GFP_NOFS); Here you want to use gfp_mask I guess. > if (err < 0) { > jbd2_free_handle(handle); > current->journal_info = NULL; > @@ -308,6 +315,15 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > out: > return handle; > } > +EXPORT_SYMBOL(jbd2__journal_start); > + > + > +handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > +{ > + return jbd2__journal_start(journal, nblocks, GFP_NOFS); > +} > +EXPORT_SYMBOL(jbd2_journal_start); > + > > /** > * int jbd2_journal_extend() - extend buffer credits. > @@ -394,8 +410,7 @@ out: > * transaction capabable of guaranteeing the requested number of > * credits. > */ > - > -int jbd2_journal_restart(handle_t *handle, int nblocks) > +int jbd2__journal_restart(handle_t *handle, int nblocks, int gfp_mask) > { > transaction_t *transaction = handle->h_transaction; > journal_t *journal = transaction->t_journal; > @@ -428,10 +443,17 @@ int jbd2_journal_restart(handle_t *handle, int nblocks) > > lock_map_release(&handle->h_lockdep_map); > handle->h_buffer_credits = nblocks; > - ret = start_this_handle(journal, handle); > + ret = start_this_handle(journal, handle, GFP_NOFS); And here you want to use gfp_mask as well. > return ret; > } > +EXPORT_SYMBOL(jbd2__journal_restart); > + > > +int jbd2_journal_restart(handle_t *handle, int nblocks) > +{ > + return jbd2__journal_restart(handle, nblocks, GFP_NOFS); > +} > +EXPORT_SYMBOL(jbd2_journal_restart); > > /** > * void jbd2_journal_lock_updates () - establish a transaction barrier. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-23 14:57 ` Jan Kara @ 2010-07-23 15:05 ` Ted Ts'o 2010-07-23 15:32 ` Jan Kara 2010-07-23 19:40 ` David Rientjes 0 siblings, 2 replies; 26+ messages in thread From: Ted Ts'o @ 2010-07-23 15:05 UTC (permalink / raw) To: Jan Kara Cc: David Rientjes, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm, linux-ext4 Yeah, oops. Nice catches. I also hadn't done a test compile, so there were some missing #include's. So once more, this time with feeling... - Ted ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-23 15:05 ` Ted Ts'o @ 2010-07-23 15:32 ` Jan Kara 2010-07-23 19:40 ` David Rientjes 1 sibling, 0 replies; 26+ messages in thread From: Jan Kara @ 2010-07-23 15:32 UTC (permalink / raw) To: Ted Ts'o Cc: Jan Kara, David Rientjes, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm, linux-ext4 On Fri 23-07-10 11:05:43, Ted Ts'o wrote: > Yeah, oops. Nice catches. I also hadn't done a test compile, so > there were some missing #include's. > > So once more, this time with feeling... > > - Ted > > From d24408e1b50e47b21b7d2ec5857b710e9b752dc9 Mon Sep 17 00:00:00 2001 > From: Theodore Ts'o <tytso@mit.edu> > Date: Fri, 23 Jul 2010 11:03:45 -0400 > Subject: [PATCH] jbd2: Remove __GFP_NOFAIL from jbd2 layer > > __GFP_NOFAIL is going away, so add our own retry loop. Also add > jbd2__journal_start() and jbd2__journal_restart() which take a gfp > mask, so that file systems can optionally (re)start transaction > handles using GFP_KERNEL. If they do this, then they need to be > prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready > to reflect that error up to userspace. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Now the patch looks good. Acked-by: Jan Kara <jack@suse.cz> Honza > --- > fs/jbd2/journal.c | 15 +++++++++-- > fs/jbd2/transaction.c | 61 +++++++++++++++++++++++++++++++++--------------- > include/linux/jbd2.h | 4 ++- > 3 files changed, 57 insertions(+), 23 deletions(-) > > diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c > index f7bf157..a79d334 100644 > --- a/fs/jbd2/journal.c > +++ b/fs/jbd2/journal.c > @@ -41,6 +41,7 @@ > #include <linux/hash.h> > #include <linux/log2.h> > #include <linux/vmalloc.h> > +#include <linux/backing-dev.h> > > #define CREATE_TRACE_POINTS > #include <trace/events/jbd2.h> > @@ -48,8 +49,6 @@ > #include <asm/uaccess.h> > #include <asm/page.h> > > -EXPORT_SYMBOL(jbd2_journal_start); > -EXPORT_SYMBOL(jbd2_journal_restart); > EXPORT_SYMBOL(jbd2_journal_extend); > EXPORT_SYMBOL(jbd2_journal_stop); > EXPORT_SYMBOL(jbd2_journal_lock_updates); > @@ -311,7 +310,17 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction, > */ > J_ASSERT_BH(bh_in, buffer_jbddirty(bh_in)); > > - new_bh = alloc_buffer_head(GFP_NOFS|__GFP_NOFAIL); > +retry_alloc: > + new_bh = alloc_buffer_head(GFP_NOFS); > + if (!new_bh) { > + /* > + * Failure is not an option, but __GFP_NOFAIL is going > + * away; so we retry ourselves here. > + */ > + congestion_wait(BLK_RW_ASYNC, HZ/50); > + goto retry_alloc; > + } > + > /* keep subsequent assertions sane */ > new_bh->b_state = 0; > init_buffer(new_bh, NULL, NULL); > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c > index e214d68..001e95f 100644 > --- a/fs/jbd2/transaction.c > +++ b/fs/jbd2/transaction.c > @@ -26,6 +26,8 @@ > #include <linux/mm.h> > #include <linux/highmem.h> > #include <linux/hrtimer.h> > +#include <linux/backing-dev.h> > +#include <linux/module.h> > > static void __jbd2_journal_temp_unlink_buffer(struct journal_head *jh); > > @@ -83,30 +85,38 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction) > * transaction's buffer credits. > */ > > -static int start_this_handle(journal_t *journal, handle_t *handle) > +static int start_this_handle(journal_t *journal, handle_t *handle, > + int gfp_mask) > { > transaction_t *transaction; > int needed; > int nblocks = handle->h_buffer_credits; > transaction_t *new_transaction = NULL; > - int ret = 0; > unsigned long ts = jiffies; > > if (nblocks > journal->j_max_transaction_buffers) { > printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n", > current->comm, nblocks, > journal->j_max_transaction_buffers); > - ret = -ENOSPC; > - goto out; > + return -ENOSPC; > } > > alloc_transaction: > if (!journal->j_running_transaction) { > - new_transaction = kzalloc(sizeof(*new_transaction), > - GFP_NOFS|__GFP_NOFAIL); > + new_transaction = kzalloc(sizeof(*new_transaction), gfp_mask); > if (!new_transaction) { > - ret = -ENOMEM; > - goto out; > + /* > + * If __GFP_FS is not present, then we may be > + * being called from inside the fs writeback > + * layer, so we MUST NOT fail. Since > + * __GFP_NOFAIL is going away, we will arrange > + * to retry the allocation ourselves. > + */ > + if ((gfp_mask & __GFP_FS) == 0) { > + congestion_wait(BLK_RW_ASYNC, HZ/50); > + goto alloc_transaction; > + } > + return -ENOMEM; > } > } > > @@ -123,8 +133,8 @@ repeat_locked: > if (is_journal_aborted(journal) || > (journal->j_errno != 0 && !(journal->j_flags & JBD2_ACK_ERR))) { > spin_unlock(&journal->j_state_lock); > - ret = -EROFS; > - goto out; > + kfree(new_transaction); > + return -EROFS; > } > > /* Wait on the journal's transaction barrier if necessary */ > @@ -240,10 +250,8 @@ repeat_locked: > spin_unlock(&journal->j_state_lock); > > lock_map_acquire(&handle->h_lockdep_map); > -out: > - if (unlikely(new_transaction)) /* It's usually NULL */ > - kfree(new_transaction); > - return ret; > + kfree(new_transaction); > + return 0; > } > > static struct lock_class_key jbd2_handle_key; > @@ -278,7 +286,7 @@ static handle_t *new_handle(int nblocks) > * > * Return a pointer to a newly allocated handle, or NULL on failure > */ > -handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > +handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int gfp_mask) > { > handle_t *handle = journal_current_handle(); > int err; > @@ -298,7 +306,7 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > > current->journal_info = handle; > > - err = start_this_handle(journal, handle); > + err = start_this_handle(journal, handle, gfp_mask); > if (err < 0) { > jbd2_free_handle(handle); > current->journal_info = NULL; > @@ -308,6 +316,15 @@ handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > out: > return handle; > } > +EXPORT_SYMBOL(jbd2__journal_start); > + > + > +handle_t *jbd2_journal_start(journal_t *journal, int nblocks) > +{ > + return jbd2__journal_start(journal, nblocks, GFP_NOFS); > +} > +EXPORT_SYMBOL(jbd2_journal_start); > + > > /** > * int jbd2_journal_extend() - extend buffer credits. > @@ -394,8 +411,7 @@ out: > * transaction capabable of guaranteeing the requested number of > * credits. > */ > - > -int jbd2_journal_restart(handle_t *handle, int nblocks) > +int jbd2__journal_restart(handle_t *handle, int nblocks, int gfp_mask) > { > transaction_t *transaction = handle->h_transaction; > journal_t *journal = transaction->t_journal; > @@ -428,10 +444,17 @@ int jbd2_journal_restart(handle_t *handle, int nblocks) > > lock_map_release(&handle->h_lockdep_map); > handle->h_buffer_credits = nblocks; > - ret = start_this_handle(journal, handle); > + ret = start_this_handle(journal, handle, gfp_mask); > return ret; > } > +EXPORT_SYMBOL(jbd2__journal_restart); > + > > +int jbd2_journal_restart(handle_t *handle, int nblocks) > +{ > + return jbd2__journal_restart(handle, nblocks, GFP_NOFS); > +} > +EXPORT_SYMBOL(jbd2_journal_restart); > > /** > * void jbd2_journal_lock_updates () - establish a transaction barrier. > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h > index a4d2e9f..5a72bc7 100644 > --- a/include/linux/jbd2.h > +++ b/include/linux/jbd2.h > @@ -1081,7 +1081,9 @@ static inline handle_t *journal_current_handle(void) > */ > > extern handle_t *jbd2_journal_start(journal_t *, int nblocks); > -extern int jbd2_journal_restart (handle_t *, int nblocks); > +extern handle_t *jbd2__journal_start(journal_t *, int nblocks, int gfp_mask); > +extern int jbd2_journal_restart(handle_t *, int nblocks); > +extern int jbd2__journal_restart(handle_t *, int nblocks, int gfp_mask); > extern int jbd2_journal_extend (handle_t *, int nblocks); > extern int jbd2_journal_get_write_access(handle_t *, struct buffer_head *); > extern int jbd2_journal_get_create_access (handle_t *, struct buffer_head *); > -- > 1.7.0.4 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-23 15:05 ` Ted Ts'o 2010-07-23 15:32 ` Jan Kara @ 2010-07-23 19:40 ` David Rientjes 2010-07-23 19:52 ` Ted Ts'o 1 sibling, 1 reply; 26+ messages in thread From: David Rientjes @ 2010-07-23 19:40 UTC (permalink / raw) To: Ted Ts'o Cc: Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm, linux-ext4 On Fri, 23 Jul 2010, Ted Ts'o wrote: > __GFP_NOFAIL is going away, so add our own retry loop. Also add > jbd2__journal_start() and jbd2__journal_restart() which take a gfp > mask, so that file systems can optionally (re)start transaction > handles using GFP_KERNEL. If they do this, then they need to be > prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready > to reflect that error up to userspace. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: David Rientjes <rientjes@google.com> Will you be pushing the equivalent patch for jbd? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [patch 6/6] jbd2: remove dependency on __GFP_NOFAIL 2010-07-23 19:40 ` David Rientjes @ 2010-07-23 19:52 ` Ted Ts'o 0 siblings, 0 replies; 26+ messages in thread From: Ted Ts'o @ 2010-07-23 19:52 UTC (permalink / raw) To: David Rientjes Cc: Jan Kara, Andrew Morton, Andreas Dilger, Jiri Kosina, linux-mm, linux-ext4 On Fri, Jul 23, 2010 at 12:40:50PM -0700, David Rientjes wrote: > On Fri, 23 Jul 2010, Ted Ts'o wrote: > > > __GFP_NOFAIL is going away, so add our own retry loop. Also add > > jbd2__journal_start() and jbd2__journal_restart() which take a gfp > > mask, so that file systems can optionally (re)start transaction > > handles using GFP_KERNEL. If they do this, then they need to be > > prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready > > to reflect that error up to userspace. > > > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > Acked-by: David Rientjes <rientjes@google.com> > > Will you be pushing the equivalent patch for jbd? I imagine Jan (as the person who has been primarily handling ext3 patches) is waiting to see how invasive the patches are to ext4 before deciding whether he's willing backport the equivalent changes to ext3 so that some of the calls that end up calling start_this_handle() end up passing GFP_KERNEL when it's OK for them to fail --- since ext3 is in maintenance mode, so we tend not to backport the more disruptive patches to ext3. It's really a cost/benefit tradeoff, really. I am certain that some application programmers will complain when their programs start breaking because they're not prepared to handle an ENOMEM failure from certain system calls that have never failed that way before. (I should introduce you to some Japanese enterprise programmers who believe that if a system call ever starts returning an error code not documented in a Linux manpage, it's a ABI compatibility bug. They're on crack, of course, but it's been hard convincing them that it's their fault for writing brittle/fragile applications.) So I could imagine that Jan and some of the enterprise distributions that are shipping ext3 might not want this change, given the "better the devil you know (random lockups in the case of extreme memory pressure)" is better than the devil you don't (more system calls might return ENOMEM, with undetermined application impacts, in the case of extreme memory pressure). So in the jbd layer, they might want to simply unconditionally loop, so it would be bug-for-bug compatible with existing ext3 behavior. - Ted -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [patch 5/6] jbd: remove dependency on __GFP_NOFAIL 2010-07-21 2:44 [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations David Rientjes ` (4 preceding siblings ...) 2010-07-21 2:45 ` [patch 6/6] jbd2: " David Rientjes @ 2010-07-21 19:26 ` David Rientjes 5 siblings, 0 replies; 26+ messages in thread From: David Rientjes @ 2010-07-21 19:26 UTC (permalink / raw) To: Andrew Morton, Jan Kara; +Cc: Andreas Dilger, Jiri Kosina, linux-mm The kzalloc() in start_this_handle() is failable, so remove __GFP_NOFAIL from its mask. Cc: Andreas Dilger <adilger@sun.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David Rientjes <rientjes@google.com> --- fs/jbd/transaction.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c --- a/fs/jbd/transaction.c +++ b/fs/jbd/transaction.c @@ -99,8 +99,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle) alloc_transaction: if (!journal->j_running_transaction) { - new_transaction = kzalloc(sizeof(*new_transaction), - GFP_NOFS|__GFP_NOFAIL); + new_transaction = kzalloc(sizeof(*new_transaction), GFP_NOFS); if (!new_transaction) { ret = -ENOMEM; goto out; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2010-07-23 21:12 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-07-21 2:44 [patch 0/6] remove dependency on __GFP_NOFAIL for failable allocations David Rientjes 2010-07-21 2:44 ` [patch 1/6] sparc: remove dependency on __GFP_NOFAIL David Rientjes 2010-07-21 3:31 ` David Miller 2010-07-21 9:41 ` David Rientjes 2010-07-21 2:44 ` [patch 2/6] infiniband: " David Rientjes 2010-07-21 3:19 ` Steve Wise 2010-07-21 17:55 ` Roland Dreier 2010-07-21 2:45 ` [patch 3/6] fs: " David Rientjes 2010-07-23 19:36 ` Andrew Morton 2010-07-23 19:51 ` David Rientjes 2010-07-23 21:12 ` Pekka Enberg 2010-07-21 2:45 ` [patch 4/6] gfs2: " David Rientjes 2010-07-21 9:24 ` Steven Whitehouse 2010-07-21 9:31 ` David Rientjes 2010-07-21 2:45 ` [patch 6/6] jbd2: " David Rientjes 2010-07-22 14:14 ` Ted Ts'o 2010-07-22 18:09 ` David Rientjes 2010-07-22 23:09 ` Ted Ts'o 2010-07-22 23:24 ` David Rientjes 2010-07-23 14:10 ` Ted Ts'o 2010-07-23 14:57 ` Jan Kara 2010-07-23 15:05 ` Ted Ts'o 2010-07-23 15:32 ` Jan Kara 2010-07-23 19:40 ` David Rientjes 2010-07-23 19:52 ` Ted Ts'o 2010-07-21 19:26 ` [patch 5/6] jbd: " David Rientjes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).