linux-mm.kvack.org archive mirror
* Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
       [not found]   ` <CAHc6FU49oBdo8mAq7hb1greR+B1C_Fpy5JU7RBHfRYACt1S4wA@mail.gmail.com>
@ 2019-04-07  7:32     ` Christoph Hellwig
  2019-04-08  8:53       ` Andreas Gruenbacher
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2019-04-07  7:32 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, cluster-devel, Dave Chinner, Ross Lagerwall,
	Mark Syms, Edwin Török, linux-fsdevel, Jan Kara,
	linux-mm

[adding Jan and linux-mm]

On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > But what is the requirement to do this in writeback context?  Can't
> > we move it out into another context instead?
> 
> Indeed, this isn't for data integrity in this case but because the
> dirty limit is exceeded. What other context would you suggest to move
> this to?
> 
> (The iomap flag I've proposed would save us from getting into this
> situation in the first place.)

Your patch does two things:

 - it only calls balance_dirty_pages_ratelimited once per write
   operation instead of once per page.  In the past btrfs did
   hacks like that, but IIRC they caused VM balancing issues.
   That is why everyone now calls balance_dirty_pages_ratelimited
   once per page.  If calling it at a coarse granularity were
   fine, we should do it everywhere instead of just in gfs2
   in journaled mode.
 - it artificially reduces the size of writes to a low value,
   which I suspect is going to break real-life applications.
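
For reference, the once-per-page convention follows the shape of the
generic buffered-write loop, roughly like this (an illustrative
sketch, not verbatim kernel code; the actual copy step is elided):

	#include <linux/kernel.h>
	#include <linux/pagemap.h>
	#include <linux/uio.h>
	#include <linux/writeback.h>

	static ssize_t perform_write_sketch(struct address_space *mapping,
					    struct iov_iter *i, loff_t pos)
	{
		ssize_t written = 0;

		while (iov_iter_count(i)) {
			/* ->write_begin(), copying at most one page into
			 * the page cache, and ->write_end() would go here */
			size_t copied = min_t(size_t, PAGE_SIZE,
					      iov_iter_count(i));

			iov_iter_advance(i, copied);
			pos += copied;
			written += copied;

			/* throttle the writer once per dirtied page */
			balance_dirty_pages_ratelimited(mapping);
		}
		return written;
	}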

So I really think we need to fix this properly.  And if that means
that you can't make use of the iomap batching for gfs2 in journaled
mode, that is still a better option.  But I really think you need
to look into the scope of your flush_log and figure out a good way
to reduce that and solve the root cause.



* Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
  2019-04-07  7:32     ` gfs2 iomap deadlock, IOMAP_F_UNBALANCED Christoph Hellwig
@ 2019-04-08  8:53       ` Andreas Gruenbacher
  2019-04-08 13:44         ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Gruenbacher @ 2019-04-08  8:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: cluster-devel, Dave Chinner, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, Jan Kara, linux-mm

On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig <hch@lst.de> wrote:
>
> [adding Jan and linux-mm]
>
> On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > But what is the requirement to do this in writeback context?  Can't
> > > we move it out into another context instead?
> >
> > Indeed, this isn't for data integrity in this case but because the
> > dirty limit is exceeded. What other context would you suggest to move
> > this to?
> >
> > (The iomap flag I've proposed would save us from getting into this
> > situation in the first place.)
>
> Your patch does two things:
>
>  - it only calls balance_dirty_pages_ratelimited once per write
>    operation instead of once per page.  In the past btrfs did
>    hacks like that, but IIRC they caused VM balancing issues.
>    That is why everyone now calls balance_dirty_pages_ratelimited
>    once per page.  If calling it at a coarse granularity were
>    fine, we should do it everywhere instead of just in gfs2
>    in journaled mode.
>  - it artificially reduces the size of writes to a low value,
>    which I suspect is going to break real-life applications.

Not quite: balance_dirty_pages_ratelimited is called from iomap_end,
so once per iomap mapping returned, not once per write. (The first
version of this patch got that wrong by accident, but not the second.)
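
Roughly like this (a sketch of the idea, not the actual gfs2 patch):

	static int gfs2_iomap_end(struct inode *inode, loff_t pos,
				  loff_t length, ssize_t written,
				  unsigned flags, struct iomap *iomap)
	{
		/* ... release allocation state for this mapping ... */

		/*
		 * One balance call per mapping returned by
		 * ->iomap_begin(), which may cover many pages,
		 * instead of one call per copied page.
		 */
		if ((flags & IOMAP_WRITE) && written > 0)
			balance_dirty_pages_ratelimited(inode->i_mapping);
		return 0;
	}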

We can limit the size of the mappings returned just in that case. I'm
aware that there is a risk of balancing problems; I just don't have
any better ideas.

This is a problem that all filesystems with data journaling will have
with iomap; it's not that gfs2 is doing anything particularly stupid.

> So I really think we need to fix this properly.  And if that means
> that you can't make use of the iomap batching for gfs2 in journaled
> mode, that is still a better option.

That would mean using the old-style page-sized allocations and a
completely separate write path in that case. That would be quite a
nightmare.

> But I really think you need
> to look into the scope of your flush_log and figure out a good way
> to reduce that and solve the root cause.

We won't be able to do a log flush while another transaction is
active, but that's what's needed to clean dirty pages. iomap doesn't
allow us to put the block allocation into a separate transaction from
the page writes; for that, the opposite of the page_done hook would
probably be needed.
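
The existing post-copy hook looks like this, and a pre-copy
counterpart could mirror it (the page_prepare signature below is a
hypothetical sketch; gfs2 could start the transaction there and end
it in page_done):

	/* existing hook, called after data has been copied into a page */
	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
			  struct page *page, struct iomap *iomap);

	/* hypothetical counterpart, called before the copy */
	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
			    struct iomap *iomap);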

Thanks,
Andreas



* Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
  2019-04-08  8:53       ` Andreas Gruenbacher
@ 2019-04-08 13:44         ` Jan Kara
  2019-04-09 12:15           ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2019-04-08 13:44 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, cluster-devel, Dave Chinner, Ross Lagerwall,
	Mark Syms, Edwin Török, linux-fsdevel, Jan Kara,
	linux-mm

On Mon 08-04-19 10:53:34, Andreas Gruenbacher wrote:
> On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig <hch@lst.de> wrote:
> >
> > [adding Jan and linux-mm]
> >
> > On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > > But what is the requirement to do this in writeback context?  Can't
> > > > we move it out into another context instead?
> > >
> > > Indeed, this isn't for data integrity in this case but because the
> > > dirty limit is exceeded. What other context would you suggest to move
> > > this to?
> > >
> > > (The iomap flag I've proposed would save us from getting into this
> > > situation in the first place.)
> >
> > Your patch does two things:
> >
> >  - it only calls balance_dirty_pages_ratelimited once per write
> >    operation instead of once per page.  In the past btrfs did
> >    hacks like that, but IIRC they caused VM balancing issues.
> >    That is why everyone now calls balance_dirty_pages_ratelimited
> >    once per page.  If calling it at a coarse granularity were
> >    fine, we should do it everywhere instead of just in gfs2
> >    in journaled mode.
> >  - it artificially reduces the size of writes to a low value,
> >    which I suspect is going to break real-life applications.
> 
> Not quite: balance_dirty_pages_ratelimited is called from iomap_end,
> so once per iomap mapping returned, not once per write. (The first
> version of this patch got that wrong by accident, but not the second.)
> 
> We can limit the size of the mappings returned just in that case. I'm
> aware that there is a risk of balancing problems; I just don't have
> any better ideas.
> 
> This is a problem that all filesystems with data journaling will have
> with iomap; it's not that gfs2 is doing anything particularly stupid.

I agree that if ext4 were using iomap, it would have similar issues.

> > So I really think we need to fix this properly.  And if that means
> > that you can't make use of the iomap batching for gfs2 in journaled
> > mode, that is still a better option.
> 
> That would mean using the old-style page-sized allocations and a
> completely separate write path in that case. That would be quite a
> nightmare.
> 
> > But I really think you need
> > to look into the scope of your flush_log and figure out a good way
> > to reduce that and solve the root cause.
> 
> We won't be able to do a log flush while another transaction is
> active, but that's what's needed to clean dirty pages. iomap doesn't
> allow us to put the block allocation into a separate transaction from
> the page writes; for that, the opposite of the page_done hook would
> probably be needed.

I agree that a ->page_prepare() hook would probably be the cleanest
solution for this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
  2019-04-08 13:44         ` Jan Kara
@ 2019-04-09 12:15           ` Christoph Hellwig
  2019-04-09 12:27             ` Andreas Gruenbacher
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2019-04-09 12:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andreas Gruenbacher, Christoph Hellwig, cluster-devel,
	Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
	linux-fsdevel, linux-mm

On Mon, Apr 08, 2019 at 03:44:05PM +0200, Jan Kara wrote:
> > We won't be able to do a log flush while another transaction is
> > active, but that's what's needed to clean dirty pages. iomap doesn't
> > allow us to put the block allocation into a separate transaction from
> > the page writes; for that, the opposite of the page_done hook would
> > probably be needed.
> 
> I agree that a ->page_prepare() hook would probably be the cleanest
> solution for this.

That doesn't sound too bad to me.



* Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
  2019-04-09 12:15           ` Christoph Hellwig
@ 2019-04-09 12:27             ` Andreas Gruenbacher
  0 siblings, 0 replies; 5+ messages in thread
From: Andreas Gruenbacher @ 2019-04-09 12:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, cluster-devel, Dave Chinner, Ross Lagerwall, Mark Syms,
	Edwin Török, linux-fsdevel, linux-mm

On Tue, 9 Apr 2019 at 14:15, Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Apr 08, 2019 at 03:44:05PM +0200, Jan Kara wrote:
> > > We won't be able to do a log flush while another transaction is
> > > active, but that's what's needed to clean dirty pages. iomap doesn't
> > > allow us to put the block allocation into a separate transaction from
> > > the page writes; for that, the opposite of the page_done hook would
> > > probably be needed.
> >
> > I agree that a ->page_prepare() hook would probably be the cleanest
> > solution for this.
>
> That doesn't sound too bad to me.

Okay, I'll see how the code for that will turn out.

Thanks,
Andreas



end of thread, other threads:[~2019-04-09 12:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190321131304.21618-1-agruenba@redhat.com>
     [not found] ` <20190328165104.GA21552@lst.de>
     [not found]   ` <CAHc6FU49oBdo8mAq7hb1greR+B1C_Fpy5JU7RBHfRYACt1S4wA@mail.gmail.com>
2019-04-07  7:32     ` gfs2 iomap deadlock, IOMAP_F_UNBALANCED Christoph Hellwig
2019-04-08  8:53       ` Andreas Gruenbacher
2019-04-08 13:44         ` Jan Kara
2019-04-09 12:15           ` Christoph Hellwig
2019-04-09 12:27             ` Andreas Gruenbacher
