FLUSH mechanism implementation in block layer

public inbox for linux-mmc@vger.kernel.org

* FLUSH mechanism implementation in block layer
From: Tanya Brokhman @ 2013-04-07 13:15 UTC
  To: tj; +Cc: Jens Axboe, linux-mmc, merez, kdorfman

Hello Tejun,

I'm writing to you since you're signed on the blk-flush.c file, hoping 
you could answer a flush-related question for me. Please excuse me if 
you're not the right person to address this to.

I've been looking into the flush implementation, trying to understand
how it works. The FLUSH command can be used for two purposes:
1. Flush the data from the card cache to the non-volatile memory
2. Keep an order of requests: req_A ... req_D, FLUSH, req_E ... req_X
Unfortunately I don't understand how the second purpose of FLUSH is
implemented. To simplify the question, take for example a card
that doesn't implement a writeback cache (doesn't support FLUSH/FUA) and
the following scenario:

The application inserts req_A..req_D into the block layer (and the
scheduler) and issues req_FLUSH that contains data. What is expected in
this situation is that req_A..req_D will be written to the non-volatile
memory before req_FLUSH.
According to the code in blk_insert_flush(), the req_FLUSH request will
be marked as SOFTBARRIER and added to the tail of the dispatch queue.
But what guarantees that by the time it's added to the dispatch queue
req_A..req_D have been dispatched as well? It's possible that they are
still in the scheduler and will be dispatched only after req_FLUSH is
completed...

One possible solution is for the application to wait for a completion
callback on req_A..req_D before issuing req_FLUSH. Is this indeed the
case? I didn't find any documentation on it on the web.

Thanks,
Tanya Brokhman

-- 
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a 
member of Code Aurora Forum, hosted by The Linux Foundation


* Re: FLUSH mechanism implementation in block layer
From: Tejun Heo @ 2013-04-07 16:33 UTC
  To: Tanya Brokhman; +Cc: Jens Axboe, linux-mmc, merez, kdorfman

Hey,

On Sun, Apr 07, 2013 at 04:15:30PM +0300, Tanya Brokhman wrote:
> I've been looking into the flush implementation, trying to
> understand how it works. The FLUSH command can be used for two purposes:
> 1. Flush the data from the card cache to the non-volatile memory
> 2. Keep an order of requests: req_A ... req_D, FLUSH, req_E ... req_X
> Unfortunately I don't understand how the second purpose of FLUSH is
> implemented. To simplify the question, take for example a
> card that doesn't implement a writeback cache (doesn't support
> FLUSH/FUA) and the following scenario:
> 
> The application inserts req_A..req_D into the block layer (and the
> scheduler) and issues req_FLUSH that contains data. What is expected
> in this situation is that req_A..req_D will be written to the
> non-volatile memory before req_FLUSH.
> According to the code in blk_insert_flush(), the req_FLUSH request
> will be marked as SOFTBARRIER and added to the tail of the dispatch
> queue.
> But what guarantees that by the time it's added to the dispatch
> queue req_A..req_D have been dispatched as well? It's possible that
> they are still in the scheduler and will be dispatched only after
> req_FLUSH is completed...

Which version of the kernel are you looking at?  The barrier / flush
implementation went through several iterations, and in the current
iteration the ordering of requests around the flush request is the
responsibility of the higher layer - i.e. filesystems are required to
wait for the completion of commands which should come before the flush
before issuing it.

Thanks.

-- 
tejun
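
The rule described in this reply (the issuer waits for the completion of
everything that must precede the flush, and only then issues the flush)
can be sketched as a small toy model in C. This is an illustration only,
not kernel code: struct toy_writer and the toy_* functions are
hypothetical names made up for the sketch, and a real filesystem would
track its own requests and hand them to the block layer instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct toy_writer {
	int submitted;	/* writes handed to the lower layers */
	int completed;	/* writes whose completion callback has run */
};

/* Submit one write; the lower layers are free to reorder it. */
void toy_submit_write(struct toy_writer *w)
{
	w->submitted++;
}

/* Completion callback for one previously submitted write. */
void toy_write_done(struct toy_writer *w)
{
	assert(w->completed < w->submitted);
	w->completed++;
}

/*
 * The issuer may send the flush only when none of the writes it
 * depends on is still in flight. No ordering is requested from the
 * lower layers; the issuer simply does not submit the flush earlier.
 */
bool toy_may_issue_flush(const struct toy_writer *w)
{
	return w->completed == w->submitted;
}
```

In terms of the original question, the issuer would call
toy_submit_write() for req_A..req_D, let toy_write_done() run for each
of them, and submit req_FLUSH only once toy_may_issue_flush() returns
true.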


* Re: FLUSH mechanism implementation in block layer
From: Tanya Brokhman @ 2013-04-08  4:19 UTC
  To: Tejun Heo; +Cc: Jens Axboe, linux-mmc, merez, kdorfman

Hi Tejun,

Thank you for the prompt response.
>
> Which version of the kernel are you looking at?  The barrier / flush
> implementation went through several iterations, and in the current
> iteration the ordering of requests around the flush request is the
> responsibility of the higher layer - i.e. filesystems are required to
> wait for the completion of commands which should come before the flush
> before issuing it.
>
> Thanks.
>

That is the conclusion I came to as well. I was looking at kernel 3.4, but
if I'm not mistaken it's the same in kernel 3.7 as well.
Could you please tell me why it was designed this way? It seems to me
that if the application is issuing async requests, it shouldn't be the
application's responsibility to wait for their completion if the
execution order is important. I mean, it feels like the FLUSH should be
implemented by the lower layers (block, device driver, card), and in the
current situation you may say that FLUSH is partially implemented by
the application.
At first I didn't understand why the elevator is not drained in the case
of FLUSH. Later I understood that it's not efficient, since there might be
requests from other tasks there as well that can be
completed after the FLUSH. But perhaps there is a correct way to
move the implementation of "ordering around the flush" to the block
layer? Not that it would work better, it just feels that logically -
block layer is the place to do it at.
What do you think?

Best Regards,
Tanya Brokhman


* Re: FLUSH mechanism implementation in block layer
From: Tejun Heo @ 2013-04-09 19:03 UTC
  To: Tanya Brokhman; +Cc: Jens Axboe, linux-mmc, merez, kdorfman

Hey,

On Mon, Apr 08, 2013 at 07:19:19AM +0300, Tanya Brokhman wrote:
> completed after the FLUSH. But perhaps there is a correct way to
> move the implementation of "ordering around the flush" to the block
> layer? Not that it would work better, it just feels that logically -
> block layer is the place to do it at.
> What do you think?

It used to be implemented that way - REQ_BARRIER.  The problem was
that filesystems wanted multiple dependency streams - i.e. fdatasync()
of a file doesn't need to drain all other IOs in progress.  We could
do coloring of IOs - i.e. give IOs which may need flushing later but
belong to different dependency streams different colors and let the
block layer figure out partial draining, which would work but at the
same time be pretty nasty and complex.  The thing was that most
filesystems were already draining IOs and didn't need any ordering
guarantee from the block layer.  hch took care of the outliers which
were depending on REQ_BARRIER ordering guarantees and we just stripped
the ordering mechanism, which immediately improved performance
noticeably in certain workloads.

It was done that way mostly out of convenience at the time, but now
that I think of it, it's the correct thing to do.  It just is too much
information to communicate downwards and the extra communication
overhead doesn't really buy anything.

Thanks.

-- 
tejun
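
The difference between a queue-wide barrier and issuer-side draining of a
single dependency stream can be sketched with a toy model in C. This is a
rough illustration under stated assumptions, not kernel code: struct
toy_queue, TOY_MAX_STREAMS and the toy_* functions are hypothetical names,
and a "stream" here simply stands for one group of IOs that a later flush
depends on (for example, the writes of one file being fdatasync()ed).

```c
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
enum { TOY_MAX_STREAMS = 4 };	/* arbitrary size for the sketch */

struct toy_queue {
	/* in-flight writes per dependency stream ("color") */
	int in_flight[TOY_MAX_STREAMS];
};

/*
 * Barrier-style ordering in the lower layer: the barrier has to wait
 * for every in-flight request, whichever stream it belongs to.
 */
int toy_barrier_wait_count(const struct toy_queue *q)
{
	int total = 0;

	for (size_t i = 0; i < TOY_MAX_STREAMS; i++)
		total += q->in_flight[i];
	return total;
}

/*
 * Issuer-side draining: the issuer knows its own dependencies, so it
 * waits only for the writes of the stream it is about to flush.
 */
int toy_stream_wait_count(const struct toy_queue *q, size_t stream)
{
	return stream < TOY_MAX_STREAMS ? q->in_flight[stream] : 0;
}
```

With several busy streams in the queue, toy_barrier_wait_count() grows
with all of them, while toy_stream_wait_count() for an idle stream stays
at zero. Getting the second behaviour inside the lower layer would
require passing the stream identity down with every IO, which is the
"coloring" described above.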


* Re: FLUSH mechanism implementation in block layer
From: Tanya Brokhman @ 2013-04-10  6:13 UTC
  To: Tejun Heo; +Cc: Jens Axboe, linux-mmc, merez, kdorfman

Hi Tejun

Thank you for the detailed explanation!

Best Regards
Tanya Brokhman
