* Re: [ceph-users] O_DIRECT on deep-scrub read
       [not found] <56150598.1080604@sadziu.pl>
@ 2015-10-07 14:50 ` Sage Weil
  2015-10-07 15:18   ` Milosz Tanski
  2015-10-07 19:51   ` David Zafman
  0 siblings, 2 replies; 6+ messages in thread
From: Sage Weil @ 2015-10-07 14:50 UTC (permalink / raw)
  To: Paweł Sadowski; +Cc: ceph-users, ceph-devel

It's not, but it would not be hard to do this.  There are fadvise-style
hints being passed down that could trigger O_DIRECT reads in this case.
That may not be the best choice, though--it won't use data that happens
to be in cache and it'll also throw it out.
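
To make the trade-off concrete, here is a minimal Python sketch of what an
O_DIRECT scrub-style read could look like. It is not Ceph code: the function
name scrub_read_direct and the 1 MiB chunk size are assumptions made for
illustration, and it falls back to a buffered read where the filesystem
rejects O_DIRECT.

```python
import mmap
import os


def scrub_read_direct(path, chunk=1 << 20):
    """Read a whole file, bypassing the page cache when O_DIRECT is usable."""
    o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    try:
        fd = os.open(path, os.O_RDONLY | o_direct)
    except OSError:
        # Some filesystems reject O_DIRECT at open time; use a buffered read.
        fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
        buf = mmap.mmap(-1, chunk)
        out = bytearray()
        offset = 0
        # Stop at the known file size so no read starts at an unaligned offset
        # after the final (short) read.
        while offset < size:
            n = os.preadv(fd, [buf], offset)
            if n == 0:
                break
            out += buf[:n]
            offset += n
        buf.close()
        return bytes(out)
    finally:
        os.close(fd)
```

Because such a read goes around the page cache, it gains nothing from data
that is already cached, which is the drawback noted above.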

On Wed, 7 Oct 2015, Paweł Sadowski wrote:

> Hi,
> 
> Can anyone tell whether deep scrub is done using the O_DIRECT flag? I'm
> not able to verify that in the source code.
> 
> If not, would it be possible to add such a feature (maybe a config option)
> to help keep the Linux page cache in better shape?
> 
> Thanks,
> 
> -- 
> PS
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 


* Re: [ceph-users] O_DIRECT on deep-scrub read
  2015-10-07 14:50 ` [ceph-users] O_DIRECT on deep-scrub read Sage Weil
@ 2015-10-07 15:18   ` Milosz Tanski
  2015-10-07 19:51   ` David Zafman
  1 sibling, 0 replies; 6+ messages in thread
From: Milosz Tanski @ 2015-10-07 15:18 UTC (permalink / raw)
  To: Sage Weil; +Cc: Paweł Sadowski, ceph-users, ceph-devel

On Wed, Oct 7, 2015 at 10:50 AM, Sage Weil <sage@newdream.net> wrote:
> It's not, but it would not be hard to do this.  There are fadvise-style
> hints being passed down that could trigger O_DIRECT reads in this case.
> That may not be the best choice, though--it won't use data that happens
> to be in cache and it'll also throw it out.
>
> On Wed, 7 Oct 2015, Paweł Sadowski wrote:
>
>> Hi,
>>
>> Can anyone tell whether deep scrub is done using the O_DIRECT flag? I'm
>> not able to verify that in the source code.
>>
>> If not, would it be possible to add such a feature (maybe a config option)
>> to help keep the Linux page cache in better shape?
>>
>> Thanks,

When I was working on preadv2, somebody brought up a per-operation
O_DIRECT flag. There wasn't a clear use case at the time (outside of
saying Linus would "love that").

> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com


* Re: [ceph-users] O_DIRECT on deep-scrub read
  2015-10-07 14:50 ` [ceph-users] O_DIRECT on deep-scrub read Sage Weil
  2015-10-07 15:18   ` Milosz Tanski
@ 2015-10-07 19:51   ` David Zafman
  2015-10-07 20:52     ` Sage Weil
  1 sibling, 1 reply; 6+ messages in thread
From: David Zafman @ 2015-10-07 19:51 UTC (permalink / raw)
  To: Sage Weil, Paweł Sadowski; +Cc: ceph-users, ceph-devel


There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after 
deep-scrub reads for objects not recently accessed by clients.
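
A minimal sketch of that idea in Python, assuming a plain buffered read
followed by a whole-file hint. The helper name scrub_read_and_drop is
hypothetical; os.posix_fadvise and os.POSIX_FADV_DONTNEED are the standard
library's wrappers for the call named above. The sketch does not implement
the "not recently accessed by clients" condition.

```python
import os


def scrub_read_and_drop(path, chunk=1 << 20):
    """Buffered read of a whole file, then hint that its pages can go."""
    fd = os.open(path, os.O_RDONLY)
    try:
        out = bytearray()
        while True:
            block = os.read(fd, chunk)
            if not block:
                break
            out += block
        # offset=0 with length=0 covers the whole file. Pages that are
        # still dirty are not discarded by this hint.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return bytes(out)
    finally:
        os.close(fd)
```

The hint is advisory: the kernel is free to keep pages, and the call says
nothing about whether they were cached before the scrub touched them.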

I see the NewStore objectstore sometimes using the O_DIRECT flag for
writes.  This concerns me because the open(2) man page says:

"Applications should avoid mixing O_DIRECT and normal I/O to the same 
file, and especially to overlapping byte regions in the same file.  Even 
when the filesystem correctly handles the coherency issues in this 
situation, overall I/O throughput is likely to be slower than using 
either mode alone."

David




* Re: [ceph-users] O_DIRECT on deep-scrub read
  2015-10-07 19:51   ` David Zafman
@ 2015-10-07 20:52     ` Sage Weil
       [not found]       ` <alpine.DEB.2.00.1510071349410.8667-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2015-10-07 20:52 UTC (permalink / raw)
  To: David Zafman; +Cc: Paweł Sadowski, ceph-users, ceph-devel

On Wed, 7 Oct 2015, David Zafman wrote:
> 
> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after 
> deep-scrub reads for objects not recently accessed by clients.

Yeah, it's the 'except for stuff already in cache' part that we don't do 
(and the kernel doesn't give us a good interface for).  IIRC there was a 
patch that guessed based on whether the obc was already in cache, which 
seems like a pretty decent heuristic, but I forget if that was in the 
final version.
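
The heuristic described above can be sketched as follows. This is only an
illustration and not the patch in question: RecentObjects is a hypothetical
stand-in for the OSD's in-memory object-context (obc) cache, and scrub_read
is a made-up helper. The point is that the hot/cold decision is taken from
application state, since the kernel is not asked what was cached.

```python
import os
from collections import OrderedDict


class RecentObjects:
    """Hypothetical stand-in for the in-memory object-context cache."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._lru = OrderedDict()

    def touch(self, name):
        """Record a client access, evicting the least recently used entry."""
        self._lru[name] = True
        self._lru.move_to_end(name)
        if len(self._lru) > self.capacity:
            self._lru.popitem(last=False)

    def __contains__(self, name):
        return name in self._lru


def scrub_read(path, recent):
    """Read a file for scrubbing; drop its pages only if it looks cold.

    Returns (data, dropped), where dropped tells whether DONTNEED was issued.
    """
    was_hot = path in recent  # decide before the scrub read warms the cache
    with open(path, "rb") as f:
        data = f.read()
        if not was_hot:
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    return data, not was_hot
```

An object that clients touched recently keeps its pages; anything else is
treated as having been pulled into cache by the scrub alone.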

> I see the NewStore objectstore sometimes using the O_DIRECT flag for writes.
> This concerns me because the open(2) man page says:
> 
> "Applications should avoid mixing O_DIRECT and normal I/O to the same file,
> and especially to overlapping byte regions in the same file.  Even when the
> filesystem correctly handles the coherency issues in this situation, overall
> I/O throughput is likely to be slower than using either mode alone."

Yeah: an O_DIRECT write will do a cache flush on the write range, so if 
there was already dirty data in cache you'll write twice.  There's 
similarly an invalidate on read.  I need to go back through the newstore 
code and see how the modes are being mixed and how it can be avoided...

sage




* Re: O_DIRECT on deep-scrub read
       [not found]       ` <alpine.DEB.2.00.1510071349410.8667-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-10-08  8:11         ` Paweł Sadowski
  2015-10-09 16:54           ` [ceph-users] " Milosz Tanski
  0 siblings, 1 reply; 6+ messages in thread
From: Paweł Sadowski @ 2015-10-08  8:11 UTC (permalink / raw)
  To: Sage Weil, David Zafman; +Cc: ceph-devel, ceph-users

On 10/07/2015 10:52 PM, Sage Weil wrote:
> On Wed, 7 Oct 2015, David Zafman wrote:
>> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after 
>> deep-scrub reads for objects not recently accessed by clients.
> Yeah, it's the 'except for stuff already in cache' part that we don't do 
> (and the kernel doesn't give us a good interface for).  IIRC there was a 
> patch that guessed based on whether the obc was already in cache, which 
> seems like a pretty decent heuristic, but I forget if that was in the 
> final version.

I've run some tests and it looks like on XFS the cache is discarded on
both O_DIRECT write and read, but on EXT4 it is discarded only on O_DIRECT
write. I've found some patches that add support for "read only if in page
cache" (preadv2/RWF_NONBLOCK) but can't find them in the kernel source.
Maybe Milosz Tanski can tell us more about that. I think it could help a
bit during deep scrub.
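
For illustration, a "read only if in page cache" call might look like the
Python sketch below. It relies on RWF_NOWAIT, a preadv2 flag with these
semantics that was merged later (Linux 4.14) and is exposed by Python 3.7+
as os.RWF_NOWAIT; it is not the RWF_NONBLOCK patch set mentioned above. The
helper name read_if_cached is hypothetical.

```python
import errno
import os


def read_if_cached(fd, offset, length):
    """Return bytes served without blocking, or None if that is not possible.

    The result may be shorter than length if only part of the range
    was immediately available.
    """
    flag = getattr(os, "RWF_NOWAIT", None)
    if flag is None:
        return None  # this platform or Python build does not expose the flag
    buf = bytearray(length)
    try:
        n = os.preadv(fd, [buf], offset, flag)
    except BlockingIOError:
        return None  # EAGAIN: the data is not resident in the page cache
    except OSError as e:
        if e.errno in (errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS):
            return None  # kernel or filesystem does not support the flag
        raise
    return bytes(buf[:n])
```

A scrubber could use a probe like this to learn whether an object was
already cached before deciding how to read it and whether to drop its pages
afterwards.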


-- 
PS


* Re: [ceph-users] O_DIRECT on deep-scrub read
  2015-10-08  8:11         ` Paweł Sadowski
@ 2015-10-09 16:54           ` Milosz Tanski
  0 siblings, 0 replies; 6+ messages in thread
From: Milosz Tanski @ 2015-10-09 16:54 UTC (permalink / raw)
  To: Paweł Sadowski; +Cc: Sage Weil, David Zafman, ceph-users, ceph-devel

On Thu, Oct 8, 2015 at 4:11 AM, Paweł Sadowski <ceph@sadziu.pl> wrote:
>
> On 10/07/2015 10:52 PM, Sage Weil wrote:
> > On Wed, 7 Oct 2015, David Zafman wrote:
> >> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after
> >> deep-scrub reads for objects not recently accessed by clients.
> > Yeah, it's the 'except for stuff already in cache' part that we don't do
> > (and the kernel doesn't give us a good interface for).  IIRC there was a
> > patch that guessed based on whether the obc was already in cache, which
> > seems like a pretty decent heuristic, but I forget if that was in the
> > final version.
>
> I've run some tests and it looks like on XFS the cache is discarded on
> both O_DIRECT write and read, but on EXT4 it is discarded only on O_DIRECT
> write. I've found some patches that add support for "read only if in page
> cache" (preadv2/RWF_NONBLOCK) but can't find them in the kernel source.
> Maybe Milosz Tanski can tell us more about that. I think it could help a
> bit during deep scrub.


After a fair amount of bike shedding on the API (and removing
pwritev2) it looked like we (Christoph and I) had enough consensus to
get it upstream. But sadly, akpm preferred a different approach
(fincore), and with enough roadblocks it died :/





-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com
