* Re: [ceph-users] O_DIRECT on deep-scrub read
[not found] <56150598.1080604@sadziu.pl>
@ 2015-10-07 14:50 ` Sage Weil
2015-10-07 15:18 ` Milosz Tanski
2015-10-07 19:51 ` David Zafman
0 siblings, 2 replies; 6+ messages in thread
From: Sage Weil @ 2015-10-07 14:50 UTC
To: Paweł Sadowski; +Cc: ceph-users, ceph-devel
It's not, but it would not be hard to do this. There are fadvise-style
hints being passed down that could trigger O_DIRECT reads in this case.
That may not be the best choice, though: it won't use data that happens
to be in cache, and it will also throw that data out.
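For readers unfamiliar with what an O_DIRECT read involves, here is a minimal sketch in Python (not Ceph code; the helper name `direct_read` and the 4096-byte alignment are assumptions for illustration). O_DIRECT generally requires the buffer, offset, and length to be suitably aligned, so the sketch uses an anonymous mmap as a page-aligned buffer and returns None where O_DIRECT is unavailable or rejected.

```python
import mmap
import os

def direct_read(path, length=4096, offset=0, align=4096):
    """Read up to `length` bytes at `offset`, bypassing the page cache.

    Returns the bytes read, or None if O_DIRECT is unavailable or rejected.
    `align` is an assumed block size; real code should query the device.
    """
    if (not hasattr(os, "O_DIRECT") or length <= 0
            or length % align or offset % align):
        return None
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError:
        return None  # e.g. a filesystem without O_DIRECT support
    try:
        buf = mmap.mmap(-1, length)  # anonymous mappings are page-aligned
        try:
            n = os.preadv(fd, [buf], offset)
            return bytes(buf[:n])  # copy out before the mapping is closed
        except OSError:
            return None
        finally:
            buf.close()
    finally:
        os.close(fd)
```

Because the read goes around the page cache, data that a client already had cached gives no speedup here, which is the drawback described above.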
On Wed, 7 Oct 2015, Paweł Sadowski wrote:
> Hi,
>
> Can anyone tell me whether deep scrub is done using the O_DIRECT flag? I'm
> not able to verify that in the source code.
>
> If not, would it be possible to add such a feature (maybe as a config option)
> to help keep the Linux page cache in better shape?
>
> Thanks,
>
> --
> PS
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
* Re: [ceph-users] O_DIRECT on deep-scrub read
2015-10-07 14:50 ` [ceph-users] O_DIRECT on deep-scrub read Sage Weil
@ 2015-10-07 15:18 ` Milosz Tanski
2015-10-07 19:51 ` David Zafman
1 sibling, 0 replies; 6+ messages in thread
From: Milosz Tanski @ 2015-10-07 15:18 UTC
To: Sage Weil; +Cc: Paweł Sadowski, ceph-users, ceph-devel
On Wed, Oct 7, 2015 at 10:50 AM, Sage Weil <sage@newdream.net> wrote:
> It's not, but it would not be hard to do this. There are fadvise-style
> hints being passed down that could trigger O_DIRECT reads in this case.
> That may not be the best choice, though: it won't use data that happens
> to be in cache, and it will also throw that data out.
>
> On Wed, 7 Oct 2015, Paweł Sadowski wrote:
>
>> Hi,
>>
>> Can anyone tell me whether deep scrub is done using the O_DIRECT flag? I'm
>> not able to verify that in the source code.
>>
>> If not, would it be possible to add such a feature (maybe as a config option)
>> to help keep the Linux page cache in better shape?
>>
>> Thanks,
When I was working on preadv2, somebody brought up a per-operation
O_DIRECT flag. There wasn't a clear use case at the time (beyond someone
saying Linus would "love that").
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
* Re: [ceph-users] O_DIRECT on deep-scrub read
2015-10-07 14:50 ` [ceph-users] O_DIRECT on deep-scrub read Sage Weil
2015-10-07 15:18 ` Milosz Tanski
@ 2015-10-07 19:51 ` David Zafman
2015-10-07 20:52 ` Sage Weil
1 sibling, 1 reply; 6+ messages in thread
From: David Zafman @ 2015-10-07 19:51 UTC
To: Sage Weil, Paweł Sadowski; +Cc: ceph-users, ceph-devel
There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after
deep-scrub reads for objects not recently accessed by clients.
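A rough sketch of that pattern in Python (not Ceph code; `scrub_read_and_drop` is a hypothetical helper, and CRC32 merely stands in for whatever checksum a scrub computes): read the object sequentially, then tell the kernel its pages are no longer needed. The hint is advisory, and as written it applies to the whole file unconditionally, so it does not implement the "not recently accessed by clients" condition.

```python
import os
import zlib

def scrub_read_and_drop(path, chunk_size=1 << 20):
    """Checksum a file with buffered reads, then hint that its pages can go."""
    crc = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0, len=0 covers the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # advisory only; a failed hint does not affect the result
    finally:
        os.close(fd)
    return crc
```

A version that spares client-warmed data would need some way to know whether the object was cached before the scrub touched it.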
I see the NewStore objectstore sometimes using the O_DIRECT flag for
writes. This concerns me because the open(2) man page says:
"Applications should avoid mixing O_DIRECT and normal I/O to the same
file, and especially to overlapping byte regions in the same file. Even
when the filesystem correctly handles the coherency issues in this
situation, overall I/O throughput is likely to be slower than using
either mode alone."
David
* Re: [ceph-users] O_DIRECT on deep-scrub read
2015-10-07 19:51 ` David Zafman
@ 2015-10-07 20:52 ` Sage Weil
[not found] ` <alpine.DEB.2.00.1510071349410.8667-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Sage Weil @ 2015-10-07 20:52 UTC
To: David Zafman; +Cc: Paweł Sadowski, ceph-users, ceph-devel
On Wed, 7 Oct 2015, David Zafman wrote:
>
> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after
> deep-scrub reads for objects not recently accessed by clients.
Yeah, it's the 'except for stuff already in cache' part that we don't do
(and the kernel doesn't give us a good interface for). IIRC there was a
patch that guessed based on whether the obc was already in cache, which
seems like a pretty decent heuristic, but I forget if that was in the
final version.
> I see the NewStore objectstore sometimes using the O_DIRECT flag for writes.
> This concerns me because the open(2) man page says:
>
> "Applications should avoid mixing O_DIRECT and normal I/O to the same file,
> and especially to overlapping byte regions in the same file. Even when the
> filesystem correctly handles the coherency issues in this situation, overall
> I/O throughput is likely to be slower than using either mode alone."
Yeah: an O_DIRECT write will do a cache flush on the write range, so if
there was already dirty data in cache you'll write twice. There's
similarly an invalidate on read. I need to go back through the newstore
code and see how the modes are being mixed and how it can be avoided...
sage
* Re: O_DIRECT on deep-scrub read
[not found] ` <alpine.DEB.2.00.1510071349410.8667-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-10-08 8:11 ` Paweł Sadowski
2015-10-09 16:54 ` [ceph-users] " Milosz Tanski
0 siblings, 1 reply; 6+ messages in thread
From: Paweł Sadowski @ 2015-10-08 8:11 UTC
To: Sage Weil, David Zafman; +Cc: ceph-devel, ceph-users
On 10/07/2015 10:52 PM, Sage Weil wrote:
> On Wed, 7 Oct 2015, David Zafman wrote:
>> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after
>> deep-scrub reads for objects not recently accessed by clients.
> Yeah, it's the 'except for stuff already in cache' part that we don't do
> (and the kernel doesn't give us a good interface for). IIRC there was a
> patch that guessed based on whether the obc was already in cache, which
> seems like a pretty decent heuristic, but I forget if that was in the
> final version.
I've run some tests, and it looks like on XFS the cache is discarded on
both O_DIRECT writes and reads, but on ext4 it is discarded only on
O_DIRECT writes. I've found some patches adding support for "read only if
in page cache" (preadv2/RWF_NONBLOCK), but I can't find them in the kernel
source. Maybe Milosz Tanski can tell us more about that. I think it could
help a bit during deep scrub.
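For illustration only: kernels from Linux 4.14 onward accept an RWF_NOWAIT flag in preadv2(), which Python exposes as `os.RWF_NOWAIT`, and it gives roughly the "read only if it is already available" behavior. This is not the RWF_NONBLOCK patch set mentioned above, and `read_if_cached` is a hypothetical helper name. The sketch returns None when nothing can be read without waiting or when the flag is unsupported.

```python
import os

def read_if_cached(fd, length, offset=0):
    """Try to read without waiting on backing storage.

    Returns the bytes that were immediately available (possibly fewer than
    `length`), or None if the read would have to wait or the platform or
    filesystem does not support RWF_NOWAIT.
    """
    if not hasattr(os, "preadv") or not hasattr(os, "RWF_NOWAIT"):
        return None
    buf = bytearray(length)
    try:
        n = os.preadv(fd, [buf], offset, os.RWF_NOWAIT)
    except (OSError, NotImplementedError):
        # EAGAIN: data not immediately available; EOPNOTSUPP: flag unsupported.
        return None
    return bytes(buf[:n])
```

A scrub could, in principle, probe an object this way first and only drop its pages afterward when the probe found nothing cached. That is a possible design, not something the thread settled on.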
>> I see the NewStore objectstore sometimes using the O_DIRECT flag for writes.
>> This concerns me because the open(2) man page says:
>>
>> "Applications should avoid mixing O_DIRECT and normal I/O to the same file,
>> and especially to overlapping byte regions in the same file. Even when the
>> filesystem correctly handles the coherency issues in this situation, overall
>> I/O throughput is likely to be slower than using either mode alone."
> Yeah: an O_DIRECT write will do a cache flush on the write range, so if
> there was already dirty data in cache you'll write twice. There's
> similarly an invalidate on read. I need to go back through the newstore
> code and see how the modes are being mixed and how it can be avoided...
--
PS
* Re: [ceph-users] O_DIRECT on deep-scrub read
2015-10-08 8:11 ` Paweł Sadowski
@ 2015-10-09 16:54 ` Milosz Tanski
0 siblings, 0 replies; 6+ messages in thread
From: Milosz Tanski @ 2015-10-09 16:54 UTC
To: Paweł Sadowski; +Cc: Sage Weil, David Zafman, ceph-users, ceph-devel
On Thu, Oct 8, 2015 at 4:11 AM, Paweł Sadowski <ceph@sadziu.pl> wrote:
>
> On 10/07/2015 10:52 PM, Sage Weil wrote:
> > On Wed, 7 Oct 2015, David Zafman wrote:
> >> There would be a benefit to doing fadvise POSIX_FADV_DONTNEED after
> >> deep-scrub reads for objects not recently accessed by clients.
> > Yeah, it's the 'except for stuff already in cache' part that we don't do
> > (and the kernel doesn't give us a good interface for). IIRC there was a
> > patch that guessed based on whether the obc was already in cache, which
> > seems like a pretty decent heuristic, but I forget if that was in the
> > final version.
>
> I've run some tests, and it looks like on XFS the cache is discarded on
> both O_DIRECT writes and reads, but on ext4 it is discarded only on
> O_DIRECT writes. I've found some patches adding support for "read only if
> in page cache" (preadv2/RWF_NONBLOCK), but I can't find them in the kernel
> source. Maybe Milosz Tanski can tell us more about that. I think it could
> help a bit during deep scrub.
After a fair amount of bikeshedding on the API (and removing
pwritev2), it looked like we (Christoph and I) had enough consensus to
get it upstream. Sadly, akpm preferred a different approach (fincore),
and with enough roadblocks it died :/