From: Brett Russ <icycle+lkml@gmail.com>
To: linux-kernel@vger.kernel.org
Subject: O_DIRECT reads appear to be cached on block device partition file?
Date: Mon, 13 Sep 2010 23:49:32 -0400
Message-ID: <i6mrcc$ers$1@dough.gmane.org>

I'm running a 2.6.31 kernel on a blade chassis system with multiple
blades sharing common JBOD storage. The application intelligently
divides the drives up among the blades, but one blade in particular is
charged with monitoring. As part of this, the monitoring blade can read
a certain 512B sector on every disk in the system. This sector is often
written by other blades, and these writes are sync'd to disk. To work
around the lack of cache coherency between the distinct blades, I'm
using O_DIRECT on the monitoring blade so that it always reads from the
media to get the latest copy of this sector. The basic steps are:

# grab a 512B buffer, 512B aligned (use 4KB alignment to be safe)
posix_memalign(&ptr, getpagesize(), 512)
fd = open("/dev/sdX3", O_RDONLY|O_DIRECT)
lseek(fd, offset, SEEK_SET)
read(fd, ptr, 512)

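For reference, the steps above could be fleshed out roughly as follows. This is only a sketch, assuming Linux with glibc; the names `read_sector_direct`, `is_sector_aligned` and `SECTOR_SIZE` are hypothetical, and `pread()` is used in place of the separate `lseek()`/`read()` pair. It is not the code that was actually run on the monitoring blade.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* must precede every #include to expose O_DIRECT */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#if !defined(O_DIRECT) && defined(__O_DIRECT)
#define O_DIRECT __O_DIRECT    /* glibc fallback if _GNU_SOURCE came too late */
#endif

#define SECTOR_SIZE 512        /* assumption: 512-byte logical sectors */

/* Hypothetical helper: O_DIRECT I/O needs a sector-aligned file offset. */
static int is_sector_aligned(off_t offset)
{
    return offset >= 0 && (offset % SECTOR_SIZE) == 0;
}

/*
 * Hypothetical helper: read one sector from `path` at byte `offset` with
 * O_DIRECT and copy it into `out` (at least SECTOR_SIZE bytes).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int read_sector_direct(const char *path, off_t offset,
                              unsigned char *out)
{
    void *buf = NULL;
    ssize_t n;
    int fd, rc, saved;

    if (!is_sector_aligned(offset)) {
        errno = EINVAL;
        return -1;
    }

    /* Page-aligned buffer: stricter than the 512B alignment required. */
    rc = posix_memalign(&buf, (size_t)getpagesize(), SECTOR_SIZE);
    if (rc != 0) {
        errno = rc;
        return -1;
    }

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        saved = errno;
        free(buf);
        errno = saved;
        return -1;
    }

    n = pread(fd, buf, SECTOR_SIZE, offset);
    saved = errno;
    close(fd);

    if (n == SECTOR_SIZE) {
        memcpy(out, buf, SECTOR_SIZE);
        free(buf);
        return 0;
    }

    free(buf);
    errno = (n < 0) ? saved : EIO;   /* treat a short read as an I/O error */
    return -1;
}
```

A caller could invoke `read_sector_direct("/dev/sdX3", offset, sector)` once before and once after the remote write and compare the two buffers with `memcmp()`.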
If I run the above on the monitoring blade, then sync an update to the
sector in question from another blade, then re-run the above code on
the monitoring blade, believe it or not I appear to be reading stale
data. If I use dd with iflag=direct to read the same sector offset from
the /dev/sdX3 partition file, I see the same stale data as seen from the
code above. If, however, I instead access this sector through the
/dev/sdX device file at (offset of partition 3 + offset of the sector),
I see the intended data, which makes me believe some caching occurred
locally for /dev/sdX3.

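The whole-disk offset used in that comparison can be computed as sketched below. This assumes the partition's starting sector is read from the sysfs `start` attribute (for example `/sys/class/block/sdX3/start`), which is expressed in 512-byte units; the helper names `whole_disk_offset` and `read_partition_start` are hypothetical.

```c
#include <stdio.h>

/* sysfs reports partition start in 512-byte units. */
#define SYSFS_SECTOR_SIZE 512ULL

/* Hypothetical helper: byte offset on /dev/sdX equivalent to
 * `offset_in_partition` bytes into the partition. */
static unsigned long long whole_disk_offset(unsigned long long part_start_sectors,
                                            unsigned long long offset_in_partition)
{
    return part_start_sectors * SYSFS_SECTOR_SIZE + offset_in_partition;
}

/* Hypothetical helper: parse the decimal sector number stored in a sysfs
 * "start" file. Returns 0 on success, -1 if the file cannot be read. */
static int read_partition_start(const char *start_path,
                                unsigned long long *start_sectors)
{
    FILE *f = fopen(start_path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%llu", start_sectors) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Reading the same sector once through the partition node and once through the whole-disk node at `whole_disk_offset(start, offset)` should return identical bytes if no stale copy is involved.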
I've searched Google at length and come up empty. I've tried changing
the open flags to O_RDWR|O_DIRECT in case the kernel was doing something
special because it believed there would be no writes to this fd. Perhaps
there is an issue with reading only 512B rather than the 1KB Linux block
size or 4KB page size?

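On the 512B question, one thing that could be checked is the logical sector size the device itself reports, via the `BLKSSZGET` ioctl. The sketch below assumes Linux kernel headers are available; the helper name `logical_sector_size` is hypothetical, and the result by itself does not explain the stale reads.

```c
#include <fcntl.h>
#include <linux/fs.h>      /* BLKSSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: store the logical sector size of the block device
 * at `dev_path` in `*size_out`. Returns 0 on success, -1 on failure
 * (including when `dev_path` is not a block device). */
static int logical_sector_size(const char *dev_path, int *size_out)
{
    int fd = open(dev_path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = ioctl(fd, BLKSSZGET, size_out);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

If this reports 512 for /dev/sdX3, a 512B read at a 512B-aligned offset at least matches the device's own sector granularity.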
Am I missing something about O_DIRECT, especially as it applies to a
block device file (and a partition file on that device)? I thought NO
caching would be involved.

Any clues appreciated,
thanks,
Brett