* 2.6.9-rc2 bio sickness with large writes
@ 2004-09-15 23:39 Jeff V. Merkey
2004-09-16 6:34 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jeff V. Merkey @ 2004-09-15 23:39 UTC (permalink / raw)
To: linux-kernel, jmerkey, jmerkey
[-- Attachment #1: Type: text/plain, Size: 1131 bytes --]
I am posting bio chains each with a total size of 64KB (16 pages each)
and of each of
these I am posting chains of these bio requests in 128MB contiguous sectors
(i.e. bio = 16 pages = 64KB + bio =16 pages = 64KB + bio = 16 pages =
64KB, etc.)
With the subsequent bio requests posted after the first bio is posted
the bio requests
merge together, but I never get all the callbacks from the coaslesced
bio requests which
are posted subsequent to the intial 64KB bio. The data ends up making
it onto the drives
and I am not seeing any data loss, the 2nd,3rd, .... etc. bios don't
seem to callback
correctly.
This bug does not manifest itself every time on every bio chain. It
only shows up
part of the time. about 1 in every 2-3 bio chains behave this way.
This is severely
BUSTED and I am providing the calling code as an example of what may be
happening.
Attached is the write case and the callback.
Any ideas? Also, if I use bio's the way Linus does in his buffer cache
code for submit_bh
ine onesy - twosy mode the interface works just fine. It is severely
broken with multiple pages, however.
Jeff
[-- Attachment #2: example.c --]
[-- Type: text/x-csrc, Size: 3008 bytes --]
static int end_bio_asynch(struct bio *bio, unsigned int bytes_done, int err)
{
ASYNCH_IO *io = bio->bi_private;
if (err)
{
P_Print("asynch bio error %d\n", (int)err);
io->ccode = ASIO_IO_ERROR;
datascout_put_bio(io->bio);
io->bio = NULL;
return 0;
}
if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
io->ccode = ASIO_IO_ERROR;
atomic_inc((atomic_t *)&io->complete);
if (io->complete == io->count)
{
datascout_put_bio(io->bio);
io->bio = NULL;
insert_callback(io);
#if (PROFILE_AIO)
profile_complete();
#endif
return 0;
}
return 1;
}
ULONG aWriteDiskSectors(ULONG disk, ULONG StartingLBA, BYTE *Sector,
ULONG sectors, ULONG readAhead, ASYNCH_IO *io)
{
register ULONG i, bytesWritten = 0;
register ULONG bps, lba;
register DSDISK *DSDisk;
register ULONG rsize, blocks, blocksize, spb;
DSDisk = SystemDisk[disk];
bps = DSDisk->BytesPerSector;
blocksize = DSDisk->DeviceBlockSize;
if ((ULONG)(sectors * bps) % blocksize)
return 0;
rsize = sectors * bps;
blocks = rsize / blocksize;
if (!blocks)
return 0;
spb = blocksize / bps;
lba = StartingLBA / spb;
if (StartingLBA % spb)
{
P_Print("request not %d block aligned (%d) sectors-%d lba-%d (write)\n",
(int)blocksize, (int)(StartingLBA % spb), (int)sectors,
(int)StartingLBA);
return 0;
}
io->bio = datascout_get_bio();
if (!io->bio)
return 0;
if (io->bio->bi_max_vecs < blocks)
return 0;
io->ccode = 0;
io->count = blocks;
io->complete = 0;
io->bio->bi_bdev = DSDisk->PhysicalDiskHandle;
io->bio->bi_sector = StartingLBA;
io->bio->bi_idx = 0;
io->bio->bi_end_io = end_bio_asynch;
io->bio->bi_private = io;
io->bio->bi_vcnt = 0;
io->bio->bi_phys_segments = 0;
io->bio->bi_hw_segments = 0;
io->bio->bi_size = 0;
for (i=0; i < blocks; i++)
{
#if LINUX_26_BIO_ADDPAGE
register struct page *page = virt_to_page(&Sector[i * blocksize]);
register ULONG offset = ((ULONG)(&Sector[i * blocksize])) % PAGE_SIZE;
register ULONG bytes;
bytes = bio_add_page(io->bio, page,
PAGE_SIZE - (offset % PAGE_SIZE), 0);
bytesWritten += bytes;
#else
register struct page *page = virt_to_page(&Sector[i * blocksize]);
register ULONG offset = ((ULONG)(&Sector[i * blocksize])) % PAGE_SIZE;
io->bio->bi_io_vec[i].bv_page = page;
io->bio->bi_io_vec[i].bv_len = blocksize;
io->bio->bi_io_vec[i].bv_offset = offset;
io->bio->bi_vcnt++;
io->bio->bi_phys_segments++;
io->bio->bi_hw_segments++;
io->bio->bi_size += blocksize;
bytesWritten += blocksize;
#endif
}
// unplug the queue and drain the bathtub
bio_get(io->bio);
submit_bio(WRITE | (1 << BIO_RW_SYNC), io->bio);
bio_put(io->bio);
return (bytesWritten);
}
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: 2.6.9-rc2 bio sickness with large writes
2004-09-15 23:39 2.6.9-rc2 bio sickness with large writes Jeff V. Merkey
@ 2004-09-16 6:34 ` Jens Axboe
2004-09-16 16:38 ` Jeff V. Merkey
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2004-09-16 6:34 UTC (permalink / raw)
To: Jeff V. Merkey; +Cc: linux-kernel, jmerkey
On Wed, Sep 15 2004, Jeff V. Merkey wrote:
>
> I am posting bio chains each with a total size of 64KB (16 pages each)
> and of each of
> these I am posting chains of these bio requests in 128MB contiguous sectors
> (i.e. bio = 16 pages = 64KB + bio =16 pages = 64KB + bio = 16 pages =
> 64KB, etc.)
>
> With the subsequent bio requests posted after the first bio is posted
> the bio requests
> merge together, but I never get all the callbacks from the coaslesced
> bio requests which
> are posted subsequent to the intial 64KB bio. The data ends up making
> it onto the drives
> and I am not seeing any data loss, the 2nd,3rd, .... etc. bios don't
> seem to callback
> correctly.
>
> This bug does not manifest itself every time on every bio chain. It
> only shows up
> part of the time. about 1 in every 2-3 bio chains behave this way.
> This is severely
> BUSTED and I am providing the calling code as an example of what may be
> happening.
>
> Attached is the write case and the callback.
>
> Any ideas? Also, if I use bio's the way Linus does in his buffer cache
> code for submit_bh
> ine onesy - twosy mode the interface works just fine. It is severely
> broken with multiple pages, however.
>
> Jeff
>
>
>
>
>
>
>
> static int end_bio_asynch(struct bio *bio, unsigned int bytes_done, int err)
> {
> ASYNCH_IO *io = bio->bi_private;
>
if (bio->bi_size)
return 1;
> if (err)
> {
> P_Print("asynch bio error %d\n", (int)err);
> io->ccode = ASIO_IO_ERROR;
> datascout_put_bio(io->bio);
> io->bio = NULL;
> return 0;
> }
>
> if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> io->ccode = ASIO_IO_ERROR;
>
> atomic_inc((atomic_t *)&io->complete);
> if (io->complete == io->count)
> {
> datascout_put_bio(io->bio);
> io->bio = NULL;
> insert_callback(io);
>
> #if (PROFILE_AIO)
> profile_complete();
> #endif
> return 0;
> }
> return 1;
>
> }
>
> ULONG aWriteDiskSectors(ULONG disk, ULONG StartingLBA, BYTE *Sector,
> ULONG sectors, ULONG readAhead, ASYNCH_IO *io)
> {
> register ULONG i, bytesWritten = 0;
> register ULONG bps, lba;
> register DSDISK *DSDisk;
> register ULONG rsize, blocks, blocksize, spb;
>
> DSDisk = SystemDisk[disk];
> bps = DSDisk->BytesPerSector;
> blocksize = DSDisk->DeviceBlockSize;
>
> if ((ULONG)(sectors * bps) % blocksize)
> return 0;
>
> rsize = sectors * bps;
> blocks = rsize / blocksize;
> if (!blocks)
> return 0;
> spb = blocksize / bps;
>
> lba = StartingLBA / spb;
> if (StartingLBA % spb)
> {
> P_Print("request not %d block aligned (%d) sectors-%d lba-%d (write)\n",
> (int)blocksize, (int)(StartingLBA % spb), (int)sectors,
> (int)StartingLBA);
> return 0;
> }
>
> io->bio = datascout_get_bio();
> if (!io->bio)
> return 0;
>
> if (io->bio->bi_max_vecs < blocks)
> return 0;
>
> io->ccode = 0;
> io->count = blocks;
> io->complete = 0;
>
> io->bio->bi_bdev = DSDisk->PhysicalDiskHandle;
> io->bio->bi_sector = StartingLBA;
> io->bio->bi_idx = 0;
> io->bio->bi_end_io = end_bio_asynch;
> io->bio->bi_private = io;
> io->bio->bi_vcnt = 0;
> io->bio->bi_phys_segments = 0;
> io->bio->bi_hw_segments = 0;
> io->bio->bi_size = 0;
>
> for (i=0; i < blocks; i++)
> {
> #if LINUX_26_BIO_ADDPAGE
> register struct page *page = virt_to_page(&Sector[i * blocksize]);
> register ULONG offset = ((ULONG)(&Sector[i * blocksize])) % PAGE_SIZE;
> register ULONG bytes;
>
> bytes = bio_add_page(io->bio, page,
> PAGE_SIZE - (offset % PAGE_SIZE), 0);
offset instead of 0?
> bytesWritten += bytes;
> #else
> register struct page *page = virt_to_page(&Sector[i * blocksize]);
> register ULONG offset = ((ULONG)(&Sector[i * blocksize])) % PAGE_SIZE;
>
> io->bio->bi_io_vec[i].bv_page = page;
> io->bio->bi_io_vec[i].bv_len = blocksize;
> io->bio->bi_io_vec[i].bv_offset = offset;
> io->bio->bi_vcnt++;
> io->bio->bi_phys_segments++;
> io->bio->bi_hw_segments++;
> io->bio->bi_size += blocksize;
> bytesWritten += blocksize;
> #endif
> }
Get rid of this if/else, it's not correct. 2.6 always had
bio_add_page(), and you _must_ use it.
> // unplug the queue and drain the bathtub
> bio_get(io->bio);
> submit_bio(WRITE | (1 << BIO_RW_SYNC), io->bio);
> bio_put(io->bio);
You don't get to get/put the bio here, you aren't touching it after
submit_bio().
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: 2.6.9-rc2 bio sickness with large writes
2004-09-16 6:34 ` Jens Axboe
@ 2004-09-16 16:38 ` Jeff V. Merkey
2004-09-17 7:36 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jeff V. Merkey @ 2004-09-16 16:38 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, jmerkey
>
>>static int end_bio_asynch(struct bio *bio, unsigned int bytes_done, int err)
>>{
>> ASYNCH_IO *io = bio->bi_private;
>>
>>
>>
>
> if (bio->bi_size)
> return 1;
>
>
I guess the question here is if a group of coalesed bios are submitted,
shouldn't I get a callback
on each one? This implies if they merge beneath me, I won't always see
the callback for each one.
Is this an acuurate assumption?
>
>
>> register struct page *page = virt_to_page(&Sector[i * blocksize]);
>> register ULONG offset = ((ULONG)(&Sector[i * blocksize])) % PAGE_SIZE;
>> register ULONG bytes;
>>
>> bytes = bio_add_page(io->bio, page,
>> PAGE_SIZE - (offset % PAGE_SIZE), 0);
>>
>>
>
>offset instead of 0?
>
>
>
blocksize always = PAGE_SIZE in this code. PAGE_SIZE - (offset %
PAGE_SIZE) determines the length of the
transfer. You could just as well substitute PAGE_SIZE for blocksize. I
always set the device (if it supports it) to a
PAGE_SIZE for the blocksize. 1024 blocksize is a little small.
>Get rid of this if/else, it's not correct. 2.6 always had
>bio_add_page(), and you _must_ use it.
>
>
>
Linus should then do the same in buffer.c since this example is
misleading in the code. Linus seems to get away with it
so why can't I? :-)
buffer.c line 2776
/*
* from here on down, it's all bio -- do the initial mapping,
* submit_bio -> generic_make_request may further map this bio around
*/
bio = bio_alloc(GFP_NOIO, 1);
bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
bio->bi_io_vec[0].bv_page = bh->b_page;
bio->bi_io_vec[0].bv_len = bh->b_size;
bio->bi_io_vec[0].bv_offset = bh_offset(bh);
bio->bi_vcnt = 1;
bio->bi_idx = 0;
bio->bi_size = bh->b_size;
bio->bi_end_io = end_bio_bh_io_sync;
bio->bi_private = bh;
submit_bio(rw, bio);
>> // unplug the queue and drain the bathtub
>> bio_get(io->bio);
>> submit_bio(WRITE | (1 << BIO_RW_SYNC), io->bio);
>> bio_put(io->bio);
>>
>>
>
>You don't get to get/put the bio here, you aren't touching it after
>submit_bio().
>
>
>
I removed this from my code. You need to correct the example in bio.h
with something more meaningful. This is what
the source code says to do.
bio.h line 213
/*
* get a reference to a bio, so it won't disappear. the intended use is
* something like: *
* bio_get(bio);
* submit_bio(rw, bio);
* if (bio->bi_flags ...)
* do_something
* bio_put(bio);
*
* without the bio_get(), it could potentially complete I/O before submit_bio
* returns. and then bio would be freed memory when if (bio->bi_flags ...)
* runs
*/
I still see the problem and I suspect it's related to the callback
mechanism with the bio-bi_size check always returning 1. I guess
the question here is are bios guaranteed to always return a 1:1 ratio of
submission vs. callbacks?
Jeff
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: 2.6.9-rc2 bio sickness with large writes
2004-09-16 16:38 ` Jeff V. Merkey
@ 2004-09-17 7:36 ` Jens Axboe
[not found] ` <20040917201604.GA12974@galt.devicelogics.com>
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2004-09-17 7:36 UTC (permalink / raw)
To: Jeff V. Merkey; +Cc: linux-kernel, jmerkey
On Thu, Sep 16 2004, Jeff V. Merkey wrote:
>
> >
> >>static int end_bio_asynch(struct bio *bio, unsigned int bytes_done, int
> >>err)
> >>{
> >> ASYNCH_IO *io = bio->bi_private;
> >>
> >>
> >>
> >
> > if (bio->bi_size)
> > return 1;
> >
> >
>
> I guess the question here is if a group of coalesed bios are submitted,
> shouldn't I get a callback
> on each one? This implies if they merge beneath me, I won't always see
> the callback for each one.
> Is this an acuurate assumption?
No, you are reading it wrong. You will get a callback for each bio that
you issued, regardless of merge status. You never need to care about
merge status. But you might get partial completions from drivers, so
multiple completions on the same bio. Your code doesn't look like it's
handling that - you either need to take bytes_done into account, or just
add the above two lines if you don't care about partial completions.
When bi_size reaches 0, all io is done on this bio.
> >> register struct page *page = virt_to_page(&Sector[i * blocksize]);
> >> register ULONG offset = ((ULONG)(&Sector[i * blocksize])) %
> >> PAGE_SIZE;
> >> register ULONG bytes;
> >>
> >> bytes = bio_add_page(io->bio, page,
> >> PAGE_SIZE - (offset % PAGE_SIZE), 0);
> >>
> >>
> >
> >offset instead of 0?
> >
> >
> >
> blocksize always = PAGE_SIZE in this code. PAGE_SIZE - (offset %
> PAGE_SIZE) determines the length of the
> transfer. You could just as well substitute PAGE_SIZE for blocksize. I
> always set the device (if it supports it) to a
> PAGE_SIZE for the blocksize. 1024 blocksize is a little small.
Ok, that's not visible from the source you sent. You should still change
that to offset in case this changes. Or if not, kill the PAGE_SIZE - xxx
stuff to avoid confusion.
> >Get rid of this if/else, it's not correct. 2.6 always had
> >bio_add_page(), and you _must_ use it.
> >
> >
> >
>
> Linus should then do the same in buffer.c since this example is
> misleading in the code. Linus seems to get away with it
> so why can't I? :-)
Linus didn't write that, I did.
>
> buffer.c line 2776
> /*
> * from here on down, it's all bio -- do the initial mapping,
> * submit_bio -> generic_make_request may further map this bio around
> */
> bio = bio_alloc(GFP_NOIO, 1);
>
> bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
> bio->bi_bdev = bh->b_bdev;
> bio->bi_io_vec[0].bv_page = bh->b_page;
> bio->bi_io_vec[0].bv_len = bh->b_size;
> bio->bi_io_vec[0].bv_offset = bh_offset(bh);
>
> bio->bi_vcnt = 1;
> bio->bi_idx = 0;
> bio->bi_size = bh->b_size;
>
> bio->bi_end_io = end_bio_bh_io_sync;
> bio->bi_private = bh;
>
> submit_bio(rw, bio);
And submit_bh() is correct, since it only uses a single page. You are
adding multiple pages to the bio manually, this is highly illegal. If
you aren't using bio_add_page() to do this you are both on your own and
in a world of pain support wise.
> >> // unplug the queue and drain the bathtub
> >> bio_get(io->bio);
> >> submit_bio(WRITE | (1 << BIO_RW_SYNC), io->bio);
> >> bio_put(io->bio);
> >>
> >>
> >
> >You don't get to get/put the bio here, you aren't touching it after
> >submit_bio().
> >
> >
> >
>
> I removed this from my code. You need to correct the example in bio.h
> with something more meaningful. This is what
> the source code says to do.
>
> bio.h line 213
>
> /*
> * get a reference to a bio, so it won't disappear. the intended use is
> * something like: *
> * bio_get(bio);
> * submit_bio(rw, bio);
> * if (bio->bi_flags ...)
> * do_something
> * bio_put(bio);
> *
> * without the bio_get(), it could potentially complete I/O before submit_bio
> * returns. and then bio would be freed memory when if (bio->bi_flags ...)
> * runs
> */
No, you need to read it properly. See how it looks at the bio after
submitting it in this example? That's why it gets a reference to it.
There's even a comment to that effect. You don't need to hold a
reference across the submit_bio() call if you don't look at the bio
afterwards, why would you care? This is, btw, no different than the bh
semantics since 2.4...
> I still see the problem and I suspect it's related to the callback
> mechanism with the bio-bi_size check always returning 1. I guess
> the question here is are bios guaranteed to always return a 1:1 ratio of
> submission vs. callbacks?
Fix your code like I described first, I cannot help you otherwise. Your
question is answered further up (it's yes).
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2004-09-21 6:42 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-15 23:39 2.6.9-rc2 bio sickness with large writes Jeff V. Merkey
2004-09-16 6:34 ` Jens Axboe
2004-09-16 16:38 ` Jeff V. Merkey
2004-09-17 7:36 ` Jens Axboe
[not found] ` <20040917201604.GA12974@galt.devicelogics.com>
2004-09-20 17:12 ` Jeff V. Merkey
2004-09-20 18:09 ` Jens Axboe
2004-09-20 19:20 ` Jeff V. Merkey
2004-09-21 6:40 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox