From: Johannes Bauer <dfnsonfsduifb@gmx.de>
To: dm-devel@redhat.com, doug@easyco.com
Subject: Re: Newbie device mapper questions
Date: Tue, 16 Jun 2015 20:54:33 +0200
Message-ID: <558070E9.70607@gmx.de>
In-Reply-To: <CAFx4rwQ0ganzZn5FjQZTVtk1DK1Qr1pmMCTxRCwxd793+-anHg@mail.gmail.com>
On 15.06.2015 21:52, Doug Dumitru wrote:
>> Sounds pretty easy and I also got surprisingly far with my little kernel
>> module. I've so far implemented ctr, dtr, map and status.
>
> Congratulations, you are actually a long way there.
Thanks, but I think I still have the mountain ahead of me. Still, I would
really like to figure out the nitty-gritty.
> You have to allocate a bio, populate it, allocate pages for buffer,
> populate the bvec, and call make_request (or generic make request). You
> will get the completion from the bio on the bottom half of the interrupt
> handler, so how much work you can do there is debatable. You cannot start
> a new IO from there, which you need to. You will probably want to start a
> helper thread and have the completion routine schedule itself onto your
> thread. Once you are back on your thread, you can do just about anything.
>
> Because you need to do IO, you will not be able to do a simple bio "bounce
> redirect". You will need to do the IO yourself (ie, call another make
> request), but you can use the caller's bvec for this, so there is no data
> copy required. Once the request completes, you can then finish the caller.
Oh, wow. This sounds truly terrifying. Let's dive in!
I tried to read your hints one word at a time. So here's the somewhat
pseudocodish solution to my homework:
    struct bio *b = bio_alloc(GFP_NOIO, 1);
    b->bi_size = 8 << 9;    /* bi_size counts bytes: 8 sectors of 512 bytes */
    bio_alloc_pages(b, GFP_NOIO);
    b->bi_sector = 1234;
    b->bi_bdev = lc->metadev->bdev;
    b->bi_rw = READ;
    b->bi_private = local_ctx;
    b->bi_end_io = read_complete_callback;
    generic_make_request(b);

    static void read_complete_callback(struct bio *b, int error) {
        // ???
        /* bv_page is a struct page *, so map it before reading the data */
        unsigned char *data = kmap_atomic(b->bi_io_vec[0].bv_page);
        printk(KERN_INFO "First read byte: %02x\n", data[0]);
        kunmap_atomic(data);
    }
So I hope this is even remotely close to what I should end up with.
This will alloc a new bio with, as I understand it, one page buffer in
b->bi_io_vec. This buffer is then allocated with bio_alloc_pages to 8
sectors in size (i.e. exactly one page of 4096 bytes). Then the read
address, block device and read mode are set. I pass some kind of local
context so I can do something meaningful in the callback and specify the
callback function. Then I execute the request.
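The size arithmetic in that paragraph can be checked in userspace. This is
only a minimal sketch, assuming 512-byte sectors and 4096-byte pages; the
names SECTOR_SIZE_BYTES, PAGE_SIZE_BYTES, sectors_to_bytes() and
pages_needed() are made up for illustration and are not kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 512-byte sectors, 4 KiB pages. */
#define SECTOR_SIZE_BYTES 512u
#define PAGE_SIZE_BYTES   4096u

/* Convert a length in sectors to a length in bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
    /* Guard against overflow of the multiplication below. */
    assert(sectors <= UINT64_MAX / SECTOR_SIZE_BYTES);
    return sectors * SECTOR_SIZE_BYTES;
}

/* Number of whole pages needed to hold `bytes` bytes (rounds up). */
static uint64_t pages_needed(uint64_t bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

Under these assumptions, sectors_to_bytes(8) gives one page worth of bytes,
so a bio with a single page vector would be enough for the read.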
As I understand it, this executes asynchronously. So this is where
threading comes into play, right? Just pseudocode (because I can't judge how
far off I am here), but let's say this is map():
    void read_complete_callback() {
        semaphore_inc(&local_ctx->semaphore);
    }

    void map() {
        local_ctx->semaphore.value = 0;
        // Issue read as above
        generic_make_request(b);
        semaphore_dec(&local_ctx->semaphore);
        // Now the concurrent async IO has finished and we interpret the data
        [...]
    }
Oh boy I really don't know if this is even remotely close. Any hints, as
easy as they may seem to you guys, are really greatly appreciated. I've
never worked with this stuff.
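To make the "submit, then wait for the callback" idea concrete outside the
kernel, here is a userspace model using pthreads. It is a sketch only:
struct io_waiter, fake_read_complete(), fake_device_thread() and
submit_and_wait() are hypothetical names, the "device" is just a thread that
hands back the byte 0xab, and nothing here says whether blocking like this
is acceptable inside map(). In the kernel, struct completion with
wait_for_completion() and complete() plays a similar role to the
mutex/condition pair below.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a "wait until the I/O is done" object. */
struct io_waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             done;       /* set to 1 by the completion callback */
    unsigned char   first_byte; /* result handed back by the fake read */
};

static void io_waiter_init(struct io_waiter *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->done = 0;
    w->first_byte = 0;
}

/* Plays the role of read_complete_callback(): publish result, wake waiter. */
static void fake_read_complete(struct io_waiter *w, unsigned char byte)
{
    pthread_mutex_lock(&w->lock);
    w->first_byte = byte;
    w->done = 1;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Thread body that simulates the device finishing the read asynchronously. */
static void *fake_device_thread(void *arg)
{
    fake_read_complete(arg, 0xab);
    return NULL;
}

/* Plays the role of the submitter: start the read, then block until done. */
static unsigned char submit_and_wait(struct io_waiter *w)
{
    pthread_t dev;
    int rc = pthread_create(&dev, NULL, fake_device_thread, w);

    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&w->lock);
    while (!w->done)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);

    pthread_join(dev, NULL);
    return w->first_byte;
}
```

The flag plus condition variable avoids the lost-wakeup problem: if the
"device" finishes before the submitter starts waiting, done is already set
and the wait loop is skipped.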
> If you cannot continue because devices are not present or the right size,
> yes you should fail the ctr routine.
Alright!
> If you want to setup /proc or other monitoring stuff, you can use the init
> routine, probably plus some statics, to setup "views" into your module. If
> you want to support multiple instances (and you should), setup a
> /proc/{yourname} directory on the init and then populate it with
> sub-directories every time you create a device.
Okay, I'll try to do this (want to make statistics available via procfs
later on), but one construction site at a time for me.
>> - Can I determine the size the bio in map() will have already in ctr()
>> somehow? Can I assume it will never change if it was once determined?
>> The reason is that for my example I need to make sure the chunk size is
>> an integer multiple of the bio size and I would only like to check this
>> once (in ctr) and not every time (in map).
>
> Block size will not change. The size of requests to you is limited by the
> setup of ti->max_io_len. If you don't set this with recent kernels, you
> will only get 4K, which is not all that efficient. This is actually part
> of another big topic of "stacked limits", which someone could write a book
> on (and I would read it).
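The one-time check asked about in the question could be as small as the
following. This is a sketch under assumptions: chunk_size_is_valid() is a
hypothetical helper, both sizes are taken to be in sectors, and where the
I/O size comes from (for example the limit discussed above) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the chunk size is a non-zero integer multiple of the
 * I/O size. Both arguments are lengths in sectors.
 */
static bool chunk_size_is_valid(uint32_t chunk_sectors, uint32_t io_sectors)
{
    /* A zero I/O size is a programming error in the caller. */
    assert(io_sectors != 0);

    if (chunk_sectors == 0)
        return false;
    return chunk_sectors % io_sectors == 0;
}
```

A constructor could call this once and refuse to set up the target when it
returns false, so that map() does not have to repeat the test.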
So if I wanted to do a large I/O operation (say write one megabyte
of data to a block device somewhere within my driver) I'd have to make
lots of calls to generic_make_request?
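For reference, this is the bookkeeping I have in mind if splitting by hand
turns out to be necessary. It is only a sketch: struct io_chunk and
split_transfer() are hypothetical, sizes are in 512-byte sectors, and the
per-request limit is a parameter because I don't know what the real limit
is. A bio allocated with more than one vector can carry more than one page,
so the number of requests may well be smaller than the worst case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One piece of a larger transfer: start sector and length in sectors. */
struct io_chunk {
    uint64_t sector;
    uint32_t nr_sectors;
};

/*
 * Split [start, start + total) into pieces of at most max_per_req sectors.
 * Fills out[] with up to out_cap entries and returns how many pieces the
 * whole transfer needs (which may exceed out_cap).
 */
static size_t split_transfer(uint64_t start, uint64_t total,
                             uint32_t max_per_req,
                             struct io_chunk *out, size_t out_cap)
{
    size_t n = 0;

    assert(max_per_req > 0);
    while (total > 0) {
        uint32_t len = total < max_per_req ? (uint32_t)total : max_per_req;

        if (n < out_cap) {
            out[n].sector = start;
            out[n].nr_sectors = len;
        }
        start += len;
        total -= len;
        n++;
    }
    return n;
}
```

With one megabyte (2048 sectors) and a limit of 8 sectors per request this
yields 256 pieces; with a larger limit the count drops accordingly.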
Thank you so much for your help,
Best regards,
Johannes