* Re: [PATCHv4] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-09-06 4:55 UTC (permalink / raw)
To: david.wagner; +Cc: linux-mtd, linux-embedded, lkml, Tim Bird, David Woodhouse
In-Reply-To: <1314202540-5562-1-git-send-email-david.wagner@free-electrons.com>
On Wed, 2011-08-24 at 18:15 +0200, david.wagner@free-electrons.com
wrote:
> g checkpatch'd (it still complains about an assignment in an if ()), is it
> really bad ?
Well, UBI does not do this, so for the sake of consistent style you
could fix those warnings.
> about e), I have a question: it was possible to unmount a mounted filesystem
> because ot this. The fix we found was to return -EBUSY if the ubiblk device is
> opened. But is it correct to assume that a mounted filesystem will always hold
> the device opened ? If not, what is the correct way to impeach the removal of a
> ubiblk device when it is "mounted" ?
Once the block device is referenced (refcound is not zero), you should
open the underlying UBI volume (in write mode). And you close it only
when the block device's reference count becomes zero, just like gluebi
does this. So you need to enhance your ubiblk_open() and release
functions.
This should make sure that no one can remove the underlying UBI volume
when the corresponding ubiblk device is being used (e.g., mounted).
> (see ubiblk_remove(), if (dev->desc))
>
> other questions:
> is it ok not to return the error value from the notify function ? I found now
> way to return an error value w/o stopping the notifications from being emitted.
>
> there are error messages printed when a ubiblk_remove() is attempted on a
> non-existing device - that happens a lot when a UBI device is detached ; should
> the messages be debug-only (or even removed) ?
Not sure I got the question, is this an error situation? I see the
following possibilities:
1. UBI volume ubiX_Y is not used by ubiblk at all -> no issues
2. UBI volume ubiX_Y is has corresponding device ubiblkA, which is
unused, so UBI volume can be removed, which causes a notifier,
which removes ubiblkA.
3. ubiX_Y has corresponding ubiblkA, which is used, which means ubiX_Y
is still open, which means it cannot be removed, no issues.
> +/**
> + * ubi_ubiblk_init - initialize the module
> + *
> + * Get a major number and register to UBI notifications ; register the ioctl
> + * handler device
> + */
> +static int __init ubi_ubiblk_init(void)
> +{
> + int ret = 0;
> +
> + ret = register_blkdev(0, "ubiblk");
> + if (ret <= 0) {
> + pr_err("can't register the major\n");
Not very user-friendly message - I do not think it is very readable and
easy to understand. Saying "cannot register block device" would be
readable. But I'd better just not print anything. And the function
internally has some error-path prints.
> + return -ENODEV;
Why do you override the error code returned by 'register_blkdev()'?
> + }
> + ubiblk_major = ret;
> +
> + mutex_init(&devlist_lock);
Isn't it cleaner to initialize the global mutex when declaring it -
there is a macros to do this?
> +
> + ret = misc_register(&ctrl_dev);
> + if (ret) {
> + pr_err("can't register control device\n");
> + goto out_unreg_blk;
> + }
> +
> + ret = ubi_register_volume_notifier(&ubiblk_notifier, 1);
> + if (ret < 0)
> + goto out_unreg_ctrl;
> +
> + pr_info("major=%d\n", ubiblk_major);
Please, print something nicer, e.g.,
"control device major number is %d"
> +
> + return ret;
> +
> +out_unreg_ctrl:
> + misc_deregister(&ctrl_dev);
> +out_unreg_blk:
> + unregister_blkdev(ubiblk_major, "ubiblk");
> +
> + return ret;
> +}
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-09-06 4:29 UTC (permalink / raw)
To: Arnd Bergmann
Cc: david.wagner, linux-mtd, linux-embedded, lkml, Tim Bird,
David Woodhouse
In-Reply-To: <1315282208.19067.24.camel@sauron>
On Tue, 2011-09-06 at 07:10 +0300, Artem Bityutskiy wrote:
> On Tue, 2011-09-06 at 06:44 +0300, Artem Bityutskiy wrote:
> > > It's not a dummy bus, in this approach it would be a the bus that gets
> > > used by all ubiblk devices, which is a very common concept by itself.
> > > It's more like the classic understanding of a 'device class' that Greg
> > > wants to see get replaced by bus_types in the kernel.
> >
> > Yes, this sounds OK. Although UBI already has notifiers, so we could
> > just add 2 more events.
>
> Hmm, with notifications the error handling becomes a problem - we want
> the ioctls for creating/removing the block device to be synchronous,
> and, should an error occur, we want to return the error code to the
> user-space. So the existing notifications mechanism does not work well.
... but we can change it a little and have error codes handling, this is
just UB implementation issues.
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-09-06 4:10 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-embedded, david.wagner, lkml, linux-mtd, Tim Bird,
David Woodhouse
In-Reply-To: <1315280704.19067.14.camel@sauron>
On Tue, 2011-09-06 at 06:44 +0300, Artem Bityutskiy wrote:
> > It's not a dummy bus, in this approach it would be a the bus that gets
> > used by all ubiblk devices, which is a very common concept by itself.
> > It's more like the classic understanding of a 'device class' that Greg
> > wants to see get replaced by bus_types in the kernel.
>
> Yes, this sounds OK. Although UBI already has notifiers, so we could
> just add 2 more events.
Hmm, with notifications the error handling becomes a problem - we want
the ioctls for creating/removing the block device to be synchronous,
and, should an error occur, we want to return the error code to the
user-space. So the existing notifications mechanism does not work well.
Not sure about the bus approach - David, could you take a look at it
please? If we can handle errors there - then we could indeed re-use the
UBI control device. We could even re-use the ioctl data structures for
UBI volumes creation/removal - we have plenty of space there reserved
for future extensions.
--
Best Regards,
Artem Bityutskiy
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-09-06 3:44 UTC (permalink / raw)
To: Arnd Bergmann
Cc: david.wagner, linux-mtd, linux-embedded, lkml, Tim Bird,
David Woodhouse
In-Reply-To: <201108251712.40894.arnd@arndb.de>
Hi, sorry for long delay, did not have time to read my mail.
On Thu, 2011-08-25 at 17:12 +0200, Arnd Bergmann wrote:
> > I think this wasteful. Why should I have block devices which I do not
> > need? If I have 4 UBI volumes, and need only one ubiblk, why should I
> > waste my resources for 3 more of them (e.g., I do not want to waste
> > memory for struct inode for each sysfs entry which these useless block
> > devices will add). Also, will this mean 3 more block devices registered?
> >
> > I think it is much uglier to have 3 "dummy" block devices and confuse
> > users than have one nice control character device. For the sake of not
> > having a separate control chardev?
>
> The cost of a block device node in the kernel is rather low. Nowadays,
> sysfs does not even permanently use inodes for entries, it has a much
> more compact internal representation IIRC.
>
> The main advantage of this approach is not having to set up the
> block device at all, it would just be there, which e.g. makes it
> possible to put a root file system on it or do something else without
> requiring a user space tool to issue an ioctl.
Yes, I understand, but the cost of each block device is not zero, I did
not measure it, but I believe it would be in Kilobytes. Also, littering
sysfs and /dev with useless block devices is not beautiful - why would
a user want to have or see fake and useless block devices? And as David
said, for some UBI volumes we do not want to have block devices at all.
And BTW, mtdblock is uses exactly this technique, and I had to write
many times to different confused people that they should not look at
those "/dev/mtdblockX" devices, as if they did not exist...
IOW, I think automatic creation for all UBI volumes has more drawbacks
than advantages.
> Evidently you can do everything you need even with that user space
> tool, but IMHO the complexity of doing that is way bigger than
> just creating the block devices right away.
Well, it is jut another step to a set of steps one usually needs to do
to attach an MTD device, create an UBI volume, etc.
> It's not a dummy bus, in this approach it would be a the bus that gets
> used by all ubiblk devices, which is a very common concept by itself.
> It's more like the classic understanding of a 'device class' that Greg
> wants to see get replaced by bus_types in the kernel.
Yes, this sounds OK. Although UBI already has notifiers, so we could
just add 2 more events.
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: David Wagner @ 2011-09-01 12:55 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-embedded, dedekind1, lkml, linux-mtd, Tim Bird,
David Woodhouse
In-Reply-To: <201108251712.40894.arnd@arndb.de>
On 08/25/2011 05:12 PM, Arnd Bergmann wrote:
> The cost of a block device node in the kernel is rather low. Nowadays,
> sysfs does not even permanently use inodes for entries, it has a much
> more compact internal representation IIRC.
>
> The main advantage of this approach is not having to set up the
> block device at all, it would just be there, which e.g. makes it
> possible to put a root file system on it or do something else without
> requiring a user space tool to issue an ioctl.
Before patch v3, every existing and new UBI volumes were "proxyfied" by
ubiblk and it was, indeed, one of our goal to be able to have a rootfs
on it. Patch v3 hinders that goal but it could still be achievable by
adding a module parameter that would explicitly create a ubiblk device
for a UBI volume at boot time.
I for one am fine with both solutions (keep the ioctl + add a kernel
parameter and throwing the ioctl away and go back "automatically create
a ubiblk for each UBI volume"). However, I agree that it doesn't make
sense to create a ubiblk device on top of UBI volumes containing, for
instance, a ubifs.
Best Regards,
David.
--
David Wagner, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
^ permalink raw reply
* Re: RFC: [Restatement] KBUS messaging subsystem
From: Tony Ibbs @ 2011-08-29 8:55 UTC (permalink / raw)
To: Bryan Donlan
Cc: Pekka Enberg, lkml, Andrew Morton, Jonathan Corbet,
Florian Fainelli, Grant Likely, Linux-embedded, Tibs at Kynesim,
Richard Watts
In-Reply-To: <CAF_S4t-RV1SACnkwW9RsVWEXv-jHujvnLZh-NMfDZi4YzfJwdw@mail.gmail.com>
On 22 Aug 2011, at 02:15, Bryan Donlan wrote:
> I think this may well be the core problem here - is KBUS, as proposed,
> a general API lots of people will find useful, or is it something that
> will fit _your_ use case well, but other use cases poorly?
Indeed.
And, by the way, thanks a lot for this email, which gives me lots of
specific items to reply to.
> Designing a good API, of course, is quite difficult, but it _must_ be
> done before integrating anything with upstream Linux, as once
> something is merged it has to be supported for decades, even if it
> turns out to be useless for 99% of use cases.
Indeed.
It's only anecdotal evidence, of course, but we have put quite a lot
of thought and testing into KBUS - many things that look as if they
are going to be simple or easy either aren't, or lead to unfortunate
consequences.
> Some good questions to ask might be:
> * Does this system play nice with namespaces?
> * What limits are in place to prevent resource exhaustion attacks?
> * Can libdbus or other such existing message brokers swap out their
> existing central-routing-process based communications with this new
> system without applications being aware?
I'll punt on namespaces, since I don't know the terminology being used
in the kernel (there seem to be several sorts of namespace, which are
essentially independent?).
KBUS at the moment is definitely not doing enough to manage its
resources, as I've said elsewhere. However, having all of the queues
in the same place, under the same management, means that it is a
relatively simple job to enforce an overall limit on the memory usage
of a particular bus, as well as per-queue and per-message limits. This
will (eventually) get mended whether KBUS ends up in the kernel or
not.
As to higher-level messaging systems using KBUS, I think that's a red
herring. For a start, I can't see why they'd necessarily be interested
- presumably if they felt the need for such a kernel module they'd
already have moved to introduce one (as in binder, for instance). If
they haven't, it's presumably because their design works well enough
(for their own aims) without. And of course they'd lose a certain
amount of control if part of their system were kernel-maintained,
which might also be important. But also, it's a significant
development project in itself to try to produce a system suitable to
act as the underpinnings for another system. One can't just say "it
looks as if it might work", one needs to implement it and test it,
because of all the edge cases one is bound not to have thought of.
That's a lot of work, and almost entirely unrelated to producing a
simple, minimal system.
(for what it's worth, and despite that, my gut feeling is that
any useful minimal messaging system could be used as the
bottom-level for a libdbus or equivalent, but I'm still not
convinced it would be worth it, and it would not necessarily give a
better version of the higher level system)
So I'd say if libdbus or whoever *could* use such a system, that would
be nice, but it should not be the primary aim.
> Keep in mind also that the kernel API need not match the
> application-visible API, if you can add a userspace library to
> translate to the API you want.
OK, although that's basically true of all APIs (for instance, the way
I think of KBUS in the privacy of my own head is with the API I use in
the Python library, or how the message queues actually work within
KBUS itself).
If you look at the existing C and C++ APIs, they provide two somewhat
different abstractions. The Javascript APIs we've used in the past
were even further away from the actual kernel APIs (they never
mentioned a particular bus, since that could be inferred from the
message name in that application).
On the other hand, it is incumbent on us to remember that people
programming to the kernel API are users as well. So your implicit
point (as I take it) in this message that one should use familiar
interfaces in a familiar way is a good one. On the other hand, if we
can present the user with a simpler interface (as in simpler to
program with) by a relatively small amount of underlying work, then
that is a net gain - the user is less likely to make mistakes, and the
overall amount of code written will be smaller.
I think there is a general principle at work, in that one should
solve difficult problems once, in one place, if at all possible.
> So, for example, instead of numbering
> kbuses, you could define them as a new AF_UNIX protocol, and place
> them in the abstract socket namespace (ie, they'd have names like
> "\0kbus-0").
Indeed. I'd regard that as cosmetic detail - each KBUS is still
identified by a number, but instead of that number being used in a
device name, it's being used in a socket name.
> Doing something like this avoids creating a new
> namespace, and non-embedded devices could place these new primitives
> in a tmpfs or other more visible location. It also makes it very cheap
> (and a non-privileged operation!) to create kbuses.
Hmm.
The current mechanism for creating new KBUS buses as an unprivileged
user is admittedly via an ioctl, but clearly that should be replaced
by something more modern (the received wisdom on how to use things
like debugfs, for instance, seems to have changed greatly even in
KBUS's short life). It's not something that the current KBUS model
requires one to do often, so cheapness is not a great issue.
But unprivileged is good.
> So, let's look at your requirements:
>
> * Message broadcast API with prefix filtering
> * Deterministic ordering
> * Possible to snoop on all messages being passed through
> * Must not require any kind of central userspace daemon
> * Needs a race-less way of 1) Advertising (and locking) as a replier
> for a particular message type and 2) Detecting when the replier dies
> (and synthesizing error replies in this event)
>
> Now, to minimize this definition, why not remove prefix filtering from
> the kernel? For low-volume buses, it doesn't hurt to do the filtering
> in userspace (right?). If you want to reduce the volume of messages
> received, do it on a per-bus granularity (and set up lots of buses
> instead). After all, you can always connect to multiple buses if you
> need to listen for multiple message types. For replier registration,
> then, it would be done on a per-bus granularity, not a per-message
> granularity.
>
> So we now have an API that might (as an example) look like this:
>
> * Creation of buses - socket(AF_UNIX, SOCK_DGRAM, PROTO_KBUS),
> followed by bind() either to a file or in the abstract namespace
> * Advertising as a replier on a socket - setsockopt(SOL_KBUS,
> KBUS_REPLIER, &one); - returns -EEXIST if a replier is already present
> * Sending/receiving messages - ordinary sendto/recvfrom. If a reply is
> desired, use sendmsg with an ancillary data item indicating a reply is
> desired
> * Notification on replier death (or replier buffer overflow etc):
> empty message with ancillary data attached informing of the error
> condition
> * 64-bit global counter on all messages (or messages where requested
> by the client) to give a deterministic order between messages sent on
> multiple buses (reported via ancillary data)
> * Resource limitation based on memory cgroup or something? Not sure
> what AF_UNIX uses already, but you could probably use the same system.
> * Perhaps support SCM_RIGHTS/SCM_CREDENTIALS transfers as well?
Thanks a lot for concrete call examples - that makes it a lot easier
for me to think things through. I'll try to separate out my comments
into some sort of sensible sequence. Forgive me if I miss something.
Current scheme
--------------
In the current system, message sending looks something like the
following:
1. Sender opens bus 0
2. Sender creates a message with name "Fred"
3. Sender may mark the message as needing a reply.
4. Sender writes the message and sends it (these are currently two
operations, but as was mentioned upstream, could be combined - we
just liked them better apart).
5. If the message needs a reply, KBUS checks if the sender has enough
space in its queue to receive a reply, and if not, rejects it
5. KBUS assigns the next message id for bus 0 to the message
6. KBUS determines who should receive the message. If the message
needs a reply, and no-one has bound as replier, then the send
fails. Similarly, if the replier does not have room in their queue,
the send fails.
7. Otherwise, KBUS copies the message to the queues of everyone who
should receive it. If the message needs a reply, then the
header of the particular message that needs a reply will be altered
to indicate this.
At the recipient end, the sequence is something like:
1. Listener opens bus 0.
2. Listener chooses to receive messages with a particular name,
possibly as a replier.
3. Listener determines if there is a next message, and if so, its
length.
4. Listener allocates a buffer to receive the message.
5. Listener reads the message into the buffer.
The recipient is guaranteed to read messages in the order they were
sent in, and to only get the messages they asked for.
Sockety scheme
--------------
In the scheme where we're just replacing the calls with appropriate
"sockety" calls, and not altering message name filtering, this
presumably proceeds in a very similar manner, except that we are using
the equivalent sockety calls.
My first question would be how the recipient is meant to tell the
length of the next message before doing their recvfrom/recvmsg.
I realise (now) that "message data may vary in length" wasn't
mentioned up front as a requirement (although I'd aver that it is
pretty evident from the documentation, and from the API proposed, that
this is meant, else why do we have NEXTMSG returning the length of the
next message?). But then I'd never imagined that someone wouldn't
assume this as a property of a general messaging system (after all,
it is clearly simple to build a fixed length message system on top of
a variable length message system, and rather harder to do the
reverse).
Can one use MSG_PEEK to retrieve just the ancillary data? It's not
clear to me from the recvmsg man page. If message data length was sent
in the ancillary data, then this could work. If one can't do that,
perhaps one could use MSG_PEEK to look at the start of the message
proper (although that feels like a horrible hack). Is there precedent
for this?
Otherwise, we're reduced to a special call of getsockopt, or, worse,
separating all messages into a standard sized header message followed
by the message data - but that way lies insanity.
There's also a decision to be made of what does get put into ancillary
data. At one extreme, all of the message header data would be treated
as ancillary data, which means that the user would need to use
sendmsg/recvmsg all of the time. That's a lot more code complexity,
and a lot more allocations. At the other extreme, we don't use
ancillary data at all, in which case we keep the header more-or-less
as is. There is the whole issue of whether message name and data are
referred to as pointers from the header, or are part of the same
buffer (there's some discussion of this in the KBUS documentation,
where it talks about "pointy" and "entire" messages). But that's a
level of detail for later, if necessary.
Also, if we're using ancillary data, can we use a socket specific
method to identify the message sender, something one can feed straight
back into sendmsg (hmm, maybe not - a quick scan around suggests that
AF_UNIX only has SCM_CREDENTIALS and SCM_RIGHTS - maybe I've not
looked hard enough).
(KBUS's current sender id is nice and simple, but I'd assumed there
must be some sockety equivalent we should be using...)
Regardless, we have an API comparison something like:
======================== =============================
File-oriented Socket-oriented
======================== =============================
open socket
close close
write [and <send> ioctl] sendmsg or sendto
<nextmsg> ioctl not clear - getsockopt? peek?
read recvmsg or recvfrom
<bind> ioctl setsockopt
<unbind> ioctl setsockopt
poll poll
======================== =============================
There are also various ioctls on the file-oriented side that would
clearly be replaced by get/setsockopt, and more that should be direct
instructions to KBUS via debugfs or something (i.e., they should never
have been ioctls in the first place, if I'd know what to do instead).
> This is a much simpler kernel API, don't you think?
I think we mean different things by that.
Replacing read/write (which are, let's face it, quite simple to use,
and just about every C programmer can get them mostly right) with
sendmsg/recvmsg (which are some of the most complicated calls to use
in the socket world, and for which most easy to find examples are
about moving file descriptors between processes) does not seem to me
to be simplifying anything.
I must admit I'm also not entirely sure why get/setsockopt calls are
*that* much better than ioctls (they do at least specify a length, and
the number of existing options is smaller, but neither of those seems
an obvious win to me).
Regardless, though, assuming the message length problem can be sorted
out (and that's obviously possible by *some* means), it is clearly
feasible to replace one API with another, and I assume one could move
the innards of the current KBUS to talk to the new interface.
Filtering in userspace
----------------------
You suggest performing message filtering in userspace, by reading all
messages and "throwing away" those which are not of interest. This is
predicated on the idea that the data is low volume. Apart from the
fact that I'm not sure what low volume means (are we contrasting
with network traffic on an STB handling audio/video?), we've tried not
to assume anything much about the amount of traffic over KBUS, or the
number of senders/listeners. Granted I personally wouldn't recommend
sending very large messages (I'm doubtful of the sanity of anything
over a few MB, myself, although KBUS will cope with multiple page
messages - albeit rather slowly), or expecting KBUS to be fast
(whatever "fast" means), but I'm reluctant to put those assumptions
into the design.
The inside of KBUS would indeed be slightly simpler if it did not
perform the message filtering (and this is substantially unoptimised
at the moment). However, if the client receives all of the messages,
that's an awful lot more copying being done. Within the kernel module,
message content is reference counted, but as data goes across the
kernel-to-userspace boundary, all of it gets copied. In the current
system, one can happily send a message knowing that it will not get
sent to recipients who do not care, and thus not worry much about the
cost in CPU, memory and so on. In the non-filtering system, such
concerns would need to be a major issue (spamming many clients with a
single large message that they are going to ignore could be a very big
deal, and would be relatively hard to defend against).
You also suggest splitting buses up into a finer granularity, in the
hope that this would cause less userspace filtering to be necessary.
I'm uncomfortable with that suggestion because it is just that, a
suggestion as to how the user might do things. It doesn't address the
problem in a technical manner at all.
I'd also note that there is virtue in having the unsplit buses. In the
current system, it is sensible to say that all messages for one task
will be on bus 0, and messages for another task on bus 1, and the two
cannot interact. One has, if you will, multiple message namespaces. It
makes sense to say that a program will only send messages on bus 0,
without needing to list the messages. In the proposed new system, a
single task will typically need to span multiple buses, and we've lost
a useful distinction. Thus the original approach is a win for
documentation and pedagogy, if nothing else.
Replier buses
-------------
Putting replier registration on a bus basis. Hmm. So a recipient would
"bind" to a bus as replier *for that bus". Would all messages on that
bus be seen as requests by the replier? I think it would have to be
so, if only because marking messages as requests *as well* leads to
all sorts of possible confusions.
What if the recipient were monitoring messages as well, so it would
also want to "just receive" the requests? Presumably it would have to
open another connection to the bus to receive requests as ordinary
messages. OK.
Meanwhile, the sender presumably has to indicate that this bus is a
replier bus, with an appropriate setsockopt call. Note that this means
that everyone needs to know beforehand the id of that bus, and we are
getting perilously closer to needing some sort of manager of bus
ids/names (this makes me uncomfortable), or having a formalism about
how buses are named (ditto).
So sending now looks more like:
1. Sender opens bus 0
2. If this is to be a replier bus, sender marks it as such, via
setsockopt. Definitely not via an ioctl.
Note that we can't check for someone registered as a replier at
this point, as they might reasonably not have connected to the bus
yet.
3. Sender creates a message
4. Sender sends the message.
As before, in the sockety manner.
5. If this is a replier bus, then KBUS checks to see if the sender has
enough room to receive a reply on it.
6. KBUS assigns the next message id to the message
In current KBUS, the message id is unique on each bus, and buses
are isolated from each other. That doesn't work now, because we
need the recipient to be able to reconstruct message ordering
across buses. So the new id generation mechanism needs to be
bus-independent. Using a 64-bit id should probably give us at least
the id "granularity" that the current 32-bit id does.
7. If this is a replier bus, then KBUS checks to see if there is a
replier bound to it (thus, leaving it as late as possible, and
giving the most chance the replier will be there). If no-one has
bound as replier, then the send fails. Similarly, if the replier
does not have room in their queue, the send fails.
8. KBUS copies the message to the queues of everyone who has bound to
receive it.
At the recipient end, the sequence is then presumably something like:
1. Listener opens bus 0.
2. Listener possibly chooses to be a replier for that bus, using
setsockopt. It is an error if there is already a replier for the
bus.
Is it an error to bind as a replier on a bus that is not marked as
such? I can't see that it can be, because otherwise we would have
to fail with the race condition where:
a. Listener opens bus
b. Sender opens bus
c. Listener binds to bus as replier
d. Sender tells bus it is a replier bus
So I think we have to allow a replier bound on a non-replier bus -
they'd just never get any messages on it. Which means KBUS has to
make sure not to send ordinary messages to a listener on a bus
they've bound to as replier.
(For a world of pain, invent a getsockopt option to check if a
bus is marked as a replier bus, and wait for it to get set, and
*then* bind as replier. But I wouldn't want to advise it.)
That all feels rather messy, and is really one of the sorts of
reason we went with just marking messages and leaving buses as
message agnostic transports.
3. Listener determines if there is a next message, and if so, its
length.
Again, as said before, I'm not sure how this would be done.
4. Listener allocates the appropriate number of buffers to receive the
message.
5. Listener reads the message into the buffers.
6. If the message was read via a socket with the "replier" socket
option set (one assumes the recipient remembers this), then the
listener needs to send a reply over that same socket.
Conceptually, this does look like it would work (subject to the
binding nastiness mentioned), and it's clearly more-or-less a dual of
the approach we've already taken. If we were splitting current KBUS
buses into finer granularities, it would be a reasonable approach to
consider.
The problem with splitting related messages over many buses
-----------------------------------------------------------
The trouble is that, whilst the recipient is guaranteed to receive
messages *on a given bus* in the correct order, this is no longer
sufficient, as the order we care about is now split over multiple
buses.
You propose that the recipient should reassemble the message order.
This is clearly possible if they are receiving all messages, but at
the cost of having to keep a list (potentially a very long list) of
message ids received and outstanding, and only "releasing" a message
when all preceding message ids have been encountered. A colleague's
comment on this was that we should not be reimplementing TCP/IP in
userspace. I'd just say that this is a non-trivial problem, and if
every recipient has to do it, a potential burden on the performance of
the whole system (especially if we're talking many buses), so it
should be done at the place that causes fewest reimplementations,
i.e., in the kernel module. Which puts us back where we were.
(Obviously, if the recipient is not getting *all* messages, then this
problem is unsolvable, since it cannot know which message ids are
missing - i.e., if it receives messages 5, 9 and 7, it has no way of
knowing whether it should also have received message 8.)
> In short, API minimalism is key to acceptance in the upstream kernel.
> Try to pare down the core API to the bare minimum to get what you
> need, rather than implementing your final use case directly into the
> kernel using ioctls or whatnot.
Hmm. As I recall, when starting KBUS development we said "what's the
simplest API we can present to the user to do the job", at the same
time as asking "what's the simplest set of functionalities that we
need to provide". So, in a very real sense, we did start by trying to
pare down the core API.
Of course, that same aim led us to reject trying to force sockets to
do the job just because "sockets are used for messaging". Not that
they always are, of course - one doesn't classically communicate with
DSPs over sockets, for instance.
> Thanks,
> Bryan
Thanks again,
Tibs
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Arnd Bergmann @ 2011-08-25 15:12 UTC (permalink / raw)
To: dedekind1
Cc: linux-embedded, david.wagner, lkml, linux-mtd, Tim Bird,
David Woodhouse
In-Reply-To: <1314256010.18988.18.camel@sauron>
On Thursday 25 August 2011, Artem Bityutskiy wrote:
> On Wed, 2011-08-24 at 18:23 +0200, Arnd Bergmann wrote:
> > That should be fine, yes. I would probably put them into the same
> > header file though if they are in the same number space even
> > when you use them on distinct devices.
> >
> > It does feel a little clumsy to have yet another character device
> > to manage the block devices though. What do you think about one
> > of these alternative approaches:
> >
> > * When the ubi block device driver gets loaded, create one block
> > device per volume and let the user deal with permissions for
> > the devices instead of having to first create them as well.
>
> I think this wasteful. Why should I have block devices which I do not
> need? If I have 4 UBI volumes, and need only one ubiblk, why should I
> waste my resources for 3 more of them (e.g., I do not want to waste
> memory for struct inode for each sysfs entry which these useless block
> devices will add). Also, will this mean 3 more block devices registered?
>
> I think it is much uglier to have 3 "dummy" block devices and confuse
> users than have one nice control character device. For the sake of not
> having a separate control chardev?
The cost of a block device node in the kernel is rather low. Nowadays,
sysfs does not even permanently use inodes for entries, it has a much
more compact internal representation IIRC.
The main advantage of this approach is not having to set up the
block device at all, it would just be there, which e.g. makes it
possible to put a root file system on it or do something else without
requiring a user space tool to issue an ioctl.
Evidently you can do everything you need even with that user space
tool, but IMHO the complexity of doing that is way bigger than
just creating the block devices right away.
> > * Use the existing UBI control device for the block devices as
> > well and just add two more ioctls to create the devices.
> > You can add a logical bus_type for this so that the ubi block
> > driver gets automatically loaded matched with the device when
> > one is created using the control device.
>
> This sounds better IMHO, but I am still not sure that adding another
> dummy bus and exposing it in sysfs and more complexity in the ubiblk
> code is more elegant and less wasteful than just creating a separate
> chardev...
It's not a dummy bus, in this approach it would be a the bus that gets
used by all ubiblk devices, which is a very common concept by itself.
It's more like the classic understanding of a 'device class' that Greg
wants to see get replaced by bus_types in the kernel.
Arnd
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-08-25 7:06 UTC (permalink / raw)
To: Arnd Bergmann
Cc: david.wagner, linux-mtd, linux-embedded, lkml, Tim Bird,
David Woodhouse
In-Reply-To: <201108241823.20904.arnd@arndb.de>
Hi Arnd,
On Wed, 2011-08-24 at 18:23 +0200, Arnd Bergmann wrote:
> That should be fine, yes. I would probably put them into the same
> header file though if they are in the same number space even
> when you use them on distinct devices.
>
> It does feel a little clumsy to have yet another character device
> to manage the block devices though. What do you think about one
> of these alternative approaches:
>
> * When the ubi block device driver gets loaded, create one block
> device per volume and let the user deal with permissions for
> the devices instead of having to first create them as well.
I think this wasteful. Why should I have block devices which I do not
need? If I have 4 UBI volumes, and need only one ubiblk, why should I
waste my resources for 3 more of them (e.g., I do not want to waste
memory for struct inode for each sysfs entry which these useless block
devices will add). Also, will this mean 3 more block devices registered?
I think it is much uglier to have 3 "dummy" block devices and confuse
users than have one nice control character device. For the sake of not
having a separate control chardev?
> * Use the existing UBI control device for the block devices as
> well and just add two more ioctls to create the devices.
> You can add a logical bus_type for this so that the ubi block
> driver gets automatically loaded matched with the device when
> one is created using the control device.
This sounds better IMHO, but I am still not sure that adding another
dummy bus and exposing it in sysfs and more complexity in the ubiblk
code is more elegant and less wasteful than just creating a separate
chardev...
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* [PATCH] Tools for controling ubiblk
From: David Wagner @ 2011-08-24 16:43 UTC (permalink / raw)
To: David Wagner, linux-mtd; +Cc: Artem Bityutskiy, linux-embedded, David Wagner
ubiblk_ctrl sends an appropriate ioctl to ubiblk control node.
ubiblkadd and ubiblkdel are wrapper around ubiblk_ctrl.
The syntax is:
ubiblk{add,del} x y
where x is the UBI device number and y the volume ID.
Signed-off-by: David Wagner <david.wagner@free-electrons.com>
---
Hi
This tool is intended to add and remove ubiblk devices on top of specified UBI
volumes.
ubiblk is a new module proposal (WIP) ; full description can be found at
https://lkml.org/lkml/2011/8/24/244
changes since v1:
~~~~~~~~~~~~~~~~~
- update ubiblk-user.h from the latest version of ubiblk (PATCHv4) and copy it
from <linux sources>/usr/include/mtd/ubi/
- more foolproof and correct stupid mistakes
Regards,
David.
PS: I had troubles integrating it in buildroot: they use the 1.4.6 version or
mtd-utils (latest to date of writing) but 1.4.6 doesn't contain the build system
rework. However, the fix is trivial and when the next mtd-utils version gets
released, that won't be a problem anymore.
Makefile | 6 ++-
include/mtd/ubiblk-user.h | 46 +++++++++++++++++++
ubi-utils/ubiblk_ctrl.c | 106 +++++++++++++++++++++++++++++++++++++++++++++
ubi-utils/ubiblkadd | 2 +
ubi-utils/ubiblkdel | 2 +
5 files changed, 161 insertions(+), 1 deletions(-)
create mode 100644 include/mtd/ubiblk-user.h
create mode 100644 ubi-utils/ubiblk_ctrl.c
create mode 100755 ubi-utils/ubiblkadd
create mode 100755 ubi-utils/ubiblkdel
diff --git a/Makefile b/Makefile
index 5a0044b..e7856d2 100644
--- a/Makefile
+++ b/Makefile
@@ -27,12 +27,16 @@ MTD_BINS = \
sumtool #jffs2reader
UBI_BINS = \
ubiupdatevol ubimkvol ubirmvol ubicrc32 ubinfo ubiattach \
- ubidetach ubinize ubiformat ubirename mtdinfo ubirsvol
+ ubidetach ubinize ubiformat ubirename mtdinfo ubirsvol \
+ ubiblk_ctrl
+UBI_SCRIPTS = \
+ ubiblkadd ubiblkdel
BINS = $(MTD_BINS)
BINS += mkfs.ubifs/mkfs.ubifs
BINS += $(addprefix ubi-utils/,$(UBI_BINS))
SCRIPTS = flash_eraseall
+SCRIPTS += $(addprefix ubi-utils/,$(UBI_SCRIPTS))
TARGETS = $(BINS)
TARGETS += lib/libmtd.a
diff --git a/include/mtd/ubiblk-user.h b/include/mtd/ubiblk-user.h
new file mode 100644
index 0000000..229d2da
--- /dev/null
+++ b/include/mtd/ubiblk-user.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright © Free Electrons, 2011
+ * Copyright © International Business Machines Corp., 2006
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Author: David Wagner
+ * Some code taken from ubi-user.h
+ */
+
+#ifndef __UBIBLK_USER_H__
+#define __UBIBLK_USER_H__
+
+#include <linux/types.h>
+
+/**
+ * ubiblk_ctrl_req - additional ioctl data structure
+ * @ubi_num: UBI device number
+ * @vol_id: UBI volume identifier
+ */
+struct ubiblk_ctrl_req {
+ __s32 ubi_num;
+ __s32 vol_id;
+} __packed;
+
+/* ioctl commands of the UBI control character device */
+#define UBIBLK_CTRL_IOC_MAGIC 'O'
+
+/* Create a ubiblk device from a UBI volume */
+#define UBIBLK_IOCADD _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x10, struct ubiblk_ctrl_req)
+/* Delete a ubiblk device */
+#define UBIBLK_IOCDEL _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x11, struct ubiblk_ctrl_req)
+
+#endif
diff --git a/ubi-utils/ubiblk_ctrl.c b/ubi-utils/ubiblk_ctrl.c
new file mode 100644
index 0000000..7b309aa
--- /dev/null
+++ b/ubi-utils/ubiblk_ctrl.c
@@ -0,0 +1,106 @@
+/*
+ * Copyright (c) Free Electrons, 2011
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Author: David Wagner
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <libgen.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <mtd/ubiblk-user.h>
+
+#define PROGRAM_NAME "ubiblk_ctrl"
+#include <common.h>
+
+#define CONTROL_NODE "/dev/ubiblk_ctrl"
+
+void usage(char *exec_path)
+{
+ fprintf(stderr, "%s <-a|-d> <-u UBI_DEVICE_NUMBER> <-v VOLUME_ID>\n",
+ basename(exec_path));
+}
+
+int main(int argc, char *argv[])
+{
+ int ubi_num = -1, vol_id = -1;
+ int fd;
+ struct ubiblk_ctrl_req req;
+ int ret, err = 0;
+ int command = 0;
+
+ int option;
+
+ while ((option = getopt(argc, argv, "adu:v:")) != -1) {
+ switch (option) {
+ case 'a':
+ command = UBIBLK_IOCADD;
+ break;
+ case 'd':
+ command = UBIBLK_IOCDEL;
+ break;
+ case 'u':
+ ubi_num = simple_strtol(optarg, &err);
+ if (err) {
+ usage(argv[0]);
+ return EXIT_FAILURE;
+ }
+ break;
+ case 'v':
+ vol_id = simple_strtol(optarg, &err);
+ if (err) {
+ usage(argv[0]);
+ return EXIT_FAILURE;
+ }
+ break;
+ }
+ }
+
+ if (command != UBIBLK_IOCDEL && command != UBIBLK_IOCADD) {
+ usage(argv[0]);
+ return EXIT_FAILURE;
+ }
+ if (vol_id == -1 || vol_id == -1) {
+ usage(argv[0]);
+ return EXIT_FAILURE;
+ }
+
+ req.ubi_num = ubi_num;
+ req.vol_id = vol_id;
+
+ fd = open(CONTROL_NODE, O_RDONLY);
+ if (fd == -1) {
+ fprintf(stderr, "Error while opening the control node: %s\n",
+ strerror(errno));
+ return EXIT_FAILURE;
+ }
+
+ ret = ioctl(fd, command, &req);
+ if (ret == -1) {
+ fprintf(stderr, "Error while ioctl: %s\n", strerror(errno));
+ close(fd);
+ return EXIT_FAILURE;
+ }
+ close(fd);
+ return EXIT_SUCCESS;
+}
diff --git a/ubi-utils/ubiblkadd b/ubi-utils/ubiblkadd
new file mode 100755
index 0000000..92f9992
--- /dev/null
+++ b/ubi-utils/ubiblkadd
@@ -0,0 +1,2 @@
+#! /bin/sh
+ubiblk_ctrl -a -u $1 -v $2
diff --git a/ubi-utils/ubiblkdel b/ubi-utils/ubiblkdel
new file mode 100755
index 0000000..4c7660d
--- /dev/null
+++ b/ubi-utils/ubiblkdel
@@ -0,0 +1,2 @@
+#! /bin/sh
+ubiblk_ctrl -d -u $1 -v $2
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Arnd Bergmann @ 2011-08-24 16:23 UTC (permalink / raw)
To: dedekind1
Cc: david.wagner, linux-mtd, linux-embedded, lkml, Tim Bird,
David Woodhouse
In-Reply-To: <1313998939.2644.52.camel@sauron>
On Monday 22 August 2011, Artem Bityutskiy wrote:
>
> On Wed, 2011-08-17 at 15:17 +0200, david.wagner@free-electrons.com
> wrote:
> > Questions:
> > ==========
> > I wasn't sure what magic ioctl number to use, so I settled to use the same one
> > as a part of UBI: 'O', which was so far only used by UBI but on a higher range
> > and leaving some room for UBI to add ioctls (for nw, it uses 'O'/0x00-0x06 and
> > ubiblk uses 'O'/0x10-0x11). Is it ok or should ubiblk use a different
> > number/range ?
>
> I think this is OK to share them between UBI and ubiblk, as long as this
> is documented.
That should be fine, yes. I would probably put them into the same
header file though if they are in the same number space even
when you use them on distinct devices.
It does feel a little clumsy to have yet another character device
to manage the block devices though. What do you think about one
of these alternative approaches:
* When the ubi block device driver gets loaded, create one block
device per volume and let the user deal with permissions for
the devices instead of having to first create them as well.
* Use the existing UBI control device for the block devices as
well and just add two more ioctls to create the devices.
You can add a logical bus_type for this so that the ubi block
driver gets automatically loaded matched with the device when
one is created using the control device.
Arnd
^ permalink raw reply
* [PATCH] document ubiblk's usage of the same ioctl magic as a part of UBI
From: David Wagner @ 2011-08-24 16:21 UTC (permalink / raw)
To: linux-mtd
Cc: Artem Bityutskiy, linux-embedded, lkml, Tim Bird, David Woodhouse,
David Wagner
In-Reply-To: <1314202540-5562-1-git-send-email-david.wagner@free-electrons.com>
---
include/mtd/ubi-user.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/include/mtd/ubi-user.h b/include/mtd/ubi-user.h
index 3c41097..7c8bfe3 100644
--- a/include/mtd/ubi-user.h
+++ b/include/mtd/ubi-user.h
@@ -188,6 +188,7 @@
/* Set an UBI volume property */
#define UBI_IOCSETVOLPROP _IOW(UBI_VOL_IOC_MAGIC, 6, \
struct ubi_set_vol_prop_req)
+/* If you add ioctls here, please note that ubiblk uses 'O'/0x10-0x11 */
/* Maximum MTD device name length supported by UBI */
#define MAX_UBI_MTD_NAME_LEN 127
--
1.7.0.4
^ permalink raw reply related
* [PATCHv4] UBI: new module ubiblk: block layer on top of UBI
From: david.wagner @ 2011-08-24 16:15 UTC (permalink / raw)
To: linux-mtd
Cc: Artem Bityutskiy, linux-embedded, lkml, Tim Bird, David Woodhouse,
David Wagner
In-Reply-To: <1308922482-14967-1-git-send-email-david.wagner@free-electrons.com>
From: David Wagner <david.wagner@free-electrons.com>
ubiblk is a read-only block layer on top of UBI. It presents UBI volumes as
read-only block devices (named ubiblk_X_Y, where X is the UBI device number
and Y the Volume ID).
It is used by putting a block filesystem image on a UBI volume, creating the
corresponding ubiblk device and then mounting it.
It uses the UBI API to register to UBI notifications and to read from the
volumes. It also creates a ubiblk_ctrl device node that simply receives ioctl
from a userspace tool for creating/removing ubiblk devices.
Some code is taken from mtd_blkdevs and gluebi. Some code for the ioctl part is
also inspired from ubi's core.
Advantages of ubiblk over gluebi+mtdblock_ro:
* Simpler architecture
* The numbering of devices is much easier with ubiblk than with
gluebi+mtdblock_ro. With gluebi+mtdblock_ro, you get one additional MTD
device for each UBI volume, so the number of MTD devices grows quite a lot
and is a bit difficult to understand. For example, mtdblock[0-4] might be
your real MTD partitions, while mtdblock[5-9] might be your UBI volumes.
It also means that if a new real MTD partition is added, the number of all
the MTD devices exposing UBI volumes will be incremented by one, which is a
bit confusing/annoying.
As well, if you add an UBI volume, the mtdblock devices that are emulated on
top of volumes that come after this new one will have their ID incremented.
* ubiblk devices are created on a 'on-demand' basis, which avoids wasting
resources.
* The performance appears to be slightly better with ubiblk than
gluebi+mtdblock_ro, according to our benchmarks (see
http://elinux.org/Flash_Filesystem_Benchmarks_2.6.39)
TODO:
* the modules keeps a table of the devices which length is the maximum number
of UBI volumes. There should be a better solution (linked list or, as
Christoph Hellwig suggests, a radix tree (idr)).
Signed-off-by: David Wagner <david.wagner@free-electrons.com>
---
Hi,
changes since v3
~~~~~~~~~~~~~~~~
a renamed *thread to *req_task
b kerneldoc comments
c fix ubiblk-user.h integration in Kbuild
d use dev_* instead of pr_*
e fix a bug: it was possible to remove a opened device
f use linked lists instead of static table
even if idr provides an address space for minor numbers, we found no way to
use that feature: when we need to get the device from the idr structure, we
only have a ubi_volume_info at hand, so we don't know the minor number. A
solution might be, instead of getting a minor number from idr, to decide of
one (for instance, ubi_num * UBI_MAX_VOLUMES + vol_id, but if they were more
than one UBI device, that would make the idr structure grow needlessly.
Plus, there are few chances that the number of ubiblk devices be high, so
it's probably not worth the pain.
However, if we have time, we may be interessed to see how it could benefit
UBI.
g checkpatch'd (it still complains about an assignment in an if ()), is it
really bad ?
about e), I have a question: it was possible to unmount a mounted filesystem
because ot this. The fix we found was to return -EBUSY if the ubiblk device is
opened. But is it correct to assume that a mounted filesystem will always hold
the device opened ? If not, what is the correct way to impeach the removal of a
ubiblk device when it is "mounted" ?
(see ubiblk_remove(), if (dev->desc))
other questions:
is it ok not to return the error value from the notify function ? I found now
way to return an error value w/o stopping the notifications from being emitted.
there are error messages printed when a ubiblk_remove() is attempted on a
non-existing device - that happens a lot when a UBI device is detached ; should
the messages be debug-only (or even removed) ?
Best regards,
David.
Documentation/ioctl/ioctl-number.txt | 1 +
drivers/mtd/ubi/Kconfig | 16 +
drivers/mtd/ubi/Makefile | 1 +
drivers/mtd/ubi/ubiblk.c | 697 ++++++++++++++++++++++++++++++++++
include/mtd/Kbuild | 1 +
include/mtd/ubiblk-user.h | 47 +++
6 files changed, 763 insertions(+), 0 deletions(-)
create mode 100644 drivers/mtd/ubi/ubiblk.c
create mode 100644 include/mtd/ubiblk-user.h
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 845a191..b24df7f 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -150,6 +150,7 @@ Code Seq#(hex) Include File Comments
'M' 00-0F drivers/video/fsl-diu-fb.h conflict!
'N' 00-1F drivers/usb/scanner.h
'O' 00-06 mtd/ubi-user.h UBI
+'O' 10-11 mtd/ubiblk-user.h ubiblk
'P' all linux/soundcard.h conflict!
'P' 60-6F sound/sscape_ioctl.h conflict!
'P' 00-0F drivers/usb/class/usblp.c conflict!
diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
index 4dcc752..977934a 100644
--- a/drivers/mtd/ubi/Kconfig
+++ b/drivers/mtd/ubi/Kconfig
@@ -60,4 +60,20 @@ config MTD_UBI_DEBUG
help
This option enables UBI debugging.
+config MTD_UBI_UBIBLK
+ tristate "Read-only block transition layer on top of UBI"
+ help
+ Read-only block interface on top of UBI.
+
+ This option adds ubiblk, which creates a read-ony block device from
+ UBI volumes. It makes it possible to use R/O block filesystems on
+ top of UBI volumes (and hence, on top of MTDs while avoiding bad
+ blocks).
+
+ ubiblk devices are created by invoking appropriate ioctl to the
+ ubiblk_ctrl device node.
+
+ The devices are named ubiblkX_Y where X is the UBI number and Y is
+ the Volume ID.
+
endif # MTD_UBI
diff --git a/drivers/mtd/ubi/Makefile b/drivers/mtd/ubi/Makefile
index c9302a5..354b2df 100644
--- a/drivers/mtd/ubi/Makefile
+++ b/drivers/mtd/ubi/Makefile
@@ -5,3 +5,4 @@ ubi-y += misc.o
ubi-$(CONFIG_MTD_UBI_DEBUG) += debug.o
obj-$(CONFIG_MTD_UBI_GLUEBI) += gluebi.o
+obj-$(CONFIG_MTD_UBI_UBIBLK) += ubiblk.o
diff --git a/drivers/mtd/ubi/ubiblk.c b/drivers/mtd/ubi/ubiblk.c
new file mode 100644
index 0000000..6758f00
--- /dev/null
+++ b/drivers/mtd/ubi/ubiblk.c
@@ -0,0 +1,697 @@
+/*
+ * Copyright (c) Free Electrons, 2011
+ * Copyright (c) International Business Machines Corp., 2006
+ * Copyright © 2003-2010 David Woodhouse <dwmw2@infradead.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Author: David Wagner
+ * Some code taken from gluebi.c
+ * (Artem Bityutskiy (Битюцкий Артём), Joern Engel)
+ * Some code taken from cdev.c (Artem Bityutskiy (Битюцкий Артём))
+ * Some code taken from mtd_blkdevs.c (David Woodhouse)
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/vmalloc.h>
+#include <linux/mtd/ubi.h>
+#include <linux/blkdev.h>
+#include <linux/miscdevice.h>
+#include <linux/kthread.h>
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <mtd/ubiblk-user.h>
+#include "ubi.h"
+
+#define BLK_SIZE 512
+
+#define UBIBLK_MAX_DEVS (UBI_MAX_DEVICES * UBI_MAX_VOLUMES)
+
+/**
+ * struct ubiblk_dev - represents a ubiblk device, proxying a UBI volume
+ * @desc: open UBI volume descriptor
+ * @vi: UBI volume information
+ * @ubi_num: UBI device number
+ * @vol_id: UBI volume identifier
+ * @gd: the disk (block device) created by ubiblk
+ * @rq: the request queue to @gd
+ * @req_task: the thread processing @rq requests
+ * @vol_lock: protects write access to the elements of this structure
+ * @queue_lock: avoids concurrent accesses to the request queue
+ * @list: linked list structure
+ */
+struct ubiblk_dev {
+ struct ubi_volume_desc *desc;
+ struct ubi_volume_info *vi;
+ int ubi_num;
+ int vol_id;
+
+ struct gendisk *gd;
+ struct request_queue *rq;
+ struct task_struct *req_task;
+
+ struct mutex vol_lock;
+
+ spinlock_t queue_lock;
+
+ struct list_head list;
+};
+
+/* Linked list of all ubiblk_dev instances */
+static LIST_HEAD(ubiblk_devs);
+
+/* Avoid concurrent access to the above list */
+static struct mutex devlist_lock;
+
+static int ubiblk_major;
+static const struct block_device_operations ubiblk_ops;
+
+/* The device receiving the ioctls */
+static struct miscdevice ctrl_dev;
+
+static struct ubiblk_dev *ubiblk_find_dev(struct ubi_volume_info *vi)
+{
+ struct ubiblk_dev *dev;
+ struct list_head *list_ptr;
+
+ /* TODO: use list_for_each_entry ? */
+ list_for_each(list_ptr, &ubiblk_devs) {
+ dev = list_entry(list_ptr, struct ubiblk_dev, list);
+ if (dev && dev->ubi_num == vi->ubi_num &&
+ dev->vol_id == vi->vol_id)
+ break;
+ dev = NULL;
+ }
+ return dev;
+}
+
+/**
+ * do_ubiblk_request - Read a LEB and fill the request buffer with the
+ * requested sector
+ *
+ * @req: the request data structure
+ * @dev: the ubiblk device on which the request is issued
+ */
+static int do_ubiblk_request(struct request *req, struct ubiblk_dev *dev)
+{
+ unsigned long start, len, read_bytes;
+ int offset;
+ int leb;
+ int ret;
+
+ start = blk_rq_pos(req) << 9;
+ len = blk_rq_cur_bytes(req);
+ read_bytes = 0;
+
+ /* We are always reading. No need to handle writing for now */
+
+ leb = start / dev->vi->usable_leb_size;
+ offset = start % dev->vi->usable_leb_size;
+
+ do {
+ if (offset + len > dev->vi->usable_leb_size)
+ len = dev->vi->usable_leb_size - offset;
+
+ if (unlikely(blk_rq_pos(req) + blk_rq_cur_sectors(req) >
+ get_capacity(req->rq_disk))) {
+ dev_err(disk_to_dev(dev->gd),
+ "attempting to read too far\n");
+ return -EIO;
+ }
+
+ /* Read (len) bytes of LEB (leb) from (offset) and put the
+ * result in the buffer given by the request.
+ * If the request is overlapping on several lebs, (read_bytes)
+ * will be > 0 and the data will be put in the buffer at
+ * offset (read_bytes)
+ */
+ ret = ubi_read(dev->desc, leb, req->buffer + read_bytes,
+ offset, len);
+
+ if (ret) {
+ dev_err(disk_to_dev(dev->gd), "ubi_read error\n");
+ return ret;
+ }
+
+ read_bytes += len;
+
+ len = blk_rq_cur_bytes(req) - read_bytes;
+ leb++;
+ offset = 0;
+ } while (read_bytes < blk_rq_cur_bytes(req));
+
+ return 0;
+}
+
+/**
+ * ubi_ubiblk_request - wakes the processing thread
+ *
+ * @rq: the request queue which device is to be awaken
+ */
+static void ubi_ubiblk_request(struct request_queue *rq)
+{
+ struct ubiblk_dev *dev;
+ struct request *req = NULL;
+
+ dev = rq->queuedata;
+
+ if (!dev)
+ while ((req = blk_fetch_request(rq)) != NULL)
+ __blk_end_request_all(req, -ENODEV);
+ else
+ wake_up_process(dev->req_task);
+}
+
+/**
+ * ubiblk_open - open a UBI volume (get the volume descriptor)
+ *
+ * @bdev: the corresponding block device
+ * @mode: opening mode (don't care as long as ubiblk is read-only)
+ */
+static int ubiblk_open(struct block_device *bdev, fmode_t mode)
+{
+ struct ubiblk_dev *dev = bdev->bd_disk->private_data;
+ int err = 0;
+
+ dev_dbg(disk_to_dev(dev->gd), "opened mode=%d\n", mode);
+
+ dev->desc = ubi_open_volume(dev->ubi_num, dev->vol_id,
+ UBI_READONLY);
+ if (IS_ERR(dev->desc)) {
+ dev_err(disk_to_dev(dev->gd), "failed to open");
+
+ err = PTR_ERR(dev->desc);
+ dev->desc = NULL;
+ goto out_err;
+ }
+
+ dev->vi = kzalloc(sizeof(struct ubi_volume_info), GFP_KERNEL);
+ if (!dev->vi) {
+ err = -ENOMEM;
+ goto out_close;
+ }
+ ubi_get_volume_info(dev->desc, dev->vi);
+
+ return 0;
+
+out_close:
+ ubi_close_volume(dev->desc);
+ dev->desc = NULL;
+out_err:
+ return err;
+}
+
+/**
+ * ubiblk_release - close a UBI volume (close the volume descriptor)
+ *
+ * @gd: the disk that was previously opened
+ * @mode: don't care
+ */
+static int ubiblk_release(struct gendisk *gd, fmode_t mode)
+{
+ struct ubiblk_dev *dev = gd->private_data;
+ dev_dbg(disk_to_dev(dev->gd), "released, mode=%d\n", mode);
+
+ kfree(dev->vi);
+ dev->vi = NULL;
+ if (dev->desc) {
+ ubi_close_volume(dev->desc);
+ dev->desc = NULL;
+ }
+
+ return 0;
+}
+
+/**
+ * ubi_ubiblk_thread - loop on the block request queue and wait for new
+ * requests ; run them with do_ubiblk_request(). Mostly copied from
+ * mtd_blkdevs.c
+ *
+ * @arg: the ubiblk device which request queue to process
+ */
+static int ubi_ubiblk_thread(void *arg)
+{
+ struct ubiblk_dev *dev = arg;
+ struct request_queue *rq = dev->rq;
+ struct request *req = NULL;
+
+ spin_lock_irq(rq->queue_lock);
+
+ while (!kthread_should_stop()) {
+ int res;
+
+ if (!req && !(req = blk_fetch_request(rq))) {
+ set_current_state(TASK_INTERRUPTIBLE);
+
+ if (kthread_should_stop())
+ set_current_state(TASK_RUNNING);
+
+ spin_unlock_irq(rq->queue_lock);
+ schedule();
+ spin_lock_irq(rq->queue_lock);
+ continue;
+ }
+
+ spin_unlock_irq(rq->queue_lock);
+
+ mutex_lock(&dev->vol_lock);
+ res = do_ubiblk_request(req, dev);
+ mutex_unlock(&dev->vol_lock);
+
+ spin_lock_irq(rq->queue_lock);
+
+ if (!__blk_end_request_cur(req, res))
+ req = NULL;
+ }
+
+ if (req)
+ __blk_end_request_all(req, -EIO);
+
+ spin_unlock_irq(rq->queue_lock);
+
+ return 0;
+}
+
+/**
+ * ubiblk_create - create a ubiblk device proxying a UBI volume
+ *
+ * @vi: the UBI volume information data structure
+ *
+ * An UBI volume has been created ; create a corresponding ubiblk device:
+ * Initialize the locks, the structure, the block layer infos and start a
+ * req_task.
+ */
+static int ubiblk_create(struct ubi_volume_info *vi)
+{
+ struct ubiblk_dev *dev;
+ struct gendisk *gd;
+ int ret = 0;
+
+ mutex_lock(&devlist_lock);
+ /* Check that the volume isn't already proxyfied */
+ if (ubiblk_find_dev(vi)) {
+ ret = -EEXIST;
+ goto out_devlist;
+ }
+
+ dev = kzalloc(sizeof(struct ubiblk_dev), GFP_KERNEL);
+ if (!dev) {
+ ret = -ENOMEM;
+ goto out_devlist;
+ }
+
+ list_add(&dev->list, &ubiblk_devs);
+
+ mutex_init(&dev->vol_lock);
+ mutex_lock(&dev->vol_lock);
+
+ dev->ubi_num = vi->ubi_num;
+ dev->vol_id = vi->vol_id;
+
+ dev->desc = ubi_open_volume(dev->ubi_num, dev->vol_id,
+ UBI_READONLY);
+ if (IS_ERR(dev->desc)) {
+ pr_err("ubi_open_volume failed\n");
+ ret = PTR_ERR(dev->desc);
+ goto out_vol;
+ }
+
+ dev->vi = kzalloc(sizeof(struct ubi_volume_info), GFP_KERNEL);
+ if (!dev->vi) {
+ ret = -ENOMEM;
+ goto out_info;
+ }
+ ubi_get_volume_info(dev->desc, dev->vi);
+
+ /* Initialize the gendisk of this ubiblk device */
+ gd = alloc_disk(1);
+ if (!gd) {
+ pr_err("alloc_disk failed\n");
+ ret = -ENODEV;
+ goto out_disk;
+ }
+
+ gd->fops = &ubiblk_ops;
+ gd->major = ubiblk_major;
+ gd->first_minor = dev->ubi_num * UBI_MAX_VOLUMES + dev->vol_id;
+ gd->private_data = dev;
+ sprintf(gd->disk_name, "ubiblk%d_%d", dev->ubi_num, dev->vol_id);
+ set_capacity(gd,
+ (dev->vi->size *
+ dev->vi->usable_leb_size) >> 9);
+ set_disk_ro(gd, 1);
+ dev->gd = gd;
+
+ spin_lock_init(&dev->queue_lock);
+ dev->rq = blk_init_queue(ubi_ubiblk_request, &dev->queue_lock);
+ if (!dev->rq) {
+ pr_err("blk_init_queue failed\n");
+ ret = -ENODEV;
+ goto out_queue;
+ }
+ dev->rq->queuedata = dev;
+ blk_queue_logical_block_size(dev->rq, BLK_SIZE);
+ dev->gd->queue = dev->rq;
+
+ /* Stolen from mtd_blkdevs.c */
+ /* Create processing req_task
+ *
+ * The processing of the request has to be done in process context (it
+ * might sleep) but blk_run_queue can't block ; so we need to separate
+ * the event of a request being added to the queue (which triggers the
+ * callback ubi_ubiblk_request - that is set with blk_init_queue())
+ * and the processing of that request.
+ *
+ * Thus, the sole purpose of ubi_ubiblk_reuqest is to wake the kthread
+ * up so that it will process the request queue
+ */
+ dev->req_task = kthread_run(ubi_ubiblk_thread, dev, "%s%d_%d",
+ "kubiblk", dev->ubi_num, dev->vol_id);
+ if (IS_ERR(dev->req_task)) {
+ ret = PTR_ERR(dev->req_task);
+ goto out_thread;
+ }
+
+ add_disk(dev->gd);
+
+ dev_info(disk_to_dev(dev->gd),
+ "created from ubi%d:%d(%s)\n", dev->ubi_num, dev->vol_id,
+ dev->vi->name);
+
+ kfree(dev->vi);
+ dev->vi = NULL;
+ ubi_close_volume(dev->desc);
+ dev->desc = NULL;
+ mutex_unlock(&dev->vol_lock);
+
+ mutex_unlock(&devlist_lock);
+
+ return 0;
+
+out_thread:
+ blk_cleanup_queue(dev->rq);
+out_queue:
+ put_disk(dev->gd);
+out_disk:
+ kfree(dev->vi);
+ dev->vi = NULL;
+out_info:
+ ubi_close_volume(dev->desc);
+ dev->desc = NULL;
+out_vol:
+ mutex_unlock(&dev->vol_lock);
+out_devlist:
+ mutex_unlock(&devlist_lock);
+
+ return ret;
+}
+
+/**
+ * ubiblk_remove - destroy a ubiblk device
+ * @vi: the UBI volume information data structure
+ *
+ * A UBI volume has been removed or we are requested to unproxify a volume ;
+ * destroy the corresponding ubiblk device
+ */
+static int ubiblk_remove(struct ubi_volume_info *vi)
+{
+ struct ubiblk_dev *dev = NULL;
+
+ mutex_lock(&devlist_lock);
+
+ dev = ubiblk_find_dev(vi);
+
+ if (!dev) {
+ mutex_unlock(&devlist_lock);
+ pr_warn("trying to remove %s, but it isn't handled\n",
+ vi->name);
+ return -ENODEV;
+ }
+
+ if (dev->desc) {
+ mutex_unlock(&devlist_lock);
+ return -EBUSY;
+ }
+
+ del_gendisk(dev->gd);
+ blk_cleanup_queue(dev->rq);
+ kthread_stop(dev->req_task);
+ put_disk(dev->gd);
+
+ kfree(dev->vi);
+
+ list_del(&dev->list);
+ kfree(dev);
+
+ mutex_unlock(&devlist_lock);
+ pr_info("unproxyfied %s\n", vi->name);
+ return 0;
+}
+
+/**
+ * ubiblk_resize - resize a ubiblk device
+ * @vi: the UBI volume information data structure
+ *
+ * A UBI volume has been resized, change the ubiblk device geometry accordingly
+ */
+static int ubiblk_resize(struct ubi_volume_info *vi)
+{
+ struct ubiblk_dev *dev;
+
+ /* We don't touch the list, but we better lock it: it could be that the
+ * device gets removed between the time the device has been found and
+ * the time we access dev->gd
+ */
+ mutex_lock(&devlist_lock);
+ dev = ubiblk_find_dev(vi);
+ if (!dev) {
+ mutex_unlock(&devlist_lock);
+ pr_warn("trying to resize %s, which isn't handled\n",
+ vi->name);
+ return -ENODEV;
+ }
+
+ mutex_lock(&dev->vol_lock);
+ set_capacity(dev->gd,
+ (vi->size * vi->usable_leb_size) >> 9);
+ dev_dbg(disk_to_dev(dev->gd), "resized to %d LEBs\n", vi->size);
+ mutex_unlock(&dev->vol_lock);
+
+ mutex_unlock(&devlist_lock);
+ return 0;
+}
+
+/**
+ * ubiblk_notify - dispatches the UBI notifications
+ * copied from gluebi.c
+ * @nb: unused
+ * @notification_type: the notification type sent by UBI
+ * @ns_ptr: contains the notifications' additional informations
+ */
+static int ubiblk_notify(struct notifier_block *nb,
+ unsigned long notification_type, void *ns_ptr)
+{
+ struct ubi_notification *nt = ns_ptr;
+
+ switch (notification_type) {
+ case UBI_VOLUME_REMOVED:
+ ubiblk_remove(&nt->vi);
+ break;
+ case UBI_VOLUME_RESIZED:
+ ubiblk_resize(&nt->vi);
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static const struct block_device_operations ubiblk_ops = {
+ .owner = THIS_MODULE,
+ .open = ubiblk_open,
+ .release = ubiblk_release,
+};
+
+static struct notifier_block ubiblk_notifier = {
+ .notifier_call = ubiblk_notify,
+};
+
+
+/**
+ * ubiblk_ctrl_ioctl - ioctl handling for proxying/unproxying a UBI volume
+ * @file: the file on which the ioctl was invoked (usunsed)
+ * @cmd: the ioctl type
+ * @arg: additional command informations
+ */
+static long ubiblk_ctrl_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ int err = 0;
+ void __user *argp = (void __user *)arg;
+
+ struct ubi_volume_desc *desc;
+ struct ubi_volume_info vi;
+ struct ubiblk_ctrl_req req;
+
+ if (!capable(CAP_SYS_RESOURCE))
+ return -EPERM;
+
+ err = copy_from_user(&req, argp, sizeof(struct ubiblk_ctrl_req));
+ if (err)
+ return -EFAULT;
+
+ if (req.ubi_num < 0 || req.vol_id < 0)
+ return -EINVAL;
+
+ desc = ubi_open_volume(req.ubi_num, req.vol_id, UBI_READONLY);
+ if (IS_ERR(desc)) {
+ dev_err(ctrl_dev.this_device, "opening ubi%d:%d failed\n",
+ req.ubi_num, req.vol_id);
+ return PTR_ERR(desc);
+ }
+
+ ubi_get_volume_info(desc, &vi);
+
+ switch (cmd) {
+ case UBIBLK_IOCADD:
+ dev_info(ctrl_dev.this_device, "proxying ubi%d:%d\n",
+ req.ubi_num, req.vol_id);
+ err = ubiblk_create(&vi);
+ break;
+ case UBIBLK_IOCDEL:
+ dev_info(ctrl_dev.this_device, "unproxying ubi%d:%d\n",
+ req.ubi_num, req.vol_id);
+ err = ubiblk_remove(&vi);
+ break;
+
+ default:
+ err = -ENOTTY;
+ break;
+ }
+
+ ubi_close_volume(desc);
+
+ return err;
+}
+
+#ifdef CONFIG_COMPAT
+static long ubiblk_ctrl_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ unsigned long translated_arg = (unsigned long)compat_ptr(arg);
+
+ return ubiblk_ctrl_ioctl(file, cmd, translated_arg);
+}
+#endif
+
+/* ubiblk control device (receives ioctls) */
+static const struct file_operations ubiblk_ctrl_ops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = ubiblk_ctrl_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ubiblk_ctrl_compat_ioctl,
+#endif
+ .llseek = no_llseek,
+};
+static struct miscdevice ctrl_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "ubiblk_ctrl",
+ .fops = &ubiblk_ctrl_ops,
+};
+
+/**
+ * ubi_ubiblk_init - initialize the module
+ *
+ * Get a major number and register to UBI notifications ; register the ioctl
+ * handler device
+ */
+static int __init ubi_ubiblk_init(void)
+{
+ int ret = 0;
+
+ ret = register_blkdev(0, "ubiblk");
+ if (ret <= 0) {
+ pr_err("can't register the major\n");
+ return -ENODEV;
+ }
+ ubiblk_major = ret;
+
+ mutex_init(&devlist_lock);
+
+ ret = misc_register(&ctrl_dev);
+ if (ret) {
+ pr_err("can't register control device\n");
+ goto out_unreg_blk;
+ }
+
+ ret = ubi_register_volume_notifier(&ubiblk_notifier, 1);
+ if (ret < 0)
+ goto out_unreg_ctrl;
+
+ pr_info("major=%d\n", ubiblk_major);
+
+ return ret;
+
+out_unreg_ctrl:
+ misc_deregister(&ctrl_dev);
+out_unreg_blk:
+ unregister_blkdev(ubiblk_major, "ubiblk");
+
+ return ret;
+}
+
+/*
+ * ubi_ubiblk_exit - end of life
+ *
+ * unregister the block device major, unregister from UBI notifications,
+ * unregister the ioctl handler device stop the threads and free the memory.
+ */
+static void __exit ubi_ubiblk_exit(void)
+{
+ struct list_head *list_ptr, *next;
+ struct ubiblk_dev *dev;
+
+ ubi_unregister_volume_notifier(&ubiblk_notifier);
+ misc_deregister(&ctrl_dev);
+
+ list_for_each_safe(list_ptr, next, &ubiblk_devs) {
+ dev = list_entry(list_ptr, struct ubiblk_dev, list);
+
+ /* TODO: it shouldn't happen, right ? */
+ if (dev->desc)
+ ubi_close_volume(dev->desc);
+
+ del_gendisk(dev->gd);
+ blk_cleanup_queue(dev->rq);
+ kthread_stop(dev->req_task);
+ put_disk(dev->gd);
+
+ kfree(dev->vi);
+ list_del(&dev->list); /* really needed ? */
+ kfree(dev);
+ }
+
+ unregister_blkdev(ubiblk_major, "ubiblk");
+ pr_info("end of life\n");
+}
+
+module_init(ubi_ubiblk_init);
+module_exit(ubi_ubiblk_exit);
+MODULE_DESCRIPTION("Read-only block transition layer on top of UBI");
+MODULE_AUTHOR("David Wagner");
+MODULE_LICENSE("GPL");
diff --git a/include/mtd/Kbuild b/include/mtd/Kbuild
index 192f8fb..d0d59d8 100644
--- a/include/mtd/Kbuild
+++ b/include/mtd/Kbuild
@@ -3,3 +3,4 @@ header-y += mtd-abi.h
header-y += mtd-user.h
header-y += nftl-user.h
header-y += ubi-user.h
+header-y += ubiblk-user.h
diff --git a/include/mtd/ubiblk-user.h b/include/mtd/ubiblk-user.h
new file mode 100644
index 0000000..61df415
--- /dev/null
+++ b/include/mtd/ubiblk-user.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright © Free Electrons, 2011
+ * Copyright © International Business Machines Corp., 2006
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Author: David Wagner
+ * Some code taken from ubi-user.h
+ */
+
+#ifndef __UBIBLK_USER_H__
+#define __UBIBLK_USER_H__
+
+#include <linux/types.h>
+
+/**
+ * ubiblk_ctrl_req - additional ioctl data structure
+ * @ubi_num: UBI device number
+ * @vol_id: UBI volume identifier
+ */
+struct ubiblk_ctrl_req {
+ __s32 ubi_num;
+ __s32 vol_id;
+} __packed;
+
+/* ioctl commands of the UBI control character device */
+#define UBIBLK_CTRL_IOC_MAGIC 'O'
+
+/* Create a ubiblk device from a UBI volume */
+#define UBIBLK_IOCADD _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x10, struct ubiblk_ctrl_req)
+/* Delete a ubiblk device */
+#define UBIBLK_IOCDEL _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x11, struct ubiblk_ctrl_req)
+/* If you add ioctls here, please note that UBI uses 'O'/0x00-0x06 */
+
+#endif
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCH] Tools for controling ubiblk
From: Artem Bityutskiy @ 2011-08-22 8:17 UTC (permalink / raw)
To: David Wagner; +Cc: linux-mtd, linux-embedded, Tim Bird, David Woodhouse
In-Reply-To: <1313590847-3839-1-git-send-email-david.wagner@free-electrons.com>
I do not think LKML is interested in mtd-utils user-space project
patches, please, do not add lkml to spam it less.
On Wed, 2011-08-17 at 16:20 +0200, David Wagner wrote:
> +#ifndef __UBIBLK_USER_H__
> +#define __UBIBLK_USER_H__
> +
> +#include <linux/types.h>
> +
> +/* Structure to be passed to UBIBLK_IOCADD or IOCDEL ioctl */
> +struct ubiblk_ctrl_req {
> + __s32 ubi_num;
> + __s32 vol_id;
> +};
> +
> +/* ioctl commands of the UBI control character device */
> +
> +#define UBIBLK_CTRL_IOC_MAGIC 'O'
> +
> +/* Create a ubiblk device from a UBI volume */
> +#define UBIBLK_IOCADD _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x10, struct ubiblk_ctrl_req)
> +/* Delete a ubiblk device */
> +#define UBIBLK_IOCDEL _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x11, struct ubiblk_ctrl_req)
> +
> +#endif
Please, do not copy kernel headers verbatim, copy the result of "make
headers_install" invoked in the kernel tree.
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-08-22 7:42 UTC (permalink / raw)
To: david.wagner
Cc: linux-mtd, linux-embedded, lkml, Tim Bird, David Woodhouse,
Arnd Bergmann
In-Reply-To: <1313587042-30846-1-git-send-email-david.wagner@free-electrons.com>
On Wed, 2011-08-17 at 15:17 +0200, david.wagner@free-electrons.com
wrote:
> Questions:
> ==========
> I wasn't sure what magic ioctl number to use, so I settled to use the same one
> as a part of UBI: 'O', which was so far only used by UBI but on a higher range
> and leaving some room for UBI to add ioctls (for nw, it uses 'O'/0x00-0x06 and
> ubiblk uses 'O'/0x10-0x11). Is it ok or should ubiblk use a different
> number/range ?
I think this is OK to share them between UBI and ubiblk, as long as this
is documented. But I always CC Arnd when it comes to ioctl-related
questions.
P.S. Arnd, you can always find the initial post in lkml, if needed.
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-08-22 7:39 UTC (permalink / raw)
To: david.wagner; +Cc: linux-mtd, linux-embedded, lkml, Tim Bird, David Woodhouse
In-Reply-To: <1313587042-30846-1-git-send-email-david.wagner@free-electrons.com>
Hi,
thanks for the patch, it is quite good I think, but I still have few
comments. I did not review very carefully because of limited amount of
free time.
There are few checkpatch.pl complaints, could you please take a look?
Note, often I give a comment for one place, but there are many other
similar places with the same stuff.
On Wed, 2011-08-17 at 15:17 +0200, david.wagner@free-electrons.com
wrote:
> TODO:
> * the modules keeps a table of the devices which length is the maximum number
> of UBI volumes. There should be a better solution (linked list or, as
> Christoph Hellwig suggests, a radix tree (idr)).
Wold be nice to do the same change for UBI, BTW :-)
> Here is the v3 of ubiblk implementing ioctls for on-demand creation/removal of
> ubiblk device ; is it what you were thinking of ?
Looks like.
[snip]
> +config MTD_UBI_UBIBLK
> + tristate "Read-only block transition layer on top of UBI"
> + help
> + Read-only block interface on top of UBI.
> +
> + This option adds ubiblk, which creates a read-ony block device from
> + UBI volumes. It makes it possible to use block filesystems on top of
> + UBI (and thus, on top of MTDs while avoiding bad blocks).
s/block filesystems/R/O block filesystems/
s/on top of UBI/on top of UBI volumes/
s/and thus, on top of MTDs/and hence, on top of MTD devices/
> +
> + ubiblk devices are created by sending appropriate ioctl to the
> + ubiblk_ctrl device node.
s/sending/invoking/
[snip]
Would be good to add a section to the UBI Documentation describing
ubiblk by sending a patch against mtd-www:
http://git.infradead.org/mtd-www.git
[snip]
> +/*
> + * Structure representing a ubiblk device, proxying a UBI volume
> + */
> +struct ubiblk_dev {
> + struct ubi_volume_desc *vol_desc;
> + struct ubi_volume_info *vol_info;
Since this piece of code is part of drivers/mtd/ubi/, I think that it
makes sense to make it very consistent with the rest of the code. It is
then better to use consistent names for variables of this type: "desc"
and "vi".
> + int ubi_num;
> + int vol_id;
> +
> + /* Block stuff */
> + struct gendisk *gd;
> + struct request_queue *rq;
> + struct task_struct *thread;
Here is one comment I got from hch once which I also saved in the git
history: 6f904ff0e39ea88f81eb77e8dfb4e1238492f0a8
hch: a convention I tend to use and I've seen in various places
is to always use _task for the storage of the task_struct pointer,
and thread everywhere else. This especially helps with having
foo_thread for the actual thread and foo_task for a global
variable keeping the task_struct pointer
Let's follow it and call this field "<something_meaningful>_task", e.g.,
"req_task" (request queue/dispatcher/etc task) ? Keep using "_thread"
for other stuff.
You can also change UBI correspondingly.
> + /* Protects the access to the UBI volume */
> + struct mutex lock;
Lets call all locks <something_meaningful>_lock, e.g., volumes_lock or
vol_lock.
> +
> + /* Avoids concurrent accesses to the request queue */
> + spinlock_t queue_lock;
> +};
And for consistency, it is better to use kerneldoc comments, like the
rest of UBI, but if you hate them, I will not insist.
[snip]
> + mutex_lock(&dev->lock);
> + set_capacity(dev->gd,
> + (vol_info->size * vol_info->usable_leb_size) >> 9);
> + mutex_unlock(&dev->lock);
> + pr_debug("Resized ubiblk%d_%d to %d LEBs\n", vol_info->ubi_num,
> + vol_info->vol_id, vol_info->size);
If you find a way to properly use dev_dbg() instead of pr_debug(), your
messages will be automatically prefixed with "ubiblk%d_%d", and your
messages will be shorter - so less code, smaller binary size. See below
my other comment.
[snip]
> +static int ubiblk_notify(struct notifier_block *nb,
> + unsigned long notification_type, void *ns_ptr)
> +{
> + struct ubi_notification *nt = ns_ptr;
> +
> + switch (notification_type) {
> + case UBI_VOLUME_ADDED:
> + break;
> + case UBI_VOLUME_REMOVED:
> + ubiblk_remove(&nt->vi);
> + break;
> + case UBI_VOLUME_RESIZED:
> + ubiblk_resized(&nt->vi);
> + break;
> + case UBI_VOLUME_UPDATED:
> + break;
> + case UBI_VOLUME_RENAMED:
> + break;
> + default:
> + break;
Please, remove cases which do nothing and let them end up in "default:".
[snip]
> +static long ubiblk_ctrl_ioctl(struct file *file, unsigned int cmd,
> + unsigned long arg)
> +{
> + int err = 0;
> + void __user *argp = (void __user *)arg;
> +
> + struct ubi_volume_desc *vol_desc;
> + struct ubi_volume_info vol_info;
> + struct ubiblk_ctrl_req req;
> +
> + if (!capable(CAP_SYS_RESOURCE))
> + return -EPERM;
> +
> + err = copy_from_user(&req, argp, sizeof(struct ubiblk_ctrl_req));
> + if (err)
> + return -EFAULT;
> +
> + if (req.ubi_num < 0 || req.vol_id < 0)
> + return -EINVAL;
> +
> + vol_desc = ubi_open_volume(req.ubi_num, req.vol_id, UBI_READONLY);
> + if (IS_ERR(vol_desc)) {
> + pr_err("Opening volume %d on device %d failed\n",
> + req.vol_id, req.ubi_num);
Because dynamic_debug usually adds prefixes to messages, or pr_fmt is
usually used to add a prefix, I suggest to make all pr_* / dev_*
messages to start with small letter.
Also, should we use dev_* macros instead? I have always thought that
pr_* is used only if you do not have "struct device", no? But you do
have it, AFAICS:
1. For messages which are not related to a specific ubiblk device,
ubi_blk_ctrl_dev.this_device (may be ubi_blk_ctrk_dev then can be
shortened to ctrl_dev?)
2. For messages that are about a specific device - not exactly sure, but
I bet there is a struct device for your ubiblk_%d_%d device. Probably
disk_to_dev(<struct ubiblk_dev>->gd)
Try to find this out and test. We should not use pr_* unless there is a
good reason why dev_* does not work.
> + return PTR_ERR(vol_desc);
> + }
> +
> + ubi_get_volume_info(vol_desc, &vol_info);
> +
> + switch (cmd) {
> + case UBIBLK_IOCADD:
> + pr_info("Creating a ubiblk device proxing ubi%d:%d\n",
> + req.ubi_num, req.vol_id);
> + ubiblk_create(&vol_info);
Please, return the error code!
> + break;
> + case UBIBLK_IOCDEL:
> + pr_info("Removing ubiblk from ubi%d:%d\n",
> + req.ubi_num, req.vol_id);
> + ubiblk_remove(&vol_info);
And here!
[snip]
> +static int __init ubi_ubiblk_init(void)
> +{
> + int ret = 0;
> +
> + pr_info("UBIBLK starting\n");
Please, kill this message, it looks more like tracing, not info. I
suggest you to add one single message at the end like "blah blah
initialized, major number %d".
> +
> + ret = register_blkdev(0, "ubiblk");
> + if (ret <= 0) {
> + pr_err("UBIBLK: could not register_blkdev\n");
> + return -ENODEV;
> + }
> + ubiblk_major = ret;
> + pr_info("UBIBLK: device's major: %d\n", ubiblk_major);
Please, kill this message as well, or turn it into pr_debug()/dev_dbg().
Talking about messages:
1. Remove this UBIBLK: prefix from all of them. Rationale: if dynamic
debug is enabled, the dynamic debug infrastructure will add it for you,
see "Documentation/dynamic-debug-howto.txt". Otherwise you can always
define "pr_fmt" and add the prefix you wish.
[snip]
> +static void __exit ubi_ubiblk_exit(void)
> +{
> + int i;
> +
> + pr_info("UBIBLK: going to exit\n");
Please, kill these. Again, if you really feel like it - add one single
message like "blah exited" at the end.
[snip]
> +/* Structure to be passed to UBIBLK_IOCADD or IOCDEL ioctl */
> +struct ubiblk_ctrl_req {
> + __s32 ubi_num;
> + __s32 vol_id;
> +};
> +
> +/* ioctl commands of the UBI control character device */
Please, document here that you share "O" with UBI. Also document this in
UBI in a separate patch.
> +
> +#define UBIBLK_CTRL_IOC_MAGIC 'O'
> +
> +/* Create a ubiblk device from a UBI volume */
> +#define UBIBLK_IOCADD _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x10, struct ubiblk_ctrl_req)
> +/* Delete a ubiblk device */
> +#define UBIBLK_IOCDEL _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x11, struct ubiblk_ctrl_req)
> +
> +#endif
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: RFC: [Restatement] KBUS messaging subsystem
From: Bryan Donlan @ 2011-08-22 1:15 UTC (permalink / raw)
To: Tony Ibbs
Cc: Pekka Enberg, lkml, Andrew Morton, Jonathan Corbet,
Florian Fainelli, Grant Likely, Linux-embedded, Tibs at Kynesim,
Richard Watts
In-Reply-To: <1825A697-7D60-44D0-8150-BC0E9F0DEB67@tonyibbs.co.uk>
On Sun, Aug 21, 2011 at 09:28, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>
> On 15 Aug 2011, at 12:46, Pekka Enberg wrote:
>
>> I simply don't see a convincing argument why existing IPC and other
>> kernel mechanisms are not sufficient to implement what you need. I'm
>> sure there is one but it's not apparent from your emails.
>
> Our major concern, strongly based on experience, is that given the
> existing kernel mechanisms, users do not build robust (or even
> sometimes working!) solutions for inter-process communication.
>
> This is in large part because they do not realise (at the start) how
> difficult this is to do. Especially if they want to keep it small.
>
> The only *sure* way of solving this is to provide a mechanism that is
> "always there", and that really means a solution provided by the
> kernel. This needs to be at a higher level than what is currently
> available, but obviously what exactly is provided is then a matter for
> discussion. We'd obviously argue that KBUS hits a "sweet spot" for the
> needs we perceive, given our application areas.
>
>> The whole thing feels more like "lets put a message broker into the
>> kernel" rather than set of kernel APIs that make sense. I suppose the
>> rather extensive ioctl() ABI is partly to blame here.
>
> I'm not sure what you mean by "message broker", except that it's
> plainly meant to be a bad thing - the wikipedia meaning doesn't seem
> terribly applicable to KBUS, as it covers an awful lot more territory
> (mind, the discussion page is amusing).
>
> I'll freely admit we started with the idea of what functionality we
> wanted and then chose a simple-to-implement API to make it happen.
>
> *If* KBUS were in the kernel, with its current functionality, what API
> would you expect? (not just "a sockety one", but what actual API?) If
> one recasts as a sockety API, how is many new socket options better
> than a set of ioctls? (or is that just one of those questions to which
> the answer is "well, it is"?)
I think this may well be the core problem here - is KBUS, as proposed,
a general API lots of people will find useful, or is it something that
will fit _your_ usecase well, but other usecases poorly?
Designing a good API, of course, is quite difficult, but it _must_ be
done before integrating anything with upstream Linux, as once
something is merged it has to be supported for decades, even if it
turns out to be useless for 99% of usecases.
Some good questions to ask might be:
* Does this system play nice with namespaces?
* What limits are in place to prevent resource exhaustion attacks?
* Can libdbus or other such existing message brokers swap out their
existing central-routing-process based communications with this new
system without applications being aware?
Keep in mind also that the kernel API need not match the
application-visible API, if you can add a userspace library to
translate to the API you want. So, for example, instead of numbering
kbuses, you could define them as a new AF_UNIX protocol, and place
them in the abstract socket namespace (ie, they'd have names like
"\0kbus-0"). Doing something like this avoids creating a new
namespace, and non-embedded devices could place these new primitives
in a tmpfs or other more visible location. It also makes it very cheap
(and a non-privileged operation!) to create kbuses.
So, let's look at your requirements:
* Message broadcast API with prefix filtering
* Deterministic ordering
* Possible to snoop on all messages being passed through
* Must not require any kind of central userspace daemon
* Needs a race-less way of 1) Advertising (and locking) as a replier
for a particular message type and 2) Detecting when the replier dies
(and synthesizing error replies in this event)
Now, to minimize this definition, why not remove prefix filtering from
the kernel? For low-volume buses, it doesn't hurt to do the filtering
in userspace (right?). If you want to reduce the volume of messages
received, do it on a per-bus granularity (and set up lots of buses
instead). After all, you can always connect to multiple buses if you
need to listen for multiple message types. For replier registration,
then, it would be done on a per-bus granularity, not a per-message
granularity.
So we now have an API that might (as an example) look like this:
* Creation of buses - socket(AF_UNIX, SOCK_DGRAM, PROTO_KBUS),
followed by bind() either to a file or in the abstract namespace
* Advertising as a replier on a socket - setsockopt(SOL_KBUS,
KBUS_REPLIER, &one); - returns -EEXIST if a replier is already present
* Sending/receiving messages - ordinary sendto/recvfrom. If a reply is
desired, use sendmsg with an ancillary data item indicating a reply is
desired
* Notification on replier death (or replier buffer overflow etc):
empty message with ancillary data attached informing of the error
condition
* 64-bit global counter on all messages (or messages where requested
by the client) to give a deterministic order between messages sent on
multiple buses (reported via ancillary data)
* Resource limitation based on memory cgroup or something? Not sure
what AF_UNIX uses already, but you could probably use the same system.
* Perhaps support SCM_RIGHTS/SCM_CREDENTIALS transfers as well?
This is a much simpler kernel API, don't you think? It's also easy to
see how dbus could use it as well - just add a method to filter
unicast messages from being seen by other uninterested clients, create
a kbus socket for each dbus connection (with appropriate symlinks for
any registered aliases), and have the owner of a connection socket
register itself as a replier. Now you can send dbus broadcast messages
across the KBUS socket as usual, and perhaps send replies to unicast
messages over a socket passed in over a SCM_CREDENTIALS transfer.
Alternately, you could assign connection IDs, and have a control
message to route unicast replies to their sender - in any case, these
details are something dbus people would need to comment on, if they're
interested, but you can see that it's a use case that shows promise
(I'm not familiar with the dbus security model, however, and so I'm
not sure if this'll play well with it).
In short, API minimalism is key to acceptance in the upstream kernel.
Try to pare down the core API to the bare minimum to get what you
need, rather than implementing your final use case directly into the
kernel using ioctls or whatnot.
Thanks,
Bryan
^ permalink raw reply
* Re: RFC: [Restatement] KBUS messaging subsystem
From: Tony Ibbs @ 2011-08-21 13:28 UTC (permalink / raw)
To: Pekka Enberg
Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts
In-Reply-To: <CAOJsxLEg-9hAQG99TkM+tqdTjZkpZwubOu1qM-6m7B5apZmh_w@mail.gmail.com>
On 15 Aug 2011, at 12:46, Pekka Enberg wrote:
> Hi Tony,
Hi. Thanks for your reply
> On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>> Real examples of usage that aren't the STB are a bit difficult to give
>> because they belong to customer projects that we're not allowed to
>> talk about.
>
> That's part of the problem, I suppose. We usually don't merge new
> kernel facilities unless we're able to understand (and see) real
> applications that need them.
I understand. It is a bit of a chicken-and-egg problem.
> On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
>> I assume the real point of your post is that I wrote about the reasons
>> why we made KBUS a kernel module, but did not really address the
>> reasons why KBUS might want to be a kernel module in general usage.
>
> I simply don't see a convincing argument why existing IPC and other
> kernel mechanisms are not sufficient to implement what you need. I'm
> sure there is one but it's not apparent from your emails.
Our major concern, strongly based on experience, is that given the
existing kernel mechanisms, users do not build robust (or even
sometimes working!) solutions for inter-process communication.
This is in large part because they do not realise (at the start) how
difficult this is to do. Especially if they want to keep it small.
The only *sure* way of solving this is to provide a mechanism that is
"always there", and that really means a solution provided by the
kernel. This needs to be at a higher level than what is currently
available, but obviously what exactly is provided is then a matter for
discussion. We'd obviously argue that KBUS hits a "sweet spot" for the
needs we perceive, given our application areas.
> The whole thing feels more like "lets put a message broker into the
> kernel" rather than set of kernel APIs that make sense. I suppose the
> rather extensive ioctl() ABI is partly to blame here.
I'm not sure what you mean by "message broker", except that it's
plainly meant to be a bad thing - the wikipedia meaning doesn't seem
terribly applicable to KBUS, as it covers an awful lot more territory
(mind, the discussion page is amusing).
I'll freely admit we started with the idea of what functionality we
wanted and then chose a simple-to-implement API to make it happen.
*If* KBUS were in the kernel, with its current functionality, what API
would you expect? (not just "a sockety one", but what actual API?) If
one recasts as a sockety API, how is many new socket options better
than a set of ioctls? (or is that just one of those questions to which
the answer is "well, it is"?)
> I'm not saying you need to implement everything in userspace but I'm
> also not convinced we want _all of this_ in the kernel.
That's quite understandable. So, given the functionality we want, what
would you put in the kernel, and what in userspace? (I'm personally
sceptical about how much can be split this way, but still)
(I realise that both this and the previous question are asking you to
do work, but I'm honestly hoping that even if KBUS-as-is is not
applicable, we might figure out an irreducible set of higher level
constructs than those which are currently present which will attack
the problem we wrote KBUS to attack.)
Tibs
^ permalink raw reply
* [PATCH] Tools for controling ubiblk
From: David Wagner @ 2011-08-17 14:20 UTC (permalink / raw)
To: linux-mtd
Cc: Artem Bityutskiy, linux-embedded, lkml, Tim Bird, David Woodhouse,
David Wagner
In-Reply-To: <1313587042-30846-1-git-send-email-david.wagner@free-electrons.com>
ubiblk_ctrl sends an appropriate ioctl to ubiblk control node.
ubiblkadd and ubiblkdel are wrappers around ubiblk_ctrl.
The syntax is:
ubiblk{add,del} x y
where x is the UBI device number and y the volume ID.
Signed-off-by: David Wagner <david.wagner@free-electrons.com>
---
This is a preliminary version: it has the control device node hardcoded which is
probably not nice.
Makefile | 6 ++-
include/mtd/ubiblk-user.h | 43 ++++++++++++++++++++
ubi-utils/ubiblk_ctrl.c | 95 +++++++++++++++++++++++++++++++++++++++++++++
ubi-utils/ubiblkadd | 2 +
ubi-utils/ubiblkdel | 2 +
5 files changed, 147 insertions(+), 1 deletions(-)
create mode 100644 include/mtd/ubiblk-user.h
create mode 100644 ubi-utils/ubiblk_ctrl.c
create mode 100755 ubi-utils/ubiblkadd
create mode 100755 ubi-utils/ubiblkdel
diff --git a/Makefile b/Makefile
index 67e42ce..2859a8c 100644
--- a/Makefile
+++ b/Makefile
@@ -27,12 +27,16 @@ MTD_BINS = \
sumtool #jffs2reader
UBI_BINS = \
ubiupdatevol ubimkvol ubirmvol ubicrc32 ubinfo ubiattach \
- ubidetach ubinize ubiformat ubirename mtdinfo ubirsvol
+ ubidetach ubinize ubiformat ubirename mtdinfo ubirsvol \
+ ubiblk_ctrl
+UBI_SCRIPTS = \
+ ubiblkadd ubiblkdel
BINS = $(MTD_BINS)
BINS += mkfs.ubifs/mkfs.ubifs
BINS += $(addprefix ubi-utils/,$(UBI_BINS))
SCRIPTS = flash_eraseall
+SCRIPTS += $(addprefix ubi-utils/,$(UBI_SCRIPTS))
TARGETS = $(BINS)
TARGETS += lib/libmtd.a
diff --git a/include/mtd/ubiblk-user.h b/include/mtd/ubiblk-user.h
new file mode 100644
index 0000000..fa0d007
--- /dev/null
+++ b/include/mtd/ubiblk-user.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright © Free Electrons, 2011
+ * Copyright © International Business Machines Corp., 2006
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Author: David Wagner
+ * Some code taken from ubi-user.h
+ */
+
+#ifndef __UBIBLK_USER_H__
+#define __UBIBLK_USER_H__
+
+#include <linux/types.h>
+
+/* Structure to be passed to UBIBLK_IOCADD or IOCDEL ioctl */
+struct ubiblk_ctrl_req {
+ __s32 ubi_num;
+ __s32 vol_id;
+};
+
+/* ioctl commands of the UBI control character device */
+
+#define UBIBLK_CTRL_IOC_MAGIC 'O'
+
+/* Create a ubiblk device from a UBI volume */
+#define UBIBLK_IOCADD _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x10, struct ubiblk_ctrl_req)
+/* Delete a ubiblk device */
+#define UBIBLK_IOCDEL _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x11, struct ubiblk_ctrl_req)
+
+#endif
diff --git a/ubi-utils/ubiblk_ctrl.c b/ubi-utils/ubiblk_ctrl.c
new file mode 100644
index 0000000..1132f35
--- /dev/null
+++ b/ubi-utils/ubiblk_ctrl.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) Free Electrons, 2011
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * Author: David Wagner
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <libgen.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <mtd/ubiblk-user.h>
+
+#define CONTROL_NODE "/dev/ubiblk_ctrl"
+
+void usage(char *exec_path)
+{
+ fprintf(stderr, "%s <-a|-d> <-u UBI_DEVICE_NUMBER> <-v VOLUME_ID>\n",
+ basename(exec_path));
+}
+
+int main(int argc, char *argv[])
+{
+ int ubi_num = -2, vol_id = -1;
+ int fd;
+ struct ubiblk_ctrl_req req;
+ int ret;
+ int command;
+
+ int option;
+
+ while ((option = getopt(argc, argv, "adu:v:")) != -1) {
+ switch (option) {
+ case 'a':
+ command = UBIBLK_IOCADD;
+ break;
+ case 'd':
+ command = UBIBLK_IOCDEL;
+ break;
+ case 'u':
+ ubi_num = atoi(optarg);
+ break;
+ case 'v':
+ vol_id = atoi(optarg);
+ break;
+ }
+ }
+
+ if (command != UBIBLK_IOCDEL && command != UBIBLK_IOCADD) {
+ usage(argv[0]);
+ return EXIT_FAILURE;
+ }
+ if (vol_id == -1 || vol_id == -1) {
+ usage(argv[0]);
+ return EXIT_FAILURE;
+ }
+
+ req.ubi_num = ubi_num;
+ req.vol_id = vol_id;
+
+ fd = open(CONTROL_NODE, O_RDONLY);
+ if (fd == -1) {
+ fprintf(stderr, "Error while opening the control node: %s\n",
+ strerror(errno));
+ return EXIT_FAILURE;
+ }
+
+ ret = ioctl(fd, command, &req);
+ if (ret == -1) {
+ fprintf(stderr, "Error while ioctl: %s\n", strerror(errno));
+ close(fd);
+ return EXIT_FAILURE;
+ }
+ close(fd);
+ return EXIT_SUCCESS;
+}
diff --git a/ubi-utils/ubiblkadd b/ubi-utils/ubiblkadd
new file mode 100755
index 0000000..fd8fbdd
--- /dev/null
+++ b/ubi-utils/ubiblkadd
@@ -0,0 +1,2 @@
+#! /bin/sh
+ubiblk_ctrl -a -u $0 -v $1
diff --git a/ubi-utils/ubiblkdel b/ubi-utils/ubiblkdel
new file mode 100755
index 0000000..40a1fd1
--- /dev/null
+++ b/ubi-utils/ubiblkdel
@@ -0,0 +1,2 @@
+#! /bin/sh
+ubiblk_ctrl -d -u $0 -v $1
--
1.7.0.4
^ permalink raw reply related
* [PATCHv3] UBI: new module ubiblk: block layer on top of UBI
From: david.wagner @ 2011-08-17 13:17 UTC (permalink / raw)
To: linux-mtd
Cc: Artem Bityutskiy, linux-embedded, lkml, Tim Bird, David Woodhouse,
David Wagner
In-Reply-To: <1308922482-14967-1-git-send-email-david.wagner@free-electrons.com>
From: David Wagner <david.wagner@free-electrons.com>
ubiblk is a read-only block layer on top of UBI. It presents UBI volumes as
read-only block devices (named ubiblk_X_Y, where X is the UBI device number
and Y the Volume ID).
It is used by putting a block filesystem image on a UBI volume, creating the
corresponding ubiblk device and then mounting it.
It uses the UBI API to register to UBI notifications and to read from the
volumes. It also creates a ubiblk_ctrl device node that simply receives ioctl
from a userspace tool for creating/removing ubiblk devices.
Some code is taken from mtd_blkdevs and gluebi. Some code for the ioctl part is
also inspired from ubi's core.
Advantages of ubiblk over gluebi+mtdblock_ro:
* Simpler architecture
* The numbering of devices is much easier with ubiblk than with
gluebi+mtdblock_ro. With gluebi+mtdblock_ro, you get one additional MTD
device for each UBI volume, so the number of MTD devices grows quite a lot
and is a bit difficult to understand. For example, mtdblock[0-4] might be
your real MTD partitions, while mtdblock[5-9] might be your UBI volumes.
It also means that if a new real MTD partition is added, the number of all
the MTD devices exposing UBI volumes will be incremented by one, which is a
bit confusing/annoying.
As well, if you add an UBI volume, the mtdblock devices that are emulated on
top of volumes that come after this new one will have their ID incremented.
* ubiblk devices are created on a 'on-demand' basis, which avoids wasting
resources.
* The performance appears to be slightly better with ubiblk than
gluebi+mtdblock_ro, according to our benchmarks (see
http://elinux.org/Flash_Filesystem_Benchmarks_2.6.39)
TODO:
* the modules keeps a table of the devices which length is the maximum number
of UBI volumes. There should be a better solution (linked list or, as
Christoph Hellwig suggests, a radix tree (idr)).
Signed-off-by: David Wagner <david.wagner@free-electrons.com>
---
Hi Artem,
Here is the v3 of ubiblk implementing ioctls for on-demand creation/removal of
ubiblk device ; is it what you were thinking of ?
I also wrote a minimal (and ugly w.r.t argument parsing) tool in mtd-utils'
git ; I'll send it as a reply to this email.
known issue: attempts to create the same ubiblk device twice aren't catched and,
obviously, make things go wrong (see TODO).
Questions:
==========
I wasn't sure what magic ioctl number to use, so I settled to use the same one
as a part of UBI: 'O', which was so far only used by UBI but on a higher range
and leaving some room for UBI to add ioctls (for nw, it uses 'O'/0x00-0x06 and
ubiblk uses 'O'/0x10-0x11). Is it ok or should ubiblk use a different
number/range ?
updates from v3:
===============
- Removed unused parameter "struct ubi_device_info" from ubiblk_create
- Synchronisation: forgot to release a mutex on the error path of ubiblk_create
- New header ubiblk-user.h, contains ioctl numbers ; patch
Documentation/ioctl/ioctl-numbers.txt (use the same magic as UBI but on
another range, leaving expansion margin for UBI)
- Add a ioctl handler and make it use the already-existing
ubiblk_{create,remove} functions (inspired from ubi/cdev.c).
- Register a miscdevice (inspired from ubi/build.c) that will only receive ioctls
- Notifications: don't receive a notifications for already-existing volumes
upon registration ; don't create a ubiblk device when a new UBI volume
appears
- Amend commit message and Kconfig according to these previous points
TODO:
=====
- Use a dynamic structure for storing the ubiblk_devs (linked list or idr ?)
- After this task is done, check that using ubiblkadd can't create an
already-existing ubiblk device (see warning/todo in the code)
Documentation/ioctl/ioctl-number.txt | 1 +
drivers/mtd/ubi/Kconfig | 15 +
drivers/mtd/ubi/Makefile | 1 +
drivers/mtd/ubi/ubiblk.c | 653 ++++++++++++++++++++++++++++++++++
include/mtd/ubiblk-user.h | 43 +++
5 files changed, 713 insertions(+), 0 deletions(-)
create mode 100644 drivers/mtd/ubi/ubiblk.c
create mode 100644 include/mtd/ubiblk-user.h
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 3a46e36..240cf0f 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -150,6 +150,7 @@ Code Seq#(hex) Include File Comments
'M' 00-0F drivers/video/fsl-diu-fb.h conflict!
'N' 00-1F drivers/usb/scanner.h
'O' 00-06 mtd/ubi-user.h UBI
+'O' 10-11 mtd/ubiblk-user.h ubiblk
'P' all linux/soundcard.h conflict!
'P' 60-6F sound/sscape_ioctl.h conflict!
'P' 00-0F drivers/usb/class/usblp.c conflict!
diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig
index 4dcc752..3eedf9a 100644
--- a/drivers/mtd/ubi/Kconfig
+++ b/drivers/mtd/ubi/Kconfig
@@ -60,4 +60,19 @@ config MTD_UBI_DEBUG
help
This option enables UBI debugging.
+config MTD_UBI_UBIBLK
+ tristate "Read-only block transition layer on top of UBI"
+ help
+ Read-only block interface on top of UBI.
+
+ This option adds ubiblk, which creates a read-ony block device from
+ UBI volumes. It makes it possible to use block filesystems on top of
+ UBI (and thus, on top of MTDs while avoiding bad blocks).
+
+ ubiblk devices are created by sending appropriate ioctl to the
+ ubiblk_ctrl device node.
+
+ The devices are named ubiblkX_Y where X is the UBI number and Y is
+ the Volume ID.
+
endif # MTD_UBI
diff --git a/drivers/mtd/ubi/Makefile b/drivers/mtd/ubi/Makefile
index c9302a5..354b2df 100644
--- a/drivers/mtd/ubi/Makefile
+++ b/drivers/mtd/ubi/Makefile
@@ -5,3 +5,4 @@ ubi-y += misc.o
ubi-$(CONFIG_MTD_UBI_DEBUG) += debug.o
obj-$(CONFIG_MTD_UBI_GLUEBI) += gluebi.o
+obj-$(CONFIG_MTD_UBI_UBIBLK) += ubiblk.o
diff --git a/drivers/mtd/ubi/ubiblk.c b/drivers/mtd/ubi/ubiblk.c
new file mode 100644
index 0000000..20c0896
--- /dev/null
+++ b/drivers/mtd/ubi/ubiblk.c
@@ -0,0 +1,653 @@
+/*
+ * Copyright (c) Free Electrons, 2011
+ * Copyright (c) International Business Machines Corp., 2006
+ * Copyright © 2003-2010 David Woodhouse <dwmw2@infradead.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Author: David Wagner
+ * Some code taken from gluebi.c (Artem Bityutskiy (Битюцкий Артём),
+ * Joern Engel)
+ * Some code taken from cdev.c (Artem Bityutskiy (Битюцкий Артём))
+ * Some code taken from mtd_blkdevs.c (David Woodhouse)
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/vmalloc.h>
+#include <linux/mtd/ubi.h>
+#include <linux/blkdev.h>
+#include <linux/miscdevice.h>
+#include <linux/kthread.h>
+#include <linux/mutex.h>
+#include <mtd/ubiblk-user.h>
+#include "ubi.h"
+
+#define BLK_SIZE 512
+
+#define UBIBLK_MAX_DEVS (UBI_MAX_DEVICES * UBI_MAX_VOLUMES)
+
+/*
+ * Structure representing a ubiblk device, proxying a UBI volume
+ */
+struct ubiblk_dev {
+ struct ubi_volume_desc *vol_desc;
+ struct ubi_volume_info *vol_info;
+ int ubi_num;
+ int vol_id;
+
+ /* Block stuff */
+ struct gendisk *gd;
+ struct request_queue *rq;
+ struct task_struct *thread;
+
+ /* Protects the access to the UBI volume */
+ struct mutex lock;
+
+ /* Avoids concurrent accesses to the request queue */
+ spinlock_t queue_lock;
+};
+
+/*
+ * Contains the pointers to all ubiblk_dev instances
+ * TODO: use a linked list
+ */
+static struct ubiblk_dev *ubiblk_devs[UBIBLK_MAX_DEVS];
+static struct mutex devtable_lock;
+
+static int ubiblk_major;
+static const struct block_device_operations ubiblk_ops;
+
+static struct ubiblk_dev *ubiblk_find_dev(struct ubi_volume_info *vol_info)
+{
+ int i;
+ struct ubiblk_dev *dev;
+
+ mutex_lock(&devtable_lock);
+ for (i = 0; i < UBIBLK_MAX_DEVS; i++) {
+ dev = ubiblk_devs[i];
+ if (dev && dev->ubi_num == vol_info->ubi_num &&
+ dev->vol_id == vol_info->vol_id)
+ break;
+ }
+ mutex_unlock(&devtable_lock);
+ if (i == UBIBLK_MAX_DEVS)
+ return NULL;
+ return dev;
+}
+
+/*
+ * Read a LEB and fill the request buffer with the requested sector
+ */
+static int do_ubiblk_request(struct request *req, struct ubiblk_dev *dev)
+{
+ unsigned long start, len, read_bytes;
+ int offset;
+ int leb;
+ int ret;
+
+ start = blk_rq_pos(req) << 9;
+ len = blk_rq_cur_bytes(req);
+ read_bytes = 0;
+
+ /* We are always reading. No need to handle writing for now */
+
+ leb = start / dev->vol_info->usable_leb_size;
+ offset = start % dev->vol_info->usable_leb_size;
+
+ do {
+ if (offset + len > dev->vol_info->usable_leb_size)
+ len = dev->vol_info->usable_leb_size - offset;
+
+ if (unlikely(blk_rq_pos(req) + blk_rq_cur_sectors(req) >
+ get_capacity(req->rq_disk))) {
+ pr_err("UBIBLK: attempting to read too far\n");
+ return -EIO;
+ }
+
+ pr_debug("%s(%s) of sector %llu (LEB %d). offset=%d, len=%lu\n",
+ __func__, rq_data_dir(req) ? "Write" : "Read",
+ blk_rq_pos(req), leb, offset, len);
+
+ /* Read (len) bytes of LEB (leb) from (offset) and put the
+ * result in the buffer given by the request.
+ * If the request is overlapping on several lebs, (read_bytes)
+ * will be > 0 and the data will be put in the buffer at
+ * offset (read_bytes)
+ */
+ ret = ubi_read(dev->vol_desc, leb, req->buffer + read_bytes,
+ offset, len);
+
+ if (ret) {
+ pr_err("ubi_read error\n");
+ return ret;
+ }
+
+ read_bytes += len;
+
+ len = blk_rq_cur_bytes(req) - read_bytes;
+ leb++;
+ offset = 0;
+ } while (read_bytes < blk_rq_cur_bytes(req));
+
+ pr_debug("ubi_read done.\n");
+
+ return 0;
+}
+
+static void ubi_ubiblk_request(struct request_queue *rq)
+{
+ struct ubiblk_dev *dev;
+ struct request *req = NULL;
+
+ dev = rq->queuedata;
+
+ if (!dev)
+ while ((req = blk_fetch_request(rq)) != NULL)
+ __blk_end_request_all(req, -ENODEV);
+ else
+ wake_up_process(dev->thread);
+}
+
+/*
+ * Open a UBI volume (get the volume descriptor)
+ */
+static int ubiblk_open(struct block_device *bdev, fmode_t mode)
+{
+ struct ubiblk_dev *dev = bdev->bd_disk->private_data;
+ pr_debug("%s() disk_name=%s, mode=%d\n", __func__,
+ bdev->bd_disk->disk_name, mode);
+
+ dev->vol_desc = ubi_open_volume(dev->ubi_num, dev->vol_id,
+ UBI_READONLY);
+ if (!dev->vol_desc) {
+ pr_err("open_volume failed");
+ return -EINVAL;
+ }
+
+ dev->vol_info = kzalloc(sizeof(struct ubi_volume_info), GFP_KERNEL);
+ if (!dev->vol_info) {
+ ubi_close_volume(dev->vol_desc);
+ dev->vol_desc = NULL;
+ return -ENOMEM;
+ }
+ ubi_get_volume_info(dev->vol_desc, dev->vol_info);
+
+ return 0;
+}
+
+/*
+ * Close a UBI volume (close the volume descriptor)
+ */
+static int ubiblk_release(struct gendisk *gd, fmode_t mode)
+{
+ struct ubiblk_dev *dev = gd->private_data;
+ pr_debug("%s() disk_name=%s, mode=%d\n", __func__, gd->disk_name, mode);
+
+ kfree(dev->vol_info);
+ dev->vol_info = NULL;
+ if (dev->vol_desc) {
+ ubi_close_volume(dev->vol_desc);
+ dev->vol_desc = NULL;
+ }
+
+ return 0;
+}
+
+/*
+ * Loop on the block request queue and wait for new requests ; run them with
+ * do_ubiblk_request()
+ *
+ * Mostly copied from mtd_blkdevs.c
+ */
+static int ubi_ubiblk_thread(void *arg)
+{
+ struct ubiblk_dev *dev = arg;
+ struct request_queue *rq = dev->rq;
+ struct request *req = NULL;
+
+ spin_lock_irq(rq->queue_lock);
+
+ while (!kthread_should_stop()) {
+ int res;
+
+ if (!req && !(req = blk_fetch_request(rq))) {
+ set_current_state(TASK_INTERRUPTIBLE);
+
+ if (kthread_should_stop())
+ set_current_state(TASK_RUNNING);
+
+ spin_unlock_irq(rq->queue_lock);
+ schedule();
+ spin_lock_irq(rq->queue_lock);
+ continue;
+ }
+
+ spin_unlock_irq(rq->queue_lock);
+
+ mutex_lock(&dev->lock);
+ res = do_ubiblk_request(req, dev);
+ pr_debug("return from request: %d\n", res);
+ mutex_unlock(&dev->lock);
+
+ spin_lock_irq(rq->queue_lock);
+
+ if (!__blk_end_request_cur(req, res))
+ req = NULL;
+ }
+
+ if (req)
+ __blk_end_request_all(req, -EIO);
+
+ spin_unlock_irq(rq->queue_lock);
+
+ return 0;
+}
+
+/*
+ * An UBI volume has been created ; create a corresponding ubiblk device:
+ * Initialize the locks, the structure, the block layer infos and start a
+ * thread.
+ */
+static int ubiblk_create(struct ubi_volume_info *vol_info)
+{
+ struct ubiblk_dev *dev;
+ struct gendisk *gd;
+ int i;
+ int ret = 0;
+
+ mutex_lock(&devtable_lock);
+#warning "TODO: Make sure the ubiblk device doesn't already exist (to be done after introducing the use of a dynamic structure for storing the ubiblk_devs)"
+ for (i = 0; i < UBIBLK_MAX_DEVS; i++)
+ if (!ubiblk_devs[i])
+ break;
+
+ if (i == UBIBLK_MAX_DEVS) {
+ /* Shouldn't happen: UBI can't make more volumes than that */
+ pr_err("no slot left for a new ubiblk device.\n");
+ mutex_unlock(&devtable_lock);
+ return -ENOMEM;
+ }
+
+ dev = kzalloc(sizeof(struct ubiblk_dev), GFP_KERNEL);
+ if (!dev) {
+ pr_err("UBIBLK: ENOMEM when trying to create a new"
+ "ubiblk dev\n");
+ mutex_unlock(&devtable_lock);
+ return -ENOMEM;
+ }
+ ubiblk_devs[i] = dev;
+ mutex_unlock(&devtable_lock);
+
+ mutex_init(&dev->lock);
+ mutex_lock(&dev->lock);
+
+ dev->ubi_num = vol_info->ubi_num;
+ dev->vol_id = vol_info->vol_id;
+
+ dev->vol_desc = ubi_open_volume(dev->ubi_num, dev->vol_id,
+ UBI_READONLY);
+ if (IS_ERR(dev->vol_desc)) {
+ pr_err("open_volume failed\n");
+ ret = PTR_ERR(dev->vol_desc);
+ goto out_vol;
+ }
+
+ dev->vol_info = kzalloc(sizeof(struct ubi_volume_info), GFP_KERNEL);
+ if (!dev->vol_info) {
+ ret = -ENOMEM;
+ goto out_info;
+ }
+ ubi_get_volume_info(dev->vol_desc, dev->vol_info);
+
+ pr_info("Got volume %s: device %d/volume %d of size %d\n",
+ dev->vol_info->name, dev->ubi_num, dev->vol_id,
+ dev->vol_info->size);
+
+ /* Initialize the gendisk of this ubiblk device */
+ gd = alloc_disk(1);
+ if (!gd) {
+ pr_err("alloc_disk failed\n");
+ ret = -ENODEV;
+ goto out_disk;
+ }
+
+ gd->fops = &ubiblk_ops;
+ gd->major = ubiblk_major;
+ gd->first_minor = dev->ubi_num * UBI_MAX_VOLUMES + dev->vol_id;
+ gd->private_data = dev;
+ sprintf(gd->disk_name, "ubiblk%d_%d", dev->ubi_num, dev->vol_id);
+ pr_debug("creating a gd '%s'\n", gd->disk_name);
+ set_capacity(gd,
+ (dev->vol_info->size *
+ dev->vol_info->usable_leb_size) >> 9);
+ set_disk_ro(gd, 1);
+ dev->gd = gd;
+
+ spin_lock_init(&dev->queue_lock);
+ dev->rq = blk_init_queue(ubi_ubiblk_request, &dev->queue_lock);
+ if (!dev->rq) {
+ pr_err("init_queue failed\n");
+ ret = -ENODEV;
+ goto out_queue;
+ }
+ dev->rq->queuedata = dev;
+ blk_queue_logical_block_size(dev->rq, BLK_SIZE);
+ dev->gd->queue = dev->rq;
+
+ /* Stolen from mtd_blkdevs.c */
+ /* Create processing thread
+ *
+ * The processing of the request has to be done in process context (it
+ * might sleep) but blk_run_queue can't block ; so we need to separate
+ * the event of a request being added to the queue (which triggers the
+ * callback ubi_ubiblk_request - that is set with blk_init_queue())
+ * and the processing of that request.
+ *
+ * Thus, the sole purpose of ubi_ubiblk_reuqest is to wake the kthread
+ * up so that it will process the request queue
+ */
+ dev->thread = kthread_run(ubi_ubiblk_thread, dev, "%s%d_%d",
+ "kubiblk", dev->ubi_num, dev->vol_id);
+ if (IS_ERR(dev->thread)) {
+ ret = PTR_ERR(dev->thread);
+ goto out_thread;
+ }
+
+ add_disk(dev->gd);
+ kfree(dev->vol_info);
+ dev->vol_info = NULL;
+ ubi_close_volume(dev->vol_desc);
+ dev->vol_desc = NULL;
+ mutex_unlock(&dev->lock);
+
+ return 0;
+
+out_thread:
+ blk_cleanup_queue(dev->rq);
+out_queue:
+ put_disk(dev->gd);
+out_disk:
+ kfree(dev->vol_info);
+ dev->vol_info = NULL;
+out_info:
+ ubi_close_volume(dev->vol_desc);
+ dev->vol_desc = NULL;
+out_vol:
+ mutex_unlock(&dev->lock);
+
+ return ret;
+}
+
+/*
+ * A UBI has been removed ; destroy the corresponding ubiblk device
+ */
+static int ubiblk_remove(struct ubi_volume_info *vol_info)
+{
+ int i;
+ struct ubiblk_dev *dev;
+
+ mutex_lock(&devtable_lock);
+ for (i = 0; i < UBIBLK_MAX_DEVS; i++) {
+ dev = ubiblk_devs[i];
+ if (dev && dev->ubi_num == vol_info->ubi_num &&
+ dev->vol_id == vol_info->vol_id)
+ break;
+ }
+ if (i == UBIBLK_MAX_DEVS) {
+ mutex_unlock(&devtable_lock);
+ pr_warn("Trying to remove %s, but it isn't handled by ubiblk\n",
+ vol_info->name);
+ return -ENODEV;
+ }
+
+ pr_info("ubiblk: Removing %s\n", vol_info->name);
+
+ if (dev->vol_desc) {
+ ubi_close_volume(dev->vol_desc);
+ dev->vol_desc = NULL;
+ }
+
+ del_gendisk(dev->gd);
+ blk_cleanup_queue(dev->rq);
+ kthread_stop(dev->thread);
+ put_disk(dev->gd);
+
+ kfree(dev->vol_info);
+
+ kfree(ubiblk_devs[i]);
+ ubiblk_devs[i] = NULL;
+
+ mutex_unlock(&devtable_lock);
+ return 0;
+}
+
+static int ubiblk_resized(struct ubi_volume_info *vol_info)
+{
+ struct ubiblk_dev *dev;
+
+ dev = ubiblk_find_dev(vol_info);
+ if (!dev) {
+ pr_warn("Trying to resize %s, which is unknown from ubiblk\n",
+ vol_info->name);
+ return -ENODEV;
+ }
+
+ mutex_lock(&dev->lock);
+ set_capacity(dev->gd,
+ (vol_info->size * vol_info->usable_leb_size) >> 9);
+ mutex_unlock(&dev->lock);
+ pr_debug("Resized ubiblk%d_%d to %d LEBs\n", vol_info->ubi_num,
+ vol_info->vol_id, vol_info->size);
+ return 0;
+}
+
+/*
+ * Dispatches the UBI notifications
+ * copied from gluebi.c
+ */
+static int ubiblk_notify(struct notifier_block *nb,
+ unsigned long notification_type, void *ns_ptr)
+{
+ struct ubi_notification *nt = ns_ptr;
+
+ switch (notification_type) {
+ case UBI_VOLUME_ADDED:
+ break;
+ case UBI_VOLUME_REMOVED:
+ ubiblk_remove(&nt->vi);
+ break;
+ case UBI_VOLUME_RESIZED:
+ ubiblk_resized(&nt->vi);
+ break;
+ case UBI_VOLUME_UPDATED:
+ break;
+ case UBI_VOLUME_RENAMED:
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static const struct block_device_operations ubiblk_ops = {
+ .owner = THIS_MODULE,
+ .open = ubiblk_open,
+ .release = ubiblk_release,
+};
+
+static struct notifier_block ubiblk_notifier = {
+ .notifier_call = ubiblk_notify,
+};
+
+
+/*
+ * ioctl handling for managing/unmanaging a UBI volume
+ */
+
+static long ubiblk_ctrl_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ int err = 0;
+ void __user *argp = (void __user *)arg;
+
+ struct ubi_volume_desc *vol_desc;
+ struct ubi_volume_info vol_info;
+ struct ubiblk_ctrl_req req;
+
+ if (!capable(CAP_SYS_RESOURCE))
+ return -EPERM;
+
+ err = copy_from_user(&req, argp, sizeof(struct ubiblk_ctrl_req));
+ if (err)
+ return -EFAULT;
+
+ if (req.ubi_num < 0 || req.vol_id < 0)
+ return -EINVAL;
+
+ vol_desc = ubi_open_volume(req.ubi_num, req.vol_id, UBI_READONLY);
+ if (IS_ERR(vol_desc)) {
+ pr_err("Opening volume %d on device %d failed\n",
+ req.vol_id, req.ubi_num);
+ return PTR_ERR(vol_desc);
+ }
+
+ ubi_get_volume_info(vol_desc, &vol_info);
+
+ switch (cmd) {
+ case UBIBLK_IOCADD:
+ pr_info("Creating a ubiblk device proxing ubi%d:%d\n",
+ req.ubi_num, req.vol_id);
+ ubiblk_create(&vol_info);
+ break;
+ case UBIBLK_IOCDEL:
+ pr_info("Removing ubiblk from ubi%d:%d\n",
+ req.ubi_num, req.vol_id);
+ ubiblk_remove(&vol_info);
+ break;
+
+ default:
+ err = -ENOTTY;
+ break;
+ }
+
+ return err;
+}
+
+#ifdef CONFIG_COMPAT
+static long ubiblk_ctrl_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ unsigned long translated_arg = (unsigned long)compat_ptr(arg);
+
+ return ubiblk_ctrl_ioctl(file, cmd, translated_arg);
+}
+#endif
+
+/* ubiblk control device (receives ioctls) */
+static const struct file_operations ubiblk_ctrl_ops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = ubiblk_ctrl_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ubiblk_ctrl_compat_ioctl,
+#endif
+ .llseek = no_llseek,
+};
+static struct miscdevice ubiblk_ctrl_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "ubiblk_ctrl",
+ .fops = &ubiblk_ctrl_ops,
+};
+
+/*
+ * Initialize the module
+ * (Get a major number and register to UBI notifications)
+ */
+static int __init ubi_ubiblk_init(void)
+{
+ int ret = 0;
+
+ pr_info("UBIBLK starting\n");
+
+ ret = register_blkdev(0, "ubiblk");
+ if (ret <= 0) {
+ pr_err("UBIBLK: could not register_blkdev\n");
+ return -ENODEV;
+ }
+ ubiblk_major = ret;
+ pr_info("UBIBLK: device's major: %d\n", ubiblk_major);
+
+ mutex_init(&devtable_lock);
+
+ ret = misc_register(&ubiblk_ctrl_dev);
+ if (ret) {
+ pr_err("Cannot register ubiblk_ctrl\n");
+ goto out_unreg_blk;
+ }
+
+ ret = ubi_register_volume_notifier(&ubiblk_notifier, 1);
+ if (ret < 0)
+ goto out_unreg_ctrl;
+
+ return ret;
+
+out_unreg_ctrl:
+ misc_deregister(&ubiblk_ctrl_dev);
+out_unreg_blk:
+ unregister_blkdev(ubiblk_major, "ubiblk");
+
+ return ret;
+}
+
+/*
+ * End of life
+ * unregister the block device major, unregister from UBI notifications,
+ * stop the threads and free the memory.
+ */
+static void __exit ubi_ubiblk_exit(void)
+{
+ int i;
+
+ pr_info("UBIBLK: going to exit\n");
+
+ ubi_unregister_volume_notifier(&ubiblk_notifier);
+ misc_deregister(&ubiblk_ctrl_dev);
+
+ for (i = 0; i < UBIBLK_MAX_DEVS; i++) {
+ struct ubiblk_dev *dev = ubiblk_devs[i];
+ if (!dev)
+ continue;
+
+ if (dev->vol_desc)
+ ubi_close_volume(dev->vol_desc);
+
+ del_gendisk(dev->gd);
+ blk_cleanup_queue(dev->rq);
+ kthread_stop(dev->thread);
+ put_disk(dev->gd);
+
+ kfree(dev->vol_info);
+ kfree(ubiblk_devs[i]);
+ }
+
+ unregister_blkdev(ubiblk_major, "ubiblk");
+ pr_info("UBIBLK: The End\n");
+}
+
+module_init(ubi_ubiblk_init);
+module_exit(ubi_ubiblk_exit);
+MODULE_DESCRIPTION("Read-only block transition layer on top of UBI");
+MODULE_AUTHOR("David Wagner");
+MODULE_LICENSE("GPL");
diff --git a/include/mtd/ubiblk-user.h b/include/mtd/ubiblk-user.h
new file mode 100644
index 0000000..fa0d007
--- /dev/null
+++ b/include/mtd/ubiblk-user.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright © Free Electrons, 2011
+ * Copyright © International Business Machines Corp., 2006
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
+ * the GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Author: David Wagner
+ * Some code taken from ubi-user.h
+ */
+
+#ifndef __UBIBLK_USER_H__
+#define __UBIBLK_USER_H__
+
+#include <linux/types.h>
+
+/* Structure to be passed to UBIBLK_IOCADD or IOCDEL ioctl */
+struct ubiblk_ctrl_req {
+ __s32 ubi_num;
+ __s32 vol_id;
+};
+
+/* ioctl commands of the UBI control character device */
+
+#define UBIBLK_CTRL_IOC_MAGIC 'O'
+
+/* Create a ubiblk device from a UBI volume */
+#define UBIBLK_IOCADD _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x10, struct ubiblk_ctrl_req)
+/* Delete a ubiblk device */
+#define UBIBLK_IOCDEL _IOW(UBIBLK_CTRL_IOC_MAGIC, 0x11, struct ubiblk_ctrl_req)
+
+#endif
--
1.7.0.4
^ permalink raw reply related
* Re: [PATCH] UBI: new module ubiblk: block layer on top of UBI
From: Artem Bityutskiy @ 2011-08-15 11:56 UTC (permalink / raw)
To: David Wagner; +Cc: linux-mtd, dwmw2, linux-kernel, linux-embedded, tim.bird
In-Reply-To: <1311683250-7921-1-git-send-email-david.wagner@free-electrons.com>
On Tue, 2011-07-26 at 14:27 +0200, David Wagner wrote:
> Advantages of ubiblk over gluebi+mtdblock_ro:
>
> * Simpler architecture
>
> * The numbering of devices is much easier with ubiblk than with
> gluebi+mtdblock_ro. With gluebi+mtdblock_ro, you get one additional
> MTD device for each UBI volume, so the number of MTD devices grows
> quite a lot and is a bit difficult to understand. For example,
> mtdblock[0-4] might be your real MTD partitions, while mtdblock[5-9]
> might be your UBI volumes.
> It also means that if a new real MTD
> partition is added, the number of all the MTD devices exposing UBI
> volumes will be incremented by one, which is a bit
> confusing/annoying.
> As well, if you add an UBI volume, the mtdblock devices that are
> emulated on top of volumes that come after this new one will have
> their ID incremented.
>
> * The performance appears to be slightly better with ubiblk than
> gluebi+mtdblock_ro, according to our benchmarks (see
> http://elinux.org/Flash_Filesystem_Benchmarks_2.6.39)
Hi, sounds good.
However, what I think is wrong is to follow the MTD + mtdblock approach:
mtdblock creates a block device for every MTD device - this is wasteful
and confusing. Indeed, it is wasteful because we end up with potentially
many unneeded block devices which are not used but consume resources.
Confusing is because people get impression that /dev/mtdblock* are the
right interfaces to access the MTD devices, which is wrong.
You can say that gluebi is also doing something like that - yes, this is
because gluebi is kind of temporary solution for software which cannot
work with UBI. We wanted to keep it very simple.
My suggestion is to do something similar to UBI:
* make ubiblk register its own character device (/dev/ubiblk_ctl,
similar to /dev/ubi_ctl).
* add a couple of ioclts to create / delete ubiblk devices (similar to
UBI attach/detach)
* add a couple of user-space tools to mtd-utils to create / delete
ubiblk devices (similar to ubiattach / ubidetach).
The other reason why I think this more complex approach is justified is
that when / if you decide to add R/W support to ubiblk, you'd have
troubles with the current approach (e.g., what if the UBI volume you are
looking at is not ubiblk-formatted?)
I think it should be rather easy to implement /dev/ubiblk_ctl.
--
Best Regards,
Artem Bityutskiy
^ permalink raw reply
* Re: RFC: [Restatement] KBUS messaging subsystem
From: Pekka Enberg @ 2011-08-15 11:46 UTC (permalink / raw)
To: Tony Ibbs
Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts
In-Reply-To: <63D130AC-E6DF-4A61-BB69-30212D238F37@tonyibbs.co.uk>
Hi Tony,
On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> Real examples of usage that aren't the STB are a bit difficult to give
> because they belong to customer projects that we're not allowed to
> talk about.
That's part of the problem, I suppose. We usually don't merge new
kernel facilities unless we're able to understand (and see) real
applications that need them.
On Sun, Aug 7, 2011 at 11:24 PM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> I assume the real point of your post is that I wrote about the reasons
> why we made KBUS a kernel module, but did not really address the
> reasons why KBUS might want to be a kernel module in general usage.
I simply don't see a convincing argument why existing IPC and other
kernel mechanisms are not sufficient to implement what you need. I'm
sure there is one but it's not apparent from your emails.
The whole thing feels more like "lets put a message broker into the
kernel" rather than set of kernel APIs that make sense. I suppose the
rather extensive ioctl() ABI is partly to blame here.
I'm not saying you need to implement everything in userspace but I'm
also not convinced we want _all of this_ in the kernel.
Pekka
^ permalink raw reply
* Re: RFC: [Restatement] KBUS messaging subsystem
From: Tony Ibbs @ 2011-08-07 20:24 UTC (permalink / raw)
To: Pekka Enberg
Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts
In-Reply-To: <CAOJsxLFSdPO2ebjM9W9JaNcX80COzn=Y3hFukXZjn+1yY4ivZQ@mail.gmail.com>
On 3 Aug 2011, at 21:48, Pekka Enberg wrote:
> Your description doesn't really explain what you want to use this
> thing exactly for in userspace.
A typical use might be communicating between components in a
set-top-box (STB). This might involve:
* Some sort of GUI user interface (e.g., a browser). This will
send control messages and receive state messages.
* Some sort of IR input, reading keypresses from a remote control. The
program reading the keypresses will decide to send control messages
for some of them.
* Possibly input from a mobile phone (over bluetooth or whatever),
acting as another source of control. It's possible messages may also
be received that require sending information back to the phone.
* A process reading data streams from the network and passing the
appropriate parts therefrom to audio and video decoders. This will
receive messages to tell it which programs to play, and send
messages indicating what it is doing.
* Another process recording programs to disk, as directed by the user
inputs. It may need to send messages to the process reading data
streams. It will also send messages of interest to the GUI.
* A process playing programs back from disk, including "trick play" -
that is, fast forward, skip and reverse. Obviously it receives
messages telling it which program to play, and what trick play
operations to perform. It in turn will send messages to the UI to
say what it is doing.
Having the listener choose what it wants to listen to is a clear win
in these circumstances - it means that the sender of a message does
not need to know if a new piece of infrastructure is added that also
wants to receive it.
Similarly, allowing any sender to send a particular request also makes
sense, as several processes might want to ask the current location of
play in the displayed video stream, or to request some sort of trick
play action.
(I'm sure all of this could be done perfectly well with, for instance,
DBus as well, but I hope I've adequately explained elsewhere why
that's not an applicable solution.)
A small example might be several programs waiting for particular
conditions to be satisfied, and sending messages to a central program
which lights up LEDs according to the messages it reveives.
Real examples of usage that aren't the STB are a bit difficult to give
because they belong to customer projects that we're not allowed to
talk about.
> On Fri, Jul 29, 2011 at 12:48 AM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> > So why did we write it as a kernel module?
> > ==========================================
> > As implementors, a kernel module makes a lot of sense. Not least
> > because:
> >
> > * It gives us a lot of things for free, including list handling,
> > reference counting, thread safety and (on larger systems)
> > multi-processor support, which we would otherwise have to write and
> > debug ourselves. This also keeps our codebase smaller.
>
> That's not a reason to put this into the kernel, really.
It's part of the reason why we wrote KBUS as a kernel module, which is
what this section was about. Agreed, it's not a reason that one can
readily use to argue that "X" (whatever that may be) should go in the
kernel-as-distributed, or we'd have all of user space there, which
would no longer be Linux (not sure what it *would* be).
> > * It helps give us reliability, partly because of the code we're
> > relying on, partly because of the strictures of working in the
> > kernel, partly by shielding us from userspace.
>
> So now instead of crashing in userspace, we crash the kernel? This
> seems like a bogus argument as well.
Well, ignoring the tone of that comment, the same argument as above
applies. Although I would point out that what I was saying was that it
would be intrinsically much less likely to crash anywhere because it
is a kernel module.
> > * It reduces message copying (we have userspace to kernel back to
> > userspace, as opposed to a userspace daemon communicating with
> > clients via sockets)
>
> Now this sounds like a real reason but you'd have to explain why you
> can't reuse existing zero-copy mechanisms like splice() and tee().
Hmm. vmsplice() too, presumably. I'll freely admit I don't know
anything beyond what I've just read about these functions. If one was
writing KBUS from scratch as a userspace library, with associated
daemon, then they might well be useful, but one would need to think
their use through very carefully, and I don't believe the code would
be simple (the image I have in mind is managing message structures
with two-metre long tongs, through an air-water boundary).
> > * It makes it simple for us to tell when a message recipient has "gone
> > away", as the kernel will call our "release" callback for us.
>
> Again, sounds like a reasonable technical requirement but doesn't
> really justify putting all this code into the kernel.
I'll get back to that below.
> > * It allows us to provide the functionality on systems without
> > requiring anything much beyond /dev and maybe /proc in userspace.
>
> Why is this important?
Because we sometimes want to target systems that do not need a
userspace filesystem, either because they are very simple (so their
needs can be satisfied by starting the necessary programs up in init),
or because they're trying to save space, or because they don't have
any physical storage associated with them, etc.
I assume the real point of your post is that I wrote about the reasons
why we made KBUS a kernel module, but did not really address the
reasons why KBUS might want to be a kernel module in general usage.
Obviously, there's one overriding reason, which is key:
* Inter-process messaging is hard to get right, and very easy to get
wrong. The kernel provides low-level mechanisms one can use to write
a userspace inter-process messaging system, but not an actual
solution.
Our contention is that a simple inter-process messaging module is a
worthwhile addition to the toolkit supplied by the kernel. The trick
is not to get over-ambitious (clearly enterprise solutions like DBus
belong in userspace), but to provide a sensible mimumum. KBUS is our
attempt at this, based on our experience of what one actually needs
in a relatively simple system.
Clearly, as the needs of a system grow, there is likely to be a
point at which larger, more powerful solutions may be necessary
(inevitably if you need things KBUS doesn't provide), but that
shouldn't preclude providing the simpler solution.
Otherwise, I'll try to give some subsidiary reasons below, but I'm
bound to have forgotten something. The points aren't in any particular
order.
* I aleady said that it is important that the kernel has a single
point where it knows that a process has gone away. Knowing this is a
fundamental requirement of KBUS, and it would be difficult and
unreliable to do in userspace. I actually think this is a very
important point, as it is at the core of how KBUS works.
* All the queues are in one place.
If KBUS was a userspace daemon, then it has to maintain the same
queues as it does now (in order to get the same effect), plus some
fraction of N message copies in transit through the kernel, where N
is the number of clients sending/receiving messages at a particular
time.
With KBUS in the kernel, that "fraction of N" is not needed, and
thus KBUS can account much more accurately for the memory it is
using. This in turn means that it can be less conservative about the
amount of memory available for its queues, meaning it can have more
messages in transit.
(Note that KBUS at the moment is nowhere near as good at this as it
should be, but resource management is acknowledged to be a problem
that we need to address, and it would be very simple to have a
memory limit per bus.)
Again, it's not that one can't do something similar in userspace,
but that doing it in userspace is both more complicated and more
wasteful.
* On embedded systems with not much memory, the OOM killer can be
quite active in userspace. If the message system is crucial, then it
is a big advantage having it in the kernel, where it cannot be
killed (that's not to claim that KBUS as it stands is well suited to
this use case, but it is more suitable than if it were a userspace
daemon).
(I do realise that there are ways of overriding the OOM killer per
process, but being removed from the problem seems more sensible.)
* KBUS works in each client's priority, and thus avoids priority
inversion problems, compared to userspace daemons.
A userspace daemon must run at its own particuar priority. If it is
high, then a low priority program sending messages can starve a
higher process program, and if it is low, the low prioriy processes
can preempt higher priority processes.
* Userspace peer-to-peer messaging via sockets (for instance) needs a
persistent store of client identities ("names"). Writing this so
that race conditions are minimised is not simple, and doing so makes
the whole messaging infrastructure more complex. I hope the example
at the beginning of this email makes it clearer why we'd rather not
have such.
* It was mentioned before that KBUS being a kernel module makes it
significantly smaller, as it can leverage code that is already
present in the kernel. This can be important on embedded systems,
since NAND flash is slow, and loading an extra few MB of library can
slow the boot process down unacceptably.
This matters to us quite a lot, it may matter less to the general
kernel community...
* Despite having said that we weren't aiming for the sort of security
handling that DBus provides, some security considerations are of
interest. In particular, being a kernel module means that KBUS
definitatively knows the identity of the sender and recipient(s) of
each message. This makes it possible, for instance, for a sender of
a request to assert that it should only succeed (at "write" time) if
the intended recipient is that expected (so if the original recipient
unbinds and a new recipient rebinds, this can be trivially
detected - we use this so a sender can realise that the replier has
changed and will not have any required state).
* Coming back to the "being in the kernel means more code reuse"
issue, this is not insignificant. If your message manager crashes,
for whatever reason, you will typically have lost all the in-transit
messages. This is a fairly serious issue. Reusing lots of well
tested code, and having to adhere to a moderately rigourous coding
style and set of practices helps a lot. It's not enough by itself to
justify being in the kernel, but it should not be ignored as a
contributory factor once one is balancing issues.
* Being in the kernel means that it should be a lot easier to scale to
multiple processors. And other forms of scaling that the kernel does
for you (more or less).
* I've recently received a specific request for support of messaging
between kernel and userspace (and vice-versa). I've yet to look at
the feasibility of this (it's my next job after this email), but I
think it's a fairly simple and non-obscure set of changes to KBUS. I
don't believe this would be as true of a userspace system.
This would allow us to replace writing to a user process that exists
merely to write to a (locally written) driver for a piece of
hardware with direct communication with that driver.
Tibs
^ permalink raw reply
* Re: RFC: [Restatement] KBUS messaging subsystem
From: Tony Ibbs @ 2011-08-07 16:47 UTC (permalink / raw)
To: Tony Ibbs
Cc: Colin Walters, lkml, Andrew Morton, Jonathan Corbet,
Florian Fainelli, Grant Likely, Linux-embedded, Tibs at Kynesim,
Richard Watts
In-Reply-To: <81866CAA-9010-4167-A556-84191E0BDFBA@tonyibbs.co.uk>
On 3 Aug 2011, at 21:14, Tony Ibbs wrote:
> On 29 Jul 2011, at 00:58, Colin Walters wrote:
>
>> You don't mention what your performance requirements are
>> (if any) for example.
>
> We don't particularly have any...
My colleague says that he has previously done measurements that show
KBUS to be about 3 times slower than using shared memory and futexes,
but (of course) much simpler to use.
His implication is, I believe, that this is a respectable speed.
Tibs
^ permalink raw reply
* Re: RFC: [Restatement] KBUS messaging subsystem
From: Pekka Enberg @ 2011-08-03 20:48 UTC (permalink / raw)
To: Tony Ibbs
Cc: lkml, Andrew Morton, Jonathan Corbet, Florian Fainelli,
Grant Likely, Linux-embedded, Tibs at Kynesim, Richard Watts
In-Reply-To: <5F531701-B760-45F0-989D-669013409FE8@tonyibbs.co.uk>
Hi Tony,
Your description doesn't really explain what you want to use this
thing exactly for in userspace.
On Fri, Jul 29, 2011 at 12:48 AM, Tony Ibbs <tibs@tonyibbs.co.uk> wrote:
> So why did we write it as a kernel module?
> ==========================================
> As implementors, a kernel module makes a lot of sense. Not least
> because:
>
> * It gives us a lot of things for free, including list handling,
> reference counting, thread safety and (on larger systems)
> multi-processor support, which we would otherwise have to write and
> debug ourselves. This also keeps our codebase smaller.
That's not a reason to put this into the kernel, really.
> * It helps give us reliability, partly because of the code we're
> relying on, partly because of the strictures of working in the
> kernel, partly by shielding us from userspace.
So now instead of crashing in userspace, we crash the kernel? This
seems like a bogus argument as well.
> * It reduces message copying (we have userspace to kernel back to
> userspace, as opposed to a userspace daemon communicating with
> clients via sockets)
Now this sounds like a real reason but you'd have to explain why you
can't reuse existing zero-copy mechanisms like splice() and tee().
> * It makes it simple for us to tell when a message recipient has "gone
> away", as the kernel will call our "release" callback for us.
Again, sounds like a reasonable technical requirement but doesn't
really justify putting all this code into the kernel.
> * It allows us to provide the functionality on systems without
> requiring anything much beyond /dev and maybe /proc in userspace.
Why is this important?
Pekka
^ permalink raw reply
* Re: [PATCH 00/11] RFC: KBUS messaging subsystem
From: Tony Ibbs @ 2011-08-03 20:23 UTC (permalink / raw)
To: Florian Fainelli
Cc: Jonathan Corbet, Grant Likely, lkml, Linux-embedded,
Tibs at Kynesim, Richard Watts
In-Reply-To: <201105171050.38903.florian@openwrt.org>
On 17 May 2011, at 09:50, Florian Fainelli wrote:
> Sorry for this late answer.
And apologies for my own late response. I'll try to keep this short as
I hope the "Restatement" side-thread will address some of it.
> On Tuesday 22 March 2011 20:36:40 Jonathan Corbet wrote:
>>
>> - Why kbus over, say, a user-space daemon and unix-domain sockets? I'm
>> not sure I see the advantage that comes with putting this into kernel
>> space.
>
> I also fail to see why this would be required. In my opininon you are trading
> the reliability over complexity by putting this in the kernel.
I hope that's addressed in the "So why did we write it as a kernel
module?" section of the "Restatement" message thread. Basically,
I believe a kernel module is smaller and "steals" reliability from
code written and tested by others. That doesn't mean it's a good
solution "in the wild", of course (privately we can add whatever we
want to the kernel, but in public it is and must be controlled).
> Indeed, I would also suggest having a look at what generic netlink already
> provides like messages per application PID, multicasting and marshaling. If
> you intend to keep a part of it in the kernel, you should have a look at this,
> because from my experience with generic netlink, most of the hard job you are
> re-doing here, has already been done in a generic manner.
If we do end up heading that way, I hope you won't mind if I ask
you for advice!
All the best,
Tibs
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox