Embedded Linux development
* Re: [PATCH/RFC] Add Alternative Log Buffer Support for printk Messages
From: Grant Erickson @ 2009-01-07  2:11 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, Stefan Roese, Wolfgang Denx, linux-embedded
In-Reply-To: <1231286642.14860.35.camel@pasglop>

On 1/6/09 4:04 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2008-11-25 at 10:34 -0800, Grant Erickson wrote:
>> This merges support for the previously DENX-only kernel feature of
>> specifying an alternative, "external" buffer for kernel printk
>> messages and their associated metadata. In addition, this ports
>> architecture support for this feature from arch/ppc to arch/powerpc.
>> 
>> Signed-off-by: Grant Erickson <gerickson@nuovations.com>
> 
> Considering the extensive changes to generic code, this patch will
> have to be submitted via the linux-kernel mailing list.
> 
> I suggest you split the generic core change from the powerpc specific
> implementation.
> 
> I'm not sure whether I like the idea myself or not there, so you'll have
> to convince the powers that be to take it.

Ben:

Thanks for the feedback. Matt Sealey had some good feedback
<http://ozlabs.org/pipermail/linuxppc-dev/2008-November/065594.html> which I
have on my to-do list to evaluate.

In the interim, I'll hold off on pushing up to linux-kernel until I've done
that.

Regards,

Grant


^ permalink raw reply

* Re: [RFC 2.6.27 1/1] gpiolib: add support for batch set of pins
From: Ben Nizette @ 2009-01-07  1:52 UTC (permalink / raw)
  To: Robin Getz
  Cc: Jaya Kumar, David Brownell, Eric Miao, Sam Ravnborg, Eric Miao,
	Haavard Skinnemoen, Philipp Zabel, Russell King, Ben Gardner,
	Greg KH, linux-arm-kernel, linux-fbdev-devel, linux-kernel,
	linux-embedded
In-Reply-To: <200901061802.39072.rgetz@blackfin.uclinux.org>

On Tue, 2009-01-06 at 18:02 -0500, Robin Getz wrote:
> On Sun 28 Dec 2008 17:00, Ben Nizette pondered:
> > On Sun, 2008-12-28 at 13:46 -0500, Robin Getz wrote:
> > > > gpio_set_batch(DB0, value, 0xFFFF, 16)
> > > > 
> > > > which has the nice performance benefit of skipping all the bit
> > > > counting in the most common use case scenario.
> > > 
> > > but has the requirement that the driver know exactly the board level
> > > implementation details (something that doesn't sound generic).
> > 
> > The original use case for these batch operations was in a fastpath -
> > setting data lines on a framebuffer.  Sure it's arguably not as generic
> > as may be, but it optimises for speed and current usage patterns - I'm
> > OK with that.  Other usage patterns which don't have a speed requirement
> > can be done using the individual pin operations and a loop.
> 
> The tradeoff for speed is always made with extensibility. If you want best
> speed - you shouldn't be using the GPIO framework at all - just bang directly
> on the registers yourself...
> 
> If you want something extensible/generic that serves multiple use cases - 
> it will normally come with a performance tradeoff...

Yeah there's certainly a sliding scale here.  At the speedy end you've
got register banging, $SUBJECT is one level higher, your concepts are
higher again.

If your concept can perform at a speed which solves the problem which
inspired $SUBJECT then IMO it's a better way to go.  The thing which
worries me is that if we did end up sliding right up the Generic end of
the Generic-Performance slider then we'd end up with a sexy interface
which isn't actually usable for the case which inspired this activity in
the first place!

And yes the speed is primarily the concern of the implementation and
your comments really concern the interface.  On paper the 2 are largely
orthogonal but until I see an alternate/competing patch I'm not prepared
to bet it'll all play nice.

> 
> > > > While we are here, I was thinking about it, and it's better if I give
> > > > gpio_request/free/direction_batch a miss for now. Nothing prevents
> > > > those features being added at a later point.
> > > 
> > > I don't think that request/free are optional.
> > > 
> > > For example - in most SoC implementations - gpios are implemented as
> > > banks of 16 or 32. (a 16 or 32 bit register).
> > > 
> > > Are there facilities to span these registers? 
> > >  - can you request 64 gpios as a 'bank'?
> > >  - can you request gpio_8 -> gpio_40 as a 'bank' on a 32-bit system?
> > > 
> > > > Are non-adjacent/non-contiguous gpios available to be put into
> > > > a 'bank/batch/bus'? can you use gpio_8 -> 11 &  28 -> 31 as an 8-bit
> > > > 'bus'?
> > > >
> > > > How do you know what is available to be talked to as a bank/bus/batch
> > > without the request/free operation?
> > 
> > I think the read/write operations should be able to fail if you give
> > them invalid chunks of gpio, sure. 
> 
> Can you define "invalid"? what are the limitations?
> 
> Can I use gpio_8 -> 11 &  28 -> 31 as a chunk?

With the $SUBJECT patch, I understand you can (they both fit in a u32
bitmask).  The limitations aren't really the point though - no matter
what the final interface/implementation there will be some limitations.
Either there must be a bulk request interface which will fail if you try
and break the limitations or else the read/write will fail under these
circumstances.
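
To make the u32 point concrete, here is a minimal userspace sketch of how
gpio_8 -> 11 and gpio_28 -> 31 collapse into one 32-bit mask. The helper
name gpio_range_mask() is hypothetical; it is not part of gpiolib or of the
$SUBJECT patch. It only shows that the two ranges fit in a single u32 and
that a range running past bit 31 does not.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a gpiolib function): build a mask with bits
 * first..last set, inclusive. Returns 0 if the range does not fit in a u32.
 */
uint32_t gpio_range_mask(unsigned int first, unsigned int last)
{
	uint64_t width;

	if (first > last || last > 31)
		return 0;	/* invalid: would need more than one u32 */

	/* 64-bit intermediate so a full 32-bit-wide range does not overflow. */
	width = (uint64_t)last - first + 1;
	return (uint32_t)(((1ULL << width) - 1) << first);
}

/* Usage: the chunk asked about in the thread. */
uint32_t example_chunk_mask(void)
{
	uint32_t mask = gpio_range_mask(8, 11) | gpio_range_mask(28, 31);

	assert(mask == 0xF0000F00u);
	return mask;
}
```

A range such as gpio_8 -> gpio_40 makes the helper return 0, which is the
sort of case where either a bulk request or the read/write would have to
report an error.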

A limitation which springs to mind for any implementation would be if
you specify that the chunk must be accessible in irq context but you
specify some gpios which must sleep.  Yes this is a silly case and
totally the platform's responsibility but the fact is that if someone
makes this mistake an error should be returned *somewhere* rather than
just letting things explode.

In my own personal library of standalone code I've got a GPIO driver
which uses request() to build a cookie representing the gpios to be
driven; read/write then use this cookie.  If the requested pins can't be
batched then it's the request which fails.  I liked this approach.  For
$SUBJECT's purposes though I think it's more symmetric with the single
bit operations to keep request as a refcount and have read/write able to
fail.  Not too fussed though.
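
A rough sketch of the cookie approach, assuming a made-up board where every
bank is one 32-bit register and a batch is only possible within a single
bank. All names here (struct gpio_cookie, gpio_batch_request(),
gpio_batch_write(), PINS_PER_BANK) are hypothetical; this is neither the
standalone driver mentioned above nor anything in gpiolib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PINS_PER_BANK 32u	/* assumption: one 32-bit register per bank */

/* Hypothetical cookie: which bank to touch and which bits in it. */
struct gpio_cookie {
	unsigned int bank;
	uint32_t mask;
};

/*
 * Build a cookie from a list of pin numbers. Returns -1 at request time
 * if the pins cannot be driven as one batch (here: they span banks), so
 * the later read/write calls do not need to re-check.
 */
int gpio_batch_request(const unsigned int *pins, size_t npins,
		       struct gpio_cookie *cookie)
{
	if (npins == 0)
		return -1;

	cookie->bank = pins[0] / PINS_PER_BANK;
	cookie->mask = 0;

	for (size_t i = 0; i < npins; i++) {
		if (pins[i] / PINS_PER_BANK != cookie->bank)
			return -1;	/* spans banks: refuse up front */
		cookie->mask |= UINT32_C(1) << (pins[i] % PINS_PER_BANK);
	}
	return 0;
}

/*
 * Compute the new register value: requested pins take their bits from
 * 'value', every other bit of 'reg' is preserved.
 */
uint32_t gpio_batch_write(const struct gpio_cookie *cookie, uint32_t reg,
			  uint32_t value)
{
	assert(cookie != NULL);
	return (reg & ~cookie->mask) | (value & cookie->mask);
}
```

With this shape, requesting pins 8..11 plus 28..31 succeeds, while
requesting pins 8 and 40 together fails in the request rather than in the
fastpath.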

> 
> > Request/free are not really designed 
> > for that operation - they just ensure exclusive access to a gpio if
> > that's what the driver wants.  In the batch case the
> > request/free/direction operations can once again be performed by single
> > pin operations and iteration.
> 
> That depends on the semantics of "request". 
> 
> If it is "request & build up a monolithic chunk from xxx GPIO's" - then
> my definition works.

Indeed, see above.

> 
>  
> > > I have seen various hardware designs (both at the PCB and SoC level)
> > > require all of these options, and would like to see common infrastructure
> > > which handles this.
> >
> > Yeah the request/free operation doesn't deal with muxing or any other
> > platform-specific kinda gumph, that was an original design decision.
> > They're really just a usage counter.
> 
> Sorry for bringing up the muxing - that wasn't the point.
> 
> It was really the issue of being non-contiguous, spanning various implementations.
> 
> > An example which comes to mind is the avr32-specific userspace gpio
> > interface.  This takes a bitmask, loops over the set bits and fails if
> > any of the gpio are previously requested or have been assigned to
> > non-gpio peripherals. 
> 
> I'll have a look. I don't understand why everyone decided to make their own
> userspace GPIO interface - can't we all just get along? :)

Yeah for sure.  Of course since .27 there's been a nice consistent
userspace gpio interface (/sys/class/gpio) so theoretically all the
others are deprecated.  I know that at least on avrfreaks, the main
avr32 support forum, users have to have a good reason to be using the
avr32-specific interface if they expect to be helped.

> 
> > I don't really see a need to streamline this. 
> 
> 
> 
> > > I would think that a 'bank' / 'bus' (whatever) would be a collection
> > > of random/multiple GPIOs (a struct of gpio_port_t) rather than a
> > > start/length (as you described) - or better yet - the request 
> > > function takes a list (of individual GPIO's - defined in the 
> > > platform data), and creates the struct itself.
> > 
> > Hmm, this seems a little overengineered for the basic use-cases I can
> > think of. 
> 
> Not the ones I run into all the time...
> 
> More complex pin multiplexing results in less contiguous free GPIO.
> (which again - has nothing to do with multiplexing - it is the result
> that is the important thing).

Which is why we're having this bit of a discussion!  Jaya was the first
person to have this problem and has a patch which, in his case, solves
it.  If you have other concrete cases which need to be solved at the
same time then now's the time to come forward and share with the class.

OK so I've got a board which collects the status of a number of gpio all
over a bunch of chips at a few 10s of Hz, encodes it and fires it across
a network.  This works with single gpio ops but would be streamlined
with batch access.  If the final solution to this problem can be used
for my pins then great, I'll move to it.  However, if supporting my
use-case breaks Jaya's then it's not worth it.

> 
> > If this can be cranked up to the same speed as the current 
> > proposition then OK maybe someone will like it but otherwise, once
> > again, I think most people will be happy with individual operations and
> > iteration.
> 
> It will be easier to maintain (from an end user perspective - if someone
> wants a "chunk" of gpio's - they just define it in their platform data). 
> It does put a bigger burden on the person writing things.
> 
> I would think that the overhead would only be at init - runtime shouldn't
> be much different in the simple case, but allowing the complex usecases
> with the same interface is better (since we only have to teach people one
> thing) :)

Yeah I'll say again that if we can find an interface as generic as you
suggest and an implementation which performs as well as Jaya needs then
hells yeah, let's go for it.  If however a sexy interface means Jaya's
original use case breaks then I for one won't support that interface.

	--Ben.

> 
> -Robin

^ permalink raw reply

* Re: [PATCH/RFC] Add Alternative Log Buffer Support for printk Messages
From: Benjamin Herrenschmidt @ 2009-01-07  0:04 UTC (permalink / raw)
  To: Grant Erickson; +Cc: linuxppc-dev, Stefan Roese, Wolfgang Denx, linux-embedded
In-Reply-To: <1227638045-12862-1-git-send-email-gerickson@nuovations.com>

On Tue, 2008-11-25 at 10:34 -0800, Grant Erickson wrote:
> This merges support for the previously DENX-only kernel feature of
> specifying an alternative, "external" buffer for kernel printk
> messages and their associated metadata. In addition, this ports
> architecture support for this feature from arch/ppc to arch/powerpc.
> 
> Signed-off-by: Grant Erickson <gerickson@nuovations.com>

Considering the extensive changes to generic code, this patch will
have to be submitted via the linux-kernel mailing list.

I suggest you split the generic core change from the powerpc specific
implementation.

I'm not sure whether I like the idea myself or not there, so you'll have
to convince the powers that be to take it.

Cheers,
Ben.


^ permalink raw reply

* Re: [RFC 2.6.27 1/1] gpiolib: add support for batch set of pins
From: Robin Getz @ 2009-01-06 23:02 UTC (permalink / raw)
  To: Ben Nizette
  Cc: Jaya Kumar, David Brownell, Eric Miao, Sam Ravnborg, Eric Miao,
	Haavard Skinnemoen, Philipp Zabel, Russell King, Ben Gardner,
	Greg KH, linux-arm-kernel, linux-fbdev-devel, linux-kernel,
	linux-embedded
In-Reply-To: <1230501634.16910.57.camel@linux-51e8.site>

On Sun 28 Dec 2008 17:00, Ben Nizette pondered:
> On Sun, 2008-12-28 at 13:46 -0500, Robin Getz wrote:
> > > gpio_set_batch(DB0, value, 0xFFFF, 16)
> > > 
> > > which has the nice performance benefit of skipping all the bit
> > > counting in the most common use case scenario.
> > 
> > but has the requirement that the driver know exactly the board level
> > implementation details (something that doesn't sound generic).
> 
> The original use case for these batch operations was in a fastpath -
> setting data lines on a framebuffer.  Sure it's arguably not as generic
> as may be, but it optimises for speed and current usage patterns - I'm
> OK with that.  Other usage patterns which don't have a speed requirement
> can be done using the individual pin operations and a loop.

The tradeoff for speed is always made with extensibility. If you want best
speed - you shouldn't be using the GPIO framework at all - just bang directly
on the registers yourself...

If you want something extensible/generic that serves multiple use cases - 
it will normally come with a performance tradeoff...

> > > While we are here, I was thinking about it, and it's better if I give
> > > gpio_request/free/direction_batch a miss for now. Nothing prevents
> > > those features being added at a later point.
> > 
> > I don't think that request/free are optional.
> > 
> > For example - in most SoC implementations - gpios are implemented as
> > banks of 16 or 32. (a 16 or 32 bit register).
> > 
> > Are there facilities to span these registers? 
> >  - can you request 64 gpios as a 'bank'?
> >  - can you request gpio_8 -> gpio_40 as a 'bank' on a 32-bit system?
> > 
> > Are non-adjacent/non-contiguous gpios available to be put into
> > a 'bank/batch/bus'? can you use gpio_8 -> 11 &  28 -> 31 as an 8-bit
> > 'bus'?
> >
> > How do you know what is available to be talked to as a bank/bus/batch
> > without the request/free operation?
> 
> I think the read/write operations should be able to fail if you give
> them invalid chunks of gpio, sure. 

Can you define "invalid"? what are the limitations?

Can I use gpio_8 -> 11 &  28 -> 31 as a chunk?

> Request/free are not really designed 
> for that operation - they just ensure exclusive access to a gpio if
> that's what the driver wants.  In the batch case the
> request/free/direction operations can once again be performed by single
> pin operations and iteration.

That depends on the semantics of "request". 

If it is "request & build up a monolithic chunk from xxx GPIO's" - then
my definition works.

 
> > I have seen various hardware designs (both at the PCB and SoC level)
> > require all of these options, and would like to see common infrastructure
> > which handles this.
>
> Yeah the request/free operation doesn't deal with muxing or any other
> platform-specific kinda gumph, that was an original design decision.
> They're really just a usage counter.

Sorry for bringing up the muxing - that wasn't the point.

It was really the issue of being non-contiguous, spanning various implementations.

> An example which comes to mind is the avr32-specific userspace gpio
> interface.  This takes a bitmask, loops over the set bits and fails if
> any of the gpio are previously requested or have been assigned to
> non-gpio peripherals. 

I'll have a look. I don't understand why everyone decided to make their own
userspace GPIO interface - can't we all just get along? :)

> I don't really see a need to streamline this. 



> > I would think that a 'bank' / 'bus' (whatever) would be a collection
> > of random/multiple GPIOs (a struct of gpio_port_t) rather than a
> > start/length (as you described) - or better yet - the request 
> > function takes a list (of individual GPIO's - defined in the 
> > platform data), and creates the struct itself.
> 
> Hmm, this seems a little overengineered for the basic use-cases I can
> think of. 

Not the ones I run into all the time...

More complex pin multiplexing results in less contiguous free GPIO.
(which again - has nothing to do with multiplexing - it is the result
that is the important thing).

> If this can be cranked up to the same speed as the current 
> proposition then OK maybe someone will like it but otherwise, once
> again, I think most people will be happy with individual operations and
> iteration.

It will be easier to maintain (from an end user perspective - if someone
wants a "chunk" of gpio's - they just define it in their platform data). 
It does put a bigger burden on the person writing things.

I would think that the overhead would only be at init - runtime shouldn't
be much different in the simple case, but allowing the complex usecases
with the same interface is better (since we only have to teach people one
thing) :)

-Robin

^ permalink raw reply

* Re: [RFC 2.6.27 1/1] gpiolib: add support for batch set of pins
From: Robin Getz @ 2009-01-06 22:41 UTC (permalink / raw)
  To: Jaya Kumar
  Cc: David Brownell, Eric Miao, Sam Ravnborg, Eric Miao,
	Haavard Skinnemoen, Philipp Zabel, Russell King, Ben Gardner,
	Greg KH, linux-arm-kernel, linux-fbdev-devel, linux-kernel,
	linux-embedded
In-Reply-To: <45a44e480812311005k709c410ao1116187e9427e452@mail.gmail.com>

On Wed 31 Dec 2008 13:05, Jaya Kumar pondered:
> On Thu, Jan 1, 2009 at 1:38 AM, Robin Getz <rgetz@blackfin.uclinux.org> wrote:
> > On Tue 30 Dec 2008 23:58, Jaya Kumar pondered:
> >> On Tue, Dec 30, 2008 at 11:55 PM, Robin Getz <rgetz@blackfin.uclinux.org> wrote:
> >> > Yeah, I hadn't thought about spanning more than one gpio_chip. That's a good
> >> > point.
> >>
> >> The currently posted code already supports spanning more than one gpio_chip.
> >>
> >
> > But doesn't do all the other things that David suggested/requested.
> 
> Hi Robin,
> 
> Yes, you are right. My implementation does not support a driver that
> needs to set/get more than 32-bits of gpio in a single call. I'm okay
> with that restriction as I don't see a concrete use case for that.

It's not the more than 32-bits that I'm concerned about - it is spanning
more than one register. (if all the GPIOs that are left on the board are 
2, 64, and 128, where 2, and 64 are part of the SOC's GPIO, and 128 is on
a GPIO expander - which is a common use case - is this handled?)
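
To put numbers on that example, here is a small sketch that counts how many
separate register accesses a batch of pins would need. The board layout
(96 pins on the SoC in 32-bit registers, 8 pins on an expander starting at
128) and every name in it (struct chip_desc, board_chips, pin_to_register(),
batch_register_count()) are assumptions made up for illustration; this is
not how the posted patch or gpiolib represents chips.

```c
#include <assert.h>
#include <stddef.h>

#define REG_WIDTH 32u	/* assumption: 32 pins per hardware register */

/* Hypothetical description of one GPIO controller; not struct gpio_chip. */
struct chip_desc {
	unsigned int base;	/* first global pin number on this chip */
	unsigned int ngpio;	/* number of pins on this chip */
};

/* Made-up board: 96 pins on the SoC, 8 more on an expander at 128. */
static const struct chip_desc board_chips[] = {
	{ .base = 0,   .ngpio = 96 },
	{ .base = 128, .ngpio = 8 },
};

/* Map a global pin number to a unique register id, or -1 if no chip owns it. */
int pin_to_register(unsigned int pin)
{
	int reg_id = 0;

	for (size_t c = 0; c < sizeof(board_chips) / sizeof(board_chips[0]); c++) {
		const struct chip_desc *chip = &board_chips[c];

		if (pin >= chip->base && pin < chip->base + chip->ngpio)
			return reg_id + (int)((pin - chip->base) / REG_WIDTH);
		/* Skip past all registers belonging to this chip. */
		reg_id += (int)((chip->ngpio + REG_WIDTH - 1) / REG_WIDTH);
	}
	return -1;
}

/* Count distinct registers touched by a batch; -1 if any pin is unknown. */
int batch_register_count(const unsigned int *pins, size_t npins)
{
	int count = 0;

	assert(pins != NULL || npins == 0);

	for (size_t i = 0; i < npins; i++) {
		int reg = pin_to_register(pins[i]);
		int seen = 0;

		if (reg < 0)
			return -1;
		for (size_t j = 0; j < i; j++)
			if (pin_to_register(pins[j]) == reg)
				seen = 1;
		if (!seen)
			count++;
	}
	return count;
}
```

On this made-up layout, pins 2, 64 and 128 land in three different
registers on two chips, whereas gpio_8 -> 11 plus 28 -> 31 all share one.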

^ permalink raw reply

* Re: [PATCH 1/3]: Replace kernel/timeconst.pl with kernel/timeconst.sh
From: Rob Landley @ 2009-01-06  0:06 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Bernd Petrovitsch, Valdis.Kletnieks, Ingo Oeser,
	Embedded Linux mailing list, linux-kernel, Andrew Morton,
	H. Peter Anvin, Sam Ravnborg
In-Reply-To: <20090105150156.GD14503@shareable.org>

On Monday 05 January 2009 09:01:56 Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > I assume that the NFS-mounted root filesystem is a real distribution.
>
> Not unless you call uClinux (MMU-less) a real distribution, no.

I want things to be orthogonal.  The following should be completely separate 
steps:

1) Creating a cross compiler
2) building a native development environment
3) booting a native development environment (on real hardware or under an
emulator)
4) natively building your target system.

You should be able to mix and match.  Crosstool for #1, go download "fedora 
for arm" instead of #2, qemu or real hardware is your choice for #3, and then 
you should be able to natively build gentoo under an ubuntu host or vice 
versa.  (How is not currently properly documented, but I'm working on that.)

My objection to build systems like buildroot or uClinux is that they bundle 
all this together into a big hairball.  They create their own cross compiler, 
build their own root file system, use their own packaging system, and you have 
to take it all or nothing.

My build system is ruthlessly orthogonal.  I try not to make it depend on 
other bits of _itself_ more than necessary.

> > > (* - No MMU on some ARMs, but I'm working on ARM FDPIC-ELF to add
> > >      proper shared libs.  Feel free to fund this :-)
> >
> > The above mentioned ARMs have a MMU. Without MMU, it would be truly
> > insane IMHO.
>
> We have similar cross-build issues without MMUs... I.e. that a lot of
> useful packages don't cross-build properly (including many which use
> Autoconf), and it might be easier to make a native build environment
> than to debug and patch all the broken-for-cross-build packages.
> Especially as sometimes they build, but fail at run-time in some
> conditions.

If you can get a version of the same architecture with an mmu you can actually 
build natively on that.  It's not ideal (it's a bit like trying to build i486 
code on an i686; the fact it runs on the host is no guarantee it'll run on the 
target), but it's better than cross compiling.  And most things have a broad 
enough compatible "base architecture" that you can mostly get away with it.

> But you're right it's probably insane to try.  I haven't dared as I
> suspect GCC and/or Binutils would break too :-)

Oh it does, but you can fix it. :)

> I'm sticking instead with "oh well cross-build a few packages by hand
> and just don't even _try_ to use most of the handy software out there".

Cross compiling doesn't scale, and it bit-rots insanely quickly.

> You mentioned ARM Debian.  According to
> http://wiki.debian.org/ArmEabiPort one recommended method of
> bootstrapping it is building natively on an emulated ARM, because
> cross-building is fragile.

That's what my firmware linux project does too.  (I believe I was one of the 
first doing this back in 2006, but there are three or four others out there 
doing it now.)

Native compiling under emulation is an idea whose time has come.  Emulators on 
cheap x86-64 laptops today are about as powerful as high end tricked out build 
servers circa 2001, and Moore's Law continues to advance.  More memory, more 
CPU (maybe via SMP but distcc can take advantage of that today and qemu will 
develop threading someday).  You can throw engineering time at the problem 
(making cross compiling work) or you can throw hardware at the problem (build 
natively and buy fast native or emulator-hosting hardware).  The balance used 
to be in favor of the former; not so much anymore.

That said, my drive for reproducibility and orthogonality says that your 
native development environment must be something you can reproduce entirely 
from source on an arbitrary host.  You can't make cross compiling go away 
entirely, the best you can do is limit it to bootstrapping the native 
environment.  But I want to keep the parts I have to cross compile as small 
and simple as possible, and then run a native build script to get a richer 
environment.  For the past 5+ years my definition has been "an environment 
that can rebuild itself under itself is powerful enough, that's all I need to 
cross compile", and from the first time I tried this (late 2002) up until 
2.6.25 that was 7 packages.  That's why I responded to the addition of perl as 
a regression, because for my use case it was.

> -- Jamie

Rob

^ permalink raw reply

* Re: [PATCH 1/3]: Replace kernel/timeconst.pl with kernel/timeconst.sh
From: Rob Landley @ 2009-01-05 21:07 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Jamie Lokier, Valdis.Kletnieks, Ingo Oeser,
	Embedded Linux mailing list, linux-kernel, Andrew Morton,
	H. Peter Anvin, Sam Ravnborg
In-Reply-To: <1231152378.3326.14.camel@gimli.at.home>

On Monday 05 January 2009 04:46:18 Bernd Petrovitsch wrote:
> > My 850 Linux boxes are 166MHz ARMs and occasionally NFS-mounted.
> > Their /bin/sh does not do $((...)), and Bash is not there at all.
>
> I assume that the NFS-mounted root filesystem is a real distribution.
> And on the local flash is a usual busybox based firmware.

Building on an nfs mount is evil.  Make cares greatly about timestamp 
accuracy, and NFS's dentry caching doesn't really, especially when it
discards cached copies and re-fetches them, and the server and client's clocks 
are a half-second off from each other.

Sometimes you haven't got a choice, but I hate having to debug the build 
problems this intermittently causes.  If you never do anything except "make 
all" it should suck less.

> > If I were installing GCC natively on them, I'd install GNU Make and a
> > proper shell while I were at it.  But I don't know if Bash works
>
> ACK.
>
> > properly without fork()* - or even if GCC does :-)
> >
> > Perl might be hard, as shared libraries aren't supported by the
> > toolchain which targets my ARMs* and Perl likes its loadable modules.
>
> The simplest way to go is probably to use CentOS or Debian or another
> ready binary distribution on ARM (or MIPS or PPC or whatever core the
> embedded system has) possibly on a custom build kernel (if necessary).

Building natively on target hardware or under QEMU is growing in popularity.  
That's how the non-x86 versions of major distros build, and they even have 
policy documents about it.

Here's Fedora's:
http://fedoraproject.org/wiki/Architectures/ARM#Native_Compilation

And here are the guys who opened the door for Ubuntu's official Arm port:
http://mojo.handhelds.org/files/HandheldsMojo_ELC2008.pdf

Of course hobbyists like myself haven't got the budget to buy a cluster of 
high-end arm systems and they're not always even _available_ for things like 
cris, and for new architectures (Xilinx microblaze anyone?) you'll always have
to cross compile to bootstrap the first development environment on 'em anyway, 
and it's nice for your environment to be _reproducible_...

So a more flexible approach is to cross compile just enough to get a working 
native development environment on the target, and then continue the build 
natively (whether it's under qemu or on a sufficiently powerful piece of 
target hardware).  That's what my "art piece" Firmware Linux project does, and 
there's a scratchbox rewrite (sbox2, 
http://www.freedesktop.org/wiki/Software/sbox2 ) that does the same sort of 
thing, and there are others out there in various states of development.  With 
x86 hardware so cheap and powerful, building under emulation for less powerful 
targets starts to make sense.

Building natively under emulation (QEMU) is available to hobbyists like me and 
avoids most of the fun cross compiling issues you don't find out about until 
after you've shipped the system and somebody tries to do something with it you 
didn't test.  So far the record for diagnosing one of these is the two full-
time weeks my friend Garrett spent back at TimeSys tracking down why perl 
signal handling wasn't working on mips; turned out it was using x86 signal 
numbers, which don't match the mips ones.  The BSP had been shipping for
over a year at that point, but nobody had ever tried to do signal handling in 
perl on mips before, and since the perl ./configure step is written in perl 
finding the broken part took some doing.  This was back in the mists of early 
2007 so it's ancient history by now, of course...
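
As an illustration of why that class of bug goes away in a native build,
here is a minimal sketch of a name-to-number signal table that takes its
values from whatever <signal.h> the compiler sees. On Linux, SIGUSR1 is 10
on x86 but 16 on mips, so a table filled in from the build host's headers
is wrong on the target. The names sig_table and signal_number() are
hypothetical and have nothing to do with how perl's configure step actually
does it.

```c
#define _POSIX_C_SOURCE 200809L	/* expose the POSIX signal names */

#include <assert.h>
#include <signal.h>
#include <string.h>

struct sig_entry {
	const char *name;
	int number;
};

/*
 * The numbers are whatever the compiling system's <signal.h> says.
 * Built natively (or with a correctly set up cross toolchain) they
 * match the target; copied from the host they may not.
 */
static const struct sig_entry sig_table[] = {
	{ "SIGHUP",  SIGHUP },
	{ "SIGINT",  SIGINT },
	{ "SIGUSR1", SIGUSR1 },
	{ "SIGUSR2", SIGUSR2 },
	{ "SIGCHLD", SIGCHLD },
	{ "SIGTERM", SIGTERM },
};

/* Look up a signal by name; returns -1 if the name is not in the table. */
int signal_number(const char *name)
{
	assert(name != NULL);

	for (size_t i = 0; i < sizeof(sig_table) / sizeof(sig_table[0]); i++)
		if (strcmp(sig_table[i].name, name) == 0)
			return sig_table[i].number;
	return -1;
}
```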

If you have set up a cross compiler, you can configure QEMU to use distcc to 
call out through its virtual network to the cross compiler running on the 
host, which gives you a speed boost without reintroducing most of the horrible 
cross compiling issues: there's still only a native toolchain so your build 
doesn't have to keep two contexts (hostcc/targetcc) straight, ./configure 
still runs natively so any binaries it builds can run and any questions it 
asks about the host it's building on should give the right answers for the 
target it's building for (including uname -m and friends), headers are 
#included natively and libraries are linked natively (that's just how distcc 
works, preprocessing and linking happen on the local machine) and there's only 
one set so they can't accidentally mix and the cross compiler isn't even 
_involved_ in that, make runs natively so it won't get confused by strange 
environment variables (yeah, seen that one)...  Only the heavy lifting of
compiling preprocessed .c files to .o files gets exported, which is the one 
thing the cross compiler can't really screw up.

But bootstrapping a native build environment to run under the emulator is
something you want to keep down to as few packages as possible, because if 
you're trying to get the same behavior across half a dozen boards, cross 
compiling breaks every time you upgrade _anything_.

> [...]
>
> > (* - No MMU on some ARMs, but I'm working on ARM FDPIC-ELF to add
> >      proper shared libs.  Feel free to fund this :-)
>
> The above mentioned ARMs have a MMU. Without MMU, it would be truly
> insane IMHO.

Without an mmu you have a restricted set of packages that run anyway.  No 
variable length stacks, you have to use vfork() instead of fork() (no copy on 
write), memory fragmentation is a big problem so malloc() fails way more 
often...
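
For anyone who hasn't had to do it, the vfork() pattern looks roughly like
the sketch below. The function name run_program() is made up for this
example. The key constraint is that the child borrows the parent's memory
until it calls exec or _exit(), so it must do nothing else in between.

```c
#define _DEFAULT_SOURCE		/* glibc: make vfork() visible */

#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a program and return its exit status, or -1 on failure.
 * Works with or without an mmu; on nommu, vfork() is the only option.
 */
int run_program(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	pid = vfork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: exec immediately, touch nothing else. */
		execvp(argv[0], argv);
		_exit(127);	/* exec failed; never return from the child */
	}

	/* Parent resumes only after the child has exec'd or exited. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything written for fork() that sets up state in the child before exec
(or never execs at all) has to be restructured to fit this shape, which is
a large part of why so many packages don't just work on nommu.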

So toolchain problems aren't a "hump" to get past on nommu systems: the area
past that isn't necessarily any easier.

> 	Bernd

Rob

^ permalink raw reply

* Re: [PATCH 1/3]: Replace kernel/timeconst.pl with kernel/timeconst.sh
From: Bernd Petrovitsch @ 2009-01-05 16:18 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Rob Landley, Valdis.Kletnieks, Ingo Oeser,
	Embedded Linux mailing list, linux-kernel, Andrew Morton,
	H. Peter Anvin, Sam Ravnborg
In-Reply-To: <20090105150156.GD14503@shareable.org>

On Mon, 2009-01-05 at 15:01 +0000, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > I assume that the NFS-mounted root filesystem is a real distribution.
> 
> Not unless you call uClinux (MMU-less) a real distribution, no.

Not really.

> > > (* - No MMU on some ARMs, but I'm working on ARM FDPIC-ELF to add
> > >      proper shared libs.  Feel free to fund this :-)
> > 
> > The above mentioned ARMs have a MMU. Without MMU, it would be truly
> > insane IMHO.
> 
> We have similar cross-build issues without MMUs... I.e. that a lot of

Of course.

> useful packages don't cross-build properly (including many which use
> Autoconf), and it might be easier to make a native build environment

Tell me about it - AC_TRY_RUN() is the culprit.
And `pkg-config` has supported cross-compilation only for the last 18 months
or so.  Before that, one had to rewrite the generated .pc files.

[...]
> You mentioned ARM Debian.  According to
> http://wiki.debian.org/ArmEabiPort one recommended method of
> bootstrapping it is building natively on an emulated ARM, because
> cross-building is fragile.

That's of course the other solution - if qemu supports your
$EMBEDDED_CPU well enough.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services

^ permalink raw reply

* Re: [PATCH 1/3]: Replace kernel/timeconst.pl with kernel/timeconst.sh
From: Jamie Lokier @ 2009-01-05 15:01 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Rob Landley, Valdis.Kletnieks, Ingo Oeser,
	Embedded Linux mailing list, linux-kernel, Andrew Morton,
	H. Peter Anvin, Sam Ravnborg
In-Reply-To: <1231152378.3326.14.camel@gimli.at.home>

Bernd Petrovitsch wrote:
> I assume that the NFS-mounted root filesystem is a real distribution.

Not unless you call uClinux (MMU-less) a real distribution, no.

> > (* - No MMU on some ARMs, but I'm working on ARM FDPIC-ELF to add
> >      proper shared libs.  Feel free to fund this :-)
> 
> The above mentioned ARMs have a MMU. Without MMU, it would be truly
> insane IMHO.

We have similar cross-build issues without MMUs... I.e. that a lot of
useful packages don't cross-build properly (including many which use
Autoconf), and it might be easier to make a native build environment
than to debug and patch all the broken-for-cross-build packages.
Especially as sometimes they build, but fail at run-time in some
conditions.

But you're right it's probably insane to try.  I haven't dared as I
suspect GCC and/or Binutils would break too :-)

I'm sticking instead with "oh well cross-build a few packages by hand
and just don't even _try_ to use most of the handy software out there".

You mentioned ARM Debian.  According to
http://wiki.debian.org/ArmEabiPort one recommended method of
bootstrapping it is building natively on an emulated ARM, because
cross-building is fragile.

-- Jamie

^ permalink raw reply

* Re: [PATCH V3 12/17] Squashfs: header files
From: Pekka Enberg @ 2009-01-05 13:32 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr
In-Reply-To: <E1LJnJo-0002Lz-2y@dylan.lougher.demon.co.uk>

Hi Phillip,

On Mon, Jan 5, 2009 at 1:08 PM, Phillip Lougher
<phillip@lougher.demon.co.uk> wrote:
> +#define TRACE(s, args...)      pr_debug("SQUASHFS: "s, ## args)

You've probably heard this before but silly "tracing" such as:

    TRACE("Entered squashfs_fill_superblock\n");

should really be removed from the filesystem code.

> +#define ERROR(s, args...)      pr_err("SQUASHFS error: "s, ## args)
> +
> +#define WARNING(s, args...)    pr_warning("SQUASHFS: "s, ## args)

I think you're supposed to #define pr_fmt() in your header instead of
adding wrappers like these.
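
To illustrate the idea, here is a minimal userspace sketch, not kernel
code.  It assumes GNU C variadic macros; log_to_buf() is a hypothetical
stand-in for printk(), and the pr_err()/pr_warning() macros below only
imitate the kernel helpers.  In the kernel itself, defining pr_fmt()
before the includes should be enough for pr_err() and friends to pick
up the prefix.

```c
#include <stdarg.h>
#include <stdio.h>

/* Per-file prefix; must be defined before the logging macros are used. */
#define pr_fmt(fmt) "SQUASHFS: " fmt

/* Hypothetical stand-in for printk(): formats into a caller's buffer. */
static int log_to_buf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* Userspace imitations of the kernel helpers; both go through pr_fmt(). */
#define pr_err(buf, len, fmt, ...) \
	log_to_buf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)
#define pr_warning(buf, len, fmt, ...) \
	log_to_buf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)
```

With this, the prefix lives in one place and the ERROR()/WARNING()
wrappers are no longer needed.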

                        Pekka

^ permalink raw reply

* Re: [PATCH V3 00/17] Squashfs: compressed read-only filesystem
From: Evgeniy Polyakov @ 2009-01-05 13:27 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr
In-Reply-To: <E1LJnJn-0002Kx-Dd@dylan.lougher.demon.co.uk>

Hi Phillip.

On Mon, Jan 05, 2009 at 11:08:23AM +0000, Phillip Lougher (phillip@lougher.demon.co.uk) wrote:
> This a second respin of the Squashfs patches incorporating the review comments
> received.  Thanks to everyone who have sent comments.
> 
> Summary of changes in patch respin:
> 
> 1. Vmalloc removed, smaller PAGE_CACHE_SIZE buffers are now allocated
> 2. Renamed some global functions, prefixing with squashfs_
> 3. brelse changed to put_bh
> 4. cache->lock coverage extended in squashfs_put_cache() and
>    squashfs_cache_get()
> 5. New squashfs.txt file in Documentation/filesystems
> 6. Changed 'long long' usage to u64 for variables referring to 64-bit
>    filesystem locations
> 7. SQUASHFS_I() renamed to squashfs_i()
> 8. Renamed locked variable to refcount to clarify usage
> 9. Renamed waiting variable to num_waiters, making it clear it is a count
>    rather than a boolean
> 10. Made pending and error fields int rather than char


Looks good.
You can also update the year in the copyright string in the files :)

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [PATCH V3 01/17] Squashfs: inode operations
From: Evgeniy Polyakov @ 2009-01-05 13:20 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr
In-Reply-To: <E1LJnJn-0002L2-GZ@dylan.lougher.demon.co.uk>

Hi.

On Mon, Jan 05, 2009 at 11:08:23AM +0000, Phillip Lougher (phillip@lougher.demon.co.uk) wrote:
> +int squashfs_read_inode(struct inode *inode, long long ino)
> +{
> +	struct super_block *sb = inode->i_sb;
> +	struct squashfs_sb_info *msblk = sb->s_fs_info;
> +	u64 block = SQUASHFS_INODE_BLK(ino) + msblk->inode_table;
> +	int err, type, offset = SQUASHFS_INODE_OFFSET(ino);
> +	union squashfs_inode squashfs_ino;
> +	struct squashfs_base_inode *sqshb_ino = &squashfs_ino.base;
> +

What's the size of that union? It sits on the stack, so if it is big
enough it will lead to problems.
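
To make the concern concrete: a union is as large as its largest
member, and squashfs_ino above is an automatic variable.  A minimal
sketch with made-up layouts follows.  The structs and the 128-byte
budget are hypothetical, not the real squashfs definitions.

```c
#include <stddef.h>

/* Hypothetical on-disk layouts, for illustration only. */
struct demo_base_inode {
	unsigned short inode_type;
	unsigned short mode;
	unsigned int   mtime;
};

struct demo_ext_inode {
	unsigned short inode_type;
	unsigned short mode;
	unsigned int   mtime;
	unsigned char  extra[48];
};

union demo_inode {
	struct demo_base_inode base;
	struct demo_ext_inode  ext;
};

/* Compile-time guard against the union outgrowing an assumed stack budget. */
_Static_assert(sizeof(union demo_inode) <= 128,
	       "union demo_inode too large for the stack budget");

/* Stack bytes taken by one automatic "union demo_inode" variable. */
static size_t demo_inode_stack_bytes(void)
{
	return sizeof(union demo_inode);
}
```

In kernel code the equivalent guard would presumably be a
BUILD_BUG_ON() on sizeof(union squashfs_inode).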

> +	TRACE("Entered squashfs_read_inode\n");
> +
> +	/*
> +	 * Read inode base common to all inode types.
> +	 */
> +	err = squashfs_read_metadata(sb, sqshb_ino, &block,
> +				&offset, sizeof(*sqshb_ino));
> +	if (err < 0)
> +		goto failed_read;
> +
> +	err = squashfs_new_inode(sb, inode, sqshb_ino);
> +	if (err)
> +		goto failed_read;
> +
> +	block = SQUASHFS_INODE_BLK(ino) + msblk->inode_table;
> +	offset = SQUASHFS_INODE_OFFSET(ino);
> +
> +	type = le16_to_cpu(sqshb_ino->inode_type);
> +	switch (type) {
> +	case SQUASHFS_REG_TYPE: {
> +		unsigned int frag_offset, frag_size, frag;
> +		u64 frag_blk;

Above variables can be moved out of the switch, since they are used in
some other cases too.

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [PATCH V3 02/17] Squashfs: directory lookup operations
From: Evgeniy Polyakov @ 2009-01-05 13:09 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr
In-Reply-To: <E1LJnJn-0002L7-I8@dylan.lougher.demon.co.uk>

Hi Phillip.

One possible 'show-stopper' below and a couple of trivial points.

On Mon, Jan 05, 2009 at 11:08:23AM +0000, Phillip Lougher (phillip@lougher.demon.co.uk) wrote:
> +static int get_dir_index_using_name(struct super_block *sb,
> +			u64 *next_block, int *next_offset, u64 index_start,
> +			int index_offset, int i_count, const char *name,
> +			int len)
> +{
> +	struct squashfs_sb_info *msblk = sb->s_fs_info;
> +	int i, size, length = 0, err;
> +	struct squashfs_dir_index *index;
> +	char *str;
> +
> +	TRACE("Entered get_dir_index_using_name, i_count %d\n", i_count);
> +
> +	index = kmalloc(sizeof(*index) + SQUASHFS_NAME_LEN * 2 + 2, GFP_KERNEL);
> +	if (index == NULL) {
> +		ERROR("Failed to allocate squashfs_dir_index\n");
> +		goto out;
> +	}
> +
> +	str = &index->name[SQUASHFS_NAME_LEN + 1];
> +	strncpy(str, name, len);
> +	str[len] = '\0';
> +
> +	for (i = 0; i < i_count; i++) {
> +		err = squashfs_read_metadata(sb, index, &index_start,
> +					&index_offset, sizeof(*index));
> +		if (err < 0)
> +			break;
> +
> +

Double new line. Code-style purists will scream when they see this.
This was a show-stopper.

> +		size = le32_to_cpu(index->size) + 1;
> +
> +		err = squashfs_read_metadata(sb, index->name, &index_start,
> +					&index_offset, size);
> +		if (err < 0)
> +			break;
> +
> +		index->name[size] = '\0';
> +
> +		if (strcmp(index->name, str) > 0)
> +			break;
> +
> +		length = le32_to_cpu(index->index);
> +		*next_block = le32_to_cpu(index->start_block) +
> +					msblk->directory_table;
> +	}
> +
> +	*next_offset = (length + *next_offset) % SQUASHFS_METADATA_SIZE;
> +	kfree(index);
> +
> +out:
> +	/*
> +	 * Return index (f_pos) of the looked up metadata block.  Translate
> +	 * from internal f_pos to external f_pos which is offset by 3 because
> +	 * we invent "." and ".." entries which are not actually stored in the
> +	 * directory.
> +	 */
> +	return length + 3;
> +}
> +
> +

Another double new-line show-stopper.

> +static struct dentry *squashfs_lookup(struct inode *dir, struct dentry *dentry,
> +				 struct nameidata *nd)
> +{
> +	const unsigned char *name = dentry->d_name.name;
> +	int len = dentry->d_name.len;
> +	struct inode *inode = NULL;
> +	struct squashfs_sb_info *msblk = dir->i_sb->s_fs_info;
> +	struct squashfs_dir_header dirh;
> +	struct squashfs_dir_entry *dire;
> +	u64 block = squashfs_i(dir)->start + msblk->directory_table;
> +	int offset = squashfs_i(dir)->offset;
> +	int err, length = 0, dir_count, size;
> +
> +	TRACE("Entered squashfs_lookup [%llx:%x]\n", block, offset);
> +
> +	dire = kmalloc(sizeof(*dire) + SQUASHFS_NAME_LEN + 1, GFP_KERNEL);
> +	if (dire == NULL) {
> +		ERROR("Failed to allocate squashfs_dir_entry\n");
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	if (len > SQUASHFS_NAME_LEN) {
> +		err = -ENAMETOOLONG;
> +		goto failed;
> +	}
> +
> +	length = get_dir_index_using_name(dir->i_sb, &block, &offset,
> +				squashfs_i(dir)->dir_idx_start,
> +				squashfs_i(dir)->dir_idx_offset,
> +				squashfs_i(dir)->dir_idx_cnt, name, len);
> +

You do not check the return value here.
Plus, the dir entry allocation can be done after the above len check.
This one is trivial.
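
A minimal userspace sketch of the ordering I mean, using malloc() in
place of kmalloc().  demo_dup_name() and DEMO_NAME_LEN are hypothetical
names for illustration, not squashfs code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NAME_LEN 256	/* hypothetical, stands in for SQUASHFS_NAME_LEN */

/*
 * Copy a name into a freshly allocated buffer.
 * Returns 0 on success or a negative errno; *out is set only on success.
 */
static int demo_dup_name(const char *name, size_t len, char **out)
{
	char *buf;

	/* Cheap validation first: nothing to free on this path. */
	if (len > DEMO_NAME_LEN)
		return -ENAMETOOLONG;

	buf = malloc(len + 1);
	if (buf == NULL)
		return -ENOMEM;

	memcpy(buf, name, len);
	buf[len] = '\0';
	*out = buf;
	return 0;
}
```

The caller then checks the return value and only touches the buffer
when it is 0.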

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [PATCH 1/3]: Replace kernel/timeconst.pl with kernel/timeconst.sh
From: Bernd Petrovitsch @ 2009-01-05 12:29 UTC (permalink / raw)
  To: Rob Landley
  Cc: Jamie Lokier, Valdis.Kletnieks, Ingo Oeser,
	Embedded Linux mailing list, linux-kernel, Andrew Morton,
	H. Peter Anvin, Sam Ravnborg
In-Reply-To: <200901042250.36847.rob@landley.net>

On Sun, 2009-01-04 at 22:50 -0600, Rob Landley wrote:
> On Sunday 04 January 2009 18:15:30 Bernd Petrovitsch wrote:
[...]
> > ACK. A bash can IMHO be expected. Even going for `dash` is IMHO somewhat
> > too extreme.
> 
> I have yet to encounter a system that uses dash _without_ bash.  (All ubuntu 

Hmm, that should be doable quite cheaply and simply with a chroot environment.

> variants, even jeos, install bash by default.  They moved the /bin/sh symlink 

Yes, I know (small) embedded systems that have a bash (and not "only"
one of the busybox shells). It also makes it easier to write reasonably
fast shell scripts without the need for lots of fork()s+exec()s.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services

^ permalink raw reply

* Re: [PATCH V3 11/17] Squashfs: block operations
From: Jörn Engel @ 2009-01-05 12:18 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr
In-Reply-To: <E1LJnJo-0002Lt-0v@dylan.lougher.demon.co.uk>

Again, none of my comments are criticism against merging squashfs.  Code
improvements should continue independently.

On Mon, 5 January 2009 11:08:24 +0000, Phillip Lougher wrote:
> +/*
> + * Read and decompress a metadata block or datablock.  Length is non-zero
> + * if a datablock is being read (the size is stored elsewhere in the
> + * filesystem), otherwise the length is obtained from the first two bytes of
> + * the metadata block.  A bit in the length field indicates if the block
> + * is stored uncompressed in the filesystem (usually because compression
> + * generated a larger block - this does occasionally happen with zlib).
> + */

This is the core function of squashfs.  As such, it could probably use
some kerneldoc that explains each parameter and the return value.

> +int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
> +			int length, u64 *next_index, int srclength)
> +{
> +	struct squashfs_sb_info *msblk = sb->s_fs_info;
> +	struct buffer_head **bh;
> +	int offset = index & ((1 << msblk->devblksize_log2) - 1);
> +	u64 cur_index = index >> msblk->devblksize_log2;
> +	int bytes, compressed, b = 0, k = 0, page = 0, avail;
> +
> +
> +	bh = kcalloc((msblk->block_size >> msblk->devblksize_log2) + 1,
> +				sizeof(*bh), GFP_KERNEL);
> +	if (bh == NULL)
> +		return -ENOMEM;
> +
> +	if (length) {
> +		/*
> +		 * Datablock.
> +		 */
> +		bytes = -offset;
> +		compressed = SQUASHFS_COMPRESSED_BLOCK(length);
> +		length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length);
> +		if (next_index)
> +			*next_index = index + length;
> +
> +		TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
> +			index, compressed ? "" : "un", length, srclength);
> +
> +		if (length < 0 || length > srclength ||
> +				(index + length) > msblk->bytes_used)
> +			goto read_failure;
> +
> +		for (b = 0; bytes < length; b++, cur_index++) {
> +			bh[b] = sb_getblk(sb, cur_index);
> +			if (bh[b] == NULL)
> +				goto block_release;
> +			bytes += msblk->devblksize;
> +		}
> +		ll_rw_block(READ, b, bh);
> +	} else {
> +		/*
> +		 * Metadata block.
> +		 */
> +		if ((index + 2) > msblk->bytes_used)
> +			goto read_failure;
> +
> +		bh[0] = get_block_length(sb, &cur_index, &offset, &length);
> +		if (bh[0] == NULL)
> +			goto read_failure;
> +		b = 1;
> +
> +		bytes = msblk->devblksize - offset;
> +		compressed = SQUASHFS_COMPRESSED(length);
> +		length = SQUASHFS_COMPRESSED_SIZE(length);
> +		if (next_index)
> +			*next_index = index + length + 2;
> +
> +		TRACE("Block @ 0x%llx, %scompressed size %d\n", index,
> +				compressed ? "" : "un", length);
> +
> +		if (length < 0 || length > srclength ||
> +					(index + length) > msblk->bytes_used)
> +			goto block_release;
> +
> +		for (; bytes < length; b++) {
> +			bh[b] = sb_getblk(sb, ++cur_index);
> +			if (bh[b] == NULL)
> +				goto block_release;
> +			bytes += msblk->devblksize;
> +		}
> +		ll_rw_block(READ, b - 1, bh + 1);
> +	}
> +
> +	if (compressed) {

This looks like a prime candidate for a separate function.

> +		int zlib_err = 0, zlib_init = 0;
> +
> +		/*
> +		 * Uncompress block.
> +		 */
> +
> +		mutex_lock(&msblk->read_data_mutex);
> +
> +		msblk->stream.avail_out = 0;
> +		msblk->stream.avail_in = 0;
> +
> +		bytes = length;
> +		do {
> +			if (msblk->stream.avail_in == 0 && k < b) {
> +				avail = min(bytes, msblk->devblksize - offset);
> +				bytes -= avail;
> +				wait_on_buffer(bh[k]);
> +				if (!buffer_uptodate(bh[k]))
> +					goto release_mutex;
> +
> +				if (avail == 0) {
> +					offset = 0;
> +					put_bh(bh[k++]);
> +					continue;
> +				}
> +
> +				msblk->stream.next_in = bh[k]->b_data + offset;
> +				msblk->stream.avail_in = avail;
> +				offset = 0;
> +			}
> +
> +			if (msblk->stream.avail_out == 0) {
> +				msblk->stream.next_out = buffer[page++];
> +				msblk->stream.avail_out = PAGE_CACHE_SIZE;
> +			}
> +
> +			if (!zlib_init) {

Could be moved outside the main loop.

> +				zlib_err = zlib_inflateInit(&msblk->stream);
> +				if (zlib_err != Z_OK) {
> +					ERROR("zlib_inflateInit returned"
> +						" unexpected result 0x%x,"
> +						" srclength %d\n", zlib_err,
> +						srclength);
> +					goto release_mutex;
> +				}
> +				zlib_init = 1;
> +			}
> +
> +			zlib_err = zlib_inflate(&msblk->stream, Z_NO_FLUSH);
> +
> +			if (msblk->stream.avail_in == 0 && k < b)
> +				put_bh(bh[k++]);
> +		} while (zlib_err == Z_OK);
> +
> +		if (zlib_err != Z_STREAM_END) {
> +			ERROR("zlib_inflate returned unexpected result"
> +				" 0x%x, srclength %d, avail_in %d,"
> +				" avail_out %d\n", zlib_err, srclength,
> +				msblk->stream.avail_in,
> +				msblk->stream.avail_out);
> +			goto release_mutex;
> +		}
> +
> +		zlib_err = zlib_inflateEnd(&msblk->stream);
> +		if (zlib_err != Z_OK) {
> +			ERROR("zlib_inflateEnd returned unexpected result 0x%x,"
> +				" srclength %d\n", zlib_err, srclength);
> +			goto release_mutex;
> +		}
> +		length = msblk->stream.total_out;
> +		mutex_unlock(&msblk->read_data_mutex);
> +	} else {

Another candidate for a separate function.  The complete condition could
look roughly like this:

	if (compressed)
		err = read_compressed_block(...);
	else
		err = read_uncompressed_block(...);

out:
	kfree(bh);
	if (err) {
		ERROR("sb_bread failed reading block 0x%llx\n", cur_index);
		return -EIO;
	}
	return length;

The only disadvantage I can see is the put_bh() loop in the error case
that would be duplicated in both read_*_block() functions.
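
For what it's worth, here is a self-contained userspace sketch of that
shape.  Everything in it is hypothetical: a toy run-length decoder
stands in for zlib, malloc() stands in for the bh array, and the
demo_* names are made up.  It only shows the dispatch and the single
exit path, not the buffer-head handling.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy run-length decoder standing in for zlib: input is (count, byte) pairs. */
static int demo_read_compressed(const unsigned char *src, int srclen,
				unsigned char *dst, int dstlen)
{
	int i, out = 0;

	if (srclen % 2)
		return -1;
	for (i = 0; i < srclen; i += 2) {
		if (src[i] > dstlen - out)
			return -1;	/* output buffer too small */
		memset(dst + out, src[i + 1], src[i]);
		out += src[i];
	}
	return out;
}

/* Uncompressed blocks are copied through unchanged. */
static int demo_read_uncompressed(const unsigned char *src, int srclen,
				  unsigned char *dst, int dstlen)
{
	if (srclen > dstlen)
		return -1;
	memcpy(dst, src, srclen);
	return srclen;
}

/*
 * Dispatcher with one exit path: scratch memory is freed in exactly one
 * place, and any helper failure is reported as -EIO.
 */
static int demo_read_data(const unsigned char *src, int srclen, int compressed,
			  unsigned char *dst, int dstlen)
{
	void *scratch = malloc(16);	/* plays the role of the bh array */
	int ret;

	if (scratch == NULL)
		return -ENOMEM;

	if (compressed)
		ret = demo_read_compressed(src, srclen, dst, dstlen);
	else
		ret = demo_read_uncompressed(src, srclen, dst, dstlen);

	free(scratch);
	return ret < 0 ? -EIO : ret;
}
```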

> +		/*
> +		 * Block is uncompressed.
> +		 */
> +		int i, in, pg_offset = 0;
> +
> +		for (i = 0; i < b; i++) {
> +			wait_on_buffer(bh[i]);
> +			if (!buffer_uptodate(bh[i]))
> +				goto block_release;
> +		}
> +
> +		for (bytes = length; k < b; k++) {
> +			in = min(bytes, msblk->devblksize - offset);
> +			bytes -= in;
> +			while (in) {
> +				if (pg_offset == PAGE_CACHE_SIZE) {
> +					page++;
> +					pg_offset = 0;
> +				}
> +				avail = min_t(int, in, PAGE_CACHE_SIZE -
> +						pg_offset);
> +				memcpy(buffer[page] + pg_offset,
> +						bh[k]->b_data + offset, avail);
> +				in -= avail;
> +				pg_offset += avail;
> +				offset += avail;
> +			}
> +			offset = 0;
> +			put_bh(bh[k]);
> +		}
> +	}
> +
> +	kfree(bh);
> +	return length;
> +
> +release_mutex:
> +	mutex_unlock(&msblk->read_data_mutex);
> +
> +block_release:
> +	for (; k < b; k++)
> +		put_bh(bh[k]);
> +
> +read_failure:
> +	ERROR("sb_bread failed reading block 0x%llx\n", cur_index);
> +	kfree(bh);
> +	return -EIO;
> +}
> -- 
> 1.5.6.3
> 

Jörn

-- 
Optimizations always bust things, because all optimizations are, in
the long haul, a form of cheating, and cheaters eventually get caught.
-- Larry Wall

^ permalink raw reply

* [PATCH V3 17/17] MAINTAINERS: squashfs entry
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 MAINTAINERS |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index befacf0..6ed506f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4049,6 +4049,13 @@ L:	cbe-oss-dev@ozlabs.org
 W:	http://www.ibm.com/developerworks/power/cell/
 S:	Supported
 
+SQUASHFS FILE SYSTEM
+P:	Phillip Lougher
+M:	phillip@lougher.demon.co.uk
+L:	squashfs-devel@lists.sourceforge.net (subscribers-only)
+W:	http://squashfs.org.uk
+S:	Maintained
+
 SRM (Alpha) environment access
 P:	Jan-Benedict Glaw
 M:	jbglaw@lug-owl.de
-- 
1.5.6.3

^ permalink raw reply related

* [PATCH V3 16/17] Squashfs: documentation
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 Documentation/filesystems/squashfs.txt |  225 ++++++++++++++++++++++++++++++++
 1 files changed, 225 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/squashfs.txt b/Documentation/filesystems/squashfs.txt
new file mode 100644
index 0000000..3e79e4a
--- /dev/null
+++ b/Documentation/filesystems/squashfs.txt
@@ -0,0 +1,225 @@
+SQUASHFS 4.0 FILESYSTEM
+=======================
+
+Squashfs is a compressed read-only filesystem for Linux.
+It uses zlib compression to compress files, inodes and directories.
+Inodes in the system are very small and all blocks are packed to minimise
+data overhead. Block sizes greater than 4K are supported up to a maximum
+of 1Mbytes (default block size 128K).
+
+Squashfs is intended for general read-only filesystem use, for archival
+use (i.e. in cases where a .tar.gz file may be used), and in constrained
+block device/memory systems (e.g. embedded systems) where low overhead is
+needed.
+
+Mailing list: squashfs-devel@lists.sourceforge.net
+Web site: www.squashfs.org
+
+1. FILESYSTEM FEATURES
+----------------------
+
+Squashfs filesystem features versus Cramfs:
+
+				Squashfs		Cramfs
+
+Max filesystem size:		2^64			16 MiB
+Max file size:			~ 2 TiB			16 MiB
+Max files:			unlimited		unlimited
+Max directories:		unlimited		unlimited
+Max entries per directory:	unlimited		unlimited
+Max block size:			1 MiB			4 KiB
+Metadata compression:		yes			no
+Directory indexes:		yes			no
+Sparse file support:		yes			no
+Tail-end packing (fragments):	yes			no
+Exportable (NFS etc.):		yes			no
+Hard link support:		yes			no
+"." and ".." in readdir:	yes			no
+Real inode numbers:		yes			no
+32-bit uids/gids:		yes			no
+File creation time:		yes			no
+Xattr and ACL support:		no			no
+
+Squashfs compresses data, inodes and directories.  In addition, inode and
+directory data are highly compacted, and packed on byte boundaries.  Each
+compressed inode is on average 8 bytes in length (the exact length varies on
+file type, i.e. regular file, directory, symbolic link, and block/char device
+inodes have different sizes).
+
+2. USING SQUASHFS
+-----------------
+
+As squashfs is a read-only filesystem, the mksquashfs program must be used to
+create populated squashfs filesystems.  This and other squashfs utilities
+can be obtained from http://www.squashfs.org.  Usage instructions can be
+obtained from this site also.
+
+
+3. SQUASHFS FILESYSTEM DESIGN
+-----------------------------
+
+A squashfs filesystem consists of seven parts, packed together on a byte
+alignment:
+
+	 ---------------
+	|  superblock 	|
+	|---------------|
+	|  datablocks   |
+	|  & fragments  |
+	|---------------|
+	|  inode table	|
+	|---------------|
+	|   directory	|
+	|     table     |
+	|---------------|
+	|   fragment	|
+	|    table      |
+	|---------------|
+	|    export     |
+	|    table      |
+	|---------------|
+	|    uid/gid	|
+	|  lookup table	|
+	 ---------------
+
+Compressed data blocks are written to the filesystem as files are read from
+the source directory, and checked for duplicates.  Once all file data has been
+written the completed inode, directory, fragment, export and uid/gid lookup
+tables are written.
+
+3.1 Inodes
+----------
+
+Metadata (inodes and directories) are compressed in 8Kbyte blocks.  Each
+compressed block is prefixed by a two byte length, the top bit is set if the
+block is uncompressed.  A block will be uncompressed if the -noI option is set,
+or if the compressed block was larger than the uncompressed block.
+
+Inodes are packed into the metadata blocks, and are not aligned to block
+boundaries, therefore inodes overlap compressed blocks.  Inodes are identified
+by a 48-bit number which encodes the location of the compressed metadata block
+containing the inode, and the byte offset into that block where the inode is
+placed (<block, offset>).
+
+To maximise compression there are different inodes for each file type
+(regular file, directory, device, etc.), the inode contents and length
+varying with the type.
+
+To further maximise compression, two types of regular file inode and
+directory inode are defined: inodes optimised for frequently occurring
+regular files and directories, and extended types where extra
+information has to be stored.
+
+3.2 Directories
+---------------
+
+Like inodes, directories are packed into compressed metadata blocks, stored
+in a directory table.  Directories are accessed using the start address of
+the metablock containing the directory and the offset into the
+decompressed block (<block, offset>).
+
+Directories are organised in a slightly complex way, and are not simply
+a list of file names.  The organisation takes advantage of the
+fact that (in most cases) the inodes of the files will be in the same
+compressed metadata block, and therefore, can share the start block.
+Directories are therefore organised in a two level list, a directory
+header containing the shared start block value, and a sequence of directory
+entries, each of which share the shared start block.  A new directory header
+is written once/if the inode start block changes.  The directory
+header/directory entry list is repeated as many times as necessary.
+
+Directories are sorted, and can contain a directory index to speed up
+file lookup.  Directory indexes store one entry per metablock, each entry
+storing the index/filename mapping to the first directory header
+in each metadata block.  Directories are sorted in alphabetical order,
+and at lookup the index is scanned linearly looking for the first filename
+alphabetically larger than the filename being looked up.  At this point the
+location of the metadata block the filename is in has been found.
+The general idea of the index is to ensure only one metadata block needs to be
+decompressed to do a lookup irrespective of the length of the directory.
+This scheme has the advantage that it doesn't require extra memory overhead
+and doesn't require much extra storage on disk.
+
+3.3 File data
+-------------
+
+Regular files consist of a sequence of contiguous compressed blocks, and/or a
+compressed fragment block (tail-end packed block).   The compressed size
+of each datablock is stored in a block list contained within the
+file inode.
+
+To speed up access to datablocks when reading 'large' files (256 Mbytes or
+larger), the code implements an index cache that caches the mapping from
+block index to datablock location on disk.
+
+The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
+retaining a simple and space-efficient block list on disk.  The cache
+is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
+Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
+The index cache is designed to be memory efficient, and by default uses
+16 KiB.
+
+3.4 Fragment lookup table
+-------------------------
+
+Regular files can contain a fragment index which is mapped to a fragment
+location on disk and compressed size using a fragment lookup table.  This
+fragment lookup table is itself stored compressed into metadata blocks.
+A second index table is used to locate these.  This second index table for
+speed of access (and because it is small) is read at mount time and cached
+in memory.
+
+3.5 Uid/gid lookup table
+------------------------
+
+For space efficiency regular files store uid and gid indexes, which are
+converted to 32-bit uids/gids using an id look up table.  This table is
+stored compressed into metadata blocks.  A second index table is used to
+locate these.  This second index table for speed of access (and because it
+is small) is read at mount time and cached in memory.
+
+3.6 Export table
+----------------
+
+To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems
+can optionally (disabled with the -no-exports Mksquashfs option) contain
+an inode number to inode disk location lookup table.  This is required to
+enable Squashfs to map inode numbers passed in filehandles to the inode
+location on disk, which is necessary when the export code reinstantiates
+expired/flushed inodes.
+
+This table is stored compressed into metadata blocks.  A second index table is
+used to locate these.  This second index table for speed of access (and because
+it is small) is read at mount time and cached in memory.
+
+
+4. TODOS AND OUTSTANDING ISSUES
+-------------------------------
+
+4.1 Todo list
+-------------
+
+Implement Xattr and ACL support.  The Squashfs 4.0 filesystem layout has hooks
+for these but the code has not been written.  Once the code has been written
+the existing layout should not require modification.
+
+4.2 Squashfs internal cache
+---------------------------
+
+Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
+recently accessed data Squashfs uses two small metadata and fragment caches.
+
+The cache is not used for file datablocks, these are decompressed and cached in
+the page-cache in the normal way.  The cache is used to temporarily cache
+fragment and metadata blocks which have been read as a result of a metadata
+(i.e. inode or directory) or fragment access.  Because metadata and fragments
+are packed together into blocks (to gain greater compression) the read of a
+particular piece of metadata or fragment will retrieve other metadata/fragments
+which have been packed with it, these because of locality-of-reference may be
+read in the near future. Temporarily caching them ensures they are available
+for near future access without requiring an additional read and decompress.
+
+In the future this internal cache may be replaced with an implementation which
+uses the kernel page cache.  Because the page cache operates on page sized
+units this may introduce additional complexity in terms of locking and
+associated race conditions.
-- 
1.5.6.3
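
As an illustration of section 3.1 above, a minimal userspace sketch of
decoding the two-byte metadata length header and of splitting an inode
number into <block, offset>.  The demo_* names are hypothetical.  The
little-endian byte order follows get_block_length() in the block
operations patch; the 32/16-bit split of the 48-bit inode number is an
assumption here, and the real macros may handle special cases
differently.

```c
#include <stdint.h>

#define DEMO_UNCOMPRESSED_BIT 0x8000u	/* top bit of the 16-bit length */

/* Assemble the little-endian two-byte header from raw bytes. */
static unsigned int demo_header(const unsigned char *p)
{
	return p[0] | (unsigned int)p[1] << 8;
}

/* Nonzero if the block is stored compressed (top bit clear). */
static int demo_is_compressed(unsigned int header)
{
	return !(header & DEMO_UNCOMPRESSED_BIT);
}

/* On-disk size in bytes: the header with the flag bit masked off. */
static unsigned int demo_size(unsigned int header)
{
	return header & ~DEMO_UNCOMPRESSED_BIT;
}

/* Assumed split: metadata block location in the upper 32 bits ... */
static uint32_t demo_inode_blk(uint64_t ino)
{
	return (uint32_t)(ino >> 16);
}

/* ... and the byte offset into the decompressed block in the lower 16. */
static uint16_t demo_inode_offset(uint64_t ino)
{
	return (uint16_t)(ino & 0xffff);
}
```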


^ permalink raw reply related

* [PATCH V3 11/17] Squashfs: block operations
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 fs/squashfs/block.c |  274 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 274 insertions(+), 0 deletions(-)

diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
new file mode 100644
index 0000000..c837dfc
--- /dev/null
+++ b/fs/squashfs/block.c
@@ -0,0 +1,274 @@
+/*
+ * Squashfs - a compressed read only filesystem for Linux
+ *
+ * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+ * Phillip Lougher <phillip@lougher.demon.co.uk>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * block.c
+ */
+
+/*
+ * This file implements the low-level routines to read and decompress
+ * datablocks and metadata blocks.
+ */
+
+#include <linux/fs.h>
+#include <linux/vfs.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/string.h>
+#include <linux/buffer_head.h>
+#include <linux/zlib.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "squashfs_fs_i.h"
+#include "squashfs.h"
+
+/*
+ * Read the metadata block length, this is stored in the first two
+ * bytes of the metadata block.
+ */
+static struct buffer_head *get_block_length(struct super_block *sb,
+			u64 *cur_index, int *offset, int *length)
+{
+	struct squashfs_sb_info *msblk = sb->s_fs_info;
+	struct buffer_head *bh;
+
+	bh = sb_bread(sb, *cur_index);
+	if (bh == NULL)
+		return NULL;
+
+	if (msblk->devblksize - *offset == 1) {
+		*length = (unsigned char) bh->b_data[*offset];
+		put_bh(bh);
+		bh = sb_bread(sb, ++(*cur_index));
+		if (bh == NULL)
+			return NULL;
+		*length |= (unsigned char) bh->b_data[0] << 8;
+		*offset = 1;
+	} else {
+		*length = (unsigned char) bh->b_data[*offset] |
+			(unsigned char) bh->b_data[*offset + 1] << 8;
+		*offset += 2;
+	}
+
+	return bh;
+}
+
+
+/*
+ * Read and decompress a metadata block or datablock.  Length is non-zero
+ * if a datablock is being read (the size is stored elsewhere in the
+ * filesystem), otherwise the length is obtained from the first two bytes of
+ * the metadata block.  A bit in the length field indicates if the block
+ * is stored uncompressed in the filesystem (usually because compression
+ * generated a larger block - this does occasionally happen with zlib).
+ */
+int squashfs_read_data(struct super_block *sb, void **buffer, u64 index,
+			int length, u64 *next_index, int srclength)
+{
+	struct squashfs_sb_info *msblk = sb->s_fs_info;
+	struct buffer_head **bh;
+	int offset = index & ((1 << msblk->devblksize_log2) - 1);
+	u64 cur_index = index >> msblk->devblksize_log2;
+	int bytes, compressed, b = 0, k = 0, page = 0, avail;
+
+
+	bh = kcalloc((msblk->block_size >> msblk->devblksize_log2) + 1,
+				sizeof(*bh), GFP_KERNEL);
+	if (bh == NULL)
+		return -ENOMEM;
+
+	if (length) {
+		/*
+		 * Datablock.
+		 */
+		bytes = -offset;
+		compressed = SQUASHFS_COMPRESSED_BLOCK(length);
+		length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length);
+		if (next_index)
+			*next_index = index + length;
+
+		TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
+			index, compressed ? "" : "un", length, srclength);
+
+		if (length < 0 || length > srclength ||
+				(index + length) > msblk->bytes_used)
+			goto read_failure;
+
+		for (b = 0; bytes < length; b++, cur_index++) {
+			bh[b] = sb_getblk(sb, cur_index);
+			if (bh[b] == NULL)
+				goto block_release;
+			bytes += msblk->devblksize;
+		}
+		ll_rw_block(READ, b, bh);
+	} else {
+		/*
+		 * Metadata block.
+		 */
+		if ((index + 2) > msblk->bytes_used)
+			goto read_failure;
+
+		bh[0] = get_block_length(sb, &cur_index, &offset, &length);
+		if (bh[0] == NULL)
+			goto read_failure;
+		b = 1;
+
+		bytes = msblk->devblksize - offset;
+		compressed = SQUASHFS_COMPRESSED(length);
+		length = SQUASHFS_COMPRESSED_SIZE(length);
+		if (next_index)
+			*next_index = index + length + 2;
+
+		TRACE("Block @ 0x%llx, %scompressed size %d\n", index,
+				compressed ? "" : "un", length);
+
+		if (length < 0 || length > srclength ||
+					(index + length) > msblk->bytes_used)
+			goto block_release;
+
+		for (; bytes < length; b++) {
+			bh[b] = sb_getblk(sb, ++cur_index);
+			if (bh[b] == NULL)
+				goto block_release;
+			bytes += msblk->devblksize;
+		}
+		ll_rw_block(READ, b - 1, bh + 1);
+	}
+
+	if (compressed) {
+		int zlib_err = 0, zlib_init = 0;
+
+		/*
+		 * Uncompress block.
+		 */
+
+		mutex_lock(&msblk->read_data_mutex);
+
+		msblk->stream.avail_out = 0;
+		msblk->stream.avail_in = 0;
+
+		bytes = length;
+		do {
+			if (msblk->stream.avail_in == 0 && k < b) {
+				avail = min(bytes, msblk->devblksize - offset);
+				bytes -= avail;
+				wait_on_buffer(bh[k]);
+				if (!buffer_uptodate(bh[k]))
+					goto release_mutex;
+
+				if (avail == 0) {
+					offset = 0;
+					put_bh(bh[k++]);
+					continue;
+				}
+
+				msblk->stream.next_in = bh[k]->b_data + offset;
+				msblk->stream.avail_in = avail;
+				offset = 0;
+			}
+
+			if (msblk->stream.avail_out == 0) {
+				msblk->stream.next_out = buffer[page++];
+				msblk->stream.avail_out = PAGE_CACHE_SIZE;
+			}
+
+			if (!zlib_init) {
+				zlib_err = zlib_inflateInit(&msblk->stream);
+				if (zlib_err != Z_OK) {
+					ERROR("zlib_inflateInit returned"
+						" unexpected result 0x%x,"
+						" srclength %d\n", zlib_err,
+						srclength);
+					goto release_mutex;
+				}
+				zlib_init = 1;
+			}
+
+			zlib_err = zlib_inflate(&msblk->stream, Z_NO_FLUSH);
+
+			if (msblk->stream.avail_in == 0 && k < b)
+				put_bh(bh[k++]);
+		} while (zlib_err == Z_OK);
+
+		if (zlib_err != Z_STREAM_END) {
+			ERROR("zlib_inflate returned unexpected result"
+				" 0x%x, srclength %d, avail_in %d,"
+				" avail_out %d\n", zlib_err, srclength,
+				msblk->stream.avail_in,
+				msblk->stream.avail_out);
+			goto release_mutex;
+		}
+
+		zlib_err = zlib_inflateEnd(&msblk->stream);
+		if (zlib_err != Z_OK) {
+			ERROR("zlib_inflateEnd returned unexpected result 0x%x,"
+				" srclength %d\n", zlib_err, srclength);
+			goto release_mutex;
+		}
+		length = msblk->stream.total_out;
+		mutex_unlock(&msblk->read_data_mutex);
+	} else {
+		/*
+		 * Block is uncompressed.
+		 */
+		int i, in, pg_offset = 0;
+
+		for (i = 0; i < b; i++) {
+			wait_on_buffer(bh[i]);
+			if (!buffer_uptodate(bh[i]))
+				goto block_release;
+		}
+
+		for (bytes = length; k < b; k++) {
+			in = min(bytes, msblk->devblksize - offset);
+			bytes -= in;
+			while (in) {
+				if (pg_offset == PAGE_CACHE_SIZE) {
+					page++;
+					pg_offset = 0;
+				}
+				avail = min_t(int, in, PAGE_CACHE_SIZE -
+						pg_offset);
+				memcpy(buffer[page] + pg_offset,
+						bh[k]->b_data + offset, avail);
+				in -= avail;
+				pg_offset += avail;
+				offset += avail;
+			}
+			offset = 0;
+			put_bh(bh[k]);
+		}
+	}
+
+	kfree(bh);
+	return length;
+
+release_mutex:
+	mutex_unlock(&msblk->read_data_mutex);
+
+block_release:
+	for (; k < b; k++)
+		put_bh(bh[k]);
+
+read_failure:
+	ERROR("sb_bread failed reading block 0x%llx\n", cur_index);
+	kfree(bh);
+	return -EIO;
+}
-- 
1.5.6.3
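
For readers following get_block_length() in the patch above: the two-byte
metadata length is stored little-endian and may straddle two device blocks.
A minimal userspace sketch of that case follows. It assumes the device is
modelled as one flat byte array; the names read_le16_straddle, dev and
blksize are illustrative and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of get_block_length(): "dev" is the whole
 * device as one byte array, split into blocks of "blksize" bytes.
 * On return, *blk and *off point at the first byte after the length field.
 */
static unsigned int read_le16_straddle(const uint8_t *dev, size_t blksize,
				       size_t *blk, size_t *off)
{
	const uint8_t *cur = dev + *blk * blksize;
	unsigned int length;

	assert(*off < blksize);

	if (blksize - *off == 1) {
		/* The low byte is the last byte of this block ... */
		length = cur[*off];
		/* ... and the high byte is the first byte of the next one. */
		cur = dev + ++(*blk) * blksize;
		length |= (unsigned int)cur[0] << 8;
		*off = 1;
	} else {
		/* Both bytes live in the current block. */
		length = cur[*off] | (unsigned int)cur[*off + 1] << 8;
		*off += 2;
	}
	return length;
}
```

The returned value is still the raw on-disk field; the "uncompressed" bit
has not been masked off at this point.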



* [PATCH V3 14/17] Squashfs: Kconfig entry
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 fs/Kconfig |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index ff0e819..2553e0b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -894,6 +894,58 @@ config CRAMFS
 
 	  If unsure, say N.
 
+config SQUASHFS
+	tristate "SquashFS 4.0 - Squashed file system support"
+	depends on BLOCK
+	select ZLIB_INFLATE
+	help
+	  Saying Y here includes support for SquashFS 4.0 (a Compressed
+	  Read-Only File System).  Squashfs is a highly compressed read-only
+	  filesystem for Linux.  It uses zlib compression to compress files,
+	  inodes and directories.  Inodes in the system are very small
+	  and all blocks are packed to minimise data overhead.  Block sizes
+	  greater than 4K are supported up to a maximum of 1 Mbyte (default
+	  block size 128K).  SquashFS 4.0 supports 64 bit filesystems and files
+	  (larger than 4GB), full uid/gid information, hard links and
+	  timestamps.
+
+	  Squashfs is intended for general read-only filesystem use, for
+	  archival use (i.e. in cases where a .tar.gz file may be used), and in
+	  embedded systems where low overhead is needed.  Further information
+	  and tools are available from http://squashfs.sourceforge.net.
+
+	  If you want to compile this as a module ( = code which can be
+	  inserted in and removed from the running kernel whenever you want),
+	  say M here and read <file:Documentation/modules.txt>.  The module
+	  will be called squashfs.  Note that the root file system (the one
+	  containing the directory /) cannot be compiled as a module.
+
+	  If unsure, say N.
+
+config SQUASHFS_EMBEDDED
+
+	bool "Additional option for memory-constrained systems"
+	depends on SQUASHFS
+	default n
+	help
+	  Saying Y here allows you to specify cache size.
+
+	  If unsure, say N.
+
+config SQUASHFS_FRAGMENT_CACHE_SIZE
+	int "Number of fragments cached" if SQUASHFS_EMBEDDED
+	depends on SQUASHFS
+	default "3"
+	help
+	  By default SquashFS caches the last 3 fragments read from
+	  the filesystem.  Increasing this amount may mean SquashFS
+	  has to re-read fragments less often from disk, at the expense
+	  of extra system memory.  Decreasing this amount will mean
+	  SquashFS uses less memory at the expense of extra reads from disk.
+
+	  Note there must be at least one cached fragment.  Anything
+	  much more than three will probably not make much difference.
+
 config VXFS_FS
 	tristate "FreeVxFS file system support (VERITAS VxFS(TM) compatible)"
 	depends on BLOCK
-- 
1.5.6.3
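
As a rough guide to the memory trade-off described in the
SQUASHFS_FRAGMENT_CACHE_SIZE help text above, the cache costs about
(entries x block size) bytes. The sketch below is back-of-the-envelope
arithmetic only. It assumes each cache entry buffers up to one data block
and ignores per-entry bookkeeping; the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate fragment-cache footprint: one block-sized buffer per entry. */
static size_t fragment_cache_bytes(unsigned int entries, size_t block_size)
{
	assert(entries >= 1);	/* the help text requires at least one */
	return (size_t)entries * block_size;
}
```

With the default of 3 entries and the default 128K block size this gives
384K; with the maximum 1 Mbyte block size the same 3 entries need 3 Mbytes.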



* [PATCH V3 12/17] Squashfs: header files
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 fs/squashfs/squashfs.h       |   90 ++++++++++
 fs/squashfs/squashfs_fs.h    |  381 ++++++++++++++++++++++++++++++++++++++++++
 fs/squashfs/squashfs_fs_i.h  |   45 +++++
 fs/squashfs/squashfs_fs_sb.h |   76 +++++++++
 4 files changed, 592 insertions(+), 0 deletions(-)

diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h
new file mode 100644
index 0000000..6b2515d
--- /dev/null
+++ b/fs/squashfs/squashfs.h
@@ -0,0 +1,90 @@
+/*
+ * Squashfs - a compressed read only filesystem for Linux
+ *
+ * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+ * Phillip Lougher <phillip@lougher.demon.co.uk>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * squashfs.h
+ */
+
+#define TRACE(s, args...)	pr_debug("SQUASHFS: "s, ## args)
+
+#define ERROR(s, args...)	pr_err("SQUASHFS error: "s, ## args)
+
+#define WARNING(s, args...)	pr_warning("SQUASHFS: "s, ## args)
+
+static inline struct squashfs_inode_info *squashfs_i(struct inode *inode)
+{
+	return container_of(inode, struct squashfs_inode_info, vfs_inode);
+}
+
+/* block.c */
+extern int squashfs_read_data(struct super_block *, void **, u64, int, u64 *,
+				int);
+
+/* cache.c */
+extern struct squashfs_cache *squashfs_cache_init(char *, int, int);
+extern void squashfs_cache_delete(struct squashfs_cache *);
+extern struct squashfs_cache_entry *squashfs_cache_get(struct super_block *,
+				struct squashfs_cache *, u64, int);
+extern void squashfs_cache_put(struct squashfs_cache_entry *);
+extern int squashfs_copy_data(void *, struct squashfs_cache_entry *, int, int);
+extern int squashfs_read_metadata(struct super_block *, void *, u64 *,
+				int *, int);
+extern struct squashfs_cache_entry *squashfs_get_fragment(struct super_block *,
+				u64, int);
+extern struct squashfs_cache_entry *squashfs_get_datablock(struct super_block *,
+				u64, int);
+extern int squashfs_read_table(struct super_block *, void *, u64, int);
+
+/* export.c */
+extern __le64 *squashfs_read_inode_lookup_table(struct super_block *, u64,
+				unsigned int);
+
+/* fragment.c */
+extern int squashfs_frag_lookup(struct super_block *, unsigned int, u64 *);
+extern __le64 *squashfs_read_fragment_index_table(struct super_block *,
+				u64, unsigned int);
+
+/* id.c */
+extern int squashfs_get_id(struct super_block *, unsigned int, unsigned int *);
+extern __le64 *squashfs_read_id_index_table(struct super_block *, u64,
+				unsigned short);
+
+/* inode.c */
+extern struct inode *squashfs_iget(struct super_block *, long long,
+				unsigned int);
+extern int squashfs_read_inode(struct inode *, long long);
+
+/*
+ * Inode and file operations
+ */
+
+/* dir.c */
+extern const struct file_operations squashfs_dir_ops;
+
+/* export.c */
+extern const struct export_operations squashfs_export_ops;
+
+/* file.c */
+extern const struct address_space_operations squashfs_aops;
+
+/* namei.c */
+extern const struct inode_operations squashfs_dir_inode_ops;
+
+/* symlink.c */
+extern const struct address_space_operations squashfs_symlink_aops;
diff --git a/fs/squashfs/squashfs_fs.h b/fs/squashfs/squashfs_fs.h
new file mode 100644
index 0000000..6840da1
--- /dev/null
+++ b/fs/squashfs/squashfs_fs.h
@@ -0,0 +1,381 @@
+#ifndef SQUASHFS_FS
+#define SQUASHFS_FS
+/*
+ * Squashfs
+ *
+ * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+ * Phillip Lougher <phillip@lougher.demon.co.uk>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * squashfs_fs.h
+ */
+
+#define SQUASHFS_CACHED_FRAGMENTS	CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE
+#define SQUASHFS_MAJOR			4
+#define SQUASHFS_MINOR			0
+#define SQUASHFS_MAGIC			0x73717368
+#define SQUASHFS_START			0
+
+/* size of metadata (inode and directory) blocks */
+#define SQUASHFS_METADATA_SIZE		8192
+#define SQUASHFS_METADATA_LOG		13
+
+/* default size of data blocks */
+#define SQUASHFS_FILE_SIZE		131072
+#define SQUASHFS_FILE_LOG		17
+
+#define SQUASHFS_FILE_MAX_SIZE		1048576
+#define SQUASHFS_FILE_MAX_LOG		20
+
+/* Max number of uids and gids */
+#define SQUASHFS_IDS			65536
+
+/* Max length of filename (not 255) */
+#define SQUASHFS_NAME_LEN		256
+
+#define SQUASHFS_INVALID_FRAG		(0xffffffffU)
+#define SQUASHFS_INVALID_BLK		(-1LL)
+
+/* Filesystem flags */
+#define SQUASHFS_NOI			0
+#define SQUASHFS_NOD			1
+#define SQUASHFS_NOF			3
+#define SQUASHFS_NO_FRAG		4
+#define SQUASHFS_ALWAYS_FRAG		5
+#define SQUASHFS_DUPLICATE		6
+#define SQUASHFS_EXPORT			7
+
+#define SQUASHFS_BIT(flag, bit)		((flag >> bit) & 1)
+
+#define SQUASHFS_UNCOMPRESSED_INODES(flags)	SQUASHFS_BIT(flags, \
+						SQUASHFS_NOI)
+
+#define SQUASHFS_UNCOMPRESSED_DATA(flags)	SQUASHFS_BIT(flags, \
+						SQUASHFS_NOD)
+
+#define SQUASHFS_UNCOMPRESSED_FRAGMENTS(flags)	SQUASHFS_BIT(flags, \
+						SQUASHFS_NOF)
+
+#define SQUASHFS_NO_FRAGMENTS(flags)		SQUASHFS_BIT(flags, \
+						SQUASHFS_NO_FRAG)
+
+#define SQUASHFS_ALWAYS_FRAGMENTS(flags)	SQUASHFS_BIT(flags, \
+						SQUASHFS_ALWAYS_FRAG)
+
+#define SQUASHFS_DUPLICATES(flags)		SQUASHFS_BIT(flags, \
+						SQUASHFS_DUPLICATE)
+
+#define SQUASHFS_EXPORTABLE(flags)		SQUASHFS_BIT(flags, \
+						SQUASHFS_EXPORT)
+
+/* Inode types (the L* values are the extended variants) */
+#define SQUASHFS_DIR_TYPE		1
+#define SQUASHFS_REG_TYPE		2
+#define SQUASHFS_SYMLINK_TYPE		3
+#define SQUASHFS_BLKDEV_TYPE		4
+#define SQUASHFS_CHRDEV_TYPE		5
+#define SQUASHFS_FIFO_TYPE		6
+#define SQUASHFS_SOCKET_TYPE		7
+#define SQUASHFS_LDIR_TYPE		8
+#define SQUASHFS_LREG_TYPE		9
+#define SQUASHFS_LSYMLINK_TYPE		10
+#define SQUASHFS_LBLKDEV_TYPE		11
+#define SQUASHFS_LCHRDEV_TYPE		12
+#define SQUASHFS_LFIFO_TYPE		13
+#define SQUASHFS_LSOCKET_TYPE		14
+
+/* Flags whether a block is compressed or uncompressed; the bit is set if the
+ * block is uncompressed */
+#define SQUASHFS_COMPRESSED_BIT		(1 << 15)
+
+#define SQUASHFS_COMPRESSED_SIZE(B)	(((B) & ~SQUASHFS_COMPRESSED_BIT) ? \
+		(B) & ~SQUASHFS_COMPRESSED_BIT :  SQUASHFS_COMPRESSED_BIT)
+
+#define SQUASHFS_COMPRESSED(B)		(!((B) & SQUASHFS_COMPRESSED_BIT))
+
+#define SQUASHFS_COMPRESSED_BIT_BLOCK	(1 << 24)
+
+#define SQUASHFS_COMPRESSED_SIZE_BLOCK(B)	((B) & \
+						~SQUASHFS_COMPRESSED_BIT_BLOCK)
+
+#define SQUASHFS_COMPRESSED_BLOCK(B)	(!((B) & SQUASHFS_COMPRESSED_BIT_BLOCK))
+
+/*
+ * Inode number ops.  Inodes consist of a compressed block number, and an
+ * uncompressed offset within that block
+ */
+#define SQUASHFS_INODE_BLK(A)		((unsigned int) ((A) >> 16))
+
+#define SQUASHFS_INODE_OFFSET(A)	((unsigned int) ((A) & 0xffff))
+
+#define SQUASHFS_MKINODE(A, B)		((long long)(((long long) (A)\
+					<< 16) + (B)))
+
+/* Translate between VFS mode and squashfs mode */
+#define SQUASHFS_MODE(A)		((A) & 0xfff)
+
+/* fragment and fragment table defines */
+#define SQUASHFS_FRAGMENT_BYTES(A)	\
+				((A) * sizeof(struct squashfs_fragment_entry))
+
+#define SQUASHFS_FRAGMENT_INDEX(A)	(SQUASHFS_FRAGMENT_BYTES(A) / \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_FRAGMENT_INDEX_OFFSET(A)	(SQUASHFS_FRAGMENT_BYTES(A) % \
+						SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_FRAGMENT_INDEXES(A)	((SQUASHFS_FRAGMENT_BYTES(A) + \
+					SQUASHFS_METADATA_SIZE - 1) / \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_FRAGMENT_INDEX_BYTES(A)	(SQUASHFS_FRAGMENT_INDEXES(A) *\
+						sizeof(u64))
+
+/* inode lookup table defines */
+#define SQUASHFS_LOOKUP_BYTES(A)	((A) * sizeof(u64))
+
+#define SQUASHFS_LOOKUP_BLOCK(A)	(SQUASHFS_LOOKUP_BYTES(A) / \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_LOOKUP_BLOCK_OFFSET(A)	(SQUASHFS_LOOKUP_BYTES(A) % \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_LOOKUP_BLOCKS(A)	((SQUASHFS_LOOKUP_BYTES(A) + \
+					SQUASHFS_METADATA_SIZE - 1) / \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_LOOKUP_BLOCK_BYTES(A)	(SQUASHFS_LOOKUP_BLOCKS(A) *\
+					sizeof(u64))
+
+/* uid/gid lookup table defines */
+#define SQUASHFS_ID_BYTES(A)		((A) * sizeof(unsigned int))
+
+#define SQUASHFS_ID_BLOCK(A)		(SQUASHFS_ID_BYTES(A) / \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_ID_BLOCK_OFFSET(A)	(SQUASHFS_ID_BYTES(A) % \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_ID_BLOCKS(A)		((SQUASHFS_ID_BYTES(A) + \
+					SQUASHFS_METADATA_SIZE - 1) / \
+					SQUASHFS_METADATA_SIZE)
+
+#define SQUASHFS_ID_BLOCK_BYTES(A)	(SQUASHFS_ID_BLOCKS(A) *\
+					sizeof(u64))
+
+/* cached data constants for filesystem */
+#define SQUASHFS_CACHED_BLKS		8
+
+#define SQUASHFS_MAX_FILE_SIZE_LOG	64
+
+#define SQUASHFS_MAX_FILE_SIZE		(1LL << \
+					(SQUASHFS_MAX_FILE_SIZE_LOG - 2))
+
+#define SQUASHFS_MARKER_BYTE		0xff
+
+/* meta index cache */
+#define SQUASHFS_META_INDEXES	(SQUASHFS_METADATA_SIZE / sizeof(unsigned int))
+#define SQUASHFS_META_ENTRIES	127
+#define SQUASHFS_META_SLOTS	8
+
+struct meta_entry {
+	u64			data_block;
+	unsigned int		index_block;
+	unsigned short		offset;
+	unsigned short		pad;
+};
+
+struct meta_index {
+	unsigned int		inode_number;
+	unsigned int		offset;
+	unsigned short		entries;
+	unsigned short		skip;
+	unsigned short		locked;
+	unsigned short		pad;
+	struct meta_entry	meta_entry[SQUASHFS_META_ENTRIES];
+};
+
+
+/*
+ * definitions for structures on disk
+ */
+#define ZLIB_COMPRESSION	 1
+
+struct squashfs_super_block {
+	__le32			s_magic;
+	__le32			inodes;
+	__le32			mkfs_time;
+	__le32			block_size;
+	__le32			fragments;
+	__le16			compression;
+	__le16			block_log;
+	__le16			flags;
+	__le16			no_ids;
+	__le16			s_major;
+	__le16			s_minor;
+	__le64			root_inode;
+	__le64			bytes_used;
+	__le64			id_table_start;
+	__le64			xattr_table_start;
+	__le64			inode_table_start;
+	__le64			directory_table_start;
+	__le64			fragment_table_start;
+	__le64			lookup_table_start;
+};
+
+struct squashfs_dir_index {
+	__le32			index;
+	__le32			start_block;
+	__le32			size;
+	unsigned char		name[0];
+};
+
+struct squashfs_base_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+};
+
+struct squashfs_ipc_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+	__le32			nlink;
+};
+
+struct squashfs_dev_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+	__le32			nlink;
+	__le32			rdev;
+};
+
+struct squashfs_symlink_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+	__le32			nlink;
+	__le32			symlink_size;
+	char			symlink[0];
+};
+
+struct squashfs_reg_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+	__le32			start_block;
+	__le32			fragment;
+	__le32			offset;
+	__le32			file_size;
+	__le16			block_list[0];
+};
+
+struct squashfs_lreg_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+	__le64			start_block;
+	__le64			file_size;
+	__le64			sparse;
+	__le32			nlink;
+	__le32			fragment;
+	__le32			offset;
+	__le32			xattr;
+	__le16			block_list[0];
+};
+
+struct squashfs_dir_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+	__le32			start_block;
+	__le32			nlink;
+	__le16			file_size;
+	__le16			offset;
+	__le32			parent_inode;
+};
+
+struct squashfs_ldir_inode {
+	__le16			inode_type;
+	__le16			mode;
+	__le16			uid;
+	__le16			guid;
+	__le32			mtime;
+	__le32	 		inode_number;
+	__le32			nlink;
+	__le32			file_size;
+	__le32			start_block;
+	__le32			parent_inode;
+	__le16			i_count;
+	__le16			offset;
+	__le32			xattr;
+	struct squashfs_dir_index	index[0];
+};
+
+union squashfs_inode {
+	struct squashfs_base_inode		base;
+	struct squashfs_dev_inode		dev;
+	struct squashfs_symlink_inode		symlink;
+	struct squashfs_reg_inode		reg;
+	struct squashfs_lreg_inode		lreg;
+	struct squashfs_dir_inode		dir;
+	struct squashfs_ldir_inode		ldir;
+	struct squashfs_ipc_inode		ipc;
+};
+
+struct squashfs_dir_entry {
+	__le16			offset;
+	__le16			inode_number;
+	__le16			type;
+	__le16			size;
+	char			name[0];
+};
+
+struct squashfs_dir_header {
+	__le32			count;
+	__le32			start_block;
+	__le32			inode_number;
+};
+
+struct squashfs_fragment_entry {
+	__le64			start_block;
+	__le32			size;
+	unsigned int		unused;
+};
+
+#endif
diff --git a/fs/squashfs/squashfs_fs_i.h b/fs/squashfs/squashfs_fs_i.h
new file mode 100644
index 0000000..fbfca30
--- /dev/null
+++ b/fs/squashfs/squashfs_fs_i.h
@@ -0,0 +1,45 @@
+#ifndef SQUASHFS_FS_I
+#define SQUASHFS_FS_I
+/*
+ * Squashfs
+ *
+ * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+ * Phillip Lougher <phillip@lougher.demon.co.uk>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * squashfs_fs_i.h
+ */
+
+struct squashfs_inode_info {
+	u64		start;
+	int		offset;
+	union {
+		struct {
+			u64		fragment_block;
+			int		fragment_size;
+			int		fragment_offset;
+			u64		block_list_start;
+		};
+		struct {
+			u64		dir_idx_start;
+			int		dir_idx_offset;
+			int		dir_idx_cnt;
+			int		parent;
+		};
+	};
+	struct inode	vfs_inode;
+};
+#endif
diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h
new file mode 100644
index 0000000..c8c6561
--- /dev/null
+++ b/fs/squashfs/squashfs_fs_sb.h
@@ -0,0 +1,76 @@
+#ifndef SQUASHFS_FS_SB
+#define SQUASHFS_FS_SB
+/*
+ * Squashfs
+ *
+ * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+ * Phillip Lougher <phillip@lougher.demon.co.uk>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * squashfs_fs_sb.h
+ */
+
+#include "squashfs_fs.h"
+
+struct squashfs_cache {
+	char			*name;
+	int			entries;
+	int			next_blk;
+	int			num_waiters;
+	int			unused;
+	int			block_size;
+	int			pages;
+	spinlock_t		lock;
+	wait_queue_head_t	wait_queue;
+	struct squashfs_cache_entry *entry;
+};
+
+struct squashfs_cache_entry {
+	u64			block;
+	int			length;
+	int			refcount;
+	u64			next_index;
+	int			pending;
+	int			error;
+	int			num_waiters;
+	wait_queue_head_t	wait_queue;
+	struct squashfs_cache	*cache;
+	void			**data;
+};
+
+struct squashfs_sb_info {
+	int			devblksize;
+	int			devblksize_log2;
+	struct squashfs_cache	*block_cache;
+	struct squashfs_cache	*fragment_cache;
+	struct squashfs_cache	*read_page;
+	int			next_meta_index;
+	__le64			*id_table;
+	__le64			*fragment_index;
+	unsigned int		*fragment_index_2;
+	struct mutex		read_data_mutex;
+	struct mutex		meta_index_mutex;
+	struct meta_index	*meta_index;
+	z_stream		stream;
+	__le64			*inode_lookup_table;
+	u64			inode_table;
+	u64			directory_table;
+	unsigned int		block_size;
+	unsigned short		block_log;
+	long long		bytes_used;
+	unsigned int		inodes;
+};
+#endif
-- 
1.5.6.3
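
The length-field macros in squashfs_fs.h above pack a size and an
"uncompressed" flag into one integer: bit 15 for metadata blocks and bit 24
for data blocks. The following userspace sketch restates them as functions,
under the assumption that the raw field has already been converted to host
byte order. The struct and function names are illustrative and do not
appear in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define COMPRESSED_BIT		(1u << 15)	/* metadata blocks */
#define COMPRESSED_BIT_BLOCK	(1u << 24)	/* data blocks */

struct block_len {
	uint32_t size;		/* bytes stored on disk */
	bool compressed;	/* false when the "uncompressed" bit is set */
};

/* Mirrors SQUASHFS_COMPRESSED() and SQUASHFS_COMPRESSED_SIZE(). */
static struct block_len decode_metadata_len(uint16_t raw)
{
	struct block_len r;
	uint32_t size = raw & ~COMPRESSED_BIT;

	r.compressed = !(raw & COMPRESSED_BIT);
	/* A size field of 0 is read as 32768, exactly as in the macro. */
	r.size = size ? size : COMPRESSED_BIT;
	return r;
}

/* Mirrors SQUASHFS_COMPRESSED_BLOCK() and SQUASHFS_COMPRESSED_SIZE_BLOCK(). */
static struct block_len decode_datablock_len(uint32_t raw)
{
	struct block_len r;

	r.compressed = !(raw & COMPRESSED_BIT_BLOCK);
	r.size = raw & ~COMPRESSED_BIT_BLOCK;
	return r;
}
```

Note the inverted sense in both cases: a set bit means the block is stored
uncompressed.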



* [PATCH V3 15/17] Squashfs: initrd support
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 init/do_mounts_rd.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index a7c748f..0f0f0cf 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -9,6 +9,7 @@
 #include <linux/string.h>
 
 #include "do_mounts.h"
+#include "../fs/squashfs/squashfs_fs.h"
 
 int __initdata rd_prompt = 1;/* 1 = prompt for RAM disk, 0 = don't prompt */
 
@@ -41,6 +42,7 @@ static int __init crd_load(int in_fd, int out_fd);
  * 	ext2
  *	romfs
  *	cramfs
+ *	squashfs
  * 	gzip
  */
 static int __init 
@@ -51,6 +53,7 @@ identify_ramdisk_image(int fd, int start_block)
 	struct ext2_super_block *ext2sb;
 	struct romfs_super_block *romfsb;
 	struct cramfs_super *cramfsb;
+	struct squashfs_super_block *squashfsb;
 	int nblocks = -1;
 	unsigned char *buf;
 
@@ -62,6 +65,7 @@ identify_ramdisk_image(int fd, int start_block)
 	ext2sb = (struct ext2_super_block *) buf;
 	romfsb = (struct romfs_super_block *) buf;
 	cramfsb = (struct cramfs_super *) buf;
+	squashfsb = (struct squashfs_super_block *) buf;
 	memset(buf, 0xe5, size);
 
 	/*
@@ -99,6 +103,16 @@ identify_ramdisk_image(int fd, int start_block)
 		goto done;
 	}
 
+	/* squashfs is at block zero too */
+	if (le32_to_cpu(squashfsb->s_magic) == SQUASHFS_MAGIC) {
+		printk(KERN_NOTICE
+		       "RAMDISK: squashfs filesystem found at block %d\n",
+		       start_block);
+		nblocks = (le64_to_cpu(squashfsb->bytes_used) + BLOCK_SIZE - 1)
+			 >> BLOCK_SIZE_BITS;
+		goto done;
+	}
+
 	/*
 	 * Read block 1 to test for minix and ext2 superblock
 	 */
-- 
1.5.6.3
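
The detection added to identify_ramdisk_image() above comes down to a
little-endian magic check followed by rounding bytes_used up to whole
1 KiB blocks. A minimal userspace sketch of both steps follows. The RD_*
names and the helper functions are illustrative; the 1 KiB block size
matches the kernel's BLOCK_SIZE.

```c
#include <stdint.h>

#define SQUASHFS_MAGIC		0x73717368
#define RD_BLOCK_SIZE_BITS	10	/* stands in for BLOCK_SIZE_BITS */
#define RD_BLOCK_SIZE		(1 << RD_BLOCK_SIZE_BITS)

/* Read a little-endian 32-bit value from a byte buffer on any host. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Nonzero if the buffer starts with the squashfs magic. */
static int is_squashfs(const uint8_t *buf)
{
	return get_le32(buf) == SQUASHFS_MAGIC;
}

/* Round bytes_used up to whole 1 KiB blocks, as the patch does. */
static uint64_t ramdisk_blocks(uint64_t bytes_used)
{
	return (bytes_used + RD_BLOCK_SIZE - 1) >> RD_BLOCK_SIZE_BITS;
}
```

On disk the magic therefore appears as the four bytes "hsqs".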


* [PATCH V3 13/17] Squashfs: Makefiles
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 fs/Makefile          |    1 +
 fs/squashfs/Makefile |    8 ++++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/Makefile b/fs/Makefile
index e6f423d..3f8843c 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_JBD)		+= jbd/
 obj-$(CONFIG_JBD2)		+= jbd2/
 obj-$(CONFIG_EXT2_FS)		+= ext2/
 obj-$(CONFIG_CRAMFS)		+= cramfs/
+obj-$(CONFIG_SQUASHFS)		+= squashfs/
 obj-y				+= ramfs/
 obj-$(CONFIG_HUGETLBFS)		+= hugetlbfs/
 obj-$(CONFIG_CODA_FS)		+= coda/
diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
new file mode 100644
index 0000000..8258cf9
--- /dev/null
+++ b/fs/squashfs/Makefile
@@ -0,0 +1,8 @@
+#
+# Makefile for the linux squashfs routines.
+#
+
+obj-$(CONFIG_SQUASHFS) += squashfs.o
+squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
+squashfs-y += namei.o super.o symlink.o
+#squashfs-y += squashfs2_0.o
-- 
1.5.6.3



* [PATCH V3 07/17] Squashfs: export operations
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 fs/squashfs/export.c |  155 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/fs/squashfs/export.c b/fs/squashfs/export.c
new file mode 100644
index 0000000..69e971d
--- /dev/null
+++ b/fs/squashfs/export.c
@@ -0,0 +1,155 @@
+/*
+ * Squashfs - a compressed read only filesystem for Linux
+ *
+ * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+ * Phillip Lougher <phillip@lougher.demon.co.uk>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * export.c
+ */
+
+/*
+ * This file implements code to make Squashfs filesystems exportable (NFS etc.)
+ *
+ * The export code uses an inode lookup table to map inode numbers passed in
+ * filehandles to an inode location on disk.  This table is stored compressed
+ * into metadata blocks.  A second index table is used to locate these.  This
+ * second index table for speed of access (and because it is small) is read at
+ * mount time and cached in memory.
+ *
+ * The inode lookup table is used only by the export code, inode disk
+ * locations are directly encoded in directories, enabling direct access
+ * without an intermediate lookup for all operations except the export ops.
+ */
+
+#include <linux/fs.h>
+#include <linux/vfs.h>
+#include <linux/dcache.h>
+#include <linux/exportfs.h>
+#include <linux/zlib.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "squashfs_fs_i.h"
+#include "squashfs.h"
+
+/*
+ * Look-up inode number (ino) in table, returning the inode location.
+ */
+static long long squashfs_inode_lookup(struct super_block *sb, int ino_num)
+{
+	struct squashfs_sb_info *msblk = sb->s_fs_info;
+	int blk = SQUASHFS_LOOKUP_BLOCK(ino_num - 1);
+	int offset = SQUASHFS_LOOKUP_BLOCK_OFFSET(ino_num - 1);
+	u64 start = le64_to_cpu(msblk->inode_lookup_table[blk]);
+	__le64 ino;
+	int err;
+
+	TRACE("Entered squashfs_inode_lookup, inode_number = %d\n", ino_num);
+
+	err = squashfs_read_metadata(sb, &ino, &start, &offset, sizeof(ino));
+	if (err < 0)
+		return err;
+
+	TRACE("squashfs_inode_lookup, inode = 0x%llx\n",
+		(u64) le64_to_cpu(ino));
+
+	return le64_to_cpu(ino);
+}
+
+
+static struct dentry *squashfs_export_iget(struct super_block *sb,
+	unsigned int ino_num)
+{
+	long long ino;
+	struct dentry *dentry = ERR_PTR(-ENOENT);
+
+	TRACE("Entered squashfs_export_iget\n");
+
+	ino = squashfs_inode_lookup(sb, ino_num);
+	if (ino >= 0)
+		dentry = d_obtain_alias(squashfs_iget(sb, ino, ino_num));
+
+	return dentry;
+}
+
+
+static struct dentry *squashfs_fh_to_dentry(struct super_block *sb,
+		struct fid *fid, int fh_len, int fh_type)
+{
+	if ((fh_type != FILEID_INO32_GEN && fh_type != FILEID_INO32_GEN_PARENT)
+			|| fh_len < 2)
+		return NULL;
+
+	return squashfs_export_iget(sb, fid->i32.ino);
+}
+
+
+static struct dentry *squashfs_fh_to_parent(struct super_block *sb,
+		struct fid *fid, int fh_len, int fh_type)
+{
+	if (fh_type != FILEID_INO32_GEN_PARENT || fh_len < 4)
+		return NULL;
+
+	return squashfs_export_iget(sb, fid->i32.parent_ino);
+}
+
+
+static struct dentry *squashfs_get_parent(struct dentry *child)
+{
+	struct inode *inode = child->d_inode;
+	unsigned int parent_ino = squashfs_i(inode)->parent;
+
+	return squashfs_export_iget(inode->i_sb, parent_ino);
+}
+
+
+/*
+ * Read uncompressed inode lookup table indexes off disk into memory
+ */
+__le64 *squashfs_read_inode_lookup_table(struct super_block *sb,
+		u64 lookup_table_start, unsigned int inodes)
+{
+	unsigned int length = SQUASHFS_LOOKUP_BLOCK_BYTES(inodes);
+	__le64 *inode_lookup_table;
+	int err;
+
+	TRACE("In read_inode_lookup_table, length %d\n", length);
+
+	/* Allocate inode lookup table indexes */
+	inode_lookup_table = kmalloc(length, GFP_KERNEL);
+	if (inode_lookup_table == NULL) {
+		ERROR("Failed to allocate inode lookup table\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	err = squashfs_read_table(sb, inode_lookup_table, lookup_table_start,
+			length);
+	if (err < 0) {
+		ERROR("unable to read inode lookup table\n");
+		kfree(inode_lookup_table);
+		return ERR_PTR(err);
+	}
+
+	return inode_lookup_table;
+}
+
+
+const struct export_operations squashfs_export_ops = {
+	.fh_to_dentry = squashfs_fh_to_dentry,
+	.fh_to_parent = squashfs_fh_to_parent,
+	.get_parent = squashfs_get_parent
+};
-- 
1.5.6.3
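
The two file-handle entry points in the patch above differ only in which handle types and lengths they accept. A minimal userspace sketch of those checks follows. It is not the kernel code: `struct fid_i32`, `handle_to_ino()` and `handle_to_parent_ino()` are hypothetical names, the `FILEID_*` values are assumed to match the kernel's `enum fid_type`, and returning 0 for a rejected handle assumes valid inode numbers start at 1 (the patch itself returns a NULL dentry instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's enum fid_type; not shown in the patch. */
enum { FILEID_INO32_GEN = 1, FILEID_INO32_GEN_PARENT = 2 };

/* Hypothetical stand-in for the i32 member of the kernel's struct fid. */
struct fid_i32 {
	uint32_t ino;
	uint32_t gen;
	uint32_t parent_ino;
	uint32_t parent_gen;
};

/*
 * Inode number of the object itself, or 0 if the handle is rejected.
 * fh_len counts 32-bit words, so an (ino, gen) pair needs at least 2.
 */
static uint32_t handle_to_ino(const struct fid_i32 *fid, int fh_len,
		int fh_type)
{
	assert(fid != NULL);

	if ((fh_type != FILEID_INO32_GEN && fh_type != FILEID_INO32_GEN_PARENT)
			|| fh_len < 2)
		return 0;

	return fid->ino;
}

/*
 * Inode number of the parent, or 0 if the handle is rejected.  Only the
 * "with parent" handle type carries it, and it needs all 4 words.
 */
static uint32_t handle_to_parent_ino(const struct fid_i32 *fid, int fh_len,
		int fh_type)
{
	assert(fid != NULL);

	if (fh_type != FILEID_INO32_GEN_PARENT || fh_len < 4)
		return 0;

	return fid->parent_ino;
}
```

In the patch, the accepted inode number is then passed to squashfs_export_iget(), which resolves it through the inode lookup table.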

* [PATCH V3 09/17] Squashfs: uid/gid lookup operations
From: Phillip Lougher @ 2009-01-05 11:08 UTC (permalink / raw)
  To: akpm, linux-embedded, linux-fsdevel, linux-kernel, tim.bird, sfr


Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
---
 fs/squashfs/id.c |   94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 94 insertions(+), 0 deletions(-)

diff --git a/fs/squashfs/id.c b/fs/squashfs/id.c
new file mode 100644
index 0000000..3795b83
--- /dev/null
+++ b/fs/squashfs/id.c
@@ -0,0 +1,94 @@
+/*
+ * Squashfs - a compressed read only filesystem for Linux
+ *
+ * Copyright (c) 2002, 2003, 2004, 2005, 2006, 2007, 2008
+ * Phillip Lougher <phillip@lougher.demon.co.uk>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * id.c
+ */
+
+/*
+ * This file implements code to handle uids and gids.
+ *
+ * For space efficiency, regular files store uid and gid indexes, which are
+ * converted to 32-bit uids/gids using an id lookup table.  This table is
+ * stored compressed into metadata blocks.  A second index table is used to
+ * locate these.  Because it is small, and for speed of access, this second
+ * index table is read at mount time and cached in memory.
+ */
+
+#include <linux/fs.h>
+#include <linux/vfs.h>
+#include <linux/slab.h>
+#include <linux/zlib.h>
+
+#include "squashfs_fs.h"
+#include "squashfs_fs_sb.h"
+#include "squashfs_fs_i.h"
+#include "squashfs.h"
+
+/*
+ * Map uid/gid index into real 32-bit uid/gid using the id lookup table
+ */
+int squashfs_get_id(struct super_block *sb, unsigned int index,
+					unsigned int *id)
+{
+	struct squashfs_sb_info *msblk = sb->s_fs_info;
+	int block = SQUASHFS_ID_BLOCK(index);
+	int offset = SQUASHFS_ID_BLOCK_OFFSET(index);
+	u64 start_block = le64_to_cpu(msblk->id_table[block]);
+	__le32 disk_id;
+	int err;
+
+	err = squashfs_read_metadata(sb, &disk_id, &start_block, &offset,
+							sizeof(disk_id));
+	if (err < 0)
+		return err;
+
+	*id = le32_to_cpu(disk_id);
+	return 0;
+}
+
+
+/*
+ * Read uncompressed id lookup table indexes from disk into memory
+ */
+__le64 *squashfs_read_id_index_table(struct super_block *sb,
+			u64 id_table_start, unsigned short no_ids)
+{
+	unsigned int length = SQUASHFS_ID_BLOCK_BYTES(no_ids);
+	__le64 *id_table;
+	int err;
+
+	TRACE("In read_id_index_table, length %u\n", length);
+
+	/* Allocate id lookup table indexes */
+	id_table = kmalloc(length, GFP_KERNEL);
+	if (id_table == NULL) {
+		ERROR("Failed to allocate id index table\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	err = squashfs_read_table(sb, id_table, id_table_start, length);
+	if (err < 0) {
+		ERROR("unable to read id index table\n");
+		kfree(id_table);
+		return ERR_PTR(err);
+	}
+
+	return id_table;
+}
-- 
1.5.6.3
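
The two-level lookup described in the id.c header comment comes down to a little index arithmetic. The sketch below models it in userspace under stated assumptions: the `SQUASHFS_ID_*` macros live in squashfs_fs.h, which is not part of this patch, so the 8192-byte metadata block size and the formulas here are assumptions about what they compute, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of one uncompressed metadata block, in bytes. */
#define METADATA_SIZE 8192u

/* Metadata block that holds the 32-bit id at position `index`. */
static unsigned int id_block(unsigned int index)
{
	return (unsigned int)((index * sizeof(uint32_t)) / METADATA_SIZE);
}

/* Byte offset of that id inside its metadata block. */
static unsigned int id_block_offset(unsigned int index)
{
	unsigned int off =
		(unsigned int)((index * sizeof(uint32_t)) % METADATA_SIZE);

	/* The block size is a multiple of 4, so an id never straddles blocks. */
	assert(off % sizeof(uint32_t) == 0);
	return off;
}

/* Number of metadata blocks needed to store `no_ids` ids (rounded up). */
static unsigned int id_blocks(unsigned int no_ids)
{
	return (unsigned int)((no_ids * sizeof(uint32_t) + METADATA_SIZE - 1)
			/ METADATA_SIZE);
}

/* Size of the in-memory index table: one 64-bit start address per block. */
static size_t id_index_table_bytes(unsigned int no_ids)
{
	return id_blocks(no_ids) * sizeof(uint64_t);
}
```

Under these assumptions one metadata block holds 2048 ids, so even the largest `no_ids` an unsigned short can carry needs only a few hundred bytes of index table, which fits the comment's remark that the table is small enough to cache at mount time.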

