* [RFC] Support for bzip compressed modules
@ 2007-01-08 20:34 Karl MacMillan
2007-01-09 7:18 ` James Antill
0 siblings, 1 reply; 10+ messages in thread
From: Karl MacMillan @ 2007-01-08 20:34 UTC (permalink / raw)
To: SELinux Mail List
[-- Attachment #1: Type: text/plain, Size: 2570 bytes --]
There was some discussion about bzip compressing policy modules
(actually policy packages). The attached patch implements this. The
patch is not ready for merging - I'm trying to get feedback since there
was opposition to this approach when proposed. This patch should
probably wait until after a stable branch is created.
The patch implements this support by changing sepol_policy_file_t to
support decompressing files or memory areas into a private memory copy.
This support is optional - dlopen is used so that a hard dependency to
libbz2 is not introduced. I took the approach of decompressing the
entire file or memory area because:
* It is very simple
* The current code depends on the ability to seek within policy files -
this is not really possible within compressed streams using the bzip2
library.
The downsides are:
* Increased memory usage
* No transparent support for compressed writing with an fd based policy
file.
I didn't want to add additional set functions - I would have preferred
to allow sepol_policy_file_set_[mem,fd] to transparently open compressed
streams with functions to set other behaviors as options stored in
sepol_policy_file_t structs. This was not possible because the current
set functions do not return errors.
Comments appreciated. Some very crude benchmarking below (note that I am
using a patched semodule to allow the globbing syntax - patch for that
to follow). The summary is that there is substantial space savings at
the expense of some increase in time to complete common actions. An
acceptable trade-off in my opinion.
Anyone have suggestions for something as simple as time but for max
memory usage?
Karl
Uncompressed
------------
[root@localhost modules]# time semodule -b /usr/share/selinux/strict/base.pp
real 0m15.849s
user 0m14.791s
sys 0m0.930s
[root@localhost nobz-modules]# time semodule -i *.pp
real 0m15.447s
user 0m14.287s
sys 0m0.997s
[root@localhost modules]# time semodule -l
real 0m0.153s
user 0m0.133s
sys 0m0.017s
[root@localhost modules]# du -h
17M ./active/modules
22M ./active
22M .
Compressed
----------
[root@localhost modules]# time semodule -b /root/base.pp.bz2
real 0m16.117s
user 0m14.729s
sys 0m1.022s
[root@localhost modules]# time semodule -i /root/modules/*.bz2
real 0m18.529s
user 0m17.110s
sys 0m1.314s
[root@localhost modules]# time semodule -l
real 0m0.851s
user 0m0.750s
sys 0m0.098s
[root@localhost modules]# du -h
2.0M ./active/modules
4.9M ./active
4.9M .
[-- Attachment #2: selinux-compressed-modules.patch --]
[-- Type: text/x-patch, Size: 12384 bytes --]
diff -r 67226637bf28 libsemanage/src/direct_api.c
--- a/libsemanage/src/direct_api.c Mon Jan 08 11:08:13 2007 -0500
+++ b/libsemanage/src/direct_api.c Mon Jan 08 15:00:13 2007 -0500
@@ -307,7 +307,15 @@ static int parse_module_headers(semanage
ERR(sh, "Out of memory!");
return -1;
}
- sepol_policy_file_set_mem(pf, module_data, data_len);
+ /* We have to assume that this might be a compressed stream,
+ * so first try to treat this as a bz2 stream. If that fails
+ * assume that it is not compressed. The bad part of this is
+ * that it will use at least twice the memory of the original
+ * data. */
+ if (sepol_policy_file_set_mem_bz2(pf, module_data, data_len, 1) < 0) {
+ sepol_policy_file_set_mem(pf, module_data, data_len);
+ }
+
sepol_policy_file_set_handle(pf, sh->sepolh);
if (module_data == NULL ||
data_len == 0 ||
@@ -352,7 +360,13 @@ static int parse_base_headers(semanage_h
ERR(sh, "Out of memory!");
return -1;
}
- sepol_policy_file_set_mem(pf, module_data, data_len);
+ /* We have to assume that this might be a compressed stream,
+ * so first try to treat this as a bz2 stream. If that fails
+ * assume that it is not compressed. The bad part of this is
+ * that it will use at least twice the memory of the original
+ * data. */
+ if (sepol_policy_file_set_mem_bz2(pf, module_data, data_len, 1) < 0)
+ sepol_policy_file_set_mem(pf, module_data, data_len);
sepol_policy_file_set_handle(pf, sh->sepolh);
if (module_data == NULL ||
data_len == 0 ||
@@ -915,16 +929,16 @@ static int semanage_direct_list(semanage
goto cleanup;
}
+ if ((*modinfo = calloc(num_mod_files, sizeof(**modinfo))) == NULL) {
+ ERR(sh, "Out of memory!");
+ goto cleanup;
+ }
+
if (sepol_policy_file_create(&pf)) {
ERR(sh, "Out of memory!");
goto cleanup;
}
sepol_policy_file_set_handle(pf, sh->sepolh);
-
- if ((*modinfo = calloc(num_mod_files, sizeof(**modinfo))) == NULL) {
- ERR(sh, "Out of memory!");
- goto cleanup;
- }
for (i = 0; i < num_mod_files; i++) {
FILE *fp;
@@ -936,7 +950,10 @@ static int semanage_direct_list(semanage
continue;
}
__fsetlocking(fp, FSETLOCKING_BYCALLER);
- sepol_policy_file_set_fp(pf, fp);
+ if (sepol_policy_file_set_fp_bz2(pf, fp, 1) < 0) {
+ sepol_policy_file_set_fp(pf, fp);
+ }
+
if (sepol_module_package_info(pf, &type, &name, &version)) {
fclose(fp);
free(name);
diff -r 67226637bf28 libsemanage/src/semanage_store.c
--- a/libsemanage/src/semanage_store.c Mon Jan 08 11:08:13 2007 -0500
+++ b/libsemanage/src/semanage_store.c Mon Jan 08 15:00:13 2007 -0500
@@ -1490,7 +1490,13 @@ static int semanage_load_module(semanage
goto cleanup;
}
__fsetlocking(fp, FSETLOCKING_BYCALLER);
- sepol_policy_file_set_fp(pf, fp);
+ /* Try to set this as a bzip2 compressed file first. If this
+ * fails it means that the file is not compressed, so fall back
+ * to normal reading.
+ */
+ if (sepol_policy_file_set_fp_bz2(pf, fp, 1) < 0) {
+ sepol_policy_file_set_fp(pf, fp);
+ }
sepol_policy_file_set_handle(pf, sh->sepolh);
if (sepol_module_package_read(*package, pf, 0) == -1) {
ERR(sh, "Error while reading from module file %s.", filename);
diff -r 67226637bf28 libsepol/include/sepol/policydb.h
--- a/libsepol/include/sepol/policydb.h Mon Jan 08 11:08:13 2007 -0500
+++ b/libsepol/include/sepol/policydb.h Mon Jan 08 15:00:13 2007 -0500
@@ -29,6 +29,9 @@ extern void sepol_policy_file_set_mem(se
extern void sepol_policy_file_set_mem(sepol_policy_file_t * pf,
char *data, size_t len);
+extern int sepol_policy_file_set_mem_bz2(sepol_policy_file_t * pf, char *data,
+ size_t len, int check);
+
/*
* Get the size of the buffer needed to store a policydb write
* previously done on this policy file.
@@ -41,6 +44,9 @@ extern int sepol_policy_file_get_len(sep
* to the FILE.
*/
extern void sepol_policy_file_set_fp(sepol_policy_file_t * pf, FILE * fp);
+
+extern int sepol_policy_file_set_fp_bz2(sepol_policy_file_t * pf, FILE * fp,
+ int check);
/*
* Associate a handle with a policy file, for use in
diff -r 67226637bf28 libsepol/include/sepol/policydb/policydb.h
--- a/libsepol/include/sepol/policydb/policydb.h Mon Jan 08 11:08:13 2007 -0500
+++ b/libsepol/include/sepol/policydb/policydb.h Mon Jan 08 15:00:13 2007 -0500
@@ -562,6 +562,7 @@ typedef struct policy_file {
struct sepol_policy_file {
struct policy_file pf;
+ char *orig_data; /* if set, will be freed by sepol_policy_file_free */
};
extern int policydb_read(policydb_t * p, struct policy_file *fp,
diff -r 67226637bf28 libsepol/src/Makefile
--- a/libsepol/src/Makefile Mon Jan 08 11:08:13 2007 -0500
+++ b/libsepol/src/Makefile Mon Jan 08 15:00:13 2007 -0500
@@ -20,7 +20,7 @@ all: $(LIBA) $(LIBSO)
ranlib $@
$(LIBSO): $(LOBJS)
- $(CC) $(LDFLAGS) -shared -o $@ $^ -Wl,-soname,$(LIBSO),--version-script=libsepol.map,-z,defs
+ $(CC) $(LDFLAGS) -shared -o $@ $^ -Wl,-soname,$(LIBSO),--version-script=libsepol.map,-z,defs -ldl
ln -sf $@ $(TARGET)
%.o: %.c
diff -r 67226637bf28 libsepol/src/policydb_public.c
--- a/libsepol/src/policydb_public.c Mon Jan 08 11:08:13 2007 -0500
+++ b/libsepol/src/policydb_public.c Mon Jan 08 15:00:13 2007 -0500
@@ -1,9 +1,32 @@
+/*
+ * Author(s): Karl MacMillan <kmacmillan@mentalrootkit.com>
+ *
+ * Copyright (C) 2007 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
#include <stdlib.h>
#include "debug.h"
#include <sepol/policydb/policydb.h>
#include "policydb_internal.h"
+#include <bzlib.h>
+#include <dlfcn.h>
+
/* Policy file interfaces. */
int sepol_policy_file_create(sepol_policy_file_t ** pf)
@@ -14,10 +37,21 @@ int sepol_policy_file_create(sepol_polic
return 0;
}
+static void sepol_policy_file_free_data(sepol_policy_file_t * pf)
+{
+ if (pf && pf->orig_data) {
+ free(pf->orig_data);
+ pf->orig_data = NULL;
+ }
+}
+
void sepol_policy_file_set_mem(sepol_policy_file_t * spf,
char *data, size_t len)
{
struct policy_file *pf = &spf->pf;
+
+ sepol_policy_file_free_data(spf);
+
if (!len) {
pf->type = PF_LEN;
return;
@@ -32,9 +66,226 @@ void sepol_policy_file_set_fp(sepol_poli
void sepol_policy_file_set_fp(sepol_policy_file_t * spf, FILE * fp)
{
struct policy_file *pf = &spf->pf;
+
+ sepol_policy_file_free_data(spf);
+
pf->type = PF_USE_STDIO;
pf->fp = fp;
- return;
+}
+
+/* BZIP support */
+
+#define BZ_UNINIT 0
+#define BZ_INIT 1
+#define BZ_ERROR 2
+static int bz_status = 0;
+
+static BZFILE* (*bz2_read_open_fp)(int *berror, FILE * f, int verbosity, int small,
+ void *unused, int nUnused) = NULL;
+static int (*bz2_read_fp)(int *berror, BZFILE *b, void *buf, int len) = NULL;
+static int (*bz2_decompress)(char *dest, unsigned int *dest_len, char *source,
+ unsigned int source_len, int small, int verbosity) = NULL;
+
+static int bz2_init(void)
+{
+ void *handle;
+
+ /* Initialize the library */
+ if (bz_status == BZ_ERROR) {
+ return -1;
+ } else if (bz_status == BZ_UNINIT) {
+ handle = dlopen("libbz2.so", RTLD_LAZY | RTLD_LOCAL);
+ if (handle == NULL) {
+ bz_status = BZ_ERROR;
+ return -1;
+ }
+ bz2_read_open_fp = dlsym(handle, "BZ2_bzReadOpen");
+ if (bz2_read_open_fp == NULL) {
+ bz_status = BZ_ERROR;
+ return -1;
+ }
+ bz2_read_fp = dlsym(handle, "BZ2_bzRead");
+ if (bz2_read_fp == NULL) {
+ bz_status = BZ_ERROR;
+ return -1;
+ }
+ bz2_decompress = dlsym(handle, "BZ2_bzBuffToBuffDecompress");
+ if (bz2_decompress == NULL) {
+ bz_status = BZ_ERROR;
+ return -1;
+ }
+ bz_status = BZ_INIT;
+ }
+
+ return 0;
+}
+
+/* read a bzip file into a memory buffer */
+static int bz2_read_file(BZFILE *bzfp, char **buf, int *buf_len)
+{
+ int ret, error, prev_buf_len;
+
+ *buf = NULL;
+ *buf_len = 0;
+ while (1) {
+ prev_buf_len = *buf_len;
+ *buf_len += BUFSIZ;
+ *buf = realloc(*buf, *buf_len);
+ if (*buf == NULL)
+ goto error;
+ ret = bz2_read_fp(&error, bzfp, *buf + prev_buf_len, BUFSIZ);
+ if (error != BZ_OK) {
+ if (error == BZ_STREAM_END) {
+ /* trim the buffer if needed */
+ if (ret != BUFSIZ) {
+ *buf_len -= BUFSIZ - ret;
+ *buf = realloc(*buf, *buf_len);
+ if (*buf == NULL)
+ goto error;
+ }
+ break;
+ } else {
+ goto error;
+ }
+ } else if (ret != BUFSIZ) {
+ goto error;
+ }
+ }
+
+ return 0;
+
+error:
+ free(*buf);
+ *buf = NULL;
+ return -1;
+}
+
+/* Determine if the buffer contains the BZ2 magic number. This is based on
+ * /usr/share/file/magic. The buffer must be at least BZ2_MAGIC_LEN long.
+ */
+#define BZ2_MAGIC_LEN 3
+static int is_bz2_magic(char *buf)
+{
+ if (strncmp(buf, "BZh", BZ2_MAGIC_LEN) == 0)
+ return 1;
+ else
+ return 0;
+}
+
+static int is_bz2_file(FILE *fp)
+{
+ int ret, ret2;
+ char buf[BZ2_MAGIC_LEN];
+ long pos;
+
+ pos = ftell(fp);
+
+ ret = fread(buf, sizeof(char), BZ2_MAGIC_LEN, fp);
+ if (ret != BZ2_MAGIC_LEN) {
+ ret = -1;
+ goto out;
+ }
+
+ ret = is_bz2_magic(buf);
+out:
+ ret2 = fseek(fp, pos, SEEK_SET);
+ if (ret2 < 0)
+ ret = ret2;
+ return ret;
+}
+
+int sepol_policy_file_set_fp_bz2(sepol_policy_file_t * pf, FILE * fp, int check)
+{
+ BZFILE *bzfp;
+ int ret, buf_len;
+ char *buf = NULL;
+
+ if (check) {
+ ret = is_bz2_file(fp);
+ if (ret <= 0) {
+ return -1;
+ }
+ }
+
+ if (bz2_init() != 0) {
+ return -1;
+ }
+
+ bzfp = bz2_read_open_fp(&ret, fp, 0, 0, NULL, 0);
+ if (bzfp == NULL) {
+ return -1;
+ }
+
+ if (bz2_read_file(bzfp, &buf, &buf_len) < 0) {
+ goto error;
+ }
+
+ sepol_policy_file_set_mem(pf, buf, buf_len);
+ sepol_policy_file_free_data(pf);
+ pf->orig_data = buf;
+
+ return 0;
+
+error:
+ free(buf);
+ return -1;
+}
+
+int sepol_policy_file_set_mem_bz2(sepol_policy_file_t * pf, char *data, size_t len, int check)
+{
+ int ret;
+ char *dest_data = NULL;
+ unsigned int dest_data_size;
+ unsigned int dest_len;
+
+ if (len < BZ2_MAGIC_LEN)
+ return -1;
+
+ if (check) {
+ ret = is_bz2_magic(data);
+ if (ret <= 0)
+ return -1;
+ }
+
+ if (bz2_init() != 0)
+ return -1;
+
+ /* We are going to decompress the data into a new buffer. The _awesome_ thing
+ * about this is that the bzip library doesn't resize the destination buffer
+ * for you or really provide any sort of reasonable interface for handling
+ * this. The only solution is to try to guess the buffer size and keep
+ * trying until we finally get the buffer size right. *yay*
+ */
+ dest_data_size = len;
+ while (1) {
+ dest_data_size = dest_data_size * 2;
+ dest_len = dest_data_size;
+ dest_data = realloc(dest_data, dest_data_size);
+ if (!dest_data)
+ goto error;
+ ret = bz2_decompress(dest_data, &dest_len, data, len, 0, 0);
+
+ if (ret == BZ_OK)
+ break;
+ else if (ret == BZ_OUTBUFF_FULL)
+ continue;
+ else
+ goto error;
+ }
+
+ dest_data = realloc(dest_data, dest_len);
+ if (!dest_data)
+ goto error;
+
+ sepol_policy_file_set_mem(pf, dest_data, dest_len);
+ sepol_policy_file_free_data(pf);
+ pf->orig_data = dest_data;
+
+ return 0;
+
+error:
+ free(dest_data);
+ return -1;
}
int sepol_policy_file_get_len(sepol_policy_file_t * spf, size_t * len)
@@ -54,6 +305,7 @@ void sepol_policy_file_set_handle(sepol_
void sepol_policy_file_free(sepol_policy_file_t * pf)
{
+ sepol_policy_file_free_data(pf);
free(pf);
}
diff -r 67226637bf28 policycoreutils/semodule_deps/Makefile
--- a/policycoreutils/semodule_deps/Makefile Mon Jan 08 11:08:13 2007 -0500
+++ b/policycoreutils/semodule_deps/Makefile Mon Jan 08 15:00:13 2007 -0500
@@ -7,7 +7,7 @@ MANDIR ?= $(PREFIX)/share/man
CFLAGS ?= -Werror -Wall -W
override CFLAGS += -I$(INCLUDEDIR)
-LDLIBS = $(LIBDIR)/libsepol.a
+LDLIBS = $(LIBDIR)/libsepol.a -ldl
all: semodule_deps
* Re: [RFC] Support for bzip compressed modules
2007-01-08 20:34 [RFC] Support for bzip compressed modules Karl MacMillan
@ 2007-01-09 7:18 ` James Antill
2007-01-09 15:51 ` Karl MacMillan
2007-01-09 22:33 ` Russell Coker
0 siblings, 2 replies; 10+ messages in thread
From: James Antill @ 2007-01-09 7:18 UTC (permalink / raw)
To: Karl MacMillan; +Cc: SELinux Mail List
[-- Attachment #1: Type: text/plain, Size: 3172 bytes --]
On Mon, 2007-01-08 at 15:34 -0500, Karl MacMillan wrote:
> The patch implements this support by changing sepol_policy_file_t to
> support decompressing files or memory areas into a private memory copy.
> This support is optional - dlopen is used so that a hard dependency to
> libbz2 is not introduced. I took the approach of decompressing the
> entire file or memory area because:
Why don't we want to depend on libbz, if we are building with bz2
support?
> * It is very simple
> * The current code depends on the ability to seek within policy files -
> this is not really possible within compressed streams using the bzip2
> library.
>
> The downsides are:
>
> * Increased memory usage
> * No transparent support for compressed writing with an fd based policy
> file.
>
> I didn't want to add additional set functions - I would have preferred
> to allow sepol_policy_file_set_[mem,fd] to transparently open compressed
> streams with functions to set other behaviors as options stored in
> sepol_policy_file_t structs. This was not possible because the current
> set functions do not return errors.
Do we really care about the memory usage? My instinct would be to drop
the FILE specific code and just dump everything into memory and then
call the mem_set function and thus have only one decompression loop
(adding the fd version is simple then too).
Calling fstat(fileno(fp)) to read the policy in is probably easier than
a loop.
> Comments appreciated. Some very crude benchmarking below (note that I am
> using a patched semodule to allow the globbing syntax - patch for that
> to follow). The summary is that there is substantial space savings at
> the expense of some increase in time to complete common actions. An
> acceptable trade-off in my opinion.
>
> Anyone have suggestions for something as simple as time but for max
> memory usage?
There's memusage in glibc-utils.
---- code ----
The bz2 code looks fine, although the += BUFSIZ in one loop and *= 2
in the other is weird, and there's a couple of minor nits in the
interface:
. check is always true in callers, and I'm not sure why you'd have it
zero.
. All code paths have:
if (set_foo_bz2() == FAILED)
set_foo();
...which tells me set_foo_bz2() should do that ... in fact it seems sane
to just change set_foo() to check for bz2ness and do the right thing,
without having to alter the callers.
. A personal minor nit is that free(NULL) works fine, so don't work
around it (this idiom seems to be used in sepol).
. sepol_policy_file_free_data() is also called multiple times at the end
of the set_foo_bz2() functions (once inside set_foo() and then
explicitly immediately after).
I assume the only reason you went with bzip2 over gzip is the "have to
init yourself in the set_mem case"? I've done that before[1], so I can
help you get that bit done if you want ... this will drop
CPU/memory/dependency requirements (although expecting all Linux to have
libbz now isn't a big deal, IMO).
[1] http://www.and.org/vstr/examples/ex_zcat.c
--
James Antill <jantill@redhat.com>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: [RFC] Support for bzip compressed modules
2007-01-09 7:18 ` James Antill
@ 2007-01-09 15:51 ` Karl MacMillan
2007-01-09 15:58 ` Stephen Smalley
2007-01-09 16:50 ` James Antill
2007-01-09 22:33 ` Russell Coker
1 sibling, 2 replies; 10+ messages in thread
From: Karl MacMillan @ 2007-01-09 15:51 UTC (permalink / raw)
To: James Antill; +Cc: SELinux Mail List
James Antill wrote:
> On Mon, 2007-01-08 at 15:34 -0500, Karl MacMillan wrote:
>
>> The patch implements this support by changing sepol_policy_file_t to
>> support decompressing files or memory areas into a private memory copy.
>> This support is optional - dlopen is used so that a hard dependency to
>> libbz2 is not introduced. I took the approach of decompressing the
>> entire file or memory area because:
>
> Why don't we want to depend on libbz, if we are building with bz2
> support?
>
This allows the same binary to be installed with or without libbz2.
Since libsepol gets pulled into so many installs, it seems preferable to
not add additional dependencies.
On the other hand, I don't have a strong preference about that part of
the patch. If libbz2 is deemed sufficiently available (or even libz I
guess) we can just directly link. Perhaps with compile time options.
>> * It is very simple
>> * The current code depends on the ability to seek within policy files -
>> this is not really possible within compressed streams using the bzip2
>> library.
>>
>> The downsides are:
>>
>> * Increased memory usage
>> * No transparent support for compressed writing with an fd based policy
>> file.
>>
>> I didn't want to add additional set functions - I would have preferred
>> to allow sepol_policy_file_set_[mem,fd] to transparently open compressed
>> streams with functions to set other behaviors as options stored in
>> sepol_policy_file_t structs. This was not possible because the current
>> set functions do not return errors.
>
> Do we really care about the memory usage? My instinct would be to drop
> the FILE specific code and just dump everything into memory and then
> call the mem_set function and thus have only one decompression loop
> (adding the fd version is simple then too).
> Calling fstat(fileno(fp)) to read the policy in is probably easier than
> a loop.
>
Not certain what you are getting at - both code paths result in an
uncompressed copy of the compressed data in memory. The only difference
is whether we are decompressing from an fd or from another memory buffer.
>> Comments appreciated. Some very crude benchmarking below (note that I am
>> using a patched semodule to allow the globbing syntax - patch for that
>> to follow). The summary is that there is substantial space savings at
>> the expense of some increase in time to complete common actions. An
>> acceptable trade-off in my opinion.
>>
>> Anyone have suggestions for something as simple as time but for max
>> memory usage?
>
> There's memusage in glibc-utils.
>
Thanks
> ---- code ----
>
> The bz2 code looks fine, although the += BUFSIZ in one loop and *= 2
> in the other is weird, and there's a couple of minor nits in the
> interface:
>
If you look at the loop with *= 2, we are just guessing buffer size so I
want to grow the buffer much more quickly if the decompression fails. We
start with a 2-1 compression ratio, then try 4-1, etc. If we were only adding
BUFSIZ then we might loop for a long time growing the buffer.
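To put rough numbers on the growth argument above, here is a small sketch
that counts how many buffer sizes each strategy tries before reaching a
given uncompressed size. The function names and the example sizes are
illustrative assumptions, not measurements from the patch. The count matters
because, as far as I understand libbz2, each failed
BZ2_bzBuffToBuffDecompress call starts the decompression over from the
beginning.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sizes tried before one is >= needed when the size is doubled
 * each time, as in the set_mem_bz2 loop (the first try is 2 * start).
 * Overflow of size is ignored in this sketch. */
unsigned int tries_doubling(size_t start, size_t needed)
{
	unsigned int tries = 0;
	size_t size = start;

	assert(start > 0);
	do {
		size *= 2;
		tries++;
	} while (size < needed);
	return tries;
}

/* The same count when the size grows by a fixed step instead. */
unsigned int tries_linear(size_t start, size_t step, size_t needed)
{
	unsigned int tries = 0;
	size_t size = start;

	assert(step > 0);
	do {
		size += step;
		tries++;
	} while (size < needed);
	return tries;
}
```

For a hypothetical 100000-byte input that expands to 2000000 bytes,
tries_doubling(100000, 2000000) is 5, while tries_linear(100000, 8192,
2000000) is 232 (8192 being a typical glibc BUFSIZ, assumed here).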
> . check is always true in callers, and I'm not sure why you'd have it
> zero.
>
The magic number checking seems fragile - I'm assuming it might be
necessary to force the stream as compressed at some point. Since we are
maintaining ABI for this library (and these functions), seems better to
be safe.
> . All code paths have:
>
> if (set_foo_bz2() == FAILED)
> set_foo();
>
> ...which tells me set_foo_bz2() should do that ... in fact it seems sane
> to just change set_foo() to check for bz2ness and do the right thing,
> without having to alter the callers.
>
Note my comments with the original patch - this isn't possible because
set_foo() has a void return and we want to maintain binary compatibility.
> . A personal minor nit is that free(NULL) works fine, so don't work
> around it (this idiom seems to be used in sepol).
>
I don't see that - maybe you mean:
+ if (pf && pf->orig_data) {
+ free(pf->orig_data);
+ pf->orig_data = NULL;
+ }
This is to allow this function to be called with a null pf - so I have
to check before looking in the struct.
> . sepol_policy_file_free_data() is also called multiple times at the end
> of the set_foo_bz2() functions (once inside set_foo() and then
> explicitly immediately after).
>
Good catch - thanks.
>
> I assume the only reason you went with bzip2 over gzip is the "have to
> init yourself in the set_mem case"?
No - just better compression.
[kmacmill@localhost ~]$ ls -l base.pp.*
-rw-r--r-- 1 kmacmill kmacmill 86379 Jan 9 10:50 base.pp.bz2
-rw-r--r-- 1 kmacmill kmacmill 167382 Jan 9 10:50 base.pp.gz
> I've done that before[1], so I can
> help you get that bit done if you want ... this will drop
> CPU/memory/dependency requirements (although expecting all Linux to have
> libbz now isn't a big deal, IMO).
>
Ok - I'd be happy to support both if you want to send a patch.
Thanks - Karl
--
This message was distributed to subscribers of the selinux mailing list.
If you no longer wish to subscribe, send mail to majordomo@tycho.nsa.gov with
the words "unsubscribe selinux" without quotes as the message.
* Re: [RFC] Support for bzip compressed modules
2007-01-09 15:51 ` Karl MacMillan
@ 2007-01-09 15:58 ` Stephen Smalley
2007-01-09 16:50 ` James Antill
1 sibling, 0 replies; 10+ messages in thread
From: Stephen Smalley @ 2007-01-09 15:58 UTC (permalink / raw)
To: Karl MacMillan; +Cc: James Antill, SELinux Mail List
On Tue, 2007-01-09 at 10:51 -0500, Karl MacMillan wrote:
> James Antill wrote:
> > On Mon, 2007-01-08 at 15:34 -0500, Karl MacMillan wrote:
> >
> >> The patch implements this support by changing sepol_policy_file_t to
> >> support decompressing files or memory areas into a private memory copy.
> >> This support is optional - dlopen is used so that a hard dependency to
> >> libbz2 is not introduced. I took the approach of decompressing the
> >> entire file or memory area because:
> >
> > Why don't we want to depend on libbz, if we are building with bz2
> > support?
> >
>
> This allows the same binary to be installed with or without libbz2.
> Since libsepol gets pulled into so many installs, it seems preferable to
> not add additional dependencies.
>
> On the other hand, I don't have a strong preference about that part of
> the patch. If libbz2 is deemed sufficiently available (or even libz I
> guess) we can just directly link. Perhaps with compile time options.
/sbin/init depends on libsepol, so if libsepol were to have a fixed
dependency on libbz2, then you might have a problem there (looks like
libbz2.so.1 lives in /usr/lib in Fedora, not in /lib, so it might not
even be available if /usr is a separate partition).
--
Stephen Smalley
National Security Agency
* Re: [RFC] Support for bzip compressed modules
2007-01-09 15:51 ` Karl MacMillan
2007-01-09 15:58 ` Stephen Smalley
@ 2007-01-09 16:50 ` James Antill
2007-01-09 21:18 ` Karl MacMillan
1 sibling, 1 reply; 10+ messages in thread
From: James Antill @ 2007-01-09 16:50 UTC (permalink / raw)
To: Karl MacMillan; +Cc: SELinux Mail List
[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]
On Tue, 2007-01-09 at 10:51 -0500, Karl MacMillan wrote:
> James Antill wrote:
> > On Mon, 2007-01-08 at 15:34 -0500, Karl MacMillan wrote:
> >> I didn't want to add additional set functions - I would have preferred
> >> to allow sepol_policy_file_set_[mem,fd] to transparently open compressed
> >> streams with functions to set other behaviors as options stored in
> >> sepol_policy_file_t structs. This was not possible because the current
> >> set functions do not return errors.
> >
> > Do we really care about the memory usage? My instinct would be to drop
> > the FILE specific code and just dump everything into memory and then
> > call the mem_set function and thus have only one decompression loop
> > (adding the fd version is simple then too).
> > Calling fstat(fileno(fp)) to read the policy in is probably easier than
> > a loop.
> >
>
> Not certain what you are getting at - both code paths result in an
> uncompressed copy of the compressed data in memory. The only difference
> is whether we are decompressing from an fd or from another memory buffer.
Just that it seems easier to have the set_fp() function load all the
data into memory and call the set_mem() function, and have all of the
bz2 stuff in just the set_mem() function.
> > . check is always true in callers, and I'm not sure why you'd have it
> > zero.
> >
>
> The magic number checking seems fragile - I'm assuming it might be
> necessary to force the stream as compressed at some point. Since we are
> maintaining ABI for this library (and these functions), seems better to
> be safe.
Do policy files not have a magic value, then? You can also (for bz2)
check that the next value is between 1 and 9 (it's the compression
ratio).
I assume this means you'd rely on anything ending in ".bz2" being
compressed, and not otherwise ... do we always have a filename?
> > . All code paths have:
> >
> > if (set_foo_bz2() == FAILED)
> > set_foo();
> >
> > ...which tells me set_foo_bz2() should do that ... in fact it seems sane
> > to just change set_foo() to check for bz2ness and do the right thing,
> > without having to alter the callers.
> >
>
> Note my comments with the original patch - this isn't possible because
> set_foo() has a void return and we want to maintain binary compatibility.
Right, but as I said the error paths always just try again without
compression ... so why not just try the compression at the start of the
set_foo() code. You get the same behaviour.
> > I assume the only reason you went with bzip2 over gzip is the "have to
> > init yourself in the set_mem case"?
>
> No - just better compression.
>
> [kmacmill@localhost ~]$ ls -l base.pp.*
> -rw-r--r-- 1 kmacmill kmacmill 86379 Jan 9 10:50 base.pp.bz2
> -rw-r--r-- 1 kmacmill kmacmill 167382 Jan 9 10:50 base.pp.gz
Wow ... that's better than usual. Do you have the same difference for
smaller modules?
--
James Antill <jantill@redhat.com>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: [RFC] Support for bzip compressed modules
2007-01-09 16:50 ` James Antill
@ 2007-01-09 21:18 ` Karl MacMillan
2007-01-10 5:06 ` James Antill
0 siblings, 1 reply; 10+ messages in thread
From: Karl MacMillan @ 2007-01-09 21:18 UTC (permalink / raw)
To: James Antill; +Cc: SELinux Mail List
On Tue, 2007-01-09 at 11:50 -0500, James Antill wrote:
> On Tue, 2007-01-09 at 10:51 -0500, Karl MacMillan wrote:
> > James Antill wrote:
> > > On Mon, 2007-01-08 at 15:34 -0500, Karl MacMillan wrote:
>
> > >> I didn't want to add additional set functions - I would have preferred
> > >> to allow sepol_policy_file_set_[mem,fd] to transparently open compressed
> > >> streams with functions to set other behaviors as options stored in
> > >> sepol_policy_file_t structs. This was not possible because the current
> > >> set functions do not return errors.
> > >
> > > Do we really care about the memory usage? My instinct would be to drop
> > > the FILE specific code and just dump everything into memory and then
> > > call the mem_set function and thus have only one decompression loop
> > > (adding the fd version is simple then too).
> > > Calling fstat(fileno(fp)) to read the policy in is probably easier than
> > > a loop.
> > >
> >
> > Not certain what you are getting at - both code paths result in an
> > uncompressed copy of the compressed data in memory. The only difference
> > is whether we are decompressing from an fd or from another memory buffer.
>
> Just that it seems easier to have the set_fp() function load all the
> data into memory and call the set_mem() function, and have all of the
> bz2 stuff in just the set_mem() function.
>
Ah - I see. I prefer the separate decompression paths because the mem
case is potentially very inefficient. No strong preference though.
> > > . check is always true in callers, and I'm not sure why you'd have it
> > > zero.
> > >
> >
> > The magic number checking seems fragile - I'm assuming it might be
> > necessary to force the stream as compressed at some point. Since we are
> > maintaining ABI for this library (and these functions), seems better to
> > be safe.
>
> Do policy files not have a magic value, then?
They do, but there is no real restriction that these policy file things
get used for files containing policy. Currently they are used for a
variety of file types and I would hate to hard code a list of valid file
types into this function.
> You can also (for bz2)
> check that the next value is between 1 and 9 (it's the compression
> ratio).
Ok. Could this be used for sizing the buffer in the memory case? In
other words, is it a user setting that was passed to bzip or information
about the result of the compression?
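For reference, a stricter header check along the lines suggested could look
like the sketch below. It rests on my understanding of the bzip2 format: the
byte after "BZh" is an ASCII digit '1' to '9' giving the block size in units
of 100000 bytes, which is the level the user passed to bzip2 rather than a
measure of the result. If that is right, it bounds the size of each
uncompressed block but says nothing about the total output size, so it would
not help size the destination buffer. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BZ2_HDR_LEN 4	/* "BZh" plus one block-size digit */

/* Return 1 if buf starts with "BZh" followed by '1'..'9', else 0. */
int looks_like_bz2(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	if (len < BZ2_HDR_LEN)
		return 0;
	if (memcmp(buf, "BZh", 3) != 0)
		return 0;
	return buf[3] >= '1' && buf[3] <= '9';
}

/* Block size in bytes implied by the header digit, or 0 if the buffer
 * does not look like a bzip2 stream. */
unsigned int bz2_block_size(const char *buf, size_t len)
{
	if (!looks_like_bz2(buf, len))
		return 0;
	return (unsigned int)(buf[3] - '0') * 100000u;
}
```

Unlike strncmp(), memcmp() does not stop at an embedded NUL byte, which
seems the safer choice for binary policy data.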
> I assume this means you'd rely on anything ending in ".bz2" being
> compressed, and not otherwise ... do we always have a filename?
>
No, and in fact we explicitly rename the policy modules in the store to
whatever name is set within the policy. So if you create a module named
foo saved in the file bar.pp.bz2, after 'semodule -i bar.pp.bz2' there
will be a file called foo.pp in the module store. I didn't change that
behavior, so file extension is not useful here.
> > > . All code paths have:
> > >
> > > if (set_foo_bz2() == FAILED)
> > > set_foo();
> > >
> > > ...which tells me set_foo_bz2() should do that ... in fact it seems sane
> > > to just change set_foo() to check for bz2ness and do the right thing,
> > > without having to alter the callers.
> > >
> >
> > Note my comments with the original patch - this isn't possible because
> > set_foo() has a void return and we want to maintain binary compatibility.
>
> Right, but as I said the error paths always just try again without
> compression ... so why not just try the compression at the start of the
> set_foo() code. You get the same behaviour.
>
It is not about returning information about the compression. It is
because the compression routines have other error paths (failure to load
libbz2, memory allocation, etc.). There is no good way to indicate those
errors without changing the prototypes. Even if we didn't change the
prototypes, it is valid to not check the current functions for error, so
we can't change them in any way that has potential error paths.
> > > I assume the only reason you went with bzip2 over gzip is the "have to
> > > init yourself in the set_mem case"?
> >
> > No - just better compression.
> >
> > [kmacmill@localhost ~]$ ls -l base.pp.*
> > -rw-r--r-- 1 kmacmill kmacmill 86379 Jan 9 10:50 base.pp.bz2
> > -rw-r--r-- 1 kmacmill kmacmill 167382 Jan 9 10:50 base.pp.gz
>
> Wow ... that's better than usual. Do you have the same difference for
> smaller modules?
>
[kmacmill@localhost ~]$ ls -l alsa.pp*
-rw-r--r-- 1 kmacmill kmacmill 41720 Jan 9 16:13 alsa.pp
-rw-r--r-- 1 kmacmill kmacmill 4987 Jan 9 16:13 alsa.pp.bz2
-rw-r--r-- 1 kmacmill kmacmill 5145 Jan 9 16:13 alsa.pp.gz
[kmacmill@localhost ~]$ ls -l apache.pp*
-rw-r--r-- 1 kmacmill kmacmill 416123 Jan 9 16:14 apache.pp
-rw-r--r-- 1 kmacmill kmacmill 15800 Jan 9 16:14 apache.pp.bz2
-rw-r--r-- 1 kmacmill kmacmill 26585 Jan 9 16:14 apache.pp.gz
Not always quite as dramatic, but still a significant difference.
Karl
--
This message was distributed to subscribers of the selinux mailing list.
If you no longer wish to subscribe, send mail to majordomo@tycho.nsa.gov with
the words "unsubscribe selinux" without quotes as the message.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] Support for bzip compressed modules
2007-01-09 7:18 ` James Antill
2007-01-09 15:51 ` Karl MacMillan
@ 2007-01-09 22:33 ` Russell Coker
2007-01-11 18:48 ` Karl MacMillan
1 sibling, 1 reply; 10+ messages in thread
From: Russell Coker @ 2007-01-09 22:33 UTC (permalink / raw)
To: James Antill; +Cc: Karl MacMillan, SELinux Mail List
On Tuesday 09 January 2007 18:18, James Antill <jantill@redhat.com> wrote:
> I assume the only reason you went with bzip2 over gzip is the "have to
> init yourself in the set_mem case"? I've done that before[1], so I can
> help you get that bit done if you want ... this will drop
> CPU/memory/dependency requirements (although expecting all Linux to have
> libbz now isn't a big deal, IMO).
Given that the vast majority of new machines now are AMD64, part of the CPU
requirement will be due to bzip2 and gzip not being optimised for AMD64.
gzip has i386 assembler optimisation but no AMD64 assembler and runs more
slowly on AMD64 because of it (my tests show that on RHEL4 the i386 copy of
gzip outperforms the AMD64 version on Opteron machines).
If you compare the AMD64 compiled version of gzip (IE the library) with bzip2
the difference in CPU time is a lot smaller than it is when comparing the
i386 versions.
Of course it probably wouldn't be difficult for someone to write some AMD64
assembler code for gzip or bzip2 to change things.
In regard to the following, I think that the 15 seconds for uncompressed
operation is the problem not the 1-3 seconds that are added. Is there any
chance of optimising the other operations of semodule? If we could get a 10%
improvement in the base functionality then "semodule -b" with compression
would be expected to give the same performance as currently, if we could get
a 20% improvement then "semodule -i" would give the current performance too.
Then if you got an AMD64 assembler optimised gzip library the performance
would be even better.
On Tuesday 09 January 2007 07:34, Karl MacMillan
<kmacmillan@mentalrootkit.com> wrote:
> Uncompressed
> ------------
>
> [root@localhost modules]# time semodule -b
> /usr/share/selinux/strict/base.pp
>
> real 0m15.849s
> user 0m14.791s
> sys 0m0.930s
>
> [root@localhost nobz-modules]# time semodule -i *.pp
>
> real 0m15.447s
> user 0m14.287s
> sys 0m0.997s
>
> Compressed
> ----------
>
> [root@localhost modules]# time semodule -b /root/base.pp.bz2
>
> real 0m16.117s
> user 0m14.729s
> sys 0m1.022s
>
> [root@localhost modules]# time semodule -i /root/modules/*.bz2
>
> real 0m18.529s
> user 0m17.110s
> sys 0m1.314s
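Working the "real" times above through makes the 10% and 20% figures
concrete. The two helpers below are only an illustrative sketch (the names
are made up): one gives the slowdown relative to the uncompressed run, the
other the fraction by which the compressed run would have to speed up to
match the uncompressed time.

```c
#include <assert.h>

/* Fractional slowdown of the compressed run relative to the baseline. */
static double overhead(double base_secs, double compressed_secs)
{
    assert(base_secs > 0.0);
    return (compressed_secs - base_secs) / base_secs;
}

/* Fraction of the compressed run time that must be saved elsewhere
 * for the compressed run to finish as fast as the baseline did. */
static double needed_speedup(double base_secs, double compressed_secs)
{
    assert(compressed_secs > 0.0);
    return (compressed_secs - base_secs) / compressed_secs;
}
```

With the numbers quoted above, overhead(15.849, 16.117) is about 1.7% for
"semodule -b" and overhead(15.447, 18.529) is about 20% for "semodule -i".
The matching needed_speedup() values are about 1.7% and 16.6%, so the 10%
and 20% targets are round figures with some margin.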
--
russell@coker.com.au
http://etbe.blogspot.com/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] Support for bzip compressed modules
2007-01-09 21:18 ` Karl MacMillan
@ 2007-01-10 5:06 ` James Antill
2007-01-11 18:41 ` Karl MacMillan
0 siblings, 1 reply; 10+ messages in thread
From: James Antill @ 2007-01-10 5:06 UTC (permalink / raw)
To: Karl MacMillan; +Cc: SELinux Mail List
On Tue, 2007-01-09 at 16:18 -0500, Karl MacMillan wrote:
> On Tue, 2007-01-09 at 11:50 -0500, James Antill wrote:
> > Do policy files not have a magic value, then?
>
> They do, but there is no real restriction that these policy file things
> get used for files containing policy. Currently they are used for a
> variety of file types and I would hate to hard code a list of valid file
> types into this function.
I don't think I'm intelligent enough to understand this sentence :).
What I meant was if they have magic values, they'll never clash so it's
always ok to do the check, no?
Or in the other direction, if you can't differentiate on magic value or
the filename how can any piece of code tell whether it should decompress
it or treat it as raw data?
> > You can also (for bz2)
> > check that the next value is between 1 and 9 (it's the compression
> > ratio).
>
> Ok. Could this be used for sizing the buffer in the memory case?
No, it's just the "how hard did you want to try" value. So something
compressed with "bzip2 -1" will have a '1' and likewise "bzip2 -9" will
have a '9' after the magic.
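As a rough sketch of that check: a bzip2 stream starts with the three
bytes 'B', 'Z', 'h', followed by the level digit '1' to '9' (the digit
also selects the block size, in units of 100k). The helper name below is
made up for illustration and is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf looks like the start of a
 * bzip2 stream ("BZh" followed by a digit 1..9), 0 otherwise. */
static int looks_like_bz2(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len < 4)
        return 0;   /* too short to hold the magic plus the level digit */

    return buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h' &&
           buf[3] >= '1' && buf[3] <= '9';
}
```

This only says that the data looks compressed. It says nothing about the
decompressed size, so it does not help with sizing the output buffer.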
> > Right, but as I said the error paths always just try again without
> > compression ... so why not just try the compression at the start of the
> > set_foo() code. You get the same behaviour.
> >
>
> It is not about returning information about the compression. It is
> because the compression routines have other error paths (failure to load
> libbz2, memory allocation, etc.). There is no good way to indicate those
> errors without changing the prototypes. Even if we didn't change the
> prototypes, it is valid to not check the current functions for error, so
> we can't change them in any way that has potential error paths.
Yes, I know that ... but there isn't a difference in the return value
between "bz2_init() failed" and "is_bz2_file() returned NO". So all the
callers just assume the latter for all errors; if that's the case we
might as well combine it back into set_foo().
--
James Antill - <james.antill@redhat.com>
setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, ...);
setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT, ...);
setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, ...);
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] Support for bzip compressed modules
2007-01-10 5:06 ` James Antill
@ 2007-01-11 18:41 ` Karl MacMillan
0 siblings, 0 replies; 10+ messages in thread
From: Karl MacMillan @ 2007-01-11 18:41 UTC (permalink / raw)
To: James Antill; +Cc: SELinux Mail List
On Wed, 2007-01-10 at 00:06 -0500, James Antill wrote:
> On Tue, 2007-01-09 at 16:18 -0500, Karl MacMillan wrote:
> > On Tue, 2007-01-09 at 11:50 -0500, James Antill wrote:
> > > Do policy files not have a magic value, then?
> >
> > They do, but there is no real restriction that these policy file things
> > get used for files containing policy. Currently they are used for a
> > variety of file types and I would hate to hard code a list of valid file
> > types into this function.
>
> I don't think I'm intelligent enough to understand this sentence :).
> What I meant was if they have magic values, they'll never clash so it's
> always ok to do the check, no?
Policy files can, presumably, be used on arbitrary files or memory
areas. So there may or may not be a magic number at the beginning of the
stream. I think assuming that there will be is too limiting.
> Or in the other direction, if you can't differentiate on magic value or
> the filename how can any piece of code tell whether it should decompress
> it or treat it as raw data?
>
User input, information from other parts of the code, etc.
> > > You can also (for bz2)
> > > check that the next value is between 1 and 9 (it's the compression
> > > ratio).
> >
> > Ok. Could this be used for sizing the buffer in the memory case?
>
> No, it's just the "how hard did you want to try" value. So something
> compressed with "bzip2 -1" will have a '1' and likewise "bzip2 -9" will
> have a '9' after the magic.
>
Oh well.
> > > Right, but as I said the error paths always just try again without
> > > compression ... so why not just try the compression at the start of the
> > > set_foo() code. You get the same behaviour.
> > >
> >
> > It is not about returning information about the compression. It is
> > because the compression routines have other error paths (failure to load
> > libbz2, memory allocation, etc.). There is no good way to indicate those
> > errors without changing the prototypes. Even if we didn't change the
> > prototypes, it is valid to not check the current functions for error, so
> > we can't change them in any way that has potential error paths.
>
> Yes, I know that ... but there isn't a difference in the return value
> between "bz2_init() failed" and "is_bz2_file() returned NO". So all the
> callers just assume the latter for all errors; if that's the case we
> might as well combine it back into set_foo().
>
I'd rather differentiate the error codes.
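A sketch of what differentiated codes could look like follows. All of the
names are hypothetical and the decompression itself is left out; probe_bz2()
stands in for set_foo_bz2() from this thread, and lib_available stands in
for the result of loading libbz2. The magic check is only one possible way
to decide that a stream is not compressed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; none of these names come from the patch. */
enum bz2_probe {
    BZ2_PROBE_OK      = 0,   /* compressed, and decompression succeeded */
    BZ2_PROBE_NOT_BZ2 = 1,   /* not an error: use the uncompressed path */
    BZ2_PROBE_NO_LIB  = -1,  /* libbz2 could not be loaded */
    BZ2_PROBE_NOMEM   = -2   /* allocation failed while decompressing */
};

/* Stand-in for set_foo_bz2(): only the classification logic is shown. */
static enum bz2_probe probe_bz2(const unsigned char *buf, size_t len,
                                int lib_available)
{
    assert(buf != NULL || len == 0);

    if (len < 3 || buf[0] != 'B' || buf[1] != 'Z' || buf[2] != 'h')
        return BZ2_PROBE_NOT_BZ2;
    if (!lib_available)
        return BZ2_PROBE_NO_LIB;
    return BZ2_PROBE_OK;     /* real code would decompress here */
}

/* Caller: fall back only for "not compressed"; report real failures. */
static int load_policy(const unsigned char *buf, size_t len,
                       int lib_available)
{
    enum bz2_probe r = probe_bz2(buf, len, lib_available);

    if (r == BZ2_PROBE_NOT_BZ2)
        return 0;            /* would call the plain set_foo() here */
    return r == BZ2_PROBE_OK ? 0 : (int)r;
}
```

With this split, a compressed module on a system without libbz2 is
reported as a failure instead of being fed to the uncompressed parser.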
Karl
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] Support for bzip compressed modules
2007-01-09 22:33 ` Russell Coker
@ 2007-01-11 18:48 ` Karl MacMillan
0 siblings, 0 replies; 10+ messages in thread
From: Karl MacMillan @ 2007-01-11 18:48 UTC (permalink / raw)
To: russell; +Cc: James Antill, SELinux Mail List
On Wed, 2007-01-10 at 09:33 +1100, Russell Coker wrote:
> On Tuesday 09 January 2007 18:18, James Antill <jantill@redhat.com> wrote:
> > I assume the only reason you went with bzip2 over gzip is the "have to
> > init yourself in the set_mem case"? I've done that before[1], so I can
> > help you get that bit done if you want ... this will drop
> > CPU/memory/dependency requirements (although expecting all Linux to have
> > libbz now isn't a big deal, IMO).
>
> Given that the vast majority of new machines now are AMD64, part of the CPU
> requirement will be due to bzip2 and gzip not being optimised for AMD64.
> gzip has i386 assembler optimisation but no AMD64 assembler and runs more
> slowly on AMD64 because of it (my tests show that on RHEL4 the i386 copy of
> gzip outperforms the AMD64 version on Opteron machines).
>
> If you compare the AMD64 compiled version of gzip (IE the library) with bzip2
> the difference in CPU time is a lot smaller than it is when comparing the
> i386 versions.
>
> Of course it probably wouldn't be difficult for someone to write some AMD64
> assembler code for gzip or bzip2 to change things.
>
Ok.
>
> In regard to the following, I think that the 15 seconds for uncompressed
> operation is the problem not the 1-3 seconds that are added. Is there any
> chance of optimising the other operations of semodule? If we could get a 10%
> improvement in the base functionality then "semodule -b" with compression
> would be expected to give the same performance as currently, if we could get
> a 20% improvement then "semodule -i" would give the current performance too.
Hopefully these will get faster over time - there has been very little
performance work done to that code.
Karl
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-01-11 18:49 UTC | newest]
Thread overview: 10+ messages
2007-01-08 20:34 [RFC] Support for bzip compressed modules Karl MacMillan
2007-01-09 7:18 ` James Antill
2007-01-09 15:51 ` Karl MacMillan
2007-01-09 15:58 ` Stephen Smalley
2007-01-09 16:50 ` James Antill
2007-01-09 21:18 ` Karl MacMillan
2007-01-10 5:06 ` James Antill
2007-01-11 18:41 ` Karl MacMillan
2007-01-09 22:33 ` Russell Coker
2007-01-11 18:48 ` Karl MacMillan