From: John Snow <jsnow@redhat.com>
To: Peter Wu <peter@lekensteyn.nl>
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists
Date: Mon, 05 Jan 2015 11:46:30 -0500
Message-ID: <54AABFE6.6060508@redhat.com>
In-Reply-To: <3699574.WfSB9l0yJF@al>

On 01/03/2015 06:54 AM, Peter Wu wrote:
> On Friday 02 January 2015 19:04:32 John Snow wrote:
>> On 12/27/2014 10:01 AM, Peter Wu wrote:
>>> The format is simple enough to avoid using a full-blown XML parser.
>>> The offsets are based on the description at
>>> http://newosxbook.com/DMG.html
>>>
>>> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
>>> ---
>>> block/dmg.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 69 insertions(+)
>>>
>>> diff --git a/block/dmg.c b/block/dmg.c
>>> index 19e4fe2..c03ea01 100644
>>> --- a/block/dmg.c
>>> +++ b/block/dmg.c
>>> @@ -26,6 +26,7 @@
>>> #include "qemu/bswap.h"
>>> #include "qemu/module.h"
>>> #include <zlib.h>
>>> +#include <glib.h>
>>>
>>> enum {
>>> /* Limit chunk sizes to prevent unreasonable amounts of memory being used
>>> @@ -333,12 +334,66 @@ fail:
>>> return ret;
>>> }
>>>
>>> +static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
>>> + uint64_t info_begin, uint64_t info_length)
>>> +{
>>> + BDRVDMGState *s = bs->opaque;
>>> + int ret;
>>> + uint8_t *buffer = NULL;
>>> + char *data_begin, *data_end;
>>> +
>>> + /* Have at least some length to avoid NULL for g_malloc. Attempt to set a
>>> + * safe upper cap on the data length. A test sample had a XML length of
>>> + * about 1 MiB. */
>>> + if (info_length == 0 || info_length > 16 * 1024 * 1024) {
>>> + ret = -EINVAL;
>>> + goto fail;
>>> + }
>>> +
>>> + buffer = g_malloc(info_length + 1);
>>> + buffer[info_length] = '\0';
>>> + ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
>>> + if (ret != info_length) {
>>> + ret = -EINVAL;
>>> + goto fail;
>>> + }
>>> +
>>> + /* look for <data>...</data>. The data is 284 (0x11c) bytes after base64
>>> + * decode. The actual data element has 431 (0x1af) bytes which includes tabs
>>> + * and line feeds. */
>>> + data_end = (char *)buffer;
>>> + while ((data_begin = strstr(data_end, "<data>")) != NULL) {
>>> + gsize out_len = 0;
>>> +
>>> + data_begin += 6;
>>> + data_end = strstr(data_begin, "</data>");
>>> + /* malformed XML? */
>>> + if (data_end == NULL) {
>>> + ret = -EINVAL;
>>> + goto fail;
>>> + }
>>> + *data_end++ = '\0';
>>> + g_base64_decode_inplace(data_begin, &out_len);
>>> + ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin,
>>> + (uint32_t)out_len);
>>> + if (ret < 0) {
>>> + goto fail;
>>> + }
>>> + }
>>> + ret = 0;
>>> +
>>> +fail:
>>> + g_free(buffer);
>>> + return ret;
>>> +}
>>> +
>>
>> This starts to make me a little nervous, because we're ignoring so much
>> of the XML document structure here and just effectively performing a
>> regular search for "<data>(.*)</data>".
>>
>> Can we guarantee that the ONLY time the data element is used in this
>> document is when it is being used in the exact context we are expecting
>> here, where it contains the b64 mish data we expect it to?
>>
>> i.e. it is always in a path like this as detailed by
>> http://newosxbook.com/DMG.html :
>>
>> plist/dict/key[text()='resource-fork']/following-sibling::dict/key[text()='blkx']/following-sibling::array/dict/key[text()='data']/following-sibling::data
>>
>> I notice that this document says other sections MAY be present, do any
>> of them ever need to be parsed? Has anyone written about them before?
>>
>> Do we know if any use data sections?
>>
>> I suppose at the very least, sections of interest are always going to
>> include the "mish" magic, so that should probably keep us from doing
>> anything too stupid ...
>
> I did not find DMG files with <data> elements at other locations. If it
> would occur, at worst we would fail to parse a DMG file. I think that
> introducing a XML parser here would introduce a risk for a minor benefit
> (being prepared for future cases).
>
> Since this is a property list, in theory people could include all kinds
> of data for different keys (which would then be matched by the current
> implementation). But how likely is this for a disk image?
>
> FWIW, I looked into the dmg2img program and that also looks for the
> strings "<data>" and "</data>". Nobody has raised a bug for that program
> so far.
>
> Do you think that it is worth using an XML parser on potentially
> insecure data? I suggest keeping it as is, and reconsidering a different
> approach in case a problem is encountered.
>
> Kind regards,
> Peter

No, I was just asking the questions. If dmg2img gets away with it, then
the worst that can happen is that we fail to parse/load a DMG file,
because we ignore everything without the "mish" magic. So this is OK.
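
For illustration only (this is not code from the patch), here is a minimal
sketch of the "scan for <data>, keep only mish payloads" idea. It assumes
the plist is already in a NUL-terminated buffer, and it checks the magic on
the still-encoded text rather than after decoding: base64 of any payload
whose first four bytes are "mish" begins with "bWlza". The helper name
count_mish_data_elements is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Base64 of any payload whose first four bytes are "mish" starts with these
 * five characters ("mis" -> "bWlz", and the top six bits of 'h' -> 'a'). */
#define MISH_B64_PREFIX "bWlza"

/* Hypothetical helper: count the <data> elements in a NUL-terminated plist
 * whose base64 payload starts with the "mish" magic. Returns -1 if an
 * opening <data> tag has no matching </data>. */
static int count_mish_data_elements(const char *xml)
{
    const char *pos = xml;
    int count = 0;

    while ((pos = strstr(pos, "<data>")) != NULL) {
        const char *begin = pos + strlen("<data>");
        const char *end = strstr(begin, "</data>");

        if (end == NULL) {
            return -1; /* malformed XML */
        }
        /* Skip the tabs and line feeds that precede the payload. */
        begin += strspn(begin, " \t\r\n");
        if ((size_t)(end - begin) >= strlen(MISH_B64_PREFIX) &&
            strncmp(begin, MISH_B64_PREFIX, strlen(MISH_B64_PREFIX)) == 0) {
            count++;
        }
        /* Continue the search after this element. */
        pos = end + strlen("</data>");
    }
    return count;
}
```

A <data> element under some unrelated key is simply not counted here, which
matches the "at worst we skip it or fail to load" outcome discussed above.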

I just wanted to check, since I didn't have a lot of DMG files on hand
and I couldn't really find a fuller reference to the types of XML that
show up.

Thanks!

>>> static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>>> Error **errp)
>>> {
>>> BDRVDMGState *s = bs->opaque;
>>> DmgHeaderState ds;
>>> uint64_t rsrc_fork_offset, rsrc_fork_length;
>>> + uint64_t plist_xml_offset, plist_xml_length;
>>> int64_t offset;
>>> int ret;
>>>
>>> @@ -366,12 +421,26 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>>> if (ret < 0) {
>>> goto fail;
>>> }
>>> + /* offset of property list (XMLOffset) */
>>> + ret = read_uint64(bs, offset + 0xd8, &plist_xml_offset);
>>> + if (ret < 0) {
>>> + goto fail;
>>> + }
>>> + ret = read_uint64(bs, offset + 0xe0, &plist_xml_length);
>>> + if (ret < 0) {
>>> + goto fail;
>>> + }
>>> if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
>>> ret = dmg_read_resource_fork(bs, &ds,
>>> rsrc_fork_offset, rsrc_fork_length);
>>> if (ret < 0) {
>>> goto fail;
>>> }
>>> + } else if (plist_xml_offset != 0 && plist_xml_length != 0) {
>>> + ret = dmg_read_plist_xml(bs, &ds, plist_xml_offset, plist_xml_length);
>>> + if (ret < 0) {
>>> + goto fail;
>>> + }
>>> } else {
>>> ret = -EINVAL;
>>> goto fail;
>>>
>