Message-ID: <54A73210.5010709@redhat.com>
Date: Fri, 02 Jan 2015 19:04:32 -0500
From: John Snow
To: Peter Wu, qemu-devel@nongnu.org
Cc: Kevin Wolf, Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists
In-Reply-To: <1419692504-29373-7-git-send-email-peter@lekensteyn.nl>

On 12/27/2014 10:01 AM, Peter Wu wrote:
> The format is simple enough to avoid using a full-blown XML parser.
> The offsets are based on the description at
> http://newosxbook.com/DMG.html
>
> Signed-off-by: Peter Wu
> ---
>  block/dmg.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 69 insertions(+)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 19e4fe2..c03ea01 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -26,6 +26,7 @@
>  #include "qemu/bswap.h"
>  #include "qemu/module.h"
>  #include
> +#include
>
>  enum {
>      /* Limit chunk sizes to prevent unreasonable amounts of memory being used
> @@ -333,12 +334,66 @@ fail:
>      return ret;
>  }
>
> +static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
> +                              uint64_t info_begin, uint64_t info_length)
> +{
> +    BDRVDMGState *s = bs->opaque;
> +    int ret;
> +    uint8_t *buffer = NULL;
> +    char *data_begin, *data_end;
> +
> +    /* Have at least some length to avoid NULL for g_malloc. Attempt to set a
> +     * safe upper cap on the data length. A test sample had a XML length of
> +     * about 1 MiB. */
> +    if (info_length == 0 || info_length > 16 * 1024 * 1024) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    buffer = g_malloc(info_length + 1);
> +    buffer[info_length] = '\0';
> +    ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
> +    if (ret != info_length) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    /* look for <data>...</data>. The data is 284 (0x11c) bytes after base64
> +     * decode. The actual data element has 431 (0x1af) bytes which includes tabs
> +     * and line feeds. */
> +    data_end = (char *)buffer;
> +    while ((data_begin = strstr(data_end, "<data>")) != NULL) {
> +        gsize out_len = 0;
> +
> +        data_begin += 6;
> +        data_end = strstr(data_begin, "</data>");
> +        /* malformed XML? */
> +        if (data_end == NULL) {
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +        *data_end++ = '\0';
> +        g_base64_decode_inplace(data_begin, &out_len);
> +        ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin,
> +                                  (uint32_t)out_len);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +    ret = 0;
> +
> +fail:
> +    g_free(buffer);
> +    return ret;
> +}
> +

This makes me a little nervous: we're ignoring most of the XML document structure here and effectively performing a regular-expression search for "<data>(.*)</data>".

Can we guarantee that the <data> element appears in this document ONLY in the exact context we expect here, where it contains the base64-encoded mish data? That is, that it always sits at a path like this one, as detailed by http://newosxbook.com/DMG.html:

plist/dict/key[text()='resource-fork']/following-sibling::dict/key[text()='blkx']/following-sibling::array/dict/key[text()='data']/following-sibling::data

I notice that this document says other sections MAY be present. Do any of them ever need to be parsed? Has anyone written about them before? Do we know whether any of them use <data> elements?

I suppose that, at the very least, the sections we care about will always include the "mish" magic, so that should keep us from doing anything too stupid...
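One possible way to tighten the scan along the lines of that path is sketched here. It is not part of the patch, and scan_blkx_data(), has_mish_magic(), skip_ws() and data_cb are hypothetical names. The sketch assumes that only whitespace separates <key>blkx</key> from its <array>, and that blkx entries contain no nested arrays (so the first </array> closes the blkx array). It does not verify the enclosing resource-fork dict. Base64 decoding is left out; in dmg_read_plist_xml() that would still be g_base64_decode_inplace(), with has_mish_magic() checked on the decoded bytes before calling dmg_read_mish_block().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback invoked with the (still base64-encoded) body of each <data>. */
typedef void (*data_cb)(const char *begin, size_t len, void *opaque);

static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
        p++;
    }
    return p;
}

/* Visit only the <data> elements inside the array that directly follows
 * <key>blkx</key>. Returns the number of elements visited, or -1 if the
 * expected structure is missing or an element is unterminated. */
static int scan_blkx_data(const char *xml, data_cb cb, void *opaque)
{
    assert(xml != NULL);

    const char *key = strstr(xml, "<key>blkx</key>");
    if (key == NULL) {
        return -1;
    }
    const char *array_begin = skip_ws(key + strlen("<key>blkx</key>"));
    if (strncmp(array_begin, "<array>", strlen("<array>")) != 0) {
        return -1;
    }
    /* Assumes no nested arrays inside the blkx entries. */
    const char *array_end = strstr(array_begin, "</array>");
    if (array_end == NULL) {
        return -1;
    }

    int count = 0;
    const char *p = array_begin;
    for (;;) {
        const char *begin = strstr(p, "<data>");
        if (begin == NULL || begin >= array_end) {
            break; /* anything past </array> belongs to another section */
        }
        begin += strlen("<data>");
        const char *end = strstr(begin, "</data>");
        if (end == NULL || end > array_end) {
            return -1; /* malformed: element not closed inside the array */
        }
        if (cb != NULL) {
            cb(begin, (size_t)(end - begin), opaque);
        }
        count++;
        p = end + strlen("</data>");
    }
    return count;
}

/* A mish block starts with the big-endian magic 0x6d697368 ("mish"). */
static bool has_mish_magic(const uint8_t *buf, size_t len)
{
    return len >= 4 && buf[0] == 'm' && buf[1] == 'i' &&
           buf[2] == 's' && buf[3] == 'h';
}
```

With this shape, a <data> element that shows up in some other section (the "MAY be present" ones) is skipped rather than handed to the mish parser, and the magic check remains as a second line of defence.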
>  static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>                      Error **errp)
>  {
>      BDRVDMGState *s = bs->opaque;
>      DmgHeaderState ds;
>      uint64_t rsrc_fork_offset, rsrc_fork_length;
> +    uint64_t plist_xml_offset, plist_xml_length;
>      int64_t offset;
>      int ret;
>
> @@ -366,12 +421,26 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>      if (ret < 0) {
>          goto fail;
>      }
> +    /* offset of property list (XMLOffset) */
> +    ret = read_uint64(bs, offset + 0xd8, &plist_xml_offset);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    ret = read_uint64(bs, offset + 0xe0, &plist_xml_length);
> +    if (ret < 0) {
> +        goto fail;
> +    }
>      if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
>          ret = dmg_read_resource_fork(bs, &ds,
>                                       rsrc_fork_offset, rsrc_fork_length);
>          if (ret < 0) {
>              goto fail;
>          }
> +    } else if (plist_xml_offset != 0 && plist_xml_length != 0) {
> +        ret = dmg_read_plist_xml(bs, &ds, plist_xml_offset, plist_xml_length);
> +        if (ret < 0) {
> +            goto fail;
> +        }
>      } else {
>          ret = -EINVAL;
>          goto fail;
>
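For reference, a standalone sketch of the source selection in the dmg_open() hunk above: prefer the resource fork, fall back to the XML plist, otherwise fail. It assumes that `offset` points at the start of the 512-byte "koly" trailer and that its fields are big-endian. The resource-fork offsets 0x28/0x30 come from the koly layout at http://newosxbook.com/DMG.html rather than from the quoted hunk, and the "koly" magic check is an extra added for this sketch. pick_plist_source(), load_be64() and enum plist_source are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the UDIF trailer ("koly" block) at the end of a DMG file. */
#define KOLY_SIZE 512

enum plist_source { SRC_NONE, SRC_RSRC_FORK, SRC_XML };

/* Decode a big-endian 64-bit field. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Mirror the if / else if / else chain in dmg_open(): the resource fork
 * wins if both of its fields are non-zero, then the XML plist. */
static enum plist_source pick_plist_source(const uint8_t koly[KOLY_SIZE],
                                           uint64_t *off, uint64_t *len)
{
    assert(koly != NULL && off != NULL && len != NULL);
    if (memcmp(koly, "koly", 4) != 0) {
        return SRC_NONE; /* not a UDIF trailer */
    }
    uint64_t rsrc_off = load_be64(koly + 0x28); /* RsrcForkOffset */
    uint64_t rsrc_len = load_be64(koly + 0x30); /* RsrcForkLength */
    uint64_t xml_off = load_be64(koly + 0xd8);  /* XMLOffset */
    uint64_t xml_len = load_be64(koly + 0xe0);  /* XMLLength */

    if (rsrc_off != 0 && rsrc_len != 0) {
        *off = rsrc_off;
        *len = rsrc_len;
        return SRC_RSRC_FORK;
    }
    if (xml_off != 0 && xml_len != 0) {
        *off = xml_off;
        *len = xml_len;
        return SRC_XML;
    }
    return SRC_NONE; /* dmg_open() returns -EINVAL in this case */
}
```

Keeping the resource fork first preserves the existing behaviour for images that carry both, so the XML path only changes anything for images that previously failed to open.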