From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Schmidt
Date: Mon, 11 Oct 2010 11:18:00 +0000
Subject: Re: [mlmmj] special characters in footer
Message-Id: <4CB2F268.2050505@yahoo.com.au>
MIME-Version: 1
Content-Type: multipart/mixed; boundary="------------050501060908030503080104"
List-Id:
References:
In-Reply-To:
To: mlmmj@mlmmj.org

This is a multi-part message in MIME format.
--------------050501060908030503080104
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

> is there an easy fix to add special characters to the footer? When the
> footer is UTF-8 and has things like Umlauts, and is added to an
> ISO-8859-15 mail, it's screwed up :-(

I'm pretty sure there isn't! I think the way this is usually solved is by
adding a new MIME part.

Eventually I plan to move all content filtering/modification stuff like
this out of Mlmmj into external programs, called through appropriate
hooks. When this happens, the encoding will probably be made available as
an environment var or something, and you could then do an appropriate
conversion. But that's probably not going to happen any time in the near
future, and will also probably need a bunch of discussion.

At the moment, this is still done by piping the message through something
prior to mlmmj-recieve, so Mlmmj sees every message it receives with the
footer.

I wrote a little C program that can do this, which I have attached. Feel
free to use it. I use a slightly modified version of
contrib/amime-receive to wrap it. There are other MIME-altering apps and
scripts around on the web, too.

Ben.
--------------050501060908030503080104
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="foot_filter.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="foot_filter.c"

// Check out the -V option; it outputs this and more
#define FOOT_FILTER_VERSION "foot_filter version 1.2 by Ben Schmidt"

static const char * USAGE="\n\
usage: foot_filter [-p plain_footer_file] [-h html_footer_file]\n\
           [{-P|-H} mime_footer_file] [-s]\n\
       foot_filter -V\n\
\n\
plain_footer_file, if present, will be appended to mails with plain text\n\
sections only. Similarly, html_footer_file. If mime_footer_file (either\n\
plain, -P, or HTML, -H) is given, it will be used when a mail with\n\
alternative formats is encountered, or if the footer for the relevant\n\
type of mail is not present; a new MIME section will be added.\n\
\n\
-s turns on smart mode which endeavours to remove included/quoted copies of\n\
the (or a similar) footer by surrounding the footer with patterns it later\n\
recognises. It also endeavours to strip 'padding' surrounding the old\n\
footers to make things as clean as possible. This includes whitespace\n\
(including '&nbsp;' and '<br>'), '>' quoting characters, various pairs of\n\
HTML tags (p, blockquote, div, span, font; it's naive, it doesn't check\n\
tags in between are balanced at all, so in '<p>prefix</p><p>suffix</p>' the\n\
first and last tags are paired), and even horizontal rules when inside\n\
paired tags (e.g. use '<div><hr/>footer</div>'). If the smart strings are\n\
found in the footer, they won't be added by the program, so you have the\n\
necessary control to do this.\n\
\n\
New footers are added prior to trailing whitespace and a few closing html\n\
tags (body, html) as well. You almost certainly want to begin your footer\n\
with an empty line because of this.\n\
\n\
Since these alterations, by their very nature, break signed mail,\n\
signatures are removed while processing. To keep some value from signatures,\n\
have the MTA verify them and add a header (or even supply an alternative\n\
footer to this program), and re-sign them to authenticate they came from the\n\
mailing list directly after the signature verification was done and recorded.\n\
Or don't use these kinds of transformations at all.\n\
\n\
-V shows the version and exits.\n\
\n\
Program is running now. Send EOF or interrupt to stop it. To avoid this usage\n\
message if wanting to run without arguments, use '--' as an argument.\n\
\n";

/*

This is a pretty simple program not expecting much extension. As such,
although prioritising correctness over speed in design (and thus using
callbacks in the somewhat tricky buffer handling code, rather than
specialised routines that would be faster), speed has been taken into
consideration and prioritised above readability and modularity and other
such generally recommended programming practices.

There are a lot of global (well, static) variables. This should be fast as
it lessens the amount of stack setup required, particularly when using the
callbacks, which can't be inlined, and the processor cache will keep the
globals close at hand. It was simpler just to use them throughout than use
a mixture of globals and structs. However, great care should be taken to
understand how and where (everywhere) globals are used before making any
changes.

Don't try to modify the program without understanding the whole thing or
you will get burnt. You have been warned.
I assume something else will break before us if sizes in the mail exceed what an int can represent! Relevant RFCs: http://www.ietf.org/rfc/rfc2015.txt http://www.ietf.org/rfc/rfc3851.txt http://www.ietf.org/rfc/rfc2045.txt http://www.ietf.org/rfc/rfc2046.txt http://www.ietf.org/rfc/rfc822.txt http://www.ietf.org/rfc/rfc2183.txt For program configuration, see 'constants' section below. Also see code comments throughout. Future possibilities: Saving copies of original mail in 'semi-temp' files for debugging. Strip attachments and save them (e.g. in a location that can become a 'files uploaded' section on a website). Replace them with links to the website, even. Make the prefixes, suffixes, replacements, padding, guts, pairs, configurable at runtime (a config file or files probably). Attaching signed mail, or wrapping in a multipart rather than removing signatures; wouldn't be hard if always using MIME footers. Following a script to allow various other header transformations (addition, removal, etc.), or other transformations. Headers as well as or instead of footers. */ /* tag: includes */ #include #include #include #include #include #include #include #include /* tag: typedefs */ // splint has bools, but C doesn't! 
#ifndef S_SPLINT_S typedef int bool; #define false (0) #define true (1) #endif // This is mostly to be able to include splint annotations typedef /*@null@*//*@observer@*/ const char * const_null_string; typedef /*@null@*/ char * null_string; typedef /*@null@*//*@owned@*/ char * owned_null_string; typedef /*@null@*//*@dependent@*/ char * dependent_null_string; // 'Callbacks'; they communicate primarily using globals, see below typedef bool (*callback_t)(); typedef void (*function_t)(); // For fill() typedef enum { echo, encode, shunt, discard, stop, fail } when_full_t; // Various places typedef enum { unencoded, quoted_printable, base64 } encoding_t; // For returning multiple characters, and a request to delete backlog // when decoding typedef struct { int r; int c1; int c2; int c3; } decode_t; /* tag: constants */ /* tag: header_constants */ // How many MIME Content- headers we expect, maximum, in a mail. If we have // more than that, we won't be able to process MIME so well, but we won't fail // catastrophically. #define mime_headers_max 16 /* tag: footer_constants */ // Stuff for processing the footer's smart removal and (smart or not) // insertion static const char * plain_prefix = "------~----------"; static const char * plain_suffix = "------~~---------"; static const char * plain_replacement = "\r\n\r\n"; static const_null_string plain_tails[] = { " ","\t","\r","\n", NULL }; static const_null_string plain_padding[] = { ">"," ","\t","\r","\n", NULL }; static const_null_string plain_guts[] = { NULL }; static const_null_string plain_pairs[] = { NULL }; static const char * html_prefix = "------~----------"; static const char * html_suffix = "------~~---------"; static const char * html_replacement = "\r\n

\r\n"; static const_null_string html_tails[] = { "","","","", " "," ","&NBSP;","\t","\r","\n", "
","
","
","
","
","
", NULL }; static const_null_string html_padding[] = { ">",">", " "," ","&NBSP;","\t","\r","\n", "
","
","
","
","
","
", NULL }; static const_null_string html_guts[] = { // These are removed in an attempt to make a pair "
","
","
","
","
","
", " "," ","&NBSP;","\t","\r","\n", "
","
","
","
","
","
", NULL }; static const_null_string html_pairs[] = { // Closing part (or NULL to mark no more), end of opening part, // start of opening part, NULL // The search strategy is fairly naive; if it finds the closing part, // it checks for the end of the opening part; if it finds that, it // searches back for the first character of each of the opening part // variants, and if that character is found and is the beginning of the // whole variant, it removes the pair. "

",">","

","

",">","

","

",">","

","
",">","
","
",">","
","
",">","
","
",">","","",">","","",">","","",">","","0,"unexpected commandline argument"); // Load footers if (plain_footer_file!=NULL) load_footer(&plain_footer,&plain_footer_buffer, plain_footer_file, smart_footer?plain_prefix:NULL,smart_footer?plain_suffix:NULL); if (html_footer_file!=NULL) load_footer(&html_footer,&html_footer_buffer, html_footer_file, smart_footer?html_prefix:NULL,smart_footer?html_suffix:NULL); if (mime_footer_file!=NULL) load_footer(&mime_footer,&mime_footer_buffer, mime_footer_file,NULL,NULL); // Do the job process_section(true,true,NULL); // Finish if (plain_footer_buffer!=NULL) free(plain_footer_buffer); if (html_footer_buffer!=NULL) free(html_footer_buffer); if (mime_footer_buffer!=NULL) free(mime_footer_buffer); exit(EX_OK); } static void load_footer(/*@out@*//*@shared@*/ char ** footer, /*@reldef@*/ char ** footer_buffer, char * file, /*@unique@*/ const_null_string prefix, /*@unique@*/ const_null_string suffix) { FILE * f; int prefixl=0, footerl=0, suffixl=0; char * ff; if (prefix!=NULL&&suffix!=NULL) { prefixl=(int)strlen(prefix); suffixl=(int)strlen(suffix); } f=fopen(file,"r"); resort_to_errno(f==NULL,"error opening footer file",EX_NOINPUT); resort_to_errno(fseek(f,0,SEEK_END)!=0, "error seeking end of footer file",EX_IOERR); resort_to_errno((footerl=(int)ftell(f))==-1, "error finding footer length",EX_IOERR); resort_to_errno(fseek(f,0,SEEK_SET)!=0, "error seeking in footer file",EX_IOERR); // prefix, \n, footer, \n, suffix, \0 *footer_buffer=alloc_or_exit(sizeof(char)*(prefixl+footerl+suffixl+3)); *footer=*footer_buffer; *footer+=prefixl+1; resort_to_errno(fread(*footer,1,(size_t)footerl,f)<(size_t)footerl, "error reading footer",EX_IOERR); // We strip off a single trailing newline to keep them from accumulating // but to allow the user the option of adding them if desired if ((*footer)[footerl-1]=='\n') --footerl; (*footer)[footerl]='\0'; if (prefix==NULL||suffix==NULL) return; // Put in the prefix and suffix as necessary 
ff=strstr(*footer,prefix); if (ff!=NULL) { ff=strstr(ff,suffix); if (ff!=NULL) return; (*footer)[footerl]='\n'; ++footerl; strcpy(*footer+footerl,suffix); (*footer)[footerl+suffixl]='\0'; } else { ff=strstr(*footer,suffix); if (ff==NULL) { (*footer)[footerl]='\n'; ++footerl; strcpy(*footer+footerl,suffix); (*footer)[footerl+suffixl]='\0'; } *footer-=prefixl+1; strcpy(*footer,prefix); (*footer)[prefixl]='\n'; } } // Should be called with the boundary for the section as lookahead // in the buffer, but nothing more, and no lookbehind. static void process_section(bool add_footer, bool can_reenvelope, /*@null@*/ bool * parent_needs_footer) { char * external=NULL; char * internal=NULL; char * generated=NULL; bool reenveloping=false; bool child_needed_footer=false; bool needs_footer=false; bool unsigning=false; if (parent_needs_footer!=NULL) *parent_needs_footer=false; // The headers must be read, saved and echoed before making any // recursive calls, as I'm naughty and using globals. read_boundary(&external); read_and_save_mime_headers(); if (mime_bad) { // If an error, just resort to echoing echo_buffer(); // Boundary and headers // End headers with the extra line break resort_to_errno(putstr("\r\n")==EOF, "error echoing string",EX_IOERR); free_saved_mime_headers(); // Body echo_to_boundary(external); free(external); return; } // Headers determining we skip this section if (is_signature()) { skip_buffer(); // Boundary and headers skip_to_boundary(external); return; } // Header processing if (is_signed()) unsigning=true; if (unsigning) change_to_mixed(); if (add_footer&&mime_footer!=NULL&&( is_alternative()||(is_multipart(NULL)&&!is_mixed())|| (is_plain()&&plain_footer==NULL)|| (is_html()&&html_footer==NULL) )) { add_footer=false; if (can_reenvelope) { reenveloping=true; remove_mime_headers(); } else if (parent_needs_footer!=NULL) *parent_needs_footer=true; } // Headers echo_buffer(); // Boundary and possibly modified headers if (reenveloping) { 
generate_boundary(&generated); output_mime_mixed_headers(generated); output_prolog(); output_boundary(generated); output_saved_mime_headers(); } // End the headers with the extra line break resort_to_errno(putstr("\r\n")==EOF, "error echoing string",EX_IOERR); // Body processing if (is_multipart(&internal)) { // This branch frees the MIME headers before recursing. // Don't include the prolog if it used to be signed; // it usually says something like 'this message is signed' if (unsigning) { skip_to_boundary(internal); resort_to_errno(putstr("\r\n")==EOF, "error echoing string",EX_IOERR); } else { echo_to_boundary(internal); } // The recursive call needs these globals free_saved_mime_headers(); while (!at_final_boundary(internal)) { process_section(add_footer,false,&child_needed_footer); if (child_needed_footer) needs_footer=true; } if (needs_footer) output_mime_footer(internal); free(internal); echo_to_boundary(external); } else { // This branch frees the MIME headers at the end if (!is_attachment()&&( (is_plain()&&plain_footer!=NULL)|| (is_html()&&html_footer!=NULL))) { // alternatively // if (!is_attachment()&&( // (is_plain()&&((add_footer&&plain_footer!=NULL)||smart_footer))|| // (is_html()&&((add_footer&&html_footer!=NULL)||smart_footer)))) { if (is_plain()) { process_text_section(add_footer,plain_footer, plain_prefix,plain_suffix,plain_replacement, plain_tails,plain_padding,plain_guts,plain_pairs,external); } else { process_text_section(add_footer,html_footer, html_prefix,html_suffix,html_replacement, html_tails,html_padding,html_guts,html_pairs,external); } } else { echo_to_boundary(external); } free_saved_mime_headers(); } // MIME stuff is freed now; take care not to use it. 
/*@-branchstate@*/ if (reenveloping) { // We ensure generated is not null in another if(reenveloping) // conditional above /*@-nullpass@*/ output_mime_footer(generated); output_final_boundary(generated); free(generated); /*@=nullpass@*/ } /*@=branchstate@*/ free(external); } /* tag: header_functions */ static inline void read_and_save_mime_headers() { /*@-mustfreeonly@*/ mime_bad=false; // Mark current end of buffer buffer_mark=buffer_read; buffer_marked=true; for (;;) { do { // Extend current header until beginning of next callback_bool=false; (void)fill(until_eol,shunt); if (buffer_filled==buffer_read) { // We probably hit EOF; just get out, and the whole // mail will end up echoed out warning("unexpected end of input"); break; } (void)look(one_char,buffer_read,false); if (callback_int==(int)' '||callback_int==(int)'\t') { // Continuation of previous header; read it read_buffer(); continue; } // Start of new header; don't read it; process the old one // (from the mark to the end of the lookbehind) break; } while (true); // Process the old header, if there is one if (buffer_mark0) { if (*h=='\0') break; if (*h=='\\') { ++h; if (*h=='\0') break; } else if (*h=='(') ++levels; else if (*h==')') --levels; ++h; } if (!delimiting(*h,ext)&&!delimiting(*(hh-1),ext)) { // Put in some whitespace if something delimiting isn't // coming and hasn't just been *hh=' '; ++hh; } continue; } else if (*h=='"'||*h=='[') { if (*h=='[') close=']'; else close='"'; *hh=*h; ++h; ++hh; hhh=hh; while (*h!='\0'&&*h!=close) { if (*h=='\\') { *hh=*h; ++hh; ++h; if (*h=='\0') break; if (*h=='\r'&&*(h+1)=='\n') { *hh=*h; ++hh; ++h; *hh=*h; ++hh; ++h; if (*h=='\0') break; ++hh; ++h; continue; } } else if (*h==(char)8) { --hh; ++h; if (hh'||c=='@'|| c==','||c==';'||c==':'||c=='\\'||c=='"'|| c=='.'||c=='['||c==']'|| (ext&&(c=='/'||c=='='||c=='?'))); } static inline void remove_mime_headers() { int h; for (h=0;h0, "internal error: unexpected data in buffer",EX_SOFTWARE); set_decoding_type(); 
encoding=decoding; decode_and_read_to_boundary_encoding_when_full(boundary); if (smart_footer&&footer!=NULL) { // alternatively // if (smart_footer) { for (;;) { prefix_pos=pos_of(prefix,0,buffer_read); if (prefix_pos==EOF) break; suffix_pos=pos_of(suffix,prefix_pos,buffer_read); if (suffix_pos==EOF) break; for (;;) { later_prefix_pos= pos_of(prefix,prefix_pos+prefixl,suffix_pos-prefixl); if (later_prefix_pos!=EOF) prefix_pos=later_prefix_pos; else break; } suffix_pos+=suffixl; pad(padding,guts,pairs,&prefix_pos,&suffix_pos); replacement_starts[replacements_count]=prefix_pos; replacement_ends[replacements_count]=suffix_pos; // We may not want the last replacement so replace // with nothing first replacement_strings[replacements_count]=NULL; ++replacements_count; // We want the last replacement; encode it now before // doing any more encoding if (removed_footers) encode_string(replacement); encode_replacements(); removed_footers=true; } } if (*boundary!='\0'&&(decoding==quoted_printable||decoding==unencoded)) { // If we're not using base64 encoding, and we're in multipart, there // will be a final CRLF that is part of the input but logically part of // the boundary, not the text. Removing the footer may have already // removed it, so we need to check if it's here or not. if (buffer_read>1) { callback_compare="\r\n"; (void)look(comparing_head,buffer_read-2,false); callback_compare=NULL; if (callback_bool) boundary_newline=true; } } if (add_footer&&footer!=NULL) { // This will skip past the boundary newline mark_tail(tails); if (removed_footers&&buffer_mark==0) { // The last replacement coincides with where the footer // is going to go; don't use the replacement text. 
removed_footers=false; } } if (removed_footers) encode_string(replacement); if (add_footer&&footer!=NULL) { if (buffer_mark0) (void)empty(until_no_buffer); } static inline void echo_lookbehind() { make_replacements(echoing_one_char,echoing_until_start_marked); if (buffer_read>0) (void)empty(echoing_until_no_lookbehind); } static inline void encode_lookbehind() { make_replacements(encoding_one_char,encoding_until_start_marked); if (buffer_read>0) (void)empty(encoding_until_no_lookbehind); } static inline void encode_replacements() { make_replacements(encoding_one_char,encoding_until_start_marked); } static inline void make_replacements(callback_t one_char, callback_t start_marked) { int r, minr=0; const char * c; if (buffer_read==0) return; buffer_marked=false; while (replacements_count>0) { for (r=0;r0) (void)empty(start_marked); c = replacement_strings[minr]; if (c!=NULL) { while (*c!='\0') { buffer_char=(int)(unsigned int)*c; (void)(*one_char)(); ++c; } } buffer_marked=true; buffer_mark=replacement_ends[minr]; for (r=0;r0) (void)empty(until_start_marked); for (r=minr;r0) (void)empty(encoding_until_start_marked); } static inline void echo_disk_buffer() { if (disk_buffer_filled>0) (void)empty(echoing_until_no_disk_buffer); } static inline void encode_disk_buffer() { if (disk_buffer_filled>0) (void)empty(encoding_until_no_disk_buffer); } static inline void skip_disk_buffer() { if (disk_buffer_filled>0) (void)empty(until_no_disk_buffer); } static inline void read_boundary(/*@out@*/ char ** boundary) { int l=0; if (buffer_filled>buffer_read) { callback_bool=false; callback_int=0; resort_to_exit(!look(counting_until_eol,buffer_read,false), "internal error: missing eol at section boundary",EX_SOFTWARE); l=callback_int-2; // remove the CRLF, but keep the leading '--' } // Leave room to append a trailing '--' for testing final boundary; // the CRLF will be written in this space by saving_until_eol too. 
*boundary = alloc_or_exit(sizeof(char)*(l+3)); if (buffer_filled>buffer_read) { callback_bool=false; callback_save=*boundary; (void)look(saving_until_eol,buffer_read,false); callback_save=NULL; } (*boundary)[l]='\0'; if (buffer_filled>buffer_read) { callback_bool=false; (void)look(until_eol,buffer_read,true); } } static inline void echo_to_boundary(const char * boundary) { do { echo_buffer(); } while (!process_one_line_checking_boundary( echoing_n_chars,NULL,until_eol,echo,boundary)); } static inline void skip_to_boundary(const char * boundary) { do { skip_buffer(); } while (!process_one_line_checking_boundary( n_chars,NULL,until_eol,discard,boundary)); } static inline void decode_and_read_to_boundary_encoding_when_full( const char * boundary) { do { read_buffer(); } while (!process_one_line_checking_boundary( encoding_n_chars,decode_lookahead, decoding_until_eol,encode,boundary)); finish_decoding(); // This just sets state, doesn't change data } static inline bool process_one_line_checking_boundary(callback_t n_chars, /*@null@*/ function_t process, callback_t processing, when_full_t when_full, const char * boundary) { bool stopped_by_design; if (feof(stdin)!=0) { // We're done! Call it a boundary (even if it isn't--we need to // get out of loops cleanly and tidy up as best we can). 
return true; } // Empty until enough space for boundary if (mem_buffer_size-mem_buffer_filled<80) { callback_int=80-(mem_buffer_size-mem_buffer_filled); (void)empty(n_chars); } callback_bool=false; stopped_by_design=fill(until_eol,stop); if (stopped_by_design||feof(stdin)!=0) { if (buffer_filled-buffer_read==0) { return *boundary=='\0'; } callback_bool=false; if (*boundary!='\0') { // Can only be at a boundary without being at EOF if there // really is a boundary /*@-temptrans@*/ callback_compare=boundary; /*@=temptrans@*/ (void)look(comparing_head,buffer_read,false); callback_compare=NULL; } if (!callback_bool&&process!=NULL) (*process)(); return callback_bool; } else { // Line is too long to be a boundary, so must be decoded if (process!=NULL) (*process)(); callback_bool=false; (void)fill(processing,when_full); return false; } } // Return the position of text whose start may occur in the buffer // anywhere between from and (just before) to. Use EOF for from to // go from current location; use EOF for to to read indefinitely; // EOF is returned if text is not found. static int pos_of(const char * text,int from,int to) { int saved_buffer_read; int pos=EOF; if (*text=='\0') return from; saved_buffer_read=buffer_read; if (from!=EOF) buffer_read=from; callback_match=(int)(unsigned int)*text; for (;;) { if (to!=EOF) { callback_int=to-buffer_read; if (!look(n_chars_until_match,buffer_read,true)) break; } else { if (!look(until_match,buffer_read,true)) break; } if (!callback_bool) break; /*@-temptrans@*/ callback_compare=text+1; /*@=temptrans@*/ (void)look(comparing_head,buffer_read,false); callback_compare=NULL; if (callback_bool) { // Include the first character pos=buffer_read-1; break; } } buffer_read=saved_buffer_read; return pos; } // Look at characters in the buffer, starting at offset from, // 'reading' if so indicated (and looking at that location). // The callback is called after updating the reading pointer // and placing the character in the buffer. 
The character is // also passed by means of the buffer_char global. // EOF is sent to the callback when we run out of data. // There is no automatic attempt to fill the buffer. // The callback should return a boolean indicating whether // to continue. This function will return true if the callback // indicated to stop (including if it so indicated on EOF), or // false if it stopped for EOF. // We always call the callback at least once, so don't call // this function at all unless you definitely want to look // at something. static bool look(callback_t callback,int from,bool read) { int pos=from; int disk_buffer_pos; char * mem_buffer_pos; if (pos=mem_buffer_end) mem_buffer_pos-=mem_buffer_size; while (pos=disk_buffer_filled) { mem_buffer_pos=mem_buffer_next_empty+(pos-disk_buffer_filled); if (mem_buffer_pos>=mem_buffer_end) mem_buffer_pos-=mem_buffer_size; while (pos>=disk_buffer_filled) { buffer_char=(int)(unsigned int)*mem_buffer_pos; if (!(*callback)()) return true; --mem_buffer_pos; if (mem_buffer_pos==mem_buffer_start-1) mem_buffer_pos=mem_buffer_end-1; if (mark&&pos==buffer_mark) --buffer_mark; --pos; } } if (pos>=0&&disk_buffer_filled>0) { disk_buffer_pos=disk_buffer_start+pos; // Reading backwards in the disk buffer is potentially very nasty; // hopefully it never actually happens while (pos>=0) { /*@-nullpass@*/ resort_to_errno(fseek(disk_buffer,disk_buffer_pos,SEEK_SET)!=0, "error seeking in temporary file",EX_IOERR); disk_buffer_sought=disk_buffer_pos; buffer_char=getc(disk_buffer); /*@=nullpass@*/ resort_to_errno(buffer_char==EOF, "error reading temporary file",EX_IOERR); ++disk_buffer_sought; if (!(*callback)()) return true; --disk_buffer_pos; if (mark&&pos==buffer_mark) --buffer_mark; --pos; } } if (mark&&buffer_mark==-1) { buffer_mark=0; buffer_marked=false; } // We don't call the callback on EOF when going backwards // buffer_char=EOF; // (void)(*callback)(); return false; } // Remove characters from the (beginning of the) buffer. 
// The same general principles as for look() apply. The callback is called
// after the character is removed and all accounting has been done, so
// perhaps the only place you can reliably find the character is in
// the buffer_char global. Again the callback gets an EOF call if
// there's nothing more to empty, and no automatic filling is done.
// The callback and function return values are as for look() and
// again, the callback is always called at least once; this means at
// least one character is always removed from the buffer, so only call
// the function if something definitely should be removed.
static bool empty(callback_t callback) {
  if (disk_buffer_filled>0) {
    if (disk_buffer_sought!=disk_buffer_start) {
      /*@-nullpass@*/
      resort_to_errno(fseek(disk_buffer,disk_buffer_start,SEEK_SET)!=0,
          "error seeking in temporary file",EX_IOERR);
      /*@=nullpass@*/
      disk_buffer_sought=disk_buffer_start;
    }
    while (disk_buffer_filled>0) {
      /*@-nullpass@*/
      buffer_char=getc(disk_buffer);
      /*@=nullpass@*/
      resort_to_errno(buffer_char==EOF,
          "error reading temporary file",EX_IOERR);
      ++disk_buffer_sought;
      ++disk_buffer_start;
      --disk_buffer_filled;
      --buffer_filled;
      if (buffer_read>0) --buffer_read;
      if (buffer_marked) {
        if (buffer_mark>0) --buffer_mark;
        else buffer_marked=false;
      }
      if (!(*callback)()) return true;
    }
  }
  while (mem_buffer_filled>0) {
    buffer_char=(int)(unsigned int)*mem_buffer_next_empty;
    ++mem_buffer_next_empty;
    if (mem_buffer_next_empty==mem_buffer_end)
      mem_buffer_next_empty=mem_buffer_start;
    --mem_buffer_filled;
    --buffer_filled;
    if (buffer_read>0) --buffer_read;
    if (buffer_marked) {
      if (buffer_mark>0) --buffer_mark;
      else buffer_marked=false;
    }
    if (!(*callback)()) return true;
  }
  buffer_char=EOF;
  if (!(*callback)()) return true;
  return false;
}

// Get more characters into the (end of the) buffer. The same
// general principles as for look() apply.
// The callback is called after the character is added and all accounting
// has been done, and gets the character via buffer_char, including an EOF
// when no more input is available (EOF on stdin). It should return whether
// to get more characters, and this function will return whether its exit
// was requested by the callback or not (the callback may signal EOF is
// an appropriate place to stop and we still return true).
// When the buffer is full there are a number of automatic options:
// echo the old data to stdout, or call encodechar for it, one character
// at a time; shunt a block off to disk, keeping mem_buffer_keep in
// memory; discard it a character at a time; stop (and return false;
// no EOF call is made); or fail (exit). Here 'full' is defined as
// less than mem_buffer_margin of space after adding the most recent
// character, so there is always a bit of space for callbacks to do
// input transformations. Again, at least one character is always
// added (if possible), and thus consumed from stdin, so only call this
// if you really want to do that.
static bool fill(callback_t callback, when_full_t when_full) {
  if (feof(stdin)!=0) {
    buffer_char=EOF;
    if (!(*callback)()) return true;
    return false;
  }
  for (;;) {
    /*@-infloops@*/
    while (mem_buffer_filled>=mem_buffer_size-mem_buffer_margin) {
      switch (when_full) {
      case echo:
        if (disk_buffer_filled>0) echo_disk_buffer();
        (void)empty(echoing_one_char);
        break;
      case encode:
        if (disk_buffer_filled>0) encode_disk_buffer();
        (void)empty(encoding_one_char);
        break;
      case discard:
        if (disk_buffer_filled>0) skip_disk_buffer();
        (void)empty(one_char);
        break;
      case shunt:
        shunt_to_disk(mem_buffer_filled-mem_buffer_keep);
        break;
      case stop:
        return false;
      case fail:
      default:
        resort_to_exit(true,"buffer full",EX_SOFTWARE);
      }
    }
    /*@=infloops@*/
    buffer_char=get();
    if (buffer_char==EOF) {
      resort_to_errno(ferror(stdin)!=0,"error reading input",EX_IOERR);
      if (!(*callback)()) return true;
      return false;
    }
    *mem_buffer_next_fill=(char)buffer_char;
    ++mem_buffer_next_fill;
    if (mem_buffer_next_fill==mem_buffer_end)
      mem_buffer_next_fill=mem_buffer_start;
    ++mem_buffer_filled;
    ++buffer_filled;
    if (!(*callback)()) return true;
  }
}

static inline void create_disk_buffer() {
  int fildes;
  fildes=mkstemp(disk_buffer_template);
  resort_to_errno(fildes==-1,
      "cannot create temporary file",EX_CANTCREAT);
  // "r+" rather than "rw": "rw" is not a valid stdio mode, and the
  // stream must be open for both reading and writing
  disk_buffer=fdopen(fildes,"r+");
  resort_to_errno(disk_buffer==NULL,
      "cannot create temporary stream",EX_CANTCREAT);
}

static void remove_disk_buffer() {
  if (disk_buffer!=NULL) {
    resort_to_warning(fclose(disk_buffer)!=0,
        "error closing temporary file");
    disk_buffer=NULL;
    resort_to_warning(unlink(disk_buffer_template)!=0,
        "error removing temporary file");
  }
}

static inline void shunt_to_disk(int n) {
  if (disk_buffer==NULL) create_disk_buffer();
  if (disk_buffer_sought!=disk_buffer_start+disk_buffer_filled) {
    disk_buffer_sought=disk_buffer_start+disk_buffer_filled;
    /*@-nullpass@*/
    resort_to_errno(fseek(disk_buffer,
        disk_buffer_start+disk_buffer_filled,SEEK_SET)!=0,
        "cannot seek to end of temporary file",EX_IOERR);
/*@=nullpass@*/ } while (n>0) { resort_to_exit(mem_buffer_filled==0, "internal error: shunting too much to disk",EX_SOFTWARE); /*@-nullpass@*/ resort_to_errno(putc(*mem_buffer_next_empty,disk_buffer)==EOF, "error writing to temporary file",EX_IOERR); /*@=nullpass@*/ ++disk_buffer_sought; ++disk_buffer_filled; ++mem_buffer_next_empty; if (mem_buffer_next_empty==mem_buffer_end) mem_buffer_next_empty=mem_buffer_start; --mem_buffer_filled; --n; } } /* tag: callback_functions */ static bool one_char() { callback_int=buffer_char; return false; } static bool echoing_one_char() { if (buffer_char!=EOF) { resort_to_errno(put(buffer_char)==EOF,"error echoing",EX_IOERR); } callback_int=buffer_char; return false; } static bool encoding_one_char() { if (buffer_char!=EOF) encodechar(buffer_char); callback_int=buffer_char; return false; } // Set up callback_int before using this. static bool n_chars() { return --callback_int>0; } // Set up callback_int before using this. static bool echoing_n_chars() { if (buffer_char!=EOF) { resort_to_errno(put(buffer_char)==EOF,"error echoing",EX_IOERR); } return --callback_int>0; } // Set up callback_int before using this. static bool encoding_n_chars() { if (buffer_char!=EOF) encodechar(buffer_char); return --callback_int>0; } // Set up callback_int and callback_save before using this. static bool saving_n_chars() { if (buffer_char!=EOF) *callback_save++=(char)buffer_char; // We don't actually need this, though it's a good idea, really! // *callback_save='\0'; return --callback_int>0; } // Set up callback_int and callback_match before using this. static bool n_chars_until_match() { callback_bool=buffer_char==callback_match; return --callback_int>0&&buffer_char!=callback_match; } // Do callback_bool=false before using this. static bool until_eol() { if (buffer_char==(int)'\n') return !callback_bool; callback_bool=buffer_char==(int)'\r'; return true; } // Do callback_bool=false before using this. 
/*static bool echoing_until_eol() {
	if (buffer_char!=EOF) {
		resort_to_errno(put(buffer_char)==EOF,"error echoing",EX_IOERR);
	}
	if (buffer_char==(int)'\n') return !callback_bool;
	callback_bool=buffer_char==(int)'\r';
	return true;
}*/

// Do callback_bool=false, callback_int=0 before using this.
static bool counting_until_eol() {
	if (buffer_char!=EOF) ++callback_int;
	if (buffer_char==(int)'\n') return !callback_bool;
	callback_bool=buffer_char==(int)'\r';
	return true;
}

// Do callback_bool=false and set up callback_save before using this.
static bool saving_until_eol() {
	if (buffer_char!=EOF) *callback_save++=(char)buffer_char;
	// We don't actually need this, though it's a good idea, really!
	// *callback_save='\0';
	if (buffer_char==(int)'\n') return !callback_bool;
	callback_bool=buffer_char==(int)'\r';
	return true;
}

// Do callback_bool=false before using this.
static bool decoding_until_eol() {
	// We decode as we fill and work directly in the buffer to make
	// the transformation. We are guaranteed enough space to do this by
	// mem_buffer_margin.
	decode_t decoded;
	decoded=decodechar(buffer_char);
	// We always remove the latest undecoded character from the
	// buffer.
	++decoded.r;
	if (decoded.r>mem_buffer_filled) {
		// This will only happen for quoted-printable decoding
		// whitespace stripping, and we can just live with it
		// if we can't get rid of it all; with sensible constants
		// something really is disobeying MIME and probably SMTP
		// about line length anyway if this happens.
		warning("unable to strip all whitespace; not enough in memory");
		decoded.r=mem_buffer_filled;
	}
	if (buffer_filled-decoded.r(int)'Z'||c2!=c1-(int)'A'+(int)'a')&&
			(c2<(int)'A'||c2>(int)'Z'||c1!=c2-(int)'A'+(int)'a')) {
		callback_bool=false;
		return false;
	}
	/*@-modobserver@*/
	++callback_compare;
	/*@=modobserver@*/
	if (*callback_compare=='\0') {
		callback_bool=true;
		return false;
	}
	return true;
	/*@=nullderef@*/
}

/* tag: encoding_functions */

static inline void encode_string(const char * s) {
	while (*s!='\0') {
		// Convert via unsigned char so that a byte of 0xFF is not
		// mistaken for EOF on platforms where char is signed.
		encodechar((int)(unsigned char)*s);
		s++;
	}
}

static void encodechar(int c) {
	if (encoding==unencoded) {
		if (c!=EOF) resort_to_errno(put(c)==EOF,"error encoding",EX_IOERR);
		return;
	} else if (encoding==quoted_printable) {
		if (encoding_echoed>=68) {
			// We need a soft line break, or are close enough to needing
			// one (76 chars max; unclear whether that counts the CRLF; and
			// we may output two 3 character sequences which we don't want
			// to follow with an unescaped CRLF). This scheme will probably
			// make mail look a bit awful, but that's fairly standard anyway,
			// and it shouldn't degrade.
			resort_to_errno(putstr("=\r\n")==EOF,
				"error encoding string",EX_IOERR);
			encoding_echoed=0;
		}
		if (encoding_filled==1) {
			// Whatever happens, we'll deal with this now
			encoding_filled=0;
			if (encoding_buffer[0]=='\r') {
				if (c==(int)'\n') {
					// Output them as is and we're done for now
					resort_to_errno(putstr("\r\n")==EOF,
						"error encoding string",EX_IOERR);
					encoding_echoed=0;
					return;
				} else {
					// Must encode the bare CR and continue as normal
					resort_to_errno(put((int)'=')==EOF,"error encoding",EX_IOERR);
					encode_hex_byte((unsigned int)'\r');
					encoding_echoed+=3;
				}
			} else {
				// encoding_buffer[0] must be whitespace
				if (c==EOF||c==(int)'\r') {
					// Must encode it
					resort_to_errno(put((int)'=')==EOF,"error encoding",EX_IOERR);
					encode_hex_byte((unsigned int)encoding_buffer[0]);
					encoding_echoed+=3;
				} else {
					// It is fine to output it now as something else is coming
					resort_to_errno(put(
						(int)(unsigned int)encoding_buffer[0])==EOF,
						"error encoding",EX_IOERR);
					encoding_echoed+=1;
				}
			}
		}
		if ((c>=33&&c<=60)||(c>=62&&c<=126)) {
			resort_to_errno(put(c)==EOF,"error encoding",EX_IOERR);
			++encoding_echoed;
		} else if (c==(int)' '||c==(int)'\t') {
			if (encoding_echoed>=55) {
				// My concession to readability; since it's likely to be
				// a big mess with a 68 character width, we might as well
				// break a bit earlier on a nice word boundary. And it'll
				// in fact look better if we break with roughly equal size
				// lines, assuming they come in at close to 76 characters
				// wide, so we might as well make a nice skinny column,
				// rather than a ragged one that uses the same amount of
				// space. Compromising between the two, then, as some
				// formats, like HTML, don't have many hard line breaks
				// anyway, is what we get.
				resort_to_errno(put(c)==EOF,"error encoding",EX_IOERR);
				resort_to_errno(putstr("=\r\n")==EOF,
					"error encoding string",EX_IOERR);
				encoding_echoed=0;
			} else {
				// Store it; we may need to encode it if it's at end of line
				encoding_filled=1;
				encoding_buffer[0]=(char)c;
			}
		} else if (c==(int)'\r') {
			// Store it; '\n' may be coming up
			encoding_filled=1;
			encoding_buffer[0]='\r';
		} else if (c==EOF) {
			// No buffer, and we're done! Reset for another run.
			encoding_echoed=0;
		} else {
			// Anything else must be encoded as a sequence.
			resort_to_errno(put((int)'=')==EOF,"error encoding",EX_IOERR);
			encode_hex_byte((unsigned int)c);
			encoding_echoed+=3;
		}
	} else if (encoding==base64) {
		if (c==EOF) {
			// Reset for next run; we won't need it here
			encoding_echoed=0;
			if (encoding_filled==0) return;
			encoding_buffer[encoding_filled]='\0';
		} else {
			encoding_buffer[encoding_filled++]=(char)c;
		}
		if (encoding_filled==3||c==EOF) {
			encode_64((((unsigned int)encoding_buffer[0]>>2)&0x3f));
			encode_64((((unsigned int)encoding_buffer[0]&0x03)<<4)|
				(((unsigned int)encoding_buffer[1]>>4)&0x0f));
			if (encoding_filled==1) {
				resort_to_errno(put((int)'=')==EOF,"error encoding",EX_IOERR);
				resort_to_errno(put((int)'=')==EOF,"error encoding",EX_IOERR);
				// Reset for next run
				encoding_filled=0;
				return;
			}
			encode_64((((unsigned int)encoding_buffer[1]&0x0f)<<2)|
				(((unsigned int)encoding_buffer[2]>>6)&0x03));
			if (encoding_filled==2) {
				resort_to_errno(put((int)'=')==EOF,"error encoding",EX_IOERR);
				// Reset for next run
				encoding_filled=0;
				return;
			}
			encode_64((((unsigned int)encoding_buffer[2]&0x3f)));
			encoding_echoed+=4;
			if (encoding_echoed>=72) {
				resort_to_errno(putstr("\r\n")==EOF,
					"error encoding string",EX_IOERR);
				encoding_echoed=0;
			}
			encoding_filled=0;
		}
	} else {
		resort_to_exit(true,"internal error: unknown encoding",EX_SOFTWARE);
	}
}

static inline void finish_encoding() {
	encodechar(EOF);
}

// The function takes an input character c and returns up to four output
// characters (a character will be EOF to indicate no
// further characters
// to store; note that this doesn't mean there will be no more ever; only
// if EOF is returned when EOF was input does it mean this), and a number
// of characters to remove before adding the aforementioned characters.
static decode_t decodechar(int c) {
	int h;
	unsigned int b1, b2, b3, b4;
	decode_t o;
	o.r=0; o.c1=EOF; o.c2=EOF; o.c3=EOF;
	if (decoding==unencoded) {
		o.c1=c;
		return o;
	} else if (decoding==quoted_printable) {
		// decoding_buffer may hold '=' and maybe a hex digit or a CR.
		if (decoding_filled==2) {
			// Whatever happens, it's all settled now.
			decoding_filled=0;
			if (decoding_buffer[1]=='\r') {
				if (c==(int)'\n') {
					return o;
				}
				// Invalid; leave as is--will be encoded later.
				o.c1=(int)'='; o.c2=(int)'\r'; o.c3=c;
				return o;
			}
			h=decode_hex(c);
			if (h==EOF) {
				// Invalid; leave as is--will be encoded later.
				o.c1=(int)'=';
				o.c2=(int)(unsigned int)decoding_buffer[1];
				o.c3=c;
				return o;
			}
			// We have a full sequence representing a single character.
			o.c1=decode_hex((int)(unsigned int)decoding_buffer[1])*16+h;
			return o;
		} else if (decoding_filled==1) {
			if (c==(int)'\r'||decode_hex(c)!=EOF) {
				// Valid character after =
				decoding_filled=2;
				decoding_buffer[1]=(char)c;
				return o;
			}
			// Invalid; leave as is--will be encoded later.
			decoding_filled=0;
			o.c1=(int)'='; o.c2=c;
			return o;
		} else if (decoding_filled==0) {
			if (c==(int)'=') {
				// The first character can only ever be '=' so we
				// don't actually bother to store it; just say it's there.
				decoding_white=0;
				decoding_filled=1;
				return o;
			}
			// Keep track of whitespace.
			if (c==(int)' '||c==(int)'\t') ++decoding_white;
			else decoding_white=0;
			// Remove trailing whitespace.
			if (c==EOF||c==(int)'\r') {
				o.r=decoding_white;
				decoding_white=0;
			}
			// Otherwise we just keep it. If it's EOF, we're done.
			o.c1=c;
			return o;
		} else {
			warning("internal error: decoding buffer too full");
			return o;
		}
	} else if (decoding==base64) {
		if (c==EOF) {
			// Just in case it was corrupted, make sure we're reset
			decoding_filled=0;
			return o;
		}
		if (c==(int)'='||decode_64(c)!=EOF)
			decoding_buffer[decoding_filled++]=(char)c;
		if (decoding_filled==4) {
			// We empty it whatever happens here
			decoding_filled=0;
			b1=(unsigned int)decode_64((int)decoding_buffer[0]);
			b2=(unsigned int)decode_64((int)decoding_buffer[1]);
			o.c1=(int)(((b1&0x3f)<<2)|((b2>>4)&0x03));
			if (decoding_buffer[2]=='=') return o;
			b3=(unsigned int)decode_64((int)decoding_buffer[2]);
			o.c2=(int)(((b2&0x0f)<<4)|((b3>>2)&0x0f));
			if (decoding_buffer[3]=='=') return o;
			b4=(unsigned int)decode_64((int)decoding_buffer[3]);
			o.c3=(int)(((b3&0x03)<<6)|(b4&0x3f));
		}
		return o;
	} else {
		resort_to_exit(true,"internal error: unknown encoding",EX_SOFTWARE);
		// Never reached
		return o;
	}
}

static void decode_lookahead() {
	// Decoding will always shrink, so this is quite easy
	char * c;
	char * cc;
	decode_t decoded;
	int pos=buffer_read;
	int decpos=buffer_read;
	resort_to_exit(buffer_read=mem_buffer_end) c-=mem_buffer_size;
	cc=c;
	while (pos0) {
		resort_to_exit(decpos-decoded.r=(int)'0'&&c<=(int)'9') return c-(int)'0';
	if (c>=(int)'A'&&c<=(int)'F') return c-(int)'A'+10;
	return EOF;
}

static inline int decode_64(int c) {
	if (c>=(int)'A'&&c<=(int)'Z') return c-(int)'A';
	if (c>=(int)'a'&&c<=(int)'z') return c-(int)'a'+26;
	if (c>=(int)'0'&&c<=(int)'9') return c-(int)'0'+52;
	if (c==(int)'+') return 62;
	if (c==(int)'/') return 63;
	// if (c==(int)'=') return EOF;
	return EOF;
}

static inline void encode_hex_byte(unsigned int h) {
	int h1=(int)((h>>4)&0x0f);
	int h2=(int)(h&0x0f);
	if (h1<10)
		resort_to_errno(put((int)'0'+h1)==EOF,"error encoding",EX_IOERR);
	else if (h1<16)
		resort_to_errno(put((int)'A'+h1-10)==EOF,"error encoding",EX_IOERR);
	else
		resort_to_exit(true,"internal error: byte too large",EX_SOFTWARE);
	if (h2<10)
		resort_to_errno(put((int)'0'+h2)==EOF,"error encoding",EX_IOERR);
	else if (h2<16)
		resort_to_errno(put((int)'A'+h2-10)==EOF,"error encoding",EX_IOERR);
	else
		resort_to_exit(true,"internal error: byte too large",EX_SOFTWARE);
}

static inline void encode_64(unsigned int b) {
	if (b<26)
		resort_to_errno(put((int)'A'+b)==EOF,"error encoding",EX_IOERR);
	else if (b<52)
		resort_to_errno(put((int)'a'+b-26)==EOF,"error encoding",EX_IOERR);
	else if (b<62)
		resort_to_errno(put((int)'0'+b-52)==EOF,"error encoding",EX_IOERR);
	else if (b==62)
		resort_to_errno(put((int)'+')==EOF,"error encoding",EX_IOERR);
	else if (b==63)
		resort_to_errno(put((int)'/')==EOF,"error encoding",EX_IOERR);
	else
		resort_to_exit(true,
			"internal error: base64 value too large",EX_SOFTWARE);
}

/* tag: error_functions */

// Syslog constants:
// level: LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO, LOG_DEBUG
// facility: LOG_MAIL, LOG_DAEMON, LOG_USER, LOG_LOCALn(0-7)

static inline void * alloc_or_exit(size_t s) /*@allocates result@*/ {
	void * m;
	m=malloc(s);
	if (m==NULL) {
#ifdef USE_STDERR
		fprintf(stderr,"foot_filter: %s\n","out of memory");
#endif
#ifdef USE_SYSLOG
		syslog(LOG_ERR|LOG_MAIL,"%s\n","out of memory");
#endif
		exit(EX_OSERR);
	}
	return m;
}

static inline void /*noreturnwhentrue*/
resort_to_exit(bool when,const char * message,int status) {
	if (when) {
#ifdef USE_STDERR
		fprintf(stderr,"foot_filter: %s\n",message);
#endif
#ifdef USE_SYSLOG
		syslog(LOG_ERR|LOG_MAIL,"%s\n",message);
#endif
		exit(status);
	}
}

static inline void /*noreturnwhentrue*/
resort_to_errno(bool when,const char * message,int status) {
	if (when) {
#ifdef USE_STDERR
		fprintf(stderr,"foot_filter: %s (%s)\n",message,strerror(errno));
#endif
#ifdef USE_SYSLOG
		syslog(LOG_ERR|LOG_MAIL,"%s (%m)\n",message);
#endif
		exit(status);
	}
}

static inline void resort_to_warning(bool when,const char * message) {
	if (when) warning(message);
}

static inline void warning(const char * message) {
#ifdef USE_STDERR
	fprintf(stderr,"foot_filter: %s\n",message);
#endif
#ifdef USE_SYSLOG
	syslog(LOG_WARNING|LOG_MAIL,"%s\n",message);
#endif
}

/* tag: helper_functions */

// The program was written following all the specs using CRLF for newlines,
// but we get them from Postfix with LF only, so these wrapper functions
// do the translation in such a way that it can easily be disabled if desired.
static inline int get() {
	int c;
#ifdef UNIX_EOL
	static bool got_nl=false;
	if (got_nl) {
		got_nl=false;
		return 10;
	}
#endif
	c=getchar();
#ifdef UNIX_EOL
	if (c==10) {
		got_nl=true;
		return 13;
	}
#endif
	return c;
}

static inline int put(int c) {
#ifdef UNIX_EOL
	if (c==13) return c;
#endif
	return putchar(c);
}

static inline int putstr(const char * s) {
	while (*s!='\0') if (put((int)(unsigned int)*s++)==EOF) return EOF;
	return 0;
}

static inline bool case_insensitively_heads(const char * head,const char * buffer) {
	const char * s1=head;
	const char * s2=buffer;
	for (;;) {
		if (*s1=='\0') return true; /* for equality return *s2=='\0'; */
		else if (*s2=='\0') return false;
		if (*s1!=*s2&&
				(*s1<'A'||*s1>'Z'||*s2!=*s1-'A'+'a')&&
				(*s2<'A'||*s2>'Z'||*s1!=*s2-'A'+'a')) return false;
		++s1; ++s2;
	}
}

--------------050501060908030503080104
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="Makefile"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Makefile"

all: foot_filter

dev: tags splint foot_filter

.PHONY: splint test clean clobber

test: test.c
	#splint test.c
	gcc -o test test.c

tags: foot_filter.c
	ctags --excmd=number '--regex-c=-/\*[[:blank:]]*tag:[[:blank:]]*([[:alnum:]_]+)-\1-' foot_filter.c

splint:
	splint +unixlib -exitarg -initallelements foot_filter.c

foot_filter: foot_filter.c
	gcc -Wall -g -o foot_filter foot_filter.c -O3

clean:
	-rm tags

clobber: clean
	-rm foot_filter
	-rm test

--------------050501060908030503080104--