Customization of notification messages and log entries ====================================================== Mark Martinec Since March 2002 amavisd-new provides a way to customize e-mail notification messages that are sent in response to a virus (and spam) detection, without having to resort to modifications of Perl code. Three types of messages are generated: - administrator notifications are sent to administrator; (two types: virus, spam) - sender (non)delivery notifications may be sent to the mail originator; (three types: virus, spam, rejects by outgoing MTA) - recipient warnings may be sent to envelope recipients of the e-mail containing a virus. These notifications are normally disabled since they are more of a nuisance than benefit. (only for viruses) Besides the three types of e-mail notifications, the same customization principle is applied to customize the most common and useful log entry (present even at log level 0) that is generated when an e-mail containing an unwanted content is detected. Default Template texts are glued to the end of the 'amavisd' file, separated one from another by __DATA__ lines. Please see comments in these templates. These template texts are assigned to variables: $log_templ, $notify_sender_templ, $notify_virus_sender_templ, $notify_virus_admin_templ, $notify_virus_recips_templ, $notify_spam_sender_templ, $notify_spam_admin_templ The value of these variables may be overruled by assignment or by reading into them in the amavisd.conf file, which is run at amavisd startup time. If assigning to variables, care must be taken to properly quote certain special characters (like backslash), as required by Perl quoting rules. Text read from amavisd file or from external files is not subject to Perl quoting rules. Template text is subject to simple run-time macro expansion as described in the next section. Macro expansion is performed by the routine expand(), which receives substitution text of simple macros from 'amavisd' program. expand() takes a template string as its argument, performs macro expansion, and returns resulting multiline string back to 'amavisd', which uses it to send mail notifications or to write log entries. The substitution text for the following simple macros is built-in: - to be used in forming a mail header (properly quoted addresses as required by RFC2822): f administrator's e-mail address (typically used in 'From:' header of notification messages); T a list of recipients to be used in 'To:' header of the notification; C a list of recipients to be used in 'Cc:' header of the notification; B a list of recipients to be used in 'Bcc:' header of the notification; (the T, C and B lists are determined by each notification subroutine) - to be used in forming a notification mail body or log entry: p the current policy bank name (or empty if a built-in policy bank is still in place); h dns name of this host, or configurable name (variable $myhostname) HOSTNAME same as h n amavis internal log id (also called task id, am_id) as shown in the log and by amavisd-nanny, e.g. 58725-05-2 log_id a synonym for %n body_digest message digest of a mail body: the digest algorithm is chosen by a setting $mail_digest_algorithm ('MD5', or 'SHA-1' or 'SHA-256', defaults to 'MD5'). These are raw (non-encoded) bytes, not suitable for direct display. It is common to encode it with one of the macros: 'hexenc', 'b64enc', or 'b64urlenc'. The result of [:hexenc|[:body_digest]] is the same as a result of a macro call %b. b message digest of a mail body: the digest algorithm is chosen by a setting $mail_digest_algorithm ('MD5', or 'SHA-1' or 'SHA-256', defaults to 'MD5'), encoded as hex digits, high nybble first. The result is the same as a result of [:hexenc|[:body_digest]] date_unix_utc timestamp of the message reception - Unix time (seconds since 1970-01-01T00:00Z as a decimal integer) date_iso8601_utc timestamp of the message reception - ISO 8601 (EN 28601) UTC date-time date_iso8601_local => sub {iso8601_timestamp($MSGINFO->rx_time)}, date_rfc2822_local timestamp of the message reception - RFC 2822 local date-time format week_iso8601 returns an ISO 8601 week number (between 1 and 53) corresponding to the current message reception time (date_iso8601_local) weekday returns an ISO 8601 weekday number (between 1 and 7, 1=Mo, 2=Tu, ..., 7=Su) corresponding to the current message reception time (date_iso8601_local) partition_tag give the current value of a partition_tag attribute which will be stored in SQL records when SQL quarantining or logging is enabled; the value is derived from a setting $partition_tag (global or per-policy bank); a typical usage is to use ISO 8601 week number as a pertition tag; P same as partition_tag u same as date_unix_utc d same as date_rfc2822_local U same as date_iso8601_utc y elapsed time in ms for this message (not including final cleanup) s original envelope sender, rfc2821-quoted and enclosed in angle brackets rfc2822_sender an e-mail address from a Sender header field, or empty rfc2822_from an e-mail address from a From header field (possibly more than one) rfc2822_resent_sender e-mail address from all Resent-Sender header fields rfc2822_resent_from e-mail address from all Resent-From header fields tls_in returns TLS ciphers in use by a SMTP session if mail came to amavisd through a TLS-encrypted session, otherwise result is empty ip_trace_all a list of IP addresses as found in the 'Received from' trace of a mail header, one entry for each Received header field, including possibly invalid IP addresses and private IP addresses; Missing addresses are substituted by with a '?' (e.g. in Received header fields for local or other non-IP mail submissions). The list order corresponds to the order of 'Received' header fields as found in a mail header, top-down, i.e. first entry of the list is the topmost (the most recent) 'Received' header field, so chronologically in reverse; ip_trace_public a list of valid public IP addresses as found in the 'Received from' trace of a mail header. Missing, invalid or private IP addresses are not included in this list, so there may be more 'Received' header fields in a mail header then entries in this list. The list order corresponds to the order of 'Received' header fields as found in a mail header, top-down, i.e. first entry of the list is the topmost (the most recent) 'Received' header field with a valid public IP address, so chronologically in reverse; e best guess of the originator IP address: the bottom-most public IP address as obtained by parsing 'Received from' trace fields. Usually the 'e' macro would return the same IP address as the last entry provided by the ip_trace_public macro; t first entry in the 'Received from' trace of a mail header g original SMTP client DNS name as obtained from an XFORWARD NAME field, or from a 'client_name' attribute in an AM.PDP request; empty if unknown; client_helo client-supplied EHLO/HELO domain name from the original SMTP session as obtained through XFORWARD HELO or from a 'helo_name' attribute in an AM.PDP request; client_addr original SMTP client source IP address, same as %a, as obtained through XFORWARD ADDR or from a 'client_address' attribute in an AM.PDP request, or by parsing the topmost Received header field with a valid IP address as a last resort; a is a synonym for client_addr client_port original SMTP client source TCP port number as obtained through XFORWARD PORT or from a 'client_port' attribute in an AM.PDP request; client_addr_port combines addr and port, similar to: \[%a\]:[:client_port] protocol a protocol name by which a message was received by amavisd, according to RFC 3848 ("Transmission Types Registration") and "Mail Transmission Types" / "WITH protocol types" IANA registration http://www.iana.org/assignments/mail-parameters/mail-parameters.xhtml e.g.: SMTP, ESMTP, ESMTPA, ESMTPS, ESMTPSA, LMTP, LMTPA, LMTPS, LMTPSA, UTF8SMTP, UTF8SMTPA, UTF8SMTPS, UTF8SMTPSA, UTF8LMTP, UTF8LMTPA, UTF8LMTPS, UTF8LMTPSA, ... client_protocol a protocol name by which a message was received from a client by MTA; the information is passed from MTA to amavisd through XFORWARD PROTO SMTP protocol extension or through AM.PDP (milter); ip_trace_all a list of IP addresses from a Received header trace; ip_trace_public like ip_trace_all, except that non-public IP addresses are excluded from the list; ip_proto_trace_all a list of information items from a Received header trace; each item consists of a protocol name (the WITH clause) and an IP address, optionally followed by a source port number if known; Example: ESMTP://[2001:db8::143:1]:39141 < ESMTP://2001:db8::25 < esmtps://203.0.113.172 < ESMTPSA://192.168.9.9 or: UTF8SMTP://[203.0.113.172]:51208 < UTF8SMTPSA://192.168.9.9 ip_proto_trace_public like ip_proto_trace_all, except that entries with non-public IP address are excluded from the list; l (letter ell, suggesting 'local') is true if a variable 'originating' is true, and is an empty string otherwise; the boolean variable 'originating' is under policy bank control, and usually corresponds to a sending host (SMTP client's IP address) matching @mynetworks_maps, or a client being an authenticated roaming user; o best attempt at determining true sender of the virus - normally same as %s S address that will get sender notification; this is normally a one-entry list containing sender address (same as %s), but may be unmangled/reconstructed in an attempt to undo the address forging done by some viruses; in case of unknown (e.g. forged) sender address, the result is empty. R a list of original envelope recipients (for use in notification body, not headers) D a list of recipients with successful delivery status (will get mail) O a list of recipients with unsuccessful delivery status (will not get mail) N a list of recipients with UNsuccessful delivery status (will NOT get mail) with included short per-recipient delivery reports as used in the free format first MIME part of delivery status notifications j 'Subject' header field body m 'Message-ID' header field body r first 'Resent-Message-ID' header field body header_field field body of the header field specified in the argument; the first argument is a header field name (case insensitive); optional second argument truncates the result to n characters; optional third argument j helps choosing the header field in case of multiple fields of the same name: returns a j-th header field with a given field name; search proceeds top-down if j >= 0, or bottom up for negative values (-1=last, -2=next-to-last, ...). Unspecified j is equivalent to -1, i.e. the last header field of the specified name. The result is a string of logical characters (Unicode), suitable for notification templates. header_field_octets like header_field, except that a result is a string of octets in UTF-8 encoding, suitable for a log template useragent returns 'User-Agent: ...' or 'X-Mailer: ...' header field (whichever is present); an optional argument specifies whether an entire field is to be returned (empty or unrecognized argument), or just a field name (argument: 'name'), e.g. 'X-Mailer'; or just a field body (argument 'body'), e.g. 'Thunderbird_1.5.0.9'; Q MTA queue ID of the message if available (Courier, sendmail milter/AM.PDP) H a list of all header lines (field may be wrapped over more than one line); this does not include the 'Return-Path:' or 'Delivered-To:' header fields, which would have been added (or will be added later) by the local delivery agent if mail would have been delivered to a mailbox. z original mail size (in bytes) i long-term unique mail_id on this system, possibly used in log and in quarantine names (also in releasing from a quarantine), encoded in base64url (RFC 4648), e.g. jaUETfyBMJHG mail_id a synonym for %i secret_id a secret counterpart to mail_id, such that b64_encode(md5(b64_decode(secret_id))) == mail_id, encoded in base64url (RFC 4648), e.g. jaUETfyBMJHG q list of quarantine mailbox names, or empty if not quarantined ccat_maj (deprecated, use [:ccat|major]) a major category number of the blocking or a main contents category, see constants CC_* in file amavisd ccat_min (deprecated, use [:ccat|minor]) a minor category number of the blocking or a main contents category, usually 0 unless more specific information is available (e.g. details about bad header or tag3 spam level) ccat_name (deprecated, use [:ccat|name]) a display name of the blocking or main contents category best describing mail contents ccat a new general-purpose macro providing access to information about a mail contents category. Macro 'ccat' takes two optional fixed-string arguments, which are interpreted case-insensitively. In their absence it expands to a string "(maj,min)" which shows a major and a minor contents category number of a blocking ccat for a blocked message, and of a main contents category for a passed message. The first argument specifies which attribute of a ccat is to be provided, the second argument specifies whether a main or a blocking contents category is to be consulted: The first argument may be any of the following strings: name ... provide a human-readable name of a ccat (%ccat_display_names) major ... provide a number: a major contents category, values correspond to CC_* constants in the program minor ... provide a number: a minor contents category, often a 0 ... empty argument (also a default) results in a string "(maj,min)" is_blocking ... '1' if blocking_ccat is true (message is being blocked), or an empty string when a message is being passed; is_nonblocking .. the opposite: '1' if blocking_ccat false, '' otherwise is_blocked_by_nonmain .. '1' if blocking_ccat is true _and_ is different from a main contents category; The second argument may be any of the following strings: main ... provide information on main contents category when asked for name/major/minor/ blocking.. provide information on blocking contents category if it exists, otherwise it falls back to providing info on main ccat; this is also a default in the absence of this argument; v output of the (last) virus checking program V a list of virus names found; contains at least one entry (possibly an empty string) if a virus was found, otherwise a null list banning_rule_key a lookup key (a regexp) of a banning rule which matched, e.g.: (?-xism:^\\.(exe-ms|dll)$(?# rule #9)) banning_rule_comment a comment from a regexp in banning_rule_key, or the whole banning_rule_key if it does not contain a comment, e.g.: rule #9 banning_rule_rhs right-hand side of a banning rule which matched, often just a '1', but can be any string which evaluates to true banned_parts a list of banned parts, with their full path in a MIME/archive e.g.: multipart/mixed | application/octet-stream,.exe,.exe-ms,videos.exe banned_parts_as_attr similar to banned_parts, but uses a different syntax using attribute/value pairs; currently known attributes are: P: part's base name, i.e. a file name in a ./parts/ temporary directory L: part's location (path) in a mail tree M: MIME type as declared in MIME header fields of a message T: short part's content type according to a file(1) utility N: declared part names (none, one or more), as declared in MIME header fields or in an archive (tar, zip, ...) A: part's attributes as follows: U=undecodable, C=crypted, D=directory, S=special(device), L=link e.g.: P=p003,L=1,M=multipart/mixed | P=p002,L=1/2,M=application/octet-stream,T=rar,N=Setup1.1.rar | P=p007,L=1/2/4,T=exe,T=exe-ms,N=setup.exe F just the first leaf node from banned_parts, with a prepended rule comment (if any), e.g.: rule #9:application/octet-stream,.exe,.exe-ms,videos.exe W a list of av scanner names detecting a virus X a list of header syntax violations A a list of SpamAssassin report lines for body report, similar to macro SUMMARY, but provides a list instead of a single string c spam level/hits (mnemonic: sCore) as provided by SpamAssassin, including a per-recipient boost (shown as explicit sum); see also: SCORE 1 above tag level for any recipient? 2 above tag2 level for any recipient? k any recipient declared the message be killed ? T list of triggered SA tests (only in $log_templ and $log_recip_templ) report_json expands to a JSON representation of a structured log event. The macro can take arguments, which can include or exclude fields (key/values) from the JSON report object. Arguments to a macro are either field names (keys) to be included in a report, or are field names to be excluded, each prefixed with an exclamation mark, to produce a report with all but excluded fields. Field names are case-sensitive. The order of fields in a serialized JSON object is unaffected by the order of field names in a filter. Unknown or non-present field names in a filter are silently ignored. For better clarity, instead of listing field names as individual arguments to a macro, it is also possible to provide a single argument in which field names are separated by whitespace. If at least one field name has an exclamation mark (i.e. is to be excluded), all but excluded fields are implied, so any field names without an exclamation mark are redundant; wrap with arguments: width, prefix, indent, string will wrap a string to a multiline string of the specified width; for details see comments before the 'sub wrap_string' in file amavisd; lc lowercases arguments and concatenates them to a single string uc uppercases arguments and concatenates them to a single string substr same as Perl function substr: returns a substring of the first argument, starting at character position specified by argument 2, limited to arg3 characters unless arg3 is not specified; character positions start at 0 index same as Perl function index: locates substring arg2 within arg1, returns -1 if not found, or a character position of the first match len returns string length of its first argument incr returns arg1 incremented by 1 (non-numeric argument interpreted as 0); in the presence of arguments beyond arg1, their sum is added to arg1 decr returns arg1 decremented by 1 (non-numeric argument interpreted as 0) in the presence of arguments beyond arg1, their sum is subtracted from ar1 min returns the smallest number from its arguments, all-whitespace arguments are ignored max returns the largest number from its arguments, all-whitespace arguments are ignored sprintf (just like a Perl function sprintf): formats its arguments, after the first, under control of the format, which is the first argument; the format is a character string which contains three types of objects: plain characters, which are simply copied to resulting string, character escape sequences which are converted and copied to the resulting string, and format specifications, each of which takes the next successive argument and inserts its formatted representation to the resulting string; Note that to get a percent character it needs to be doubled, to avoid its special meaning as a macro call introducer, e.g. [:sprintf|%%s %%.1f|text|[:SCORE]] join (just like a Perl function join): the first argument is a separator string, remaining arguments are strings to be concatenated, with a separator string inserted at every concatenation point; rot13 replaces a string in its argument with an obfuscated string where letters are shifted by 13 positions of an English alphabet (a popular variant of a Caesar cipher to conceal spoilers); this may serve to (poorly) hide strings such as mail Subject or an e-mail address from casual browsing of a log; limit takes two arguments: a string size limit and some string, returning the string from the second argument unchanged if its size is below the limit, or cropped to the size limit, with '[...]' appended; dquote encloses its argument in double quotes and doubles existing double quotes within a string (suitable to sanitize Subject header field, e.g. ab"oh"cd -> "ab""oh""cd"); provisional - exact interpretation may change (and has changed, prior to 2.4.5 double quotes were replaced by \", which made parsing tricky as backquotes themselves were not escaped); uquote replaces one or more consecutive space or tab characters by '_', but does not protect existing underlines, which makes it a lossy transformation (suitable for logging of From or To header fields); provisional - exact interpretation may change; hexenc encodes its arguments as hex digits, high nybble first; b64enc encodes its arguments as base64 strings [A-Za-z0-9+/], removing the final null padding '=' characters; b64urlenc encodes its arguments as base64 strings [A-Za-z0-9-_] according to RFC 4648, removing the final null padding '=' characters; mime_decode takes a string as its first argument, and an optional truncation length as the second. The string is decoded as a MIME-Header string (understands Q or B character set encodings like =?iso-8859-2?Q?...?=, =?koi8-r?B?...?=) and converted to UTF-8, optionally truncated to the specified size at clean UTF-8 boundaries, and returned as a result. The macro can be useful to decode Subject or From header fields, e.g.: [? [:header_field|Subject]||,\ Subject: [:dquote|[:mime2utf8|[:header_field|Subject]|100]]]# The result is a string of logical characters (Unicode), suitable for notification templates; mime2utf8 like mime_decode, except that the result is a string of octets in UTF-8 encoding, suitable for log templates; mail_addr_decode takes an e-mail address as a string of octets, where a local part may be encoded as UTF-8, and the domain part may be an international domain name (IDN) consisting either of U-labels or A-labels or NR-LDH labels. Decodes A-labels to U-labels in domain name. Returns a string of logical characters (Unicode), suitable for notification templates. If the mail address is not a valid UTF-8 string, it is interpreted as ISO-8859-1 (Latin-1). mail_addr_decode_octets like mail_addr_decode, except that the result is a string of octets, only valid as UTF-8 if the provided address was a valid UTF-8 (garbage-in/garbage-out); supplementary_info gives access to some additional information provided by content scanners, such as a provided by SpamAssassin API routine get_tag. The macro takes two arguments, the first is a tag name (a name of some attribute which is expected to provide an associated value), the second argument is a sprintf format string and is optional, if missing a %s is assumed. Currently the only available attributes are AUTOLEARN, AUTOLEARNSCORE, SC, SCRULE, SCTYPE, RELAYCOUNTRY, and LANGUAGES. These are nonempty only when an associated SpamAssassin plugin or function is enabled. report_format gives one of the: dsn, arf, attach, plain, resend, according to a report or notification format being generated; feedback_type is expected to give one of the ARF (draft) strings: abuse, fraud, miscategorized, not-spam, opt-out, opt-out-list, virus, other, as supplied in a call to delivery_status_notification(); dkim takes one argument and returns the requested information about valid DKIM or DomainKeys signatures found in a mail header: any (or an empty argument) returns true if there was at least one valid signature present; author is nonempty if a valid author signature is present (matching a From header field), actual value happens to be a signing domain; sender is nonempty if a valid sender signature is present (signature matching a Sender header field, or matching a From if Sender is not present), the actual value happens to be a signing domain; thirdparty is nonempty if there is no valid author signature but at least one valid signature is present, the actual value happens to be a signing domain; envsender is nonempty if a valid signature is present which is matching the envelope sender address, the actual value happens to be a signing domain; identity returns a comma-separated list of signing identies ('i' tag) found in all valid signatures; selector returns a comma-separated list of signing selectors ('s' tag) found in all valid signatures; domain returns a comma-separated list of signing domains ('d' tag) found in all valid signatures; sig_sd returns a comma-separated list of selector:domain pairs found in all valid signatures; multiple occurences of the same selector:domain pair are listed only once; newsig_sd returns a comma-separated list of selector:domain pairs of newly created signatures, added to a mail header section; multiple occurences of the same selector:domain pair are listed only once; any other value (which may contain an '@', but local part is ignored) is interpreted as a 'verifier acceptable signing domain', and the result is true if there exists a valid signature whose signing domain ('d' tag) matches the supplied verifier acceptable signing domain, otherwise the result is false; the domain name in the argument may be prefixed by a '.', in which case subdomains are allowed too; A couple of SpamAssassin look-alike macros with the same names and arguments as in SA, see 'man Mail::SpamAssassin::Conf' for details: AUTOLEARN autolearn status (deprecated, use supplementary_info instead) DATE same as date_rfc2822_local SCORE similar to macro 'c', but returns a single number (sum of SA score and boost), and allows padding as per SA documentation. In a per-message log ($log_templ) when a message has multiple recipients, a minimum value across all recipients is given; STARS score as in macro SCORE, but represented as a bar of characters REPORT a SA terse report of tests hit (for header reports) SUMMARY similar to macro A, but provides a single multiline string, a SA summary of tests hit for standard body reports REMOTEHOSTADDR IP address of your connecting MTA (often 127.0.0.1) REMOTEHOSTNAME your connecting MTA (often [127.0.0.1] or localhost) TESTS similar to %T in logging, but without scores TESTSSCORES similar to %T in logging, but allows to specify a separator YESNO similar to macro '2', but provides Yes/No instead of 1/0 YESNOCAPS similar to macro '2', but provides YES/NO instead of 1/0 REQD minimal tag2 level of all recipients HEADER same as a header_field macro and two additional SA-lookalikes, but with no counterparts in SpamAssassin: LOGID log id (a.k.a. am_id) e.g. 58725-05-2, synonym for macro n MAILID mail_id as in quarantine e.g. jaUETfyBMJHG, synonym for macro i - when $log_recip_templ is expanded (by-recipient log entry), certain macros keep their general semantics but reflect a value for that recipient: %. value is a recipient counter (starting by 1) when $log_recip_templ is expanded, and is undef when other templates are expanded; %R current recipient email address %D recipient email address if mail will be delivered, otherwise empty %O recipient email address if mail will NOT be delivered, otherwise empty %N short DSN if mail will not be delivered for this recip, otherwise empty %c spam level/hits including a per-recipient boost %0 recipient email address belong to local_domains_maps: L or 0 %1 above tag level for this recipient: Y or 0 %2 above tag2 level for this recipient: Y or 0 %k above kill level for this recipient: Y or 0 REQD recipient's tag2_level ccat_maj major category number, takes into account per-recip bypass_* ccat_min minor category number, takes into account per-recip bypass_* ccat_name display name of the c.cat, takes into account per-recip bypass_* remote_mta MTA to which a message was forwarded remote_mta_smtp_response MTA's SMTP response on accepting forwarded message smtp_response either an MTA's SMTP response for forwarded mail, or internally generated SMTP response for mail that was not forwarded score_boost internally generated score points to be added to a SA score The choice of capital letters for lists, and lower case letters for simple strings is purely a convention and is not enforced, neither do all built-in macros adhere to the convention. Further built-in macros are easy to add if special need arises, just append new key/value pairs to the hash which is passed to expand(). Besides a simple string or an array reference, a hash value may also be a subroutine reference which will be called later during macro expansion. This way one can provide a method for obtaining information which is not yet available during initial construction of the hash (such as AV scanner results), or provide a lazy evaluation for more expensive calculations. Subroutine will be evaluated in scalar context. It may return a string or an array reference. The rest of this text explains what expand() does with its arguments. EXPAND This is a simple, yet fully fledged macro processor with proper lexical analysis, call stack, quoting levels, user supplied and builtin macros, three builtin flow-control macros: selector, regexp selector and iterator, plus a macro #, which discards input tokens until newline (like 'dnl' in m4). Also recognized are the usual \c and \nnn forms for specifying special characters, where c can be any of: r, n, f, b, e, a, t. Lexical analysis of the input string is performed only once, macro result values are not in danger of being lexically re-parsed and are treated as plain characters, loosing any special meaning they might have. New macros (with arguments) can be defined by a built-in macro [= name | body ]. Macro calls can be neutral (not re-evaluating the result) or active (pushing result back to input for re-evaluation). Simple caller-provided macros can evaluate to a string (possibly empty or undef), or an array of strings. It can also be a subroutine reference, in which case the subroutine will be called whenever macro is evaluated, supplying it with arguments from the macro call (first argument is a macro name). The subroutine must return a scalar: a string, or an array reference. The result will be treated as if it were specified directly. Two simple forms of macro calls are known: %x and %#x (where x is a single letter macro name, i.e. a key in a user-supplied associative array): %x evaluates to the value associated with the name x; if the value is an array ref, the result is a single concatenated string of values separated with comma-space pairs and is not re-evaluated (i.e. it is a form of a neutral macro call) %#x evaluates to a number: if a macro value is a scalar, returns 0 for all-whitespace value, and 1 otherwise. If a value is an array ref, evaluates to the number of elements in the array. Neutral call. A literal percent character can be produced by %% or \%. To take away any special meaning of other characters they can be quoted by a backslash, e.g. \[ or \\ or \"] . A more general form of a macro call provides an ability to call macros with longer names, to provide arguments to a call, and to specify whether the call is a neutral or an active call: [@ name |arg1|arg2|...|argn] is an active macro call, i.e. the result is pushed back to input and evaluated like usual; there may be any number of actual arguments supplied, argument 0 is a macro name; Neither a macro name nor arguments are implicitly quoted, so if it is desired to prevent their evaluation before a call, they should be quoted by enclosing them in a pair of [" and "], e.g.: [@["name"]|["arg1"]|["arg2"]| ... |["argn"]] [: name |arg1|arg2|...|argn] is an neutral macro call, i.e. the result is NOT pushed back to input for evaluation, but goes directly to destination (either straight to output string, or collected as argument if this is a nested macro call); Note that whitespace around macro name is allowed (is optional), and is automatically removed, but space within arguments receives no special treatment, and is passed on to a macro like other characters. There is a further simple form of a neutral macro call which emulates SpamAssassin syntax: _NAME_ or _NAME(...)_ where a macro name must be composed of capital letters only, and an optional string in () is passed to a macro as its first argument (besides argument 0, which is a macro name). Commas within (...) are not special, calls like _TESTS(,)_ and _SPAMMYTOKENS(2,short)_ still provide a single argument: "," or "2,short" respectively, to accommodate SA peculiarity. These all-capitals macros can still be called by a normal general-purpose form of a macro call for greater flexibility, as described above. A macro is evaluated only in non-quoted context. Enclosing strings between tokens [" and "] prevents its evaluation. Quoting may be nested, quote tokens must be balanced. Evaluating a quoted input strips off one level of quotes. As a special feature two built-in macros (selector and iterator) provide implicit quoting of their arguments to keep clutter in these two most common macros calls down to a minimum. Unfortunately this causes some complications in the code, but the feature is kept for backward compatibility. Built-in macros selector, regexp selector and iterator have the following syntax: [? arg1 | arg2 | ... ] a selector [~ arg1 | arg2 | ... ] a regexp selector [ arg1 | arg2 | ... ] an iterator where '[', '[?', '[~', '|', and ']' are required tokens. Arguments are arbitrary text, possibly multiline, whitespace matters. Nested macro calls are permitted, proper bracket nesting must be observed. SELECTOR lets its first argument be evaluated immediately, and implicitly quotes remaining arguments. The evaluated first argument chooses which of the remaining arguments is selected as a result value. The chosen result is only then evaluated (i.e. this is an active macro call), non-selected arguments are discarded without evaluation. The first argument is usually a number (with optional leading and trailing whitespace). If it is a non-numeric string, it is treated as 0 for all-whitespace and as 1 otherwise. Value 0 selects the very next (second) argument, value 1 selects the one after it, etc. If the value is greater than the number of available arguments, the last one (unless it is the only one) is selected. If there is only one (the first) alternative available but the value is greater than 0, an empty string is returned. Examples: [? 2 | zero | one | two | three ] -> two [? foo | none | any | two | three ] -> any [? 24 | 0 | one | many ] -> many [? 2 |No recipients] -> (empty string) [? %#R |No recipients|One recipient|%#R recipients] [? %q |No quarantine|Quarantined as %q] Note that a selector macro call can be considered a form of if-then-else, except that the 'then' and 'else' parts are swapped! ITERATOR in its full form takes three arguments (and ignores any extra arguments after that): [ %x | body-usually-containing-%x | separator ] All iterator's arguments are implicitly quoted, iterator performs its own substitutions on provided arguments, as described below. The result of an iterator call is a body (the second argument) repeated as many times as there are elements in the array denoted by the first argument. In each instance of a body all occurrences of a token %x in the body are replaced with each consecutive element of the array. Resulting body instances are then glued together with a string given as the third argument. The result is finally pushed back to input (active macro call) for possible further expansion. As a somewhat ugly hack (upwards compatible), it is possible to iterate on built-in macros with names longer than one character: [ long-macro-name | body-usually-containing-%x | separator ] This only works in a full three-argument form of iterator call, and the iterator variable name becomes a hard-wired literal character x. There are two simplified forms of an iterator call: [ body | separator ] or [ body ] where missing separator is considered a null string, and a missing formal argument name is obtained by looking for the first token of the form %x in the body. If there is no formal argument specified (neither explicitly nor in the body), the result is an empty string, which is potentially useful as a null lexical separator. Examples: [%V| ] a space-separated list of virus names [%V|\n] a newline-separated list of virus names [%V| ] same thing: a newline-separated list of virus names [ %V] a list of virus names, each preceded by NL and spaces [ %R |%s --> <%R>|, ] a comma-space separated list of sender/recipient name pairs where recipient is iterated over the list of recipients. (Only the (first) token %x in the first argument is significant, other characters are ignored.) [%V|[%R|%R + %V|, ]|; ] produce all combinations of %R + %V elements A combined example: [? %#C |#|Cc: [<%C>|, ]] [? %#C ||Cc: [<%C>|, ]\n]# ... same thing evaluates to an empty string if there are no elements in the %C array, otherwise it evaluates to a line: Cc: , , ...\n Whitespace (including newlines) around the first argument %#C of selector call is ignored and can be used for clarity. These all produce the same result: To: [%T|%T|, ] To: [%T|, ] To: %T The '#' removes input characters until and including newline after it. It can be used for clarity to allow newlines be placed in the source text but not resulting in empty lines in the expanded text. In the second example above, a backslash at the end of the line would achieve the same result, although the method is different: \NEWLINE is removed during initial lexical analysis, while # is an internal macro which, when called, actively discards tokens following it, until a NEWLINE (or end of input) is encountered. Further build-in macros are regexp selector and a macro-defining macro. REGEXP SELECTOR macro is somewhat similar to a plain SELECTOR, but it performs a regular expression match on its first argument. The syntax is: [~string|regexp1|then1|regexp2|then2|...|regexpN|thenN|else] It compares string to each regexp in turn, and if a match is found the macro expands to the corresponding 'then' argument, otherwise it expands to the 'else' argument (or empty if the 'else' is missing). For example: [~string|^s.*$|["matches"]|["no match"]] Unlike SELECTOR, arguments are not implicitly quoted, quoting must be explicit if desired (there was no backward compatibility need for this newer macro). This is an active macro call, results are pushed back to input for re-evaluation. Tokens %1, %2, ... %9 in arg3 and arg4 are replaced by captured subexpressions (in parenthesis) of a regexp, and %0 is replaced by arg1. DEFINE macro call allows new macros to be dynamically defined: [= name |body] It creates a new macro with a specified name (whitespace around name is trimmed), giving it the argument as a macro body string. No arguments are implicitly quoted, quoting must be explicit (for most uses at least the body string should be quoted in [" ... "] ). Body string may contain tokens %0, %1, %2, ... %9 as formal arguments. These will be substituted with actual arguments (or empty strings for missing arguments) at the time of a call. In most respects these dynamically defined macros behave just like other pre-defined macros. One distinction is that they can only produce a scalar string, they can not produce an array.