Customization of notification messages and log entries
======================================================
  Mark Martinec <Mark.Martinec@ijs.si>

Since March 2002 amavisd-new provides a way to customize e-mail notification
messages that are sent in response to a virus (and spam) detection,
without having to resort to modifications of Perl code. Three types
of messages are generated:

- administrator notifications are sent to administrator;
    (two types: virus, spam)

- sender (non)delivery notifications may be sent to the mail originator;
    (three types: virus, spam, rejects by outgoing MTA)

- recipient warnings may be sent to envelope recipients of the e-mail
  containing a virus. These notifications are normally disabled
  since they are more of a nuisance than benefit.
    (only for viruses)

Besides the three types of e-mail notifications, the same customization
principle is applied to customize the most common and useful log entry
(present even at log level 0) that is generated when an e-mail containing
an unwanted content is detected.

Default Template texts are glued to the end of the 'amavisd' file,
separated one from another by __DATA__ lines. Please see comments
in these templates. These template texts are assigned to variables:
  $log_templ,
  $notify_sender_templ,
  $notify_virus_sender_templ,
  $notify_virus_admin_templ,
  $notify_virus_recips_templ,
  $notify_spam_sender_templ,
  $notify_spam_admin_templ

The value of these variables may be overruled by assignment or by reading
into them in the amavisd.conf file, which is run at amavisd startup time.

If assigning to variables, care must be taken to properly quote certain
special characters (like backslash), as required by Perl quoting rules.
Text read from amavisd file or from external files is not subject
to Perl quoting rules.

Template text is subject to simple run-time macro expansion as described
in the next section. Macro expansion is performed by the routine expand(),
which receives substitution text of simple macros from 'amavisd' program.
expand() takes a template string as its argument, performs macro expansion,
and returns resulting multiline string back to 'amavisd', which uses it to
send mail notifications or to write log entries.

The substitution text for the following simple macros is built-in:

- to be used in forming a mail header (properly quoted
  addresses as required by RFC2822):

  f  administrator's e-mail address (typically used in 'From:' header
     of notification messages);
  T  a list of recipients to be used in 'To:' header of the notification;
  C  a list of recipients to be used in 'Cc:' header of the notification;
  B  a list of recipients to be used in 'Bcc:' header of the notification;
     (the T, C and B lists are determined by each notification subroutine)

- to be used in forming a notification mail body or log entry:

  p  the current policy bank name (or empty if a built-in policy bank
     is still in place);
  h  dns name of this host, or configurable name (variable $myhostname)
  HOSTNAME  same as h
  n  amavis internal log id (also called task id, am_id) as shown
     in the log and by amavisd-nanny, e.g. 58725-05-2
  log_id  a synonym for %n

  body_digest  message digest of a mail body: the digest algorithm is
     chosen by a setting $mail_digest_algorithm ('MD5', or 'SHA-1' or
     'SHA-256', defaults to 'MD5'). These are raw (non-encoded) bytes,
     not suitable for direct display. It is common to encode it with
     one of the macros: 'hexenc', 'b64enc', or 'b64urlenc'.
     The result of [:hexenc|[:body_digest]] is the same as a result
     of a macro call %b.

  b  message digest of a mail body: the digest algorithm is chosen by
     a setting $mail_digest_algorithm ('MD5', or 'SHA-1' or 'SHA-256',
     defaults to 'MD5'), encoded as hex digits, high nybble first.
     The result is the same as a result of [:hexenc|[:body_digest]]

  date_unix_utc      timestamp of the message reception - Unix time
                     (seconds since 1970-01-01T00:00Z as a decimal integer)
  date_iso8601_utc   timestamp of the message reception - ISO 8601 (EN 28601)
                     UTC date-time
  date_iso8601_local => sub {iso8601_timestamp($MSGINFO->rx_time)},
  date_rfc2822_local timestamp of the message reception - RFC 2822
                     local date-time format
  week_iso8601       returns an ISO 8601 week number (between 1 and 53)
                     corresponding to the current message reception time
                     (date_iso8601_local)
  weekday            returns an ISO 8601 weekday number
                     (between 1 and 7, 1=Mo, 2=Tu, ..., 7=Su) corresponding
                     to the current message reception time (date_iso8601_local)
  partition_tag      give the current value of a partition_tag attribute which
                     will be stored in SQL records when SQL quarantining or
                     logging is enabled; the value is derived from a setting
                     $partition_tag (global or per-policy bank); a typical
                     usage is to use ISO 8601 week number as a pertition tag;
  P  same as partition_tag
  u  same as date_unix_utc
  d  same as date_rfc2822_local
  U  same as date_iso8601_utc
  y  elapsed time in ms for this message (not including final cleanup)

  s  original envelope sender, rfc2821-quoted and enclosed in angle brackets
  rfc2822_sender   an e-mail address from a Sender header field, or empty
  rfc2822_from     an e-mail address from a From header field (possibly more
                   than one)
  rfc2822_resent_sender  e-mail address from all Resent-Sender header fields
  rfc2822_resent_from    e-mail address from all Resent-From header fields

  tls_in  returns TLS ciphers in use by a SMTP session if mail came to amavisd
     through a TLS-encrypted session, otherwise result is empty

  ip_trace_all  a list of IP addresses as found in the 'Received from'
     trace of a mail header, one entry for each Received header field,
     including possibly invalid IP addresses and private IP addresses;
     Missing addresses are substituted by with a '?' (e.g. in Received
     header fields for local or other non-IP mail submissions).
     The list order corresponds to the order of 'Received' header fields
     as found in a mail header, top-down, i.e. first entry of the list
     is the topmost (the most recent) 'Received' header field, so
     chronologically in reverse;

  ip_trace_public  a list of valid public IP addresses as found in the
     'Received from' trace of a mail header.  Missing, invalid or private
     IP addresses are not included in this list, so there may be more
     'Received' header fields in a mail header then entries in this list.
     The list order corresponds to the order of 'Received' header fields
     as found in a mail header, top-down, i.e. first entry of the list
     is the topmost (the most recent) 'Received' header field with a valid
     public IP address, so chronologically in reverse;

  e  best guess of the originator IP address: the bottom-most public IP
     address as obtained by parsing 'Received from' trace fields.
     Usually the 'e' macro would return the same IP address as the
     last entry provided by the ip_trace_public macro;

  t  first entry in the 'Received from' trace of a mail header

  g  original SMTP client DNS name as obtained from an XFORWARD NAME
     field, or from a 'client_name' attribute in an AM.PDP request;
     empty if unknown;

  client_helo  client-supplied EHLO/HELO domain name from the original
     SMTP session as obtained through XFORWARD HELO or from a 'helo_name'
     attribute in an AM.PDP request;

  client_addr  original SMTP client source IP address, same as %a,
     as obtained through XFORWARD ADDR or from a 'client_address' attribute
     in an AM.PDP request, or by parsing the topmost Received header field
     with a valid IP address as a last resort;

  a  is a synonym for client_addr

  client_port  original SMTP client source TCP port number as obtained through
     XFORWARD PORT or from a 'client_port' attribute in an AM.PDP request;

  client_addr_port  combines addr and port, similar to: \[%a\]:[:client_port]

  protocol
     a protocol name by which a message was received by amavisd,
     according to RFC 3848 ("Transmission Types Registration") and
     "Mail Transmission Types" / "WITH protocol types" IANA registration
       http://www.iana.org/assignments/mail-parameters/mail-parameters.xhtml
     e.g.:
       SMTP, ESMTP, ESMTPA, ESMTPS, ESMTPSA,
       LMTP, LMTPA, LMTPS, LMTPSA,
       UTF8SMTP, UTF8SMTPA, UTF8SMTPS, UTF8SMTPSA,
       UTF8LMTP, UTF8LMTPA, UTF8LMTPS, UTF8LMTPSA, ...

  client_protocol
     a protocol name by which a message was received from a client by MTA;
     the information is passed from MTA to amavisd through XFORWARD PROTO
     SMTP protocol extension or through AM.PDP (milter);

  ip_trace_all
     a list of IP addresses from a Received header trace;

  ip_trace_public
     like ip_trace_all, except that non-public IP addresses are excluded
     from the list;

  ip_proto_trace_all
     a list of information items from a Received header trace; each item
     consists of a protocol name (the WITH clause) and an IP address,
     optionally followed by a source port number if known;
     Example:
       ESMTP://[2001:db8::143:1]:39141 < ESMTP://2001:db8::25 <
         esmtps://203.0.113.172 < ESMTPSA://192.168.9.9
     or:
       UTF8SMTP://[203.0.113.172]:51208 < UTF8SMTPSA://192.168.9.9

  ip_proto_trace_public
     like ip_proto_trace_all, except that entries with non-public
     IP address are excluded from the list;

  l  (letter ell, suggesting 'local') is true if a variable 'originating' is
     true, and is an empty string otherwise; the boolean variable 'originating'
     is under policy bank control, and usually corresponds to a sending host
     (SMTP client's IP address) matching @mynetworks_maps, or a client being
     an authenticated roaming user;

  o  best attempt at determining true sender of the virus - normally same as %s

  S  address that will get sender notification;
     this is normally a one-entry list containing sender address (same as %s),
     but may be unmangled/reconstructed in an attempt to undo the address
     forging done by some viruses; in case of unknown (e.g. forged) sender
     address, the result is empty.

  R  a list of original envelope recipients
     (for use in notification body, not headers)
  D  a list of recipients with successful delivery status (will get mail)
  O  a list of recipients with unsuccessful delivery status (will not get mail)
  N  a list of recipients with UNsuccessful delivery status (will NOT get mail)
     with included short per-recipient delivery reports as used in the free
     format first MIME part of delivery status notifications

  j  'Subject' header field body
  m  'Message-ID' header field body
  r  first 'Resent-Message-ID' header field body

  header_field  field body of the header field specified in the argument;
     the first argument is a header field name (case insensitive);
     optional second argument truncates the result to n characters;
     optional third argument j helps choosing the header field in case of
     multiple fields of the same name: returns a j-th header field with a
     given field name; search proceeds top-down if j >= 0, or bottom up for
     negative values (-1=last, -2=next-to-last, ...). Unspecified j is
     equivalent to -1, i.e. the last header field of the specified name.
     The result is a string of logical characters (Unicode), suitable for
     notification templates.

  header_field_octets
    like header_field, except that a result is a string of octets
    in UTF-8 encoding, suitable for a log template

  useragent returns 'User-Agent: ...' or 'X-Mailer: ...' header field
     (whichever is present); an optional argument specifies whether
     an entire field is to be returned (empty or unrecognized argument),
     or just a field name (argument: 'name'), e.g. 'X-Mailer';
     or just a field body (argument 'body'),  e.g. 'Thunderbird_1.5.0.9';

  Q  MTA queue ID of the message if available (Courier, sendmail milter/AM.PDP)
  H  a list of all header lines (field may be wrapped over more than one line);
     this does not include the 'Return-Path:' or 'Delivered-To:' header fields,
     which would have been added (or will be added later) by the local
     delivery agent if mail would have been delivered to a mailbox.
  z  original mail size (in bytes)
  i  long-term unique mail_id on this system, possibly used in log
     and in quarantine names (also in releasing from a quarantine),
     encoded in base64url (RFC 4648), e.g. jaUETfyBMJHG
  mail_id  a synonym for %i
  secret_id  a secret counterpart to mail_id, such that
     b64_encode(md5(b64_decode(secret_id))) == mail_id,
     encoded in base64url (RFC 4648), e.g. jaUETfyBMJHG
  q  list of quarantine mailbox names, or empty if not quarantined

  ccat_maj  (deprecated, use [:ccat|major])  a major category number
            of the blocking or a main contents category, see constants
            CC_* in file amavisd
  ccat_min  (deprecated, use [:ccat|minor])  a minor category number
            of the blocking or a main contents category, usually 0
            unless more specific information is available (e.g.
            details about bad header or tag3 spam level)
  ccat_name (deprecated, use [:ccat|name])  a display name of the
            blocking or main contents category best describing mail contents

  ccat  a new general-purpose macro providing access to information about
     a mail contents category. Macro 'ccat' takes two optional fixed-string
     arguments, which are interpreted case-insensitively. In their absence
     it expands to a string "(maj,min)" which shows a major and a minor
     contents category number of a blocking ccat for a blocked message,
     and of a main contents category for a passed message.

     The first argument specifies which attribute of a ccat is to be provided,
     the second argument specifies whether a main or a blocking contents
     category is to be consulted:

   The first argument may be any of the following strings:
     name   ... provide a human-readable name of a ccat (%ccat_display_names)
     major  ... provide a number: a major contents category,
                values correspond to CC_* constants in the program
     minor  ... provide a number: a minor contents category, often a 0
     <empty>... empty argument (also a default) results in a string "(maj,min)"
     is_blocking   ... '1' if blocking_ccat is true (message is being blocked),
                        or an empty string when a message is being passed;
     is_nonblocking .. the opposite: '1' if blocking_ccat false, '' otherwise
     is_blocked_by_nonmain .. '1' if blocking_ccat is true
                       _and_ is different from a main contents category;

   The second argument may be any of the following strings:
     main   ... provide information on main contents category
                when asked for name/major/minor/<empty>
     blocking.. provide information on blocking contents category if it exists,
                otherwise it falls back to providing info on main ccat;
                this is also a default in the absence of this argument;

  v  output of the (last) virus checking program
  V  a list of virus names found; contains at least one entry (possibly
     an empty string) if a virus was found, otherwise a null list

  banning_rule_key  a lookup key (a regexp) of a banning rule which matched,
                e.g.:  (?-xism:^\\.(exe-ms|dll)$(?# rule #9))
  banning_rule_comment  a comment from a regexp in banning_rule_key, or
                the whole banning_rule_key if it does not contain a comment,
                e.g.:  rule #9
  banning_rule_rhs  right-hand side of a banning rule which matched, often
                just a '1', but can be any string which evaluates to true
  banned_parts  a list of banned parts, with their full path in a MIME/archive
     e.g.:  multipart/mixed | application/octet-stream,.exe,.exe-ms,videos.exe

  banned_parts_as_attr  similar to banned_parts, but uses a different syntax
                using attribute/value pairs; currently known attributes are:
     P: part's base name, i.e. a file name in a ./parts/ temporary directory
     L: part's location (path) in a mail tree
     M: MIME type as declared in MIME header fields of a message
     T: short part's content type according to a file(1) utility
     N: declared part names (none, one or more), as declared in MIME header
        fields or in an archive (tar, zip, ...)
     A: part's attributes as follows:
        U=undecodable, C=crypted, D=directory, S=special(device), L=link
     e.g.:
       P=p003,L=1,M=multipart/mixed |
       P=p002,L=1/2,M=application/octet-stream,T=rar,N=Setup1.1.rar |
       P=p007,L=1/2/4,T=exe,T=exe-ms,N=setup.exe

  F  just the first leaf node from banned_parts, with a prepended rule comment
     (if any), e.g.:  rule #9:application/octet-stream,.exe,.exe-ms,videos.exe

  W  a list of av scanner names detecting a virus
  X  a list of header syntax violations
  A  a list of SpamAssassin report lines for body report, similar to
     macro SUMMARY, but provides a list instead of a single string
  c  spam level/hits (mnemonic: sCore) as provided by SpamAssassin, including
     a per-recipient boost (shown as explicit sum);  see also: SCORE
  1  above tag level for any recipient?
  2  above tag2 level for any recipient?
  k  any recipient declared the message be killed ?
  T  list of triggered SA tests (only in $log_templ and $log_recip_templ)

  report_json  expands to a JSON representation of a structured log event.
     The macro can take arguments, which can include or exclude fields
     (key/values) from the JSON report object. Arguments to a macro are
     either field names (keys) to be included in a report, or are field
     names to be excluded, each prefixed with an exclamation mark,
     to produce a report with all but excluded fields.
     Field names are case-sensitive. The order of fields in a serialized
     JSON object is unaffected by the order of field names in a filter.
     Unknown or non-present field names in a filter are silently ignored.
     For better clarity, instead of listing field names as individual
     arguments to a macro, it is also possible to provide a single argument
     in which field names are separated by whitespace.
     If at least one field name has an exclamation mark (i.e. is to be
     excluded), all but excluded fields are implied, so any field names
     without an exclamation mark are redundant;

  wrap  with arguments: width, prefix, indent, string
     will wrap a string to a multiline string of the specified width;
     for details see comments before the 'sub wrap_string' in file amavisd;

  lc lowercases arguments and concatenates them to a single string
  uc uppercases arguments and concatenates them to a single string
  substr  same as Perl function substr: returns a substring of the first
     argument, starting at character position specified by argument 2,
     limited to arg3 characters unless arg3 is not specified;
     character positions start at 0
  index  same as Perl function index: locates substring arg2 within arg1,
     returns -1 if not found, or a character position of the first match
  len  returns string length of its first argument
  incr returns arg1 incremented by 1  (non-numeric argument interpreted as 0);
     in the presence of arguments beyond arg1, their sum is added to arg1
  decr returns arg1 decremented by 1  (non-numeric argument interpreted as 0)
     in the presence of arguments beyond arg1, their sum is subtracted from ar1
  min  returns the smallest number from its arguments, all-whitespace arguments
       are ignored
  max  returns the largest number from its arguments, all-whitespace arguments
       are ignored
  sprintf  (just like a Perl function sprintf): formats its arguments, after
     the first, under control of the format, which is the first argument;
     the format is a character string which contains three types of objects:
     plain characters, which are simply copied to resulting string, character
     escape sequences which are converted and copied to the resulting string,
     and format specifications, each of which takes the next successive
     argument and inserts its formatted representation to the resulting string;
     Note that to get a percent character it needs to be doubled, to avoid
     its special meaning as a macro call introducer,
     e.g.  [:sprintf|%%s %%.1f|text|[:SCORE]]
  join  (just like a Perl function join): the first argument is a separator
     string, remaining arguments are strings to be concatenated, with a
     separator string inserted at every concatenation point;
  rot13  replaces a string in its argument with an obfuscated string
     where letters are shifted by 13 positions of an English alphabet
     (a popular variant of a Caesar cipher to conceal spoilers);
     this may serve to (poorly) hide strings such as mail Subject or
     an e-mail address from casual browsing of a log;
  limit  takes two arguments: a string size limit and some string,
     returning the string from the second argument unchanged if its size is
     below the limit, or cropped to the size limit, with '[...]' appended;
  dquote  encloses its argument in double quotes and doubles existing double
     quotes within a string (suitable to sanitize Subject header field,
     e.g. ab"oh"cd -> "ab""oh""cd");  provisional - exact interpretation
     may change (and has changed, prior to 2.4.5 double quotes were replaced
     by \", which made parsing tricky as backquotes themselves were not
     escaped);
  uquote  replaces one or more consecutive space or tab characters by '_',
     but does not protect existing underlines, which makes it a lossy
     transformation (suitable for logging of From or To header fields);
     provisional - exact interpretation may change;

  hexenc  encodes its arguments as hex digits, high nybble first;

  b64enc  encodes its arguments as base64 strings [A-Za-z0-9+/],
     removing the final null padding '=' characters;

  b64urlenc  encodes its arguments as base64 strings [A-Za-z0-9-_]
     according to RFC 4648, removing the final null padding '=' characters;

  mime_decode  takes a string as its first argument, and an optional truncation
     length as the second. The string is decoded as a MIME-Header string
     (understands Q or B character set encodings like =?iso-8859-2?Q?...?=,
     =?koi8-r?B?...?=) and converted to UTF-8, optionally truncated to the
     specified size at clean UTF-8 boundaries, and returned as a result.
     The macro can be useful to decode Subject or From header fields, e.g.:
       [? [:header_field|Subject]||,\
       Subject: [:dquote|[:mime2utf8|[:header_field|Subject]|100]]]#
     The result is a string of logical characters (Unicode), suitable for
     notification templates;

  mime2utf8
     like mime_decode, except that the result is a string of octets
     in UTF-8 encoding, suitable for log templates;

  mail_addr_decode
     takes an e-mail address as a string of octets, where a local part
     may be encoded as UTF-8, and the domain part may be an international
     domain name (IDN) consisting either of U-labels or A-labels or NR-LDH
     labels. Decodes A-labels to U-labels in domain name. Returns a string
     of logical characters (Unicode), suitable for notification templates.
     If the mail address is not a valid UTF-8 string, it is interpreted as
     ISO-8859-1 (Latin-1).

  mail_addr_decode_octets
     like mail_addr_decode, except that the result is a string of octets,
     only valid as UTF-8 if the provided address was a valid UTF-8
     (garbage-in/garbage-out);

  supplementary_info  gives access to some additional information provided by
     content scanners, such as a provided by SpamAssassin API routine get_tag.
     The macro takes two arguments, the first is a tag name (a name of some
     attribute which is expected to provide an associated value), the second
     argument is a sprintf format string and is optional, if missing a %s
     is assumed. Currently the only available attributes are AUTOLEARN,
     AUTOLEARNSCORE, SC, SCRULE, SCTYPE, RELAYCOUNTRY, and LANGUAGES.
     These are nonempty only when an associated SpamAssassin plugin or
     function is enabled.

  report_format  gives one of the: dsn, arf, attach, plain, resend,
     according to a report or notification format being generated;

  feedback_type  is expected to give one of the ARF (draft) strings:
     abuse, fraud, miscategorized, not-spam, opt-out, opt-out-list,
     virus, other,  as supplied in a call to delivery_status_notification();

  dkim  takes one argument and returns the requested information about
     valid DKIM or DomainKeys signatures found in a mail header:
       any (or an empty argument) returns true if there was at least one
               valid signature present;
       author  is nonempty if a valid author signature is present (matching a
               From header field), actual value happens to be a signing domain;
       sender  is nonempty if a valid sender signature is present (signature
               matching a Sender header field, or matching a From if Sender is
               not present), the actual value happens to be a signing domain;
       thirdparty  is nonempty if there is no valid author signature but at
               least one valid signature is present, the actual value happens
               to be a signing domain;
       envsender  is nonempty if a valid signature is present which is matching
               the envelope sender address, the actual value happens to be
               a signing domain;
       identity  returns a comma-separated list of signing identies ('i' tag)
               found in all valid signatures;
       selector returns a comma-separated list of signing selectors ('s' tag)
               found in all valid signatures;
       domain  returns a comma-separated list of signing domains ('d' tag)
               found in all valid signatures;
       sig_sd  returns a comma-separated list of selector:domain pairs
               found in all valid signatures; multiple occurences of the
               same selector:domain pair are listed only once;
       newsig_sd  returns a comma-separated list of selector:domain pairs
               of newly created signatures, added to a mail header section;
               multiple occurences of the same selector:domain pair are
               listed only once;
       any other value (which may contain an '@', but local part is ignored)
               is interpreted as a 'verifier acceptable signing domain', and
               the result is true if there exists a valid signature whose
               signing domain ('d' tag) matches the supplied verifier
               acceptable signing domain, otherwise the result is false;
               the domain name in the argument may be prefixed by a '.', in
               which case subdomains are allowed too;

  A couple of SpamAssassin look-alike macros with the same names and arguments
  as in SA, see 'man Mail::SpamAssassin::Conf' for details:

  AUTOLEARN   autolearn status  (deprecated, use supplementary_info instead)

  DATE        same as date_rfc2822_local
  SCORE       similar to macro 'c', but returns a single number (sum of
              SA score and boost), and allows padding as per SA documentation.
              In a per-message log ($log_templ) when a message has multiple
              recipients, a minimum value across all recipients is given;
  STARS       score as in macro SCORE, but represented as a bar of characters
  REPORT      a SA terse report of tests hit (for header reports)
  SUMMARY     similar to macro A, but provides a single multiline string,
              a SA summary of tests hit for standard body reports
  REMOTEHOSTADDR IP address of your connecting MTA (often 127.0.0.1)
  REMOTEHOSTNAME your connecting MTA (often [127.0.0.1] or localhost)
  TESTS       similar to %T in logging, but without scores
  TESTSSCORES similar to %T in logging, but allows to specify a separator
  YESNO       similar to macro '2', but provides Yes/No instead of 1/0
  YESNOCAPS   similar to macro '2', but provides YES/NO instead of 1/0
  REQD        minimal tag2 level of all recipients
  HEADER      same as a header_field macro

  and two additional SA-lookalikes, but with no counterparts in SpamAssassin:

  LOGID       log id (a.k.a. am_id) e.g. 58725-05-2,      synonym for macro n
  MAILID      mail_id as in quarantine e.g. jaUETfyBMJHG, synonym for macro i

- when $log_recip_templ is expanded (by-recipient log entry), certain
  macros keep their general semantics but reflect a value for that recipient:

  %. value is a recipient counter (starting by 1) when $log_recip_templ
     is expanded, and is undef when other templates are expanded;
  %R current recipient email address
  %D recipient email address if mail will be delivered, otherwise empty
  %O recipient email address if mail will NOT be delivered, otherwise empty
  %N short DSN if mail will not be delivered for this recip, otherwise empty
  %c spam level/hits including a per-recipient boost
  %0 recipient email address belong to local_domains_maps: L or 0
  %1 above tag level for this recipient:  Y or 0
  %2 above tag2 level for this recipient: Y or 0
  %k above kill level for this recipient: Y or 0
  REQD recipient's tag2_level
  ccat_maj  major category number, takes into account per-recip bypass_*
  ccat_min  minor category number, takes into account per-recip bypass_*
  ccat_name display name of the c.cat, takes into account per-recip bypass_*
  remote_mta  MTA to which a message was forwarded
  remote_mta_smtp_response  MTA's SMTP response on accepting forwarded message
  smtp_response  either an MTA's SMTP response for forwarded mail, or
            internally generated SMTP response for mail that was not forwarded
  score_boost internally generated score points to be added to a SA score

The choice of capital letters for lists, and lower case letters for simple
strings is purely a convention and is not enforced, neither do all
built-in macros adhere to the convention.

Further built-in macros are easy to add if special need arises,
just append new key/value pairs to the hash which is passed to expand().

Besides a simple string or an array reference, a hash value may also be
a subroutine reference which will be called later during macro expansion.
This way one can provide a method for obtaining information which is not yet
available during initial construction of the hash (such as AV scanner results),
or provide a lazy evaluation for more expensive calculations. Subroutine
will be evaluated in scalar context. It may return a string or an array
reference.

The rest of this text explains what expand() does with its arguments.


EXPAND

This is a simple, yet fully fledged macro processor with proper lexical
analysis, call stack, quoting levels, user supplied and builtin macros,
three builtin flow-control macros: selector, regexp selector and iterator,
plus a macro #, which discards input tokens until newline (like 'dnl' in m4).
Also recognized are the usual \c and \nnn forms for specifying special
characters, where c can be any of: r, n, f, b, e, a, t.  Lexical analysis
of the input string is performed only once, macro result values are not
in danger of being lexically re-parsed and are treated as plain characters,
loosing any special meaning they might have. New macros (with arguments)
can be defined by a built-in macro [= name | body ].  Macro calls can be
neutral (not re-evaluating the result) or active (pushing result back to
input for re-evaluation).

Simple caller-provided macros can evaluate to a string (possibly empty
or undef), or an array of strings. It can also be a subroutine reference,
in which case the subroutine will be called whenever macro is evaluated,
supplying it with arguments from the macro call (first argument is a macro
name). The subroutine must return a scalar: a string, or an array reference.
The result will be treated as if it were specified directly.

Two simple forms of macro calls are known: %x and %#x (where x is a single
letter macro name, i.e. a key in a user-supplied associative array):
  %x   evaluates to the value associated with the name x;
       if the value is an array ref, the result is a single concatenated
       string of values separated with comma-space pairs and is not
       re-evaluated (i.e. it is a form of a neutral macro call)
  %#x  evaluates to a number: if a macro value is a scalar, returns 0
       for all-whitespace value, and 1 otherwise. If a value is an array ref,
       evaluates to the number of elements in the array. Neutral call.

A literal percent character can be produced by %% or \%. To take away any
special meaning of other characters they can be quoted by a backslash,
e.g. \[ or \\ or \"]  .

A more general form of a macro call provides an ability to call macros with
longer names, to provide arguments to a call, and to specify whether the call
is a neutral or an active call:
  [@ name |arg1|arg2|...|argn]  is an active macro call, i.e. the
       result is pushed back to input and evaluated like usual; there may be
       any number of actual arguments supplied, argument 0 is a macro name;
       Neither a macro name nor arguments are implicitly quoted, so if it is
       desired to prevent their evaluation before a call, they should be
       quoted by enclosing them in a pair of [" and "],
       e.g.: [@["name"]|["arg1"]|["arg2"]| ... |["argn"]]
  [: name |arg1|arg2|...|argn]  is an neutral macro call, i.e. the
       result is NOT pushed back to input for evaluation, but goes directly
       to destination (either straight to output string, or collected as
       argument if this is a nested macro call);
Note that whitespace around macro name is allowed (is optional), and is
automatically removed, but space within arguments receives no special
treatment, and is passed on to a macro like other characters.

There is a further simple form of a neutral macro call which emulates
SpamAssassin syntax: _NAME_ or _NAME(...)_ where a macro name must be
composed of capital letters only, and an optional string in () is passed
to a macro as its first argument (besides argument 0, which is a macro
name). Commas within (...) are not special, calls like _TESTS(,)_ and
_SPAMMYTOKENS(2,short)_ still provide a single argument: "," or "2,short"
respectively, to accommodate SA peculiarity. These all-capitals macros can
still be called by a normal general-purpose form of a macro call for greater
flexibility, as described above.

A macro is evaluated only in non-quoted context. Enclosing strings between
tokens [" and "] prevents its evaluation. Quoting may be nested, quote tokens
must be balanced. Evaluating a quoted input strips off one level of quotes.

As a special feature two built-in macros (selector and iterator) provide
implicit quoting of their arguments to keep clutter in these two most common
macros calls down to a minimum. Unfortunately this causes some complications
in the code, but the feature is kept for backward compatibility.

Built-in macros selector, regexp selector and iterator have the following
syntax:
  [? arg1 | arg2 | ... ]    a selector
  [~ arg1 | arg2 | ... ]    a regexp selector
  [  arg1 | arg2 | ... ]    an iterator
where '[', '[?', '[~', '|', and ']' are required tokens. Arguments are
arbitrary text, possibly multiline, whitespace matters. Nested macro calls
are permitted, proper bracket nesting must be observed.

SELECTOR lets its first argument be evaluated immediately, and implicitly
quotes remaining arguments. The evaluated first argument chooses which
of the remaining arguments is selected as a result value. The chosen result
is only then evaluated (i.e. this is an active macro call), non-selected
arguments are discarded without evaluation. The first argument is usually
a number (with optional leading and trailing whitespace). If it is a
non-numeric string, it is treated as 0 for all-whitespace and as 1 otherwise.
Value 0 selects the very next (second) argument, value 1 selects the
one after it, etc. If the value is greater than the number of available
arguments, the last one (unless it is the only one) is selected. If there
is only one (the first) alternative available but the value is greater
than 0, an empty string is returned.
  Examples:
    [? 2   | zero | one | two | three ]  -> two
    [? foo | none | any | two | three ]  -> any
    [? 24  | 0    | one | many ]         -> many
    [? 2   |No recipients]               -> (empty string)
    [? %#R |No recipients|One recipient|%#R recipients]
    [? %q  |No quarantine|Quarantined as %q]
Note that a selector macro call can be considered a form of if-then-else,
except that the 'then' and 'else' parts are swapped!

ITERATOR in its full form takes three arguments (and ignores any extra
arguments after that):
    [ %x | body-usually-containing-%x | separator ]
All iterator's arguments are implicitly quoted, iterator performs its own
substitutions on provided arguments, as described below. The result of an
iterator call is a body (the second argument) repeated as many times as
there are elements in the array denoted by the first argument. In each
instance of a body all occurrences of a token %x in the body are replaced
with each consecutive element of the array. Resulting body instances are
then glued together with a string given as the third argument. The result
is finally pushed back to input (active macro call) for possible further
expansion.

As a somewhat ugly hack (upwards compatible), it is possible to iterate
on built-in macros with names longer than one character:
    [ long-macro-name | body-usually-containing-%x | separator ]
This only works in a full three-argument form of iterator call, and the
iterator variable name becomes a hard-wired literal character x.

There are two simplified forms of an iterator call:
    [ body | separator ]
or  [ body ]
where missing separator is considered a null string, and a missing formal
argument name is obtained by looking for the first token of the form %x
in the body. If there is no formal argument specified (neither explicitly
nor in the body), the result is an empty string, which is potentially useful
as a null lexical separator.

  Examples:
    [%V| ]     a space-separated list of virus names

    [%V|\n]    a newline-separated list of virus names

    [%V|
    ]          same thing: a newline-separated list of virus names

    [
        %V]    a list of virus names, each preceded by NL and spaces

    [ %R |%s --> <%R>|, ]  a comma-space separated list of sender/recipient
               name pairs where recipient is iterated over the list
               of recipients. (Only the (first) token %x in the first
               argument is significant, other characters are ignored.)

    [%V|[%R|%R + %V|, ]|; ]  produce all combinations of %R + %V elements

A combined example:
    [? %#C |#|Cc: [<%C>|, ]]
    [? %#C ||Cc: [<%C>|, ]\n]#     ... same thing
evaluates to an empty string if there are no elements in the %C array,
otherwise it evaluates to a line:  Cc: <addr1>, <addr2>, ...\n
Whitespace (including newlines) around the first argument %#C of selector
call is ignored and can be used for clarity.

These all produce the same result:
    To: [%T|%T|, ]
    To: [%T|, ]
    To: %T

The '#' removes input characters until and including newline after it.
It can be used for clarity to allow newlines be placed in the source text
but not resulting in empty lines in the expanded text. In the second example
above, a backslash at the end of the line would achieve the same result,
although the method is different: \NEWLINE is removed during initial lexical
analysis, while # is an internal macro which, when called, actively discards
tokens following it, until a NEWLINE (or end of input) is encountered.

Further build-in macros are regexp selector and a macro-defining macro.

REGEXP SELECTOR macro is somewhat similar to a plain SELECTOR, but it
performs a regular expression match on its first argument. The syntax is:
  [~string|regexp1|then1|regexp2|then2|...|regexpN|thenN|else]

It compares string to each regexp in turn, and if a match is found the macro
expands to the corresponding 'then' argument, otherwise it expands to the
'else' argument (or empty if the 'else' is missing). For example:
  [~string|^s.*$|["matches"]|["no match"]]
Unlike SELECTOR, arguments are not implicitly quoted, quoting must be
explicit if desired (there was no backward compatibility need for this newer
macro). This is an active macro call, results are pushed back to input for
re-evaluation. Tokens %1, %2, ... %9 in arg3 and arg4 are replaced by captured
subexpressions (in parenthesis) of a regexp, and %0 is replaced by arg1.

DEFINE macro call allows new macros to be dynamically defined:
    [= name |body]
It creates a new macro with a specified name (whitespace around name is
trimmed), giving it the argument as a macro body string. No arguments are
implicitly quoted, quoting must be explicit (for most uses at least the
body string should be quoted in [" ... "] ). Body string may contain tokens
%0, %1, %2, ... %9 as formal arguments. These will be substituted with actual
arguments (or empty strings for missing arguments) at the time of a call.
In most respects these dynamically defined macros behave just like other
pre-defined macros. One distinction is that they can only produce a scalar
string, they can not produce an array.