In computer programming, leaning toothpick syndrome (LTS) is the situation in which a quoted expression becomes unreadable because it contains a large number of escape characters, usually backslashes ("\"), to avoid delimiter collision.
The official Perl documentation introduced the term to wider usage; there, the phrase describes regular expressions that match Unix-style paths, whose elements are separated by slashes (/). The slash is also the default regular expression delimiter, so each literal slash in the pattern must be escaped with a backslash (\), leading to frequent escaped slashes written as \/. If doubled, as in URLs, this yields \/\/ for an escaped //. A similar phenomenon occurs for DOS/Windows paths, where the backslash is the path separator: matching it requires a doubled backslash (\\), which can then be re-escaped for a regular expression inside an escaped string, requiring \\\\ to match a single backslash. In extreme cases, such as a regular expression in an escaped string that matches a Uniform Naming Convention path (which begins with \\), this requires 8 backslashes (\\\\\\\\), because each of the 2 backslashes is double-escaped.
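The backslash counts above can be checked with a short sketch. Python is used here purely for illustration, and the variable names and the server path are hypothetical. The pattern is written as an ordinary (escaped) string literal containing 8 backslashes; the string parser halves them to 4, and the regular expression engine halves them again to the 2 literal backslashes that begin a UNC path.

```python
import re

# Ordinary string literal: the source text contains 8 backslashes.
# The string parser halves them, leaving 4 characters in the pattern.
pattern = "\\\\\\\\"
pattern_length = len(pattern)

# Hypothetical UNC path; as text it reads \\server\share
unc_path = "\\\\server\\share"

# The regex engine halves the count again: the 4-character pattern
# matches exactly 2 literal backslashes at the start of the path.
match = re.match(pattern, unc_path)
matched_text = match.group(0)

print(pattern_length)     # 4
print(len(matched_text))  # 2
```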
LTS appears in many programming languages and in many situations, including in patterns that match Uniform Resource Identifiers (URIs) and in programs that output quoted text. Many quines fall into the latter category.
Pattern example
Consider the following Perl regular expression intended to match URIs which identify files under the pub directory of an FTP site:
m/ftp:\/\/[^\/]*\/pub\//
Perl, like sed before it, solves this problem by allowing many other characters to be delimiters for a regular expression. For example, the following three examples are equivalent to the expression given above:
m{ftp://[^/]*/pub/}
m#ftp://[^/]*/pub/#
m!ftp://[^/]*/pub/!
Quoted text example
A Perl program to print an HTML link tag, where the URL and link text are stored in variables $url and $text respectively, might look like this. Notice the use of backslashes to escape the quoted double-quote characters:
Using single quotes to delimit the string is not feasible, as Perl does not expand variables inside single-quoted strings. The code below, for example, would not work as intended.
Using the printf function is a viable solution in many languages (Perl, C, PHP):
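Python is not among the languages listed, but its printf-style % operator allows the same approach to be sketched. The double quotes sit in a format string delimited by single quotes, so none of them needs a backslash. The names url and text mirror the Perl variables, and the sample values are made up for illustration.

```python
# Hypothetical sample values standing in for $url and $text.
url = "http://example.com/"
text = "Example"

# Escaped version: each inner double quote needs a backslash.
escaped = "<a href=\"" + url + "\">" + text + "</a>"

# printf-style formatting: the format string is single-quoted,
# so the double quotes appear without backslashes.
formatted = '<a href="%s">%s</a>' % (url, text)

print(formatted)  # <a href="http://example.com/">Example</a>
```

Both strings are identical; only the source text differs in how many escape characters it needs.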
The qq operator in Perl allows for any delimiter:
Here documents are especially well suited for multi-line strings; however, here documents do not allow for proper indentation. This example shows the Perl syntax:
C#
The C# programming language handles LTS with verbatim string literals, which place an '@' symbol before the opening quotation mark, e.g.
rather than otherwise requiring:
C++
The C++11 standard adds raw strings:
If the string contains the characters )", an optional delimiter can be used, such as d in the following example:
Go
Go indicates that a string is raw by using the backtick as a delimiter:
Raw strings may contain any character except backticks; there is no escape code for a backtick in a raw string. Raw strings may also span multiple lines, as in this example where the strings s and t are equivalent:
Python
Python has a similar construct, the raw string literal, marked by an 'r' prefix:
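A minimal sketch of the 'r' prefix follows; the Windows path is a made-up example.

```python
# Ordinary literal: every backslash must be doubled.
escaped_path = "C:\\Foo\\Bar.txt"

# Raw literal: backslashes are taken literally.
raw_path = r"C:\Foo\Bar.txt"

print(raw_path)                  # C:\Foo\Bar.txt
print(raw_path == escaped_path)  # True
```

A raw string literal cannot end in an odd number of backslashes, because the final backslash would prevent the closing quote from terminating the literal.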
One can also use them together with triple quotes:
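As an illustrative sketch, a raw triple-quoted literal can hold both kinds of quote character and backslashes without any escaping. The pattern and the sample HTML are hypothetical.

```python
import re

# Raw triple-quoted literal: contains a double quote, a single quote
# and a backslash (the \1 back-reference), none of them escaped.
pattern = r'''href=(["'])(.*?)\1'''

html = '<a href="http://example.com/">Example</a>'
match = re.search(pattern, html)
link_target = match.group(2)
print(link_target)  # http://example.com/

# Raw triple-quoted literals may also span several lines.
multi = r"""first\line
second\line"""
```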
Scala
Scala allows usage of triple quotes in order to prevent escaping confusion:
The triple quotes also allow for multi line strings, as shown here:
Sed
Sed regular expressions, particularly those using the "s" operator, are very similar to Perl's (sed is a predecessor to Perl). The default delimiter is "/", but any delimiter can be used; the default form is "s/regexp/replacement/", but "s:regexp:replacement:" is also valid. For example, to match a "pub" directory (as in the Perl example) and replace it with "foo", the default form (escaping the slashes) is:
s/ftp:\/\/[^\/]*\/pub\//foo/
Using an exclamation point ("!") as delimiter instead yields:
s!ftp://[^/]*/pub/!foo!