Rahul Sharma (Editor)

Camel case

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit
Camel case

Camel case (stylized as camelCase or CamelCase; also known as camel caps or more formally as medial capitals) is the practice of writing compound words or phrases such that each word or abbreviation in the middle of the phrase begins with a capital letter, with no intervening spaces or punctuation. Common examples include "iPhone", "eBay", "FedEx", and "HarperCollins". It is also sometimes used in online usernames such as "JohnSmith", and to make multi-word domain names more legible, for example in advertisements.

Contents

As the first letter of a compound word in camel case may or may not be capitalized, there is no consensus on whether the term "camel case" implies an uppercase or lowercase initial letter. For clarity, this article calls the two alternatives upper camel case (initial upper case letter also known as Title Case) and lower camel case (initial lower case letter). Some people and organizations, notably Microsoft, use the term camel case only for lower camel case.

Camel case is distinct from title case, which is traditionally used for book titles and headlines, as the latter retains the spaces between the words. Camel case is also distinct from Tall Man lettering, which uses capitals to emphasize the differences between similar-looking words.

The name "CamelCase" is not related to the "Camel Book", the popular nickname of the book Programming Perl, which uses all-lowercase identifiers with underscores (sometimes called snake case) in its sample code.

Variations and synonyms

The original name of the practice, used in media studies, grammars and the Oxford English Dictionary, was "medial capitals". Other names such as "InterCaps" or "CamelCase" are relatively recent and more common in computer-related communities. Other synonyms include:

StudlyCaps encompasses both styles of camel case, but includes even randomly mixed capitalization, as in MiXeD CaPitALiZaTioN (typically a stereotyped allusion to online culture).

The earliest known occurrence of the term "InterCaps" on Usenet is in an April 1990 post to the group alt.folklore.computers by Avi Rappoport. The earliest use of the name "CamelCase" occurs in 1995, in a post by Newton Love. Love has since said, "With the advent of programming languages having these sorts of constructs, the humpiness of the style made me call it HumpyCase at first, before I settled on CamelCase. I had been calling it CamelCase for years. ... The citation above was just the first time I had used the name on USENET."

In word combinations

The use of medial capitals as a convention in the regular spelling of everyday texts is rare, but is used in some languages as a solution to particular problems which arise when two words or segments are combined.

In Italian, pronouns can be suffixed to verbs, and because the honorific form of second-person pronouns is capitalized, this can produce a sentence like non ho trovato il tempo di risponderLe ("I haven't found time to answer you" - where Le means "you").

In German, the medial capital letter I, called Binnen-I, is sometimes used in a word like StudentInnen ("students") to indicate that both Studenten ("male students") and Studentinnen ("female students") are intended simultaneously. However, mid-word capitalisation does not conform to German orthography. The previous example could be correctly written using parentheses as Student(inn)en, analogous to "congress(wo)man" in English.

In Irish, they are used when an inflectional prefix is attached to a proper noun, for example i nGaillimh ("in Galway"), from Gaillimh ("Galway"); an tAlbanach ("the Scottish person"), from Albanach ("Scottish person"); and go hÉireann ("to Ireland"), from Éire ("Ireland"). In recent Scots Gaelic orthography, a hyphen has been inserted: an t-Albannach.

This convention is also used by several Bantu languages (e.g., kiSwahili, "Swahili language"; isiZulu, "Zulu language") and several indigenous languages of Mexico (e.g. Nahuatl, Totonacan, Mixe–Zoque, and some Oto-Manguean languages).

In English, medial capitals are usually only found in Scottish or Irish "Mac-" or "Mc-" names, where for example MacDonald, McDonald, and Macdonald are common spelling variants of the same name, and in Anglo-Norman "Fitz-" names, where for example both FitzGerald and Fitzgerald are found.

In their English style guide The King's English, first published in 1906, H. W. Fowler and F. G. Fowler suggested that medial capitals could be used in triple compound words where hyphens would cause ambiguity—the examples they give are KingMark-like (as against King Mark-like) and Anglo-SouthAmerican (as against Anglo-South American). However, they described the system as "too hopelessly contrary to use at present."

In transliterations

In the scholarly transliteration of languages written in other scripts, medial capitals are used in similar situations. For example, in transliterated Hebrew, haIvri means "the Hebrew person" and biYerushalayim means "in Jerusalem". In Tibetan proper names like rLobsang, the "r" stands for a prefix glyph in the original script that functions as tone marker rather than a normal letter. Another example is tsIurku, a Latin transcription of the Chechen term for the capping stone of the characteristic Medieval defensive towers of Chechenia and Ingushetia; the capital letter "I" here denoting a phoneme distinct from the one transcribed as "i".

In abbreviations and acronyms

Medial capitals are traditionally used in abbreviations to reflect the capitalization that the words would have when written out in full, for example in the academic titles PhD or BSc. In German, the names to statutes are abbreviated using embedded capitals, e.g. StGB (Strafgesetzbuch) for criminal code, PatG (Patentgesetz) for Patent Act, or the very common GmbH (Gesellschaft mit beschränkter Haftung) for Company with Limited Liability. In this context, there can even be three or more "CamelCase" capitals, e.g. in TzBfG for Teilzeit- und Befristungsgesetz (Act on Part-Time and Limited Term Occupations). In French, camel case acronyms such as OuLiPo (1960) were favored for a time as alternatives to initialisms.

Camel case is often used to transliterate initialisms into alphabets where two letters may be required to represent a single character of the original alphabet, e.g., DShK from Cyrillic ДШК.

Chemical formulae

The first systematic and widespread use of medial capitals for technical purposes was the notation for chemical formulae invented by the Swedish chemist Berzelius in 1813. To replace the multitude of naming and symbol conventions used by chemists until that time, he proposed to indicate each chemical element by a symbol of one or two letters, the first one being capitalized. The capitalization allowed formulae like "NaCl" to be written without spaces and still be parsed without ambiguity.

Berzelius' system remains in use to this day, augmented with three-letter symbols like "Uue" for unconfirmed or unknown elements and abbreviations for some common substituents (especially in the field of organic chemistry, for instance "Et" for "ethyl-"). This has been further extended to describe the amino acid sequences of proteins and other similar domains.

Early use in trademarks

Since the early 20th century, medial capitals have occasionally been used for corporate names and product trademarks, such as

  • DryIce Corporation (1925) marketed the solid form of carbon dioxide (CO2) as "Dry Ice", thus leading to its common name.
  • CinemaScope and VistaVision, rival widescreen movie formats (1953)
  • ShopKo (1962)
  • MisteRogers, the Canadian version of Mister Rogers' Neighborhood (1962)
  • AstroTurf (1967)
  • ConAgra, formerly Consolidated Mills (1971).
  • Computer programming

    In the 1970s and 1980s, medial capitals were adopted as a standard or alternative naming convention for multi-word identifiers in several programming languages. The precise origin of the convention in computer programming has not yet been settled. A 1954 conference proceedings informally referred to IBM's Speedcoding system as "SpeedCo". Christopher Strachey's paper on GPM (1965), shows a program that includes some medial capital identifiers, including "NextCh" and "WriteSymbol".

    Multiple-word descriptive identifiers such as end of file or char table cannot be used in most popular programming languages because the spaces between the words would be parsed as delimiters between tokens. The alternative of running the words together as in endoffile or chartable may result in identifiers that are difficult to understand and perhaps even misleading; for example, chartable is ambiguous as it could mean "chart-able" (able to be charted) or "char table" (a table of characters).

    Some early programming languages, notably Lisp (1958) and COBOL (1959), addressed this problem by allowing a hyphen ("-") to be used between words of compound identifiers, as in "END-OF-FILE": Lisp because it worked well with prefix notation (a Lisp parser would not treat a hyphen in the middle of a symbol as a subtraction operator) and COBOL because its operators were individual English words. This convention remains in use in these languages, and is also common in program names entered on a command line, as in Unix.

    However, this solution was not adequate for mathematically-oriented languages such as FORTRAN (1955) and ALGOL (1958), which used the hyphen as an infix subtraction operator. These early languages instead allowed identifiers to have spaces in them, determining the end of the identifier by context. This approach was abandoned in later languages due to the complexity it adds to tokenization. (FORTRAN initially restricted identifiers to six characters or fewer, effectively preventing multi-word identifiers except those made of very short words.)

    Exacerbating the problem was the fact that common punched card character sets of the time were uppercase only and lacked other special characters. It was only in the late 1960s that the widespread adoption of the ASCII character set made both lower case and the underscore character _ universally available. Some languages, notably C, promptly adopted underscores as word separators, and identifiers such as end_of_file are still prevalent in C programs and libraries (as well as in later languages influenced by C, like Perl and Python). However, some languages and programmers chose to avoid underscores—among other reasons to prevent confusing them with whitespace—and adopted camel case instead.

    Charles Simonyi, who worked at Xerox PARC in the 1970s and later oversaw the creation of Microsoft's Office suite of applications, invented and taught the use of Hungarian Notation, in which the lower case letter at the start of a (capitalized) variable name denotes its type. One account claims that the camel case style first became popular at Xerox PARC around 1978, with the Mesa programming language developed for the Xerox Alto computer. This machine lacked an underscore key, and the hyphen and space characters were not permitted in identifiers, leaving camel case as the only viable scheme for readable multiword names. The PARC Mesa Language Manual (1979) included a coding standard with specific rules for upper and lower camel case that was strictly followed by the Mesa libraries and the Alto operating system.

    The Smalltalk language, which was developed originally on the Alto and became quite popular in the early 1980s, may have been instrumental in spreading the style outside PARC. Camel case was also used by convention for many names in the PostScript page description language (invented by Adobe Systems founder and ex-PARC scientist John Warnock), as well as for the language itself. In addition, Niklaus Wirth, the inventor of Pascal, came to appreciate camel case during a sabbatical at PARC and used it in Modula, his next programming language.

    Spread to mainstream usage

    Whatever its origins within the computing world, the practice spread in the 1980s and 1990s, when the advent of the personal computer exposed hacker culture to the world. Camel case then became fashionable for corporate trade names, initially in technical fields; mainstream usage was well established by 1990:

  • (1962) PolyGram, formerly one of the major labels of the music industry.
  • (1971) AeroVironment
  • (1976) InterCity 125
  • (1977) CompuServe, UnitedHealthCare (now UnitedHealthcare).
  • (1978) WordStar
  • (1979) MasterCard, SportsCenter, VisiCalc
  • (1980) EchoStar
  • (1982) MicroProse, WordPerfect
  • (1983) NetWare
  • (1984) BellSouth, LaserJet, MacWorks, iDEN
  • (1985) PageMaker, EastEnders
  • (1986) SpaceCamp
  • (1987) ClarisWorks, HyperCard, PowerPoint
  • (1990) HarperCollins, SeaTac, iconic WorldWideWeb
  • During the dot-com bubble of the late 1990s, the lowercase prefixes "e" (for "electronic") and "i" (for "Internet", "information", "intelligent", etc.) became quite common, giving rise to names like Apple's iMac and the eBox software platform.

    In 1998, Dave Yost suggested that chemists use medial capitals to aid readability of long chemical names, e.g. write AmidoPhosphoRibosylTransferase instead of amidophosphoribosyltransferase. This usage was still rare in 2012.

    The practice is sometimes used for abbreviated names of certain neighborhoods, e.g. New York City neighborhoods SoHo (South of Houston Street) and TriBeCa (Triangle Below Canal Street) and San Francisco's SoMa (South of Market). Such usages erode quickly, so the neighborhoods are now typically rendered as Soho, Tribeca, and Soma.

    Internal capitalization has also been used for other technical codes like HeLa (1983).

    Programming and coding

    The use of medial caps for compound identifiers is recommended by the coding style guidelines of many organizations or software projects. For some languages (such as Mesa, Pascal, Modula, Java and Microsoft's .NET) this practice is recommended by the language developers or by authoritative manuals and has therefore become part of the language's "culture".

    Style guidelines often distinguish between upper and lower camel case, typically specifying which variety should be used for specific kinds of entities: variables, record fields, methods, procedures, types, etc. These rules are sometimes supported by static analysis tools that check source code for adherence.

    The original Hungarian notation for programming, for example, specifies that a lowercase abbreviation for the "usage type" (not data type) should prefix all variable names, with the remainder of the name in upper camel case; as such it is a form of lower camel case.

    Programming identifiers often need to contain acronyms and initialisms that are already in upper case, such as "old HTML file". By analogy with the title case rules, the natural camel case rendering would have the abbreviation all in upper case, namely "oldHTMLFile". However, this approach is problematic when two acronyms occur together (e.g., "parse DBM XML" would become "parseDBMXML") or when the standard mandates lower camel case but the name begins with an abbreviation (e.g. "SQL server" would become "sQLServer"). For this reason, some programmers prefer to treat abbreviations as if they were lower case words and write "oldHtmlFile", "parseDbmXml" or "sqlServer". However, this can make it harder to recognise that a given word is intended as an acronym.

    Camel case is used in some wiki markup languages for terms that should be automatically linked to other wiki pages. This convention was originally used in Ward Cunningham's original wiki software, WikiWikiWeb, and can be activated in most other wikis. Some wiki engines such as TiddlyWiki, Trac and PmWiki make use of it in the default settings, but usually also provide a configuration mechanism or plugin to disable it. Wikipedia formerly used camel case linking as well, but switched to explicit link markup using square brackets and many other wiki sites have done the same. Some wikis that do not use camel case linking may still use the camel case as a naming convention, such as AboutUs.

    Other uses

    The NIEM registry requires that XML data elements use upper camel case and XML attributes use lower camel case.

    Most popular command-line interfaces and scripting languages cannot easily handle file names that contain embedded spaces (usually requiring the name to be put in quotes). Therefore, users of those systems often resort to camel case (or underscores, hyphens and other "safe" characters) for compound file names like MyJobResume.pdf.

    Microblogging and social networking sites that limit the number of characters in a message (most famously Twitter, where the 140-character limit can be quite restrictive in languages that rely on alphabets, including English) are potential outlets for medial capitals. Using camel case between words reduces the number of spaces, and thus the number of characters, in a given message, allowing more content to fit into the limited space. Hashtags, especially long ones, often use camel case to maintain readability (e.g. #CollegeStudentProblems is easier to read than #collegestudentproblems).

    In website URLs, spaces are percent-encoded as "%20", making the address longer and less human readable. By omitting spaces, camel case does not have this problem.

    Readability studies

    Camel case has been criticised as negatively impacting readability due to the removal of spaces and uppercasing of every word.

    A 2009 study comparing snake case to camel case found that camel case identifiers could be recognised with higher accuracy among both programmers and non-programmers, and that programmers already trained in camel case were able to recognise those identifiers faster than underscored snake-case identifiers.

    A 2010 follow-up study, under the same conditions but using an improved measurement method with use of eye-tracking equipment, indicates: "While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly."

    References

    Camel case Wikipedia