Language modules let you extend BBEdit’s syntax coloring, block folding, and file indexing by adding a specification for each language you want to support. Native-code modules use specialized, hand-coded parsers to provide those services in a highly sophisticated, context-sensitive form. (If this interests you, you can find the language module SDK at https://www.barebones.com/support/develop.)
Codeless language modules allow you to go almost as far using less-demanding tools.
Codeless language modules take the form of property list files (plists), a simple XML document type used throughout Mac OS X. The format provides a set of primitive types, like strings and numbers, plus structural elements like dictionaries and arrays. You can get an overview by typing man 5 plist in a Terminal window. For the whole story, browse to Apple’s Mac Dev Center (http://developer.apple.com/mac, free of charge) and search for “Property List Programming Guide.” Codeless language modules are dictionaries (<dict>) that pair each value with the key string that identifies it.
Apple’s Xcode development environment provides a graphical editor for plists, and editor applications are available on the Internet, but this reference assumes you’ll write modules using BBEdit itself. BBEdit comes with a generic clipping set for property lists, which it selects automatically when you edit a file with the .plist extension.
BBEdit gives you a choice of two strategies for constructing codeless language modules. The string-based strategy requires only that you provide literal string tokens for strings, comments, and keywords. The regular-expression-based strategy gives you finer control over how BBEdit colors and indexes your files, but it requires comfort with large regular expressions.
Once you’ve completed your language module, place it in the Language Modules subfolder of BBEdit's application support folder, and then quit and relaunch BBEdit in order to load it.
As dictionary property lists, codeless language modules are organized as a series of defined keys, paired with string, Boolean, array, and dictionary values. A few are mandatory; most are optional, or are alternatives to others, depending on your approach to the module. This section will list all of the keys and their functions.
These keys apply to both string-based and regular-expression-based codeless language modules. They identify the plist to BBEdit as a language module, and tell it how to identify and present the language.
BBEditDocumentType | string | mandatory |
This string must be CodelessLanguageModule. If it is not present, BBEdit won’t recognize the plist as a language module, and will ignore the file.
BBLMLanguageDisplayName | string | mandatory |
This is the name BBEdit will display in popup menus and preferences for this language. The name should be enough to identify the language, but not so long that it doesn’t fit. Examples are “C,” “HTML,” and “Perforce Specification.”
BBLMLanguageCode | string | mandatory |
A four-character ASCII string that uniquely identifies the language this module describes. It must not be the same as the code for any other CLM, nor for languages built into BBEdit.
Unfortunately, your first clue that you’ve picked a conflicting code may be that your text gets colored and indexed as though it were in some other language. If that happens, this is the first issue you should look for.
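Putting the mandatory keys together, a minimal module file might look like the sketch below. The display name Example Language and the code XmpL are hypothetical placeholders; substitute your own, and make sure your code doesn’t collide with an existing one. The Language Features dictionary, also mandatory, is described later in this reference; the sketch fills it with just one of the two identifier keys, one of which must be supplied.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Required: identifies this plist as a codeless language module -->
    <key>BBEditDocumentType</key>
    <string>CodelessLanguageModule</string>

    <!-- Hypothetical name shown in BBEdit's language popups -->
    <key>BBLMLanguageDisplayName</key>
    <string>Example Language</string>

    <!-- Hypothetical four-character code; must not match any other module's code -->
    <key>BBLMLanguageCode</key>
    <string>XmpL</string>

    <!-- Also mandatory; its keys are described under Language Features -->
    <key>Language Features</key>
    <dict>
        <key>Identifier and Keyword Character Class</key>
        <string>a-zA-Z0-9_</string>
    </dict>
</dict>
</plist>
```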
Codeless language modules have two complementary ways of claiming files. BBLMSuffixMap lists filename suffixes that, if they match a file’s name, claim the file for that module. BBLMFileNamesToMatch lets you provide a set of whole-name patterns that will identify files as belonging to your language. Both are optional, but if you don’t provide either, your user will have to choose your language from the language popup whenever she wants to use it.
BBEdit assigns a language module to a file, based on the file’s name, in three steps:
Each stage is run against all modules before the next is attempted. The process stops at the first match. The order of modules in the matching process is not guaranteed; if more than one module claims a file pattern, the winner is not predictable.
A third key, BBLMPreferredFilenameExtension, gives BBEdit a file-name extension to suggest when the user first saves a file.
BBLMSuffixMap | array of dictionaries | optional |
The suffix map is an array of dictionaries, one for each type of file that this language can use. Each dictionary claims a file extension for the language.
The mapping dictionaries use only one key:
BBLMLanguageSuffix | string | mandatory in suffix map dictionaries |
This string is the suffix (not the extension — you must include the dot to specify an extension) that identifies files of this specific type. Examples include .h, .plist, or Makefile.
Here’s an example of a suffix map for the DTrace scripting language, in which .d files are source, and .h files are headers:
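A sketch of that map, with one dictionary per suffix, might look like this:

```xml
<key>BBLMSuffixMap</key>
<array>
    <!-- DTrace source files -->
    <dict>
        <key>BBLMLanguageSuffix</key>
        <string>.d</string>
    </dict>
    <!-- DTrace header files -->
    <dict>
        <key>BBLMLanguageSuffix</key>
        <string>.h</string>
    </dict>
</array>
```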
In the absence of a BBLMPreferredFilenameExtension key, BBEdit will suggest the first suffix in the map as the filename extension when the user first saves (or saves-as) a file.
BBLMFileNamesToMatch | array of string | optional |
Your language uses this list of string patterns to claim files on a whole-name basis. If a file’s name matches one of these patterns, BBEdit assigns your module to the file. The patterns use the same “glob” syntax you may have used in command-line shells.
This is useful if your module handles a class of files with no common elements to their names. BBEdit’s installed modules use this key to claim Rakefile for Ruby, and README, COPYING, and INSTALLING for “Software Package Notes.”
Mac OS X’s HFS+ file system is (by default) case-insensitive, and so are these patterns. If the pattern list contains *.c, for instance, the pattern will capture both foo.c and foo.C.
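A list in the spirit of the Software Package Notes example might look like the sketch below. It is illustrative, not a copy of the installed module; because matching is case-insensitive, the first entry would also claim a file named readme.

```xml
<key>BBLMFileNamesToMatch</key>
<array>
    <!-- Whole-name patterns; glob wildcards are also allowed -->
    <string>README</string>
    <string>COPYING</string>
    <string>INSTALLING</string>
</array>
```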
BBLMPreferredFilenameExtension | string | optional |
When the user chooses Save As… (or Save for the first time) on a file, and has designated a language with the editor window’s language popup, BBEdit will suggest this string as the file name extension. Unlike the suffix-mapping table, the preferred extension should not begin with a dot. For a DTrace source file, this entry would be d.
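In the module’s plist, that entry might look like this sketch (the value follows from the DTrace suffix map, in which .d files are source):

```xml
<!-- No leading dot, unlike BBLMLanguageSuffix -->
<key>BBLMPreferredFilenameExtension</key>
<string>d</string>
```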
If BBLMPreferredFilenameExtension is not present in the CLM, BBEdit will suggest the first extension in the module’s suffix map.
… where the language placeholder is the name of your language, as specified by the BBLMLanguageDisplayName tag. The preference name is like any other parameter to a UNIX shell command: if it contains a space or other special characters, you must quote or escape it.
BBLMSupportsTextCompletion | Boolean | optional default false |
The name of this key is misleading: BBEdit will offer text completion regardless of whether BBLMSupportsTextCompletion is set. Instead, it controls how completions are offered.
If <false/> (the default), when the user triggers text completion, BBEdit will offer completions from symbols that are anywhere in the text, as well as the system spelling dictionary.
If <true/>, BBEdit will offer completions from the spelling dictionary within strings and comments. If the cursor is in code (not a string or a comment), BBEdit will offer completions only from ctags, clipping names, and symbols already in the code portions of the file.
In this context, “symbols” are any runs of text defined by the “Identifier and Keyword” character sets. Spelling-dictionary words will be offered only if the user has enabled the feature in the “Editing” preferences panel.
BBLMCanSpellCheckCodeRuns | Boolean | optional default false |
If <true/>, BBEdit will apply its spelling checker to any words that are neither keywords, nor within comments or strings.
If the language is a markup language like Markdown or LaTeX, and therefore mostly human-readable text, it would be convenient to set this key to <true/>. Programming languages would usually set BBLMCanSpellCheckCodeRuns to <false/>.
BBLMReferenceSearchURLTemplate | string | optional |
Users can use the Languages panel of the Preferences window to set custom preferences for a language, including a template for a URL that will return documentation for a selected symbol. Use this tag to specify a default template to be used for files claimed by this module.
Specify the whole URL, putting __SYMBOLNAME__ where the symbol to be looked up should go.
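For instance, the entry might look like this sketch; docs.example.com and its q parameter are hypothetical stand-ins for a real documentation site. If your URL contains an &, remember to escape it as &amp; or wrap the value in CDATA.

```xml
<!-- BBEdit replaces __SYMBOLNAME__ with the selected symbol -->
<key>BBLMReferenceSearchURLTemplate</key>
<string>https://docs.example.com/search?q=__SYMBOLNAME__</string>
```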
BBLMIsCaseSensitive | Boolean | optional default false |
This tag controls whether the module itself takes case into account when it matches strings.
If <true/>, keywords and pattern strings will match text only if they match case (“BBEdit” will match “BBEdit”, but not “bbedit”). If <false/>, text need not match string or pattern case (“BBEdit” will match “bbedit” as well as “BBEdit”).
BBLMScansFunctions | Boolean | optional default false |
If <true/>, BBEdit will attempt to extract function names from the text, and put them into the function navigation pop-up menu for quick access. Under Language Features, you must specify either Identifier and Keyword Characters or Identifier and Keyword Character Class; and either Prefix for Functions or Function Pattern.
If <false/> or absent, BBEdit won’t put a function pop-up in the navigation bar, even if you provide a function pattern or prefix.
These tags control syntax coloring for both types of codeless language module. They turn coloring on and off, and identify words the module is to color as keywords or predefined names.
BBLMColorsSyntax | Boolean | optional default false |
A Boolean element (either <true/> or <false/>) that determines whether BBEdit will color strings, comments, and keywords. If you omit this key, BBEdit will not do syntax coloring.
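For a C-family module, the top-level Boolean keys described so far might be set as in the sketch below. The values are illustrative choices, not defaults: C is case-sensitive, and turning on function scanning assumes the module also supplies the Language Features keys that BBLMScansFunctions needs.

```xml
<!-- Color strings, comments, and keywords (no coloring if omitted) -->
<key>BBLMColorsSyntax</key>
<true/>
<!-- Fill the function pop-up; needs identifier keys plus a function prefix or pattern -->
<key>BBLMScansFunctions</key>
<true/>
<!-- "while" is a C keyword; "While" is not -->
<key>BBLMIsCaseSensitive</key>
<true/>
```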
BBLMKeywordList | array of string | optional |
A list of <string> elements indicating words BBEdit should color as keywords. BBEdit defines a word as a complete, contiguous run of characters defined by Identifier and Keyword Characters or Identifier and Keyword Character Class under Language Features.
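A keyword list for a C-family language might begin like this; the selection is illustrative and deliberately incomplete, and a real module would list every keyword:

```xml
<key>BBLMKeywordList</key>
<array>
    <!-- Each entry must be a whole token, as defined by the identifier keys -->
    <string>break</string>
    <string>case</string>
    <string>else</string>
    <string>for</string>
    <string>if</string>
    <string>return</string>
    <string>switch</string>
    <string>while</string>
</array>
```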
BBLMPredefinedNameList | array of string | optional |
A list of <string> elements that identify words BBEdit should color as “predefined names” (as used in the Text Colors preferences). BBEdit identifies words the same way as it does for the BBLMKeywordList.
These patterns are tokens that begin and end comments in your language. Beginning with BBEdit 10.1, BBEdit will use them to respond to the Un/Comment Selection command in the Text menu.
You must specify either line-comment or block-comment tokens. If you specify both, Un/Comment Selection will choose between them by following the PreferLineCommentsWhenCommenting expert preference. If you specify only one, BBEdit will use that method regardless of the setting of PreferLineCommentsWhenCommenting.
The comment patterns under Language Features are still used to determine what text is to be colored as comments. Unlike the string-based comment delimiters under Language Features, which are ignored if you use regular-expression patterns, these Un/Comment tokens remain in effect.
BBLMCommentLineDefault | string | mandatory if prefix/suffix not specified |
If this tag is set, selecting Un/Comment Selection in the Text menu will prepend this string to each line that contains the selection. You’d set this to // in a C-family language, or # in most scripting languages.
You must set this tag if you do not provide BBLMCommentPrefixDefault and BBLMCommentSuffixDefault.
BBLMCommentPrefixDefault | string | mandatory if line not specified |
BBLMCommentSuffixDefault | string | mandatory if line not specified |
If these tags are set, selecting Un/Comment Selection in the Text menu will surround the selected lines with the prefix and suffix tokens. In C-family languages, these strings would be /* and */ respectively.
If either of these tags is set, the other must be set as well. You must provide this pair if BBLMCommentLineDefault is not set.
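For a C-family module, the Un/Comment tokens might be declared as below. This sketch assumes the keys sit at the top level of the module’s dictionary, alongside the other BBLM keys, rather than inside Language Features. Because both styles are given, Un/Comment Selection will choose between them according to PreferLineCommentsWhenCommenting.

```xml
<!-- Line style: prepended to each selected line -->
<key>BBLMCommentLineDefault</key>
<string>//</string>
<!-- Block style: the prefix and suffix must be supplied together -->
<key>BBLMCommentPrefixDefault</key>
<string>/*</string>
<key>BBLMCommentSuffixDefault</key>
<string>*/</string>
```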
Language Features | dictionary | mandatory |
The Language Features dictionary tells BBEdit how to identify features of your language such as strings, comments, and symbols. This section divides the features dictionary into two strategies: String-based, which identifies features by matching literal strings, and regular-expression-based, which identifies features by regular expressions. It is possible to mix the strategies, but in general, providing regular-expression patterns will cause string patterns to be ignored.
These keys are used regardless of your strategy. They tell BBEdit how to identify a run of characters as a single token in your language. A token is defined to be any contiguous run of the characters you specify for these keys.
One way BBEdit uses these character sets is to distinguish keywords from words that happen to contain them. Even if begin is in your BBLMKeywordList, BBEdit won’t highlight the first five letters in beginning, because it matches against character runs, and the whole run doesn’t match.
These character classes can’t fully identify legal tokens: If the set contains alphanumeric characters and underscore, BBEdit still won’t know about your language’s requirement that identifiers not begin with numerals. However, the character-class mechanism is good enough for the purpose of syntax coloring.
You must supply one of these tags or the other.
Identifier and Keyword Characters | string | mandatory if no Character Class |
This is a string containing every character your language permits in a token (for instance, a keyword, predefined symbol, or function name). For a C-family language, this would be the ten digits, the underscore, and the upper- and lowercase ASCII letters.
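Inside the Language Features dictionary, the C-family character set might be written out like this; the sketch assumes the order of characters within the string does not matter, since it describes a set:

```xml
<!-- Every character that may appear in a C identifier or keyword -->
<key>Identifier and Keyword Characters</key>
<string>0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</string>
```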
Identifier and Keyword Character Class | string | mandatory if no Character Set |
This string provides a more compact way to specify token characters. Simplifying slightly, BBEdit internally encloses this string in brackets ([]) for use in a PCRE regular expression, so C-family tokens could be expressed as a-zA-Z0-9_.
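The equivalent Character Class entry, again inside Language Features, is shorter. This sketch assumes BBEdit treats it as the PCRE class [a-zA-Z0-9_]:

```xml
<!-- No brackets here; BBEdit supplies them -->
<key>Identifier and Keyword Character Class</key>
<string>a-zA-Z0-9_</string>
```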
The easiest way to mark language elements is by providing literal strings to delimit them. This is an effective technique for most languages: Provide delimiters for common constructs like comments, strings, and statement blocks, and BBEdit will color, index, and fold code accurately enough to satisfy most requirements.
These strings identify runs of characters as either comments or strings. There are two kinds of comments: Block comments, which run from an opening token to a closing token, and line comments, which run from an opening token to the end of the current line.
BBEdit allows your language to have two kinds of strings. This follows the pattern of C-family languages that enclose character literals in single quotes, and character-array literals in double quotes. You specify them separately as strings 1 and 2.
If you specify any regular-expression patterns, all of these tags will be ignored.
Open Block Comments | string | optional |
Close Block Comments | string | mandatory if open specified |
These tokens delimit comments that can end before the end of a line, or that can span more than one line. Examples are /* … */ in C-family languages, or =begin … =end in Ruby. The Ruby example raises an important point: Ruby’s block-comment delimiters must appear at the left margin, a restriction that a simple token can’t express. If you need to enforce such a condition, use regular-expression patterns.
Open Line Comments | string | optional |
Line comments run from the opening token to the end of a line, and then stop. Examples are // in C-family languages, % in TeX, or # in most scripting languages.
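For a C-family language, the string-based comment tokens might be written as follows, inside Language Features:

```xml
<!-- Block comments: from /* through the next */ -->
<key>Open Block Comments</key>
<string>/*</string>
<key>Close Block Comments</key>
<string>*/</string>
<!-- Line comments: from // to the end of the line -->
<key>Open Line Comments</key>
<string>//</string>
```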
In versions of BBEdit before 10.1, and in later versions if you do not specify Un/Comment tokens, BBEdit will respond to the Un/Comment Selection command in the Text menu by adding or removing these comment tokens from the selected lines. If the PreferLineCommentsWhenCommenting expert preference is set to YES (as it is by default), Un/Comment Selection will use the Open Line Comments token to convert selected lines to comments; otherwise it will surround the selection with the block-comment tokens.
This feature has been superseded in BBEdit 10.1 and later by BBLMCommentPrefixDefault, BBLMCommentSuffixDefault, and BBLMCommentLineDefault. See the “Un/Comment Tokens” section for details.
Open Strings 1 | string | optional |
Open Strings 2 | string | optional |
Close Strings 1 | string | mandatory if open specified |
Close Strings 2 | string | mandatory if open specified |
These tokens delimit runs of characters that make up a string in your language. Many languages have more than one kind of string, and BBEdit allows you to specify separate delimiters — both opening and closing — for two string types.
Escape Char in Strings 1 | string (length 1) | optional |
Escape Char in Strings 2 | string (length 1) | optional |
Most languages let authors put string delimiters inside strings if they precede them with an “escape” character (typically a backslash). These keys specify the escape character for type-1 and type-2 strings.
BBEdit will ignore escaped delimiters, and will continue coloring a string until it encounters an unescaped closing delimiter for the current type of string.
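A sketch for C, following the single-quote and double-quote convention described earlier, with backslash as the escape character for both types (these keys also belong inside Language Features):

```xml
<!-- Type 1: character literals such as 'a' or '\'' -->
<key>Open Strings 1</key>
<string>'</string>
<key>Close Strings 1</key>
<string>'</string>
<key>Escape Char in Strings 1</key>
<string>\</string>
<!-- Type 2: string literals such as "say \"hi\"" -->
<key>Open Strings 2</key>
<string>"</string>
<key>Close Strings 2</key>
<string>"</string>
<key>Escape Char in Strings 2</key>
<string>\</string>
```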
End-of-line Ends Strings 1 | Boolean | optional default false |
End-of-line Ends Strings 2 | Boolean | optional default false |
If this key is <false/> for the given string type, BBEdit will continue coloring the string even if it contains a line break.
If this key is <true/>, BBEdit will stop coloring a string when it encounters a line break, even if it hasn’t found an end-of-string token.
However, if the end-of-line is preceded by the escape character for the given string type, BBEdit will continue coloring the string at least through the next line.
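Because C string and character literals cannot contain an unescaped line break, a C-family module might set both keys to <true/>, as in this sketch; a backslash before the line break still continues the string, as described above.

```xml
<!-- An unescaped line break ends an unterminated string of either type -->
<key>End-of-line Ends Strings 1</key>
<true/>
<key>End-of-line Ends Strings 2</key>
<true/>
```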
The simple-string keys in this section guide BBEdit in identifying functions for the function pop-up, and statement blocks for folding.
Prefix for Functions | string | optional |
Prefix for Procedures | string | optional |
Some languages begin subprogram definitions with keywords like FUNCTION and PROCEDURE in Pascal, or sub in Perl. These tokens signal BBEdit that it should treat the symbol that follows as the name of a subprogram, and index it in the function pop-up.
If these keys are absent, and you aren’t using regular-expression patterns for structural elements in your language, BBEdit will assume the language uses a C-style prefix — the return type of the function.
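For Pascal, the two prefix keys might be filled in as below, inside Language Features. This is only a sketch; since Pascal is case-insensitive, such a module would presumably leave BBLMIsCaseSensitive at its default of <false/> so that function and procedure are recognized as well.

```xml
<!-- The symbol after FUNCTION is indexed in the function pop-up -->
<key>Prefix for Functions</key>
<string>FUNCTION</string>
<!-- Likewise for the symbol after PROCEDURE -->
<key>Prefix for Procedures</key>
<string>PROCEDURE</string>
```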
Open Parameter Lists | string | optional |
Close Parameter Lists | string | mandatory if open specified |
In most languages, subprograms may optionally take parameter lists. You’d rarely specify anything other than open and close parentheses, but these keys allow you a full range of options.
Terminator for Prototypes 1 | string | optional |
Terminator for Prototypes 2 | string | optional |
Some languages allow for forward declarations — constructs that show how a subprogram is to be called, in advance of defining the body of the function. Examples are prototypes in C-family languages (which end with “;”), or FORWARD and EXTERNAL declarations in Pascal. If BBEdit encounters one of these tokens before it sees the beginning of a statement block, it does not index the declaration in the function pop-up.
Open Statement Blocks | string | optional |
Close Statement Blocks | string | mandatory if open specified |
Many languages group statements by enclosing them in block delimiters — { … } in C-family languages, or BEGIN … END in Pascal. These keys specify the opening and closing tokens for such blocks. BBEdit detects blocks (even nested ones) and makes it possible to fold them.
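The structural tokens for a C-family language might look like this sketch, placed inside Language Features. With these settings, a declaration such as int f(void); reaches the prototype terminator before any { appears, so it should not be indexed, while a definition followed by a { … } block is indexed and can be folded.

```xml
<key>Open Parameter Lists</key>
<string>(</string>
<key>Close Parameter Lists</key>
<string>)</string>
<!-- A ";" seen before the opening "{" marks a prototype -->
<key>Terminator for Prototypes 1</key>
<string>;</string>
<!-- Foldable statement blocks, including nested ones -->
<key>Open Statement Blocks</key>
<string>{</string>
<key>Close Statement Blocks</key>
<string>}</string>
```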
Unfortunately, some languages (such as Python or Ruby) don’t have simple paired delimiters. Single-token matching does not cover such situations, which is one reason BBEdit comes with coded language modules for these languages.
The string patterns are adequate for most purposes, and are easy to specify. However, if your language is more complex, or you want more control over how BBEdit detects elements, you can provide Perl-compatible regular expressions (PCREs) for them.
Because of the way BBEdit processes regular-expression patterns internally, if you need to make backreferences in a pattern, you must use named subpatterns instead of positional (numbered) backreferences.
As with the string patterns, these pattern expressions identify character runs for strings and comments. If you use either of these tags, the string and comment tokens listed as String-like Elements in the String-Based Patterns section will be ignored.
Comment Pattern | string (PCRE) | optional |
BBEdit will color any text in a file in your language that matches this pattern as a comment. There is only one pattern for all kinds of comments, so you will usually write it as a set of alternatives, one for each kind.
For C-family languages, alternatives for line comments and block comments are good enough. A third alternative can also catch blocks guarded by #if 0 … #endif directives, showing (roughly) how the regular-expression pattern expands your options.
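A sketch of such a pattern follows, written for the Language Features dictionary. It assumes PCRE extended mode (the (?x: …) wrapper, in which whitespace is ignored and # must be escaped) and uses [^\r\n]* rather than .* so that line comments stop at either line-break character. The #if 0 alternative is rough: a nested #if inside the guarded block will end the match early.

```xml
<key>Comment Pattern</key>
<string><![CDATA[
(?x:
    (?> // [^\r\n]* )                              |
    (?> /\* (?s: .*? ) \*/ )                       |
    (?> \#if \s+ 0 \b (?s: .*? ) \#endif \b )
)
]]></string>
```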
String Pattern | string (PCRE) | optional |
If text in a file matches this PCRE, BBEdit will color it as a string. This one pattern must match every type of string you want to color, but regular expressions are flexible enough to handle multiple cases. For instance, a single expression can match single- and double-quoted strings, as well as the “here documents” found in most scripting languages, while accounting for escaped delimiters and newlines.
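One possible String Pattern along those lines is sketched below, again assuming extended mode. The quoted-string alternatives let a backslash escape any character, including a line break. The here-document alternative is simplified: it recognizes only a bare identifier after << and requires the terminator at the very start of a line. The name heredoc_tag is an arbitrary, hypothetical choice; what matters is that the closing delimiter is matched with a named backreference, (?P=heredoc_tag), rather than a numbered one.

```xml
<key>String Pattern</key>
<string><![CDATA[
(?x:
    (?> " (?s: \\. | [^"\\] )* " )                 |
    (?> ' (?s: \\. | [^'\\] )* ' )                 |
    (?> << (?P<heredoc_tag> [A-Za-z_] [A-Za-z0-9_]* ) [^\r\n]* [\r\n]
        (?s: .*? ) (?<= [\r\n] ) (?P=heredoc_tag) \b )
)
]]></string>
```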
The regular-expression pattern strategy identifies functions by the Function Pattern expression, supplemented by the Skip Pattern. If you use either of these tags, all of the tokens listed in the String-Based Patterns section above (string, comment, and structural) will be ignored.
Function Pattern | string (PCRE) | optional |
This regular-expression pattern identifies the whole body of a function in your language, and its name. BBEdit will offer to fold the function body, and will suspend its search for additional functions until after the body of the current function. It uses the name of the function to index it in the function pop-up of the navigation bar.
The function pattern should include two named subpatterns, function for the span of the complete function, and function_name to pinpoint the name. If you don’t designate a subpattern named "function", the whole Function Pattern will be used to identify functions. If you omit the function_name subpattern, BBEdit will not include the function in the function pop-up.
A simplified pattern is often enough to recognize a method declaration in Ruby.
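The sketch below assumes each Ruby method is closed by an end keyword indented exactly like its def line. It captures that indentation in a named subpattern (leadspace, a hypothetical name) so the closing line can be matched with (?P=leadspace). The inline (?m: ^ ) asks PCRE to treat ^ as a start-of-line anchor regardless of BBEdit’s default options. One-line definitions such as def foo; end, or inconsistently indented code, will confuse it.

```xml
<key>Function Pattern</key>
<string><![CDATA[
(?x:
    (?P<leadspace> (?m: ^ ) [ \t]* )
    (?P<function>
        def \s+
        (?P<function_name> (?: self \. )? [A-Za-z_] [A-Za-z0-9_]* [?!=]? )
        (?s: .*? )
        [\r\n] (?P=leadspace) end \b
    )
)
]]></string>
```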
As with all CLM regular-expression patterns, backreferences to subpatterns must be named — (?P=…) — and not positional (\1, \2, etc.).
BBLMScansFunctions must be <true/> for the function pattern to be used. Because function scanning is turned off while a function pattern matches, embedded functions will not be indexed.
Skip Pattern | string (PCRE) | optional |
When searching for function definitions, BBEdit advances through the text, character-by-character, until it detects a possible match. It then tests the ensuing character run. If the whole run matches the Function Pattern, it is indexed and folded as a function. If not, BBEdit advances to the next character and attempts to match a function from there.
This strategy is not perfect, because if a string or comment includes an apparent function definition, BBEdit will index and fold that, as well.
The Skip Pattern overrides this behavior. If the Function Pattern does not match, BBEdit attempts to match the Skip Pattern. If that succeeds, it will suspend function matching until the skip text ends — saving BBEdit from trying to reapply the function pattern character-by-character.
If you supply a String Pattern, the Skip Pattern can skip strings by including a (?P>string) subpattern; similarly, (?P>comment) will skip comments if you provided a Comment Pattern. A skip pattern that simply protects strings and comments needs nothing more than those two subpattern calls, as alternatives.
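Assuming both a String Pattern and a Comment Pattern are defined, the skip pattern can be as small as this sketch:

```xml
<key>Skip Pattern</key>
<string><![CDATA[
(?x:
    (?P>comment) |
    (?P>string)
)
]]></string>
```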
To sum up the regular-expression approach to comments: add one Comment Pattern to identify all types of comments. If you provide either String Pattern or Comment Pattern, all string-based patterns for strings and comments will be ignored.
Here are some ideas to make it easier to build a codeless language module:
… where you substitute the path to the CLM property list you want to check. The easiest way to get the path into Terminal is to drag the file from the Finder into the Terminal window. Or, you can select Copy Full Path from the File Path: popup in the toolbar of a BBEdit window, and paste into a Terminal window using the Paste Escaped Text (control-command-V) command.
Press return, and plutil will give you an OK, or guidance if it finds an error.
As you grow your module, it’s likely that you’ll run into errors that lead BBEdit to reject the file. If you have a file in your language open, the code won’t be colored, and its language will be set to “(none).”
If you repair the plist, and relaunch BBEdit, your test file will still be un-colored, because it lost its language setting. Before you start looking for another error, check the language pop-up. If the CLM is still wrong, your language won’t appear; if it does, you can set it.
Property list files are XML, which reserves the <, >, and & characters exclusively for controlling the structure of the file. If you want to use those characters in-line, you must escape them as XML entities: &lt;, &gt;, and &amp;.
This reference shows an alternative: Wrap the pattern in a <![CDATA[ … ]]> block, and you don't have to escape the special characters.
For readability, the CDATA blocks in this appendix put newlines and spaces before and after patterns. Ordinarily, this would mean that those patterns wouldn't match unless exactly that whitespace surrounded the target text. However, in order to make regular expressions easier to write, BBEdit strips leading and trailing whitespace from regular expressions wrapped as CDATA.
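To compare the two approaches, here is a hypothetical Comment Pattern for HTML-style comments written both ways. The two entries are alternatives; a real module would use only one, since a dictionary should not repeat a key.

```xml
<!-- Alternative 1: special characters escaped as XML entities -->
<key>Comment Pattern</key>
<string>&lt;!--(?s:.*?)--&gt;</string>

<!-- Alternative 2: the same pattern in CDATA; surrounding whitespace is stripped -->
<key>Comment Pattern</key>
<string><![CDATA[
    <!--(?s:.*?)-->
]]></string>
```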
The revised, corrected, and expanded Codeless Language Module reference was written by Fritz Anderson, with technical review by Ryan Wilcox and Rich Siegel.