Bare Bones Software | Codeless Language Module Reference

Codeless Language Modules

Language modules let you extend BBEdit’s syntax coloring, block folding, and file indexing by adding a specification for each language you want to support. Native-code modules use specialized, hand-coded parsers to provide those services in a highly sophisticated, context-sensitive form. (If this interests you, you can find the language module SDK at https://www.barebones.com/support/develop.)

Codeless language modules allow you to go almost as far using less-demanding tools.

Note: Although this document refers exclusively to writing codeless language modules for BBEdit, the information here also applies to contemporary TextWrangler versions, so your codeless language module will be compatible with both products (assuming BBEdit 10.1 or later and TextWrangler 4.0 or later). For recent changes to codeless language modules that apply only to BBEdit 11, please see this tech note.

Types of Codeless Language Modules

Codeless language modules take the form of property list files (plists), a simple XML document type used throughout Mac OS X. The format provides a set of primitive types, like strings and numbers, plus structural elements like dictionaries and arrays. You can get an overview by typing man 5 plist in a Terminal window. Browsing to Apple’s Mac Dev Center, http://developer.apple.com/mac (free) and searching for “Property List Programming Guide” will give you the whole story. Codeless language modules are dictionaries (<dict>) matching values to key strings that identify them.

Apple’s Xcode development environment provides a graphical editor for plists, and editor applications are available on the Internet, but this reference assumes you’ll write modules using BBEdit itself. BBEdit comes with a generic clipping set for property lists, which it selects automatically when you edit a file with the .plist extension.

BBEdit gives you a choice of two strategies for constructing codeless language modules. The string-based strategy requires only that you provide literal string tokens for strings, comments, and keywords. The regular-expression-based strategy gives you finer control over how BBEdit colors and indexes your files, but it requires comfort with large regular expressions.

Once you’ve completed your language module, place it in the Language Modules subfolder of BBEdit's application support folder, and then quit and relaunch BBEdit in order to load it.

Note: By default, BBEdit's application support folder is located at /Users/USERNAME/Library/Application Support/BBEdit/. If you use Dropbox, you can optionally move this folder into your Dropbox data folder, as detailed in Chapter 2 of the PDF manual.

Keys for Codeless Language Modules

As dictionary property lists, codeless language modules are organized as a series of defined keys, paired with string, Boolean, array, and dictionary values. A few are mandatory; most are optional, or are alternatives to others, depending on your approach to the module. This section will list all of the keys and their functions.

Keys for All Modules

These keys apply to both string-based and regular-expression-based codeless language modules. They identify the plist to BBEdit as a language module, and tell it how to identify and present the language.

Identification

BBEditDocumentType

string

mandatory

This string must be CodelessLanguageModule. If it is not present, BBEdit won’t recognize the plist as a language module, and will ignore the file.

BBLMLanguageDisplayName

string

mandatory

This is the name BBEdit will display in popup menus and preferences for this language. The name should be enough to identify the language, but not so long that it doesn’t fit. Examples are “C,” “HTML,” and “Perforce Specification.”

BBLMLanguageCode

string

mandatory

A four-character ASCII string that uniquely identifies the language this module describes. It must not be the same as the code for any other CLM, nor for languages built into BBEdit.

Unfortunately, your first clue that you’ve picked a conflicting code may be that your text gets colored and indexed as though it were in some other language. If that happens, this is the first issue you should look for.

File Mapping

Codeless language modules have two complementary ways of claiming files. BBLMSuffixMap lists filename suffixes that, if they match a file’s name, claim the file for that module. BBLMFileNamesToMatch lets you provide a set of whole-name patterns that will identify files as belonging to your language. Both are optional, but if you don’t provide either, your user will have to choose your language from the language popup whenever she wants to use it.

BBEdit assigns a language module to a file, based on the file’s name, in three steps:

1. If the file has a suffix that the user has assigned to a language with a Custom Extension Mapping in the Languages preference panel, that module is used.
2. If the name ends with a string in a module’s BBLMSuffixMap list, the module claims the file.
3. If the name matches the pattern in a module’s BBLMFileNamesToMatch list, that module claims the file.

Each stage is run against all modules before the next is attempted. The process stops at the first match. The order of modules in the matching process is not guaranteed; if more than one module claims a file pattern, the winner is not predictable.

A third key, BBLMPreferredFilenameExtension, gives BBEdit a file-name extension to suggest when the user first saves a file.

BBLMSuffixMap

array of dictionaries

optional

The suffix map is an array of dictionaries, one for each type of file that this language can use. Each dictionary claims a file extension for the language.

The mapping dictionaries use only one key:

BBLMLanguageSuffix

string

mandatory
in suffix map dictionaries

This string is the suffix (not the extension — you must include the dot to specify an extension) that identifies files of this specific type. Examples include .h, .plist, or Makefile.

Note: If you are familiar with earlier versions of BBEdit, you'll remember that suffix-map dictionaries included BBLMIsSourceKind and BBLMIsHeaderKind to designate “source” and “header” counterparts. Recent versions of BBEdit ignore these tags; counterpart matching is now done by a more sophisticated examination of file names.

Here’s an example of a suffix map for the DTrace scripting language, in which .d files are source, and .h files are headers:

<key>BBLMSuffixMap</key>
<array>
    <dict>
        <key>BBLMLanguageSuffix</key>
        <string>.d</string>
    </dict>
    <dict>
        <key>BBLMLanguageSuffix</key>
        <string>.h</string>
    </dict>
</array>

In the absence of a BBLMPreferredFilenameExtension key, BBEdit will suggest the first suffix in the map as the filename extension when the user first saves (or saves-as) a file.

BBLMFileNamesToMatch

array of string

optional

Your language uses this list of string patterns to claim files on a whole-name basis. If a file’s name matches one of these patterns, BBEdit assigns your module to the file. The patterns use the same “glob” syntax you may have used in command-line shells.

This is useful if your module handles a class of files with no common elements to their names. BBEdit’s installed modules use this pattern to claim Rakefile for Ruby, and README, COPYING, and INSTALLING for “Software Package Notes.”

Mac OS X’s HFS+ file system is (by default) case-insensitive, and so are these patterns. If the pattern list contains *.c, for instance, the pattern will capture both foo.c and foo.C.
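
As a rough sketch of the whole-name matching described above, a module's list might pair an exact file name with a glob pattern. Both names here (Buildfile and build.*) are hypothetical and chosen only to illustrate the syntax; they are not taken from any shipping module:

```xml
<key>BBLMFileNamesToMatch</key>
<array>
    <!-- An exact file name with no extension (hypothetical) -->
    <string>Buildfile</string>
    <!-- A glob pattern: any name that starts with "build." (hypothetical) -->
    <string>build.*</string>
</array>
```

Because the patterns are case-insensitive, the first entry would presumably also claim a file named buildfile.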

BBLMPreferredFilenameExtension

string

optional

When the user chooses Save As… (or Save for the first time) on a file, and has designated a language with the editor window’s language popup, BBEdit will suggest this string as the file name extension. Unlike the suffix-mapping table, the preferred extension should not begin with a dot. For a DTrace source file, this entry might be

<key>BBLMPreferredFilenameExtension</key> <string>d</string>

If BBLMPreferredFilenameExtension is not present in the CLM, BBEdit will suggest the first extension in the module’s suffix map.

Note: You can also set the preferred extension through an expert preference. The form is
defaults write com.barebones.bbedit \
    PreferredFilenameExtension_LanguageName \
    -string "abc"

… where LanguageName is the name of your language, as specified by the BBLMLanguageDisplayName tag. The preference name is like any other parameter to a UNIX shell command — if it contains a space or other special characters, you must quote or escape it.

Services

BBLMSupportsTextCompletion

Boolean

optional
default false

The name of this key is misleading: BBEdit will offer text completion regardless of whether BBLMSupportsTextCompletion is set. Instead, it controls how completions are offered.

If <false/> (the default), when the user triggers text completion, BBEdit will offer completions from symbols that are anywhere in the text, as well as the system spelling dictionary.

If <true/>, BBEdit will offer completions from the spelling dictionary within strings and comments. If the cursor is in code (not a string or a comment), BBEdit will offer completions only from ctags, clipping names, and symbols already in the code portions of the file.

In this context, “symbols” are any runs of text defined by the “Identifier and Keyword” character sets. Spelling-dictionary words will be offered only if the user has enabled the feature in the “Editing” preferences panel.

BBLMCanSpellCheckCodeRuns

Boolean

optional
default false

If <true/>, BBEdit will apply its spelling checker to any words that are neither keywords, nor within comments or strings.

If the language is a markup language like Markdown or LaTeX, and therefore mostly human-readable text, it would be convenient to set this key to <true/>. Programming languages would usually set BBLMCanSpellCheckCodeRuns to <false/>.

BBLMReferenceSearchURLTemplate

string

optional

Users can use the Languages panel of the Preferences window to set custom preferences for a language, including a template for a URL that will return documentation for a selected symbol. Use this tag to specify a default template to be used for files claimed by this module.

Specify the whole URL, putting __SYMBOLNAME__ where the symbol to be looked up should go. Example:

http://www.example.com/foobar.cgi?__SYMBOLNAME__
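
Inside a module's property list, the template above would plausibly be written as a key/string pair like the following. This is a sketch only: www.example.com and foobar.cgi are the placeholder names from the example, not a real lookup service.

```xml
<key>BBLMReferenceSearchURLTemplate</key>
<!-- __SYMBOLNAME__ marks where the selected symbol should go -->
<string>http://www.example.com/foobar.cgi?__SYMBOLNAME__</string>
```

Because the plist is XML, a template whose query string contains an ampersand has to spell it as &amp;amp; inside the <string> element.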

BBLMIsCaseSensitive

Boolean

optional
default false

This tag controls whether the module itself takes case into account when it matches strings.

If <true/>, keywords and pattern strings will match text only if they match case (“BBEdit” will match “BBEdit”, but not “bbedit”). If <false/>, text need not match string or pattern case (“BBEdit” will match “bbedit” as well as “BBEdit”).

BBLMScansFunctions

Boolean

optional
default false

If <true/>, BBEdit will attempt to extract function names from the text, and put them into the function navigation pop-up menu for quick access. Under Language Features, you must specify either Identifier and Keyword Characters or Identifier and Keyword Character Class; and either Prefix for Functions or Function Pattern.

If <false/> or absent, BBEdit won’t put a function pop-up in the navigation bar, even if you provide a function pattern or prefix.
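
To make the service keys concrete, here is a minimal sketch of how they might look together for a case-sensitive programming language. It assumes these keys sit at the top level of the module's dictionary, as BBLMColorsSyntax does in the examples later in this document; the particular combination of values is illustrative, not a recommendation:

```xml
<!-- In code, complete only from ctags, clippings, and code symbols -->
<key>BBLMSupportsTextCompletion</key>
<true/>

<!-- A programming language: do not spell-check code runs -->
<key>BBLMCanSpellCheckCodeRuns</key>
<false/>

<!-- Keywords and patterns must match case exactly -->
<key>BBLMIsCaseSensitive</key>
<true/>

<!-- Populate the function pop-up; needs a function prefix or pattern -->
<key>BBLMScansFunctions</key>
<true/>
```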

Controlling Syntax Coloring

These tags control syntax coloring for both types of codeless language module. They turn coloring on and off, and identify words the module is to color as keywords or predefined names.

BBLMColorsSyntax

Boolean

optional
default false

A Boolean element (either <true/> or <false/>) that determines whether BBEdit will color strings, comments, and keywords. If you omit this key, BBEdit will not do syntax coloring.

BBLMKeywordList

array of string

optional

A list of <string> elements indicating words BBEdit should color as keywords. BBEdit defines a word as a complete, contiguous run of characters defined by Identifier and Keyword Characters or Identifier and Keyword Character Class under Language Features.

Example:

<key>BBLMKeywordList</key>
<array>
    <string>auto</string>
    <string>goto</string>
    <string>sizeof</string>
    <string>break</string>
    . . .
</array>

BBLMPredefinedNameList

array of string

optional

A list of <string> elements that identify words BBEdit should color as “predefined names” (as used in the Text Colors preferences). BBEdit identifies words the same way as it does for the BBLMKeywordList.

Note: There is no need to sort the keyword or predefined-name lists.

Note: BBEdit will include your keywords and predefined names in code-completion lists.
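
A predefined-name list has the same shape as a keyword list. The sketch below assumes a Ruby-like language; the three names are hypothetical examples of built-in identifiers a module author might choose to color this way:

```xml
<key>BBLMPredefinedNameList</key>
<array>
    <!-- Hypothetical built-in names; order does not matter -->
    <string>puts</string>
    <string>print</string>
    <string>require</string>
</array>
```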

Un/Comment Tokens

These patterns are tokens that begin and end comments in your language. Beginning with BBEdit 10.1, BBEdit will use them to respond to the Un/Comment Selection command in the Text menu.

You must specify either line-comment or block-comment tokens. If you specify both, Un/Comment Selection will choose by following the PreferLineCommentsWhenCommenting expert preference. If you specify one, BBEdit will use that method regardless of the setting of PreferLineCommentsWhenCommenting.

The comment patterns under Language Features are still used to determine what text is to be colored as comments. Unlike the comment-delimiter strings, the Un/Comment patterns remain in effect if you use regular-expression patterns.

BBLMCommentLineDefault

string

mandatory
if prefix/suffix not specified

If this tag is set, selecting Un/Comment Selection in the Text menu will prepend this string to each line that contains the selection. You’d set this to // in a C-family language, or # in most scripting languages.

Note: See the beginning of this section for how this setting interacts with the PreferLineCommentsWhenCommenting expert preference.

You must set this tag if you do not provide BBLMCommentPrefixDefault and BBLMCommentSuffixDefault.

BBLMCommentPrefixDefault

string

mandatory
if line not specified

BBLMCommentSuffixDefault

string

mandatory
if line not specified

If these tags are set, selecting Un/Comment Selection in the Text menu will surround the selected lines with the prefix and suffix tokens. In C-family languages, these strings would be /* and */ respectively.

Note: See the beginning of this section for how these tags interact with the PreferLineCommentsWhenCommenting expert preference.

If either of these tags is set, the other must be set as well. You must provide this pair if BBLMCommentLineDefault is not set.
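
For a C-family language, the three Un/Comment keys described above might be supplied together as follows. This is a minimal sketch that uses the tokens named in the text; with both styles present, the PreferLineCommentsWhenCommenting expert preference decides which one Un/Comment Selection applies:

```xml
<!-- Line-comment token, prepended to each selected line -->
<key>BBLMCommentLineDefault</key>
<string>//</string>

<!-- Block-comment tokens; if one is set, the other must be too -->
<key>BBLMCommentPrefixDefault</key>
<string>/*</string>
<key>BBLMCommentSuffixDefault</key>
<string>*/</string>
```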

Language Features

dictionary

mandatory

The Language Features dictionary tells BBEdit how to identify features of your language such as strings, comments, and symbols. This section divides the features dictionary into two strategies: String-based, which identifies features by matching literal strings, and regular-expression-based, which identifies features by regular expressions. It is possible to mix the strategies, but in general, providing regular-expression patterns will cause string patterns to be ignored.

Identifying keyword strings

These keys are used regardless of your strategy. They tell BBEdit how to identify a run of characters as a single token in your language. A token is defined to be any continuous run of characters you specify for these keys.

One way BBEdit uses these character sets is to distinguish keywords from words that happen to contain them. Even if begin is in your BBLMKeywordList, BBEdit won’t highlight the first five letters in beginning, because it matches against character runs, and the whole run doesn’t match.

These character classes can’t fully identify legal tokens: If the set contains alphanumeric characters and underscore, BBEdit still won’t know about your language’s requirement that identifiers not begin with numerals. However, the character-class mechanism is good enough for the purpose of syntax coloring.

You must supply one of these tags or the other.

Identifier and Keyword Characters

string

mandatory
if no Character Class

This is a string containing every character your language permits in a token (for instance, a keyword, predefined symbol, or function name). For a C-family language, this would be

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_

Identifier and Keyword Character Class

string

mandatory
if no Character Set

This string provides a more compact way to specify token characters. To simplify: BBEdit internally encloses this string in brackets ([]) for use in a PCRE regular expression. C-family tokens could be expressed as

A-Za-z0-9_

You can include any UTF-16 character, or the same escaped characters BBEdit accepts in the Find window:

Standard character escapes like \t for tabs, or \\ for backslashes.
Characters specified in hexadecimal with the \xNN notation.
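
The two ways of describing token characters can be compared side by side. The sketch below shows the entries as they might appear inside the Language Features dictionary for a C-family language; a real module would supply one of the two alternatives, not both:

```xml
<!-- Alternative 1: list every permitted character explicitly -->
<key>Identifier and Keyword Characters</key>
<string>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_</string>

<!-- Alternative 2: the same set written as character-class ranges -->
<key>Identifier and Keyword Character Class</key>
<string>A-Za-z0-9_</string>
```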

String-Based Patterns

The easiest way to mark language elements is by providing literal strings to delimit them. This is an effective technique for most languages: Provide delimiters for common constructs like comments, strings, and statement blocks, and BBEdit will color, index, and fold code accurately enough to satisfy most requirements.

String-like Elements

These strings identify runs of characters as either comments or strings. There are two kinds of comments: Block comments, which run from an opening token to a closing token, and line comments, which run from an opening token to the end of the current line.

BBEdit allows your language to have two kinds of strings. This follows the pattern of C-family languages that enclose character literals in single quotes, and character-array literals in double quotes. You specify them separately as strings 1 and 2.

If you specify any regular-expression patterns, all of these tags will be ignored.

Open Block Comments

string

optional

Close Block Comments

string

mandatory
if open specified

These tokens delimit comments that can end before the end of a line, or that can span more than one line. Examples are /* … */ in C-family languages, or =begin … =end in Ruby. The Ruby example raises an important point: Ruby’s block-comment delimiters must appear at the left margin, a restriction that a simple token can’t express. If you need to enforce such a condition, use regular-expression patterns.

Open Line Comments

string

optional

Line comments run from the opening token to the end of a line, and then stop. Examples are // in C-family languages, % in TeX, or # in most scripting languages.

In versions of BBEdit before 10.1, and in later versions if you do not specify Un/Comment tokens, BBEdit will respond to the Un/Comment Selection command in the Text menu by adding or removing these comment tokens from the selected lines. If the PreferLineCommentsWhenCommenting expert preference is set to YES (as it is by default), Un/Comment Selection will use the Open Line Comments token to convert selected lines to comments; otherwise it will surround the selection with the block-comment tokens.

This feature has been superseded in BBEdit 10.1 and later by BBLMCommentPrefixDefault, BBLMCommentSuffixDefault, and BBLMCommentLineDefault. See the “Un/Comment Tokens” section for details.

Open Strings 1

string

optional

Open Strings 2

string

optional

Close Strings 1

string

mandatory
if open specified

Close Strings 2

string

mandatory
if open specified

These tokens delimit runs of characters that make up a string in your language. Many languages have more than one kind of string, and BBEdit allows you to specify separate delimiters — both opening and closing — for two string types.

Escape Char in Strings 1

string (length 1)

optional

Escape Char in Strings 2

string (length 1)

optional

Most languages let authors put string delimiters inside strings if they precede them with an “escape” character (typically a backslash). These keys specify the escape character for type-1 and type-2 strings.

BBEdit will ignore escaped delimiters, and will continue coloring a string until it encounters an unescaped closing delimiter for the current type of string.

End-of-line Ends Strings 1

Boolean

optional
default false

End-of-line Ends Strings 2

Boolean

optional
default false

If this key is <false/> for the given string type, BBEdit will continue coloring the string even if it contains a line break:

"This string is colored,
even though it is on
more than one line."

If this key is <true/>, BBEdit will stop coloring a string when it encounters a line break, even if it hasn’t found an end-of-string token:

"This part of the string is colored,
but this part isn’t."

However, if the end-of-line is preceded by the escape character for the given string type, BBEdit will continue coloring the string at least through the next line:

"Because of this backslash -> \
this whole string is colored."

Note: Take care with the capitalization of these keys: The “l” in “line” is lower-case.

Structural Elements

The simple-string keys in this section guide BBEdit in identifying functions for the function pop-up, and statement blocks for folding.

Prefix for Functions

string

optional

Prefix for Procedures

string

optional

Some languages begin subprogram definitions with keywords like FUNCTION and PROCEDURE in Pascal, or sub in Perl. These tokens signal BBEdit that it should treat the symbol that follows as the name of a subprogram, and index it in the function pop-up.

If these keys are absent, and you aren’t using regular-expression patterns for structural elements in your language, BBEdit will assume the language uses a C-style prefix — the return type of the function.

Open Parameter Lists

string

optional

Close Parameter Lists

string

mandatory
if open specified

In most languages, subprograms may optionally take parameter lists. You’d rarely specify anything other than open and close parentheses, but these keys allow you a full range of options.

Terminator for Prototypes 1

string

optional

Terminator for Prototypes 2

string

optional

Some languages allow for forward declarations — constructs that show how a subprogram is to be called, in advance of defining the body of the function. Examples are prototypes in C-family languages (which end with “;”), or FORWARD and EXTERNAL declarations in Pascal. If BBEdit encounters one of these tokens before it sees the beginning of a statement block, it does not index the declaration in the function pop-up.

Open Statement Blocks

string

optional

Close Statement Blocks

string

mandatory
if open specified

Many languages group statements by enclosing them in block delimiters — { … } in C-family languages, or BEGIN … END in Pascal. These keys specify the opening and closing tokens for such blocks. BBEdit detects blocks (even nested ones) and makes it possible to fold them.

Unfortunately, some languages (such as Python or Ruby) don’t have simple paired delimiters. Single-token matching does not cover such situations, which is one reason BBEdit comes with coded language modules for these languages.
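
Pulling the structural keys together, a Language Features dictionary for a Pascal-like language might look like the sketch below. It is untested and rests on assumptions: the tokens are taken from the Pascal examples mentioned in the text, and using FORWARD and EXTERNAL as the two prototype terminators is a guess at how those declarations would be expressed.

```xml
<key>Language Features</key>
<dict>
    <key>Identifier and Keyword Character Class</key>
    <string>A-Za-z0-9_</string>

    <!-- Keywords that introduce a subprogram name -->
    <key>Prefix for Functions</key>
    <string>FUNCTION</string>
    <key>Prefix for Procedures</key>
    <string>PROCEDURE</string>

    <key>Open Parameter Lists</key>
    <string>(</string>
    <key>Close Parameter Lists</key>
    <string>)</string>

    <!-- Assumed: tokens that mark a forward declaration, not a definition -->
    <key>Terminator for Prototypes 1</key>
    <string>FORWARD</string>
    <key>Terminator for Prototypes 2</key>
    <string>EXTERNAL</string>

    <!-- Foldable statement blocks -->
    <key>Open Statement Blocks</key>
    <string>BEGIN</string>
    <key>Close Statement Blocks</key>
    <string>END</string>
</dict>
```

Since BBLMIsCaseSensitive defaults to false, these upper-case tokens would presumably also match lower-case begin and end unless the module turns case sensitivity on.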

Regular-Expression Patterns

The string patterns are adequate for most purposes, and are easy to specify. However, if your language is more complex, or you want more control over how BBEdit detects elements, you can provide Perl-compatible regular expressions (PCREs) for them.

Because of the way BBEdit processes regular-expression patterns internally, if you need to make backreferences in a pattern, you must use named subpatterns instead of positional (numbered) backreferences.

Note: Regular-expression syntax often conflicts with that of XML, the format for property lists. Using XML entities like &lt; can make patterns very difficult to read and edit. The examples in this reference show how to break regular expressions out of the XML syntax using CDATA blocks. See the Tips section for details.
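
To see why CDATA helps, compare two ways of writing the same here-document fragment of a string pattern. This is a sketch for illustration; a real module would contain only one String Pattern entry. The pattern is written without surrounding whitespace so that both forms contain exactly the same characters:

```xml
<!-- Form 1: every "<" must be written as an entity, which obscures the pattern -->
<key>String Pattern</key>
<string>(?s:&lt;&lt;-?(?P&lt;delim&gt;\w+).*?(?P=delim))</string>

<!-- Form 2: inside CDATA the pattern appears exactly as you would type it -->
<key>String Pattern</key>
<string><![CDATA[(?s:<<-?(?P<delim>\w+).*?(?P=delim))]]></string>
```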

String-like Elements

As with the string patterns, these pattern expressions identify character runs for strings and comments. If you use either of these tags, the string and comment tokens listed as String-like Elements in the String-Based Patterns section will be ignored.

Comment Pattern

string (PCRE)

optional

BBEdit will color any text in a file in your language that matches this pattern as a comment. There is only one pattern for all kinds of comments, so you will usually write an alternative pattern to cover each:

(?x:
    (//.*$) |                  (?# Line comment)
    (?s:/\*.*?\*/) |           (?# Block comment)
    (?s:\#if\s+0\b             (?# #if-out block)
        .+?
        (\#else|\#endif))
)

is good enough for line comments and block comments in C-family languages. It also catches blocks guarded by #if 0 … #endif directives, showing (roughly) how the regular-expression pattern expands your options.

Note: See “Perl-Style Pattern Extensions” in the "Searching with Grep" chapter of the User Manual (or in the online help) for an explanation of the ?x:, ?s: and ?# pattern modifiers.

String Pattern

string (PCRE)

optional

If text in a file matches this PCRE, BBEdit will color it as a string. This one pattern must match every type of string you want to color, but regular expressions are flexible enough to handle multiple cases. For instance, this expression matches single- and double-quoted strings, as well as the “here documents” found in most scripting languages. It accounts for escaped delimiters and newlines:

(?x:
    ("(\\"|[^"\r]|\\\r)*") |   (?# Double-quote)
    ('(\\'|[^'\r]|\\\r)*') |   (?# Single-quote)
    (?s:                       (?# Here-document)
        <<-?(?P<delim>\w+)
        .*?
        (?P=delim)
    )
)

Structural Elements

The regular-expression pattern strategy identifies functions by the Function Pattern expression, supplemented by the Skip Pattern. If you use either of these tags, all of the tokens listed in the String-Based Patterns section above (string, comment, and structural) will be ignored.

Function Pattern

string (PCRE)

optional

This regular-expression pattern identifies the whole body of a function in your language, and its name. BBEdit will offer to fold the function body, and will suspend its search for additional functions until after the body of the current function. It uses the name of the function to index it in the function pop-up of the navigation bar.

The function pattern should include two named subpatterns, function for the span of the complete function, and function_name to pinpoint the name. If you don’t designate a subpattern named "function", the whole Function Pattern will be used to identify functions. If you omit the function_name subpattern, BBEdit will not include the function in the function pop-up.

The following is a simplified pattern that recognizes a method declaration in Ruby:

(?x:
    (?P<leadspace>^\s*)
    (?P<function>
        def\s+
        (?P<function_name> [a-zA-Z0-9_!?]+ )
        (?s:.+?)
        \r(?P=leadspace)end\b
    )
)

The pattern captures a run of whitespace characters at the beginning of a line, followed by the keyword def, which begins the function, and thus the named function pattern.
Next comes a run of symbol characters, which is captured as the function_name.
The function body is a run of any characters (including newlines, as specified by the ?s: pattern modifier)…
… that ends with a newline, the same run of whitespace that came before the def keyword (captured in the named backreference leadspace), and the keyword end.

Note: The pattern is simplified by the forlorn assumption that the end keyword will be indented by exactly the same run of whitespace characters as came before the def.

As with all CLM regular-expression patterns, backreferences to subpatterns must be named — (?P=…) — and not positional (\1, \2, etc.).

BBLMScansFunctions must be <true/> for the function pattern to be used. Because function scanning is turned off while a function pattern matches, embedded functions will not be indexed.

Skip Pattern

string (PCRE)

optional

When searching for function definitions, BBEdit advances through the text, character-by-character, until it detects a possible match. It then tests the ensuing character run. If the whole run matches the Function Pattern, it is indexed and folded as a function. If not, BBEdit advances to the next character and attempts to match a function from there.

This strategy is not perfect, because if a string or comment includes an apparent function definition, BBEdit will index and fold that, as well.

The Skip Pattern overrides this behavior. If the Function Pattern does not match, BBEdit attempts to match the Skip Pattern. If that succeeds, it will suspend function matching until the skip text ends — saving BBEdit from trying to reapply the function pattern character-by-character.

If you supply a String Pattern, the Skip Pattern can skip strings by including a (?P>string) subpattern; similarly, (?P>comment) will skip comments if you provided a Comment Pattern. A skip pattern that simply protects strings and comments would be:

(?x: (?P>string) | (?P>comment) )

Examples

Here are a few examples for “My Language,” a severe subset of the Ruby scripting language.

The Bare Minimum

The first example shows the smallest codeless language module that will color keywords in a My Language file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">

<dict>

    <key>BBEditDocumentType</key>
    <string>CodelessLanguageModule</string>

    <key>BBLMLanguageDisplayName</key>
    <string>My Language</string>
    <key>BBLMLanguageCode</key>
    <string>MyL!</string>

    <key>BBLMColorsSyntax</key>
    <true/>

    <key>BBLMKeywordList</key>
    <array>
        <string>class</string>
        <string>def</string>
        <string>if</string>
        <string>elsif</string>
        <string>else</string>
        <string>end</string>
        <string>do</string>
        <string>for</string>
        <string>return</string>
    </array>

    <key>BBLMCommentLineDefault</key>
    <string>#</string>

    <key>Language Features</key>
    <dict>
        <key>Identifier and Keyword Character Class</key>
        <string>A-Za-z0-9_\?!</string>
    </dict>
</dict>
</plist>

There are a few things to notice here:

If you don’t want the module to do anything other than take up space in the language pop-up, just specify BBEditDocumentType, BBLMLanguageDisplayName, and BBLMLanguageCode. The example above goes a step further: it is the minimal useful language module.
For syntax coloring, you must turn on BBLMColorsSyntax.
For something to color, you must provide a list of keywords in BBLMKeywordList.
BBEdit requires that you provide a method — line or block — for commenting-out lines with the Un/Comment Selection command. In this case, BBLMCommentLineDefault specifies the # line-comment token.
You must also specify how to identify keywords, in the Language Features dictionary, using either Identifier and Keyword Characters or Identifier and Keyword Character Class.
You don’t have to provide a BBLMSuffixMap or BBLMFileNamesToMatch list. Without them, your user will have to pick the language out of the pop-up.
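
If you did want My Language to claim files automatically, a suffix map and a preferred extension could be added at the top level of the module's dictionary. The sketch below assumes a made-up extension, myl; note that the suffix-map entry includes the leading dot while the preferred extension does not:

```xml
<key>BBLMSuffixMap</key>
<array>
    <dict>
        <!-- Suffix, including the dot (hypothetical extension) -->
        <key>BBLMLanguageSuffix</key>
        <string>.myl</string>
    </dict>
</array>

<!-- Extension suggested on first save; no leading dot -->
<key>BBLMPreferredFilenameExtension</key>
<string>myl</string>
```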

String patterns

This is where you commit to describing language features either by tokens or through regular expressions. Let’s do it the simple way first.

Comments

Add comment highlighting by adding these keys to the Language Features dictionary:

<key>Open Block Comments</key>
<string>=begin</string>
<key>Close Block Comments</key>
<string>=end</string>
<key>Open Line Comments</key>
<string>#</string>

Strings

We’ll just add double-quoted strings to the Language Features dictionary — the only difference for single-quoted strings is that the keys will end in 2 instead of 1, and the open and close tokens will be single quotes instead of double:

<key>Open Strings 1</key>
<string>"</string>
<key>Close Strings 1</key>
<string>"</string>
<key>Escape Char in Strings 1</key>
<string>\</string>
<key>End-of-line Ends Strings 1</key>
<true/>
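
For completeness, the single-quoted counterpart would follow the same shape with the type-2 keys. This is a sketch derived from the description above; the choice of backslash as the escape character and of ending strings at end-of-line simply mirrors the double-quoted entries and is an assumption about My Language:

```xml
<key>Open Strings 2</key>
<string>'</string>
<key>Close Strings 2</key>
<string>'</string>
<!-- Assumed to match the type-1 settings -->
<key>Escape Char in Strings 2</key>
<string>\</string>
<key>End-of-line Ends Strings 2</key>
<true/>
```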

Structures

Ruby doesn’t have a single pair of tokens to group statements into blocks, but braces are reliable:

<key>Open Statement Blocks</key>
<string>{</string>
<key>Close Statement Blocks</key>
<string>}</string>

There is no distinction between functions and procedures:

<key>Prefix for Functions</key>
<string>def</string>
<key>Prefix for Procedures</key>
<string>def</string>

… and parameter lists are parenthesized:

<key>Open Parameter Lists</key>
<string>(</string>
<key>Close Parameter Lists</key>
<string>)</string>

Regular-Expression Patterns

If, instead, you adopt the regular-expression-based strategy, you need to provide only one pattern each for strings, comments, functions, and skipped text:

Strings

Provide one pattern to recognize every style of string your language supports. If you supply a string pattern, the simple-string matching patterns are ignored:

<key>String Pattern</key>
<string><![CDATA[
    (?x:
        ("(\\"|[^"\r]|\\\r)*")      |   (?# Double-quote)
        ('(\\'|[^'\r]|\\\r)*')      |   (?# Single-quote)
        (?s:
            <<-?(?P<delim>\w+)          (?# Here-document)
            .*?
            (?P=delim)
        )
    )
]]></string>

Comments

Add one Comment Pattern to identify all types of comments:

<key>Comment Pattern</key>
<string><![CDATA[
    (?x:
        (\#.*?$)                    |   (?# Line comment)
        (?s:^=begin\b.+?^=end\b)        (?# Block comment)
    )
]]></string>

If you provide either String Pattern or Comment Pattern, all string-based patterns for strings and comments will be ignored.

Functions

You’ve seen this pattern before — it defines the span of functions (the function subpattern), and the function name (function_name).

<key>Function Pattern</key>
<string><![CDATA[
    (?x:
        (?P<leadspace>^\s*)
        (?P<function>
            def\s+
            (?P<function_name>
                [a-zA-Z0-9_!?]+
            )
            (?s:.+?)
            \r(?P=leadspace)end
        )
    )
]]>
</string>

Skip

The Skip Pattern turns off function matching in sections of the text, such as comments and strings:

<key>Skip Pattern</key>
<string><![CDATA[
    (?x:
        (?P>comment) |
        (?P>string)
    )
]]></string>

Tips

Here are some ideas to make it easier to build a codeless language module:

The easiest way to start a CLM of your own is to modify an existing one. Bare Bones offers a useful collection at https://www.barebones.com/support/bbedit/plugin_library.html.
Apple provides a tool, plutil, that will, among other things, verify that a property list file is well-formed. (See man plutil in the Terminal for other options.) In the Terminal, type

plutil /path/to/your.plist

… where you substitute the path to the CLM property list you want to check. The easiest way to get the path into Terminal is to drag the file from the Finder into the Terminal window. Or, you can select Copy Full Path from the File Path: popup in the toolbar of a BBEdit window, and paste into a Terminal window using the Paste Escaped Text (control-command-V) command.

Press return, and plutil will give you an OK, or guidance if it finds an error.

As you grow your module, it’s likely that you’ll run into errors that lead BBEdit to reject the file. If you have a file in your language open, the code won’t be colored, and its language will be set to “(none).”

If you repair the plist, and relaunch BBEdit, your test file will still be un-colored, because it lost its language setting. Before you start looking for another error, check the language pop-up. If the CLM is still wrong, your language won’t appear; if it does, you can set it.
The CLM loader itself can help you. BBEdit validates codeless language modules before loading them. If your module doesn’t take effect, open the Console application (in /Applications/Utilities) and look for BBEdit’s diagnostic messages.
Regular-expression patterns can be picky. There are ways to make them easier to write:

Open a new document in BBEdit, set its language to “Grep Search Pattern,” and build your patterns there. Having a full text editor, especially with syntax coloring, will make the job easier.
As you build your pattern, paste the whole pattern into the Find window, and use it to search a sample file in your language. Debug it by trying subpatterns. Remember that positional backreferences will not work in CLM regular-expression patterns; use named subexpressions.
Patterns are easier to understand if you can put each subpattern on its own line. You can do this by enclosing the pattern in (?x: … ). When you paste an extended pattern into the Find dialog, make sure there is no empty line after the closing parenthesis: it is outside the extended pattern, so BBEdit will try to match it literally.
If complex patterns are not carefully crafted, the number of potential matches can be practically infinite, and the matching process will “run away.” BBEdit will cut matching off if any match takes more than 1.5 seconds, and the rest of the file will not be colored. The console log will contain a message if this occurs.

Property list files are XML, which reserves the <, >, and & characters exclusively for controlling the structure of the file. If you want to use those characters in-line, you must escape them as XML entities: &lt;, &gt;, and &amp;.

This reference shows an alternative: Wrap the pattern in a <![CDATA[ … ]]> block, and you don't have to escape the special characters.

For readability, the CDATA blocks in this appendix put newlines and spaces before and after patterns. Ordinarily, this would mean that those patterns wouldn't match unless exactly that whitespace surrounded the target text. However, in order to make regular expressions easier to write, BBEdit strips leading and trailing whitespace from regular expressions wrapped as CDATA.
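
As a sketch of the two approaches, here is the here-document subpattern from the String Pattern example, written both ways. The two entries are alternatives shown side by side for comparison; a real module would contain only one String Pattern key.

```xml
<!-- Alternative 1: entity-escaped. Every < and > in the pattern becomes an entity. -->
<key>String Pattern</key>
<string>(?s:&lt;&lt;-?(?P&lt;delim&gt;\w+).*?(?P=delim))</string>

<!-- Alternative 2: CDATA-wrapped. The same pattern, written literally. -->
<key>String Pattern</key>
<string><![CDATA[
    (?s:<<-?(?P<delim>\w+).*?(?P=delim))
]]></string>
```

The CDATA form is easier to compare against a pattern you have tested in the Find window, because nothing has to be translated back from entities.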

Index of CLM Tags

BBEditDocumentType
BBLMCanSpellCheckCodeRuns
BBLMColorsSyntax
BBLMCommentLineDefault
BBLMCommentPrefixDefault
BBLMCommentSuffixDefault
BBLMFileNamesToMatch
BBLMIsCaseSensitive
BBLMKeywordList
BBLMLanguageCode
BBLMLanguageDisplayName
BBLMLanguageSuffix
BBLMPredefinedNameList
BBLMPreferredFilenameExtension
BBLMReferenceSearchURLTemplate
BBLMScansFunctions
BBLMSuffixMap
BBLMSupportsTextCompletion
Close Block Comments
Close Parameter Lists
Close Statement Blocks
Close Strings 1
Close Strings 2
Comment Pattern
End-of-line Ends Strings 1
End-of-line Ends Strings 2
Escape Char in Strings 1
Escape Char in Strings 2
Function Pattern
Identifier and Keyword Character Class
Identifier and Keyword Characters
Language Features
Open Block Comments
Open Line Comments
Open Parameter Lists
Open Statement Blocks
Open Strings 1
Open Strings 2
Prefix for Functions
Prefix for Procedures
Skip Pattern
String Pattern
Terminator for Prototypes 1
Terminator for Prototypes 2

Credits

The revised, corrected, and expanded Codeless Language Module reference was written by Fritz Anderson, with technical review by Ryan Wilcox and Rich Siegel.
