Juha-Matti Santala
Community Builder. Dreamer. Adventurer.

Syntax highlight all the things

Almost every developer who has seen syntax-highlighted code in their editor would never want to go back to monochrome coding. It's been a key feature of code editors for ages, and one we don't often think about.

As a speaker and blogger, however, I've ended up thinking about it quite a lot. I want the code in my blog posts and talk slides to be easy to read. For the web, one of the top libraries for syntax highlighting is Prism.js by Lea Verou and contributors. It supports a huge collection of languages and themes, so as a blogger, all you need to do is add a JavaScript file and a CSS file, add class="language-javascript" to your <code> blocks in HTML, and Prism will take care of the rest.

I use Prism on this website as well, and I've been very happy with it.

But programming languages are not the only things with structure you may want to highlight or colorize. One such format I've worked with a lot is the Pokémon TCG deck list, exported from and imported to the Pokémon TCG Online software.

A deck list looks like this:

****** Pokémon Trading Card Game Deck List ******

##Pokémon - 16

* 1 Charmander VIV 23
* 1 Entei CEC 28
* 1 Heatmor RCL 34
* 1 Heatran LOT 48
* 1 Litwick RCL 31
* 1 Salandit SSH 27
* 1 Sizzlipede SSH 38
* 1 Tepig BLW 15
* 1 Centiskorch SSH 39
* 1 Charmeleon GEN 104
* 1 Lampent RCL 32
* 1 Pignite BLW 17
* 1 Salazzle UNB 31
* 1 Chandelure RCL 33
* 1 Charizard VIV 25
* 1 Emboar BLW 20

##Trainer Cards - 32

* 1 Marnie PR-SW 121
* 1 Ball Guy SHF 57
* 1 N FCO 105
* 1 Professor Juniper PLF 116
* 1 Evosoda GEN 62
* 1 Ultra Ball DEX 102
* 1 Pokémon Fan Club UPR 155
* 1 Timer Ball SUM 134
* 1 Lysandre FLF 90
* 1 Giant Hearth UNM 197
* 1 Welder UNB 189
* 1 Heavy Ball BKT 140
* 1 Scorched Earth PRC 138
* 1 Guzma BUS 115
* 1 Bird Keeper DAA 159
* 1 Quick Ball SSH 179
* 1 Escape Rope PLS 120
* 1 Stadium Nav UNM 208
* 1 Cynthia UPR 119
* 1 Colress PLS 118
* 1 Air Balloon SSH 156
* 1 Fire Crystal UNB 173
* 1 Brigette BKT 161
* 1 Tate & Liza CES 148
* 1 Rare Candy PLB 85
* 1 Blacksmith FLF 88
* 1 Float Stone BKT 137
* 1 Nest Ball SUM 123
* 1 Evolution Incense SSH 163
* 1 Level Ball NXD 89
* 1 Order Pad UPR 131
* 1 Trainers' Mail AOR 100

##Energy - 12

* 1 Heat {R} Energy DAA 174
* 1 Burning Energy BKT 151
* 10 Fire Energy GEN 76

Total Cards - 60

****** Deck List Generated by the Pokémon TCG Online www.pokemon.com/TCGO ******

Most often you just copy-paste these from different sources into PTCGO, build your deck and start playing. But as I was writing my gaming blog post on the Gym Leader Challenge format, I wanted to make the deck lists a bit prettier.

So I decided to learn how to extend Prism.js with a custom syntax parser. I started by looking at the documentation page for extending.

Extending Prism.js

There are a couple of key ideas in extending Prism with custom definitions:

  1. You add a language to Prism.languages by creating a new property whose name matches the name you want to use in language-[name]. In my case, I added a new object as Prism.languages.ptcgo.
  2. In its simplest form, a parser is a collection of names and patterns, executed in order.

For example, in the above deck list, the first and the last lines start with a bunch of asterisks.

Prism.languages.ptcgo = {
    ptcgo_title: /\*\*\*\*\*\*.*/
}

This definition creates a token named ptcgo_title. I decided to prefix the token names as a namespace to make sure they don't accidentally overlap with the ones used in other code blocks.
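To see what that pattern catches, it can be tried against a few lines of the deck list in plain JavaScript, without Prism. This is only a sanity check of the regular expression; the sampleLines array and the isTitle helper are names made up for this sketch.

```javascript
// The same pattern as in the ptcgo_title definition above.
const titlePattern = /\*\*\*\*\*\*.*/;

// A few lines copied from the deck list (sampleLines is a made-up name).
const sampleLines = [
  "****** Pokémon Trading Card Game Deck List ******",
  "##Pokémon - 16",
  "* 1 Charmander VIV 23",
  "Total Cards - 60",
];

// Hypothetical helper: true when the title pattern matches somewhere in the line.
function isTitle(line) {
  return titlePattern.test(line);
}

// Only the first line has six asterisks in a row, so only it is kept.
const titles = sampleLines.filter(isTitle);
console.log(titles);
```

The card lines start with a single asterisk, so they don't qualify; the pattern needs six in a row.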

Here's the current full definition. Compared to programming languages, there's far less variation in what people can write, as the structure is very clearly formatted.

Prism.languages.ptcgo = {
  ptcgo_title: /\*\*\*\*\*\*.*/,
  ptcgo_subheader: /##.*/,
  ptcgo_total: /Total Cards \- \d+/,
  ptcgo_set: /([A-Z-]{3,5}|Energy) \d+/,
  ptcgo_quantity: /\d+/,
  ptcgo_card_name: /[A-Za-z'&{}é]+/,
};

Since the format is quite unstructured in a way (it's just text), I had difficulty telling apart, for example, the numbers and some of the text. With a bit of exploration and experimenting, I discovered that tokenization happens in the order of the definitions, so by having ptcgo_total before ptcgo_card_name, "Total Cards" doesn't get matched as ptcgo_card_name. The same goes for the number in ptcgo_set compared to ptcgo_quantity. I'm not 100% sure if this is part of the spec or if I just got lucky.
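The effect of ordering can be reproduced with a small stand-in tokenizer. To be clear, this is not Prism's implementation, just a simplified sketch under the assumption that each definition only gets to match text that earlier definitions left unclaimed. The tokenize and typesOf functions and the two trimmed-down grammars are made up for the illustration.

```javascript
// Simplified stand-in for an ordered tokenizer. This is NOT Prism's actual
// code; it only illustrates why the order of definitions matters.
function tokenize(text, grammar) {
  // Start with one unclaimed string; tokens are { type, content } objects.
  let parts = [text];
  for (const [type, pattern] of Object.entries(grammar)) {
    const globalPattern = new RegExp(pattern.source, "g");
    const next = [];
    for (const part of parts) {
      if (typeof part !== "string") {
        next.push(part); // already claimed by an earlier definition
        continue;
      }
      let last = 0;
      for (const match of part.matchAll(globalPattern)) {
        if (match.index > last) next.push(part.slice(last, match.index));
        next.push({ type, content: match[0] });
        last = match.index + match[0].length;
      }
      if (last < part.length) next.push(part.slice(last));
    }
    parts = next;
  }
  // Drop the leftover plain strings and keep only the tokens.
  return parts.filter((part) => typeof part !== "string");
}

// Two hypothetical grammars with the same patterns in a different order.
const totalFirst = {
  ptcgo_total: /Total Cards \- \d+/,
  ptcgo_quantity: /\d+/,
  ptcgo_card_name: /[A-Za-z'&{}é]+/,
};
const cardNameFirst = {
  ptcgo_card_name: /[A-Za-z'&{}é]+/,
  ptcgo_quantity: /\d+/,
  ptcgo_total: /Total Cards \- \d+/,
};

const line = "Total Cards - 60";
const typesOf = (tokens) => tokens.map((token) => token.type);

console.log(typesOf(tokenize(line, totalFirst)));
// logs one type: ptcgo_total
console.log(typesOf(tokenize(line, cardNameFirst)));
// logs three types: ptcgo_card_name, ptcgo_card_name, ptcgo_quantity
```

With the card name pattern first, "Total" and "Cards" are claimed before the total pattern ever sees the full line, which matches what I observed.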

This is all that's needed to get Prism.js to tokenize code in <code> blocks with a rather simple custom language definition. For more complex definitions, the library offers more options, which you can learn about in the official documentation.
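On a web page Prism picks up the <code> blocks by itself, but the same definition can also be used by calling Prism.highlight directly, for example in a build script. A minimal sketch, assuming the Prism object is passed in with Prism.languages.ptcgo already defined; the renderDeckList function is a made-up name for this example.

```javascript
// Wraps a highlighted deck list in <pre><code>. The `prism` argument is
// expected to be the Prism object, e.g. require("prismjs") in Node or the
// global Prism in a browser, with Prism.languages.ptcgo already defined.
function renderDeckList(prism, deckList) {
  // Prism.highlight(text, grammar, language) returns an HTML string.
  const html = prism.highlight(deckList, prism.languages.ptcgo, "ptcgo");
  return `<pre class="language-ptcgo"><code class="language-ptcgo">${html}</code></pre>`;
}
```

Passing Prism in as an argument is just a choice for the sketch so that it doesn't depend on how the library is loaded.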

Combined with my CSS declarations for these, we get the same deck list as above but in color:

****** Pokémon Trading Card Game Deck List ******

##Pokémon - 16

* 1 Charmander VIV 23
* 1 Entei CEC 28
* 1 Heatmor RCL 34
* 1 Heatran LOT 48
* 1 Litwick RCL 31
* 1 Salandit SSH 27
* 1 Sizzlipede SSH 38
* 1 Tepig BLW 15
* 1 Centiskorch SSH 39
* 1 Charmeleon GEN 104
* 1 Lampent RCL 32
* 1 Pignite BLW 17
* 1 Salazzle UNB 31
* 1 Chandelure RCL 33
* 1 Charizard VIV 25
* 1 Emboar BLW 20

##Trainer Cards - 32

* 1 Marnie PR-SW 121
* 1 Ball Guy SHF 57
* 1 N FCO 105
* 1 Professor Juniper PLF 116
* 1 Evosoda GEN 62
* 1 Ultra Ball DEX 102
* 1 Pokémon Fan Club UPR 155
* 1 Timer Ball SUM 134
* 1 Lysandre FLF 90
* 1 Giant Hearth UNM 197
* 1 Welder UNB 189
* 1 Heavy Ball BKT 140
* 1 Scorched Earth PRC 138
* 1 Guzma BUS 115
* 1 Bird Keeper DAA 159
* 1 Quick Ball SSH 179
* 1 Escape Rope PLS 120
* 1 Stadium Nav UNM 208
* 1 Cynthia UPR 119
* 1 Colress PLS 118
* 1 Air Balloon SSH 156
* 1 Fire Crystal UNB 173
* 1 Brigette BKT 161
* 1 Tate & Liza CES 148
* 1 Rare Candy PLB 85
* 1 Blacksmith FLF 88
* 1 Float Stone BKT 137
* 1 Nest Ball SUM 123
* 1 Evolution Incense SSH 163
* 1 Level Ball NXD 89
* 1 Order Pad UPR 131
* 1 Trainers' Mail AOR 100

##Energy - 12

* 1 Heat {R} Energy DAA 174
* 1 Burning Energy BKT 151
* 10 Fire Energy GEN 76

Total Cards - 60

****** Deck List Generated by the Pokémon TCG Online www.pokemon.com/TCGO ******
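The CSS declarations mentioned before the colored list are not shown in this post. As a rough idea of what they involve: Prism wraps each token in a span with the classes token and the token's name, so one rule per token name is enough. The sketch below builds such rules in JavaScript; the colors are placeholders and not the ones used on this site, and tokenColors and buildStylesheet are made-up names.

```javascript
// Placeholder colors for each token name; NOT the colors used on the blog.
const tokenColors = {
  ptcgo_title: "#888888",
  ptcgo_subheader: "#d9480f",
  ptcgo_total: "#2b8a3e",
  ptcgo_set: "#1971c2",
  ptcgo_quantity: "#9c36b5",
  ptcgo_card_name: "#212529",
};

// Builds one CSS rule per token name. Prism renders a token as
// <span class="token NAME">, so `.token.NAME` selects it.
function buildStylesheet(colors) {
  return Object.entries(colors)
    .map(([name, color]) => `.token.${name} { color: ${color}; }`)
    .join("\n");
}

console.log(buildStylesheet(tokenColors));
```

The resulting rules can be pasted into a regular stylesheet next to the Prism theme.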

Find it on GitHub

If you want to add this custom PTCGO highlighting to your own website, you can find the extension on GitHub: hamatti/prism-extension-ptcgo. It's open source under the MIT licence, so do what you wish with it.
