Using Tags With Multilingual Jekyll Sites
The static site generator Jekyll stores tags site-wide. You need a little Ruby code if you want to maintain them per language instead, as you would on a multilingual site. In a previous post, "Multilingual Web Sites with Jekyll", I described how to set up a site with Jekyll supporting documents in multiple languages. One thing that was still missing was support for tags (or keywords).
Requirements
I create category pages - that is pages listing all posts from a particular category - manually because I want to add a description to them anyway. For tag pages I prefer to have them maintained automatically.
I also want the tag sets of different languages to be allowed to overlap, so that one and the same tag can be used for documents in different languages. The tag page should, however, only list articles for one language at a time.
Using "jekyll-tagging"
The standard plug-in for generating tag pages seems to be jekyll-tagging. I configured it in _config.yml like this:
tag_page_layout: tag_page
tag_page_dir: tags
tag_feed_layout: tag_feed
tag_feed_dir: tags
tag_permalink_style: pretty
Note: For real usage, you also have to tell Jekyll to use the plug-in, but I will skip that step because the solution shown here will be different anyway.
The layout template _layouts/tag_page.html is used for generating the tag pages (line 1), and they should be written into /tags/ (line 2). I also wanted RSS feeds for each tag. They are configured in the same manner in lines 3 and 4.
Finally we tell the plug-in to use pretty links (line 5).
The template _layouts/tag_page.html looks like this:
---
layout: default
---
{% assign posts=site.tags[page.tag] | where: "lang", page.lang | where: "type", "posts" %}
<div class="col-md-8">
  <span>{{ site.t[page.lang].tag }}: <i class="fa fa-tag"></i>{{ page.tag }}</span>
  {% for post in posts %}
  <article class="blog-post">
    {% if post.image %}
    <div class="blog-post-image">
      <a href="{{post.url | prepend: site.baseurl}}">
        <img src="{{post.image}}" alt="{{post.image_alt}}">
      </a>
    </div>
    {% endif %}
    <div class="blog-post-body">
      <h2>
        <a href="{{post.url | prepend: site.baseurl}}">{{ post.title }}</a>
      </h2>
      <div class="post-meta">
        <span><i class="fa fa-clock-o"></i>{% include {{ page.lang }}/long_date.html param=post.date %}</span>
        {% if post.comments %}
        / <span><i class="fa fa-comment-o"></i> <a href="#">{{ post.comments }}</a></span>
        {% endif %}
      </div>
      <p>{{post.excerpt}}</p>
      <div class="read-more">
        <a href="{{post.url | prepend: site.baseurl}}">Continue Reading</a>
      </div>
    </div>
  </article>
  {% endfor %}
</div>
The only interesting line is line 4. The collection site.tags is filled by Jekyll. We use the document attribute tag as the lookup key into that hash and filter the result by document language and document type. Unfortunately, that does not work.
The first problem is that jekyll-tagging does not know about a document attribute lang and therefore cannot set it. The second is that it creates only one tag page for each tag, but we want one tag page for each tag and for each language that uses it.
Writing a Wrapper Around jekyll-tagging
The only solution was to write a wrapper around the plug-in. Unfortunately I had never written a line of Ruby code before. But the task looked so trivial to me that I decided to give it a try unbiased by any Ruby knowledge.
My plan was to abuse the configuration option ignored_tags of jekyll-tagging for my purposes. Instead of invoking the plug-in once, the wrapper should invoke it for every language, each time giving the plug-in a modified configuration, in particular setting the value of ignored_tags to the list of tags that do not occur for the current language.
Skeleton For The Plug-In
I created a file _plugins/ml_tagging.rb:
require 'jekyll/tagging'

module Jekyll
  class MultiLangTagger < Tagger

    @types = [:page, :feed]

    def generate(site)
      # Generate some pages.
    end
  end
end
I called my own plug-in MultiLangTagger and subclassed it from Tagger, the generator class defined by jekyll-tagging.
The line that sets @types looks suspicious. It defines an instance variable on the class itself (not a class variable, which would be written @@types) and was copied verbatim from the original source of jekyll-tagging. Apparently my own plug-in did not inherit the variable from the super class. Somebody with more profound Ruby knowledge than me can probably explain that.
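A likely explanation, shown as a minimal sketch: an instance variable set in a class body belongs to that one class object, so a subclass starts without it. The class names below are stand-ins invented for illustration and are not the real jekyll-tagging classes.

```ruby
# Stand-in classes (hypothetical names), not the real jekyll-tagging Tagger.
class BaseGenerator
  @types = [:page, :feed] # instance variable of the BaseGenerator class object

  class << self
    attr_reader :types    # class-level reader so that @types can be inspected
  end
end

# Subclass that does not repeat the assignment.
class ChildWithoutTypes < BaseGenerator
end

# Subclass that repeats the assignment, as the wrapper plug-in does.
class ChildWithTypes < BaseGenerator
  @types = [:page, :feed]
end

p BaseGenerator.types     # [:page, :feed]
p ChildWithoutTypes.types # nil
p ChildWithTypes.types    # [:page, :feed]
```

The reader method is inherited, but it reads @types from whichever class it is called on, which is why the subclass has to set the variable again.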
The only method that Jekyll generator plug-ins have to implement is generate. Its single argument is a Jekyll::Site instance that can be used to retrieve configuration, pages, posts, tags and so on.
Grouping Tags
The first task was to group the tags used by language. I modified the generate method as follows:
def generate(site)
  # Iterate over all posts and group the tags by language.
  for post in site.posts.docs do
    lang = post.data['lang']
    site.config['t'][lang]['tagnames'] =
      {} unless site.config['t'][lang]['tagnames']
    tagnames = site.config['t'][lang]['tagnames']
    tags = post.data['tags']
    for tag in tags do
      slug = jekyll_tagging_slug(tag)
      if tagnames[slug]
        if tagnames[slug] != tag
          raise "Tag '%{tag1}' and tag '%{tag2}' will create the same filename. Change one of them!" % { :tag1 => tagnames[slug], :tag2 => tag }
        end
      else
        tagnames[slug] = tag
      end
    end
  end
end
site.posts.docs is an array that contains all posts. For each post I first retrieve the language from the attribute lang. On my site it is guaranteed to be set, so there is no check whether the attribute exists or not.
My site configuration already contains a key t (as in translation) that contains string translations for each supported language. I decided to stuff the tags into that structure with a key tagnames (lines 5-6) because there is an analogous slot catnames for categories.
In line 9 I iterate over all tags for the current post. In the next line I normalize the tag into a file-system-safe form by calling the helper function jekyll_tagging_slug(). jekyll-tagging uses that function for determining the name of the output file.
For my site it is important that there is a one-to-one relationship between tags and the name of the corresponding tag page. Lines 11 to 17 enforce that. That step is not strictly necessary but rather a QA measure. I want to ensure a consistent spelling of tags.
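The consistency check can be tried in isolation. The following is only a sketch: demo_slug is a hypothetical stand-in for jekyll_tagging_slug() and may normalize tags differently from the real helper, and register_tag is a name invented for this example.

```ruby
# Hypothetical stand-in for jekyll_tagging_slug(); the real helper may
# normalize tags differently.
def demo_slug(tag)
  tag.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end

# Register a tag under its slug. Raise if a different spelling already
# occupies the same slug, because both would map to the same file name.
def register_tag(tagnames, tag)
  slug = demo_slug(tag)
  existing = tagnames[slug]
  if existing && existing != tag
    raise ArgumentError,
          "Tag '#{existing}' and tag '#{tag}' will create the same filename."
  end
  tagnames[slug] = tag
end

tagnames = {}
register_tag(tagnames, 'System Administration')
register_tag(tagnames, 'System Administration')   # same spelling: accepted
begin
  register_tag(tagnames, 'system administration') # other spelling, same slug
rescue ArgumentError => e
  puts e.message
end
```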
The result is a data structure that - translated from Ruby into YAML - would look like this in _config.yml:
t:
  en:
    tagnames:
      dns: DNS
      system-administration: "System Administration"
      jekyll: Jekyll
      development: Development
  de:
    tagnames:
      dns: DNS
      systemadministration: "Systemadministration"
      jekyll: Jekyll
      entwicklung: Entwicklung
Invoking the Super Class Generator
Now that the tags are grouped, the generate method of the super class has to be invoked, but with a tweaked configuration for each language. The following code added to the generate method does the job:
saved_tag_page_dir = site.config['tag_page_dir']
saved_tag_feed_dir = site.config['tag_feed_dir']

for lang in site.config['t'].keys
  site.config['tag_page_dir'] = '/' + lang + '/' + saved_tag_page_dir
  site.config['tag_feed_dir'] = '/' + lang + '/' + saved_tag_feed_dir
  site.config['ignored_tags'] = site.tags.keys - site.config['t'][lang]['tagnames'].values

  super
end
We need distinct output directories for each language. Otherwise tag pages for tags that are present in multiple languages would overwrite each other. In lines 1 and 2, I get the locations for the tag pages and tag feeds from the configuration and store a copy of the original values.
Then I iterate over all available languages (the keys of the hash slot t) and overwrite the configuration variables tag_page_dir and tag_feed_dir with the language-specific location. I kept it simple here and just prepend the language identifier to the original configuration values.
The assignment to ignored_tags is important. jekyll-tagging uses that configuration variable for suppressing tag pages for particular tags. I abuse that and temporarily fill it with an array that contains the difference between all tags and the tags used for the current language.
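The array difference is easy to check on plain data. This is a sketch with made-up sample tags; ALL_TAGS and TAGNAMES merely imitate the shape of site.tags.keys and the per-language tagnames slots.

```ruby
# Hypothetical sample data, shaped like the structures described above.
ALL_TAGS = ['DNS', 'Jekyll', 'Development', 'Entwicklung'].freeze
TAGNAMES = {
  'en' => { 'dns' => 'DNS', 'jekyll' => 'Jekyll', 'development' => 'Development' },
  'de' => { 'dns' => 'DNS', 'jekyll' => 'Jekyll', 'entwicklung' => 'Entwicklung' }
}.freeze

# Tags to ignore for a language: all tags of the site minus the tags
# that are actually used in that language.
def ignored_tags_for(lang)
  ALL_TAGS - TAGNAMES.fetch(lang).values
end

p ignored_tags_for('en') # ["Entwicklung"]
p ignored_tags_for('de') # ["Development"]
```

Tags shared by both languages, such as DNS, never end up in either list, so each language gets its own page for them.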
The call to super invokes the generate method of jekyll-tagging and lets it do its job.
At that point I ran into a show stopper. As it turned out, jekyll-tagging version 1.0.1 has a bug and the ignore mechanism does not actually work; see the bug report on GitHub for details.
Unfortunately, I did not succeed in monkey-patching the bug away, and I ended up patching the source file manually. Search for the file tagging.rb from the jekyll-tagging distribution and change the method active_tags to read as follows:
def active_tags
  return site.tags unless site.config["ignored_tags"]
  site.tags.reject { |t| site.config["ignored_tags"].include? t }
end
Alternatively, wait for the bug to be fixed upstream.
One part of the solution is still missing. We have to make sure that the tag pages all have an attribute lang containing the language code. jekyll-tagging does not have hooks for injecting additional data. Therefore we do that ourselves:
for page in site.pages
  if page.data['tag']
    dir = Pathname(page.url).each_filename.to_a
    lang = page.data['lang'] = dir[0]
    description = site.config['t'][lang]['taglist']['description']
    page.data['description'] = description % { :tag => page.data['tag'] }
  end
end
Still in our generate method, we iterate over all pages. For lack of a better method for detecting tag pages, we check whether the document attribute tag is present, and extract the first path name component from the document URL. Note that in order to be able to use Pathname(), you have to add a "require 'pathname'" to the beginning of the file!
While we are at it, we also pimp up the generated tag pages by giving them a description. The description comes from a language-specific string in _config.yml and may contain a placeholder %{tag} for the current tag. Line 6 interpolates the tag into that string.
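Both building blocks can be tried outside of Jekyll. A small sketch: the URL and the description template are invented sample values, and first_path_component and describe are helper names made up for this example.

```ruby
require 'pathname'

# First path component of a URL, for example the language code in
# "/en/tags/dns/".
def first_path_component(url)
  Pathname(url).each_filename.to_a.first
end

# Hypothetical description template; on the real site the string comes
# from _config.yml.
DESCRIPTION_TEMPLATE = 'All posts tagged with %{tag}'

# Interpolate the tag into the named placeholder %{tag}.
def describe(tag)
  DESCRIPTION_TEMPLATE % { tag: tag }
end

puts first_path_component('/en/tags/dns/') # en
puts describe('DNS')                       # All posts tagged with DNS
```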
Finally, we have to restore the configuration to its original values because they are used at the next invocation of our plug-in:
site.config['tag_page_dir'] = saved_tag_page_dir
site.config['tag_feed_dir'] = saved_tag_feed_dir
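One possible variation, not what the plug-in described here does: wrapping the save-and-restore in a helper with an ensure clause would restore the configuration even if generating the pages raised an exception. The helper name with_overrides is hypothetical, and the config hash is plain sample data.

```ruby
# Temporarily override keys in a config hash and restore the previous
# values afterwards, even if the block raises.
# Caveat: a key that did not exist before is restored as nil, not removed.
def with_overrides(config, overrides)
  saved = overrides.keys.to_h { |key| [key, config[key]] }
  config.merge!(overrides)
  yield config
ensure
  config.merge!(saved) if saved
end

config = { 'tag_page_dir' => 'tags', 'tag_feed_dir' => 'tags' }
%w[en de].each do |lang|
  with_overrides(config, 'tag_page_dir' => "/#{lang}/tags") do |c|
    puts c['tag_page_dir'] # /en/tags, then /de/tags
  end
end
puts config['tag_page_dir'] # tags
```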
At this point, our implementation is more or less functional. Our own generate method calls generate of jekyll-tagging once for each language and with a modified configuration. And we also inject two attributes lang and description into the generated pages.
More Tweaks
Jekyll obviously detects that another generator plug-in has been loaded and insists on executing it as well, which leads to problems. In my particular site setup the liquid template engine bails out if a page or post does not have a lang attribute. Therefore, we must prevent the original plug-in from generating pages with a configuration that was not patched by us.
I could not find a way to stop Jekyll from calling the super class generator. Therefore, after my plug-in has invoked the super class generator for each language, I set the list of tags to be ignored to the complete tag list of the site:
site.config['ignored_tags'] = site.tags.keys
Now the super method gets invoked but it will not generate any pages.
Another problem was that I wanted to display the count of documents for each tag in overview pages. I solved that by writing one more plug-in, this time a hook plug-in that computes these counts and saves them in the site configuration. And since I actually needed the same for categories, I precomputed the category counts as well.
You can find a download link to that file below.
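The counting itself is simple. The following is only a sketch of the idea on plain data and is not the contents of precompute.rb: SAMPLE_POSTS is invented, and tag_counts is a hypothetical helper name.

```ruby
# Hypothetical sample posts; in a hook they would come from the site object.
SAMPLE_POSTS = [
  { 'lang' => 'en', 'tags' => ['DNS', 'Jekyll'] },
  { 'lang' => 'en', 'tags' => ['DNS'] },
  { 'lang' => 'de', 'tags' => ['DNS', 'Entwicklung'] }
].freeze

# Count the posts per tag, grouped by language: counts[lang][tag].
def tag_counts(posts)
  counts = Hash.new { |hash, lang| hash[lang] = Hash.new(0) }
  posts.each do |post|
    Array(post['tags']).each { |tag| counts[post['lang']][tag] += 1 }
  end
  counts
end

counts = tag_counts(SAMPLE_POSTS)
puts counts['en']['DNS'] # 2
puts counts['de']['DNS'] # 1
```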
Questions
How can I link to a tag page?
Like this:
<a href="/{{ page.lang }}{{ tag | tag_url }}">{{ tag }}</a>
The filter tag_url is defined by jekyll-tagging and gives the URL of a tag page. We have to prepend the language.
How can I get the number of documents for a tag per language?
Like this:
{{ site.tagcounts[page.lang][tag] }}
The count is provided by the hook precompute.rb.
How can I get the number of documents in a category per language?
Likewise:
{{ site.catcounts[page.lang][category] }}
The count is provided by the hook precompute.rb.
How do I create language-specific tag clouds?
Tag clouds? Are you kidding me? It is 2016!
Downloads
You can download all source files needed below:
- _plugins/ml_tagging.rb - wrapper plug-in for `jekyll-tagging`
- _plugins/precompute.rb - hook plug-in that precomputes tag and category counts
- _layouts/tag_page.html - template for tag pages
- _layouts/tag_feed.xml - template for tag feeds
- _includes/feed.xml - include for all feeds
- _includes/tag-feeds.xml - include for listing all tag feeds for a language