Add and use an image sitemap with Hugo

Not long ago, I wrote about adding copyright information for images on a Hugo-based website. While working on that, I learned how to get the images for each post, list them, and use them in Schema for images.
Recently, while reading Google's updated image SEO best practices, I noticed a section called Use an image sitemap, which says:
“You can provide the URL of images we might not have otherwise discovered by submitting an image sitemap”.
I decided to see how I could implement that on my Hugo website.
I started by looking at the example image sitemap and decided to give it a try.
Hugo is flexible about adding extra output files, so I used a custom output format to create a sitemap for images.
Configuration
I started by defining an additional output format for my sitemap in the Hugo configuration file (hugo.toml).
[outputFormats.imagessitemap]
baseName = 'imagessitemap'
mediaType = 'application/xml'
noUgly = true # default is false
Further, in the same file, I added it to my outputs.
[outputs]
page = [ "html"]
home = [ "html", "rss", "imagessitemap"]
Multilingual adjustment
If you run a multilingual site, you may need an additional step here.
My Polish site is served from the bare baseURL (/), whereas the English part lives under /en/.
The above output will generate the imagessitemap.xml file in / (Polish) and /en/ (English).
The problem is that, on a multilingual site, /sitemap.xml is not a sitemap for the Polish part of the website. This file serves as a sitemap index that lists the sitemaps in the language folders.
For the Polish part of the website, despite it being served from /, the sitemap with posts is /pl/sitemap.xml, and for the English part it is /en/sitemap.xml.
This may be a bit of OCD, but I would like my sitemap for images to be generated in the language folder, so the Polish one should be at /pl/imagessitemap.xml. To do that, I need to add the following to the configuration file, in my main language section:
[languages.pl.outputFormats.imagessitemap]
baseName = 'imagessitemap'
mediaType = 'application/xml'
noUgly = true # default is false
path = 'pl'
For the English part this is not required, as the file is generated inside the /en/ folder by default.
Layout
Now we need to create the layout for our custom output.
I created a file called home.imagessitemap.xml in the layouts\_default folder.
{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
{{ range .Site.RegularPages }}{{ if ne .Params.sitemap_exclude true }}
{{ if or (.Params.featuredImage) (findRE `(?s)<img.+?>` .Content) }}
{{- if .Permalink -}}
<url>
<loc>{{ .Permalink }}</loc>{{ if .Params.featuredImage }}
<image:image>
<image:loc>{{ .Params.featuredImage | absURL }}</image:loc>
</image:image>{{ end }}{{ if (findRE `(?s)<img.+?>` .Content) }}{{ range $k, $_ := findRE `(?s)<img.+?>` .Content }}{{ if $k }}{{ end }}
<image:image>
<image:loc>{{ replaceRE `(?s).*src="(.+?)".*` "$1" . | absURL }}</image:loc>
</image:image>{{ end }}
{{ end }}
</url>{{- end -}}
{{ end }}
{{ end }}{{ end }}
</urlset>
In this file, I reused the exclusion ({{ if ne .Params.sitemap_exclude true }}) for posts that have sitemap_exclude: true set in their frontmatter.
I followed it with an if or condition, so a post is listed only if it has either a featuredImage specified in the frontmatter or at least one image in its content, detected with the findRE function that I learned about before.
In this way, the post URL, placed between the <loc> tags, appears only if there are images to report.
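If you want to try the two regular expressions from the layout outside Hugo, here is a minimal Python sketch of the same idea. It is only an illustration, not what Hugo runs internally: image_urls is a hypothetical helper name, example.org is a placeholder domain, and urljoin is only a rough stand-in for Hugo's absURL.

```python
import re
from urllib.parse import urljoin

# Same patterns as in the layout: one finds whole <img ...> tags,
# the other reduces a tag to the value of its src attribute.
IMG_TAG = re.compile(r'(?s)<img.+?>')
SRC_ATTR = re.compile(r'(?s).*src="(.+?)".*')


def image_urls(content, base_url):
    """Return absolute URLs for every <img> tag found in rendered HTML."""
    urls = []
    for tag in IMG_TAG.findall(content):
        # Mirrors replaceRE with "$1": keep only the captured src value.
        src = SRC_ATTR.sub(r'\1', tag)
        # Assumption: urljoin approximates absURL for this illustration.
        urls.append(urljoin(base_url, src))
    return urls


if __name__ == "__main__":
    html = '<p>Hello</p><img src="/images/a.png" alt="A">'
    print(image_urls(html, "https://example.org/"))
```

A post without any img tag gives an empty list, which corresponds to the case where the layout skips the post entirely.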
By running Hugo locally, we can verify that our sitemap is there under localhost:1313/imagessitemap.xml or, on a multilingual site, under localhost:1313/en/imagessitemap.xml and localhost:1313/pl/imagessitemap.xml.
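To go beyond eyeballing the file in the browser, a short script can list which pages report which images. The sketch below is my own illustration and not part of Hugo: list_images is a hypothetical helper name, the sample document and example.org URLs are made up, and it assumes you already have the sitemap text (for example, read from the generated public folder).

```python
import xml.etree.ElementTree as ET

# Namespaces used in the layout above; the "sm" prefix is just a local alias.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def list_images(sitemap_xml):
    """Map each page URL in an image sitemap to its list of image URLs."""
    root = ET.fromstring(sitemap_xml)
    pages = {}
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", default="", namespaces=NS).strip()
        images = [
            loc.text.strip()
            for loc in url.findall("image:image/image:loc", NS)
            if loc.text
        ]
        pages[page] = images
    return pages


# Hypothetical sample in the same shape as the layout's output.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.org/en/post/</loc>
    <image:image>
      <image:loc>https://example.org/images/cover.png</image:loc>
    </image:image>
  </url>
</urlset>"""

if __name__ == "__main__":
    for page, images in list_images(SAMPLE).items():
        print(page, len(images))
```

If a page shows up with zero images, the conditions in the layout are worth a second look, because the layout is meant to list only pages that have something to report.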
Announcing new sitemaps
The last bit is to report the sitemaps in such a way that Google and other search engines will be able to detect them.
The best approach is to list them in the robots.txt file, which is normally located in our static\ folder.
If you don’t know how to create a robots.txt file, read How to write and submit a robots.txt file at Google Search Central.
Below the other sitemaps that may already be listed there, I added the following:
Sitemap: https://dariusz.wieckiewicz.org/pl/imagessitemap.xml
Sitemap: https://dariusz.wieckiewicz.org/en/imagessitemap.xml
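To double-check that these lines are readable by a standard parser, Python's built-in urllib.robotparser can be used. This is a minimal sketch under a few assumptions: sitemaps_in_robots is a hypothetical helper name, the User-agent and Allow lines in the sample are placeholders rather than my real robots.txt, and site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the Sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Placeholder rules followed by the two Sitemap lines from this post.
ROBOTS = """User-agent: *
Allow: /

Sitemap: https://dariusz.wieckiewicz.org/pl/imagessitemap.xml
Sitemap: https://dariusz.wieckiewicz.org/en/imagessitemap.xml
"""

if __name__ == "__main__":
    for sitemap in sitemaps_in_robots(ROBOTS):
        print(sitemap)
```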
To speed things up, I also added them manually in the Sitemaps section of Google Search Console.
And that’s all!