Static Site Generator Fulltext Search
Site-wide search using pagefind
2025-10-26 (updated 2025-10-27)
#hugo #pagefind #search
Series: Tech Behind the Blog
I recently rebuilt my blog from scratch; see the previous post in this series.
One thing I wanted to add was site-wide search without any external dependencies. I had used Pagefind before and it fits this use case well, so I built the search with it.
In this post I will go over my implementation of Pagefind on top of Hugo,
but the approach applies equally to other static site generators or any folder full of .html files.
Setting Up Pagefind
Pagefind does not require much: you point it at a folder of .html files
and it builds a search index, a JavaScript search API and a prebuilt UI into a subfolder.
Additionally, you can configure many things using either command-line arguments or a pagefind.yaml file.
The three settings I find most important are these:
site: public
output_subdir: pagefind
exclude_selectors:
- code
These tell Pagefind to look in public/ when building the index, to write its output to public/pagefind
and, crucially, to exclude code elements from indexing.
I originally included code elements in the index, but this led to a bad search experience with loads of false positive results.
Integrating Search
With the index and the integration JavaScript created by Pagefind, we can build a search page.
For me this is a simple page inside Hugo:
---
title: Search
showReadingTime: false
showToc: false
showAd: false
keywords:
- search
description: Search through my site
---
<div id="pagefind"></div>
<link href="/pagefind/pagefind-ui.css" rel="stylesheet" />
<script src="/pagefind/pagefind-ui.js"></script>
<script>
  window.addEventListener("DOMContentLoaded", (event) => {
    new PagefindUI({
      element: "#pagefind",
      pageSize: 20,
      showSubResults: true,
      showImages: false,
      excerptLength: 50,
      openFilters: ["tag"],
      autofocus: true,
      highlightParam: "highlight",
    });
  });
</script>
Tuning
Search is working and we could stop here, but we can further improve things.
Indexing
By default Pagefind includes all content on all pages in its index, which leads to things like post metadata, recommended posts or list views showing up in the results alongside the actual content.
Modifying Pagefind's indexing is thankfully very easy with a couple of HTML data attributes [1]:
data-pagefind-body
- Once this attribute is present on any page, only pages that have it are included in the index
- Additionally, only content inside the element carrying this attribute is indexed
data-pagefind-ignore
- Content inside any element with this attribute is excluded from the index
Metadata
We can add arbitrary metadata [2] to pages using data-pagefind-meta.
I chose to add created and modified.
There are many ways to do this. Since I already have elements with these values, I chose to capture them from those elements directly:
<meta property="created"
content="{{ $page.Date.Format "2006-01-02" }}"
data-pagefind-meta="created[content]" />
<meta property="modified"
content="{{ $page.Lastmod.Format "2006-01-02" }}"
data-pagefind-meta="modified[content]" />
If you don’t already have elements for your metadata, there are other ways to add it, for example inline as data-pagefind-meta="key:value" on any element.
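To check that the metadata actually lands in the index, it can be read back through the JavaScript search API that Pagefind generates next to the prebuilt UI. The following is only a minimal sketch under assumptions: the helper names toRow and searchWithDates are made up for illustration, and the index is assumed to be served from /pagefind/ as configured above.

```javascript
// Turn one loaded Pagefind result into a plain row for display.
// `created` and `modified` are the custom metadata keys captured above;
// `title` is metadata Pagefind collects on its own.
function toRow(data) {
  const meta = data.meta || {};
  return {
    url: data.url,
    title: meta.title || data.url,
    created: meta.created || "unknown",
    modified: meta.modified || meta.created || "unknown",
  };
}

// Run a query and load the first `limit` results (hypothetical helper).
// `pagefind` is the module returned by `await import("/pagefind/pagefind.js")`.
async function searchWithDates(pagefind, query, limit = 5) {
  const search = await pagefind.search(query);
  // Each result only carries an id and score until data() is awaited.
  const loaded = await Promise.all(
    search.results.slice(0, limit).map((result) => result.data())
  );
  return loaded.map(toRow);
}

// Browser-only usage; skipped when the file is run outside a browser.
if (typeof window !== "undefined") {
  import("/pagefind/pagefind.js")
    .then((pagefind) => searchWithDates(pagefind, "pagefind"))
    .then((rows) => console.table(rows));
}
```

Pages indexed without the attributes have no created or modified key at all, which is why the sketch falls back to a placeholder instead of assuming the values exist.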
Filters
We can specify filters for content [3], which allow users to narrow down searches even further.
For me, tags are a natural filter, and since I already render them, I can capture filter values from these elements directly:
{{ range .Params.tags |sort }}
<a href="{{ ($.Site.GetPage (printf "/%s/%s" "tags" (. | urlize ))).Permalink }}"
data-pagefind-filter="tag">#{{ . }}</a>
{{ end }}
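The prebuilt UI picks up the tag filter on its own, but the same filter can also be passed to Pagefind's JavaScript search API. Here is a minimal sketch, assuming the index is served from /pagefind/; tagFilter and searchByTag are hypothetical helper names. Because the filter value is taken from the link text, the values captured by the template above presumably include the leading #.

```javascript
// Build the options object for pagefind.search() from one tag.
// An empty tag means "no filter", so the search stays unrestricted.
function tagFilter(tag) {
  const value = (tag || "").trim();
  return value === "" ? {} : { filters: { tag: value } };
}

// Search within a single tag and load all matching pages (hypothetical helper).
// `pagefind` is the module returned by `await import("/pagefind/pagefind.js")`.
async function searchByTag(pagefind, query, tag) {
  const search = await pagefind.search(query, tagFilter(tag));
  return Promise.all(search.results.map((result) => result.data()));
}

// Browser-only usage; skipped when the file is run outside a browser.
if (typeof window !== "undefined") {
  import("/pagefind/pagefind.js").then(async (pagefind) => {
    // Lists every filter with its values and the number of pages per value.
    console.log(await pagefind.filters());
    // "#hugo" assumes the filter value was captured from link text like "#hugo".
    const pages = await searchByTag(pagefind, "search", "#hugo");
    console.log(pages.map((page) => page.url));
  });
}
```

Logging pagefind.filters() first is a quick way to see the exact values in the index before wiring them into a query.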
Automation
Having finished all of this, I tied it into my GitLab CI pipeline.
First, I added Pagefind to my custom Hugo image:
# ...
ARG PAGEFIND_VERSION="v1.4.0"
RUN cd /tmp && \
    wget https://github.com/Pagefind/pagefind/releases/download/${PAGEFIND_VERSION}/pagefind-${PAGEFIND_VERSION}-x86_64-unknown-linux-musl.tar.gz -O pagefind.tar.gz && \
    tar -xzf pagefind.tar.gz && \
    mv pagefind /usr/local/bin/pagefind && \
    rm pagefind.tar.gz
# ...
Then I run it inside my pipeline:
hugo:
  # ...
  image: registry.gitlab.com/mkamner/marco.ninja/hugo:latest
  script:
    # ...
    # Render the site using hugo
    - hugo build
    # Build the search index with Pagefind
    - pagefind
  # ...
  pages:
    publish: src/public
  environment: production