Custom org-sitemap-function post-Org 9.1

Published: December 27, 2022

For a number of years now, I have been lugging around an old version of org-mode (9.0, to be specific) in the git repo that builds this website. I did this because I had a custom org-sitemap-function to generate the landing page for my blog, and Org 9.1 introduced a breaking change to the org-publish API.

I have now finally gotten around to fixing this issue and making my website compatible with modern Emacs and Org versions (9.1 and later). However, porting my old sitemap function was… surprisingly difficult? So, in case anyone still stumbles on my original post these days, this post contains a sitemap function that works in 2022.

Before 9.1, the org-sitemap-function received the project plist as its argument. Since 9.1, the Org manual states:

:sitemap-function

Plug-in function to use for generation of the sitemap. It is called with two arguments: the title of the site-map and a representation of the files and directories involved in the project as a nested list, which can further be transformed using org-list-to-generic, org-list-to-subtree and alike. Default value generates a plain list of links to all files in the project.
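For comparison, a function with the new two-argument signature can be quite short. The sketch below does roughly what Org's default, org-publish-sitemap-default, does: it emits a title keyword and turns the list back into Org syntax with org-list-to-org. The name my-minimal-sitemap is made up, and the example assumes the flat list shape org-publish produces, where a type symbol such as unordered is followed by one-element items like ("[[file:a.org][A]]").

```elisp
(require 'org)

(defun my-minimal-sitemap (title list)
  "Return sitemap contents: a #+TITLE line for TITLE, then LIST as an Org list.
LIST is the nested list org-publish passes to `:sitemap-function', e.g.
  (unordered (\"[[file:a.org][A]]\") (\"[[file:b.org][B]]\"))"
  ;; org-list-to-org renders the nested list back into Org list syntax
  (concat "#+TITLE: " title "\n\n"
          (org-list-to-org list)))
```

Calling (my-minimal-sitemap "Blog" '(unordered ("[[file:a.org][A]]"))) should return the title line followed by a one-item Org list linking to a.org.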

The new signature makes it a little more difficult to produce the sitemap page I want, but I was able to get it done by parsing each element of the list and extracting the file path from its link with the following function:

(defun my-blog-parse-sitemap-list (l)
  "Convert the sitemap list L into a list of file names."
  (mapcar (lambda (i)
            ;; each item holds one link string such as "[[file:foo.org][Foo]]";
            ;; let Org parse it
            (let ((link (with-temp-buffer
                          (let ((org-inhibit-startup nil))
                            (insert (car i))
                            (org-mode)
                            (goto-char (point-min))
                            (org-element-link-parser)))))
              (when link
                (org-element-property :path link))))
          ;; skip the list type symbol (e.g. `unordered') at the head
          (cdr l)))
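Starting org-mode in a temporary buffer for every entry is a fairly heavy way to read one link. A lighter alternative, sketched below with hypothetical helper names, pulls the path out with a regular expression. It only understands plain [[file:PATH][TITLE]] links, not every link syntax Org accepts (escaped brackets in file names, for example), so treat it as a sketch rather than a drop-in replacement.

```elisp
(defun my-blog-sitemap-entry-path (entry)
  "Return the target of the first [[file:...]] link in ENTRY, or nil.
ENTRY is one sitemap item string such as \"[[file:a.org][A]]\"."
  ;; group 1 captures everything after "file:" up to the first "]"
  (when (string-match "\\[\\[file:\\([^]]+\\)]" entry)
    (match-string 1 entry)))

(defun my-blog-parse-sitemap-list-simple (l)
  "Return the file names linked from the top-level items of sitemap list L."
  ;; (cdr l) skips the list type symbol; items without a file link are dropped
  (delq nil (mapcar (lambda (item) (my-blog-sitemap-entry-path (car item)))
                    (cdr l))))
```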

Finally, the new and improved sitemap function looks like this:

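(defun my-blog-sort-article-list (l p)
  "Sort the article list L anti-chronologically, using project P.
`sort' is destructive, so callers must use the return value."
  (sort l (lambda (a b)
            (let ((d-a (org-publish-find-date a p))
                  (d-b (org-publish-find-date b p)))
              ;; newest first: A comes before B if B is older
              (time-less-p d-b d-a)))))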

(defun my-blog-sitemap (title list)
  "Generate the landing page for my blog."
  (with-temp-buffer
    ;; mangle the parsed list given to us into a plain lisp list of files
    (let* ((filenames (my-blog-parse-sitemap-list list))
           (project-plist (assoc "blog-articles" org-publish-project-alist))
           ;; `sort' is destructive: iterate over its return value,
           ;; not over FILENAMES
           (articles (my-blog-sort-article-list filenames project-plist)))
      (dolist (file articles)
        (let* ((abspath (file-name-concat my-website-blog-dir file))
               (relpath (file-relative-name abspath my-website-base-dir))
               (title (org-publish-find-title file project-plist))
               (date (format-time-string (car org-time-stamp-formats)
                                         (org-publish-find-date file project-plist)))
               (preview (my-blog-get-preview abspath)))
          ;; insert a horizontal line before every post, delete the first
          ;; one before returning
          (insert "-----\n")
          (insert (concat "* [[file:" relpath "][" title "]]\n"))
          ;; add properties for `ox-rss.el' here
          (let ((rss-permalink (concat (file-name-sans-extension relpath) ".html"))
                (rss-pubdate date))
            (org-set-property "RSS_PERMALINK" rss-permalink)
            (org-set-property "PUBDATE" rss-pubdate))
          ;; insert the date, preview, & read more link
          (insert (concat "Published: " date "\n\n"))
          (insert preview)
          (insert "\n")
          (insert (concat "[[file:" relpath "][Read More...]]\n"))))
      ;; delete the first hrule to make this look OK
      (goto-char (point-min))
      (delete-region (point) (line-beginning-position 2))
      ;; insert a title and return the page contents
      (insert "#+OPTIONS: title:nil\n")
      (insert "#+TITLE: Blog - Dennis Ogbe's Personal Website\n")
      (insert "#+AUTHOR: Dennis Ogbe\n")
      (insert "#+EMAIL: [email protected]\n")
      (buffer-string))))
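my-blog-sort-article-list looks up both dates on every comparison. A variant, sketched here with a hypothetical helper, looks each date up once, pairs it with the file name, and sorts the pairs newest first. Because sort rearranges its argument destructively, the sketch sorts a copy and only its return value is used.

```elisp
(defun my-blog-sort-newest-first (entries)
  "Return a copy of ENTRIES sorted newest first.
ENTRIES is a list of (FILE . TIME) pairs, where TIME is any Lisp time
value accepted by `time-less-p'."
  ;; `sort' reuses the list's conses, so sort a fresh copy instead of
  ;; scrambling the caller's list; A goes first when B is older.
  (sort (copy-sequence entries)
        (lambda (a b) (time-less-p (cdr b) (cdr a)))))
```

In my-blog-sitemap, the pairs could be built with (mapcar (lambda (f) (cons f (org-publish-find-date f project-plist))) filenames), and the sorted names recovered with (mapcar #'car ...).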

This info can be combined with the instructions in my original post to cook up your own very special org-mode website.

Update December 29, 2022: I realized that the information in the old post is badly outdated, so here is the current (late 2022) version of my website build script. I usually run it with emacs --batch, i.e., something like:

emacs --batch -l "./project.el" --eval="(org-publish \"blog\" t)"

As part of the build process, I use CSSTidy to minify my CSS and bibtex2html to generate the list of publications.

;; This file defines the org-publish project for my web site. -*- eval: (flycheck-mode -1) -*-

;; I can either run this from the build.sh script / Makefile or
;; evaluate this buffer and publish from within Emacs while editing
;; the page.

(defun generate-website (arg)
  "Generate my website. Call with prefix argument for a complete rebuild."
  (interactive "P")
  (message "Generating website for staging...")
  (if arg
      ;; force rebuild everything
      (org-publish "blog" t nil)
    ;; only rebuild what changed
    (org-publish "blog" nil nil))
  (message "Done. Check output in %s" my-website-out-dir))

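;; This snippet finds the parent of the directory containing this file,
;; which is the base directory of the project. `directory-file-name'
;; strips the trailing slash so that the second `file-name-directory'
;; yields the parent. `load-file-name' is only set while the file is
;; being loaded (e.g. with `emacs -l'), so fall back to `buffer-file-name'
;; when evaluating the buffer from within Emacs.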
(setq my-website-base-dir
      (file-name-as-directory
       (file-name-directory
        (directory-file-name
         (file-name-directory
          (or load-file-name buffer-file-name))))))

;; set up the rest of the directory tree
(defmacro my-website-set-path-var (name)
  "Set `my-website-NAME-dir' to the NAME subdirectory of `my-website-base-dir'."
  `(setq ,(intern (format "my-website-%s-dir" name))
         (file-name-as-directory (concat my-website-base-dir ,name))))
(my-website-set-path-var "bin")
(my-website-set-path-var "bib")
(my-website-set-path-var "blog")
(my-website-set-path-var "css")
(my-website-set-path-var "cv")
(my-website-set-path-var "dl")
(my-website-set-path-var "html")
(my-website-set-path-var "img")
(my-website-set-path-var "lisp")
(my-website-set-path-var "pages")

;; we pull the output directory out of an environment variable. If this
;; variable is not set, we fall back to the www/ subdirectory of the project
(setq my-website-out-dir (getenv "WEBSITE_OUT_DIR"))
(unless my-website-out-dir
  (setq my-website-out-dir (file-name-concat my-website-base-dir "www"))
  (message "Using default WEBSITE_OUT_DIR: %s" my-website-out-dir))
(setq my-website-out-dir (file-name-as-directory my-website-out-dir))

;; [2022-12-27 Tue] This is now compatible with Emacs 28.1. It
;; requires the `htmlize' and `org-contrib' packages.
(package-initialize)
(require 'org)
(require 'htmlize)
(require 'org-contrib)
(require 'ox-html)
(require 'ox-rss)

;; re-build the entire project if $WEBSITE_BUILD_TYPE=FULL
(when (and (getenv "WEBSITE_BUILD_TYPE")
           (string-equal (downcase (getenv "WEBSITE_BUILD_TYPE")) "full"))
  (setq org-publish-use-timestamps-flag nil))

;; html export settings
(setq org-html-coding-system 'utf-8-unix)
(setq org-html-htmlize-output-type 'css)

;; massage org-time-stamps
(setq org-time-stamp-custom-formats '("%B %d, %Y" . "%A, %B %d %Y, %H:%M"))
(defun my-org-export-ensure-custom-times (backend)
  (setq-local org-display-custom-times t))
(add-hook 'org-export-before-processing-hook 'my-org-export-ensure-custom-times)

;; we do not need backup files for this
(setq make-backup-files nil)

;; we evaluate some elisp to generate some html. this lets us do that.
(setq org-confirm-babel-evaluate nil)

(defun my-blog-extra-head (arg)
  (concat
   "<link rel='stylesheet' href='/../res/fonts.css' />\n" ; fonts
   "<link rel='stylesheet' href='/../res/code.css' />\n" ; code highlighting
   "<link rel='stylesheet' href='/../res/main.css' />\n" ; main css
   (when arg "<link rel='stylesheet' href='/../res/blog.css' />\n") ; blog style
   "<link rel='shortcut icon' href='/../img/favicon.ico'>\n" ; favicon
   "<link rel='alternate' type='application/rss+xml' title='RSS Feed for ogbe.net' href='/blog.xml' />\n"))

;; header and footer

(defun my-blog-header (info)
  (with-temp-buffer
    (insert-file-contents (concat my-website-html-dir "header.html"))
    (buffer-string)))

(setq my-blog-footer
      (with-temp-buffer
        (insert-file-contents (concat my-website-html-dir "footer.html"))
        (buffer-string)))

(defun my-blog-org-export-format-drawer (name content)
  (concat "<div class=\"drawer " (downcase name) "\">\n"
          "<h6>" (capitalize name) "</h6>\n"
          content
          "\n</div>"))

(setq my-blog-local-mathjax
      '((path "/mathjax/tex-chtml.js")
        (scale "100") (align "center") (indent "2em") (tagside "right") (autonumber "AMS")
        (mathml nil)))
(setq my-blog-extra-mathjax-config
      "<script>
MathJax = {
  tex: {
    inlineMath: [['$', '$'], ['\\\\(', '\\\\)']]
  },
  svg: {
    fontCache: 'global'
  }
};
</script>")

(defun my-blog-get-preview (file)
  "Return the text between the #+BEGIN_PREVIEW and #+END_PREVIEW lines of FILE.
The markers have to be on their own lines, preferably before and after paragraphs."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (let ((beg (+ 1 (re-search-forward "^#\\+BEGIN_PREVIEW$")))
          (end (progn (re-search-forward "^#\\+END_PREVIEW$")
                      (match-beginning 0))))
      (buffer-substring beg end))))

(defun my-blog-parse-sitemap-list (l)
  "Convert the sitemap list L into a list of file names."
  (mapcar (lambda (i)
            ;; each item holds one link string such as "[[file:foo.org][Foo]]";
            ;; let Org parse it
            (let ((link (with-temp-buffer
                          (let ((org-inhibit-startup nil))
                            (insert (car i))
                            (org-mode)
                            (goto-char (point-min))
                            (org-element-link-parser)))))
              (when link
                (org-element-property :path link))))
          ;; skip the list type symbol (e.g. `unordered') at the head
          (cdr l)))

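(defun my-blog-sort-article-list (l p)
  "Sort the article list L anti-chronologically, using project P.
`sort' is destructive, so callers must use the return value."
  (sort l (lambda (a b)
            (let ((d-a (org-publish-find-date a p))
                  (d-b (org-publish-find-date b p)))
              ;; newest first: A comes before B if B is older
              (time-less-p d-b d-a)))))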

(defun my-blog-sitemap (title list)
  "Generate the landing page for my blog."
  (with-temp-buffer
    ;; mangle the parsed list given to us into a plain lisp list of files
    (let* ((filenames (my-blog-parse-sitemap-list list))
           (project-plist (assoc "blog-articles" org-publish-project-alist))
           ;; `sort' is destructive: iterate over its return value,
           ;; not over FILENAMES
           (articles (my-blog-sort-article-list filenames project-plist)))
      (dolist (file articles)
        (let* ((abspath (file-name-concat my-website-blog-dir file))
               (relpath (file-relative-name abspath my-website-base-dir))
               (title (org-publish-find-title file project-plist))
               (date (format-time-string (car org-time-stamp-formats)
                                         (org-publish-find-date file project-plist)))
               (preview (my-blog-get-preview abspath)))
          ;; insert a horizontal line before every post, delete the first
          ;; one before returning
          (insert "-----\n")
          (insert (concat "* [[file:" relpath "][" title "]]\n"))
          ;; add properties for `ox-rss.el' here
          (let ((rss-permalink (concat (file-name-sans-extension relpath) ".html"))
                (rss-pubdate date))
            (org-set-property "RSS_PERMALINK" rss-permalink)
            (org-set-property "PUBDATE" rss-pubdate))
          ;; insert the date, preview, & read more link
          (insert (concat "/Published: " date "/\n\n"))
          (insert preview)
          (insert "\n")
          (insert (concat "[[file:" relpath "][/Read More.../]]\n"))))
      ;; delete the first hrule to make this look OK
      (goto-char (point-min))
      (delete-region (point) (line-beginning-position 2))
      ;; insert a title and return the page contents
      (insert "#+OPTIONS: title:nil\n")
      (insert "#+TITLE: Blog - Dennis Ogbe's Personal Website\n")
      (insert "#+AUTHOR: Dennis Ogbe\n")
      (insert "#+EMAIL: [email protected]\n\n")
      ;; the browser tab shows the #+TITLE above, the page shows this heading
      (insert "@@html:<h1>Blog</h1>@@\n\n")
      (buffer-string))))

;; pre- and post-processing
(defun my-blog-pages-preprocessor (project-plist)
  (message "In the pages preprocessor."))

(defun my-blog-pages-postprocessor (project-plist)
  (message "In the pages postprocessor."))

(defun my-blog-articles-preprocessor (project-plist)
  (message "In the articles preprocessor."))

(defun my-blog-articles-postprocessor (project-plist)
  "Massage the sitemap file and move it up one directory.

  for this to work, we have already fixed the creation of the
  relative link in the sitemap-publish function"
  (let* ((sitemap-fn (concat (file-name-sans-extension (plist-get project-plist :sitemap-filename)) ".html"))
         (sitemap-olddir (plist-get project-plist :publishing-directory))
         (sitemap-newdir (expand-file-name (concat (file-name-as-directory sitemap-olddir) "..")))
         (sitemap-oldfile (expand-file-name sitemap-fn sitemap-olddir))
         (sitemap-newfile (expand-file-name (concat (file-name-as-directory sitemap-newdir) sitemap-fn))))
    (with-temp-buffer
      (goto-char (point-min))
      (insert-file-contents sitemap-oldfile)
      ;; massage the sitemap if wanted

      ;; delete the old file and write the correct one
      (delete-file sitemap-oldfile)
      (write-file sitemap-newfile))))

(defun my-blog-articles-add-subheader (plist filename pub-dir)
  "Called after the publishing function, this adds a subheader to each blog post."
  (let* ((outfile (file-name-concat pub-dir (concat (file-name-base filename) ".html")))
         (date (format-time-string (car org-time-stamp-custom-formats) (org-publish-find-date filename plist)))
         (author (org-publish-find-property filename 'author plist)) ; unused
         (re (regexp-quote "<h1 class=\"title\">")))
    ;; open the outfile and splice publishing date into the generated HTML
    (with-temp-buffer
      (insert-file-contents outfile)
      (when (re-search-forward re nil t)
        (end-of-line)
        (insert (format "\n<div class=\"subheader\"><p><i>Published: %s</i></p></div>" date)))
      (write-file outfile))))

(defun my-blog-minify-css (project-plist)
  "Minify most of the CSS using CSSTidy."
  (let* ((csstidy (concat my-website-bin-dir "csstidy"))
         (csstidy-args " --template=highest --silent=true")
         (css-dir (expand-file-name (plist-get project-plist :publishing-directory)))
         (css-files (directory-files css-dir t "^.*\\.css$"))) ; CSSTidy does not work with the fonts file
    (dolist (file css-files)
      (unless (string-match-p (regexp-quote "fonts.css") file)
        (with-temp-buffer
          (insert (shell-command-to-string (concat csstidy " " file csstidy-args)))
          (write-file file))))))

;; emacs black magic. This code uses the bibtex2html binary to generate a
;; list of publications from my bibtex file. On the publications page, the
;; output appears as a table.
(defun generate-bib-html (relfile)
  (let ((bib2html-binary (concat my-website-bin-dir "bibtex2html"))
        (infile (concat my-website-bib-dir relfile))
        (tempfile (make-temp-file "emacs-bib2html")))
    (unwind-protect
        (progn
          ;; run bibtex2html with the correct flags
          (call-process bib2html-binary nil nil nil
                        "-noheader" "-nofooter" "-nodoc"
                        "-s" "ieeetr" "-d" "-r" "-nobiblinks"
                        "-nolinks" "-unicode"
                        "-o" tempfile infile)
          ;; massage the output
          (with-temp-buffer
            (insert-file-contents (concat tempfile ".html"))
            ;; make the table left-aligned. `replace-regexp' is only meant
            ;; for interactive use, so loop over the (literal) matches
            (goto-char (point-min))
            (while (search-forward "<table>" nil t)
              (replace-match "<table style=\"margin: 0 0 0 0; max-width:100%\">" t t))
            ;; highlight my name (FIXME might be better ways, but for now this works.)
            (goto-char (point-min))
            (while (search-forward "D.&nbsp;Ogbe" nil t)
              (replace-match "<b>D.&nbsp;Ogbe</b>" t t))
            (buffer-substring (point-min) (point-max))))
      ;; remove the temporary files, even if something above failed
      (delete-file tempfile)
      (delete-file (concat tempfile ".html")))))

;; finally, pull the project together in the `org-publish-project-alist'
(setq org-publish-project-alist
      `(("blog"
         :components ("blog-articles" "blog-pages" "blog-rss" "blog-css" "blog-images" "blog-dl"))
        ("blog-articles"
         :base-directory ,my-website-blog-dir
         :base-extension "org"
         :publishing-directory ,(concat my-website-out-dir "blog")
         :publishing-function (org-html-publish-to-html my-blog-articles-add-subheader)
         :preparation-function my-blog-articles-preprocessor
         :completion-function my-blog-articles-postprocessor
         :htmlized-source t ;; this enables htmlize, which means that I can use css for code!

         ;; n.b., these actually don't do anything because org mode
         ;; puts the information in the header, but I am overwriting
         ;; the header. leaving here anyway.
         :with-author t
         :with-creator nil
         :with-date t
         :with-timestamps nil

         :headline-levels 4
         :section-numbers nil
         :with-toc nil
         :with-drawers t
         :with-sub-superscript nil ;; important!!

         ;; the following removes extra headers from HTML output -- important!
         :html-link-home "/"
         :html-head nil ;; cleans up anything that would have been in there.
         :html-head-extra ,(my-blog-extra-head t)
         :html-head-include-default-style nil
         :html-head-include-scripts nil

         :html-format-drawer-function my-blog-org-export-format-drawer
         :html-home/up-format ""
         :html-mathjax-options ,my-blog-local-mathjax
         :html-mathjax-template ,(concat my-blog-extra-mathjax-config "<script type=\"text/javascript\" src=\"%PATH\"></script>")
         :html-footnotes-section "<div id='footnotes'><!--%s-->%s</div>"
         :html-link-up ""
         :html-link-home ""
         :html-preamble my-blog-header
         :html-postamble ,my-blog-footer

         ;; sitemap - list of blog articles
         :auto-sitemap t
         :sitemap-filename "blog.org"
         :sitemap-title "Blog"
         ;; custom sitemap generator function
         :sitemap-function my-blog-sitemap
         :sitemap-sort-files anti-chronologically
         :sitemap-date-format "Published: %a %b %d %Y")
        ("blog-pages"
         :base-directory ,my-website-pages-dir
         :base-extension "org"
         :publishing-directory ,my-website-out-dir
         :publishing-function org-html-publish-to-html
         :preparation-function my-blog-pages-preprocessor
         :completion-function my-blog-pages-postprocessor
         :htmlized-source t

         :with-author t
         :with-creator nil
         :with-date t
         :with-title nil
         :with-timestamps nil

         :headline-levels 4
         :section-numbers nil
         :with-toc nil
         :with-drawers t
         :with-sub-superscript nil ;; important!!

         ;; the following removes extra headers from HTML output -- important!
         :html-link-home "/"
         :html-head nil ;; cleans up anything that would have been in there.
         :html-head-extra ,(my-blog-extra-head nil)
         :html-head-include-default-style nil
         :html-head-include-scripts nil

         :html-format-drawer-function my-blog-org-export-format-drawer
         :html-home/up-format ""
         :html-mathjax-options ,my-blog-local-mathjax
         :html-mathjax-template ,(concat my-blog-extra-mathjax-config "<script type=\"text/javascript\" src=\"%PATH\"></script>")
         :html-footnotes-section "<div id='footnotes'><!--%s-->%s</div>"
         :html-link-up ""
         :html-link-home ""

         :html-preamble my-blog-header
         :html-postamble ,my-blog-footer)
        ("blog-rss"
         :base-directory ,my-website-blog-dir
         :base-extension "org"
         :publishing-directory ,my-website-out-dir
         :publishing-function org-rss-publish-to-rss
         :with-timestamps nil

         :html-link-home "https://ogbe.net/"
         :html-link-use-abs-url t

         :title "Dennis Ogbe"
         :rss-image-url "https://ogbe.net/img/feed-icon-28x28.png"
         :section-numbers nil
         :exclude ".*"
         :include ("blog.org")
         :with-toc nil)
        ("blog-css"
         :base-directory ,my-website-css-dir
         :base-extension ".*"
         :publishing-directory ,(concat my-website-out-dir "res")
         :publishing-function org-publish-attachment
         :completion-function my-blog-minify-css
         :recursive t)
        ("blog-images"
         :base-directory ,my-website-img-dir
         :base-extension ".*"
         :publishing-directory ,(concat my-website-out-dir "img")
         :publishing-function org-publish-attachment
         :recursive t)
        ("blog-dl"
         :base-directory ,my-website-dl-dir
         :base-extension ".*"
         :publishing-directory ,(concat my-website-out-dir "dl")
         :publishing-function org-publish-attachment
         :recursive t)))
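The minify step in my-blog-minify-css runs CSSTidy through shell-command-to-string, which goes through a shell and returns whatever the program printed, so the output of a failed run could end up written into the CSS file. A stricter alternative, sketched with a hypothetical my-run-to-string helper, calls the program directly with call-process, discards stderr, and signals an error on a non-zero exit status:

```elisp
(defun my-run-to-string (program &rest args)
  "Run PROGRAM with ARGS directly (no shell) and return its standard output.
Standard error is discarded.  Signal an error on a non-zero exit status."
  (with-temp-buffer
    ;; DESTINATION (t nil): stdout into this buffer, stderr discarded
    (let ((status (apply #'call-process program nil (list t nil) nil args)))
      (unless (eql status 0)
        (error "%s exited with status %s" program status))
      (buffer-string))))
```

Inside my-blog-minify-css, the call would then read (my-run-to-string csstidy file "--template=highest" "--silent=true"). Each argument is passed to CSSTidy as-is, so file names containing spaces need no quoting.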

Update January 02, 2023: I changed a few minor things; most notably, the publish date now appears in the published blog post itself. Also, thanks to a hint from G.M., who provided a fix for my old structure-template definition, I now have an updated structure template to generate the header for a blog post (see here for an explanation):

(require 'org-tempo)
(tempo-define-template "blog-header" ; just some name for the template
                       '("#+title: ?" n
                         "#+AUTHOR: Dennis Ogbe" n
                         "#+EMAIL: [email protected]" n
                         "#+DATE:" n
                         "#+STARTUP: showall" n
                         "#+STARTUP: inlineimages" n
                         "#+BEGIN_PREVIEW" n p n
                         "#+END_PREVIEW")
                       "<b"
                       "Insert blog header" ; documentation
                       'org-tempo-tags)
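The #+BEGIN_PREVIEW and #+END_PREVIEW lines in this template are the markers my-blog-get-preview searches for. That function signals an error when a post lacks them; a string-based sketch (the name is hypothetical) that returns nil instead, and is easy to try out on its own, could look like this:

```elisp
(defun my-blog-preview-from-string (text)
  "Return the lines of TEXT between #+BEGIN_PREVIEW and #+END_PREVIEW.
Return nil if either marker line is missing.  Markers match case-insensitively."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))
      ;; skip past the begin marker and its newline
      (when (re-search-forward "^#\\+BEGIN_PREVIEW\n" nil t)
        (let ((beg (point)))
          ;; stop right before the end marker line
          (when (re-search-forward "^#\\+END_PREVIEW$" nil t)
            (buffer-substring-no-properties beg (match-beginning 0))))))))
```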

License: CC BY-SA 4.0