I’ve been using pgeu-system for conference organization since 2000. One feature I wish it had: exporting the schedule to HTML. The primary reason is to decouple the schedule from the software, so the schedule stays available on the website when the software is not running. That is, instead of pulling the page from the database, you’re serving it from static HTML files.
Such a feature is useful should you decide to use different software at some future date. Or, in my case, when you don’t want to continue maintaining the software but want the schedule to remain in place.
In this post:
- FreeBSD 14.1
- pgeu-system from 2024-08-21 (they usually don’t do releases)
- nginx 1.26.2
- I am not showing any cd commands; the current working directory is shown in the command prompt, and that should guide you.
- I did this work on the beta version of my website, not production.
Background
The conversion to HTML involves two parts:
- fetch the data
- adjust links
The content for PGCon 2020 lives at /usr/local/www/pgcon.org/2020/content/schedule
First, fetch the website
The directories shown have had the /usr/local/www/pgcon.org/ prefix trimmed so it fits better on the page.
This wget command, supplied to me by the BSDCan Team (who used it for the BSDCan website, which also ran the same pgeu-system software), fetches the HTML from the system.
[15:40 2020/content/schedule] % wget -m -p -k -r -np --reject-regex='/\/\//' https://www.pgcon.org/events/pgcon_2020/schedule/
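For anyone not fluent in wget's short flags, here is the same command spelled with long option names. This is only a sketch: mirror_schedule is a hypothetical helper name, and the --reject-regex pattern is copied unchanged from the command above.

```shell
#!/bin/sh
# Same mirror command as above, using wget's long option names.
# mirror_schedule is a hypothetical helper name, not part of pgeu-system.
mirror_schedule() {
    # --mirror           (-m)  recursion plus timestamping, unlimited depth
    # --page-requisites  (-p)  also fetch the CSS and images each page needs
    # --convert-links    (-k)  rewrite links in saved pages for local viewing
    # --recursive        (-r)  already implied by --mirror; kept to match the original
    # --no-parent        (-np) never climb above the starting directory
    # --reject-regex           skip URLs matching the pattern (copied unchanged)
    wget --mirror --page-requisites --convert-links --recursive --no-parent \
        --reject-regex='/\/\//' "$1"
}

# Usage, run from the directory that should receive the mirror:
#   mirror_schedule https://www.pgcon.org/events/pgcon_2020/schedule/
```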
Move some stuff around
Move the index over:
[15:41 2020/content/schedule] % mv -i www.pgcon.org/events/pgcon_2020/schedule/index.html .
Move session and speaker information:
[15:38 2020/content/schedule/www.pgcon.org/events/pgcon_2020/schedule] % mv session speaker /usr/local/www/pgcon.org/2020/content/schedule
Move speaker photos into the speaker directory:
[15:39 2020/content/schedule/www.pgcon.org/events/speaker] % mv * /usr/local/www/pgcon.org/2020/content/schedule/speaker
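If you have several years to convert, the three moves above could be collected into one function. A minimal sketch, assuming the mirrored tree has the same layout as shown in the prompts above; relocate_mirror and its two arguments are names made up for illustration.

```shell
#!/bin/sh
# relocate_mirror is a hypothetical helper that repeats the three moves above.
# $1 = directory where wget was run (the schedule directory)
# $2 = event name as it appears in the mirrored path, e.g. pgcon_2020
relocate_mirror() {
    dest=$1
    mirror="$dest/www.pgcon.org/events"

    # the schedule index
    mv -i "$mirror/$2/schedule/index.html" "$dest/" || return 1
    # session and speaker pages
    mv -i "$mirror/$2/schedule/session" "$mirror/$2/schedule/speaker" "$dest/" || return 1
    # speaker photos, which wget stored outside the schedule tree
    mv -i "$mirror"/speaker/* "$dest/speaker/"
}

# Usage:
#   relocate_mirror /usr/local/www/pgcon.org/2020/content/schedule pgcon_2020
```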
Save your work
At this point, I saved what I had into my repo, so I have a fallback position when updating links.
Link updating
Now we update the links.
NOTE: your version of sed may not want the -i '' option. GNU sed, for example, takes only -i.
These changes cover three areas:
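One way to paper over the -i difference is a small wrapper that checks which sed it is running against. This is a sketch, not something I used for this work; sed_inplace is a hypothetical name, and the detection relies on GNU sed accepting --version while BSD sed rejects it.

```shell
#!/bin/sh
# sed_inplace is a hypothetical wrapper name.
# GNU sed accepts --version; BSD sed (FreeBSD, macOS) rejects it as an illegal option.
if sed --version >/dev/null 2>&1; then
    sed_inplace() { sed -i "$@"; }      # GNU: -i takes no separate argument
else
    sed_inplace() { sed -i '' "$@"; }   # BSD: -i needs a (possibly empty) backup suffix
fi

# Usage:
#   sed_inplace 's|../../../media/css/pgeu.css|media/css/pgeu.css|' index.html
```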
- schedule home directory
- speakers
- sessions
With more fancy sed commands, you might be able to combine these steps. That just gets too complex for one-off operations like this. I prefer simple things.
Adjust css link:
[15:43 2020/content/schedule] % sed -i '' 's|../../../media/css/pgeu.css|media/css/pgeu.css|' index.html
Make the home directory link to the website (be it your beta or production website, no need to mention the hostname when it may vary):
[15:43 2020/content/schedule] % sed -i '' 's|li class="nav-item p-2"><a href="https://www.pgcon.org/" title="Home">|li class="nav-item p-2"><a href="/" title="Home">|g' index.html
Remove the link for events:
[15:45 2020/content/schedule] % sed -i '' 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/events/" title="Events">Events</a></li>||' index.html
Remove the link for Your account:
[15:46 2020/content/schedule] % sed -i '' 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/account/" title="Your account">Your account</a></li>||' index.html
Repeat the above changes for speakers:
[15:46 2020/content/schedule/speaker] % sed -i '' 's|../../../../../media/css/pgeu.css|../../media/css/pgeu.css|' */index.html
[15:46 2020/content/schedule/speaker] % sed -i '' 's|img class="speaker-photo" src="../../../../speaker|img class="speaker-photo" src="../|' */index.html
[15:46 2020/content/schedule/speaker] % sed -i '' 's|li class="nav-item p-2"><a href="https://www.pgcon.org/" title="Home">|li class="nav-item p-2"><a href="/" title="Home">|g' */index.html
[15:47 2020/content/schedule/speaker] % sed -i '' 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/account/" title="Your account">Your account</a></li>||' */index.html
[15:47 2020/content/schedule/speaker] % sed -i '' 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/events/" title="Events">Events</a></li>||' */index.html
And then for sessions:
[15:49 2020/content/schedule/session] % sed -i '' 's|../../../../../media/css/pgeu.css|../../media/css/pgeu.css|' */index.html
[15:49 2020/content/schedule/session] % sed -i '' 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/events/" title="Events">Events</a></li>||' */index.html
[15:49 2020/content/schedule/session] % sed -i '' 's|li class="nav-item p-2"><a href="https://www.pgcon.org/" title="Home">|li class="nav-item p-2"><a href="/" title="Home">|g' */index.html
[15:50 2020/content/schedule/session] % sed -i '' 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/account/" title="Your account">Your account</a></li>||' */index.html
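The CSS, Home, Events, and Your account edits repeat in all three places, with only the CSS path changing by directory depth. If you would rather run them in one pass, several -e expressions can go into a single sed call. A minimal sketch: fix_links is a hypothetical helper, it writes through a temporary file instead of using -i so it behaves the same with either sed, and it leaves out the speaker photo substitution, which applies only to the speaker pages.

```shell
#!/bin/sh
# fix_links is a hypothetical helper; the substitutions are the ones used above.
# $1 = CSS path as found in the file, $2 = its replacement, rest = files to edit
fix_links() {
    old_css=$1; new_css=$2; shift 2
    for f in "$@"; do
        # write to a temporary file, then replace the original on success
        sed -e "s|$old_css|$new_css|" \
            -e 's|li class="nav-item p-2"><a href="https://www.pgcon.org/" title="Home">|li class="nav-item p-2"><a href="/" title="Home">|g' \
            -e 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/events/" title="Events">Events</a></li>||' \
            -e 's| <li class="nav-item p-2"><a href="https://www.pgcon.org/account/" title="Your account">Your account</a></li>||' \
            "$f" > "$f.new" && mv "$f.new" "$f"
    done
}

# Usage, from the session directory:
#   fix_links '../../../../../media/css/pgeu.css' '../../media/css/pgeu.css' */index.html
```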
Now is a good time to check your work. Then check it in.
My main check: running that original wget command on the new HTML files and monitoring the logs for any errors, especially 404s.
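To make the 404 hunt less of an eyeball exercise, the access log can be summarized after the wget run. This is a sketch under assumptions: list_404s is a hypothetical helper, and it expects nginx's default "combined" log format, where the ninth whitespace-separated field is the status code and the seventh is the request path.

```shell
#!/bin/sh
# list_404s is a hypothetical helper. It assumes nginx's default "combined"
# log format: field 9 is the status code, field 7 is the request path.
list_404s() {
    # count each missing path, most frequent first
    awk '$9 == 404 { print $7 }' "$1" | sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption; adjust to your nginx configuration):
#   list_404s /var/log/nginx/access.log
```

No output means no 404s were logged in that file.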
What about the old URLs?
A basic HTML premise: once you have a URL out there, people will use it. If you change it, and you want them to find the new stuff, do a redirect. These are the nginx redirects I’m using. They rewrite the incoming URL, if applicable, to the new URL and return 301, letting the client know that this is a permanent relocation.
# handle the conversion from pgeu to static files
rewrite ^/events/pgcon_2020/schedule/(.*)$ /2020/schedule/$1 permanent;
rewrite ^/events/pgcon_2021/schedule/(.*)$ /2021/schedule/$1 permanent;
rewrite ^/events/pgcon_2022/schedule/(.*)$ /2022/schedule/$1 permanent;
rewrite ^/events/pgcon_2023/schedule/(.*)$ /2023/schedule/$1 permanent;
I’m sure they can all be handled with one rewrite, but like I said, I like to keep some things simple and clear.
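For the curious, a single rule would likely capture the year as well as the rest of the path. The sketch below shows a candidate directive in a comment, untested against my nginx configuration, and exercises the same pattern with sed -E so you can see which paths it would rewrite. map_old_url is a hypothetical helper, not part of nginx.

```shell
#!/bin/sh
# Candidate single rule, untested against the configuration in this post:
#   rewrite ^/events/pgcon_(202[0-3])/schedule/(.*)$ /$1/schedule/$2 permanent;
# map_old_url is a hypothetical helper that applies the same pattern with
# sed -E, only to show which paths it would and would not rewrite.
map_old_url() {
    printf '%s\n' "$1" | sed -E 's|^/events/pgcon_(202[0-3])/schedule/(.*)$|/\1/schedule/\2|'
}

map_old_url /events/pgcon_2021/schedule/session/1/   # prints /2021/schedule/session/1/
map_old_url /events/pgcon_2019/schedule/             # printed unchanged: 2019 is outside the range
```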
Hope this helps.