Wordpress Static Site Generator

A little while ago I wrote a short piece (WordPress, but Faster) on trying to improve the speed, reliability and security of the https://linux.uk website. Things have moved on a little since then, and the code has ended up as a published plugin in the Wordpress directory.

Static vs Dynamic

Currently we seem to have two general design approaches for websites, each with its own set of pros and cons:

  • Static, the site is a collection of HTML and CSS pages. This generally results in lightning fast delivery, very few moving parts to go wrong, and no code to hack.
  • Dynamic, the site exists as a collection of assets in a database. Each page is delivered “on demand”, so the speed is limited by the rate at which the pages can be generated. There are lots of spinning plates that can go belly up at any time, and because it’s “code” and it’s “online”, there’s plenty of scope for someone to hack into your site.

Reading that, you might think “why does anyone use dynamic?!” The problem is that there are many features (shopping baskets, billing systems etc) which “need” to be dynamic, at least to some extent. Going further, dynamic systems tend to come with Content Management Systems that allow “users” to maintain the site, whereas static implementations tend to need programmers for maintenance.

So, caught between a rock and a hard place … or maybe a two legged tortoise and a blind roadrunner.

The “other” way …

So the progression is to implement the site as a dynamic site, getting all the benefits of dynamic design, user maintenance etc, and then to publish it at the last minute as a static site, gaining the speed, security etc. A dynamic to static converter if you will.

This is where the plugin comes in.

Make Me Static

Well, it had to have “a” name. Essentially it employs an external web crawler to scan a Wordpress site, storing details of every asset the site references (every web page, post, CSS file, JavaScript file etc) in a database. It then uses that database to synchronise what’s currently visible on the dynamic site with a static copy of the site stored within a Git repository. There’s a long list of reasons why this approach might be attractive:

  • You get to use Wordpress to design and maintain your site and its content
  • The result is a really fast, really reliable, difficult to hack version of the site
  • Each time you push changes from Wordpress to the Git repository, the repository stores a snapshot of the site as it was at that point in time. This means that not only can you tell exactly what was on your website at any point in time (ever), you can also restore your website back to any given point in time … or indeed generate a copy of your website for any given point in time. There are many reasons why you might want this that go way beyond just having “backups”, like being able to prove you did or didn’t offer a product at a given price, or that you didn’t “just” change your terms and conditions.
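To make the crawl step described above a little more concrete, here is a minimal sketch of how a crawler might pull the asset references out of a single page. It is not the plugin's actual code: the names `AssetCollector` and `same_site_assets` and the example.org URLs are made up for illustration, and it assumes only `href` and `src` attributes matter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collect every URL a page references via href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.assets.add(urljoin(self.base_url, value))


def same_site_assets(base_url, html):
    """Return the sorted referenced URLs that live on the same host as base_url."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return sorted(u for u in collector.assets if urlparse(u).netloc == host)


# Hypothetical page, used only to exercise the sketch.
page = """
<html><head><link rel="stylesheet" href="/style.css"></head>
<body><a href="/about/">About</a>
<script src="https://cdn.example.net/lib.js"></script>
<img src="logo.png"></body></html>
"""
assets = same_site_assets("https://example.org/blog/", page)
for url in assets:
    print(url)
```

A real crawler would then fetch each of those URLs in turn and repeat the process, recording what it finds as it goes.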

How does it look?

Well, something like this …

Profiles

You can have multiple “profiles”, each of which will generate its own static “version” of the site. This can be handy if you want to generate a “test” version of the site just to see how it looks. As some plugins don’t play well when presented as static, testing them in static form is less risky than deploying them directly to a live site.

Crawlers

Profiles are allocated a “crawler”, which allows the load to be distributed across the back-end, so in theory the system will scale almost infinitely. (There is an invisible load-balanced directory server that sits between the Internet and the crawlers and handles the allocation of crawlers to profiles.)

Scanning modes

There are a number of ways in which the site can be scanned. The “default” mechanism is to scan the entire site for changes and to publish the difference. This “can” take minutes, or even tens of minutes for very large sites.

The second mode is a simple update which only looks at changed items. For example, if you correct typos in a page and you know your changes don’t affect any other pages, you can run an update to push just that one page, which can take as little as 10 seconds.

The third mode performs the same task as the “default” mechanism, but also interrogates the Git repository and removes any obsolete items or items that no longer appear on the site.
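The difference between the three modes can be sketched roughly as follows. This is an illustration under assumptions rather than the plugin's real logic: `digest` and `plan_sync` are hypothetical names, and comparing SHA-256 hashes is just one plausible way of spotting changed content.

```python
import hashlib


def digest(content: bytes) -> str:
    """Hash page content so changes can be detected cheaply."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(crawled, stored, prune=False):
    """Work out what to write to (and delete from) the static copy.

    crawled: path -> bytes just fetched from the dynamic site
    stored:  path -> digest of what the repository currently holds
    prune:   also remove items that no longer appear on the site
    """
    to_write = sorted(p for p, body in crawled.items() if stored.get(p) != digest(body))
    to_delete = sorted(set(stored) - set(crawled)) if prune else []
    return to_write, to_delete


# Hypothetical repository state and crawl results.
stored = {
    "index.html": digest(b"old"),
    "about.html": digest(b"about"),
    "gone.html": digest(b"x"),
}
crawled = {"index.html": b"new", "about.html": b"about", "new.html": b"hi"}

full_scan = plan_sync(crawled, stored)                      # default mode
update = plan_sync({"about.html": b"about v2"}, stored)     # single-page update
full_prune = plan_sync(crawled, stored, prune=True)         # third mode
print(full_scan, update, full_prune)
```

In this sketch the update mode is simply the same comparison run over the handful of pages you know have changed, which is why it can be so much quicker than a full scan.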

Scanning speed and features

By default the crawler scans one page every two seconds, to try to avoid overloading the site being scanned. If however you’re not worried about the load on your Wordpress server, you can turn throttling off and scan across multiple threads concurrently, which speeds things up a fair bit.
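As a rough sketch of the two speeds, assuming a `fetch` function is passed in (here a stand-in that does no networking), throttled and unthrottled scanning might look something like this. The function names and the worker count are illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scan_throttled(urls, fetch, delay=2.0):
    """Fetch pages one at a time, pausing `delay` seconds between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)  # be gentle with the server being scanned
        results[url] = fetch(url)
    return results


def scan_unthrottled(urls, fetch, workers=4):
    """Fetch pages concurrently across several threads, with no pause."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `urls`.
        return dict(zip(urls, pool.map(fetch, urls)))


# Stand-in for a real HTTP request, so the sketch runs without a network.
def fake_fetch(url):
    return len(url)


urls = ["/a", "/bb", "/ccc"]
slow = scan_throttled(urls, fake_fetch, delay=0.01)
fast = scan_unthrottled(urls, fake_fetch, workers=2)
print(slow == fast)
```

Both functions return the same results; the only difference is how hard they lean on the server while getting them.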

Notifications

There is also an optional integration for “Webpushr”, which is a service designed to notify end users when sites in which they have expressed an interest have been updated. If you turn this on, each time you push changes to the static version of the site, users “subscribed” to the site will get a desktop notification that you have pushed new content.

Anyone using large social media platforms like Facebook, Youtube etc, will probably be familiar with this concept :wink:

Dynamic features

Some dynamic features cannot be implemented directly on a static site.

Forms

Form plugins, specifically those using AJAX to post form details, should still work as they will still point at the Wordpress server and are not converted to point to the static site.

Search

The same applies to the Wordpress search feature. The only caveat here is that search results will contain URLs which reference the original Wordpress site, so anyone using the search and drilling into the results will end up pulling pages from the dynamic Wordpress instance. We have a medium term solution for this, but in the short term it’s likely to be a relatively tiny amount of dynamic traffic.
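For anyone wondering what “pointing a URL at the static site” involves, the general idea is just a host swap. The sketch below is purely illustrative and is not the medium term solution mentioned above; `to_static` and both hostnames are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_static(url, dynamic_host, static_host):
    """Rewrite a URL on the dynamic host so it points at the static host."""
    parts = urlsplit(url)
    if parts.netloc != dynamic_host:
        return url  # leave third-party links alone
    return urlunsplit(parts._replace(netloc=static_host))


# Hypothetical hostnames for the dynamic and static copies of a site.
rewritten = to_static("https://wp.example.org/post/1/?x=1", "wp.example.org", "www.example.org")
untouched = to_static("https://other.example.net/page", "wp.example.org", "www.example.org")
print(rewritten)
print(untouched)
```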

What’s it got to do with Linux?

Well, everything really. It was written on Linux systems, for Linux systems, to work with an Open Source Content Management System (Wordpress) which is used to power our website (https://linux.uk)

The plugin itself is also Open Source.

You can find it in the Wordpress Plugin Directory here:

Here’s the Freebie!

I’m looking for 3 testers. In return I have three free Wordpress websites, all setup, hosting and support included. You can either provide your own domain or you can have a domain ending in “linux.co.uk”. First three people who can convince me they’re actually going to use the site and provide a review for the plugin :slight_smile:

Last but not least …

Those of you who’ve read my other recent posts will know what’s coming next … the infrastructure is all running on Raspberry Pis, the code is all written in Python (and JavaScript), and the database is powered by my (Python based) Database :scream:

The really tekky bit

But just to add a little detail, the entire project was written partially with one eye on promoting an (Open Source) development framework I’ve written called “Orbit”.

The crux of the design is the ability to map database fields onto reactive JavaScript variables via secure websockets. In context this means that the MMS Control panel shows a fair bit of potentially “live” information, yet no explicit code is written to update it, other than the mapping via the Orbit Framework. On the backend, crawlers and other tasks simply update database tables with no specific knowledge of the front-end. The framework takes care of updating users’ screens (in real-time) from these changes without needing any application logic to do so. The update process is “very” efficient and only updates changed information.

If for example you have a table showing 20 rows and 1 row is changed in the database, only 1 row is sent to the browser, i.e. it “knows” what’s on the screen and transmits only the bare minimum. It also handles multiple users looking at the same thing, so all users’ views are always in sync.
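The “only send what changed” idea can be illustrated with a few lines of Python. To be clear, this is not Orbit's API or implementation, just a sketch of the principle; `changed_rows` and the row layout are invented for the example.

```python
def changed_rows(on_screen, current):
    """Compare what a browser is showing with the database and return the minimal update.

    on_screen / current: row id -> row contents (a dict of fields)
    """
    upserts = {rid: row for rid, row in current.items() if on_screen.get(rid) != row}
    deletes = [rid for rid in on_screen if rid not in current]
    return {"upsert": upserts, "delete": deletes}


# 20 rows on screen, then a crawler marks one of them as done in the database.
before = {i: {"url": f"/page/{i}", "status": "queued"} for i in range(20)}
after = {i: dict(row) for i, row in before.items()}
after[7]["status"] = "done"

update = changed_rows(before, after)
print(update)
```

Only row 7 ends up in the update, so that is all that would need to go down the websocket to each browser looking at the table.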

If anyone is vaguely interested in what Orbit is all about, the docs are here:

https://zerodocs.madpenguin.uk/#/

(ZeroDocs is an application written using Orbit)

MP, you are a genius. I can’t pretend to understand it but what it achieves is very impressive.

Ok, well leaving speed, reliability and all the bells and whistles aside for a second, there are exploits hitting the news on a regular basis, detailed here for example:

So multiple issues per day. If you run a static version of a Wordpress site, then there is essentially “no Wordpress” as far as the exposed site is concerned, so most of these security issues evaporate :slight_smile:

The plugin provides a button within Wordpress that you can push. This one push takes a snapshot of the Wordpress site and publishes the snapshot as a static site (typically within seconds or low minutes).

Typical Wordpress sites are dynamic: what you are seeing is the state of the site following the last update or edit that was applied. If you look at https://linux.uk, this is a point in time snapshot of the actual Wordpress site - but it’s not actually “a” Wordpress site :slight_smile:

Yep: that’s more or less what I got out of your script. Making the site static definitely seems the safest way to go. Aren’t other sites doing this, too, if it’s a known vulnerability?

Yes and no. There are a few plugins, written in PHP, which attempt to produce a static copy of your site in a local file-system, which you can then set about publishing as a static site via the mechanism of your choice.

Issues I found:

  • None of the plugins I tried seemed able to handle linux.uk; either it’s too big or there are issues with the plugins on that site
  • They all seemed relatively slow
  • The resulting static sites were problematic in the way they handled content that couldn’t be made static (searching, forms etc)
  • They typically only store the site in a filesystem and it’s then up to you to publish it, i.e. they’re aimed at being a “one-shot” periodic process.

Seemed like a “gap in the market”. There is one other plugin doing a similar thing that launched in July; however, as far as I can see, their service isn’t free …