Alejandro Ciniglio

A tool to convert a Curl request to Ansible's URI module


Some brief notes to self on how Hugo is working

These are some notes on how Hugo works for this website in particular. I haven’t found anywhere that enumerates these things succinctly, so I’m attempting to do that here.

  • Hugo reads the theme from the config.toml file. From there, it looks in the themes directory for a folder with the theme name (this folder could, but doesn’t have to, contain a theme.toml file).

  • The index view renders the layout found for the theme in $THEME_NAME/layouts/_default/baseof.html. In my theme, this layout loads partials for the document’s <head>, a sidebar, and a footer, then delegates to the defined main block. main could be defined in any layout file; I define it in the index.html file in my site layout (located at $SITE_ROOT/layouts/index.html).

  • Hugo will search for layouts using a priority-based lookup that depends on several parameters. For example, the home layout search path is as follows (note that project-specific layouts are preferred to theme layouts):

    layouts/index.html.html
    themes/<THEME>/layouts/index.html.html
    layouts/home.html.html
    themes/<THEME>/layouts/home.html.html
    layouts/list.html.html
    themes/<THEME>/layouts/list.html.html
    layouts/index.html
    themes/<THEME>/layouts/index.html
    layouts/home.html
    themes/<THEME>/layouts/home.html
    layouts/list.html
    themes/<THEME>/layouts/list.html
    layouts/_default/index.html.html
    themes/<THEME>/layouts/_default/index.html.html
    layouts/_default/home.html.html
    themes/<THEME>/layouts/_default/home.html.html
    layouts/_default/list.html.html
    themes/<THEME>/layouts/_default/list.html.html
    layouts/_default/index.html
    themes/<THEME>/layouts/_default/index.html
    layouts/_default/home.html
    themes/<THEME>/layouts/_default/home.html
    layouts/_default/list.html
    themes/<THEME>/layouts/_default/list.html
    
  • Other than the home page, layouts are not searched for in the root of the layouts directory.

  • Partials are found in one of two places:

    layouts/partials/*<PARTIALNAME>.html
    themes/<THEME>/layouts/partials/*<PARTIALNAME>.html

  • You can use subdirectories, and refer to them in the partial call, e.g. {{ partial "subdir/footer.html" }}

  • Content types are set either by the subdirectory under content or by explicitly setting type in the front matter of the content. E.g. this post is at content/posts/hugo_workings.md, which makes it a posts content type. I could change that by setting type: othertype in the YAML front matter.

  • content/_index.md is the content for the home page

  • _index.md vs index.md: _index.md means that the content and containing folder refer to a list and a “branch” page bundle (i.e. contains more sub pages); index.md means that the content and containing folder are a single page bundle (a “leaf” bundle).

  • The .Permalink property on a post generates a link which loads the single.html layout with the contents of a single post.

  • Setting up CSS and JS processing is best done with an external pipeline like webpack. Webpack can then be configured to start the hugo server for us, so that our single entry point for dev and building is via Yarn. Something like this sets your environment in webpack.config.js:

      function set() {
        switch (process.env.APP_ENV) {
        case 'dev':
          return {
            watch: true,
            filename: '[name]',
            command: 'hugo serve --buildDrafts=true'
          }
        default:
          return {
            watch: false,
            filename: '[name].[hash]',
            command: 'hugo'
          }
        }
      }
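
To tie that into the rest of the config, something like this rough sketch could work (the entry and output paths are placeholders, and it assumes webpack 4+’s hooks API plus Node’s built-in child_process):

      const { spawn } = require('child_process');
      const env = set();

      module.exports = {
        watch: env.watch,
        entry: './src/index.js',                    // placeholder entry point
        output: { filename: env.filename + '.js' },
        plugins: [
          {
            // Tiny inline plugin: once webpack has emitted assets, launch the
            // hugo command chosen above (hugo serve in dev, plain hugo otherwise).
            apply: (compiler) => {
              let started = false;
              compiler.hooks.afterEmit.tap('RunHugo', () => {
                if (started) return; // don't respawn hugo on every rebuild
                started = true;
                spawn(env.command, { shell: true, stdio: 'inherit' });
              });
            }
          }
        ]
      };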
    
Then in `package.json`, add a scripts block to call webpack:

    "scripts": {
      "build": "webpack -p",
      "start": "APP_ENV=dev webpack"
    },

This way, I can use `yarn start` for development and `yarn build` to package the site.

Fall back to default page when nginx proxy fails

I started using React Router for the first time on a new web app recently. React Router loads when your base page loads and then handles loading subviews as the URL changes. If you’re using a single JS bundle, this means that your server only serves the initial index page to the client.

This is easy to handle if the user will always start the app at the root page (e.g. http://example.com/), however, if they’ve navigated and want to refresh (or if they’re navigating back to the app via a bookmark), we need to handle that correctly on the server side.

For example, if a user navigates to http://example.com/subscribe, the server needs to return the index page that contains react router, then react router will load the subscribe view.
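
On the client, the routing might look something like this minimal sketch (using react-router-dom’s v4-style API; the Home and Subscribe components here are just placeholders):

import React from 'react';
import ReactDOM from 'react-dom';
import { BrowserRouter, Route, Switch } from 'react-router-dom';
import Home from './Home';           // placeholder components
import Subscribe from './Subscribe';

ReactDOM.render(
  <BrowserRouter>
    <Switch>
      <Route exact path="/" component={Home} />
      <Route path="/subscribe" component={Subscribe} />
    </Switch>
  </BrowserRouter>,
  document.getElementById('root')
);

// The matching happens entirely in the browser, which is why the server has to
// hand back the index page for /subscribe (and every other app route) too.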

Serving the index page regardless of the requested path is straightforward with nginx[1]:

location / {
    try_files index.html =404;
}

Things get a bit more hairy when we throw a backend API server into the mix.

In my case, I have an API backend that I want to call from the client on the same URL, e.g. POST http://example.com/user/1/subscriptions.

This backend is a separate server listening on the machine, so we’ll use nginx’s proxy_pass to send requests and return them to the frontend.

The easiest way I found to do this was to make a separate named location block and refer to it from the main location block.

Failing attempts

This initial attempt fails to render anything other than API results.

location / {
    try_files @api index.html =404;
}

location @api {
    proxy_pass http://127.0.0.1:3000;
}

To fix this, we need to try to serve static files if they exist, and only proxy to the API if there is no file with that name[2].

location / {
    try_files $uri $uri/ @api index.html =404;
}

location @api {
    proxy_pass http://127.0.0.1:3000;
}

This works for the initial load, but now our refresh case doesn’t work because, e.g., there’s no file named subscribe, so /subscribe gets proxied to the API, but that’s not a valid API route either.

Both API calls and navigation work correctly

To get the navigation to work again, we’ll have our API return 404 for routes it can’t handle, and we’ll have nginx serve index.html whenever the API responds with a 404.

The working block is below; note that we also set some headers as good practice (X-Forwarded-For can be used by the API server to determine the original client).

proxy_intercept_errors on tells nginx that it should handle error responses from the proxied server itself (via error_page), instead of passing them straight through to the client.

location / {
    try_files $uri $uri/ @api index.html;
}

location @api {
    proxy_intercept_errors on;
    proxy_set_header X-Real-IP  $remote_addr;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://127.0.0.1:3000;
    error_page 404 /index.html;
}

  1. The =404 at the end tells nginx to return HTTP 404 if index.html isn’t found on disk (which should never happen). ↩︎

  2. This means our API endpoints can’t have the same path as a real file, for better or worse. ↩︎

Automate your ssl certificate renewal: renew_cert.sh

Let’s Encrypt offers free SSL certificates, so there’s no reason not to support HTTPS on your site. Getting set up is easy (I used this guide), but you still have to renew your certificate every 90 days.

I pulled this script together from the acme-tiny instructions:

#!/bin/bash
set -eufx -o pipefail

# Show when the certificate was last refreshed (if ever).
echo "Last refreshed: " \
    $(cat ~/.certrefresh 2> /dev/null || echo "never")

cd certinfo/

# Ask Let's Encrypt to sign our CSR; acme-tiny answers the ACME challenge
# from the directory that nginx already serves.
python acme_tiny.py \
       --account-key ./account.key \
       --csr ./domain_and_subdomain.csr \
       --acme-dir /www/acmecert/YOUR_DOMAIN/challenges/ \
       > ./signed.crt

# Build the full chain: our freshly signed cert plus Let's Encrypt's intermediate.
wget -O - https://letsencrypt.org/certs/lets-encrypt-x3-cross-signed.pem \
    > intermediate.pem
cat signed.crt intermediate.pem > chained.pem
sudo cp chained.pem /etc/ssl/YOUR_DOMAIN/

# Reload nginx so it picks up the new certificate.
sudo /etc/init.d/nginx reload

# Record when this refresh happened for next time.
date > ~/.certrefresh

Using jbuilder to run protoc

I’m starting on a new OCaml project, and I wanted to use protocol buffers as my serialization format. ocaml-protoc seems like a fairly well-adopted protobuf compiler for OCaml, but figuring out how to get it to run automatically as part of a jbuilder build was not particularly well documented.

Jbuilder behaves somewhat like make (and ironically uses a lisp-ish configuration language). Given this, I made a rule whose output would be the generated .ml and .mli files, and required those resulting files to be installed (causing them to get generated at build time).

I added these two stanzas to my jbuild file:

(rule
 ((targets (messages_types.mli messages_types.ml
            messages_pb.mli messages_pb.ml))
  (deps (messages.proto))
  (action
       (run ocaml-protoc -binary -ml_out ./ ${<}))))

(install
  ((section etc)
   (files (messages_*.mli messages_*.ml))))

I’m still getting used to jbuilder, so I’m sure there are less repetitive ways of getting this done. If you have any suggestions, let me know!

Snake Case Elisp

I’ve been using OCaml for some fun stuff lately (including generating this site). However, it only came to my attention recently that the style guide recommends snake_case for functions and variables. After doing so much Java, my default is to CamelCase everything, so I had some work ahead of me to clean things up.

I did a few by hand via Emacs’ subword-mode (aside: subword-mode is awesome and you should definitely use it), but this got tedious, so I thought it would be a good time to practice some elisp.

I knew I’d want to be able to call my function from the middle of a word and have it transform the whole thing, so I started by moving to the beginning of the word (backward-word is insufficient since if I’m already at the beginning of the word, it will take me to the word prior):

(defun aec/beginning-of-word ()
  "Move point to the beginning of nearest word"
  (interactive)
  (forward-word)
  (backward-word))

Easy enough; now I needed a way to know if I was done with the current word. I called this end-of-word-p, which is perhaps too general a name, since it has some snake_case-specific logic. Regular expressions in elisp are apparently not as straightforward as they are in languages I’m more familiar with, so I ended up with some pretty crude expanded conditionals.

(defun aec/end-of-word-p (curpos)
  "whether the point is at the end of a word. Treats numerical digits 
   as non-word characters"
  (interactive "d")
  (or
   ;; end of buffer is obviously the end of the word
   (equal curpos (point-max))

   ;; word character followed by non-underscore non-word character
   (and
    (equal
      (string-match "\\w" (substring (buffer-string) (- curpos 2))) 
      0)
    (and
     (equal 
       (string-match "\\W" (substring (buffer-string) (- curpos 1))) 
       0)
     (not (equal 
            (string-match "_" (substring (buffer-string) (- curpos 1))) 
            0))))

   ;; underscore followed by non-word character
   (and
    (equal 
      (string-match "_" (substring (buffer-string) (- curpos 2))) 
      0)
    (equal 
      (string-match "\\W" (substring (buffer-string) (- curpos 1))) 
      0))))

Now that we have those tools, actually doing the work of snake_casing is easy enough: just convert the subwords to lowercase and insert underscores between them (and remember to delete the last underscore).

(defun snake-case-ify ()
  "Take a camelCased word and transform to snake_case"
  (interactive)
  (aec/beginning-of-word)
  (while (not (aec/end-of-word-p (point)))
    (call-interactively 'subword-downcase)
    (insert "_"))
  (delete-char -1))

The (call) stack

It’s not uncommon to hear programmers talk about ‘the stack’ when they’re referring to functions or variables. One of, if not the, most popular sites for programmers, Stack Overflow, is even named after a common error involving ‘the stack’.

What is ‘the stack’?

For starters, it’s a stack data structure just like the name implies: a last-in, first-out (LIFO) data store with two key operations, push and pop.

The stack API could look like this if called from JavaScript:

var s = new Stack();

s.push("a");
s.push("b");
s.push("c");

s.pop(); //=> "c"
s.pop(); //=> "b"
s.pop(); //=> "a"
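
A minimal implementation of that API might look something like this (just a sketch backed by a plain array; the real ‘the stack’ works differently, as we’ll see below):

function Stack() {
  this.items = [];
}

Stack.prototype.push = function (item) {
  // Write the new item into the next free slot.
  this.items[this.items.length] = item;
};

Stack.prototype.pop = function () {
  // Read the most recently pushed item, then shrink the array by one.
  var item = this.items[this.items.length - 1];
  this.items.length -= 1;
  return item;
};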

Where is ‘the stack’?

When a program runs, the operating system gives the program a large amount of addressable memory to work with. The program can’t use this memory arbitrarily, but this does serve as the landscape that the program will operate in.

‘The stack’ is just a stack that the operating system sets up for us in this landscape (usually at one end of the landscape). However, since ‘the stack’ lives in addressable memory, it has one more special property: we can look at an item at any position on ‘the stack’.

Stack and Frame pointers

When the program starts executing, the operating system also sets up two special variables for us, the stack pointer and the base pointer (sometimes referred to as the frame pointer).

The stack pointer is the only variable needed to implement push and pop; the frame pointer is used to make some calculations easier, but we’ll cover that in detail elsewhere.

When do we use ‘the stack’?

So we know that ‘the stack’ supports push and pop, but when do we push and pop?

Simply speaking, we push whenever we call a function (hence, call stack), and we pop whenever a function returns.

Say we have the following JavaScript:

function a() { return 1; }
function b() { a(); return 2; }

b();

Our program starts, and the stack is empty (we’ll tag the outermost item for clarity):

*program*

The first function call we see is b(); at the bottom of the program, so we push *b*, now the stack looks like:

*program*
*b*

Now we’re executing b, and in the body of b, we see a call to a(), so we push *a*:

*program*
*b*
*a*

a doesn’t call a function, but does return, so we pop *a*:

*program*
*b*

After a() has finished, we continue inside of b and return, so we pop *b*:

*program*

This illustrates the key purpose of the stack: it’s a way for the computer to keep track of where it should go when it’s done with a function (e.g. when a finished, the computer looked at the stack after popping and saw that it should go back into b).

The stack can be used for additional purposes, but the reason it exists at all is to keep track of which function should be active when the current function returns.
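
As a quick aside, you can peek at this bookkeeping from inside a real JavaScript program: most engines expose the current call stack as the (non-standard, but widely supported) stack property on Error objects, though the exact format varies by engine:

function a() { return new Error("peek").stack; }
function b() { return a(); }

console.log(b());
// Typically prints something like:
//   Error: peek
//       at a (...)
//       at b (...)
//       ...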

Digression: stack overflow errors

Now that we know what the stack does, what exactly is a stack overflow? As the name may imply, it’s an error that happens when the stack gets too big and consumes more memory than our program has available.

Generally, this happens when there are too many items pushed onto the stack (i.e. it becomes too deep). This is almost always because of a recursive call (either direct, i.e. a function a calling itself; or mutual, i.e. a function a calls a function b, which calls a again).

For example:

function a() { a(); }
a();

This will run and cause the stack to look like:

*program*
*a*
*a*
.
.
.
*a*

Eventually, this will exhaust the memory that the operating system has set aside for the program, and it will get terminated.

Digression: Implementing a stack with only one variable

I mentioned above that the only variable we need to implement the stack is the stack pointer. This is feasible because the beginning location of the stack is set up for us by the OS.

When the program is started, the operating system thoughtfully sets up the stack pointer to contain the address of memory where the next item in the stack should go (sidenote: accessing memory addresses and getting an item at an index of an array are functionally equivalent, so I usually think of the memory given to us by the OS as a big, empty array, and the stack pointer as an index in that array).

When we start, our stack and stack pointer look like:

    Stack:
    ---------------------------------------------------------
    |     |     |     |     |     |     |     |     |     |  ...
    ---------------------------------------------------------
       0     1     2     3     4     5     6     7     8      
       ^
   
   Stack Pointer: 0

When we push, we write whatever we’re pushing to the location pointed to by the stack pointer, and then we increment the stack pointer.

For example, push("a"):

    Stack:
    ---------------------------------------------------------
    | "a" |     |     |     |     |     |     |     |     |  ...
    ---------------------------------------------------------
       0     1     2     3     4     5     6     7     8      
             ^
   
   Stack Pointer: 1 // Incremented from 0

push("b"):

    Stack:
    ---------------------------------------------------------
    | "a" | "b"  |     |     |     |     |     |     |     |  ...
    ---------------------------------------------------------
       0     1     2     3     4     5     6     7     8      
                   ^
   
   Stack Pointer: 2 // Incremented from 1

To pop, we first decrement the stack pointer, and then we read whatever is in the location pointed at by the stack pointer. E.g. pop():

    Stack pointer: 1 // Decremented from 2

    Stack:
    ---------------------------------------------------------
    | "a" |     |     |     |     |     |     |     |     |  ...
    ---------------------------------------------------------
       0     1     2     3     4     5     6     7     8      
             ^

    Return "b"

If you found this interesting or if I could have stated something more clearly, let me know on twitter or shoot me an email.

Nested variable scoping

Overview

How does a compiler translate a source language with nested lexical scopes to a destination language with only immediate scope (e.g. x86 assembly, or a hypothetically constrained JavaScript[1])?

To handle the case of a variable that is declared in an outer scope of a function, the compiler passes a static link when a function is called. The static link can then be followed to walk up the chain of enclosing scopes and find the frame where the variable was originally declared.

Explanation

For example, if we have something like the following, with a variable x in the outermost scope, a function f that refers to x directly, and then a different scope, g, that calls f:

var x = 1;
function f() { 
  return x + 1; 
}

function g() { 
  var x = 10; 
  return f(); 
}
g(); //=> returns 2

At runtime, when g has called f, the stack looks like the following (growing downward):

*OUTERMOST FRAME*
x => 1
*G FRAME*
x => 10
*F FRAME*

Here, we need the call to f to be able to resolve x in the outermost scope (i.e. 1). At compile time, we know that when x is referred to by f, the nearest enclosing x is actually part of the outermost scope.

Keep in mind that in our target language, our functions cannot have references to other scopes, so in the body of f, we can’t simply refer to outermost_scope; however, we can pass in a reference to outermost_scope.

Let’s make the scopes explicit:

var outermost_scope = {};
outermost_scope.x = 1;

function f(hopefully_the_correct_scope) { 
  var f_scope = {};

  // How do we get outermost_scope here?
  return hopefully_the_correct_scope.x + 1;
}

function g(hopefully_another_correct_scope) { 
  var g_scope = {};
  g_scope.x = 10; 
  return f(/* some other scope? */); 
}

g(/* some scope? */);

The problem we’re trying to solve is: how do we pass f and g the correct scopes, such that hopefully_the_correct_scope in f is actually outermost_scope?

Since there is only a single outermost scope in the whole program, we can trivially solve our problem by simply passing outermost_scope through every function call:

var outermost_scope = {};
outermost_scope.x = 1;

function f(outermost_scope) { 
  var f_scope = {};
  return outermost_scope.x + 1;
}

function g(outermost_scope) { 
  var g_scope = {};
  g_scope.x = 10; 
  return f(outermost_scope); 
}

g(outermost_scope); //=> returns 2

That’s not very interesting, but the idea of passing around scopes is useful.

Let’s make things more interesting by adding another layer of functions:

var x = 0;
function a() {
  var x = 10;
  
  function b() {
    var y = "a";
    return x + 5;
  }
  
  function c() {
    var x = 100;
    return b();
  }
  
  return c();
}

a(); //=> returns 15

Now let’s look at our stack at its deepest, when b is invoked:

*OUTERMOST FRAME*
x => 0
*A FRAME*
x => 10
*C FRAME*
x => 100
*B FRAME*
y => "a"

Now, b will need a reference to a’s scope, but because a is a function, it could be invoked at multiple places on the stack, so it isn’t a constant that we can know ahead of time. Let's see if passing outermost_scope through to every function solves our problem here:

var outermost_scope = {};
outermost_scope.x = 0;

function a(outermost_scope) {
  var a_scope = {};
  a_scope.x = 10;
  
  function b(outermost_scope) {
    var b_scope = {};
    b_scope.y = "a";
    return outermost_scope.x + 5; // Uh oh! We actually wanted a_scope.x
  }
  
  function c(outermost_scope) {
    var c_scope = {};
    c_scope.x = 100;
    return b(outermost_scope);
  }
  
  return c(outermost_scope);
}

a(outermost_scope); //=> actually 5, but we wanted 15

That didn't work! Inside the body of b, we actually want a reference to a_scope, not outermost_scope. Fortunately, every location where b can be called will be at a fixed offset from a’s scope. This means that we can look at every call of b and figure out how many layers we need to unwrap to get to a’s scope. In our example, the call to b will always have one extraneous scope in between b and a, so we need to go up two levels.

If we were to add a call to b(); just before return c();:

var x = 0;
function a() {
  var x = 10;
  
  function b() { ... }
  
  function c() { ... }

  b();  // Let's look at the call stack during this invocation.
  return c();
}

a();

that particular invocation would always have a spot on the stack directly below a’s scope:

*OUTERMOST FRAME*
x => 0
*A FRAME*
x => 10
*B FRAME*
y => "a"

In that case, to find the value of x inside the body of b() we’d only need to go up a single level.


One way the compiler can keep track of how many levels we need to go up the call stack is known as a static link. A static link is an invisible (to the programmer) argument, passed on each function call, that points to the scope in which the called function was defined.

When the compiler encounters a function call, it will compute how many layers it needs to unwrap to get the scope in which the called function was defined, then emit code that always passes that scope as an argument (the static link) to that function. Then at runtime, the function can use that scope for variable and function lookups as needed.

We can make this visible and see how it looks after our computation:

var outermost_scope = {};
outermost_scope.x = 0;

function a(a_parent_scope) {
  var a_scope = {}
  a_scope.parent = a_parent_scope;
  a_scope.x = 10;

  function b(b_parent_scope) {
    var b_scope = {};
    b_scope.parent = b_parent_scope;
    return b_scope.parent.x + 5;
  }
  
  function c(c_parent_scope) {
    var c_scope = {};
    c_scope.parent = c_parent_scope;
    
    function d(d_parent_scope) {
      var d_scope = {};

      // By adding the static link machinery, the compiler is ensuring
      // that d_parent_scope is always c_scope, so while we can't refer
      // to c_scope directly, whenever we need it, we'll use
      // d_scope.parent.
      d_scope.parent = d_parent_scope;

      // We know that b is defined in a_scope, which we can't refer to
      // directly, but we know is always the scope that is the parent
      // of c_scope, which we're referring to as d_scope.parent, so
      // a_scope can be referred to as d_scope.parent.parent.
      return b(d_scope.parent.parent); 
    }
    d(c_scope); //=> 15

    // We use c_scope.parent because we know that b and c are defined in
    // a_scope, which in the body of c, we can only refer to as
    // c_scope.parent.
    return b(c_scope.parent); 
  }

  // We use a_scope because that's the scope in which b is defined.
  b(a_scope); //=> 15

  // We use a_scope because that's the scope in which c is defined.
  return c(a_scope);
}

a(outermost_scope); //=> 15

This example has parent as an explicit property on each scope to make it clear that our references are legal (i.e. not jumping scopes), and to illustrate that we can keep referring upwards to ancestor scopes with this mechanism (parent.parent...).

Look specifically at our call to b(d_scope.parent.parent). We know at compile time how far away d_scope is from b’s enclosing scope (a_scope), so we know how many .parents to unwrap so that we pass our equivalent reference to a_scope to b as b_parent_scope.
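
To make the “how many .parents” computation concrete, here’s a tiny illustrative helper (not part of the example above) that follows the static link a fixed number of levels, where that fixed number is exactly what the compiler works out at compile time:

function followStaticLink(scope, levels) {
  // Walk up the chain of enclosing scopes `levels` times.
  var s = scope;
  for (var i = 0; i < levels; i++) {
    s = s.parent;
  }
  return s;
}

// Inside d above, b's defining scope (a_scope) is two links away, so
// b(d_scope.parent.parent) could equivalently be written as
// b(followStaticLink(d_scope, 2));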

Now we have a straightforward, if repetitive, way to handle nested scopes in a target language that doesn’t support enclosing references. Adding this argument to every single function call is suboptimal if the child function never needs access to the parent scope, but we’ll leave that optimization for later.


If you found this interesting or if I could have stated something more clearly, let me know on twitter or shoot me an email.


  1. These examples use JavaScript as both the source and target language simply for brevity and illustration; to approximate the issues of a target language, we’ll hand-wave and assume that our target JavaScript doesn’t have the features that we’re trying to implement, but has all the other features that our implementation depends on.
