
While assembly language gives instructions directly to the computer hardware, WebAssembly is an assembly-like language that drives a portable virtual stack machine. All major browsers now ship an implementation of this VM, and WebAssembly bytecode can also be executed from the command line (wasm-interp, wasmtime, etc.).
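To make the idea of a virtual stack machine more concrete, here is a minimal sketch (not part of the original example) that loads a tiny hand-assembled WebAssembly module from JavaScript. The byte array and the exported name add are illustrative assumptions: the module contains a single function that pushes its two parameters onto the stack and adds them.

```javascript
// A hand-assembled module exporting one function: add(i32, i32) -> i32.
// The bytes and the name "add" are assumptions made for illustration only.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic number, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no extra locals
  0x20, 0x00,                                           //   local.get 0  (push first parameter)
  0x20, 0x01,                                           //   local.get 1  (push second parameter)
  0x6a,                                                 //   i32.add      (pop two values, push their sum)
  0x0b,                                                 //   end
]);

// Synchronous compilation is acceptable here because the module is only a few bytes long
const wasmModule = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(wasmModule);
const { add } = instance.exports;

console.log(add(2, 3)); // prints 5
```

The same snippet should run unchanged in Node.js or in a browser console, since both expose the standard WebAssembly JavaScript API.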

How WebAssembly can be used

Although WebAssembly bytecode is more portable than native assembly, which only runs on a specific CPU architecture, writing WebAssembly by hand is not recommended. Instead, WebAssembly should be the compilation target for the programming language of choice: C, C++, C#, Java and many more. Here are a few scenarios where WebAssembly might be the best choice:

  1. Performance-critical web applications (bear in mind that JavaScript is shipped as source text that the engine must parse before it can interpret or compile it, whereas WebAssembly is shipped in a compact binary form that is faster to decode and compile)
  2. C/C++ desktop applications that need to be converted to web applications with the least amount of work
  3. Web applications that need to access existing libraries written, for instance, in C or C++

Performance testing: compiling C to WebAssembly

In this section we will test the performance of a C function compiled into WebAssembly comparing it to a JavaScript script. Both of these pieces of code:

  • Check whether an email address is in a valid format
  • In order to produce meaningful results, share the same design and implement the same algorithm

Before proceeding with the code, bear in mind that while strings are a primitive type in JavaScript, in C they are one-dimensional arrays of chars terminated by a null character. Moreover, C arrays and pointers are closely related: in most expressions, the name of an array is converted to a pointer to its first element.
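The difference between the two string models can be sketched in JavaScript itself. The helpers toCString and cStringLength below are hypothetical names introduced only for illustration: the first encodes a JavaScript string as the null-terminated byte array a C function expects, the second walks that array the way C code walks a char pointer.

```javascript
// Hypothetical helper: encode a JavaScript string as a C-style string,
// that is, its UTF-8 bytes followed by a terminating 0 byte.
function toCString(text) {
  const utf8 = new TextEncoder().encode(text);
  const buffer = new Uint8Array(utf8.length + 1); // zero-filled, so the last byte is the terminator
  buffer.set(utf8);
  return buffer;
}

// Hypothetical helper: measure a C-style string by scanning from "pointer"
// (an index into the byte array) until the terminating 0 is found.
function cStringLength(memory, pointer) {
  let end = pointer;
  while (memory[end] !== 0) end++;
  return end - pointer;
}

const bytes = toCString("a@b");
console.log(Array.from(bytes));       // prints [ 97, 64, 98, 0 ]
console.log(cStringLength(bytes, 0)); // prints 3
```

Here an index into the byte array plays the role of a C pointer, and 64 is the character code of @, the same value the C function in this article compares against.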

To run the first example, we create an HTML page containing an unordered list of fake email addresses. Its script calls the C function emailchecker, compiled to WASM, once for each list item. The page:

  • Logs 0 to the console if the email is valid
  • Logs a positive number to the console if the email is invalid
  • Prints the estimated execution time to the console

   <!DOCTYPE html>
   <html>
       <head>
          <!-- Remove missing favicon error -->
           <link rel="shortcut icon" href="#">
       </head>
       <body>
         <main>
            <h1>These emails were randomly generated...</h1>
            <h2>Please do not send any message to these addresses</h2>
            <ul>
               <li>Add an email address here</li>
               <li>Add one more email address here</li>
               <li>...</li>
            </ul>
            <!-- This JavaScript is generated by the compiler and it has to be loaded -->
            <script type="text/javascript" src="emailchecker.js"></script>
            <script type="text/javascript">
               //Wait until the system is ready
               Module.onRuntimeInitialized = () => {
                  //Look for all unordered list items
                  const nodeList = document.querySelectorAll("ul > li");
                  //Pass the C program to cwrap() and specify the return and parameter types
                  let address = Module.cwrap('emailchecker', 'number', ['string']);

                  //Get system time to calculate performance
                  const start = Date.now();

                  //Pass each item of the unordered list as parameter to the C function
                  for (let i = 0; i < nodeList.length; i++) {
                     console.log(nodeList[i].innerHTML + ' ' + address(nodeList[i].innerHTML));
                  }

                  //Calculate the execution time
                  console.log(`Time elapsed: ${Date.now() - start} ms`);

               }
            </script>
         </main>
       </body>
   </html>

Next, we save the following C function as emailchecker.c

   #include <string.h>
   #include <emscripten.h> 

   int EMSCRIPTEN_KEEPALIVE emailchecker(const char *email)
   {
        int total = 1;
        const char *placeholder, *domain;
        static char illegal[] = "<([@])>;:,\\\"";

        //Check email user

        for (placeholder = email; *placeholder; placeholder++)
        {
          //If this is a space, a non-printable OR an extended ASCII, return 1
          if (*placeholder <= 32 || *placeholder >= 127) return 1;
          //If this is a @, the domain name is next, EXIT LOOP
          if (*placeholder == 64) break;
          //If contains any illegal character, return 2
          if (strchr(illegal, *placeholder)) return 2;
        }

        //If this is the beginning OR the previous is a dot, return 3
        if (placeholder == email || *(placeholder - 1) == 46) return 3;

        //Check email domain

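        //If there is no @ OR the domain is empty, return 4
        //(without the first test, an address with no @ would be read past its terminator)
        if (*placeholder != 64 || !*(domain = ++placeholder)) return 4;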

        do
        {
          //If there is a dot, THEN
          if (*placeholder == 46)
          {
            //If it's the beginning OR previous one is a dot, return 5
            if (placeholder == domain || *(placeholder - 1) == 46) return 5;
            total = 0;
          }
          //If it's a space or an extended ASCII, return 6
          if (*placeholder <= 32 || *placeholder >= 127) return 6;
          //If contains any illegal character, return 7
          if (strchr(illegal, *placeholder)) return 7;
        } while (*++placeholder);

        return (total);
   }

The macro EMSCRIPTEN_KEEPALIVE marks a function as exported, so that the compiler does not remove it as dead code and it can be called from JavaScript. This allows us to export specific functions rather than making the whole program available to external calls. Note that the code above has no main function: in the <script> HTML tag, cwrap wraps the C function emailchecker so that it can be called with the content of DOM elements as its parameter. Next, the C code is built:

emcc emailchecker.c -Os -o emailchecker.js -s EXPORTED_FUNCTIONS=_emailchecker -s EXPORTED_RUNTIME_METHODS=cwrap

In the command above, the runtime method cwrap is exported because the page uses it to call emailchecker, while the option -Os tells the compiler to optimize for code size (the other optimization levels are -O1, -O2, -O3, -Oz and -Og, plus -O0 for no optimization).

Running the program three times with 5000 email addresses generates the following results:

  • 1st Time elapsed: 174 ms
  • 2nd Time elapsed: 191 ms
  • 3rd Time elapsed: 196 ms

Rebuilding the program with the optimization option -O3 produces the following instead:

  • 1st Time elapsed: 211 ms
  • 2nd Time elapsed: 172 ms
  • 3rd Time elapsed: 195 ms

Performance testing: from C to JavaScript

To test how JavaScript performs in a similar scenario, we reuse the previous HTML page with a different <script> section. We remove the previous <script> tags and place the following ones just before the closing </body> tag:

   <script src="checkemail.js"></script>
   <script>
      const start = Date.now();

      const nodeList = document.querySelectorAll("ul > li");
      //Call the JavaScript for each unordered list element
      for (let i = 0; i < nodeList.length; i++)
           console.log(nodeList[i].innerHTML + ' ' + emailchecker(nodeList[i].innerHTML));

      console.log(`Time elapsed: ${Date.now() - start} ms`);
   </script>

We now need checkemail.js, which is a port of the C function:

   function emailchecker(email)
   {
     let total = 1;
     const illegal = "<([@])>;:,\\\"";
     let index = 0;

     //Check email user

     for(index; index < email.length; index++)
     {
       //If this is a space, a non-printable OR an extended ASCII, return 1
       if (email.charCodeAt(index) <= 32 || email.charCodeAt(index) >= 127) return 1;
       //If this is a @, the domain name is next, EXIT LOOP
       if (email.charCodeAt(index) == 64) break;
       //If contains any illegal character, return 2
       if (illegal.includes(email[index])) return 2;
     }

     //If this is the beginning OR the previous is a dot, return 3
     if (index == 0 || email.charCodeAt(index - 1) == 46) return 3;

     //Check email domain

     //If there is no @ OR the domain is empty, return 4
     if (index >= email.length - 1) return 4;

     do
     { 
       //If there is a dot, THEN
       if (email.charCodeAt(++index) == 46)
       { 
          //If it's the beginning OR previous one is a dot, return 5
          if (email.charCodeAt(index - 1) == 64 || email.charCodeAt(index -1) == 46) return 5; 
          total = 0;
       }
       //If it's a space or an extended ASCII, return 6
       if (email.charCodeAt(index) <= 32 || email.charCodeAt(index) >= 127) return 6;
       //If contains any illegal character, return 7
       if (illegal.includes(email[index])) return 7;
     } while (index < email.length - 1);

     return (total);
   }

Provided the list of email addresses in the page was left unchanged, the JavaScript version runs against the same strings used before:

  • 1st Time elapsed: 185 ms
  • 2nd Time elapsed: 227 ms
  • 3rd Time elapsed: 190 ms
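To compare the three sets of measurements at a glance, the following sketch computes the mean and the range of each. The numbers are copied from the result lists in this article; the labels and the helper mean are names chosen here for illustration.

```javascript
// Timings in milliseconds, copied from the three result lists in this article
const runs = {
  "WebAssembly (-Os)": [174, 191, 196],
  "WebAssembly (-O3)": [211, 172, 195],
  "JavaScript": [185, 227, 190],
};

// Arithmetic mean of a list of numbers
const mean = (values) => values.reduce((sum, value) => sum + value, 0) / values.length;

for (const [label, values] of Object.entries(runs)) {
  const low = Math.min(...values);
  const high = Math.max(...values);
  console.log(`${label}: mean ${mean(values).toFixed(1)} ms, range ${low} to ${high} ms`);
}
// prints:
// WebAssembly (-Os): mean 187.0 ms, range 174 to 196 ms
// WebAssembly (-O3): mean 192.7 ms, range 172 to 211 ms
// JavaScript: mean 200.7 ms, range 185 to 227 ms
```

With only three runs per configuration, the ranges overlap considerably, so the differences between the means should be read as indicative rather than conclusive.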

Conclusions

To understand what is actually going on, the reader will have to analyze the Performance tab in the Firefox Web Developer Tools (or a similar tool), not least because the timings above also include DOM access and console logging. Still, this test shows that running C code inside web pages is possible and that it can perform at least comparably to, and possibly slightly better than, its JavaScript counterpart. WebAssembly is therefore a viable option for migrating older desktop applications to the web.
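One possible way to narrow the measurement down to the checker itself is to keep logging and DOM access out of the timed loop. The sketch below is an assumption about how such a harness could look, not the method used for the figures above: timeChecker, looksLikeEmail and the sample addresses are hypothetical names and data introduced for illustration.

```javascript
// Hypothetical helper: time only the calls to "check", with no console
// output and no DOM access inside the timed loop.
function timeChecker(check, inputs, repetitions = 100) {
  let invalid = 0;
  const start = performance.now();
  for (let round = 0; round < repetitions; round++) {
    for (const input of inputs) {
      // Using the return value keeps the call from being optimized away
      if (check(input) !== 0) invalid++;
    }
  }
  const elapsed = performance.now() - start;
  return { elapsed, invalid };
}

// Stand-in checker so that the sketch runs on its own: 0 means valid,
// as with emailchecker, but the rule itself is deliberately trivial.
const looksLikeEmail = (text) => (text.includes("@") ? 0 : 1);

const sample = ["alice@example.com", "not-an-email"];
const result = timeChecker(looksLikeEmail, sample, 1000);
console.log(`${result.invalid} invalid results in ${result.elapsed.toFixed(2)} ms`);
```

In the page, the list item texts would be collected into an array first, and either the function returned by cwrap or the JavaScript emailchecker would be passed as check, so that both versions are timed under the same conditions.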
