InqPortal – Adding Performance Metrics v5

If you have been programming Arduinos for a while, you have probably experienced program stoppages or random reboots. ESP8266 programming seems to be even more susceptible to these seemingly random errors. That is no fault of the ESP8266. It is doing far more behind the scenes than merely running your Sketch: it transparently handles all kinds of WiFi, networking and file duties. Even more impressively, the ESP8266 was not designed as a stand-alone processor for the Maker community. It was originally designed as a WiFi serial communications device controlled through the old modem AT command set.

When new developers run into these issues, they often blame the board, the sensors or sun spots. Some give up. Some just move on to a different project. More senior developers start adding debugging output and other metrics to shine light on the problem. In this section we will show some performance metrics that are built into InqPortal and how they might be used. In a later section, we will cover some of InqPortal's other debugging advantages.

Note: Remember to set the menu item Tools / Erase Flash to "Only Sketch". You have already made some settings, and leaving it on "All Flash Contents" will erase them.

Now head back to the Arduino IDE, add the single new line of code shown below to the Sketch, and Upload.

Adding ADC_MODE to use analog pin for reading input voltage

ADC_MODE(ADC_VCC); – This is a standard macro from the ESP8266 Arduino core. Place it at the top level of the Sketch, outside setup() and loop(). ESP8266 boards have only one analog pin, so include this line only if your project has no other use for that pin. With it, the ESP8266 takes over the analog-to-digital converter to measure its own supply voltage. InqPortal gathers this value and lets you monitor it from the browser-based Admin. During development, watching this value may reveal a voltage trend that precedes the processor stopping or rebooting. Low voltage can cause the processor to reboot. Common causes are a bad power supply, a weakening battery, or sensors and actuators running off your ESP8266 that draw too much power.

Be careful about relying on the absolute value. We have seen different boards report a wide range of voltages. One board might fluctuate around 2.9V while another plugged into the same USB port fluctuates around 3.7V. In other words, you'll need to establish a baseline for each and every port/battery/power supply/processor you use. YMMV!
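One way to act on the baseline advice is to average a batch of readings taken while the project is known to be healthy. Later readings that sag well below that average can then be flagged. The sketch below is plain, host-runnable C++ so it can be tried without a board. The class name `VoltageBaseline`, the sample count and the 0.2 V margin are illustrative assumptions, not part of InqPortal. On a board, the readings would come from the ESP8266 core's `ESP.getVcc()`, which returns millivolts when `ADC_MODE(ADC_VCC);` is present.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: learns a per-board supply-voltage baseline, then
// flags readings that sag more than `marginVolts` below it.
class VoltageBaseline {
public:
    VoltageBaseline(std::size_t samplesNeeded, double marginVolts)
        : needed_(samplesNeeded), margin_(marginVolts) {
        assert(samplesNeeded > 0);
    }

    // Feed one reading in volts (e.g. ESP.getVcc() / 1000.0 on a board).
    // Returns true only once the baseline is established and this reading
    // is more than the margin below it.
    bool addReading(double volts) {
        if (count_ < needed_) {
            sum_ += volts;   // still learning the baseline
            ++count_;
            return false;
        }
        return volts < baseline() - margin_;
    }

    bool ready() const { return count_ >= needed_; }

    // Mean of the learning samples (0 until at least one sample arrives).
    double baseline() const { return count_ ? sum_ / count_ : 0.0; }

private:
    std::size_t needed_;
    double margin_;
    std::size_t count_ = 0;
    double sum_ = 0.0;
};
```

Because each board gets its own baseline, the 2.9V board and the 3.7V board are each judged only against their own normal readings.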

Because gathering and processing data takes some processing time, the following performance metrics are gathered only while a browser is running the InqPortal Admin. If only your own front-end browser pages are connected, these three performance metrics are neither gathered nor sent.

  1. The first metric captured is the optional Supply Voltage discussed above. If you want to see the supply voltage, you must include the ADC_MODE(ADC_VCC); line. If it is not in your Sketch, InqPortal does not use the analog pin and you will receive only the following two performance metrics.
  2. The second is the current Free Memory. This is the memory available to your Sketch as well as to InqPortal and the lower-level WiFi and file management subsystems. You will see it rise and fall as actions occur in your project. A full treatment is beyond the scope of this topic, but two issues steadily eat away at this memory: Memory Leaks and Fragmentation. Developers of the low-level libraries and of InqPortal have expended great effort to eliminate these issues. Running out of memory can take minutes or it might take days. One of the most frustrating aspects of this type of bug is that it appears random: the eventual reboot can happen anywhere in your code. Tracking this value in InqPortal can tell you when to start searching your code for potential leaks and fragmentation.
  3. The final metric is Loop Frequency. On true computers with real operating systems, there is the concept of CPU utilization. Everyone is familiar with it. On microprocessors, the MPU is always at 100%, so that metric has little value. However, there is the concept of how many times per second the Arduino loop() function gets called. When this number is high, very little is going on in your Sketch, InqPortal and the low-level functions. As processes use more time, this number drops. It can be a good gauge of how the MPU is being used. As you can see below, when idle this number fluctuates around 130 kHz at the 80 MHz clock speed. At the turbo 160 MHz speed, it is about 270 kHz. There is a great deal of power available. Until this number gets below 1 Hz, your program should be fine. Below 1 Hz, you are entering the territory where the ESP8266's watchdogs might raise their ugly heads. Tracking this in InqPortal might help point you to bottlenecks in your logic when things start getting overloaded.
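InqPortal measures Loop Frequency for you. The idea behind it is simple enough to sketch, though: count completed passes through loop(), and once a full window has elapsed, divide the count by the window length. The class below is a host-runnable illustration with a hypothetical name, not InqPortal's implementation. On a board, `tick()` would be called at the top of `loop()` with `millis()` as its argument.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop-rate meter: call tick(nowMs) once per pass through loop().
class LoopRateMeter {
public:
    explicit LoopRateMeter(uint32_t windowMs = 1000) : windowMs_(windowMs) {
        assert(windowMs > 0);
    }

    void tick(uint32_t nowMs) {
        if (!started_) {          // first call only opens the first window
            started_ = true;
            windowStart_ = nowMs;
            return;
        }
        ++count_;                 // one more completed pass through loop()
        // Unsigned subtraction stays correct across the 32-bit millis() rollover.
        uint32_t elapsed = nowMs - windowStart_;
        if (elapsed >= windowMs_) {
            hz_ = count_ * 1000.0 / elapsed;   // passes per second
            count_ = 0;
            windowStart_ = nowMs;
        }
    }

    // Most recent completed measurement in Hz (0 until one window completes).
    double hz() const { return hz_; }

private:
    uint32_t windowMs_;
    uint32_t windowStart_ = 0;
    uint32_t count_ = 0;
    bool started_ = false;
    double hz_ = 0.0;
};
```

At the idle rates quoted above, a one-second window holds well over 100,000 passes. A loop that falls below 1 Hz completes fewer than one pass per window, which is itself the warning sign.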

Warning: DON'T USE the delay() method – This method comes from the ESP8266's Arduino heritage. On an Arduino, there are no low-level background routines that need some of the CPU's time. That is not true on the ESP8266. Starving the lower levels will cause all kinds of communications errors and can corrupt data coming in OR going out. You can use delay() sparingly (tens of milliseconds is OK), but the common practice of delaying for a second or more is a severe problem. InqPortal has methods of handling this that, as a bonus, make your code more readable, compartmentalized and flexible. We'll be exploring those in the next topic.
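Until the next topic covers InqPortal's own facilities, it helps to know the widely used generic alternative to a long delay(). Compare `millis()` with the time an action last ran, and return from loop() immediately when it is not yet due. The sketch below shows that common Arduino pattern, not InqPortal's method, as host-runnable C++. `IntervalTimer` is a hypothetical name, and the caller passes in the time that `millis()` would supply on a board.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical non-blocking timer: due(nowMs) returns true at most once per
// interval, so loop() never has to sit in delay().
//
// Typical use inside loop() on a board:
//   static IntervalTimer sensorTimer(1000);
//   if (sensorTimer.due(millis())) { /* read the sensor */ }
class IntervalTimer {
public:
    explicit IntervalTimer(uint32_t intervalMs) : intervalMs_(intervalMs) {
        assert(intervalMs > 0);
    }

    bool due(uint32_t nowMs) {
        // Unsigned subtraction handles millis() wrapping past 2^32 - 1.
        if (nowMs - lastMs_ >= intervalMs_) {
            lastMs_ = nowMs;
            return true;
        }
        return false;   // not yet: fall through and let loop() return
    }

private:
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
};
```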

Here are more metrics included with InqPortal.

Server Start Time – Tells you when your server was last booted. That might just have been by your hand or a power failure, but it could also mean your server hit a critical exception or Watchdog failure and rebooted.

Boot Count – Without this, you may not know that your server booted a dozen times during the night while your test was running. Reboots are easy to miss because, even if your project reboots often, the InqPortal Admin page quietly reconnects to the server once it finishes rebooting and rejoins the network.

Boot Reason – Lets you know the reason for the last reboot. External System will be the most common reason under normal circumstances. You'll see it after uploads of a newly compiled Sketch, OTA updates and initial power-up.
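Only some boot reasons call for investigation. The ESP8266 Arduino core exposes the reset cause as a readable string through `ESP.getResetReason()` and as a numeric code through `ESP.getResetInfoPtr()->reason`. The enumeration below mirrors the SDK's `rst_reason` codes from `user_interface.h` as we understand them, so check them against your core version. The classifier is a hypothetical helper, written as host-runnable C++. It separates routine reboots from crash-type reboots.

```cpp
#include <cassert>
#include <cstdint>

// Codes mirror rst_reason in the ESP8266 SDK's user_interface.h
// (verify against your core version).
enum class ResetReason : uint32_t {
    PowerOn = 0,           // REASON_DEFAULT_RST
    HardwareWatchdog = 1,  // REASON_WDT_RST
    Exception = 2,         // REASON_EXCEPTION_RST
    SoftwareWatchdog = 3,  // REASON_SOFT_WDT_RST
    SoftRestart = 4,       // REASON_SOFT_RESTART
    DeepSleepWake = 5,     // REASON_DEEP_SLEEP_AWAKE
    ExternalSystem = 6,    // REASON_EXT_SYS_RST
};

// Hypothetical helper: true for crash-type reboots worth chasing down.
bool worthInvestigating(ResetReason r) {
    switch (r) {
        case ResetReason::HardwareWatchdog:
        case ResetReason::Exception:
        case ResetReason::SoftwareWatchdog:
            return true;
        default:
            return false;  // deliberate or routine restarts
    }
}
```

Paired with Boot Count, a rising count alongside watchdog or exception reasons points to a real fault rather than routine uploads or power cycles.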

Disk Size, File Use and Fragmentation – InqPortal uses our own file system, optimized for loading and serving files. It does not use SPIFFS or LittleFS. It includes wear leveling for the flash memory. We felt that automatic self-defragmentation could cause a critical pause at a moment when your project cannot afford one. Instead, we provide a manual defragmentation option that you can use when you know your server is being updated anyway and critical processes are not running. Defragmentation is never necessary during normal runtime operations. It might only become necessary during file uploads, which happen during maintenance or upgrade procedures anyway. During an upload, a message may prompt you to defrag the file system.
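This section does not give InqPortal's fragmentation formula. The following shows only one common way to put a number on it: the percentage of free space lying outside the largest contiguous free run. Near 0% means free space is essentially in one piece. A high value means that, on a file system that stores files contiguously, a large upload could fail to find room even when the total free space would suffice. The function name and the block-map model are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical metric: percentage of free blocks lying outside the single
// largest run of consecutive free blocks. used[i] marks block i as occupied.
double fragmentationPercent(const std::vector<bool>& used) {
    std::size_t freeTotal = 0, run = 0, longestRun = 0;
    for (bool blockUsed : used) {
        if (blockUsed) {
            run = 0;               // a used block ends the current free run
        } else {
            ++freeTotal;
            ++run;
            if (run > longestRun) longestRun = run;
        }
    }
    if (freeTotal == 0) return 0.0;  // nothing free, nothing fragmented
    return 100.0 * (freeTotal - longestRun) / freeTotal;
}
```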

Send Buffer / Data Loss – Our custom WebSockets functionality is optimized for small message packets. The system is fast, robust and very responsive. Suppose one client updates a value on the server and the server propagates that change out to all browsers. It is nearly impossible to notice that the updates didn't arrive simultaneously, even though the server must send the same message to each browser in turn. These small messages carry a great deal of TCP overhead and are the most abusive type of TCP transaction, so a dynamic buffer is required. Under light load this buffer is typically near zero. As load increases, so does this buffer. We have seen this sub-system handle a steady state of 35 KB/sec, sending tiny messages (the most abusive kind) every 5 milliseconds to four clients. Send any faster, send larger messages, or add more clients, and the system has to start throwing away messages or risk a critical exception that causes a reboot.

The Data Loss field shows how much data has been lost since the last reboot. This is another metric that, along with Loop Frequency, lets you monitor how close you are to overloading your microprocessor. Known data loss is far more informative than unknown data loss or, worse yet, going belly up without notice. We are constantly striving to improve throughput and do not feel we have reached the maximum. We have seen a sustained 480 KB/sec when serving web pages, so we know there is performance to be gained. However, the biggest unfixable hurdle is the small size of the messages required for a flexible and responsive system.
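To put those figures in perspective, a 5 ms send interval is 200 messages per second per client, so four clients means 800 messages per second. If the 35 KB/sec is the total across all four clients, each message averages roughly 45 bytes, which is why per-message TCP overhead dominates. The "drop rather than crash" policy can be sketched as a queue with a byte cap. It refuses new messages once full and tallies the bytes it refused. This is a hypothetical, host-runnable illustration. InqPortal's real buffer is dynamic, and its internals are not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical capped send queue: drops new messages when full and records
// how many bytes were lost, instead of exhausting memory.
class SendQueue {
public:
    explicit SendQueue(std::size_t capacityBytes) : capacity_(capacityBytes) {
        assert(capacityBytes > 0);
    }

    // Returns false (and counts the loss) if the message does not fit.
    bool enqueue(const std::string& msg) {
        if (queuedBytes_ + msg.size() > capacity_) {
            lostBytes_ += msg.size();
            return false;
        }
        queuedBytes_ += msg.size();
        pending_.push_back(msg);
        return true;
    }

    // Called when the network layer can accept more data.
    bool sendNext(std::string& out) {
        if (pending_.empty()) return false;
        out = pending_.front();
        pending_.pop_front();
        queuedBytes_ -= out.size();
        return true;
    }

    std::size_t queuedBytes() const { return queuedBytes_; }
    std::size_t lostBytes() const { return lostBytes_; }  // cf. Data Loss

private:
    std::size_t capacity_;
    std::size_t queuedBytes_ = 0;
    std::size_t lostBytes_ = 0;
    std::deque<std::string> pending_;
};
```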

Data Processed and Data Rate – These are just basic statistics. Data Processed includes all data that InqPortal has processed: HTTP requests for URLs/files, the output of serving those files, and WebSocket data coming and going. The rate likewise includes all incoming and outgoing data and is expressed in bytes/second.
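Because Data Processed is a running total, a rate can be derived from two snapshots of it: the bytes added between them divided by the seconds elapsed. The minimal host-runnable sketch below uses a hypothetical function name and assumes a 64-bit byte counter with `millis()`-style timestamps. InqPortal may average its displayed rate differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average bytes/second between two snapshots of a
// cumulative byte counter, taken at millis()-style timestamps.
double bytesPerSecond(uint64_t bytesBefore, uint32_t msBefore,
                      uint64_t bytesAfter, uint32_t msAfter) {
    assert(bytesAfter >= bytesBefore);        // the counter only grows
    uint32_t elapsedMs = msAfter - msBefore;  // wrap-safe for millis()
    if (elapsedMs == 0) return 0.0;           // no time has passed
    return (bytesAfter - bytesBefore) * 1000.0 / elapsedMs;
}
```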

WebSockets – This value lets you know how many browsers are currently connected to your server. For performance reasons, we use WebSockets for communication between clients and server. The ESP8266 is limited to five simultaneous connections. We reserve one of them so that a connection is always available for uploading files, OTA and serving URLs/files. This leaves up to four simultaneous clients. This includes InqPortal Admins and any of your client UI applications that use the client-side InqPortal library (InqPortal.js). The fraction shown on the Admin therefore reads <number of Admins> / <total number of WebSocket connections> / <total allowed>.
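The connection arithmetic above, and the three-part field it produces, can be written out as follows. The constant and function names are our own, and the exact text InqPortal renders may differ. The sketch only restates the counting rule described in this section.

```cpp
#include <cassert>
#include <string>

// Figures from the text: five simultaneous connections, one held back
// for uploads, OTA and serving files.
constexpr int kMaxConnections = 5;
constexpr int kReservedConnections = 1;
constexpr int kMaxWebSocketClients = kMaxConnections - kReservedConnections;

// Hypothetical formatter for the "Admins / connections / allowed" field.
std::string webSocketSummary(int admins, int totalWebSockets) {
    assert(admins >= 0 && admins <= totalWebSockets);
    assert(totalWebSockets <= kMaxWebSocketClients);
    return std::to_string(admins) + " / " + std::to_string(totalWebSockets) +
           " / " + std::to_string(kMaxWebSocketClients);
}

// True if one more browser (an Admin or your own UI) can still connect.
bool canAcceptClient(int totalWebSockets) {
    return totalWebSockets < kMaxWebSocketClients;
}
```

For example, one Admin plus two of your own UI pages gives three WebSocket connections, so the field reads 1 / 3 / 4 and one more client can still connect.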

Performance Metrics Histogram

We discuss configuring InqPortal's History tab in several other places. It is fully generalized for your custom use, but by default it monitors the three values discussed above. You can add and remove your own custom numeric published variables (more on those later) as well as adjust the time base from seconds to days. Several averaging, minimum and maximum functions can also be applied to the data.
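The averaging, minimum and maximum options can be pictured as two steps. First, group raw samples into buckets one time-base wide. Then summarize each bucket. The sketch below is a hypothetical, host-runnable illustration of that idea, not InqPortal's History implementation. `HistoryBuckets` and `Summary` are our own names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Per-bucket summary of the raw samples that fell inside it.
struct Summary {
    double min = 0, max = 0, sum = 0;
    uint32_t count = 0;
    double average() const { return count ? sum / count : 0.0; }
};

// Hypothetical aggregator: groups samples into fixed time buckets
// (e.g. 60 s for a per-minute time base) and keeps min / max / average.
class HistoryBuckets {
public:
    explicit HistoryBuckets(uint32_t bucketSeconds) : bucket_(bucketSeconds) {
        assert(bucketSeconds > 0);
    }

    void add(uint32_t timeSeconds, double value) {
        Summary& s = buckets_[timeSeconds / bucket_];  // bucket index
        if (s.count == 0) {
            s.min = s.max = value;
        } else {
            s.min = std::min(s.min, value);
            s.max = std::max(s.max, value);
        }
        s.sum += value;
        ++s.count;
    }

    // Summary for the bucket containing timeSeconds (empty if none).
    Summary at(uint32_t timeSeconds) const {
        auto it = buckets_.find(timeSeconds / bucket_);
        return it == buckets_.end() ? Summary{} : it->second;
    }

private:
    uint32_t bucket_;
    std::map<uint32_t, Summary> buckets_;
};
```

On the ESP8266 itself, a fixed-size ring of buckets would suit better than `std::map`. It avoids the heap churn that the Free Memory discussion above warns about.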

History tab displaying the default Loop Frequency, Memory Available, and input Voltage.

In the next topic, we start implementing our own project-specific data and pushing it out to clients.