C++ Arrays, Data Structures, and working with Program Memory 

Introduction

In our last C++ lecture we looked at simple datatypes like uint8_t. In this lecture we will be looking at “structured” data types. A structured data type is a collection of related data organized in some logical fashion and assigned a single name (identifier). Data structures may be accessed as a collection of data or by the individual data items that comprise them. Our first structured data type is the array.

C++ simple data types:

  1. integer (uint8_t, int16_t)
  2. float (float, double)
  3. enum

C++ structured data types:

  1. array
  2. struct
  3. union
  4. class

Arrays

  • Arrays are groups of variables of the same data type.
  • Variables within the array are organized sequentially and accessed using a simple numerical index. Arrays therefore occupy contiguous locations in memory (SRAM or Flash).
  • The first entry has index value 0 and therefore the last entry has an index of “size – 1”, where size is the number of variables within the array.
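The three properties above can be checked directly. The sketch below is plain C++ with no Arduino calls; the array `demo` and the helper names are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example array: four 16-bit values stored back to back.
static int16_t demo[4] = {10, 20, 30, 40};

// Number of elements, computed from the array's total size in bytes.
constexpr std::size_t demoSize() {
  return sizeof(demo) / sizeof(demo[0]);
}

// Distance in bytes between element i and element i + 1.
std::size_t strideBytes(std::size_t i) {
  const char *a = reinterpret_cast<const char *>(&demo[i]);
  const char *b = reinterpret_cast<const char *>(&demo[i + 1]);
  return static_cast<std::size_t>(b - a);
}

// Value of the last element, found at index "size - 1".
int16_t lastElement() {
  return demo[demoSize() - 1];
}
```

Because the elements are contiguous, the stride between neighbours always equals the size of one element, here 2 bytes.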

Declaring Arrays

The simplest array declaration only has a single dimension.

dataType arrayName [ arraySize ]; 

The arraySize must be an integer constant greater than zero, either a literal or a named constant (const int ARRAY_SIZE = 100). The dataType can be any C++ data type.

For example, consider rooms within a maze. Imagine declaring all the variables in the maze as simple data types and then consider how you would access each variable within the maze.

uint8_t room0, room1, room2 ... room99; // a maze with 100 rooms

Clearly, an array provides a much simpler and practical solution. The following C++ statement declares a collection of 100 unsigned 8-bit integers, named maze.

uint8_t maze[100];                     // a maze with 100 rooms

Accessing a room within the maze is now as simple as giving its index within brackets []. The use of brackets [] in both the declaration and accessing elements of an array can potentially lead to confusion. In the declaration statement the brackets enclose the size of the array, while in the second case, the brackets enclose the index (base 0).

uint8_t room2 = maze[2];

Initializing Arrays

In C++ we initialize an array using braces {}, also known as curly brackets.

In this example, I want to track a robot’s progress in a maze containing obstacles. For fun the robot looks like a bear and the obstacles are defined as bee stings. As the robot moves from room-to-room the program needs to keep track of 6 variables.

For illustrative purposes, I am going to place all this data in a single array in SRAM named myRobot, containing 6 unsigned 8-bit integers corresponding to these data.

  //                    dir   turn  row   col   room  bees
  uint8_t myRobot[6] = {0x03, 0x00, 0x14, 0x00, 0x00, 0x00};

C++ provides a lot of flexibility in the declaration of an array. However, once defined, the array size cannot be changed (at least not within the context of an introductory lecture on C++ arrays), nor is this something you want to do in an embedded system. Let's look at two equivalent declaration statements for myRobot.

uint8_t myRobot[]  = {0x03, 0x00, 0x14, 0x00, 0x00, 0x00};  // array holds these 6 bytes
uint8_t myRobot[6] = {0x03, 0x00, 0x14};                    // extra bytes assigned 0x00 

As long as the number of values between braces { } is not larger than the number of elements that you declare for the array between the brackets [ ], no error will be generated.
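A minimal sketch of the two declaration styles, written as plain C++ so it can be checked off the microcontroller. The names `robotA`, `robotB`, and `sameContents` are hypothetical; two names are needed because one array cannot be declared twice.

```cpp
#include <cstddef>
#include <cstdint>

// Size deduced from the initializer list: six elements.
static uint8_t robotA[] = {0x03, 0x00, 0x14, 0x00, 0x00, 0x00};

// Size given explicitly; the three missing values are set to 0x00.
static uint8_t robotB[6] = {0x03, 0x00, 0x14};

// Returns true when both arrays have the same length and contents.
bool sameContents() {
  if (sizeof(robotA) != sizeof(robotB)) return false;
  for (std::size_t i = 0; i < sizeof(robotA); ++i) {
    if (robotA[i] != robotB[i]) return false;
  }
  return true;
}
```

Each element is one byte, so `sizeof` gives the element count directly in this case.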

Accessing Array Elements

As mentioned earlier, once an array is initialized, accessing a datum within the array is implemented by placing the index of the datum within square brackets after the name of the array.

uint8_t row = myRobot[2];

The above statement takes the third element of the array and assigns its value to the row variable. Below is another example that declares an array, assigns random values to it, and accesses its elements.

void setup () 
{ 
  Serial.begin(9600);
  // generate random seed based on milliseconds since program started 
  randomSeed( (uint8_t) millis()); // cast 32-bit unsigned long to 8-bits 
  uint8_t n[10];      // n is an array of 10 8-bit unsigned integers 
  // initialize each element of array n to a random number
  for (int i = 0; i < 10; i++) {
    n[i] = random(256);   // random number between 0 and 255 (upper bound is exclusive)
  } 
  Serial.print("Element        ");  
  Serial.println("Value");
  // output each array element's value
  for ( int i = 0; i < 10; i++) { 
    Serial.print("      "); 
    Serial.print(i);    
    Serial.print("           "); 
    Serial.println(n[i]); 
  } 
}

Here is a sample output.

Element Value
0 36
1 94
2 54
3 45
4 94
5 170
6 103
7 162
8 3
9 8

Example – Running Average

In this example, I will show you how to implement a running average (digital low-pass filter) in your program using an array. It is very useful for inputs that have a high variance between samples (high frequency component), yet only vary slightly over time (low frequency component). An example would be the analog output of an IR sensor as read by the 10-bit ADC peripheral subsystem of an ATmega microcontroller.

Stability and accuracy of the final value will depend on the number of samples used to calculate the running average. If the data changes slowly relative to the sample frequency of the microcontroller, then a larger sample size can be employed.

The example program below reads from an analog pin and computes the running average for a sample size of 8.
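A minimal sketch of the filter at the heart of such a program, assuming a global circular buffer named `window` holding `g_width` = 8 samples (the names used in the notes that follow). To keep the sketch independent of the hardware, the hypothetical function `runningAverage` receives the new sample as a parameter; on the microcontroller the caller would pass in the value returned by `analogRead`.

```cpp
#include <cstdint>

const uint8_t g_width = 8;          // sample size (window width)
static uint16_t window[g_width];    // circular buffer, starts out all zeros
static uint8_t n = 0;               // index of the oldest sample
static uint16_t sum = 0;            // sum of the samples in the window

// Add one sample to the window and return the updated running average.
// Until the window has filled, the zeros it started with pull the
// average down.
uint16_t runningAverage(uint16_t sample) {
  sum = sum - window[n] + sample;   // drop oldest sample, add newest
  window[n] = sample;               // overwrite oldest sample
  n++;
  if (n >= g_width) n = 0;          // limit check: wrap around
  return sum / g_width;
}
```

Keeping a running sum means each new sample costs one subtraction and one addition, rather than re-adding all 8 entries. With 10-bit samples the sum never exceeds 8 x 1023 = 8184, so a 16-bit sum is sufficient.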

Notes:
1. Alternative array initializations include

window[g_width];
window[g_width] = {};

2. Alternative implementations of the limit check on the circular buffer include the modulus and bitwise "and" operators. The first alternative is not recommended on a processor without a hardware divider, such as the AVR. The second alternative requires a buffer size that is a power of 2.

n %= 8;
n &= 0x0007;

3. You could also use a shift operator in place of division when the sample size is a power of 2. Unfortunately, our processor does not have a barrel shifter, so it shifts only one bit position at a time. Alternative float version: return sum/float(g_width);
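The alternatives in notes 2 and 3 can be compared side by side. This is a sketch under the assumption of a buffer size of 8; the constant `BUFFER_SIZE` and the function names are hypothetical.

```cpp
#include <cstdint>

const uint8_t BUFFER_SIZE = 8;   // a power of 2, so the mask version works

// Limit check with a compare.
uint8_t nextByCompare(uint8_t n) {
  n++;
  if (n >= BUFFER_SIZE) n = 0;
  return n;
}

// Limit check with the modulus operator.
uint8_t nextByModulus(uint8_t n) {
  return (n + 1) % BUFFER_SIZE;
}

// Limit check with a bit mask; the mask is BUFFER_SIZE - 1 = 0x07.
uint8_t nextByMask(uint8_t n) {
  return (n + 1) & (BUFFER_SIZE - 1);
}

// Divide by 8 with a right shift of 3 bits (8 = 2^3).
uint16_t averageByShift(uint16_t sum) {
  return sum >> 3;
}
```

All three index functions step through 0, 1, ..., 7 and then return to 0. The mask is always one less than the buffer size, which is why the size must be a power of 2.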

C++ Passing Arrays to Functions

C++ does not allow you to pass an entire built-in array by value as an argument to a function. However, you can pass a pointer to the array's first element by specifying the array's name without an index.

If you want to pass a single-dimension array as an argument in a function, you will have to declare a pointer to the array as a formal parameter in the function declaration. Here is a quick review of function declarations so that you understand what is being done here.

C++ Function Declarations – Review

When you need to create a custom function that will be used in your program, it must be declared and/or defined. The format for a declaration is shown below.

[Data type of output parameter] [Function Name] ([Data Types of input parameters]) {}

The first part of the function declaration is the data type of the output parameter. This is the type of value that will be returned from the function. This can be an int, float, pointer, etc. If there is no value to be returned, you will use the void data type. You only need to indicate the data type, as there is no need to name the output.

The second part of the function declaration is the name of the function. This is what is used to call the function. A general convention for naming functions is to have the first word in lowercase and capitalize the first letter of the second word.

The third part of the function declaration is the list of data types of the input parameters. Any variables or values that the function will need to use must be defined here. This is related to the concept of scope, because copies of those variables or values are created when the input parameters are defined. Variables are not otherwise available to the custom function unless they have global scope. It is generally not recommended to use global variables. If there are no inputs, you will have nothing inside of the parentheses.

The final part of a function declaration is the pair of braces {}. This is where the function is defined, and the code to be performed is placed here. When programming within the Arduino IDE, you will not need to worry about where a function is declared or defined as long as it is located within the main ".ino" file. If a function is called near the beginning of the program and it is defined much later in the code, the compiler will not have an issue with finding the function and will not give a scope error. For this reason, you only need to declare and define a function a single time rather than provide a declaration as part of the header (.h) file, as is the case within traditional C++ IDEs, like Eclipse. Here is an example of a function declaration.

void myFunction(int input1, float input2, char input3) {}

This function is named myFunction and will return no output. It has three input values that are of three different types. The inputs are named input1, input2, and input3 and are only available to be used within the function. They are used as placeholders for the variables or values that you intend to use in the function.

If you are trying to call this example function, you will need to use the following code with the assumption that x is defined as an int, y is defined as a float, and z is defined as a char.

myFunction(x, y, z);
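Outside the Arduino IDE, a function has to be declared before its first use. The sketch below shows the separation of declaration and definition in standard C++; the function names `addOffset` and `twiceWithOffset` are hypothetical.

```cpp
// Declaration (prototype): tells the compiler the name, the return type,
// and the parameter types before the function is used.
int addOffset(int value, int offset);

// A caller that appears before the definition of addOffset.
int twiceWithOffset(int value) {
  return addOffset(value * 2, 1);
}

// Definition: the body in braces.
int addOffset(int value, int offset) {
  return value + offset;
}
```

In a larger project the prototype would normally live in a header (.h) file and the definition in a source file.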

C++ Arguments versus Parameters – Review

Now that we have covered function declarations, it is important to note the distinction between arguments and parameters. A parameter is defined in the function declaration and indicates the type of data that is used. The parameter name is usually a placeholder to represent the kind of value or variable that the function works with. In the function declaration of myFunction the parameters are input1, input2, and input3.

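An argument is the actual value or variable being used when a function is called. Arguments are passed to a function and used by the function code. By default a copy of each argument is passed, so changes made to a parameter inside the function do not modify the caller's variable. From the example above, x, y, and z are the three arguments being sent to myFunction.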
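The distinction can be seen in a few lines. This is an illustrative sketch; `increment` and `argumentUnchanged` are hypothetical names.

```cpp
// 'count' is the parameter: a local copy of whatever argument is passed.
int increment(int count) {
  count = count + 1;   // changes only the local copy
  return count;
}

// Calls increment() with the argument x and reports whether x kept its value.
bool argumentUnchanged() {
  int x = 5;                   // x is the argument
  int result = increment(x);   // result is 6
  return (x == 5) && (result == 6);
}
```

The caller's variable x still holds 5 after the call, because the function worked on its own copy.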

Defining A Pointer Parameter

After all of that review, we can now introduce how to define a pointer parameter for a function declaration. There are three ways to do this, and all three produce the same result because each tells the compiler that an integer pointer is going to be received.

Traditional Way

This is the general form that has been used the longest and is distinct because of the use of the * to indicate a pointer. It gives the reader no hint that an array, rather than a single integer, is expected, so the array forms below are often preferred for readability.

void myFunction(int * param) {

 // function definition

}

Modern Way

This form makes it clear to the reader that an array is being used and states its intended size. Note that the compiler ignores the size: the parameter is still an integer pointer, and no bounds checking is performed.

void myFunction(int param[10]) {  

  // function definition

}

Modern Way without Bounds Checking

The final method is to define the pointer parameter as an unsized array that can work with any size array argument that is passed to it. As shown in the next example, this way allows the definition of more general purpose functions. On the minus side, no bounds checking is done on your array (i.e., garbage in, garbage out), so the size is normally passed as a separate parameter.

void myFunction(int param[]) {

 // function definition

}
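That the three forms mean the same thing to the compiler can be verified at compile time. The sketch below assumes a C++11 or later compiler; the function names are hypothetical.

```cpp
#include <type_traits>

void byPointer(int *param);        // pointer form
void bySizedArray(int param[10]);  // sized form; the 10 is ignored
void byUnsizedArray(int param[]);  // unsized form

// All three declarations have the same function type, void(int *).
static_assert(std::is_same<decltype(byPointer), decltype(bySizedArray)>::value,
              "sized array parameter is a pointer");
static_assert(std::is_same<decltype(byPointer), decltype(byUnsizedArray)>::value,
              "unsized array parameter is a pointer");

// Inside the function, the parameter's type is int *, not int[10].
bool parameterIsPointer(int param[10]) {
  return std::is_same<decltype(param), int *>::value;
}
```

Since the size in the brackets is discarded, an array of any length can be passed to any of the three forms.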

Now, consider the getAverage function, which takes an array as an argument along with its size and, based on this information, returns the average of the numbers contained in the array.

int getAverage(int arr[], int size) {
  int sum = 0;
  for (int i = 0; i < size; ++i) {
    sum += arr[i];
  }
  float avg = float(sum) / size;
  return (int) avg;              // cast truncates the fraction (214.4 becomes 214)
}

Now, let's call the above function.

void setup () {
  Serial.begin(9600);
  // an int array with 5 elements.
  int balance[5] = {1000, 2, 3, 17, 50};
  // pass pointer to the array as an argument.
  int avg = getAverage(balance, 5);  // balance argument is a pointer
  // output the returned value
  Serial.print("Average value is: ");
  Serial.print(avg);
}

When the above code is compiled and run, it produces the following result

Average value is: 214

As you can see, the length of the array doesn't matter as far as the function is concerned because C++ performs no bounds checking for the formal parameters. It is up to the caller to pass a size that matches the array.
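Since the function cannot discover the length on its own, one common approach is to compute it with sizeof at the point where the array itself is visible. This is a sketch with hypothetical names (`sumOf`, `BALANCE_LEN`), written as plain C++.

```cpp
#include <cstddef>

// Sum of the first 'size' elements of arr. The function trusts 'size'.
long sumOf(const int arr[], std::size_t size) {
  long sum = 0;
  for (std::size_t i = 0; i < size; ++i) {
    sum += arr[i];
  }
  return sum;
}

static const int balance[] = {1000, 2, 3, 17, 50};

// Element count = total bytes / bytes per element. This only works here,
// where balance is still an array and not yet a pointer parameter.
const std::size_t BALANCE_LEN = sizeof(balance) / sizeof(balance[0]);
```

Passing a smaller size simply processes fewer elements; passing a larger one would read past the end of the array without any warning.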

Returning an Array from a Function

Just as you cannot pass the contents of an array as an argument in a function call, you cannot ask a function to return an array; you must return a pointer to the array instead. You can do this by defining the data type and using an * to indicate it is a pointer, as shown below.

int * myFunction() {}

If the output array is created within the function, qualify the array data type as static, so the pointer being returned is not the address of a local variable that could disappear once the program is out of the scope of the function. What this is referring to is the fact that the program will delete all local variables after it returns from the function (technically the stack frame is destroyed) and the pointer is no longer pointing to the array. By using the static qualifier, it will preserve the data at the address defined by the pointer.

The example shown below will return a pointer to an array of 10 random numbers.

int * getRandom() {
  static int r[10];
  randomSeed( (uint8_t) millis());
  for (int i = 0; i < 10; ++i) {
    r[i] = random(256);  // random number between 0 and 255 (upper bound is exclusive)
    Serial.println(r[i]);
  }
  return r;
}

You can also define a pointer as a variable and initialize its value to be the address of an array. The pointer will need to be the same data type as the array. The example shown below works with the getRandom() function.

int *p;
p = getRandom();

By adding an offset to the pointer, you can access the different elements of the array instead of using the index value. The code below will modify the 2nd element of the array of random numbers to be 20 and the 7th element to be 100.

*(p + 1) = 20;
*(p + 6) = 100;
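Pointer notation and index notation reach the same elements. The sketch below mirrors getRandom() but fills the array with fixed values so the result is predictable; `getTable` and `pointerAndIndexAgree` are hypothetical names.

```cpp
// Returns a pointer to a static array, so the storage outlives the call.
int *getTable() {
  static int table[10];
  for (int i = 0; i < 10; ++i) {
    table[i] = i * i;   // fixed values instead of random ones
  }
  return table;
}

// Writes through the pointer with both notations and checks that they agree.
bool pointerAndIndexAgree() {
  int *p = getTable();
  *(p + 1) = 20;    // 2nd element, pointer notation
  p[6] = 100;       // 7th element, index notation
  return (p[1] == 20) && (*(p + 6) == 100);
}
```

Because the array is static there is only one copy of it: every call returns the same address, and each call overwrites the previous contents.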

Structured Data Types in Program Memory (PROGMEM)

Reference:

  1. Arduino PROGMEM https://playground.arduino.cc/Main/PROGMEM
  2. Data in Program Space http://www.nongnu.org/avr-libc/user-manual/pgmspace.html

The C++ compiler was designed for computers based on the Princeton (von Neumann) memory model. In a Princeton memory model both the program and the data share a common memory, in this case SRAM. Therefore, by default in C++ all arrays are created and stored within SRAM. Many embedded systems, like the AVR family of microcontrollers, are based on the Harvard memory model. In a Harvard memory model, the program and data reside in different physical memories; specifically, Flash program memory and SRAM respectively. To make matters worse, microcontrollers contain significantly more Flash than SRAM. For example, the ATmega32U4 contains 32 Kbytes of Flash memory versus only 2.5 Kbytes of SRAM. Therefore, if the data in an array is constant and takes up a lot of space, you should store it in Flash program memory. But how, when the C++ compiler wants to put it all in SRAM?

In order to solve this conundrum within the Arduino IDE (GCC C++ compiler), we will be using the avr/pgmspace.h library that is available for the AVR architecture only. It provides definitions for the different data types that can be saved into Flash program memory and all of the functions needed to interact (read, find, compare, etc.) with the data saved there. To use the library, you need to add the following line of code with the other include statements at the top of the program.

#include <avr/pgmspace.h>

Once it has been included, you can start defining the data to be saved into Flash program memory using the PROGMEM variable modifier. It can be added before the data type of the variable or after the variable name and it indicates that this variable (arrays included) is to be saved in Flash program memory. Below are some examples of how to use it.

const PROGMEM dataType variableName[] = {}; // I like this form
const dataType variableName[] PROGMEM = {}; // this form also works
const dataType PROGMEM variableName[] = {}; // but maybe not this one
  • dataType - any data type that can be stored in program memory
  • variableName - the name for your array of data

Rules for initialization of the arrays within Flash program memory are the same as conventional arrays.

const PROGMEM uint8_t myData[11][10] =
{
{0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09},
{0x0A,0x0B,0x0C,0x0D,0x0E,0x0F,0x10,0x11,0x12,0x13},
{0x14,0x15,0x16,0x17,0x18,0x19,0x1A,0x1B,0x1C,0x1D},
{0x1E,0x1F,0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27},
{0x28,0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,0x30,0x31},
{0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3A,0x3B},
{0x3C,0x3D,0x3E,0x3F,0x40,0x41,0x42,0x43,0x44,0x45},
{0x46,0x47,0x48,0x49,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F},
{0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59},
{0x5A,0x5B,0x5C,0x5D,0x5E,0x5F,0x60,0x61,0x62,0x63},
{0x64,0x65,0x66,0x67,0x68,0x69,0x6A,0x6B,0x6C,0x6D}
};

One trick I have used in the past is to create the array in Microsoft Excel, export it as CSV (comma delimited), open the file in Notepad, and then copy-paste the values into the program.

Now that your data resides in Flash program memory, your code to access (read) the data will no longer work. The code that gets generated will retrieve the data located at the address of the myData array, plus offsets indexed by the i and j variables. However, the final address that is calculated points to the SRAM data space, not the Flash program space where the data is actually located. So your program will be retrieving garbage. Again, the problem is that the C++ compiler does not intrinsically know that the data resides in Flash program memory.

Consequently, you will need to use the pgm_read_byte() or pgm_read_byte_near() functions. From my reading, for AVR microcontroller 16-bit address applications, pgm_read_byte() resolves to pgm_read_byte_near(). One way to access the data bytes in the myData 2-dimensional array is to take the address of the indexed element with the address-of operator & and pass that address to pgm_read_byte().

uint8_t X = pgm_read_byte(&(myData[i][j]));

Here is another way, which adds a row-major offset to the address of the first byte. Note that the offset must be added to a byte pointer; adding it to myData itself would step by whole rows of 10 bytes.

uint8_t X = pgm_read_byte(&myData[0][0] + i*10 + j);

In the past I have only worked with one dimensional arrays. Therefore both approaches above are untested.
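The two access styles can be compared off the microcontroller with a small stand-in. In the sketch below, the `#else` branch is an assumption made only for host-side testing: it defines `PROGMEM` as nothing and `pgm_read_byte` as a plain dereference, so the data stays in ordinary memory. On an AVR target the real avr/pgmspace.h definitions are used. The table `table` and the function names are hypothetical.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Host-side stand-ins (assumption, for testing off the microcontroller).
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

const uint8_t COLS = 10;

// Hypothetical 3 x 10 table; each entry equals row * 10 + column.
const PROGMEM uint8_t table[3][COLS] = {
  { 0,  1,  2,  3,  4,  5,  6,  7,  8,  9},
  {10, 11, 12, 13, 14, 15, 16, 17, 18, 19},
  {20, 21, 22, 23, 24, 25, 26, 27, 28, 29}
};

// Read one entry using the address of the indexed element.
uint8_t readByIndex(uint8_t i, uint8_t j) {
  return pgm_read_byte(&(table[i][j]));
}

// Read one entry using the address of the first byte plus a
// row-major offset of i * COLS + j bytes.
uint8_t readByOffset(uint8_t i, uint8_t j) {
  return pgm_read_byte(&table[0][0] + i * COLS + j);
}
```

The first style is usually easier to read because the compiler works out the offset from the array's declared dimensions.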

There are different functions for each of the data types (word, dword, float, and ptr). See the avr-libc "Data in Program Space" reference listed above for more information about these functions and the pgmspace library.

Here is a nice Arduino IDE script showing how to use PROGMEM and the pgmspace library to print out a 1-dimensional array of 16-bit unsigned word values and a text string.

 

#include <avr/pgmspace.h>

// array of unsigned 16-bit integers
const PROGMEM uint16_t charSet[] = { 65000, 32796, 16843, 10, 11234};

// array of 8-bit characters
const PROGMEM char signMessage[] = {"Hello World"};

char myChar;

void setup() {
 Serial.begin(9600);
 while(!Serial);     // wait for the serial port to connect (native USB boards)
 // read back an array of 2-byte integers
 for (int k = 0; k < 5; k++){
   uint16_t displayInt = pgm_read_word_near(charSet + k);  // 16 bits, so 65000 fits
   Serial.println(displayInt);
 }
 Serial.println();

 // read back an array of char
 int len = strlen_P(signMessage);
 for (int k = 0; k < len; k++){
   myChar = pgm_read_byte_near(signMessage + k);
   Serial.print(myChar);
 }
 Serial.println();
}
void loop() {
  // put your main code here, to run repeatedly:
}

Caveats

The macros and functions used to retrieve data from the Program Space generate some extra code in order to actually load the data from the Program Space. This incurs extra overhead in terms of code space (extra machine code instructions) and therefore execution time. Usually, both the space and time overheads are minimal compared to the space savings of putting data in Flash program memory. But you should be aware of this so you can minimize the number of calls within a single function that get the same piece of data from Flash. It is always instructive and fun to look at the resulting disassembly from the compiler.

Data Structures

Data structures in C++ provide a greater level of organization for complex systems. Before going into greater detail, arrays will be reviewed, as they give good insight into how structures work. For example: type array[size]; C++ arrays have a fixed length and hold a single data type. This allows us to collect groups of numbers for analysis. The issue comes when collections of data are not conducive to arrays or when there is a mixture of data types. Starting from a basic C++ foundation, the immediate idea is to make sets of arrays and organize the groups of information by index. That is the core premise of structures! Structures also lay a good foundation for how Object Oriented Programming (OOP) is organized.

I will take my example from the myRobot array introduced at the beginning of this page. To review, I want to track a robot's progress in a maze containing obstacles. For fun the robot looks like a bear and the obstacles are defined as bee stings. As the robot moves from room-to-room the program needs to keep track of 6 variables.

As an alternative to working with 6 variables at a time, I created a single array named myRobot containing 6 unsigned 8-bit integers corresponding to these data.

  //                    dir   turn  row   col   room  bees
  uint8_t myRobot[6] = {0x03, 0x00, 0x14, 0x00, 0x00, 0x00};

Consolidating all this data into a single array does not add any clarity to my program. For example, if I wanted to find out the row the robot was in, I would write myRobot[2], which is not very helpful. Data structures allow us to have the best of both worlds: the data stays grouped under one name, and each item has a descriptive name of its own.

I will now define a struct named MyRobot_t and group together all the variables we want to keep track of. From our discussion above, our group will consist of the variables dir, turn, row, col, room, and bees. The definition of this data structure looks like this:

struct MyRobot_t {
  uint8_t dir;
  uint8_t turn;
  uint8_t row;
  uint8_t col;
  uint8_t room;
  uint8_t bees;
};

MyRobot_t Robot;

This definition needs to be placed within a header file or else the Arduino IDE will return several errors when we try to use the structure. We will learn more about header files in the future. For example, in Lab 4 the code above is placed in a header file named maze.h. With this structure, we have defined all of our variables as uint8_t but they can be any type. In the main loop, we can create a static structure to hold all of our relevant data.

Wait, we discussed that structs can hold multiple data types but we are holding only one type here. Why not use an array? While you can definitely do the same thing with an array, structs make the code more readable. Now instead of Robot[1] = newTurn;

The instruction would be Robot.turn = newTurn;

Since we need all of our updating functions to have access to the information, we will define a static struct inside loop. The struct is now functionally a data type, so many of the same rules apply (const, static, scope, etc.). Our instantiation of the struct will look like this.

void loop() {
  static MyRobot_t Robot{0x03,0x00,0x14,0x00,0x00,0x00};
  // more code
}

This creates a structure called Robot and initializes each member to the values listed (similar to an array). The list of values fills the members in the order they were declared in our header file. So for our case dir = 0x03, turn = 0x00, and so on. If you want to be more rigid in your default settings, you may define them in the structure definition as shown below.

In our header…

struct MyRobot_t {
  uint8_t dir  = 0x03;
  uint8_t turn = 0x00;
    .
    .
    .
   etc
};
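A complete version of the structure with default values can be written out as a sketch. The remaining defaults are taken from the initializer list used earlier for myRobot; the helper function `recordSting`, which receives the structure by reference so that it can modify the caller's copy, is hypothetical.

```cpp
#include <cstdint>

struct MyRobot_t {
  uint8_t dir  = 0x03;
  uint8_t turn = 0x00;
  uint8_t row  = 0x14;
  uint8_t col  = 0x00;
  uint8_t room = 0x00;
  uint8_t bees = 0x00;
};

// Record a turn and count one bee sting, using member names
// instead of array indexes.
void recordSting(MyRobot_t &robot, uint8_t newTurn) {
  robot.turn = newTurn;
  robot.bees++;
}
```

A variable declared as `MyRobot_t Robot;` then starts out with these defaults, and the structure occupies the same 6 bytes as the original array.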

Object Oriented Programming Example

Class.h File

class sensor { 
  private: /* these properties can't be modified outside the class */

  // defined by the constructor
  uint8_t pin;           // IR analog pin (A0, A1, A2, A3)
  float Vref = 3.3;      // analog reference voltage
  uint8_t buffer_depth;  // number of 10-bit ADC samples averaged
                         // over the data stream. maximum size = 64 (2^6)
  // internal properties
  uint8_t pointer = 0;   // index pointer into the buffer
                         // points to oldest value
  uint16_t circular_buffer[64] = {0}; // window into the data stream.
  uint16_t sum = 0;     // sum of the readings in the buffer

  public: /* these methods can be called outside the class and can use the private properties */

  // constructor example sensor(A0,8,3.3)
  sensor(uint8_t ic_pin,uint8_t circular_buffer_depth,float voltage_reference){
    pin = ic_pin;
    buffer_depth = circular_buffer_depth;
    Vref = voltage_reference;

    // fill the buffer
    for (int i=0; i < buffer_depth;i++){
      readSensor();
    }
  }

  void readSensor(){
   int16_t A_1 = analogRead(pin);            // 1. read analog value 
   uint16_t A_0 = circular_buffer[pointer];  // 2. get oldest reading
   sum += A_1 - A_0;                         // 3. subtract out oldest reading and
                                             //    replace with current reading
   // update circular_buffer
   circular_buffer[pointer] = A_1;
   pointer++;
   if(pointer >= buffer_depth) pointer = 0; 
  }

  uint16_t getAvg(){
    return sum/buffer_depth;
  }

  float getVpin(){
    return float(getAvg()) * Vref/1024.0;
  } 
};
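The arithmetic in getVpin() can be checked by hand: an average of 512 counts with a 3.3 V reference gives 512 x 3.3 / 1024 = 1.65 V. The stand-alone function below is a sketch of the same conversion; the name `countsToVolts` is hypothetical.

```cpp
#include <cstdint>

// Convert an averaged 10-bit ADC reading (0 to 1023) to volts,
// given the analog reference voltage.
float countsToVolts(uint16_t counts, float vref) {
  return float(counts) * vref / 1024.0f;
}
```

Dividing by 1024 treats each count as one step of Vref/1024, so the largest reading, 1023, corresponds to slightly less than Vref.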

Arduino Main Page

#include "3DoTconfig.h"
#include "Class.h"

sensor IR_R(A0,8,3.3); // instantiate right IR sensor 
sensor IR_L(A2,8,3.3); // instantiate left IR sensor

void setup() {
  Serial.begin(9600);   // needed for the Serial.print calls in loop()
}

void loop() {
  IR_R.readSensor();

  uint16_t average = IR_R.getAvg();
  float voltage = IR_R.getVpin();

  Serial.print("Average: ");
  Serial.println(average);
  Serial.print("Voltage: ");
  Serial.println(voltage);
  delay(500);
}

Review Questions

  1.  The following code will create an array of how many elements? int testarray[14];
  2. Write the code that will assign the value of the 3rd element of the array called
    balance to a variable called change
  3. Can an array be defined without an array size?
  4. The action of sending variable values to be used in a function is called what?
  5. The void data type is used for what?
  6. Write one of the ways to define a pointer as an input parameter for the function
    testFunction()
  7. Write the function definition for a function called returnPointer that has no input
    parameters and returns a pointer to an integer
  8. Where are arrays normally saved to?
  9.  What function from the pgmspace library should be used to read a value from an
    array?
  10. What type of buffer is required to implement a running average?

Answers

  1. 14 elements
  2. change = balance[2];
  3. No, every array has a fixed size. The brackets may be left empty only when an
    initializer list lets the compiler set the size.
  4. Passing arguments
  5. void is used to indicate that a function does not return a value
  6. void testFunction(int *pointer) {} , void testFunction(int pointer[]) {}, void
    testFunction(int pointer[10]) {}
  7. int * returnPointer() {}
  8. SRAM
  9. pgm_read_byte_near()
  10. circular

Appendix A Statistics Package

int readADC(int sensorPin){
  int n = 200;   // number of ADC samples
  int x_i;       // ADC input for sample i
  float A_1;     // current i running average
  float A_0;     // previous i-1 running average

  // rapid calculation method http://en.wikipedia.org/wiki/Standard_deviation
  A_0 = 0;
  for (int i=1; i <= n; i++){
    x_i = analogRead(sensorPin);
    A_1 = A_0 + (x_i - A_0)/i ;
    A_0 = A_1;
  }

  // Serial.print(", mean = ");
  // Serial.println(A_1);

  return (int(A_1));
}
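The update rule used in readADC can be tried without the ADC by feeding it values from an array. This is a sketch in plain C++; the function name `incrementalMean` is hypothetical.

```cpp
#include <cstddef>

// Incremental mean of the first n samples, using the update
// A_i = A_(i-1) + (x_i - A_(i-1)) / i, so no running sum is needed.
float incrementalMean(const int samples[], std::size_t n) {
  float A_0 = 0.0f;   // previous running average
  float A_1 = 0.0f;   // current running average
  for (std::size_t i = 1; i <= n; ++i) {
    A_1 = A_0 + (samples[i - 1] - A_0) / i;
    A_0 = A_1;
  }
  return A_1;
}
```

For the samples 2, 4, 6, 8 the running average steps through 2, 3, 4, and 5, which matches the ordinary mean of 20 / 4 = 5.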