Intro to Arduino Assembly – Class Lectures
Introduction to AVR Assembly Language Programming
READING
The AVR Microcontroller and Embedded Systems using Assembly and C by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 0.1, 0.2, 1.1, 2.5, 2.6, 2.7
SOURCE MATERIAL
These are some of the sources I used to develop the material used in the lecture series.
- Reduced Instruction Set Computer
- Atmel AVR Assembler User Guide
- Atmel AVR
- AVR Quick Reference Guide
- ATmega328P Summary (26 pages)
- ATmega328P (448 pages)
- 8-bit AVR Instruction Set (155 pages)
What is an Embedded System?
- An embedded system is an electronic system that contains at least one controlling device, i.e. “the brain”, but in such a way that it is hidden from the end user. That is, the controller is embedded so far in the system that usually users don’t realize its presence.
- Embedded systems perform a dedicated function.
What is the Controlling Device?
EE Course | Technology | Tools |
EE201 | Discrete Logic | Boolean Algebra |
EE301 | Field Programmable Gate Array (FPGA), Application-Specific Integrated Circuit (ASIC) | HDL (typically VHDL or Verilog) |
EE346 | Microcontroller | Program (typically C++ or Assembly) |
EE443 | System on a Chip (SoC) | System Level Design Language |
What is an Arduino?
- Arduino is an open-source electronics PCB containing a microcontroller and the things needed to support it: Power Supply, Communications, Reset Button, Clock, and Connectors for adding Sensors and Actuators in the physical world.
- Using an Arduino you can develop interactive objects, taking inputs from a variety of switches or sensors, and controlling a variety of lights, motors, and other physical outputs.
- The Arduino consists of two parts: the hardware and the software.
- We will be using the Arduino Uno, which contains an ATmega328P 8-bit microcontroller.
- We will be using AVR Studio to develop the software for the Arduino in place of the Arduino IDE and associated Scripting Language.
What is a CSULB Shield?
The CSULB Shield was designed to meet the educational objectives of EE 346.
The shield works with the Arduino Uno, Duemilanove (rev 2009b), and Mega Microcontroller Boards.
CSULB Shield Specifications
- Input
- 8 Toggle Switches
- General Purpose Push Button
- Reset Button
- Output
- 8 Discrete Red LEDs
- 3 Discrete Green LEDs
- 1 7-Segment Display
Building Blocks
What is an Input and Output Peripheral Device?
- A device that is attached to a controlling mechanism (for example, a computer) but is not actually part of it, and whose operation is functionally dependent upon the controlling mechanism.
How do you design this controlling mechanism?
- If you control peripherals using Discrete Electronics or a Programmable Logic Device (PLD) such as an FPGA or ASIC, then the control is in hardware (EE201, EE301).
- If you control peripherals using a Microcontroller then the control is in software (EE346 and EE444), implemented by a Program.
- If you control peripherals using a System on a Chip (SoC) then the control may be in software and/or hardware (EE443).
What is a Program?
- The Program is a “very specific list of instructions” to the computer.
- The process of “creating the program” is where much of an electrical engineer’s time is spent.
- The program is often referred to as Software, while the physical system components are called Hardware. Software held within non-volatile memory is called Firmware.
- Software design is all about creating patterns of 0’s and 1’s in order to get the computer to do what we want. These 0’s and 1’s are known as Machine Code.
0010 0111 0000 0000 → 1110 1111 0001 1111 → 1011 1001 0000 0111 → 1011 1001 0001 1000
1011 1001 0000 0100 → 1011 0000 0111 0110 → 1011 1000 0111 0101 → 1100 1111 1111 1101
- The architecture of the processor (or computer) within a microcontroller is unique, as are the Machine Code Instructions it understands.
0010 0111 0000 0000
1110 1111 0001 1111
- The list of Machine Code Instructions understood by a Microcontroller is known as the Machine Language.
How is Machine Code Related to Assembly Language?
Machine Code (The language of the machine)
- Binary Code (bit sequence) that directs the computer to carry out (execute) a pre-defined operation.
0010 0111 0000 0000
1110 1111 0001 1111
1011 1001 0000 0111
1011 1001 0001 1000
Assembly Language
- A computer language where there is a one-to-one correspondence between a symbolic assembly language instruction and a machine code instruction
- The language of the machine in human readable form
clr r16
ser r17
out DDRC, r16
out PORTC, r17
Corollary
- Specific to a single computer or class of computers (non-portable)
Anatomy of an Assembly Instruction?
Sample Code Segment
Machine Code (Binary) | Machine Code (Hex) | Assembly Code |
0010 0111 0000 0000 | 0x2700 | clr r16 |
1110 1111 0001 1111 | 0xEF1F | ser r17 |
1011 1001 0000 0111 | 0xB907 | out DDRC, r16 |
1011 1001 0001 1000 | 0xB918 | out PORTC, r17 |
- The Operation Code or Opcode for short, is a mnemonic that tells the CPU what instruction is to be executed. In the sample code above that would be clr (clear), ser (set register), and out (output to I/O location). One or more operands follow the Opcode.
- The Operand(s) specify the location of the data that is to be operated on by the CPU. In many cases it is the Arithmetic Logic Unit (ALU) that performs the specified operation.
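To make the opcode and operand fields concrete, here is a minimal Python sketch that pulls the operands back out of the two `out` machine words in the sample code segment. It assumes the `out` bit layout `1011 1AAr rrrr AAAA` from the 8-bit AVR Instruction Set document; the function name `decode_out` is made up for this illustration.

```python
def decode_out(word):
    """Split a 16-bit AVR `out` machine word into (register, I/O address).

    Assumed encoding: 1011 1AAr rrrr AAAA
    """
    if word & 0xF800 != 0xB800:                      # top five bits must be 10111
        raise ValueError("not an `out` instruction")
    reg = (word >> 4) & 0x1F                         # rrrrr: source register number
    io_addr = ((word >> 5) & 0x30) | (word & 0x0F)   # AAAAAA: 6-bit I/O address
    return reg, io_addr


for word in (0xB907, 0xB918):
    reg, io_addr = decode_out(word)
    print(f"0x{word:04X} = out 0x{io_addr:02X}, r{reg}")
```

On the ATmega328P, I/O addresses 0x07 and 0x08 are DDRC and PORTC, which matches the assembly column of the table.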
Design Example
Write an Assembly Program to turn a light on and off with a switch. A similar program was used in the design of The Wake-up Machine.
Development Steps
Assembly and Microcontroller Overview
Help
$0010\,0111\,0000\,0000_2 = 2700_{16}$ = clr r16
An Important part of this course is understanding the Design and Language of “The Computer.”
The computer implements the classical digital gates you learned in your Digital Logic class (EE201) in software with instructions like and, or, and eor/xor.
You are also going to have to seamlessly move from binary to hexadecimal and back again (i.e., Number Systems).
Computer programs move data through Registers, so a working knowledge of Flip-Flops and Registers is also an important foundational part of this class.
Finally, instead of designing with gates (EE201) you will be designing with code. So you will need to review programming concepts like data transfer (assignment expressions), arithmetic and logic operators, control transfer (branching and looping), and bit and bit test operators that you learned in your programming class (CECS174 or CECS100).
The good news is that help is available in Chapter 0: “Introduction to Computing” of your textbook, the supplemental reading provided at the beginning of this document, the web, and the following sections.
Numbers and Their Computer Representation Number System.
Introduction
Base 10 is a result of our having ten fingers.
The symbols 0-9 are Arabic numerals; zero and positional notation originated in India.
Other systems: Roman numerals are essentially additive, and the value depends on whether a symbol precedes or follows another symbol. Ex. IV = 4 versus VI = 6. This was a very clumsy system for arithmetic operations.
Positional Notation (Positive Real Integers)
Fractional numbers will not be considered but it should be noted that the addition of said would be a simple and logical addition to the theory presented.
The value of each digit is determined by its position. Note the pronunciation of 256: “Two Hundred and Fifty Six.”
Ex. $256 = 2 \times 10^2 + 5 \times 10^1 + 6 \times 10^0$
Generalization to any base or radix
Base or Radix = Number of different digits which can occur in each position in the number system.
$N = A_n r^n + A_{n-1} r^{n-1} + \dots + A_1 r^1 + A_0 r^0$ (where the last two terms are simply $A_1 r + A_0$)
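The general formula can be tried out with a few lines of Python. This is only a sketch; the function name `positional_value` and the choice to pass the digits as a list (most significant digit first) are assumptions made for the example.

```python
def positional_value(digits, radix):
    """Evaluate N = A_n*r^n + ... + A_1*r + A_0 (digits given most significant first)."""
    value = 0
    for digit in digits:
        if not 0 <= digit < radix:
            raise ValueError(f"digit {digit} is not valid in base {radix}")
        value = value * radix + digit   # same sum, accumulated one position at a time
    return value


print(positional_value([2, 5, 6], 10))           # the 256 example above
print(positional_value([1, 0, 1, 0, 1, 1], 2))   # a binary number
```

Multiplying the running total by the radix before adding each digit produces the same sum as the formula without computing any powers explicitly.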
Binary
The operation of most digital devices is binary by nature, either they are on or off.
Examples: Switch, Relay, Tube, Transistor, and Transistor-Transistor-Logic Integrated Circuit (TTL IC)
Thus it is only logical for a digital computer to operate in base 2.
Note: Future devices may not have this characteristic, and this is one of the reasons the basics and theory are important: they add flexibility to the system.
In the Binary system there are only 2 states allowed: 0 and 1 (FALSE or TRUE, OFF or ON)
Example: Most Significant Bit
Bit = One Binary Digit (0 or 1)
This positional related equation also gives us a tool for converting from a given radix to base 10 – in this example Binary to Decimal.
Base Eight and Base Sixteen
Early in the development of the digital computer, von Neumann realized the usefulness of operating in intermediate base systems such as base 8 (or Octal).
By grouping 3 binary digits or bits, one octal digit is formed. Note that $2^3 = 8$.
Binary to Octal Conversion Table
$2^2$ $2^1$ $2^0$
0 0 0 = 0
0 0 1 = 1
0 1 0 = 2
0 1 1 = 3
1 0 0 = 4
1 0 1 = 5
1 1 0 = 6
1 1 1 = 7
Symbols (not numbers) 8 and 9 are not used in octal.
Example: 100 001 010 110
$4126_8 = 4 \times 8^3 + 1 \times 8^2 + 2 \times 8^1 + 6 \times 8^0 = 2134_{10}$
This is another effective way of going from base 2 to base 10
Summary: Base 8 allows you to work in the language of the computer without dealing with large numbers of ones and zeros. This is made possible through the simplicity of conversion from base 8 to base 2 and back again.
In microcomputers, groupings of 4 bits (as opposed to 3 bits), or base 16 ($2^4$), are used. Originally pronounced Sexadecimal, base 16 was quickly renamed Hexadecimal (strictly speaking, the prefix hex- means six).
Binary to Hex Conversion Table
0 0 0 0 = 0
0 0 0 1 = 1
0 0 1 0 = 2
0 0 1 1 = 3
0 1 0 0 = 4
0 1 0 1 = 5
0 1 1 0 = 6
0 1 1 1 = 7
1 0 0 0 = 8
1 0 0 1 = 9
1 0 1 0 = A
1 0 1 1 = B
1 1 0 0 = C
1 1 0 1 = D
1 1 1 0 = E
1 1 1 1 = F
In hex, the symbols for 10 to 15 are borrowed from the alphabet. This shows how relative numbers really are; in other words, they truly are just symbols.
Example: 1000 0101 0110
$856_{16} = 8 \times 16^2 + 5 \times 16^1 + 6 \times 16^0 = 2134_{10}$
It is not as hard to work in base 16 as you might think, although it does take a little practice.
Conversion from Base 10 to a Given Radix (or Base)
Successive Division is best demonstrated by an example
To get the digits in the right order let them fall to the right.
For this example: $43_{10} = 101011_2$. Quick Check (Octal): 101 011 = $5 \times 8 + 3 = 43_{10}$
Another example: Convert $43_{10}$ from decimal to Octal
For this example: $43_{10} = 53_8$. Quick Check (Octal): $5 \times 8 + 3 = 43_{10}$
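Successive division is easy to try in software. The sketch below is one possible way to write it in Python; the function name `to_radix` and the list-of-digits return format are assumptions for the illustration.

```python
def to_radix(number, radix):
    """Convert a non-negative integer to a list of digits in the given radix.

    Each division yields one remainder; the first remainder is the least
    significant digit, so the list is reversed at the end.
    """
    digits = []
    while True:
        number, remainder = divmod(number, radix)
        digits.append(remainder)
        if number == 0:
            break
    digits.reverse()
    return digits


print(to_radix(43, 2))   # binary digits of 43
print(to_radix(43, 8))   # octal digits of 43
```

Reversing the list at the end plays the same role as letting the digits "fall to the right" in the hand calculation.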
Generalization of the procedure OR Why It Works
Where r = radix, N = number, A = remainder, and n = the number of digits in radix r for number N. Division is normally done in base 10.
Another way of expressing the above table is:
$N = r N_1 + A_0$
$N_1 = r N_2 + A_1$
$N_2 = r N_3 + A_2$
:
$N_{n-1} = r N_n + A_{n-1}$
$N_n = r \cdot 0 + A_n$
or (now for the sleight of hand)
$N = r (r N_2 + A_1) + A_0$ substitute $N_1$
$N = r^2 N_2 + r A_1 + A_0$ multiply r through equation
$N = r^2 (r N_3 + A_2) + r A_1 + A_0$ substitute $N_2$
:
$N = A_n r^n + A_{n-1} r^{n-1} + \dots + A_1 r^1 + A_0 r^0$
Nomenclature
Bit = 1 binary digit
Byte = 8 bits
Nibble = one half byte = 4 bits
Word = Computer Dependent
Arithmetic Operations
Addition
Binary
Binary addition is performed similarly to decimal addition, using the following binary addition rules:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 10 (0 with a carry of 1)
Examples:
Problem | $21_{10} + 10_{10} = 31_{10}$ | $45_{10} + 54_{10} = 99_{10}$ | $3_{10} + 7_{10} = 10_{10}$ |
Binary | $10101_2 + 01010_2 = 11111_2$ | $101101_2 + 110110_2 = 1100011_2$ | $011_2 + 111_2 = 1010_2$ |
Check | | | $1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 1 \times 8 + 0 \times 4 + 1 \times 2 + 0 \times 1 = 10_{10}$ |
Octal
Octal addition is also performed similarly to decimal addition, except that each digit has a range of 0 to 7 instead of 0 to 9.
Problem | $21_{10} + 10_{10} = 31_{10}$ | $45_{10} + 54_{10} = 99_{10}$ | $3_{10} + 7_{10} = 10_{10}$ |
Octal | $25_8 + 12_8 = 37_8$ | $55_8 + 66_8 = 143_8$ | $3_8 + 7_8 = 12_8$ |
Check | $3 \times 8^1 + 7 \times 8^0 = 3 \times 8 + 7 \times 1 = 31_{10}$ | $1 \times 8^2 + 4 \times 8^1 + 3 \times 8^0 = 64 + 32 + 3 = 99_{10}$ | $1 \times 8^1 + 2 \times 8^0 = 8 + 2 = 10_{10}$ |
Hexadecimal
Hex addition is also performed similarly to decimal addition, except that each digit has a range of 0 to 15 (written 0 to F) instead of 0 to 9.
Problem | $21_{10} + 10_{10} = 31_{10}$ | $45_{10} + 54_{10} = 99_{10}$ | $3_{10} + 7_{10} = 10_{10}$ |
Hex | $15_{16} + 0A_{16} = 1F_{16}$ | $2D_{16} + 36_{16} = 63_{16}$ | $3_{16} + 7_{16} = A_{16}$ (not 10) |
Check | $1 \times 16^1 + 15 \times 16^0 = 16 + 15 = 31_{10}$ | $6 \times 16^1 + 3 \times 16^0 = 96 + 3 = 99_{10}$ | $10 \times 16^0 = 10_{10}$ |
Binary Multiplication
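Decimal | Binary |
$11_{10} \times 13_{10}$ | $1011_2 \times 1101_2$ |
Partial products (each - marks a shifted position): 33, 11- | Partial products (each - marks a shifted position): 1011, 0000-, 1011--, 1011--- |
Sum: $143_{10}$ | Sum: $10001111_2$ |
Check | $8 \times 16^1 + 15 \times 16^0 = 128 + 15 = 143_{10}$ |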
Binary Division
Check: $1 \times 16^1 + 5 \times 16^0 = 16 + 5 = 21_{10}$
Practice arithmetic operations by making problems up and then checking your answers by converting them back to base 10 via different bases (i.e., 2, 8, and 16).
How a computer performs arithmetic operations is a much more involved subject and has not been dealt with within this section.
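One way to check practice problems is to let a short script do the base conversions. The following Python sketch re-runs some of the addition examples from the tables above; the helper names `to_base` and `add_in_base` are invented for this illustration, and the built-in `int(text, base)` does the conversion back to an integer.

```python
DIGITS = "0123456789ABCDEF"


def to_base(number, base):
    """Write a non-negative integer as a digit string in the given base (2 to 16)."""
    digits = ""
    while True:
        number, remainder = divmod(number, base)
        digits = DIGITS[remainder] + digits
        if number == 0:
            return digits


def add_in_base(a, b, base):
    """Add two digit strings written in `base`; return the sum in the same base."""
    return to_base(int(a, base) + int(b, base), base)


for base, a, b in [(2, "101101", "110110"), (8, "55", "66"), (16, "2D", "36")]:
    total = add_in_base(a, b, base)
    print(f"base {base}: {a} + {b} = {total} (decimal {int(total, base)})")
```

The same helpers can check a multiplication: `to_base(0b1011 * 0b1101, 2)` gives the binary product from the multiplication example.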
Complements and Negative Numbers OR Adding a Sign Bit
Addition, multiplication, and division are nice, but what about subtraction and negative numbers? From grade school you have learned that subtraction is simply the addition of a negative number. Mathematicians along with engineers have exploited this principle along with modulo arithmetic (a natural outgrowth of adders of finite width) to allow computers to operate on negative numbers without adding any new hardware elements to the arithmetic logic unit (ALU).
Sign Magnitude
Here is a simple solution: just add a sign bit. To implement this solution in hardware you will need to create a subtractor, which means more money.
sign magnitude
Example: $-2 = 1\ 0010_2$
Ones Complement
Here is a solution that is a little more complex. Add the sign bit and, for a negative number, invert each bit making up the magnitude: simply change the 1’s to 0’s and the 0’s to 1’s.
sign magnitude
Example: $-2 = 1\ 1101_2$
To subtract in 1’s complement you simply add the sign and magnitude bits, letting the last carry bit (from the sign) fall into the bit bucket, and then add 1 to the answer whenever a carry was produced (the end-around carry). Once again let the last carry bit fall into the bit bucket. The bit bucket is possible due to the physical size of the adder.
$0\ 1010_2$ = 10
+ $1\ 1101_2$ = +(-2)
$0\ 0111_2$ = 7 (the carry out of the sign bit falls into the bit bucket)
+ $1_2$ Adjustment
$0\ 1000_2$ = 8
Although you can now use your hardware adder to subtract numbers, you now need to add 1 to the answer. This again means adding hardware. Compounding this problem, ones complement has two representations of zero, $0\ 0000_2$ (+0) and $1\ 1111_2$ (-0), the “schizophrenic zero.”
Twos Complement
Here is a solution that is a little more complex to set up, but needs no adjustments at the end of the addition. There are two ways to take the twos complement of a number.
Method 1 = Take the 1’s complement and add 1
$0\ 0010_2$ = 2 <- start
$1\ 1101_2$ 1’s complement (i.e. invert)
+ $1_2$ add 1
$1\ 1110_2$
Method 2 = Move from right to left, copying each bit unchanged up to and including the first 1 encountered, then invert every remaining bit.
$0\ 0010_2$ | start = $2_{10}$ |
$0_2$ | no change |
$10_2$ | no change but one is encountered |
$110_2$ | invert = change 0 to 1 |
$1110_2$ | invert = change 0 to 1 |
$1\ 1110_2$ | invert = change 0 to 1 |
Subtraction in twos complement is the same as addition. No adjustment is needed, and twos complement has no schizophrenic zero, although it does have one additional negative number (see Why It Works).
$0\ 1010_2$ = 10
+ $1\ 1110_2$ = +(-2)
$0\ 1000_2$ = 8 (the carry out of the sign bit falls into the bit bucket)
Examples:
Problem | $33_{10} - 19_{10} = 14_{10}$ | $69_{10} - 84_{10} = -15_{10}$ |
Twos complement | $0\ 100001_2 + 1\ 101101_2 = 0\ 001110_2$ | $0\ 1000101_2 + 1\ 0101100_2 = 1\ 1110001_2$ |
Check | convert to intermediate base (16): $E_{16} = 14_{10}$ | convert back to sign magnitude: $-0001111_2$; convert to intermediate base (16): $-F_{16} = -15_{10}$ |
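The two subtraction examples can be replayed in Python by masking every result to the width of the adder, which is a software stand-in for the bit bucket. This is a sketch only; the helper names `negate`, `add`, and `to_signed` are assumptions for the illustration.

```python
def negate(pattern, bits):
    """Twos complement by Method 1: invert every bit, then add 1."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask


def add(a, b, bits):
    """Add two bit patterns; the mask drops the carry out of the sign bit."""
    return (a + b) & ((1 << bits) - 1)


def to_signed(pattern, bits):
    """Interpret a bit pattern as a signed twos complement value."""
    if pattern & (1 << (bits - 1)):        # sign bit set: negative number
        return pattern - (1 << bits)
    return pattern


# 33 - 19 using a 7-bit adder (sign bit plus six magnitude bits)
result = add(33, negate(19, 7), 7)
print(format(result, "07b"), to_signed(result, 7))

# 69 - 84 using an 8-bit adder
result = add(69, negate(84, 8), 8)
print(format(result, "08b"), to_signed(result, 8))
```

Both runs use the same `add` routine for subtraction; no separate subtractor or end-of-addition adjustment is involved.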
Why It Works
Real adders have a finite number of bits, which leads naturally to modulo arithmetic — the bit bucket.
With arithmetic now reduced to going around in circles, positive numbers can add up to negative and vice-versa. Two tests provide a quick check on whether or not an “Overflow” condition exists.
Test 1 = If the two numbers are negative and the answer is positive, an overflow has occurred.
Test 2 = If the two numbers are positive and the answer is negative, an overflow has occurred.
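The two overflow tests reduce to one rule: the operands have the same sign and the answer has the opposite sign. Here is a minimal Python sketch of that check on a 5-bit adder; the function name `add_with_overflow` is made up for the example.

```python
def add_with_overflow(a, b, bits):
    """Add two twos complement bit patterns and report signed overflow.

    Overflow occurs when both operands have the same sign bit
    and the result has the opposite sign bit (Test 1 and Test 2).
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (a + b) & mask
    overflow = (a & sign) == (b & sign) and (result & sign) != (a & sign)
    return result, overflow


print(add_with_overflow(0b01010, 0b01000, 5))   # 10 + 8 does not fit in 5 bits
print(add_with_overflow(0b01010, 0b11110, 5))   # 10 + (-2) fits
print(add_with_overflow(0b10000, 0b11111, 5))   # -16 + (-1) does not fit
```

When the operands have different signs the sum always fits, so neither test can fire.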
If computers were calculators and the world was a perfect place, we would be done. But they are not and so we continue by looking at a few real world problems and their solutions.
Character Codes OR Non-Numeric Information
Decimal Number Problems
Represent decimal numbers in a binary computer. A binary representation of a decimal number, a few years ago, might have been “hard wired” into the arithmetic logic unit (ALU) of the computer. Today it is more likely simply representing some information that is naturally represented in base 10, for example your student ID.
Solution
In this problem, ten different digits need to be represented. Using 4 bits, $2^4$ or 16 combinations can be created. Using 3 bits, $2^3$ or 8 combinations can be created. Thus 4 bits are required to represent one decimal digit. It should be pointed out here that 16 combinations can be created from 4 bits (0000 – 1111), while the largest numeric value that can be represented is 15. The highest numeric value and the number of combinations differ because zero (0) is one of the combinations. This difference points out the need to always keep track of whether you are counting relative to zero or one, and what exactly you are after: a binary number or a number of combinations.
The most common way of representing a decimal number is named Binary Coded Decimal (BCD). Here each binary number corresponds to its decimal equivalent, with numbers larger than 9 simply not allowed. BCD is also known as an 8-4-2-1 code since each number represents the respective weights of the binary digits. In contrast the Excess-3 code is an unweighted code used in earlier computers. Its code assignment comes from the corresponding BCD code plus 3. The Excess-3 code had the advantage that by complementing each digit of the binary code representation of a decimal digit (1’s complement), the 9’s complement of that digit would be formed. The following table lists each decimal digit and its BCD and Excess-3 code equivalent representation. I have also included the negative equivalent of each decimal digit encoded using the Excess-3 code. For instance, the complement of 0100 (1 decimal) is 1011, which is 8 decimal. You can find more decimal codes on page 18 of “Digital Design” by M. Morris Mano (course text).
Binary Coded Decimal (BCD) | | Excess-3 | | |
Decimal Digit | Binary Code 8-4-2-1 | Decimal Digit | Binary Code | 9’s Complement |
0 | 0000 | N/A | 0000 | 1111 |
1 | 0001 | N/A | 0001 | 1110 |
2 | 0010 | N/A | 0010 | 1101 |
3 | 0011 | 0 | 0011 | 1100 |
4 | 0100 | 1 | 0100 | 1011 |
5 | 0101 | 2 | 0101 | 1010 |
6 | 0110 | 3 | 0110 | 1001 |
7 | 0111 | 4 | 0111 | 1000 |
8 | 1000 | 5 | 1000 | 0111 |
9 | 1001 | 6 | 1001 | 0110 |
N/A | 1010 | 7 | 1010 | 0101 |
N/A | 1011 | 8 | 1011 | 0100 |
N/A | 1100 | 9 | 1100 | 0011 |
N/A | 1101 | N/A | 1101 | 0010 |
N/A | 1110 | N/A | 1110 | 0001 |
N/A | 1111 | N/A | 1111 | 0000 |
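The self-complementing property of the Excess-3 code can be checked for all ten digits with a short Python sketch. The function names `bcd`, `excess3`, and `invert` are assumptions made for this illustration.

```python
def bcd(digit):
    """4-bit 8-4-2-1 code of a decimal digit."""
    return format(digit, "04b")


def excess3(digit):
    """Excess-3 code: the BCD code of the digit plus 3."""
    return format(digit + 3, "04b")


def invert(bits):
    """1's complement of a bit string."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


for digit in range(10):
    code = excess3(digit)
    # Inverting the Excess-3 code of d should give the Excess-3 code of 9 - d.
    assert invert(code) == excess3(9 - digit)
    print(digit, bcd(digit), code, invert(code))
```

The property holds because inverting a 4-bit value x gives 15 - x, and 15 - (d + 3) = (9 - d) + 3.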
Alphanumeric Character Problem
Represent alphanumeric data: lower and upper case letters of the alphabet (a-z, A-Z), decimal digits (0-9), and special symbols (carriage return, line feed, period, etc.).
Solution
To represent the upper and lower case letters of the alphabet, plus ten numbers, you need at least 62 (2×26+10) unique combinations. Although a code using only six binary digits, providing $2^6$ or 64 unique combinations, would work, only 2 combinations would be left for special symbols. On the other hand, a code using 7 bits provides $2^7$ or 128 combinations, which is more than enough room for the alphabet, numbers, and special symbols. So who decides which binary combinations correspond to what character? Here there is no “best way.” About thirty years ago IBM came out with a new series of computers which used 8 bits to store one character ($2^8$ = 256 combinations), and devised the Extended Binary-Coded Decimal Interchange Code (EBCDIC, pronounced ep-su-dec) for this purpose. Since IBM had a near monopoly on the computer field at that time, the other computer makers refused to adopt EBCDIC, and that is how the 7-bit American Standard Code for Information Interchange (ASCII) came into existence. ASCII has now been adopted by virtually all micro-computer and mini-computer manufacturers. The table below shows a partial list of the ASCII code. Page 23 of the text lists all 128 codes with explanations of the control characters.
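DEC | HEX | CHAR | DEC | HEX | CHAR |
32 | 20 | (space) | 64 | 40 | @ |
33 | 21 | ! | 65 | 41 | A |
34 | 22 | " | 66 | 42 | B |
35 | 23 | # | 67 | 43 | C |
36 | 24 | $ | 68 | 44 | D |
37 | 25 | % | 69 | 45 | E |
38 | 26 | & | 70 | 46 | F |
39 | 27 | ' | 71 | 47 | G |
40 | 28 | ( | 72 | 48 | H |
41 | 29 | ) | 73 | 49 | I |
42 | 2A | * | 74 | 4A | J |
43 | 2B | + | 75 | 4B | K |
44 | 2C | , | 76 | 4C | L |
45 | 2D | - | 77 | 4D | M |
46 | 2E | . | 78 | 4E | N |
47 | 2F | / | 79 | 4F | O |
48 | 30 | 0 | 80 | 50 | P |
49 | 31 | 1 | 81 | 51 | Q |
50 | 32 | 2 | 82 | 52 | R |
51 | 33 | 3 | 83 | 53 | S |
52 | 34 | 4 | 84 | 54 | T |
53 | 35 | 5 | 85 | 55 | U |
54 | 36 | 6 | 86 | 56 | V |
55 | 37 | 7 | 87 | 57 | W |
56 | 38 | 8 | 88 | 58 | X |
57 | 39 | 9 | 89 | 59 | Y |
58 | 3A | : | 90 | 5A | Z |
59 | 3B | ; | 91 | 5B | [ |
60 | 3C | < | 92 | 5C | \ |
61 | 3D | = | 93 | 5D | ] |
62 | 3E | > | 94 | 5E | ^ |
63 | 3F | ? | 95 | 5F | _ |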
The word “string” is commonly used to describe a sequence of characters stored via their numeric codes (ASCII, for example).
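A quick way to see a string as a sequence of numeric codes is to print the codes themselves. In this small Python sketch the function name `ascii_codes` and the sample text "AVR" are arbitrary choices for the illustration.

```python
def ascii_codes(text):
    """Return the ASCII code of each character in the string."""
    return [ord(character) for character in text]


codes = ascii_codes("AVR")
print(codes)                                    # decimal codes
print([format(code, "02X") for code in codes])  # the same codes in hex
print("".join(chr(code) for code in codes))     # and back to characters
```

Each code can be looked up in the DEC and HEX columns of the table above.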
Although ASCII requires only 7 bits, the standard in computers is to use 8 bits, where the leftmost bit is set to 0. This allows you to code another 128 characters (including such things as Greek letters), giving you an extended character set, simply by letting the leftmost bit be a 1. This can also lead to a computer version of the Tower of Babel. Alternatively, the leftmost bit can be used for detecting errors when transmitting characters over a telephone line, which brings us to our next problem.
Synthesis
Although ASCII solves the communication problem between English speaking computers, what about Japanese, Chinese, or Russian computers which have different, and in all these examples, larger alphabets?
Communication Problem
Binary information may be transmitted serially (one bit at a time) through some form of communication medium such as a telephone line or a radio wave. Any external noise introduced into the medium can change bit values from 1 to 0 or vice versa.
Solution
The simplest and most common solution to the communication problem involves adding a parity bit to the information being sent. The function of the parity bit is to make the total number of 1’s being sent either odd (odd parity) or even (even parity). Thus, if an odd number of 1’s was sent but an even number of 1’s is received, you know an error has occurred. The table below illustrates the appropriate parity bit (odd and even) that would be appended to a 4-bit chunk of data.
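The parity rule can be sketched in a few lines of Python. The function names `parity_bit` and `is_valid` and the sample data value are assumptions for the example.

```python
def parity_bit(data, odd=False):
    """Parity bit to append so the total number of 1's is even (or odd)."""
    ones = bin(data).count("1")
    bit_for_even = ones % 2          # 1 only if the data holds an odd number of 1's
    return bit_for_even ^ 1 if odd else bit_for_even


def is_valid(data, received_bit, odd=False):
    """Receiver-side check: recompute the parity bit and compare."""
    return parity_bit(data, odd) == received_bit


data = 0b1011                        # three 1's
sent_bit = parity_bit(data)          # even parity
print(sent_bit, parity_bit(data, odd=True))
print(is_valid(data, sent_bit))      # received intact
print(is_valid(0b1001, sent_bit))    # one data bit flipped in transit
```

A single flipped bit changes the count of 1's from even to odd (or the reverse), which is exactly what the receiver's comparison detects.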
Synthesis
What happens if two binary digits change bit values? Can a system be devised to not only detect errors but to identify and correct the bit(s) that have changed? One of the most common error-correcting codes was developed by R.W. Hamming. His solution, known as a Hamming code, can be found in a very diverse set of places, from a Random Access Memory (RAM) circuit to a spacecraft telecommunications link. For more on error correcting codes read pages 299 to 302 of the text.
Although detecting errors is nice, preventing them from occurring is even better. Which of course brings us to our next problem.
Shaft Encoder Problem
As a shaft turns, you need to convert its radial position into a binary coded digital number.
Solution
The type of coder which will be briefly described below converts a shaft position to a binary-coded digital number. A number of different types of devices will perform this conversion; the type described is representative of the devices now in use, and it should be realized that more complicated coders may yield additional accuracy. Also, it is generally possible to convert a physical position into an electric analog-type signal and then convert this signal to a digital number. In general, though, more direct and accurate coders can be constructed by eliminating the intermediate step of converting a physical position to an analog electric signal. The Figure below illustrates a coded-segment disk which is coupled to the shaft.
The shaft encoder can be physically realized using electro-mechanical (brush) or electro-optical technology. Assuming an electro-optical solution, the coder disk is constructed with bands divided into transparent segments (the shaded areas) and opaque segments (the unshaded areas). A light source is put on one side of the disk, and a set of four photoelectric cells on the other side, arranged so that one cell is behind each band of the coder disk. If a transparent segment is between the light source and a light-sensitive cell, a 1 output will result; and if an opaque area is in front of the photoelectric cell, there will be a 0 output.
There is one basic difficulty with the coder illustrated: if the disk is in a position where the output number is changing from 011 to 100, or in any position where several bits are changing value, the output signal may become ambiguous. As with any physically realized device, no matter how carefully it is made, the coder will have erroneous outputs in several positions. If this occurs when 011 is changing to 100, several errors are possible; the value may be read as 111 or 000, either of which is a value with considerable error. To circumvent this difficulty, engineers use a “Gray,” or “unit distance,” code to form the coder disk (see previous Figure). In this code, only one bit changes value between successive coded binary numbers. Using a Gray coded disk, a 6 may be read as 7, or a 4 as 5, but larger errors will not be made. The Table below shows a listing of a 4-bit Gray code.
Decimal | Gray Code |
0 | 0000 |
1 | 0001 |
2 | 0011 |
3 | 0010 |
4 | 0110 |
5 | 0111 |
6 | 0101 |
7 | 0100 |
8 | 1100 |
9 | 1101 |
10 | 1111 |
11 | 1110 |
12 | 1010 |
13 | 1011 |
14 | 1001 |
15 | 1000 |
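The 4-bit table can be regenerated in Python. The sketch below uses the common shift-and-XOR formula for the reflected binary Gray code, which is a standard technique and not something taken from these notes; the function name `binary_to_gray` is an assumption for the example. It also checks the unit distance property.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of n: XOR n with itself shifted right by one bit."""
    return n ^ (n >> 1)


codes = [binary_to_gray(n) for n in range(16)]
for n, code in enumerate(codes):
    print(n, format(code, "04b"))

# Unit distance check: successive code words differ in exactly one bit.
for first, second in zip(codes, codes[1:]):
    assert bin(first ^ second).count("1") == 1
```

The last code word (1000) and the first (0000) also differ in one bit, which matters for a disk because position 15 is physically next to position 0.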
Synthesis
Gray code is used in a multitude of applications other than shaft encoders. For example, CMOS circuits draw the most current when they are switching. If a large number of circuits switch at the same time, unwelcome phenomena such as “Ground Bounce” and “EMI Noise” can result. If the transistors are switching due to some sequential phenomenon (like counting), then these unwelcome visitors can be minimized by replacing a weighted binary code with a Gray code.
If the inputs to a binary machine are from an encoder using a Gray code, each word must be converted to a conventional binary or binary-coded decimal bit equivalent. How can this be done? Before you can answer this question, you will need to learn about Boolean Algebra. What a coincidence: that’s the topic of the next section.
Introduction to the Atmel AVR Family of Microcontrollers
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 0.3, 0.4, 1.2, 2.1, 2.2, 2.8, 2.9, 3.3
SOURCE MATERIAL
- Reduced Instruction Set Computer: http://en.wikipedia.org/wiki/Load-store_architecture
- Atmel AVR: http://en.wikipedia.org/wiki/Atmel_AVR
- AVR Quick Reference Guide: http://www.atmel.com/dyn/resources/prod_documents/doc4064.pdf
- ATmega328P Summary (26 pages) http://www.atmel.com/dyn/resources/prod_documents/8161S.pdf
- Arduino Uno schematic
- Arduino shield
- ATmega328P (448 pages) http://www.atmel.com/dyn/resources/prod_documents/doc8161.pdf
- 8-bit AVR Instruction Set (155 pages) http://www.atmel.com/dyn/resources/prod_documents/doc0856.pdf
WHAT IS A FLIP-FLOP AND A REGISTER
You can think of a D flip-flop as a one-bit memory. As illustrated, the value present on the D input of the flip-flop is remembered on the positive edge of the clock input.
Truth Table
$D_t$ | $Q_{t+1}$ |
0 | 0 |
1 | 1 |
x | $Q_t$ |
A register is a collection of flip-flops sharing the same clock input.
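The behavior of a register built from D flip-flops can be imitated with a small Python model. This is a behavioral sketch only; the class name `Register`, its `tick` method, and the 8-bit width are assumptions for the illustration, not a description of the AVR hardware.

```python
class Register:
    """A group of D flip-flops that share one clock input."""

    def __init__(self, width=8):
        self.width = width
        self.q = 0             # stored value, one bit per flip-flop
        self._clock = 0        # previous clock level, used to spot the rising edge

    def tick(self, d, clock):
        """Present D and a clock level; D is captured only on a positive edge."""
        if clock == 1 and self._clock == 0:
            self.q = d & ((1 << self.width) - 1)
        self._clock = clock
        return self.q


r16 = Register()
print(hex(r16.tick(0xA5, 0)))   # clock low: nothing stored yet
print(hex(r16.tick(0xA5, 1)))   # rising edge: 0xA5 is remembered
print(hex(r16.tick(0x3C, 1)))   # clock stays high: the register holds its value
```

Between clock edges the output is simply the previous value, which corresponds to the last row of the truth table.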
Labs are based on the ATmega32U4 used in the Arduino Leonardo and Nano. The ATmega328P, shown here, is used in the Arduino UNO. For instructional purposes, both architectures will be referenced. The ATmega328P is the simpler of the two architectures and the easier to learn.
THE AVR ENGINE
Let’s adopt the analogy used by Charles Babbage when he called his computer an Analytical Engine. For a closer look, see this article in Wired and the ATmega328 Wikipedia page.
INSTRUCTION SET ARCHITECTURE (ISA)
“The Parts of the Engine”
- The Instruction Set Architecture (ISA) of a microprocessor includes all the registers that are accessible to the programmer. In other words, registers that can be modified by the instruction set of the processor.
- With respect to the AVR CPU illustrated in Figure 5.2, these ISA registers include the 32 x 8-bit general purpose registers, status register (SREG), the stack pointer (SP), and the program counter (PC).
AVR CPU CORE ARCHITECTURE
“Features of the Engine” Part I
- Reduced Instruction Set Computer (RISC): The instruction set of the computer and target compiler(s) are developed in concert allowing the optimization of both. In this way, a relatively high performance processor can be realized by “reducing” the amount of work any single instruction needs to do; leading to a simpler hardware design (smaller, faster, and cheaper).
8051 Microcontroller | ATmega328P Microcontroller |
cjne A, 0x99, next | cpi r16, 0x99 |
 | brne next |
- Mostly 16-bit fixed-length instructions. Instructions have from zero to two operands. Many of today’s RISC microprocessors have up to three operands.
- The Register File of the AVR CPU contains 32 x 8 bit mostly Orthogonal (or identical) General Purpose Registers – instructions can use any register; therefore, simplifying compiler design.
- Load-store memory access. Before you can do anything to data, you must first load it from memory into one of the general-purpose registers. You then use register-register instructions to operate on the data. Finally, you store your answer back into memory.
“Features of the Engine” Part II
- Modified Harvard memory model: A Harvard memory model separates Program and Data memory into separate physical memory systems (Flash and SRAM) that appear in different address spaces. A Modified Harvard memory model has the ability to read/write data items from/to program memory using special instructions. A Princeton memory model computer has only a single address space, shared by both the program and data.
- A Two-stage Instruction Pipeline (fetch and execute) resulting in most instructions being executed in one clock cycle. Consequently, the performance of a 20 MHz processor would approach 20 MIPS (Millions of Instructions Per Second). Compare this with the 8051 Complex Instruction Set Computer (CISC), which takes a minimum of 12 clock cycles to execute a single instruction (12 MHz clock = 1 MIPS).
- Simplicity of the computer architecture translates to a faster learning curve and utilization of the machine by the student.
AVR CPU INSTRUCTIONS
“The Language of the Machine”
The Instruction Set of our AVR CPU can be functionally divided (or classified) into five (5) categories.
Data Transfer | |
Arithmetic and Logical | |
Bit and Bit-Test | |
Control Transfer (Branch Instructions) | “Load the Program Counter” |
MCU Control | nop, sleep, wdr, break |
- Data Transfer instructions are used to Load and Store data to the General Purpose Registers, also known as the Register File.
- Exceptions are the push and pop instructions which modify the Stack Pointer.
- By definition these instructions do not modify the status register (SREG).
- Arithmetic and Logic Instructions plus Bit and Bit-Test Instructions use the ALU to operate on the data contained in the general purpose registers.
- Flags contained in the Status Register (SREG) provide important information concerning the results of these operations.
- For example, if you are adding two signed numbers together, you will want to know if the answer is correct. The state of the two's complement overflow flag (V) within SREG gives you the answer to this question (1 = error, 0 = no error).
- As the AVR processor fetches and executes instructions it automatically increments the program counter (PC) so it always points at the next instruction to be executed. Control Transfer Instructions allow you to change the contents of the PC either conditionally or unconditionally.
- Continuing our example, if an error results from adding two signed numbers together we may want to conditionally (V = 1) branch to an error handling routine.
INSTRUCTION FETCH AND EXECUTE
“The Basic Cycles of the Engine”
Once built, our computer lives to Fetch and Execute instructions, the bread-and-butter of the computer programmer. For this reason, the programmer views the computer as a vehicle for executing a set of instructions. This perspective is codified by the Instruction Set Architecture (ISA) of the computer.
HARVARD VERSUS PRINCETON MEMORY MODEL INSTRUCTION FETCH CYCLE
The five (5) steps required to fetch an instruction on a CPU incorporating the Princeton memory model are provided here. The key difference between the Princeton and Harvard memory models is the physical separation of program memory from data memory. For embedded systems the program memory is implemented using Flash memory. With program memory now isolated from data memory, the instruction fetch cycle is reduced to a single (1) step. What is accomplished in that single step is shown in bold.
- The CPU presents the value of the program counter (PC) on the address bus and sets the read control line.
- The Flash program memory looks up the address of the instruction and presents the value on the data bus.
- The value from the data bus is placed into the instruction register and the CPU clears the read control line. The instruction register now holds the instruction to be executed.
- The program counter is incremented so it points to the next instruction to be executed.
- The instruction decoder interprets and implements (executes) the instruction.
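The steps listed above can be sketched as a small Python loop. This is a simplified model under stated assumptions: Flash is a word-addressed list, instructions are stored as mnemonic strings rather than 16-bit opcodes, only ldi, add, and nop are understood, and the four-instruction program is hypothetical.

```python
# Hypothetical program held in Flash, one instruction per word address.
flash = ["ldi r16, 5", "ldi r17, 7", "add r16, r17", "nop"]


def run(flash):
    regs = {}
    pc = 0
    trace = []
    while pc < len(flash):
        ir = flash[pc]            # PC addresses Flash; the word lands in the IR
        pc += 1                   # PC now points at the next instruction
        trace.append((pc, ir))
        # Decode and execute the instruction held in the IR.
        op, _, args = ir.partition(" ")
        operands = [a.strip() for a in args.split(",")] if args else []
        if op == "ldi":
            regs[operands[0]] = int(operands[1], 0) & 0xFF
        elif op == "add":
            regs[operands[0]] = (regs[operands[0]] + regs[operands[1]]) & 0xFF
        elif op == "nop":
            pass
        else:
            raise ValueError(f"unsupported instruction: {ir}")
    return regs, trace
```

Each trace entry records the value of the PC after the fetch, which shows that the PC already points to the next instruction while the current one is being executed.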
I/O Address Space versus Memory Mapped I/O
- Input and Output ports have traditionally been treated as separate parts of the computer.
- The AVR includes an in instruction to read from an I/O port and an out instruction to write to an I/O port.
- The AVR has 64 I/O registers accessible to these two instructions
Problem: The Atmel ATmega line of Microcontrollers needs more than 64 I/O registers (GPIO, Timers,…)
Solution: Instead of looking at computers having 5 basic elements (Input, Output, ALU, CPU, Memory), you can simplify the design to only three (CPU, ALU, and Memory) now allowing the CPU to access 160 “extended” I/O registers using SRAM instructions like lds (load from SRAM) and sts (store to SRAM).
- This was such a powerful technique that Atmel extended the I/O mapping to include the 32 general purpose registers, the original 64 I/O registers, and the 160 extended I/O registers. The overlaying of the I/O address space with the SRAM address space is shown in the next slide.
- A side benefit of the double mapping is the large number of ways of accessing data within SRAM (addressing modes) versus the limited number of instructions and addressing modes available for accessing the original 64 I/O registers (i.e., in, out).
- It is very important to realize that I/O registers are not contiguous within the address space (I/O or SRAM). The mapping is simply a convenient way of accessing registers physically located in diverse locations within the Silicon chip.
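The overlay described above can be sketched as follows. The region sizes come from the text (32 general purpose registers, 64 I/O registers, 160 extended I/O registers); the function names are made up for illustration. The only assumption beyond the text is the PORTD example, whose I/O address on the ATmega328P is 0x0B.

```python
IO_OFFSET = 0x20  # the 32 general purpose registers occupy data addresses 0x00-0x1F


def io_to_data_address(io_addr):
    """Convert an address used by in/out into the address used by lds/sts."""
    if not 0x00 <= io_addr <= 0x3F:
        raise ValueError("in/out can only reach I/O addresses 0x00-0x3F")
    return io_addr + IO_OFFSET


def region(data_addr):
    """Name the part of the data address space that an lds/sts address falls in."""
    if data_addr < 0x20:
        return "register file"
    if data_addr < 0x60:
        return "I/O registers"
    if data_addr < 0x100:
        return "extended I/O registers"
    return "SRAM"
```

So a register reached with out at I/O address 0x0B is the same physical register that sts reaches at data address 0x2B, while anything at 0x60 or above can only be reached with the SRAM instructions.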
Atmel ATmega328P Memory Model
ATMEGA328P I/O MEMORY MAP
Appendix
APPENDIX A PROCESSOR CONTROL AND DATAPATH
Control | Datapath |
Component of the processor that commands the datapath, memory, and I/O devices according to the instructions in memory | Components of the processor that perform arithmetic operations and hold data |
APPENDIX B CALCULATING THE LAST ADDRESS
Given a 16K word (2 bytes / word) memory, what is the last address, in hexadecimal?
- The range of memory addresses, like an unsigned number, is from 0 to $2^n - 1$
- We are given the size of our memory in decimal as $16\text{K}_{10}$. So the first step is to convert this number to a power of 2.
$16\text{K}_{10} = 2^4 \times 2^{10} = 2^{14}$, which in binary would be…
- Which then can directly be expressed as a binary number.
- So the answer is 0x3FFF
- As a short-cut, if you can convert the memory size to a power of 2, the exponent equals the number of 1s in the answer. By dividing the exponent by 4, you have the number of hex digits which are F ($1111_2$), with the remainder giving you the most significant hex digit. In our example 4 goes into 14, 3 times with a remainder of 2, where 2 ones ($0011_2$) equal hexadecimal $3_{16}$.
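Both the direct calculation and the short-cut can be checked with a short Python sketch. The function names last_address and shortcut_hex are made up for illustration, and the sketch assumes the memory size is an exact power of 2.

```python
def last_address(num_locations):
    """Last address of a memory with num_locations locations (a power of 2)."""
    n = num_locations.bit_length() - 1       # exponent n in 2**n
    if (1 << n) != num_locations:
        raise ValueError("size must be a power of 2")
    return (1 << n) - 1                       # 2**n - 1


def shortcut_hex(n):
    """Apply the short-cut: n // 4 hex digits of F, remainder gives the top digit."""
    f_digits, remainder = divmod(n, 4)
    lead = format((1 << remainder) - 1, "X") if remainder else ""
    return "0x" + lead + "F" * f_digits
```

For the 16K example, 16 * 1024 locations gives n = 14, so last_address returns 16383, and shortcut_hex(14) builds the same value as the string 0x3FFF.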
APPENDIX C I/O ADDRESS SPACE VERSUS MEMORY MAPPED I/O
Reading: Your textbook covers memory organization in Section 0.3 “Semiconductor Memory” and I/O Mapping in Section 2.2 “The AVR Data Memory.” The following material covers mapping of the I/O address space in a slightly different way. The material was provided in bullet form earlier in this document.
From Charles Babbage’s Analytical Engine to Dr. John Von Neumann’s paper on the EDVAC computer, Input and Output have been treated as separate parts of the computer. Input and Output parts of your PC include the keyboard, mouse, printer, display, etc. To support these “peripheral” devices, many microprocessors include a separate I/O address space and instructions for working with the registers used to control and access data provided by the peripheral device. For the AVR microcontroller you read an I/O register using an in instruction and write using the out instruction.
When Atmel adopted the AVR architecture, they discovered that the 64 I/O registers accessible to these two instructions were insufficient for all the peripheral devices that they were planning on adding to the ATmega line of Microcontrollers. Specifically, they added 160 “extended” I/O registers. However, the AVR microprocessor was only designed for 64 I/O registers. To solve this problem, Atmel turned to an alternative way of working with I/O devices pioneered by Motorola and the 6800 family of processors (among others). Motorola realized that there was no reason to treat input and output devices any differently from memory. Now instead of looking at computers as having 5 basic elements (Input, Output, ALU, CPU, Memory), you could simplify the design to only three (CPU, ALU, and Memory). Accessing the 160 “extended” I/O registers was now accomplished using SRAM instructions like lds (load from SRAM) and sts (store to SRAM).
This was such a powerful technique that Atmel extended the I/O mapping to include the 32 general purpose registers, the original 64 I/O registers, and the 160 extended I/O registers. The overlaying of the I/O address space with the SRAM address space is shown in the next section.
A side benefit of the double mapping is the large number of ways of accessing data within SRAM (addressing modes) versus the limited number of instructions and addressing modes available for accessing the original 64 I/O registers.
It is very important to realize that I/O registers are not contiguous within the address space (I/O or SRAM). The mapping is simply a convenient way of accessing registers physically located in diverse locations within the Silicon chip.
APPENDIX D A BRIEF HISTORY OF THE COMPUTER
4,000 to 3,000 BC Abacus (+, -, *, /)
- The abacus is an instrument used to perform arithmetic calculations. The positions of beads on a set of wires determine the value of the digit. Romans called these beads calculi the plural of calculus, meaning pebble. This Latin root gave rise to the word calculate. In one contest the Abacus easily won over a mechanical calculator. The abacus is still used in China, Japan, and Korea.
1642 Blaise Pascal Mechanical Calculator (+, -)
- Designed at the age of 20. Rotating wheel mechanical calculator with automatic carry between digits on addition and subtraction of decimal digits (like the odometer in a car). In 1671 Baron von Leibnitz created a calculator, which could add, subtract, and multiply.
- A Human Computer with a mechanical calculator can execute 500 operations a day
1833 Charles Babbage and the Analytical Engine
- Conceived by Babbage, the engine established the basic principles upon which modern general-purpose digital computers are constructed. This mechanical machine performed instructions dictated by punched cards, with the variable values being determined by a second set of cards. The punched cards came from Joseph Marie Jacquard’s loom, where they controlled the operation of the weaving machines in 1812.
- Neither the Analytical Engine nor the Difference Engine (1820), a special purpose computer designed to solve polynomial expressions (ex. $N^2 + N + 41$), was ever entirely completed by Babbage, known as “the irascible genius.” The difference engine has recently been built as shown here.
1843 Ada Byron and the First Computer Program
- Ada Byron, Lady Lovelace, was one of the most picturesque characters in computer history. Augusta Ada Byron was born December 10, 1815, the daughter of the illustrious poet Lord Byron. Ada was brought up to be a mathematician and scientist. It was at a dinner party at Mrs. Somerville’s in November 1834 that Ada heard Babbage’s ideas for a new calculating engine, the Analytical Engine. In 1843 Ada, married to the Earl of Lovelace and the mother of three children under the age of eight, wrote an article describing Babbage’s Analytical Engine. Lady Lovelace’s prescient comments included her predictions that such a machine might be used to compose complex music, to produce graphics, and would be used for both practical and scientific use. When inspired, Ada could be very focused and a mathematical taskmaster. Ada suggested to Babbage writing a plan for how the engine might calculate Bernoulli numbers. This plan is now regarded as the first “computer program.” Like her father, she died at 36. Ada anticipated by more than a century most of what we think is brand-new computing.
Source: http://www.scottlan.edu/lriddle/women/love.htm
1890 Herman Hollerith and the Census Counting Machine
- Hollerith developed punched cards for tabulating equipment used in the 11th census of the United States. Cards contained 288 locations and were the size of a dollar bill in order to save on tooling. Contact brushes completed electrical circuits, allowing the system to do counting, sensing, punching, and sorting. He started the Tabulating Machine Company, which turned into the Computing-Tabulating-Recording Company, which turned into the International Business Machines Corporation (IBM) in 1924.
1937 Harvard Mark I
- Howard Hathaway Aiken at Harvard proposed to IBM the Mark I or Automatic Sequence Controlled Calculator; this was to be the first large-scale calculator. Very similar to the Analytical Engine, the machine used a combination of electromechanical devices, including many relays. It went to work in 1944 calculating with numbers of 23 digits and computing products of 46-digit accuracy. It received its instructions from perforated tape, from IBM cards, and from the mechanical setting of 1,440 dial switches. Output was either by IBM cards or by typing columns of figures on a roll of paper. The Mark I could perform one division per minute. The machine was in operation for many years, generating many tables of mathematical functions (particularly Bessel functions), and was used for trajectory calculations in World War II.
1943 Electronic Numerical Integrator and Computer (ENIAC)
- Engineers J. Presper Eckert and John W. Mauchly created the ENIAC at the Moore School of Engineering of the University of Pennsylvania between 1943 and 1946. Built in wartime secrecy for the Army Ordnance Department, the ENIAC was designed to do trajectory calculations. Containing 18,000 vacuum tubes, with each accumulator using 100 vacuum tubes arranged as 10 columns of 10 tubes each, the ENIAC could add two 10-digit numbers (the size of ENIAC’s decimal accumulators) in 200 microseconds, thirty thousand (30,000) times faster than the Mark I. The ENIAC was programmed by patch board and switches. The ENIAC was later moved at a cost of $100,000 to the Ballistic Research Laboratories at the Aberdeen Proving Ground.
1945 Dr. John Von Neumann and the Electronic Discrete Variable Computer (EDVAC)
- EDVAC was the first general-purpose stored program binary electronic (vacuum tube) computer. It was completed in 1950, after the EDSAC; thus it was not the first operational stored program computer. The technical work on the EDVAC, notably the Ultrasonic (or Supersonic) Delay Line, was done by Eckert and Mauchly, with the logical organization done by Von Neumann, Burks, and Goldstine.
- This computer was the blueprint for most modern day computer systems, having in it the 5 principal organs that make up almost all modern day computers: Input, Output, Arithmetical, Central Control, and Memory (storing both the numerical as well as the instructional information for a given problem). Eckert as well as others left before the EDVAC was ever completed. Architecturally the EDVAC is classified as a general purpose four address computer.
1947 The First Computer Bug
- American engineers have been calling small flaws in machines “bugs” for over a century. Thomas Edison talked about bugs in electrical circuits in the 1870s. When the first computers were built during the early 1940s, people working on them found bugs in both the hardware of the machines and in the programs that ran them.
- In 1947, engineers working on the Mark II computer at Harvard University found a moth stuck in one of the components. They taped the insect in their logbook and labeled it “first actual case of bug being found.” The words “bug” and “debug” soon became a standard part of the language of computer programmers.
1951 John Von Neumann and Princeton’s IAS (Institute for Advanced Study) Machine
- Designed to develop a world weather model, the IAS machine incorporated most of the general concepts of parallel binary stored-program computers. That is, it used random access (parallel) memory built from CRTs. It was a one address computer.
1951 Eckert and Mauchly and the UNIVAC I
- Soon after the formal dedication of the ENIAC computer, J. Presper Eckert and John W. Mauchly left the University of Pennsylvania to start their own business. Early orders from U.S. government agencies and other potential customers were not enough to keep the young Eckert-Mauchly Computer Corporation alive, and Remington Rand agreed to purchase the firm in 1950. Work on the UNIVAC I (Universal Automatic Computer) went forward, and the first commercially available electronic (vacuum tube) digital computer was delivered to the Bureau of the Census in early 1951. By 1957, some 46 copies of the machine had been installed at locations ranging from the David Taylor Model Basin of the U.S. Navy Bureau of Ships, to Pacific Mutual Life Insurance Company, to the offices of the Commonwealth of Pennsylvania.
- The UNIVAC, like the ENIAC, had vacuum tube circuit elements. There also were some 18,000 crystal diodes. Central memory was handled in acoustic delay-line tanks, which were used in several early computers. UNIVAC also had an external magnetic tape memory, as well as magnetic tapes used in input and output. Users of UNIVAC played an important role in the development of programming languages. Source: Smithsonian Computer History Collection
1965 Digital Equipment Corporation (DEC) PDP-8
- Designed using Integrated Circuits, DEC sold the first PDP-8 for only $18,000. Later versions of this machine that incorporated improvements in electronics appeared over the next decade. These became steadily smaller and cheaper, triggering a rush of new applications in which the computer was embedded into another system and sold by a third party (called an Original Equipment Manufacturer, or OEM). Some machines were specifically designed for time sharing and for business applications. Ultimately over 50,000 PDP-8’s were sold (excluding those embedded as single chips into other systems), bringing computers into the laboratory and the manufacturing plant’s production line, and thus the minicomputer industry was born (read “The Soul of a New Machine”).
The x86 isn’t all that complex — it just doesn’t make a lot of sense
Mike Johnson
Leader of the 80×86 Design at AMD
Microprocessor Report (1994)
June 1969 to April 1971 Ted Hoff and Intel 3-chipset 4004
- Intel, a company founded in 1968, is asked by Busicom of Japan to design a custom LSI calculator chip-set. Intel discovers the design will take 11 IC packages of 36 to 40 pins each and proposes a creative alternative. Ted Hoff, at Intel, had been working with the PDP-8 minicomputer and proposed to Busicom that a general purpose LSI chip-set be designed that could be programmed to be a calculator or for other applications. We are so used to using computers that the genius of this step can escape us. The traditional solution was to design what you wanted using logic gates. What Ted Hoff envisioned was a wholly different approach. You design a simple CPU and teach it, using software, to do what you want. Today these computers are known as microcontrollers and embedded systems. Publicly announced in November 1971.
Nov 1969 to Jan 1972 Vic Poor and the Intel 8008
- Vic Poor of Datapoint Corporation of San Antonio, Texas (manufacturers of “intelligent terminals” and small computer systems), along with Cogar and Viatron engineers, designed a very elementary computer and put Intel and Texas Instruments under contract to implement the design on a single logic chip. Intel succeeded, but their product executed instructions approximately ten (10) times as slowly as Datapoint had specified and was way behind schedule (work had been stopped by Intel to complete the Busicom chip-set), so Datapoint declined to buy it and built their own product using existing logic components. And thus Intel, holding a computer-like logic device (whose development had been paid for), marketed the Intel 8008, and the microcomputer industry was born.
1975 John Cocke and the IBM 801
- The first Reduced Instruction Set Computer (RISC) machine was developed as part of the IBM 801 Minicomputer Project. John Cocke contributed many detailed innovations in the 801 processor and associated optimizing compiler, and is considered the “father of RISC architecture.”
- “John’s concept of the RISC resulted from his detailed study of the trade-offs between high performance machine organization and compiler optimization technology. He recognized that an appropriately defined set of machine instructions, program controls, and programs produced by a compiler — carefully designed to exploit the instruction set — could realize a very high performance processor with relatively few circuits. Critical to the success of RISC was the concept of an optimizing compiler able to use the reduced instruction set very efficiently and maximize performance of the machine.”
Source: http://domino.watson.ibm.com/comm/pr.nsf/pages/news.20020717_cocke.html
1976 Intel i8748
- Prior to 1976 single-board computers (SBCs) were designed around microprocessor chips, like the 8080. These SBCs included all the features needed to implement a very simple computer system. These SBCs, of which the D2 by Motorola, KIM-1 by MOS Technology, and SDK-85 by Intel are the most memorable, quickly found their way into design labs at colleges, universities, and electronics companies. By adding peripheral cards these SBCs could read sensors and control actuators. In 1976 Intel put all of the features found on an SBC and parts of the peripheral cards into one chip known as the i8748. With over 17,000 transistors the i8748 was the first device in the MCS-48 family of microcontrollers. This IC, and other MCS-48 devices, quickly became the de facto industrial standard in control-oriented applications. Soon MCS-48 devices were replacing electromechanical components in many modern appliances.
1980 Intel 8051
- With over 60,000 transistors, the power, size, and complexity of microcontrollers moved to the next level with Intel’s introduction of the 8051, the first device in the MCS-51 family of microcontrollers. In a bold move, Intel allowed other manufacturers to make and market code-compatible variants of the 8051. This step led to its general acceptance by the engineering community as the de facto standard in microcontroller architectures.
1996 Atmel AVR
- AVR is a moniker for a family of Atmel 8-bit RISC microcontrollers. The AVR is a Modified Harvard architecture machine with program and data stored in separate physical memory systems that appear in different address spaces. The AVR architecture was conceived by Alf-Egil Bogen and Vegard Wollan at the Norwegian Institute of Technology (NTH). When the technology was sold to Atmel, the internal architecture was further developed by Alf and Vegard at Atmel Norway, a subsidiary of Atmel founded by the two architects. The name AVR sounds cool and does not stand for anything.
Source: http://en.wikipedia.org/wiki/Atmel_AVR
APPENDIX E CLASSIC COMPUTER ARCHITECTURE
As we discovered in our short history lesson, computers are designed to meet a specific set of requirements. In the early days, these requirements were to meet some military, science, civil, or commercial need. For the military, it was predominantly the calculation of ballistic tables; for science, calculating the motion of the planets or the weather; for civil needs, keeping track of people; and for commercial needs, keeping track of the money. To meet these requirements the computer was conceived and described by its (1) hardware components and (2) the instructions it could execute. The former, for all modern day computers, were codified by Von Neumann in his landmark paper describing the architecture of the EDVAC computer.
Von Neumann’s paper describes a computer architecture having five basic components: Input, Output, Memory, Control, and Arithmetical.
For this class we will repartition these elements as discussed in the next section and defined in Figure 1-3. An important component of this new viewpoint is the central processing unit (CPU), which will be divided into a Control and a Datapath element as shown in Figure 1-2. Atmel literature uses the term microcontroller unit (MCU) in place of the more generic central processing unit. In this course the two terms are considered synonymous.
Classic Microcontroller Architecture
The CPU is divided into a Control and a Datapath element as shown in Figure 1-2. The Control Unit contains combinational logic and translates the instructions held in the instruction register (not shown) into the control signals needed to execute the instruction. The Datapath contains the General Purpose Registers (technically known as the Register File) and the Arithmetic and Logic Unit (ALU). The Datapath includes a few other registers which we will learn about shortly.
The integration of the program and data memory described by Von Neumann is today known as the Princeton memory model. The architecture of our AVR processor separates these two types of memory into Flash Program Memory and Static Random Access Memory (SRAM). This separation of program and data memory more closely resembles the Harvard Mark I computer than the EDVAC computer, and is therefore known as the Harvard memory model.
The input and output functions of Figure 1-1 will be treated together and simply called input/output (I/O). For microcontrollers, the term I/O includes all the Peripherals (Parallel I/O, Counter/Timers, etc.) supported by a particular model of microcontroller, in our case the ATmega328P.
For this class the Von Neumann architecture is thus repartitioned into five basic blocks: Flash Program Memory, SRAM Data Memory, Control Unit, Datapath, and Input-Output.
APPENDIX F ATMEGA328P ARCHITECTURAL OVERVIEW
Reading: Section 5.1 Overview plus Atmega8 Block Diagram
Clock
- ATmega Family – Up to 20 MHz
- Arduino Duemilanove – 16 MHz (ATmega328P)
- ALU – On-chip 2-cycle Hardware Multiplier
Memory
- ATmega Family – Up to 256 KBytes Flash, 4K Bytes EEPROM and 8K Bytes SRAM.
- ATmega328P – 32 KBytes Flash, 1K Bytes EEPROM, and 2K Bytes SRAM
- Self-Programming Flash memory with boot block (ICSP header)
Peripheral Subsystems
- Two 8-bit (PORTB, PORTD), plus One 7-bit (PORTC) General Digital I/O Ports
- Programmable Serial USART, Master/Slave SPI Serial Interface.
- Byte-oriented 2-wire Serial Interface (TWI) is Philips I2C compliant.
- Two 8-bit Timer/Counters with Separate Prescaler and Compare Mode
- One 16-bit Timer/Counter with Separate Prescaler, Compare Mode, and Capture Mode
- Six PWM Channels
- 8-channel 10-bit A/D converter with up to x200 analog gain stage.
- Programmable Watchdog Timer with Separate On-chip Oscillator
- On-Chip Debug through JTAG or debugWIRE interface.
Other Features
- External and Internal Interrupt Sources with 2 instruction words/vector
Note
- In the following Block Diagram, Power (Vcc), Ground (GND), and the clock input (XTAL) are present but not shown.
APPENDIX G MICROPROCESSOR VERSUS MICROCONTROLLER
APPENDIX H TWO-STAGE INSTRUCTION PIPELINE
Pipelining: A technique that breaks operations, such as instruction processing or bus transactions, into smaller distinct stages or tenures (respectively) so that a subsequent operation can begin before the previous one has completed.
From the Atmel ATmega328P Data Sheet Chapter 6 AVR CPU Core, Section 6.1 Overview and with respect to Figure 6-1 Block Diagram of the AVR Architecture
“In order to maximize performance and parallelism, the AVR uses a Harvard architecture – with separate memories and buses for program and data. Instructions in the program memory are executed with a single level pipelining. While one instruction is being executed, the next instruction is pre-fetched from the program memory. This concept enables instructions to be executed in every clock cycle. The program memory is In-System Reprogrammable Flash memory.”
A pipeline stage begins and ends with a register; controlled by a clock. Between the register(s) is combinational logic. Although counter-intuitive, Flash program memory can be viewed as combinational logic with an address generating a word of data. With respect to our AVR architecture (Figure 6-1) the two registers of interest are the Program Counter (PC) and the Instruction Register (IR). Without pipelining these two registers in the control unit (PC, IR) would require two clock cycles to complete a basic computer operation cycle. Specifically, an instruction is (1) fetched and then (2) executed.
For most instructions, especially on a processor based on a modified Harvard memory model, program memory is not accessed during the execution cycle. This memory down time could be used to fetch the next instruction to be executed, in parallel with the execution cycle of the current instruction. Here then is an opportunity for pipelining! Figure 10.2 illustrates the idea. The pipeline has two independent stages. The first stage fetches an instruction and places it in the Instruction Register (IR), while the second stage is executing the instruction. This two-stage instruction pipeline, also called instruction prefetch, can be found in some of the earliest microprocessors, including the Intel 8086.
For our RISC architecture most instructions are executed in a single cycle (also known as elemental instructions). In this perfect world where all instructions take one cycle to fetch and one cycle to execute, after an initial delay of one cycle to fill the pipeline, known as latency, each instruction will take only one cycle to complete.
Forgetting for now the circuit delays attendant with implementing the pipeline (for example the latch), and other complicating issues, our performance would be twice that of a non-pipelined design.
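The cycle counts behind the "twice the performance" claim can be sketched as follows. The model assumes the perfect world described above, where every instruction takes exactly one cycle to fetch and one cycle to execute; the function names are made up for illustration.

```python
def cycles_without_pipeline(n):
    """Each instruction is fetched (1 cycle) and then executed (1 cycle)."""
    return 2 * n


def cycles_with_pipeline(n):
    """One cycle of latency to fill the pipeline, then one instruction per cycle."""
    return n + 1 if n > 0 else 0


def speedup(n):
    return cycles_without_pipeline(n) / cycles_with_pipeline(n)
```

For 100 instructions the counts are 200 cycles versus 101 cycles, a speedup just under 2. The ratio approaches 2 as the program gets longer, because the one-cycle latency is paid only once.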
APPENDIX I ATMEGA328P INSTRUCTION SET
The Instruction Set of our AVR processor can be functionally divided (or classified) into the following types:
- Data Transfer Instructions
- Arithmetic and Logic Instructions
- Bit and Bit-Test Instructions
- Branch (Control Transfer) Instructions
- MCU Control Instructions
Addressing Modes: Working with AVR’s Load-Store RISC Architecture
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 2.3, 6.2
Table of Contents
LOAD-STORE INSTRUCTIONS AND THE ATMEGA328P MEMORY MODEL
When selecting an addressing mode you should ask yourself two questions: where is the operand (data) located within the memory model of the AVR processor, and when do I know its address (at assembly time or at run time)?
LOAD-STORE INSTRUCTIONS AND ADDRESSING MODES
- When loading and storing data we have several ways to “address” the data.
- The AVR microcontroller supports addressing modes for access to the Program memory (Flash) and Data memory (SRAM, Register file, I/O Memory, and Extended I/O Memory).
IMMEDIATE
- Data is encoded with the instruction. Operand is therefore located in Flash Program Memory. This is why technically our memory model is a Modified Harvard.
ldi r16, 0x23 // where opcode ldi = 0b1110, Rd = 0b0000, and constant K = 0b00100011
- Notice that only four bits (dddd) are set aside for defining destination register Rd. This limits us to $2^4 = 16$ registers. The designers of the AVR processor chose registers 16 to 31 to be these registers (i.e., 16 ≤ Rd ≤ 31).
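How the opcode, the four dddd bits, and the constant K fit into one 16-bit instruction word can be sketched in Python. The bit layout used here (1110 KKKK dddd KKKK) is the one given for ldi in the 8-bit AVR Instruction Set manual; the function name encode_ldi is made up for illustration.

```python
def encode_ldi(rd, k):
    """Build the 16-bit word for `ldi Rd, K` using the layout 1110 KKKK dddd KKKK."""
    if not 16 <= rd <= 31:
        raise ValueError("ldi only works with r16 to r31 (four dddd bits)")
    if not 0 <= k <= 0xFF:
        raise ValueError("K must fit in 8 bits")
    d = rd - 16                      # r16 -> 0000, r31 -> 1111
    return (0b1110 << 12) | ((k >> 4) << 8) | (d << 4) | (k & 0x0F)
```

With this layout, ldi r16, 0x23 assembles to the word 0xE203: the opcode 1110, the high nibble of K (2), dddd = 0000 for r16, and the low nibble of K (3).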
DIRECT
lds r16, A
sts A, r16
Within the AVR family there are two (2) possible lds/sts instructions. A specific family member will have only one lds/sts combination. The ATmega328P lds/sts instruction is illustrated here with the exception that 5 bits (not 4) encode Rr/Rd. This means all 32 registers are available to the lds/sts instruction.
in r16, PINC
out PORTD, r16
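As a rough sketch of why all 32 registers are available to lds/sts, the two-word encoding can be modeled in Python. The layouts assumed here are the 32-bit forms from the 8-bit AVR Instruction Set manual (lds: 1001 000d dddd 0000 followed by a 16-bit address; sts: 1001 001d dddd 0000 followed by a 16-bit address). The function names and the address 0x0100 are illustrative only.

```python
def _check(reg, addr):
    if not 0 <= reg <= 31:
        raise ValueError("register must be r0 to r31 (five d bits)")
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")


def encode_lds(rd, addr):
    """Return (first word, second word) for `lds Rd, addr`."""
    _check(rd, addr)
    return (0x9000 | (rd << 4), addr)


def encode_sts(addr, rr):
    """Return (first word, second word) for `sts addr, Rr`."""
    _check(rr, addr)
    return (0x9200 | (rr << 4), addr)
```

Because the address takes a full second word, direct addressing costs two words of Flash per instruction, whereas in and out fit the register and a 6-bit I/O address into a single word.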
REGISTER-REGISTER INSTRUCTIONS
Data Transfer
- Register-register move byte (mov) or word (movw)
Arithmetic and Logic (ALU)
- Two’s complement negate (neg), Arithmetic add (add, adc, adiw), subtract (sub, subi, sbc, sbci), and multiply (mul, muls, mulsu, fmul, fmuls, fmulsu)
- Logical not (com), and (and, andi, cbr, tst), or (or, ori, sbr), exclusive or (eor)
- Clear (clr), set (ser), increment (inc), decrement (dec)
Bit and Bit-Test
- Register logical shift left (lsl) or right (lsr); arithmetic shift right (asr); and rotate left or right (rol, ror)
- Register swap nibble (swap)
- Register bit load (bld) or store (bst) from/to T flag in the Status Register SREG
- I/O Register Clear (cbi) or set (sbi) a bit
- Clear (clFlag) or set (seFlag) a Flag bit in the Status Register SREG by name (I, T, H, S, V, N, Z, C) or bit (bclr, bset).
REGISTER DIRECT
In the following figures, OP means the operation code part of the instruction word. To simplify, not all figures show the exact location of the addressing bits. To generalize, the abstract terms RAMEND and FLASHEND have been used to represent the highest location in data and program space.
com r16
add r16, r17
LOAD-STORE PROGRAM EXAMPLE
Write an Assembly program to add two 8-bit numbers.
C = A + B
lds r16, A ; 1. Load variables
lds r17, B
add r16, r17 ; 2. Do something
sts C, r16 ; 3. Store answer
- Identify the operation, source operand, destination operand in the first Data Transfer instruction.
- Identify the source/destination operand in the Arithmetic and Logic (ALU) instruction.
- What addressing mode is used by the source operand, in the first instruction?
- Show contents of Flash Program Memory (mnemonics)
- Show contents of SRAM Data Memory, assuming variables are stored in sequential memory locations starting at address 0x0100.
- Modify the program to leave register r16 unchanged by making a copy (use r15).
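To check your work on the original four-instruction program, the load, operate, store sequence can be modeled in Python. This is a sketch under stated assumptions: A, B, and C are placed at sequential SRAM addresses starting at 0x0100 as the exercise suggests, the initial values are whatever you pass in, and run_program is a made-up name.

```python
A, B, C = 0x0100, 0x0101, 0x0102   # assumed sequential SRAM addresses


def run_program(a_value, b_value):
    """Model the four instructions and return (sram, registers)."""
    sram = {A: a_value & 0xFF, B: b_value & 0xFF, C: 0x00}
    r = [0] * 32                       # register file r0..r31
    r[16] = sram[A]                    # lds r16, A   ; 1. load variables
    r[17] = sram[B]                    # lds r17, B
    r[16] = (r[16] + r[17]) & 0xFF     # add r16, r17 ; 2. do something
    sram[C] = r[16]                    # sts C, r16   ; 3. store answer
    return sram, r
```

Note that r16 no longer holds the value of A after the add, which is what the last exercise asks you to change.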
SPECIAL TOPIC – HARVARD VERSUS PRINCETON ARCHITECTURE
Princeton or Von Neumann Memory Model
Program and data share the same memory space. Processors used in all personal computers, like the Pentium, implement a von Neumann architecture.
Harvard Memory Model
Program and data memory are separated. The AVR processors among others including the Intel 8051 use this memory model. One advantage of the Harvard architecture for microcontrollers is that program memory can be wider than data memory. This allows the processor to implement more instructions while still working with 8-bit data. For the AVR processor program memory is 16-bits wide while data memory is only 8-bits.
You may have already noticed that when you single step your program in the simulator of AVR Studio, the program counter is incremented by 1 each time an instruction is executed. No surprise there, right? Wrong. The program memory of the AVR processor can also be accessed at the byte level. In most cases this apparent paradox is transparent to the operation of your program, with one important exception. When you want to access data stored in program memory, you will be working with byte addresses, not word (16-bit) addresses. The assembler is not smart enough to know the difference, and so when you ask for an address in program memory it returns its word address. To convert this word address into a byte address you need to multiply it by 2. Programmatically we do this by using the shift left syntax of C++ to explicitly tell the assembler to multiply the word address by 2. Remember, when you shift left one place you are effectively multiplying by 2.
With this in mind, we would interpret the following AVR instruction as telling the AVR assembler to convert the word address of label beeHives in program memory to a byte address and then to take the low-order byte of the resulting value and put it into the source operand of the instruction.
ldi ZL,low(beeHives<<1) // load word address of beeHives look-up
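The word-to-byte address conversion can be sketched in C++. This is only a model of the arithmetic the assembler performs at build time; the function names and the table address 0x0123 are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Convert a Flash word address (what the assembler returns for a label)
// into the byte address needed when reading data from program memory.
constexpr std::uint16_t byte_address(std::uint16_t word_address) {
    return static_cast<std::uint16_t>(word_address << 1);  // shift left 1 = multiply by 2
}

// Models of the assembler's low() and high() functions.
constexpr std::uint8_t low_byte(std::uint16_t value) {
    return static_cast<std::uint8_t>(value & 0xFF);
}
constexpr std::uint8_t high_byte(std::uint16_t value) {
    return static_cast<std::uint8_t>(value >> 8);
}

// Hypothetical example: a look-up table placed at word address 0x0123.
static_assert(byte_address(0x0123) == 0x0246, "word 0x0123 is byte 0x0246");
static_assert(low_byte(byte_address(0x0123)) == 0x46, "value that would go into ZL");
static_assert(high_byte(byte_address(0x0123)) == 0x02, "value that would go into ZH");
```

The low bit of the byte address, which is always 0 after the shift, is what later selects the low or high byte of the 16-bit program memory word.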
APPENDIX A – ATMEGA328P INSTRUCTION SET
Introduction to AVR Assembly Language Programming II: ALU and SREG
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 2.4, 5.1, 5.2, 6.5
COMPLEMENTARY READING
The following source(s) cover the same material as Chapter 2 of your textbook.
They are provided to you in case you want a different viewpoint.
ATMEL document doc8161 “8-bit AVR Microcontroller with 4/8/16/32K Bytes In-System Programmable Flash” Section 6.3.1: SREG – AVR Status Register
INSTRUCTION SET ARCHITECTURE (REVIEW)
ALU – TWO OPERAND INSTRUCTIONS
- All math (+, -, ×) and logic (and, or, xor) instructions work with the Register File (register to register). The ATmega328P has no divide instruction, so division must be done in software.
- Most math and logic instructions have two operands Rd, Rr with register Rd initially containing one of the values to be operated on and ultimately the result of the operation. The initial contents of Rd are therefore destroyed by this operation.
add Rd, Rr ; Rd = Rd + Rr, You may use any register (R0 – R31).
- Some math and logic operations replace the source register Rr with a constant K. These instructions are typically denoted by an “i” suffix.
subi Rd, K ; Rd = Rd – K, You may only use registers (R16 – R31).
add, adc, adiw Adds two registers, plus the contents of the C Flag (adc only), and places the result in the destination register Rd. The adiw instruction adds a constant to a register pair.
sub, sbc, subi, sbci, sbiw Subtracts the source register Rr or constant K from the source/destination register Rd, also subtracting the C Flag (sbc and sbci only), and places the result in the source/destination register Rd. Think of the C Flag as the Borrow bit within this context.
mul, muls, mulsu, fmul, fmuls, fmulsu The multiplicand Rd and the multiplier Rr are two registers containing binary or fractional (f-prefix) encoded numbers. Both numbers may be unsigned (mul, fmul), or signed (muls, fmuls). Finally, the multiplicand Rd may be signed with the multiplier Rr unsigned (mulsu, fmulsu). The 16-bit product is placed in R1 (high byte) and R0 (low byte).
and, andi, or, ori, eor Performs the logical AND, OR, and XOR operations between the contents of register Rd and register Rr or constant K.
ALU – SINGLE OPERAND INSTRUCTIONS
- All single operand math and logic instructions only need a single register and usually the mnemonic alone is enough to tell you what it does.
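Mnemonic | Operation | Description |
com | One’s complement | Rd = 0xFF - Rd |
neg | Two’s complement | Rd = 0x00 - Rd |
inc | Increment | Rd = Rd + 1 |
dec | Decrement | Rd = Rd - 1 |
clr | Clear | Rd = 0x00 |
ser | Set Register | Rd = 0xFF, limited to r16-r31 |
tst | Test for Zero or Minus | Rd = Rd and Rd, only the flags change |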
ALU PROGRAM EXAMPLE
Write an Assembly program to implement the polynomial expression
B = A^2 + A + 41
.INCLUDE
.DSEG
A: .BYTE 1 // 8 bit input
B: .BYTE 2 // 16 bit output
.CSEG
; load
lds r16, A ; r16 with the value of A
clr r17 ; r17 with 0
ldi r18, 41 ; r18 with 41
; do something
mul r16, r16 ; r1:r0 = A^2
add r0, r16
adc r1, r17 ; r1:r0 = A^2 + A
add r0, r18
adc r1, r17 ; r1:r0 = A^2 + A + 41
; store
sts B, r0 ; answer byte ordering
sts B+1, r1 ; is little endian
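As a cross-check on the register steps above, here is a minimal C++ sketch that mirrors the program byte by byte: an 8-bit multiply producing a 16-bit product, followed by two add/adc pairs. The function name `polynomial` is made up for this illustration and the variables r0 and r1 only imitate the AVR registers.

```cpp
#include <cassert>
#include <cstdint>

// Byte-by-byte model of B = A^2 + A + 41 using the same steps as the
// assembly program: mul, then add (low byte) and adc (high byte) twice.
inline std::uint16_t polynomial(std::uint8_t a) {
    const std::uint16_t product = static_cast<std::uint16_t>(a * a);  // mul r16, r16
    std::uint8_t r0 = static_cast<std::uint8_t>(product & 0xFF);      // low byte
    std::uint8_t r1 = static_cast<std::uint8_t>(product >> 8);        // high byte

    const std::uint8_t addends[2] = {a, 41};          // r16 (A) and r18 (41)
    for (std::uint8_t addend : addends) {
        const unsigned sum = r0 + addend;             // add r0, rX
        r0 = static_cast<std::uint8_t>(sum);          // keep the low 8 bits
        const unsigned carry = sum >> 8;              // C flag after the add
        r1 = static_cast<std::uint8_t>(r1 + carry);   // adc r1, r17 with r17 = 0
    }
    return static_cast<std::uint16_t>((r1 << 8) | r0);
}
```

For example, `polynomial(10)` returns 151. The largest input, A = 255, gives 65321, which still fits in the 16-bit output B, so the high-byte adc never overflows.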
SREG – AVR STATUS REGISTER
Non ALU
- Bit 7 – I: Global Interrupt Enable
The Global Interrupt Enable bit must be set for the interrupts to be enabled. The individual interrupt enable control is then performed in separate control registers. The I-bit is cleared by hardware after an interrupt has occurred, and is set by the reti instruction. The I-bit can also be set and cleared by the application with the sei and cli instructions.
- Bit 6 – T: Bit Copy Storage
The Bit Copy instructions bld (Bit LoaD) and bst (Bit STore) use the T-bit as source or destination. A bit from a register can be copied into T (Rb -> T) by the bst instruction, and a bit in T can be copied into a bit in a register (T -> Rb) by the bld instruction.
ALU
Signed two’s complement arithmetic
- Bit 4 – S: Sign Bit, S = N ⊕ V
Bit set if answer is negative with no errors or if both numbers were negative and error occurred, zero otherwise.
- Bit 3 – V: Two’s Complement Overflow Flag
Bit set if error occurred as the result of an arithmetic operation, zero otherwise.
- Bit 2 – N: Negative Flag
Bit set if result is negative, zero otherwise.
Unsigned arithmetic
- Bit 5 – H: Half Carry Flag
Carry from least significant nibble to most significant nibble. Half Carry is useful in BCD arithmetic.
- Bit 0 – C: Carry Flag
The Carry Flag C indicates a carry in an arithmetic operation. Bit set if error occurred as the result of an unsigned arithmetic operation, zero otherwise.
Arithmetic and Logical
- Bit 1 – Z: Zero Flag
The Zero Flag Z indicates a zero result in an arithmetic or logic operation.
THE SREG OVERFLOW BIT
- The overflow bit indicates whether an error was caused by the addition of two n-bit 2’s complement numbers, where bit n-1 (the “sign bit”) is 1 if the number is negative and 0 if the number is positive. In other words, the sum is outside the range $-2^{n-1}$ to $2^{n-1}-1$.
- Another way to recognize an error in addition is to observe that if you add two numbers of the same sign and the result has the opposite sign (positive + positive = negative or negative + negative = positive), then an error has occurred.
- An overflow condition can never result from the addition of two n-bit numbers of opposite sign (positive + negative or negative + positive).
- Here are examples of all four cases for two 8 bit signed numbers.
Case | A | B | C | D |
First number | 0b6b5b4b3b2b1b0 | 0b6b5b4b3b2b1b0 | 1b6b5b4b3b2b1b0 | 1b6b5b4b3b2b1b0 |
Second number | 0b6b5b4b3b2b1b0 | 1b6b5b4b3b2b1b0 | 0b6b5b4b3b2b1b0 | 1b6b5b4b3b2b1b0 |
The variable “bn” simply indicates some binary value and may be 1 or 0. The index of the carry bit (Cn) is equal to the carry into bit bn. For example, the carry into b0 is C0 and the carry out of an 8-bit register (out of b7) is C8.
- Looking first at Case A, a carry cannot be generated out of the sign bit (C8 = 0); therefore, if a carry enters the sign bit (C7 = 1), the sum will be negative and the answer is wrong.
- For Case B and Case C no error can occur. Observe that in both Case B and Case C, because the numbers are contained in an n-bit (n = 8) register, we know they are in the range $-2^{n-1}$ to $2^{n-1}-1$ (-128 to 127 for our two 8-bit numbers). Because one number is positive and the other negative, the sum lies between the two numbers and so is also within this range; the answer must be correct.
- For Case D, a carry will always be generated out of the sign bit (C8 = 1); therefore, if a carry does not enter the sign bit (C7 = 0), the sign bit of the sum will be 0, the sum will appear positive, and the answer will be wrong.
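- Here is what we have discovered translated into a truth-table, where C7 is the carry into the sign bit and C8 is the carry out of the sign bit.
C8 | C7 | V | Cases |
0 | 0 | 0 | A, B, or C (no error) |
0 | 1 | 1 | A (error) |
1 | 0 | 1 | D (error) |
1 | 1 | 0 | B, C, or D (no error) |
- Solving for the overflow bit (V) we have, $V = C_7 \oplus C_8$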
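The two ways of detecting overflow described in this section can be compared with a short C++ sketch. It is a model for 8-bit addition only, and the function names are invented for this example; one function uses the carry into and out of the sign bit, the other uses the "same sign in, different sign out" rule.

```cpp
#include <cassert>
#include <cstdint>

// Overflow from the carries: V is the exclusive-or of the carry into the
// sign bit (C7) and the carry out of the sign bit (C8).
inline bool overflow_by_carries(std::uint8_t a, std::uint8_t b) {
    const unsigned c7 = (((a & 0x7Fu) + (b & 0x7Fu)) >> 7) & 1u;  // carry into b7
    const unsigned c8 = ((static_cast<unsigned>(a) + b) >> 8) & 1u;  // carry out of b7
    return (c7 ^ c8) != 0;
}

// Overflow from the signs: both inputs have the same sign and the sum
// has the opposite sign.
inline bool overflow_by_signs(std::uint8_t a, std::uint8_t b) {
    const std::uint8_t sum = static_cast<std::uint8_t>(a + b);
    const bool same_sign_inputs = ((a ^ b) & 0x80) == 0;
    const bool sign_changed = ((a ^ sum) & 0x80) != 0;
    return same_sign_inputs && sign_changed;
}
```

For instance, 0x7F + 0x01 (127 + 1) is reported as an overflow by both functions, while 0xFF + 0x01 (-1 + 1) is not, even though it produces a carry out of the register.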
COMPUTING ALU STATUS REGISTER BITS – ADDITION –
COMPUTING ALU STATUS REGISTER BITS – SUBTRACTION –
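- For subtract instructions (sub, subi, sbc, sbci, sbiw) and the compare instructions (cp, cpc, cpi), the carry bit acts as a borrow: C is set when the unsigned value being subtracted is larger than the unsigned value in Rd, and cleared otherwise. For the byte-wide instructions the half carry bit H likewise indicates a borrow from bit 3. Note that the cpse instruction does not modify SREG.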
- Assume the subtract instruction sub r16, r17 has just been run by the ATmega328P microcontroller. Complete the table provided. The “difference” column should reflect the contents of register r16 after the subtraction operation (leave the answer in 2’s complement form) and not the actual difference (i.e., if done using your calculator).
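r16 | r17 | difference | Signed relationship | Unsigned relationship | H | S | V | N | Z | C |
3B | 3B | 00 | + = + | = | 0 | 0 | 0 | 0 | 1 | 0 |
3B | 15 | 26 | + > + | > | 0 | 0 | 0 | 0 | 0 | 0 |
15 | 3B | | | | | | | | | |
F9 | F6 | | | | | | | | | |
F6 | F9 | | | | | | | | | |
15 | F6 | | | | | | | | | |
F6 | 15 | | | | | | | | | |
68 | A5 | | | | | | | | | |
A5 | 68 | | | | | | | | | |
- Use AVR Studio simulation software to check your answers.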
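If the simulator is not at hand, the flag rules for an 8-bit subtraction can also be modeled in a few lines of C++. This is a sketch of the behavior of sub r16, r17 as described in this lecture, not code taken from Atmel; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Result of an 8-bit subtraction rd - rr together with the ALU flags.
struct SubResult {
    std::uint8_t difference;
    bool H, S, V, N, Z, C;
};

inline SubResult sub8(std::uint8_t rd, std::uint8_t rr) {
    SubResult out{};
    out.difference = static_cast<std::uint8_t>(rd - rr);  // 2's complement result
    out.C = rr > rd;                                  // borrow needed (unsigned)
    out.H = (rr & 0x0F) > (rd & 0x0F);                // borrow from bit 3
    out.N = (out.difference & 0x80) != 0;             // sign bit of the result
    out.Z = out.difference == 0;                      // result is zero
    // Overflow: operands have different signs and the result's sign
    // differs from the sign of rd.
    out.V = ((rd ^ rr) & (rd ^ out.difference) & 0x80) != 0;
    out.S = out.N != out.V;                           // S = N xor V
    return out;
}
```

Calling `sub8(0x3B, 0x15)` reproduces the second completed row of the table: a difference of 0x26 with all six flags cleared.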
Introduction to AVR Assembly Language Programming II: Branching
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 3.1
ADDITIONAL READING
Introduction to AVR assembler programming for beginners, controlling sequential execution of the program http://www.avr-asm-tutorial.net/avr_en/beginner/JUMP.html
INSTRUCTION SET ARCHITECTURE (REVIEW)
The Instruction Set Architecture (ISA) of a microprocessor includes all the registers that are accessible to the programmer. In other words, registers that can be modified by the instruction set of the processor. With respect to the AVR CPU, these ISA registers include the 32 x 8-bit general purpose registers, the status register (SREG), the stack pointer (SP), and the program counter (PC).
Data Transfer instructions are used to load and store data to the General Purpose Registers, also known as the Register File. Exceptions are the push and pop instructions which modify the Stack Pointer. By definition these instructions do not modify the status register (SREG).
Arithmetic and Logic Instructions plus Bit and Bit-Test Instructions use the ALU to operate on the data contained in the general purpose registers. Flags contained in the status register (SREG) provide important information concerning the results of these operations. For example, if you are adding two signed numbers together, you will want to know if the answer is correct. The state of the overflow flag (V) bit within SREG gives you the answer to this question (1 = error, 0 = no error).
Control Transfer Instructions allow you to change the contents of the PC either conditionally or unconditionally. Continuing our example, if an error results from adding two signed numbers together we may want to conditionally (V = 1) branch to an error handling routine. As the AVR processor fetches and executes instructions it automatically increments the program counter (PC) so it always points at the next instruction to be executed.
INSTRUCTION SET (REVIEW)
The Instruction Set of our AVR processor can be functionally divided (or classified) into the following parts:
- Data Transfer Instructions
- Arithmetic and Logic Instructions
- Bit and Bit-Test Instructions
- Control Transfer (Branch) Instructions
- MCU Control Instructions
JUMP INSTRUCTIONS
- There are two basic types of control transfer instructions – Unconditional and Conditional.
- From a programmer’s perspective an unconditional or jump instruction, jumps to the label specified. For example, jmp loop will unconditionally jump to the label loop in your program.
- Here are the unconditional control transfer “Jump” instructions of the AVR processor
Direct | jmp, call |
Relative (1) | rjmp, rcall |
Indirect | ijmp, icall |
Subroutine & Interrupt Return | ret, reti |
Note:
- (1) Jump relative to PC + 1 + k, where k is a 12-bit signed offset ($-2^{11} \le k \le 2^{11}-1$), giving a range of PC-2047 to PC+2048, within the 16 K word address space of the ATmega328P
HOW THE DIRECT UNCONDITIONAL CONTROL TRANSFER INSTRUCTIONS JMP AND CALL WORK
- From a computer engineer’s perspective, a direct jump is accomplished by loading the target address into the program counter (PC). In the example, the target address is equated to label “loop.”
- To provide a more concrete example, assume the label loop corresponds to address 0x0123 in Flash Program Memory.
- To execute this instruction, the control logic of the central processing unit (CPU) loads the 16-bit Program Counter (PC) register with 0x0123.
- Consequently, on the next fetch cycle it is the instruction at location 0x0123 that is fetched and then executed. Control of the program has been transferred to this address.
HOW THE RELATIVE UNCONDITIONAL CONTROL TRANSFER INSTRUCTIONS RJMP AND RCALL WORK
- From a computer engineer’s perspective, a relative jump is accomplished by adding a 12-bit signed offset to the program counter (PC). The result is the target address. In the example, the target address is equated to label “loop.”
- To provide a more concrete example, assume the label loop corresponds to address 0x0123 in Flash Program Memory (the target address).
- An rjmp loop instruction is located at address 0x206. When the rjmp is executed, the PC is currently fetching what it thinks is the next instruction to be executed at address 0x207.
- To accomplish this jump the relative address (kkkk kkkk kkkk) is equal to 0xF1C (i.e., 0x123 – 0x207).
- Consequently, on the next fetch cycle it is the instruction at location 0x0123 that is fetched and then executed. Control of the program has been transferred to this address.
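The offset arithmetic in this example can be reproduced with a minimal C++ sketch. The function name is made up for illustration, and the sketch assumes the target is within reach of rjmp (it does no range checking).

```cpp
#include <cassert>
#include <cstdint>

// 12-bit two's complement offset k stored in an rjmp instruction, such
// that target = (address of rjmp) + 1 + k. Unsigned arithmetic is used
// so that a backward jump wraps around in a well-defined way.
constexpr std::uint16_t rjmp_offset(std::uint16_t rjmp_address,
                                    std::uint16_t target_address) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(target_address) -
         (static_cast<unsigned>(rjmp_address) + 1u)) & 0x0FFFu);
}

// The example from the text: rjmp at 0x0206 jumping back to loop at 0x0123.
static_assert(rjmp_offset(0x0206, 0x0123) == 0x0F1C, "0x123 - 0x207 as 12 bits");
```

A jump to the very next instruction gives an offset of 0, which is why the subtraction uses the address of the rjmp plus one.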
BRANCH INSTRUCTIONS
- When a conditional or branch instruction is executed one of two things may happen.
- If the test condition is true then the branch will be taken (see jump instructions).
- If the test condition is false then nothing happens (see nop instruction).
Note: This statement is not entirely accurate. Because the program counter always points to the next instruction to be executed, during the execution state, doing nothing means fetching the next instruction.
- The “test condition” is a function of one SREG flag bit. For example, the branch if equal (breq) or not equal (brne) instructions test the Z flag.
HOW THE RELATIVE CONDITIONAL CONTROL TRANSFER INSTRUCTION BREQ WORKS
- If a relative branch is taken (test condition is true) a 7-bit signed offset is added to the PC. The result is the target address. In the example, the target address is equated to label “match.”
- To provide a more concrete example, assume the label nomatch corresponds to address 0x0123 in Flash Program Memory (the target address).
- A brne nomatch instruction is located at address 0x0112. When the brne instruction is executed, the PC is currently fetching what it thinks is the next instruction to be executed at address 0x0113.
- To accomplish this branch the 7-bit relative address (kkk kkkk) is equal to 0b001_0000 (i.e., 0x123 – 0x113 = 0x10).
- Consequently, on the next fetch cycle it is the instruction at location 0x0123 that is fetched and then executed. Control of the program has been transferred to this address.
BRANCH INSTRUCTIONS
- All conditional branch instructions may be implemented as brbs s,k or brbc s,k, where s is the bit number of the SREG flag bit. For example brbs 6, bitset would branch to label bitset, if the SREG T bit was set.
- To make your code more readable, the AVR assembler adds the following “alias” instructions.
- SREG Flag bit is clear (brFlagc) or set (brFlags) by name (I, T, H, S, V, N, Z, C) or bit (brbc, brbs).
- These SREG flag bits (I, T, H, S, V, N, Z, C) use more descriptive mnemonics.
- Branch if equal (breq) or not equal (brne) test the Z flag.
- Branch if plus (brpl) or minus (brmi) test the N flag. For unsigned arithmetic, branch if same or higher (brsh) or lower (brlo) test the C flag and are equivalent to brcc and brcs respectively.
- For signed 2’s complement arithmetic, branch if less than (brlt) or greater than or equal (brge) test the S flag.
Skip if …
- Bit (b) in a register is clear (sbrc) or set (sbrs).
- Bit (b) in I/O register is clear (sbic) or set (sbis). Limited to I/O addresses 0-31
Note:
- All branch instructions are relative to PC + 1 + k, where k is a 7-bit signed offset ($-2^{6} \le k \le 2^{6}-1$), giving a range of PC-63 to PC+64
- Skip instructions may take 1, 2, or 3 cycles, depending on whether the skip is taken and on the number of Flash program memory words in the instruction to be skipped (1 or 2).
CONDITIONAL BRANCH ENCODING
Here is how the brbs, brbc and their alias assembly instructions are encoded.
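As a rough illustration, the encoding can be modeled in C++. The bit layout used here (brbs = 1111 00kk kkkk ksss, brbc = 1111 01kk kkkk ksss) is taken from the AVR instruction set manual rather than from this lecture, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Signed word offset k for a branch, relative to the address that
// follows the branch instruction.
constexpr int branch_offset(unsigned branch_address, unsigned target_address) {
    return static_cast<int>(target_address) - static_cast<int>(branch_address + 1u);
}

// Build the 16-bit opcode of brbs (branch_if_set = true) or brbc (false).
// sreg_bit is the flag's bit number in SREG (C=0, Z=1, N=2, V=3, S=4, H=5, T=6, I=7).
constexpr std::uint16_t encode_branch(bool branch_if_set, unsigned sreg_bit, int k) {
    return static_cast<std::uint16_t>(
        (branch_if_set ? 0xF000u : 0xF400u) |        // 1111 00.. or 1111 01..
        ((static_cast<unsigned>(k) & 0x7Fu) << 3) |  // 7-bit offset kkkkkkk
        (sreg_bit & 0x7u));                          // flag selector sss
}

// brne nomatch at 0x0112 with nomatch at 0x0123: brne is brbc on the Z flag (bit 1).
static_assert(branch_offset(0x0112, 0x0123) == 16, "offset is 0x10 words");
static_assert(encode_branch(false, 1, 16) == 0xF481, "brne with k = 16");
```

The same function covers every alias: for example, breq is `encode_branch(true, 1, k)` and brcs (brlo) is `encode_branch(true, 0, k)`.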
A CONDITIONAL CONTROL TRANSFER (BRANCH) SEQUENCE
A conditional control transfer (branch) sequence is typically comprised of two (2) instructions.
1. The first instruction performs some arithmetic or logic operation using the ALU of the processor.
- Examples of this first type of instruction includes: cp, cpc, cpi, tst
- These ALU operations result in SREG flag bits 5 to 0 being set or cleared (i.e., H, S, V, N, Z, C).
- To allow for multiple branch conditions to be tested, these instructions typically do not modify any of our 32 general purpose registers.
- The compare instructions cp, cpc, cpi should be used when you want to understand the relationship between two registers. For compare instructions, this is accomplished by performing a subtraction operation without a destination operand (cp r16,r17 is equivalent to r16 – r17).
- The tst instruction should be used when you want to test if the number in one register is negative or zero. For a test instruction, this is accomplished by performing an and operation with the destination and source registers being the same (tst r16 is equivalent to and r16,r16).
WARNING: The Atmel “Instruction Set Summary” document incorrectly classifies compare instructions (cp, cpc, cpi) as “Branch Instructions.” They should be listed under “Arithmetic and Logical Instructions.” To highlight this inconsistency on Atmel’s part, the tst instruction is correctly listed under “Arithmetic and Logical Instructions.”
2. The second instruction is a conditional branch instruction testing one or more SREG flag bits.
CONDITIONAL BRANCH INSTRUCTION SUMMARY
As mentioned in the previous slide, typically a conditional control transfer instruction follows a compare or test instruction, where some relationship between two registers is being studied. The following table may be used to quickly find the correct conditional branch instructions for these conditions.
A Conditional Control Transfer (Branch) Example
Here is how a high-level language decision diamond would be implemented in assembly.
; directions (see note)
.EQU south=0b00 ; most significant 6 bits zero
.EQU east=0b01
.EQU west=0b10
.EQU north=0b11
cpi r16,north ; step 1: Z flag set if r16 = 0b00000011
breq yes ; step 2: branch if Z flag is set
Note: These equates are included in testbench.inc
IMPLEMENTING A HIGH-LEVEL IF STATEMENT
- A high-level if statement is typically comprised of…
- Conditional control transfer sequence (last slide) where the complement (not) of the high-level conditional expression is implemented.
- High-level procedural block of code is converted to assembly.
- C++ High-level IF Expression
if (r16 == north) {
block of code to be executed if answer is yes.
}
- Assembly Version
cpi r16,north ; Is bear facing north?
brne no ; branch if Z flag is clear (not equal)
block of code to be executed if answer is yes.
no:
IMPLEMENTING A HIGH-LEVEL IF…ELSE STATEMENT
- A high-level if…else statement is typically comprised of…
- Conditional control transfer sequence where the complement (not) of the high-level conditional expression is implemented.
- High-level procedural block of code for yes (true) condition.
- Unconditional jump over the no (false) block of code.
- High-level procedural block of code for no (false) condition.
- C++ High-level if…else Expression
if (r16 == north) {
block of code to be executed if answer is yes (true).
}
else {
block of code to be executed if answer is no (false).
}
- Assembly Version
cpi r16,north ; Is bear facing north?
brne else ; branch if Z flag is clear (not equal)
block of code to be executed if answer is yes.
rjmp end_if
else:
block of code to be executed if answer is no.
end_if:
ASSEMBLY OPTIMIZATION OF A HIGH-LEVEL IF…ELSE STATEMENT – ADVANCED TOPIC –
- If the if and else blocks of code can each be done in a single line of assembly, then the program flow can be modified to guess the most likely outcome of the test.
- This is possible if setting the value of a variable (for example, the segments of a 7-segment display to be turned on) is the only thing done in each block.
- This optimized program flow will execute as fast as the normal if…else program flow if the guess is wrong, and faster if the guess is correct.
- This implementation is also more compact and often easier to understand.
- Assembly Version
; 7-segment display (see note)
.EQU seg_a=0
.EQU seg_b=1
.EQU seg_c=2
…
ldi r17,1<
cpi r16, north ; Is bear facing north?
breq done ; branch if Z flag is set (equal), the guess was correct
block of code to be executed if guess was wrong.
done:
Note: These equates are included in spi_shield.inc
Program Examples: Group A or B – Pseudocode example
- Objective
Assign the least significant 4 switches on the CSULB shield to group A and the most significant to group B. Based on user input, display A or B based on which group has the higher value. In the event of a tie display E for equal. For this programming problem assume that people choose A 50% of the time, B 40% of the time, and set the switches equal to each other 10% of the time.
- Pseudocode
- Using the ReadSwitches subroutine or reading the I/O ports directly, input group A into register A (.DEF regA = r16) and group B into register B (.DEF regB = r17)
- Preload the output register (.DEF answer = r18) with the letter A (first guess)
- If (A>B) then go to display answer.
- Preload the output register with the letter B (second guess)
- If (B>A) then go to display answer.
- Set answer to E and display answer.
- Seven segment display values.
- Programming work around by interchanging Rd and Rr.
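The guess-first flow of the pseudocode can be sketched in C++ before it is translated to assembly. This is an illustration only: it assumes the eight switches arrive packed in one byte, and the function name `higher_group` is made up; reading the switches and driving the display are left out.

```cpp
#include <cassert>
#include <cstdint>

// Returns 'A', 'B', or 'E' using the guess-first flow of the pseudocode.
// Assumption: the low nibble of `switches` is group A and the high
// nibble is group B.
inline char higher_group(std::uint8_t switches) {
    const std::uint8_t regA = switches & 0x0F;         // least significant 4 switches
    const std::uint8_t regB = (switches >> 4) & 0x0F;  // most significant 4 switches

    char answer = 'A';                 // first guess, the most likely outcome (50%)
    if (regA > regB) return answer;    // guess correct, go to display answer
    answer = 'B';                      // second guess (40%)
    if (regB > regA) return answer;
    return 'E';                        // least likely case (10%): groups are equal
}
```

Ordering the tests from most to least likely means the most common input leaves after a single comparison.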
Direction Finder – Two Program Solutions
- Objective
Design a digital circuit with two (2) switches that will turn on one of the room’s 4 LED segments, indicating the direction you want your bear to walk.
Direction Finder – Truth Table Implementation
lds r16, dir // move direction bits into a working register
// facing east (segment b)
bst r16,0 // store direction bit 0 into T
bld var_B,0 // load r16 bit 0 from T
bst r16,1 // store direction bit 1 into T
bld var_A,0 // load r17 bit 0 from T
com var_A // B = /A * B
and var_B, var_A
bst var_B,0 // store r16 bit 0 into T
bld spi7SEG, seg_b // load r8 bit 1 from T
Implementation of Boolean expressions for segments a, f, and g (circuit schematic)
Direction Finder – Using Conditional Expressions
lds r16, dir
ldi r17, 1<
cpi r16,south ; if bear is facing south then we are done
breq done
ldi r17, 1<
cpi r16,west ; if bear is facing west then we are done
breq done
ldi r17, 1<
cpi r16,east ; if bear is facing east then we are done
breq done
ldi r17, 1<
done:
mov spi7SEG, r17 ; answer to 7-segment register
call WriteDisplay
Pseudo-Instructions TurnLeft, TurnRight, and TurnAround
Using switches 3 and 2, located on Port C pins 3 and 2 respectively, input an action you want the bear to take. The four possible actions are do nothing, turnLeft, turnRight, and turnAround. Write a subroutine named WhichWay to take the correct action as defined by the following table.
; ————————–
; — Which Way Do I Go? —
call ReadSwitches // input port C pins (0x06) into register r7
bst switch, 3 // store switch bit 3 into T
brts cond_1X // branch if T is set
bst switch, 2 // store switch bit 2 into T
brts cond_01 // branch if T is set
cond_00:
rjmp whichEnd
cond_01:
rcall TurnRight
rjmp whichEnd
cond_1X:
// branch based on the state of switch bit 2
:
cond_10:
:
cond_11:
:
whichEnd:
Warning: The above code is for illustrative purposes only and would typically be found in the main looping section of code not in a subroutine. Do not use this code to implement your lab.
InForest and Implementation of IF…ELSE Expression
- The inForest subroutine tells us if the bear is in the forest (i.e., has found his way out of the maze).
- The rows and columns of the maze are numbered from 0 to 19 (13h) starting in the upper left hand corner.
- When the bear has found his way out of the maze he is in row minus one (-1). The subroutine is to return true (r25:r24 != 0) if the bear is in the forest and false (r25:r24 == 0) otherwise.
- The register pair r25:r24 is where C++ looks for return values for the BYTE data type.
InForest and Implementation of IF…ELSE Expression – Continued –
; ————————–
; ——- In Forest ——–
; Called from whichWay subroutine
; Input: row Outputs: C++ return register (r24)
; No other registers or flags are modified by this subroutine
inForest:
push reg_F // push any flags or registers modified
in reg_F,SREG
push r16
lds r16,row
test if bear is in the forest
endForest:
clr r25 // zero extend
pop r16 // pop any flags or registers placed on the stack
out SREG,reg_F
pop reg_F
ret
Appendix
APPENDIX A: CONTROL TRANSFER INSTRUCTION ENCODING
Direct
All control transfer addressing modes modify the program counter.
CONTROL TRANSFER INSTRUCTION ENCODING – Indirect
CONTROL TRANSFER INSTRUCTION ENCODING – Relative
APPENDIX B – AVR STATUS REGISTER (SREG)
Non ALU
- Bit 7 – I: Global Interrupt Enable
The Global Interrupt Enable bit must be set for the interrupts to be enabled. The individual interrupt enable control is then performed in separate control registers. The I-bit is cleared by hardware after an interrupt has occurred, and is set by the reti instruction. The I-bit can also be set and cleared by the application with the sei and cli instructions.
- Bit 6 – T: Bit Copy Storage
The Bit Copy instructions bld (Bit LoaD) and bst (Bit STore) use the T-bit as source or destination. A bit from a register can be copied into T (Rb -> T) by the bst instruction, and a bit in T can be copied into a bit in a register (T -> Rb) by the bld instruction.
ALU
Signed two’s complement arithmetic
- Bit 4 – S: Sign Bit, S = N ⊕ V
Bit set if answer is negative with no errors or if both numbers were negative and error occurred, zero otherwise.
- Bit 3 – V: Two’s Complement Overflow Flag
Bit set if error occurred as the result of an arithmetic operation, zero otherwise.
- Bit 2 – N: Negative Flag
Bit set if result is negative, zero otherwise.
Unsigned arithmetic
- Bit 5 – H: Half Carry Flag
Carry from least significant nibble to most significant nibble. Half Carry is useful in BCD arithmetic.
- Bit 0 – C: Carry Flag
The Carry Flag C indicates a carry in an arithmetic operation. Bit set if error occurred as the result of an unsigned arithmetic operation, zero otherwise.
Arithmetic and Logical
- Bit 1 – Z: Zero Flag
The Zero Flag Z indicates a zero result in an arithmetic or logic operation.
APPENDIX C – CONTROL TRANSFER (BRANCH) INSTRUCTIONS
Compare and Test cp, cpc, cpi, tst, bst
Unconditional
- Relative (1) rjmp, rcall
- Direct jmp, call
- Indirect ijmp, icall
- Subr. & Inter. Return ret, reti
Conditional
- Branch if (2) …
- SREG Flag bit is clear (brFlagc) or set (brFlags) by name (I, T, H, S, V, N, Z, C) or bit (brbc, brbs).
- These SREG flag bits (I, T, H, S, V, N, Z, C) use more descriptive mnemonics.
- Branch if equal (breq) or not equal (brne) test the Z flag.
- Branch if plus (brpl) or minus (brmi) test the N flag. For unsigned arithmetic, branch if same or higher (brsh) or lower (brlo) test the C flag and are equivalent to brcc and brcs respectively.
- For signed 2’s complement arithmetic, branch if less than (brlt) or greater than or equal (brge) test the S flag.
- Skip if …
- Bit (b) in a register is clear (sbrc) or set (sbrs).
- Bit (b) in I/O register is clear (sbic) or set (sbis). Limited to I/O addresses 0-31
Note:
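- (1) Jump relative to PC + 1 + k, where k is a 12-bit signed offset ($-2^{11} \le k \le 2^{11}-1$), giving a range of PC-2047 to PC+2048, within the 16 K word address space of the ATmega328P
- (2) All branches are relative to PC + 1 + k, where k is a 7-bit signed offset ($-2^{6} \le k \le 2^{6}-1$), giving a range of PC-63 to PC+64, within the 16 K word address space of the ATmega328P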
APPENDIX D – ATMEGA328P INSTRUCTION SET
AVR Control Transfer: Looping
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 3.1, 3.3
ADDITIONAL READING
Introduction to AVR assembler programming for beginners, controlling sequential execution of the program http://www.avr-asm-tutorial.net/avr_en/beginner/JUMP.html
AVR Assembler User Guide
LOOP CONSTRUCTS IN C++ AND ASSEMBLY
Loop Example 1: Loop through a block of code 7 times.
- Typically we increment the counter variable in C++.
for(int i=0; i<7; i++); // This statement loops 7 times {i: 0,1,2,3,4,5,6}
- As shown in the Assembly version below, in assembly we decrement the counter variable.
{i: 7,6,5,4,3,2,1}
This allows us to immediately test the SREG Z-flag bit without an intermediate compare instruction.
C++ (for loop)
for(int i=7; i>0; i--) {
Block of code
}
C++ (do…while loop)
int i = 7;
do {
Block of code
i--;
} while(i>0);
Assembly
ldi r16, 7
loop:
Block of code
dec r16
brne loop
BUTTON DEBOUNCE EXAMPLE
- In the screen capture (red waveform), a button bounces for about 400 µs when pressed. Once the transition is detected, we want to design a software loop that will do nothing while the switch input stabilizes.
- Specifically, we want to design a software delay routine that will generate a delay of approx. 500 µs.
DELAY CALCULATION FOR AVR
We begin by designing a simple loop.
wait:
ldi r16, ____ // Loop Count
delay:
dec r16 // ____ machine cycles
brne delay // ____ machine cycles
To discover the delay generated by our “software” loop we begin by finding the answers to the questions.
- What “Loop Count” Lcnt will generate the maximum delay?
- What is a machine cycle and how many machine cycles are required for each line of code?
- What is the number of machine cycles Nmc in 1 loop?
INSTRUCTION (OR MACHINE) CYCLE TIME FOR THE AVR
- Machine Cycle – The number of clock cycles it takes the CPU to fetch and execute an instruction.
- Because the AVR processors incorporate a 2-stage pipeline, there is a one-to-one relationship between an AVR machine cycle and a clock cycle. In contrast for the non-pipelined 8051 microcontroller one machine cycle = 12 clock cycles.
- Therefore to calculate the time it takes for one machine cycle you only need to take the inverse of the clock frequency.
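Example: for a 16 MHz clock, the period of one machine cycle is $T_{mc} = 1 / f_{clk} = 1 / 16\ \text{MHz} = 62.5\ \text{ns}$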
- As shown in the “Complete Instruction Set Summary” on page 427 of the AVR Instruction Set Document (Atmel doc0856) most AVR instructions need only one or two clock cycles to fetch and execute an instruction.
- Given a clock frequency of 16 MHz and based on the above table, a multiply (mul) instruction, which needs 2 clock cycles, will take $2 \times 62.5\ \text{ns} = 125\ \text{ns}$ to execute
PIPELINING
Before you can fully understand branching and looping you need to understand the concept of pipelining and how it is implemented in our AVR processor.
- Pipelining is a technique that breaks operations, such as instruction processing (fetch and execute) into smaller distinct stages so that a subsequent operation can begin before the previous one has completed.
- For most instructions, especially on a processor based on a modified Harvard memory model, program memory is not accessed during the execution cycle. This memory down time can be used to fetch the next instruction to be executed, in parallel with the execution cycle of the current instruction. Here then is an opportunity for pipelining!
AVR INTERSTAGE PIPELINE REGISTERS
- A pipeline stage begins and ends with a register; controlled by a clock. Technically these are known as interstage pipeline registers.
- With respect to our AVR architecture the two registers of interest are the Program Counter (PC) and the Instruction Register (IR).
- Between the register(s) is combinational logic. Although counter-intuitive, Flash Program memory can be viewed as combinational logic with an address generating a word of data.
- Without pipelining these two registers in the control unit (PC, IR) would require two clock cycles to complete a basic computer operation cycle. Specifically, an instruction is (1) fetched and then (2) executed.
AVR TWO-STAGE INSTRUCTION PIPELINE
- The AVR pipeline has two independent stages. The first stage fetches an instruction and places it in the Instruction Register (IR), while the second stage is executing the instruction.
- For our RISC architecture most instructions are executed in a single cycle (also known as elemental instructions). In this perfect world where all instructions take one cycle to fetch and one cycle to execute, after an initial delay of one cycle to fill the pipeline, known as latency, each instruction will take only one cycle to complete.
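The benefit of the two-stage pipeline can be put into numbers with a small C++ sketch. It models only the idealized case described above, assuming every instruction needs one cycle to fetch and one to execute and no branch is taken; the function names are invented for this illustration.

```cpp
#include <cassert>

// Without pipelining each instruction is fetched and then executed,
// so it needs two clock cycles.
constexpr unsigned cycles_without_pipeline(unsigned instructions) {
    return 2u * instructions;
}

// With the two-stage pipeline there is one cycle of latency to fill the
// pipeline, after which one instruction completes every cycle.
constexpr unsigned cycles_with_pipeline(unsigned instructions) {
    return instructions == 0u ? 0u : instructions + 1u;
}

static_assert(cycles_without_pipeline(10) == 20, "10 instructions, no pipeline");
static_assert(cycles_with_pipeline(10) == 11, "10 instructions, pipelined");
```

As the instruction count grows, the one-cycle latency becomes negligible and the pipelined processor approaches twice the throughput.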
BRANCH PENALTY
- Within the context of pipeline architecture, when the execution stage of the pipeline is executing a conditional branch instruction, the execution stage must “predict” the outcome of the instruction in order to fetch what it “guesses” will be the next instruction.
- While on average 80% of the time a branch is taken, the AVR always guesses that the branch will not be taken. This guess is made simply because it is the simplest to implement (the program counter automatically points at the next instruction to be executed).
- When a branch is taken, and the guess is wrong, the processor must build the pipeline from scratch thus accruing a “penalty.” With our simple 2-stage pipeline that penalty is one clock cycle as shown in the AVR Instruction Set Document.
BUTTON DEBOUNCE EXAMPLE – CONTINUED
In the screen capture (red waveform), a button bounces for about 400us when pressed. Once the transition is detected, we want to design a software loop that will do nothing while the switch input stabilizes. To remove the noise, we will design a software delay routine that will generate a delay of approx. 500 us.
DELAY CALCULATION FOR AVR
- Returning to our simple software loop
wait:
ldi r16, ____ // Loop Count
delay:
dec r16 // 1 clock cycle
brne delay // + 2 cycles if true, 1 cycle if false
$T = T_{mc} \times (N_{mc} \times L_{cnt} - 1)$, where
$T$ = delay generated by the loop
$T_{mc}$ = period of one machine cycle (note: 1 machine cycle = 1 clock cycle) = 1 / 16 MHz = 0.0625 µsec
$N_{mc}$ = number of machine cycles in 1 loop = 3 (dec = 1 cycle, brne = 2 cycles when taken; we subtract 1 for the single case where the not-taken guess is correct and brne takes only 1 cycle)
$L_{cnt}$ = number of times the loop is run (Loop Count) = ?
CALCULATING MAXIMUM DELAY
- Next we will calculate the maximum delay
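$L_{cnt}$: r16 = 0, which results in a count of 256
$T_{max} = 0.0625\ \mu s \times (3 \times 256 - 1) = 47.94\ \mu s \approx 48\ \mu s$ (Note: the -1 accounts for the final pass, where brne is false and takes 1 cycle instead of 2)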
- Now Let’s increase this delay by adding a nop instruction and then recalculating the maximum delay
$N_{mc}$ = number of machine cycles in 1 loop = 4
wait:
clr r16 // 0 = maximum delay
delay:
nop // 1
dec r16 // 1 clock cycle
brne delay // + 2 cycles if true, 1 cycle if false
$T_{max} = 0.0625\ \mu s \times (4 \times 256 - 1) = 63.94\ \mu s \approx 64\ \mu s$ with r16 = 0 (clr r16)
CALCULATING LOOP COUNT FOR A GIVEN DELAY
- To generate a delay of 500 µs we will initialize r16 for a delay of 50 µs and then write an outside loop that will run the inside loop 10 times for a total delay of approximately 500 µs
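- Solving our Tmax equation for Loop Count Lcnt
$L_{cnt} = (T / T_{mc} + 1) / N_{mc}$
- Set Lcnt for a delay of 50 µsec
$L_{cnt} = (50\ \mu s / 0.0625\ \mu s + 1) / 4 = 200.25 \approx 200$ (0xC8)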
wait:
ldi r16, 0xC8 // 200
delay:
nop // 1
dec r16 // 1 clock cycle
brne delay // + 2 cycles if true, 1 cycle if false
LOOP INSIDE A LOOP DELAY
On your own, create an outside loop with a count of 10 to give us a delay of approximately 500 µsec. (Hint: see Example 3-18 in your textbook.)
DESIGN EXAMPLE WITH EE346 SHIELD
When the user presses the button, read first 3 switches (least significant), if the number is less than or equal to 5 then calculate factorial. If greater than 5 turn on decimal point. Display the least significant 4 bits of the answer.
MY DESIGN STEPS
Step 1: Initialized Ports
Step 2: Turned on LED 0 to indicate initialization complete
Step 3: Wrote code to pulse the clock
Step 4: Read in pin waiting for button to be pressed (Loop Example 1)
Step 5: Need to filter out Bounce (Loop Example 2)
Maximum delay that could be generated was only 48 usec
Step 6: Added a NOP instruction, max delay was now 64 usec
Set delay for nice even number of 50 usec
Step 7: Made an outside loop of 10 (Loop Example 3)
Step 8: Converted loop to a subroutine so I could change condition to button release.
Step 9: Check for button pressed and then released
Step 10: Read Switch and check if less than or equal to 5
Step 11: Calculate Factorial (Loop Example 4)
Step 12: Store 4 digit answer to SRAM (SRAM Indirect Addressing Mode)
Step 13: Sequentially, Load each digit and … (SRAM Indirect Addressing Mode)
Step 14: convert to 7-segment display (Flash Program Indirect Addressing Mode)
CSULB PROTO-SHIELD SCHEMATIC
CONFIGURE GPIO PORTS
ATMEGA328P INSTRUCTION SET
AVR Subroutine Basics
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Chapter 3, pages 118 to 125
AVR Subroutine Basics
- How do I go to and return from a subroutine?
rcall label
call label
icall // no label operand; the target address is taken from the Z register
ret
- AVR Call Addressing Modes
Relative (rcall): The relative address is encoded in the machine instruction using 12 bits. Assuming that the Program Counter (PC) is pointing at the next instruction to be executed, a relative call can jump within a range of $-2^{n-1}$ to $2^{n-1} - 1$ program words, in other words -2K ≤ k ≤ 2K - 1, where k is the word offset added to the PC, n = 12 bits, K = $2^{10}$ = 1024, and a program word is 16 bits.
Long (call): full 16 K word (32 K byte) address space
Indirect (icall): full 16 K word (32 K byte) address space
- Why Subroutines?
- My Little Subroutine Dictionary
- Assembly Subroutine Template
- How to Send Information to and/or from the Calling Program
- Rules for Working with Subroutines
WHY SUBROUTINES?
- Divide and Conquer – Allow you to focus on one small “chunk” of the problem at a time.
- Code Organization – Gives the code organization and structure. A small step into the world of object-oriented programming.
- Modular and Hierarchical Design – Moves information about the program at the appropriate level of detail.
- Code Readability – Allows others to read and understand the program in digestible “bites” instead of all at once. Higher level subroutines with many lower level subroutine calls take on the appearance of a high level language.
- Encapsulation – Insulates the rest of the program from changes made within a procedure.
- Team Development – Helps multiple programmers to work on the program in parallel; a first step to configuration control. Allows a programmer to continue writing his code, independent of other team members by introducing “stub” subroutines. A stub subroutine may be as simple as the subroutine label followed by a return instruction.
MY LITTLE SUBROUTINE DICTIONARY
SUBROUTINE VERSUS FUNCTION
- Functions and subroutines are the most basic building block you can use to organize your code.
- Functions are very similar to subroutines; their syntax is nearly identical, and they can both perform the same actions. However, a function returns a value to the code that called it.
- For this course the terms Subroutine, Procedure and Method may describe a Subroutine or Function based on context.
PARAMETER VERSUS ARGUMENT
- In everyday usage, “parameter” and “argument” are used interchangeably to refer to the things that you use to define and call methods or functions.
- Often this interchangeability doesn’t cause ambiguity. It should be noted, though, that conventionally, they refer to different things.
- A “parameter” is the thing used to define a method or function while an “argument” is the thing you use to call a method or function.
- Ultimately, it doesn’t really matter what you say. People will understand from the context.
ASSEMBLY SUBROUTINE TEMPLATE
; —- My Subroutine ——-
; Called from Somewhere
; Input: Registers, SRAM variables, or I/O registers
; Outputs: None for a subroutine or r25:r24 register pair for a C function
; No other registers or flags are modified by this subroutine
; ————————–
MySubroutine:
push r15 // push any flags or registers modified by the procedure
in r15,SREG
push r16
my assembly code
endMySubroutine:
clr r25 // zero-extended to 16-bits for C++ call (optional)
pop r16 // pop any flags or registers placed on the stack
out SREG,r15
pop r15
ret
HOW TO SEND INFORMATION TO AND/OR FROM THE CALLING PROGRAM
There are many ways to send information to and from a subroutine or function. Here are a few…
- In Register(s) or Register Pair(s) agreed upon between the calling program and Procedure or Function.
- By setting or clearing one of the bits in SREG (I, T, H, S, V, N, Z, C).
- In an SRAM variable, this method is not recommended.
- As part of a Stack Frame, this method is beyond the scope of a course on microcontrollers but is highly recommended.
HOW TO SEND INFORMATION TO AND/OR FROM YOUR C PROGRAM
When working in a Mixed C and Assembly programming environment, our subroutines and functions communicate using Register Pairs.
- Mixed C and Assembly parameter passing Register Pairs
In your C Program…
// C Assembly External Declarations
extern void mySubr(uint8_t param1, uint16_t param2, uint16_t param3);
extern uint8_t myFunc(uint8_t param1, uint16_t param2, uint16_t param3);
In your Assembly Program…
; Define Assembly Directives
.DEF parm1H = r25
.DEF parm1L = r24
.DEF parm2H = r23
.DEF parm2L = r22
.DEF parm3H = r21
.DEF parm3L = r20
mySubr:
Assembly Code
ret
- 8-bit return values (uint8_t data type) are zero/sign-extended to 16-bits in r25:r24 by called function.
RULES FOR WORKING WITH SUBROUTINES
Here are a few rules to remember when writing your main program and subroutines.
- Always disable interrupts and initialize the stack pointer at the beginning of your program.
; Disable interrupts and configure stack pointer for 328P
cli
ldi r16,low(RAMEND) // RAMEND address 0x08ff
out SPL,r16 // Stack Pointer Low SPL at i/o address 0x3d
ldi r16,high(RAMEND)
out SPH,r16 // Stack Pointer High SPH at i/o address 0x3e
- Always initialize variables and registers at the beginning of your program. Do not re-initialize I/O registers used to configure the GPIO ports or other subsystems within a loop or a subroutine. For example, you only need to configure the port pins assigned to the switches as inputs with pull-up resistors once.
- Push (push r7) any registers modified by the subroutine at the beginning of the subroutine and pop (pop r7) in reverse order the registers at the end of the subroutine. This rule does not apply if you are using one of the registers or SREG flags to return a value to the calling program. Comments should clearly identify which registers are modified by the subroutine.
- You cannot save the Status Register SREG directly onto the stack. Instead, first push one of the 32 registers on the stack and then save SREG in this register. Reverse the sequence at the end of the subroutine.
push r15
in r15, SREG
:
out SREG, r15
pop r15
- Never jump into a subroutine. Use a call instruction (rcall, call) to start executing code at the beginning of a subroutine.
- Never jump out of a subroutine. Your subroutine should contain a single return (ret) instruction as the last instruction (ret = last instruction).
- You do not need an .ORG assembly directive. As long as the previous code segment ends correctly (rjmp, ret, reti) your subroutine can start at the next address.
- You do not need to clear a register or any variable for that matter before you write to it.
clr r16 ; this line is not required
lds r16, A
- All blocks of code within the subroutine or Interrupt Service Routine (ISR) should exit the subroutine through the pop instructions and the return (ret, reti).
- It is a good programming practice to include only one return instruction (ret, reti) located at the end of the subroutine.
- Once again, never jump into or out of a subroutine from the main program, an interrupt service routine, or any other subroutine. However, subroutines or ISRs may call (rcall) other subroutines.
BASIC STRUCTURE OF A SUBROUTINE – A REVIEW
- Load argument(s) into input registers (parameters) as specified in the header of the subroutine (typically r24, r22).
- Call the Subroutine
- Save an image of the calling program's CPU state by pushing all registers modified by the subroutine, including saving SREG to a register.
- Do something with the return value(s) stored in the output register(s) specified in the header of the subroutine (typically r24, r22).
- Restore the image of the calling program's CPU state by popping all registers modified by the subroutine, including loading SREG from a register.
- Return
AVR Peripherals: General-Purpose Input/Output
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 4.1, 4.2, 6.4
SOURCE MATERIAL
1. ATmega328P Datasheet Section 13 “I/O-Ports” http://www.atmel.com/dyn/resources/prod_documents/doc8161.pdf
2. Arduino Port Registers
3. arduino-duemilanove-schematic
4. arduino-proto-shield
Source: ATmega328P Data Sheet http://www.atmel.com/dyn/resources/prod_documents/8161S.pdf page 5
ATMEGA GENERAL PURPOSE DIGITAL I/O PORTS
- The ATmega328P has 23 General Purpose Digital I/O Pins assigned to 3 GPIO Ports (8-bit Ports B, D and 7-bit Port C)
- Each I/O port pin may be configured as an output with symmetrical drive characteristics. Each pin driver is strong enough (20 mA) to drive LED displays directly.
- Each I/O port pin may be configured as an input with or without a pull-up resistor. The values for the pull-up resistors can range from 20 to 50 K ohms.
- Each I/O pin has clamping diodes to protect input circuit from undervoltage/overvoltage and ESD conditions.
DUAL ROLE OF PORTS B, C AND D OF THE ATMEGA328P
I/O Ports B (PB7:0), Port C (PC5:0), and Port D (PD7:0)
Ports B, C, and D are bi-directional I/O ports with internal pull-up resistors (selected for each bit). The Port output buffers have symmetrical drive characteristics with both high sink and source capability.
Interrupts (INT1, INT0, PCINT23..0)
External Interrupts are triggered by the INT0 and INT1 pins or any of the PCINT23..0 pins. Observe that, if enabled, the interrupts will trigger even if the INT0 and INT1 or PCINT23..0 pins are configured as outputs. This feature provides a way of generating a software interrupt.
AVCC
AVCC is the supply voltage pin for the A/D Converter. It should be externally connected to VCC. If the ADC is used, it should be connected to VCC through a low-pass filter.
AREF
AREF is the analog reference pin for the A/D Converter.
ADC5:0
These pins serve as analog inputs to the A/D converter. These pins are powered from the analog supply and serve as 10-bit ADC channels.
I/O PORT PIN AS AN OUTPUT
- To configure a Port (x) pin as an output set corresponding bit (n) in the Data Direction Register (DDxn) to 1. Once configured as an output pin, you control the state of the pin (1 or 0) by writing to the corresponding bit (n) of the PORTxn register.
- Writing (signal WPx) a logic one to PINxn toggles the value of PORTxn, independent of the value of DDxn. Note that the SBI instruction can be used to toggle one single bit in a port.
I/O PORT PIN AS AN INPUT
- To configure a Port (x) pin as an input set corresponding bit (n) in the Data Direction Register (DDxn) to 0. To add a pull-up resistor set the corresponding bit (n) of the PORTxn register to 1 (see illustration).
- You can now read the state of the input pin by reading the corresponding bit (n) of the PINxn register.
ACCESSING GPIO LINES IN ASSEMBLY
DESIGN EXAMPLE 1 – Read Switches
Problem: Program GPIO Port C bits 5 to 0 as inputs with pull-up resistors. Read GPIO Port C into register r6 and move bit 4 to register r7 bit 0. Your program should not modify Port C bits 7 and 6.
; Initialize Switches with Pull-up resistors
in r16, DDRC // Port C DDR for switches 5 to 0
cbr r16,0b00111111 // define bits 5 to 0 as input (clear)
out DDRC,r16 // output DDxn = 0 PORTxn = Undefined
in r16,PORTC // PORT C Register for switches 5 to 0
sbr r16,0b00111111 // add pull-up resistors (PUR)
out PORTC,r16 // output DDxn = 0 PORTxn = 1
Main:
:
in r6,PINC // R6 <- IO[0x06]
bst r6,4 // T <- R6 bit 4
bld r7,0 // R7 bit 0 (seg_a) <- T
DESIGN EXAMPLE 2 – CONFIGURE D FLIP-FLOP
Problem: Program GPIO Port D bit 5 as an output and bit 2 as an input without a pull-up resistor.
; Pushbutton debounce port D pins
.EQU dff_clk=PORTD5 // clock of debounce flip-flop
.EQU dff_Q=PIND2 // Q output of debounce flip-flop
; initialize push-button debounce circuit
sbi DDRD,dff_clk // flip-flop clock, DDRD5 = 1; PORTD5 = Undefined
cbi PORTD,dff_clk // DDRD5 = 1; PORTD5 = 0
cbi DDRD,dff_Q // flip-flop Q DDRD2 = 0; PORTD2 = Undefined
cbi PORTD,dff_Q // flip-flop Q DDRD2 = 0; PORTD2 = 0
REGISTER SUMMARY AND THE I/O PORT
- Three I/O memory address locations are allocated for each port, one each for the Data Register – PORTx, Data Direction Register – DDRx, and the Port Input Pins – PINx.
- The Port Input Pins I/O location PINx is Read Only, while the Data Register and the Data Direction Register are read/write.
- However, Writing a logic one to a bit in the PINx Register, will result in a Toggle in the corresponding bit in the Data Register.
- In addition, the Pull-up Disable – PUD bit in MCUCR disables the pull-up function for all pins in all ports when set.
I/O PORT PIN SCHEMATIC
I/O PORT PIN CONFIGURATIONS
Inputs: DDRXn, PORTXn. Outputs: I/O, Pull-Up.
DDRXn | PORTXn | I/O | Pull-Up | Comments |
0 | 0 | Input | No | Read “Synchronized” PINXn |
0 | 1 | Input | Yes | |
1 | X | Output | N/A | Write bit to PORTXn |
Appendix
APPENDIX A – PROGRAM I/O PORT AS AN INPUT USING MNEMONICS
.INCLUDE <m328Pdef.inc>
; C:\Program Files\Atmel\AVR Tools\AvrAssembler2\Appnotes\m328Pdef.inc
in r16,DDRC // DDRC equated to 0x07 in m328Pdef.inc
cbr r16,(1<<PC5)|(1<<PC4)|(1<<PC3)|(1<<PC2)|(1<<PC1)|(1<<PC0)
out DDRC,r16 // output DDxn = 0; PORTxn = Undefined
in r16,PORTC // PortC equated to 0x08
sbr r16,(1<<PC5)|(1<<PC4)|(1<<PC3)|(1<<PC2)|(1<<PC1)|(1<<PC0)
out PORTC,r16 // output DDxn = 0; PORTxn = 1
.INCLUDE "spi.inc"
The following Define and Equate Assembly Directives are defined in spi_shield.inc
.DEF spi7SEG=r8 // Text Substitution (copy-paste)
.DEF switch=r7
.EQU seg_a=0 // Numeric Substitution
in switch, PINC // R7 <- PINC
bst switch,4 // T <- R7 bit 4
bld spi7SEG,seg_a // R8 bit 0 <- T
Appendix B – I/O PORT PIN “SYNCHRONIZER”
- As previously discussed, you read a port pin by reading the corresponding PINxn Register bit. The PINxn Register bit and the preceding latch constitute a synchronizer. This is needed to avoid metastability if the physical pin changes value near the edge of the internal clock, but it also introduces a delay as shown in the timing diagram.
- Consider the clock period starting shortly after the first falling edge of the system clock. The latch is closed when the clock is low, and goes transparent when the clock is high, as indicated by the shaded region of the “SYNC LATCH” signal. The signal value is latched when the system clock goes low. It is clocked into the PINxn Register at the succeeding positive clock edge. As indicated by the two arrows tpd,max and tpd,min, a single signal transition on the pin will be delayed between ½ and 1½ system clock period depending upon the time of assertion.
Appendix C – SWITCHING BETWEEN I/O PORT PIN CONFIGURATIONS
- When switching between tri-state ({DDxn, PORTxn} = 0b00) and output high ({DDxn, PORTxn} = 0b11), an intermediate state with either pull-up enabled ({DDxn, PORTxn} = 0b01) or output low ({DDxn, PORTxn} = 0b10) must occur.
- Switching between input with pull-up ({DDxn, PORTxn} = 0b01) and output low ({DDxn, PORTxn} = 0b10) generates the same problem. You must use either the tri-state ({DDxn, PORTxn} = 0b00) or the output high state ({DDxn, PORTxn} = 0b11) as an intermediate step.
Interrupts and 16-bit Timer/Counter 1: ATmega328P Timing Subsystems
Reading
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 9.1, 9.3
ATmega328P Timing Subsystem
The ATmega328P is equipped with two 8-bit timer/counters and one 16-bit timer/counter. These Timer/Counters let you…
- Turn on or turn off an external device at a programmed time.
- Generate a precision output signal (period, duty cycle, frequency). For example, generate a complex digital waveform with varying pulse width to control the speed of a DC motor
- Measure the characteristics (period, duty cycle, frequency) of an incoming digital signal
- Count external events
What is a Flip-Flop and a Counter
You can think of a D flip-flop as a one-bit memory. The value present on the D input of the flip-flop is stored on the positive edge of the clock input.
Dt | Qt+1 |
0 | 0 |
1 | 1 |
X | Qt |
The counter part of an ATmega328P Timer/Counter peripheral subsystem is an example of an asynchronous (ripple) counter, which is a collection of flip-flops with the clock input of stage n connected to the output of stage n -1
When compared with a synchronous counter, an asynchronous “ripple” counter: generates less noise and is less expensive. On the negative side, an asynchronous “ripple” counter is slower than a synchronous counter.
Timing Terminology
Frequency
The number of times a particular event repeats within a 1-s period. The unit of frequency is Hertz, or cycles per second. For example, a sinusoidal signal with a 60-Hz frequency means that a full cycle of a sinusoid signal repeats itself 60 times each second, or every 16.67 ms. For the digital waveform shown, the frequency is 2 Hz.
Period
The flip side of a frequency is a period. If an event occurs with a rate of 2 Hz, the period of that event is 500 ms. To find a period, given a frequency, or vice versa, we simply need to remember their inverse relationship, F = 1/T where F and T represent a frequency and the corresponding period, respectively.
Duty Cycle
In many applications, periodic pulses are used as control signals. A good example is the use of a periodic pulse to control a servo motor. To control the direction and sometimes the speed of a motor, a periodic pulse signal with a changing duty cycle over time is used.
Duty cycle is defined as the percentage of one period a signal is ON. The periodic pulse signal shown in the Figure is ON for 50% of the signal period and off for the rest of the period. Therefore, we call the signal a periodic pulse signal with a 50% duty cycle. This special case is also called a square wave.
Timer 1 Modes of Operation
Normal Mode
- The simplest AVR Timer mode of operation is the Normal mode. Waveform Generation Mode for Timer/Counter 1 (WGM1) bits 3:0 = 0. These bits are located in Timer/Counter Control Registers A/B (TCCR1A and TCCR1B).
- In this mode the Timer/Counter 1 Register (TCNT1H:TCNT1L) counts up (incrementing), and no counter clear is performed. The counter simply overruns when it passes its maximum 16-bit value 0xFFFF and then restarts at 0x0000.
- There are no special cases to consider in the Normal mode, a new counter value can be written anytime.
- In normal operation the Timer/Counter Overflow Flag (TOV1) bit located in the Timer/Counter1 Interrupt Flag Register (TIFR1) will be set in the same timer clock cycle as the Timer/Counter 1 Register (TCNT1H:TCNT1L) becomes zero. The TOV1 Flag in this case behaves like a 17th bit, except that it is only set, not cleared.
Timer/Counter 1 Prescaler
The clock input to Timer/Counter 1 (TCNT1) can be pre-scaled (divided down) by 5 preset values (1, 8, 64, 256, and 1024).
Clock Select Counter/Timer 1 (CS1) bits 2:0 are located in Timer/Counter Control Register B (TCCR1B).
Timer/Counter 1 Normal Mode – Design Example
- In this design example, we want to write a 250 msec delay routine assuming a system clock frequency of 16.000 MHz and a prescale divisor of 64.
- The first step is to discover if our 16-bit Timer/Counter 1 can generate a 250 ms delay.
Variable Definitions
tclk_T1 : period of clock input to Timer/Counter1
fclk : AVR system clock frequency
fTclk_I/O : AVR Timer clock input frequency to Timer/Counter Waveform Generator
How to Calculate Maximum Delay (Normal Mode)
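- The largest time delay possible is achieved by setting both TCNT1H and TCNT1L to zero, which results in the overflow flag TOV1 being set after $2^{16} = 65{,}536$ tics of the Timer/Counter1 clock.
$T_{MAX} = 2^{16} \times t_{clk\_T1}$, given $t_{clk\_T1} = 64 / f_{clk} = 64 / 16\ \text{MHz} = 4\ \mu s$, then
$T_{MAX} = 65{,}536 \times 4\ \mu s$ and therefore $T_{MAX} = 262.14$ ms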
- Clearly, Timer 1 can generate a delay of 250 msec
- Our next step is to calculate the TCNT1 load value needed to generate a 250 ms delay.
How to Calculate Timer Load Value
Steps to Calculate to Timer Load Value (Normal Mode)
Problem
Generate a 250 msec delay assuming a clock frequency of 16 MHz and a prescale divisor of 64.
Solution
1. Divide desired time delay by tclkT1 where tclkT1 = 64/fclkI/O = 64 / 16.000 MHz = 4 µsec/tic
250 msec / 4 µs/tic = 62,500 tics
short-cut: TCNT1H = high(-62,500) and TCNT1L = low(-62,500)
2. Subtract 65,536 - step 1
65,536 - 62,500 = 3,036
3. Convert step 2 to hexadecimal.
3,036 = 0x0BDC
For our example TCNT1H = 0x0B and TCNT1L = 0xDC
4. Check Answer
3,036 tics x 4 µs/tic = 12.14 msec
262.14 msec - 250 msec = 12.14 msec √
Steps to Calculate Clock Divisor (Normal Mode)
In the previous example we assumed a divisor of 64, and then by calculating the maximum delay TMAX verified that this assumption was correct. After that we simply followed the steps defined in the previous slide to calculate the value to be loaded into 16-bit timer/counter TCNT1.
$T_{MAX} = N \times 2^{n} / f_{clk}$ (equation 1)
Where:
TMAX = maximum delay
N = divisor
n = number of flip-flops making-up the timer
fclk = system clock frequency
But what if we are not given N and need to find TCNT1 for a given delay tdelay? In this case we know that tdelay ≤ TMAX, and applying a little algebra we can find an equation for N.
$N \geq t_{delay} \times f_{clk} / 2^{n}$ (equation 2)
Let’s take a second look at our 250 msec delay problem. This time we will not assume a divisor of 64. Applying equation 2 we have:
$N \geq 250\ \text{ms} \times 16\ \text{MHz} / 2^{16} = 4{,}000{,}000 / 65{,}536 \approx 61.04$
From Table 13.5 “Clock Select Bit Description” on page 10, we see that the possible clock divisors are 1, 8, 64, 256, and 1024. From this list we want to select the divisor that is the closest value, yet greater than or equal to N. For our example, not surprisingly the answer is again 64.
Polling Example – Assembly Version
; --------------------------
; ------ Delay 250ms ------
; Called from main program
; Input: none Output: none
; no registers are modified by this subroutine
Delay:
push r15
in r15, SREG
push r16
wait:
sbis TIFR1, TOV1
rjmp wait
sbi TIFR1, TOV1 // clear flag bit by writing a one (1)
ldi r16,0x0B // load value high byte 0x0B
sts TCNT1H,r16
ldi r16,0xDC // load value low byte 0xDC
sts TCNT1L,r16
pop r16
out SREG, r15
pop r15
ret
Polling Example – C Version
// --------------------------
// ------ Delay 250ms ------
// Called from main program
// Input: none Output: none
void T1Delay()
{
while (!(TIFR1 & (1<<TOV1))); // eq. to Ex: 9-42 expression
TIFR1 = 1<<TOV1; // clear timer overflow flag
TCNT1H = 0x0B;
TCNT1L = 0xDC;
}
More Looping Examples
Here are six (6) other ways of implementing the looping part of the Polling Example written in assembly. See if you can come up with a few more.
wait:
sbis TIFR1, TOV1 // targets a specific bit |
wait:
in r16, TIFR1 |
wait:
in r16, TIFR1 |
|
wait:
in r16, TIFR1 |
wait:
in r16, TIFR1 |
wait:
in r16, TIFR1 |
wait:
in r16, TIFR1 |
Interrupts and 16-bit Timer/Counter 1: ATmega Interrupts
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 10.1, 10.4
Interrupt Basics
- A microcontroller normally executes instructions in an orderly fetch-execute sequence as dictated by a user-written program.
- However, a microcontroller must also be ready to handle unscheduled events that might occur inside or outside the microcontroller.
- The interrupt system onboard a microcontroller allows it to respond to these internally and externally generated events. By definition we do not know when these events will occur.
- When an interrupt event occurs, the microcontroller will normally complete the instruction it is currently executing and then transition program control to an Interrupt Service Routine (ISR), which handles the interrupt.
- Once the ISR is complete, the microcontroller will resume processing where it left off before the interrupt event occurred.
The Main Reasons You Might Use Interrupts
- To detect pin changes (eg. rotary encoders, button presses)
- Watchdog timer (eg. if nothing happens after 8 seconds, interrupt me)
- Timer interrupts – used for comparing/overflowing timers
- SPI data transfers
- I2C data transfers
- USART data transfers
- ADC conversions (analog to digital)
- EEPROM ready for use
- Flash memory ready
ATmega328P Interrupt Vector Table
- The ATmega328P provides support for 25 different interrupt sources. These interrupts and the separate Reset Vector each have a separate program vector located at the lowest addresses in the Flash program memory space.
- The complete list of vectors is shown in Table 11-6 “Reset and Interrupt Vectors in ATmega328P”. Each Interrupt Vector occupies two instruction words.
- The list also determines the priority levels of the different interrupts. The lower the address the higher is the priority level. RESET has the highest priority, and next is INT0 – the External Interrupt Request 0.
ATmega328P Interrupt Vector Table
Vector No | Program Address | Source | Interrupt Definition | Arduino/C++ ISR() Macro Vector Name | Assembly Name |
---|---|---|---|---|---|
1 | 0x0000 | RESET | Reset | ||
2 | 0x0002 | INT0 | External Interrupt Request 0 (pin D2) | (INT0_vect) | INT0addr |
3 | 0x0004 | INT1 | External Interrupt Request 1 (pin D3) | (INT1_vect) | INT1addr |
4 | 0x0006 | PCINT0 | Pin Change Interrupt Request 0 (pins D8 to D13) | (PCINT0_vect) | PCI0addr |
5 | 0x0008 | PCINT1 | Pin Change Interrupt Request 1 (pins A0 to A5) | (PCINT1_vect) | PCI1addr |
6 | 0x000A | PCINT2 | Pin Change Interrupt Request 2 (pins D0 to D7) | (PCINT2_vect) | PCI2addr |
7 | 0x000C | WDT | Watchdog Time-out Interrupt | (WDT_vect) | WDTaddr |
8 | 0x000E | TIMER2 COMPA | Timer/Counter2 Compare Match A | (TIMER2_COMPA_vect) | OC2Aaddr |
9 | 0x0010 | TIMER2 COMPB | Timer/Counter2 Compare Match B | (TIMER2_COMPB_vect) | OC2Baddr |
10 | 0x0012 | TIMER2 OVF | Timer/Counter2 Overflow | (TIMER2_OVF_vect) | OVF2addr |
11 | 0x0014 | TIMER1 CAPT | Timer/Counter1 Capture Event | (TIMER1_CAPT_vect) | ICP1addr |
12 | 0x0016 | TIMER1 COMPA | Timer/Counter1 Compare Match A | (TIMER1_COMPA_vect) | OC1Aaddr |
13 | 0x0018 | TIMER1 COMPB | Timer/Counter1 Compare Match B | (TIMER1_COMPB_vect) | OC1Baddr |
14 | 0x001A | TIMER1 OVF | Timer/Counter1 Overflow | (TIMER1_OVF_vect) | OVF1addr |
15 | 0x001C | TIMER0 COMPA | Timer/Counter0 Compare Match A | (TIMER0_COMPA_vect) | OC0Aaddr |
16 | 0x001E | TIMER0 COMPB | Timer/Counter0 Compare Match B | (TIMER0_COMPB_vect) | OC0Baddr |
17 | 0x0020 | TIMER0 OVF | Timer/Counter0 Overflow | (TIMER0_OVF_vect) | OVF0addr |
18 | 0x0022 | SPI, STC | SPI Serial Transfer Complete | (SPI_STC_vect) | SPIaddr |
19 | 0x0024 | USART, RX | USART, Rx Complete | (USART_RX_vect) | URXCaddr |
20 | 0x0026 | USART, UDRE | USART, Data Register Empty | (USART_UDRE_vect) | UDREaddr |
21 | 0x0028 | USART, TX | USART, Tx Complete | (USART_TX_vect) | UTXCaddr |
22 | 0x002A | ADC | ADC Conversion Complete | (ADC_vect) | ADCCaddr |
23 | 0x002C | EE READY | EEPROM Ready | (EE_READY_vect) | ERDYaddr |
24 | 0x002E | ANALOG COMP | Analog Comparator | (ANALOG_COMP_vect) | ACIaddr |
25 | 0x0030 | TWI | 2-wire Serial Interface (I2C) | (TWI_vect) | TWIaddr |
26 | 0x0032 | SPM READY | Store Program Memory Ready | (SPM_READY_vect) | SPMRaddr |
ATmega328P Interrupt Processing
- (1) When an interrupt occurs, (2) the microcontroller completes the current instruction and (3) stores the address of the next instruction on the stack
- It also turns off the interrupt system to prevent further interrupts while one is in progress. This is done by (4) clearing the SREG Global Interrupt Enable I-bit.
- The (5) Interrupt flag bit is cleared for Type 1 Interrupts only (see the next page for Type definitions).
- The execution of the ISR is performed by (6) loading the beginning address of the ISR specific for that interrupt into the program counter. The AVR processor starts running the ISR.
- (7) Execution of the ISR continues until the return from interrupt instruction (reti) is encountered. The (8) SREG I-bit is automatically set when the reti instruction is executed (i.e., Interrupts enabled).
- When the AVR exits from an interrupt, it will always (9) return to the interrupted program and (10) execute one more instruction before any pending interrupt is served.
- The Status Register is not automatically stored when entering an interrupt routine, nor restored when returning from an interrupt routine. This must be handled by software.
push reg_F
in reg_F,SREG
:
out SREG,reg_F
pop reg_F
By The Numbers
Type 1
- The user software can write logic one to the I-bit to enable nested interrupts. All enabled interrupts can then interrupt the current interrupt routine.
- The SREG I-bit is automatically set to logic one when a Return from Interrupt instruction – RETI – is executed.
- There are basically two types of interrupts…
- The first type (Type 1) is triggered by an event that sets the Interrupt Flag. For these interrupts, the Program Counter is vectored to the actual Interrupt Vector in order to execute the interrupt handling routine, and hardware clears the corresponding Interrupt Flag.
- If the same interrupt condition occurs while the corresponding interrupt enable bit is cleared, the Interrupt Flag will be set and remembered until the interrupt is enabled, or the flag is cleared by software (interrupt cancelled).
- The Interrupt Flag can also be cleared by writing a logic one to the flag bit position(s) to be cleared.
- If one or more interrupt conditions occur while the Global Interrupt Enable (SREG I) bit is cleared, the corresponding Interrupt Flag(s) will be set and remembered until the Global Interrupt Enable bit is set on return (reti), and will then be executed by order of priority.
Type 2
- The second type (Type 2) of interrupts will trigger as long as the interrupt condition is present. These interrupts do not necessarily have Interrupt Flags. If the interrupt condition disappears before the interrupt is enabled, the interrupt will not be triggered.
When Writing an Interrupt Service Routine (ISR)
- As a general rule get in and out of ISRs as quickly as possible. For example do not include timing loops inside of an ISR.
- If you are writing an Arduino program
- Don’t add delay loops or use function delay()
- Don’t use function Serial.print(val)
- Make variables shared with the main code volatile
- Variables shared with main code may need to be protected by “critical sections” (see below)
- Toggling interrupts off and on is not recommended. The default in the Arduino is for interrupts to be enabled. Don’t disable them for long periods or things like timers won’t work properly.
Program Initialization and the Interrupt Vector Table (IVT)
- Start by jumping over the Interrupt Vector Table
RST_VECT:
rjmp reset
- Add jumps in the IVT to your ISR routines
.ORG INT0addr // 0x0002 External Interrupt 0
jmp INT0_ISR
.ORG OVF1addr
jmp TOVF1_ISR
- Initialize Variables, Configure I/O Registers, and Set Local Interrupt Enable Bits
reset:
lds r16, EICRA // EICRA Memory Mapped Address 0x69
sbr r16, 0b00000010
cbr r16, 0b00000001
sts EICRA, r16 // ISC0=[10] (falling edge)
sbi EIMSK, INT0 // Enable INT0 interrupts
- Enable interrupts at the end of the initialization section of your code.
sei // Global Interrupt Enable
loop:
The Interrupt Service Routine (ISR)
; — Interrupt Service Routine —
INT0_ISR:
push reg_F
in reg_F,SREG
push r16
; Load
; Do Something
; Store
pop r16
out SREG,reg_F
pop reg_F
reti
; ——————————————————-
Predefined Arduino IDE Interrupts
- When you push the reset button the ATmega328P automatically runs an Arduino Boot program located in a separate Boot Flash section at the top of program memory. If compiled within the Arduino IDE, the Boot program loads your compiled program with these interrupts enabled.
17 | 0x0020 | TIMER0 OVF | Timer/Counter0 Overflow | (TIMER0_OVF_vect) |
- The millis() and micros() function calls make use of the Timer/Counter 0 overflow interrupt. The ISR runs roughly 1000 times a second, and increments an internal counter which effectively becomes the millis() counter (see On your own question).
19 | 0x0024 | USART, RX | USART Rx Complete | (USART_RX_vect) |
21 | 0x0028 | USART, TX | USART, Tx Complete | (USART_TX_vect) |
- The hardware serial library uses interrupts to handle incoming and outgoing serial data. Your program can now be doing other things while data in an SRAM buffer is sent or received. You can check the status of the buffer by calling the Serial.available() function.
- On your own. Given that you are using 8-bit Timer/Counter 0, you have set TCCR0B bits CS02:CS01:CS00 = 0b011 (clkI/O/64), and the system clock fclk = 16 MHz, what value would you preload into the Timer/Counter Register TCNT0 to get an interrupt 1000 times a second?
Source: Gammon Software Solutions forum – this blog also covers how to work with all the interrupts in C++ and the Arduino scripting language.
Appendix
Programming the Arduino to Handle External Interrupts
- Variables shared between ISRs and normal functions should be declared “volatile“. This tells the compiler that such variables might change at any time, and thus the compiler should not “optimize” the code by placing a copy of the variable in one of the general purpose processor registers (R31..R0). Specifically, the processor must reload the variable from SRAM whenever it is referenced.
int pin = 13;
volatile int state = LOW;
- Add jumps in the IVT to ISR routine, configure External Interrupt Control Register A (EICRA), and enable local and global Interrupt Flag Bits.
void setup()
{
pinMode(pin, OUTPUT);
attachInterrupt(0, blink, CHANGE);
}
- Write Interrupt Service Routine (ISR)
void blink()
{
state = !state;
}
- To disable interrupts globally (clear the I bit in SREG) call the noInterrupts() function. To once again enable interrupts (set the I bit in SREG) call the interrupts() function.
- Again – Toggling interrupts ON and OFF is not recommended. For a discussion of when you may want to turn interrupts off, read Gammon Software Solutions forum – Why disable Interrupts?
The Real World of External Interrupts
Reading
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 10.3, 10.5
Table of Contents
External Interrupts
- Review ATmega328P Interrupts Lecture Notes page 4 “Interrupt Basics”
- External Interrupts are triggered by the INT0 and INT1 pins or any of the PCINT23..0 pins
- 23 Pin Change Interrupts are mapped to the 23 General Purpose I/O Port Pins:
Port B Group | PCINT7 (PB7) to PCINT0 (PB0) |
Port C Group | PCINT14 (PC6) to PCINT8 (PC0) |
Port D Group | PCINT23 (PD7) to PCINT16 (PD0) |
ATmega328P Interrupt Vector Table
Vector No | Program Address | Source | Interrupt Definition | Arduino/C++ ISR() Macro Vector Name |
---|---|---|---|---|
1 | 0x0000 | RESET | Reset | |
2 | 0x0002 | INT0 | External Interrupt Request 0 (pin D2) | (INT0_vect) |
3 | 0x0004 | INT1 | External Interrupt Request 1 (pin D3) | (INT1_vect) |
4 | 0x0006 | PCINT0 | Pin Change Interrupt Request 0 (pins D8 to D13) | (PCINT0_vect) |
5 | 0x0008 | PCINT1 | Pin Change Interrupt Request 1 (pins A0 to A5) | (PCINT1_vect) |
6 | 0x000A | PCINT2 | Pin Change Interrupt Request 2 (pins D0 to D7) | (PCINT2_vect) |
7 | 0x000C | WDT | Watchdog Time-out Interrupt | (WDT_vect) |
8 | 0x000E | TIMER2 COMPA | Timer/Counter2 Compare Match A | (TIMER2_COMPA_vect) |
9 | 0x0010 | TIMER2 COMPB | Timer/Counter2 Compare Match B | (TIMER2_COMPB_vect) |
10 | 0x0012 | TIMER2 OVF | Timer/Counter2 Overflow | (TIMER2_OVF_vect) |
11 | 0x0014 | TIMER1 CAPT | Timer/Counter1 Capture Event | (TIMER1_CAPT_vect) |
12 | 0x0016 | TIMER1 COMPA | Timer/Counter1 Compare Match A | (TIMER1_COMPA_vect) |
13 | 0x0018 | TIMER1 COMPB | Timer/Counter1 Compare Match B | (TIMER1_COMPB_vect) |
14 | 0x001A | TIMER1 OVF | Timer/Counter1 Overflow | (TIMER1_OVF_vect) |
15 | 0x001C | TIMER0 COMPA | Timer/Counter0 Compare Match A | (TIMER0_COMPA_vect) |
16 | 0x001E | TIMER0 COMPB | Timer/Counter0 Compare Match B | (TIMER0_COMPB_vect) |
17 | 0x0020 | TIMER0 OVF | Timer/Counter0 Overflow | (TIMER0_OVF_vect) |
18 | 0x0022 | SPI, STC | SPI Serial Transfer Complete | (SPI_STC_vect) |
19 | 0x0024 | USART, RX | USART, Rx Complete | (USART_RX_vect) |
20 | 0x0026 | USART, UDRE | USART, Data Register Empty | (USART_UDRE_vect) |
21 | 0x0028 | USART, TX | USART, Tx Complete | (USART_TX_vect) |
22 | 0x002A | ADC | ADC Conversion Complete | (ADC_vect) |
23 | 0x002C | EE READY | EEPROM Ready | (EE_READY_vect) |
24 | 0x002E | ANALOG COMP | Analog Comparator | (ANALOG_COMP_vect) |
25 | 0x0030 | TWI | 2-wire Serial Interface (I2C) | (TWI_vect) |
26 | 0x0032 | SPM READY | Store Program Memory Ready | (SPM_READY_vect) |
ATmega328P External Interrupt Sense Control
- The INT0 and INT1 interrupts can be triggered by a low logic level, a logic change, or a falling or rising edge.
- This is set up as indicated in the specification for the External Interrupt Control Register A – EICRA as defined in Section 12.2.1 EICRA of the Datasheet. The number “n” can be 0 or 1.
ISCn1 | ISCn0 | Arduino mode | Description |
---|---|---|---|
0 | 0 | LOW | The low level of INTn generates an interrupt request |
0 | 1 | CHANGE | Any logical change on INTn generates an interrupt request |
1 | 0 | FALLING | The falling edge of INTn generates an interrupt request |
1 | 1 | RISING | The rising edge of INTn generates an interrupt request |
ATmega328P External Interrupt Enable
- All interrupts are assigned individual enable bits which must be written logic one together with the Global Interrupt Enable bit in the Status Register (SREG) in order to enable the interrupt.
- The ATmega 328P supports two external interrupts which are individually enabled by setting bits INT1 and INT0 in the External Interrupt Mask Register (Section 12.2.2 EIMSK).
- Let’s look at an example. When an edge or logic change on the INT0 pin triggers an interrupt request, INTF0 becomes set (one). If the I-bit in SREG and the INT0 bit in EIMSK are set (one), the MCU will jump to the corresponding Interrupt Vector. The flag is cleared when the interrupt routine is executed.
- Alternatively, the flag can be cleared by writing a logical one to it. The EIFR register is within the I/O address range (0x00 to 0x1F) of the Set Bit in I/O Register (SBI) Instruction. This flag is always cleared when INT0 is configured as a level interrupt.
When Will External Interrupts be Triggered?
When the INT0 or INT1 interrupts are enabled and are configured as low level triggered (Type 2), the interrupts will trigger as long as…
- The pin is held low.
- The low level is held until the completion of the currently executing instruction.
- The level is held long enough for the MCU to completely wake-up (assuming it was asleep).
– Low level interrupts on INT0 and INT1 are detected asynchronously (no clock required). The I/O clock is halted in all sleep modes except idle mode. Therefore low level interrupts can be used for waking the part from all sleep modes.
- Among other applications, low level interrupts may be used to implement a handshake protocol.
When the INT0 or INT1 interrupts are enabled and are configured as edge or logic change (toggle) triggered (Type 1), the interrupts will trigger as long as…
- The I/O clock is present.
– This implies that these interrupts cannot be used for waking up the part from sleep modes other than idle mode.
- The pulse lasts longer than one I/O clock period. Shorter pulses are not guaranteed to generate an interrupt.
PIN Change Interrupts
- In addition to our two (2) external interrupts, twenty-three (23) pins can be programmed to trigger an interrupt if the pin changes state.
- These 23 pins are in turn divided into three (3) interrupt groups (PCI 2:0) corresponding to the three GPIO Ports B, C, and D
- Each group is assigned to one pin change interrupt flag (PCIF) bit (2:0).
- A pin change interrupt flag will be set, if the interrupt is enabled (see How to Enable a Pin Change Interrupt), and any pin assigned to the group changes state (toggles).
How a PIN Change Interrupt Works
Here is how it works…
How to Enable a PIN Change Interrupt
In addition to our two (2) external interrupts, twenty-three (23) pins PCINT 23:16, 14:0 can be programmed to trigger an interrupt if the pin changes state. These 23 pins are divided into three (3) interrupt groups (PCI 2:0) of eight (8), seven (7), and eight (8). Consequently, to enable an individual pin change interrupt, three (3) interrupt mask bits must be set to one (1).
- The SREG global interrupt enable bit I
- The pin change interrupt enable bit (PCIE 2:0) of the group to which the pin is assigned. Specifically, a pin change interrupt PCI2 will trigger if any enabled PCINT23..16 pin toggles. A pin change interrupt PCI1 will trigger if any enabled PCINT14..8 pin toggles. A pin change interrupt PCI0 will trigger if any enabled PCINT7..0 pin toggles.
- The individual pin change interrupt enable mask bit assigned to the pin (PCINT 23:0) is set. These mask bits are located in the three pin change mask registers assigned to each group.
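The group and mask-bit selection described above can be sketched on a host PC in C++ (this is not AVR code). The struct and function names are hypothetical and only illustrate the arithmetic: group = PCINT number / 8, mask bit = PCINT number mod 8. The register names PCICR and PCMSK0..2 in the comments are taken from the ATmega328P datasheet, not from these notes.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration only.
struct PinChangeBits {
    uint8_t group;      // 0, 1, or 2: selects PCIE0/PCIE1/PCIE2 and PCMSK0/1/2
    uint8_t groupMask;  // bit to set in the group enable register (PCICR)
    uint8_t pinMask;    // bit to set in the group's pin change mask register
};

// Map a PCINT number (0..23, PCINT15 does not exist on the ATmega328P)
// to the two local enable bits. The SREG I-bit must still be set separately.
constexpr PinChangeBits pinChangeBits(uint8_t pcint) {
    return PinChangeBits{ static_cast<uint8_t>(pcint / 8),
                          static_cast<uint8_t>(1u << (pcint / 8)),
                          static_cast<uint8_t>(1u << (pcint % 8)) };
}

// PCINT18 belongs to group 2 (Port D) and uses mask bit 2.
static_assert(pinChangeBits(18).group == 2, "PCINT18 is in group 2");
static_assert(pinChangeBits(18).pinMask == 0x04, "PCINT18 uses mask bit 2");
```

For example, pinChangeBits(8) selects group 1 with pin mask 0x01, matching the first pin of the Port C group.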
ATmega328P Interrupt Processing (REVIEW)
Programming the Arduino to Handle External Interrupts
- Stop compiler optimization of variables within an ISR by adding the volatile qualifier. This keeps the current value in SRAM until needed.
const byte pin = 8; // green LED 0
volatile int state = LOW;
- Add jumps in the IVT to ISR routine, configure External Interrupt Control Register A (EICRA), and enable local and global Interrupt Flag Bits.
- Write Interrupt Service Routine (ISR)
void blink()
{
state = !state;
}
To disable interrupts globally (clear the I bit in SREG) call the noInterrupts() function. To once again enable interrupts (set the I bit in SREG) call the interrupts() function.
Programming the Arduino to Handle Interrupts
- In the AVR-GCC environment upon which the Arduino language is built, the interrupt vector table (IVT) is predefined to point to interrupt routines with predetermined names (see “ATmega328P Interrupt Vector Table” on page 6).
- You create an ISR by using the Macro ISR() and these names.
#include <avr/interrupt.h>
ISR(ADC_vect)
{
// user code here
}
- Now that you have defined the ISR you need to locally and globally enable it. Here are the relevant links for learning how to complete your ISR definition.
- Global manipulation of the interrupt flag
- Gammon Software Solutions forum – Interrupts
- ISR() macro
Practice Problems
Design Example – Switch Debounce
- When you press a button, its contacts will open and close many times before they finally stay in position. This is known as contact bounce.
- Depending on the switch construction, this mechanical contact bounce can last up to 10 or 20 milliseconds. This isn’t a problem for lamps, doorbells and audio circuits, but it will play havoc with our edge-triggered interrupt circuitry.
- With respect to the waveform above, a switch debounce solution must be designed to filter out these transitions.
Switch Debounce Solutions
So how can we design a “Debounce Circuit” to filter out these transitions?
- The lowest-cost solution requires no hardware. Specifically, we disable the external interrupt during the switch bounce time. This solution has been implemented for the Arduino by Nick Gammon with Arduino code provided here in the “Example code of a pump timer” section.
- For some simple electrical solutions visit http://www.patchn.com/Debounce.htm.
- For our solution, I added a D flip-flop which is clocked at a frequency less than 50 Hz (Tp = 20 milliseconds). This digital circuit acts as a low pass filter blocking the AVR interrupt circuitry from responding to any of these additional edges.
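The filtering effect of the slowly clocked D flip-flop can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions, not a description of the actual circuit: the input is assumed to be sampled once per millisecond, the flip-flop is assumed to be clocked every 20 samples, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Count level changes (edges) in a sequence of samples.
inline int countEdges(const std::vector<bool>& samples) {
    int edges = 0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        if (samples[i] != samples[i - 1]) ++edges;
    }
    return edges;
}

// Idealized D flip-flop: Q copies D only on every clockPeriod-th sample
// and holds its value in between.
inline std::vector<bool> dFlipFlop(const std::vector<bool>& d,
                                   std::size_t clockPeriod, bool initialQ) {
    std::vector<bool> q;
    q.reserve(d.size());
    bool state = initialQ;
    for (std::size_t i = 0; i < d.size(); ++i) {
        if (i % clockPeriod == 0) state = d[i];  // clock edge: latch the input
        q.push_back(state);
    }
    return q;
}
```

If the bounce burst is shorter than one clock period, at most one clock edge can fall inside it, so the output changes level only once per button press no matter how many edges the raw switch signal contains.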
Switch Debounce Circuit – a Simple Digital Low Pass Filter
Appendix
How I Designed the Debounce Circuit
Here is a real world problem that I considered while designing my Debounce circuit.
Logic Levels
Between logic 0 and logic 1 there is an undefined region. The figure below shows TTL input and output voltage levels corresponding to logic 1 and 0 (source: Theory of TTL Logic Family).
Recommended Reading: Logic signal voltage levels
Rise and Fall Times (Slew Rate)
Electrical signals take a finite time to transition through this region, technically known as rise and fall times or slew rate.
The table below provides data for propagation delay and slew rate for each of the families listed. Don’t allow digital logic slew rates to be slower than what is specified by the data sheet. All digital logic families will oscillate with slow rise times.
For some micro-controller inputs rise and fall times can be no more than 20 nsec. If this specification is violated the input may start to oscillate causing havoc within the device and ultimately destroying the input gate structure of the receiving gate.
The input circuits of MOS devices, like our AVR micro-controller, can be characterized as capacitive in nature (can be modeled to the first order by a capacitor). For some inputs this capacitance can be as great as 10 pF (pico = 10^-12). Now, let us assume an external pull-up resistor of 10 kΩ. Given this information we come up with a “back of the envelope” calculated time constant (RC) of 10 kΩ × 10 pF = 100 nsec.
Clearly, we have a problem. I solved this problem by adding a TTL device between the switch and the micro-controller. The input of the 74ALS74 can be characterized as resistive in nature (can be modeled by a resistor). Combined with a pull-up resistance (10 kΩ) the input problem is ameliorated.
The output of the 74ALS74 TTL device goes directly to the input of the AVR micro-controller solving our slew rate problem. This new faster circuit however introduces its own problems as discussed in the next section.
Interrupts and 16-bit Timer/Counter 1: Atmel AVR Timers and Interrupts
Reading
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Section: 10.2
Table of Contents
Interrupt Basics – Review –
ATmega328P Interrupt Vector Table
Vector No | Program Address | Source | Interrupt Definition | Arduino/C++ ISR() Macro Vector Name |
---|---|---|---|---|
1 | 0x0000 | RESET | Reset | |
2 | 0x0002 | INT0 | External Interrupt Request 0 (pin D2) | (INT0_vect) |
3 | 0x0004 | INT1 | External Interrupt Request 1 (pin D3) | (INT1_vect) |
4 | 0x0006 | PCINT0 | Pin Change Interrupt Request 0 (pins D8 to D13) | (PCINT0_vect) |
5 | 0x0008 | PCINT1 | Pin Change Interrupt Request 1 (pins A0 to A5) | (PCINT1_vect) |
6 | 0x000A | PCINT2 | Pin Change Interrupt Request 2 (pins D0 to D7) | (PCINT2_vect) |
7 | 0x000C | WDT | Watchdog Time-out Interrupt | (WDT_vect) |
8 | 0x000E | TIMER2 COMPA | Timer/Counter2 Compare Match A | (TIMER2_COMPA_vect) |
9 | 0x0010 | TIMER2 COMPB | Timer/Counter2 Compare Match B | (TIMER2_COMPB_vect) |
10 | 0x0012 | TIMER2 OVF | Timer/Counter2 Overflow | (TIMER2_OVF_vect) |
11 | 0x0014 | TIMER1 CAPT | Timer/Counter1 Capture Event | (TIMER1_CAPT_vect) |
12 | 0x0016 | TIMER1 COMPA | Timer/Counter1 Compare Match A | (TIMER1_COMPA_vect) |
13 | 0x0018 | TIMER1 COMPB | Timer/Counter1 Compare Match B | (TIMER1_COMPB_vect) |
14 | 0x001A | TIMER1 OVF | Timer/Counter1 Overflow | (TIMER1_OVF_vect) |
15 | 0x001C | TIMER0 COMPA | Timer/Counter0 Compare Match A | (TIMER0_COMPA_vect) |
16 | 0x001E | TIMER0 COMPB | Timer/Counter0 Compare Match B | (TIMER0_COMPB_vect) |
17 | 0x0020 | TIMER0 OVF | Timer/Counter0 Overflow | (TIMER0_OVF_vect) |
18 | 0x0022 | SPI, STC | SPI Serial Transfer Complete | (SPI_STC_vect) |
19 | 0x0024 | USART, RX | USART, Rx Complete | (USART_RX_vect) |
20 | 0x0026 | USART, UDRE | USART, Data Register Empty | (USART_UDRE_vect) |
21 | 0x0028 | USART, TX | USART, Tx Complete | (USART_TX_vect) |
22 | 0x002A | ADC | ADC Conversion Complete | (ADC_vect) |
23 | 0x002C | EE READY | EEPROM Ready | (EE_READY_vect) |
24 | 0x002E | ANALOG COMP | Analog Comparator | (ANALOG_COMP_vect) |
25 | 0x0030 | TWI | 2-wire Serial Interface (I2C) | (TWI_vect) |
26 | 0x0032 | SPM READY | Store Program Memory Ready | (SPM_READY_vect) |
ATmega328P Enabling an Interrupt – Timer/Counter 1
- All interrupts are assigned individual enable bits which must be written logic one together with the Global Interrupt Enable bit in the Status Register (SREG) in order to enable the interrupt.
- For example, to allow the Timer/Counter 1 Overflow flag (TOV1) to generate an interrupt you would set the Timer/Counter 1 Overflow Interrupt Enable (TOIE1) bit.
- When Timer/Counter 1 Overflows (0xFFFF → 0x0000) the TOV1 bit is set to 1.
- With global interrupt I-bit set and Timer/Counter 1’s Overflow Interrupt Enable TOIE1-bit set, when the Overflow TOV1-bit is set an interrupt will be generated and the Program Counter (PC) will be vectored to Flash Program Memory address 0x001A (see IVT Table on previous page). The AVR processor starts running the ISR.
- The TOV1 flag is automatically cleared at the beginning of the interrupt service routine. Alternatively, if you are polling the flag, it can be cleared by writing a logical one to it. The TIFR1 register is within the I/O address range (0x00 to 0x1F) of the Set Bit in I/O Register (SBI) Instruction.
Timer/Counter 1 Normal Mode – Design Example
See Lecture 9 for the design example.
ATmega328P Enabling Timer/Counter 1 Interrupt
// Jump over and Setup the Interrupt Vector Table
RST_VECT:
rjmp reset
// TIMER1 OVF vector = 0x001A, Sect 9.4 Interrupt Vectors in ATmega328P
.ORG OVF1addr
jmp TOVF1_ISR // Section 4.7 Reset and Interrupt Handling
reset:
; Set prescale and start Timer/Counter1
ldi r16,(1<<cs11)|(1<<cs10) //prescale of 64 sect 15.11.2
sts TCCR1B,r16 // Table 15-5 Clock Select Bit Description
ldi r16,0x0B // load value high byte (Sect 15.2-15.3)
sts TCNT1H,r16
ldi r16,0xDC // load value low byte
sts TCNT1L,r16
// Enable Local and Global Interrupts
ldi r16,(1<<toie1) //enable interrupts for timer1 OVF
sts TIMSK1,r16 // TIMSK1 Bit 0 – TOIE1
sei // Global Interrupt Enable
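The preload value 0x0BDC loaded into TCNT1H:TCNT1L can be checked with a little arithmetic. The sketch below is a host-side C++ calculation, not AVR code; it assumes the 16 MHz system clock stated earlier in these notes, the prescale of 64 selected above, and the 250 msec period named in the ISR comments. The function name is hypothetical.

```cpp
#include <cstdint>

// Preload for a 16-bit timer in normal mode: the counter overflows after
// (65536 - preload) ticks, where one tick = prescale / fclk seconds.
// Assumes the requested period fits in 16 bits of ticks at this prescale.
constexpr uint16_t timer1Preload(uint32_t fclkHz, uint32_t prescale,
                                 uint32_t periodMs) {
    return static_cast<uint16_t>(
        65536u - (static_cast<uint64_t>(fclkHz) / prescale) * periodMs / 1000u);
}

// 16 MHz / 64 = 250,000 ticks per second; 250 msec = 62,500 ticks;
// 65536 - 62500 = 3036 = 0x0BDC.
static_assert(timer1Preload(16000000u, 64u, 250u) == 0x0BDC,
              "matches the value loaded into TCNT1H:TCNT1L");
```

The same calculation shows why the ISR must reload TCNT1 on every overflow: without the reload the counter would run the full 65,536 ticks before the next interrupt.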
The Interrupt Service Routine (ISR)
; — Timer/Counter 1 Overflow Interrupt Service Routine —
; Called on Timer/Counter1 overflow TOV1
; TOV1 flag automatically cleared by AVR on interrupt
TOVF1_ISR:
push reg_F
in reg_F,SREG
push r16
; — 250 msec —
ldi r16,0x0B // load value high byte 0x0B
sts TCNT1H,r16
ldi r16,0xDC // load value low byte 0xDC
sts TCNT1L,r16
; — Blink Discrete LED —
ldi r16,0b10000000 // toggle LED
eor spiLEDS, r16
pop r16
out SREG,reg_F
pop reg_F
reti
; ——————————————————-
Addressing Modes Part II: AVR Addressing Indirect
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 6.1, 6.3, 6.4
Table of Contents
ADDRESSING MODES
- When loading and storing data we have several ways to “address” the data.
- The AVR microcontroller supports addressing modes for access to the Program memory (Flash) and Data memory (SRAM, Register file, I/O Memory, and Extended I/O Memory).
OPERAND LOCATIONS AND THE ATMEGA328P MEMORY MODEL
When selecting an addressing mode you should ask yourself where is the operand (data) located within the memory model of the AVR processor and when do I know its address (assembly time or at run time).
IMMEDIATE ADDRESSING MODE – A REVIEW
C++ Code
uint8_t foo; // 8-bit unsigned number, from 0 to 255
foo = 0x23;
Assembly Code
Data is encoded with the instruction. Operand is therefore located in Flash Program Memory. This is why technically our memory model is a Modified Harvard.
ldi r16, 0x23 // where ldi = 1110, Rd = 0b0000
// and constant K = 0b00100011
Notice that only four bits (dddd) are set aside for defining destination register Rd. This limits us to 2^4 = 16 registers. The designers of the AVR processor chose registers 16 to 31 to be these registers (i.e., 16 ≤ Rd ≤ 31).
What is the machine code instruction for our ldi example?
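One way to work this out is to pack the fields by hand. The host-side C++ sketch below assumes the ldi bit layout 1110 KKKK dddd KKKK from the AVR instruction set manual, where dddd is the register number minus 16; the function name is hypothetical.

```cpp
#include <cstdint>

// Pack "ldi Rd, K" into its 16-bit opcode: 1110 KKKK dddd KKKK.
// rd must be in the range 16..31; dddd holds rd - 16.
constexpr uint16_t encodeLdi(unsigned rd, uint8_t k) {
    return static_cast<uint16_t>(0xE000u
                                 | ((k & 0xF0u) << 4)   // high nibble of K
                                 | ((rd - 16u) << 4)    // dddd
                                 | (k & 0x0Fu));        // low nibble of K
}

// ldi r16, 0x23 -> 1110 0010 0000 0011
static_assert(encodeLdi(16, 0x23) == 0xE203, "ldi r16, 0x23");
```

Note how the constant K is split around the register field, which is why the opcode does not simply end in 0x23.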
DIRECT ADDRESSING MODE – A REVIEW
C++ Code
uint8_t foo, A = 0x23; // 8-bit unsigned number, from 0 to 255
foo = A;
Assembly Code
.DSEG
A: .BYTE 1
.CSEG
lds r16, A
THE X-REGISTER, Y-REGISTER, AND Z-REGISTER
The registers R26..R31 have some added functions to their general purpose usage. These registers are 16-bit address pointers for indirect addressing of the data space. The three indirect address registers X, Y, and Z are defined as described here.
In the different addressing modes these address registers have functions as fixed displacement, automatic increment, and automatic decrement (see the instruction set reference for details).
PROGRAM MEMORY INDIRECT
- The indirect addressing mode in all its forms is used when you will not know the location of the data you want until the program is running. For example, in our 7-segment decoder example, we do not know ahead of time which number (0 to F) we want to decode.
lpm Rd, Z
- Instruction Encoding
TWO VIEWPOINTS
- You can look at the indirect addressing mode address as a word address with a byte selector (illustration on the left), or as a byte address (illustration on the right).
- The first viewpoint is correct from a computer engineering perspective (it is how it really works). The second perspective is functionally equivalent and helps us visualize the computation of the indirect address as the sum of the base address plus an index.
- The most significant bit of the ZH:ZL is lost, to make space for the byte address in the least significant bit.
Addressing Mode Operation – Two Viewpoints
PROGRAM MEMORY INDIRECT WITH POST-INCREMENT
lpm r16, Z+
Instruction Encoding
Addressing Mode Operation
PROGRAM MEMORY INDIRECT – EXAMPLE 1
ldi ZH, high(Table<<1) // Initialize Z-pointer (read next page)
ldi ZL, low(Table<<1)
lpm r16, Z // Load constant from Program
; Memory pointed to by Z (r31:r30)
…
Table:
.DW 0x063F // 0x3F is addressed when ZLSB = 0
// 0x06 is addressed when ZLSB = 1
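The byte selection performed by lpm can be modeled on a host PC. The C++ sketch below treats Flash as an array of 16-bit words and Z as a byte address, following the "word address with a byte selector" viewpoint; the function name and the array are hypothetical stand-ins, not part of any AVR toolchain.

```cpp
#include <cstdint>

// Model of "lpm Rd, Z": the upper 15 bits of Z select the 16-bit word,
// and the least significant bit selects the low (0) or high (1) byte.
inline uint8_t lpmByte(const uint16_t* flash, uint16_t z) {
    const uint16_t word = flash[z >> 1];
    return (z & 1u) ? static_cast<uint8_t>(word >> 8)
                    : static_cast<uint8_t>(word & 0xFFu);
}
```

With a table word 0x063F stored at word address 1, Z = 2 returns 0x3F and Z = 3 returns 0x06, which matches the comments on the .DW line above.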
PRINCETON VERSUS MODIFIED HARVARD MEMORY MODELS
Princeton or Von Neumann Memory Model
Program and data share the same memory space. Processors used in all personal computers, like the Pentium, implement a von Neumann architecture.
Harvard Memory Model
As we have learned in the Harvard Memory Model, program and data memory are separated. The AVR processors among others including the Intel 8051 use this memory model. One advantage of the Harvard architecture for microcontrollers is that program memory can be wider than data memory. This allows the processor to implement more instructions while still working with 8-bit data. For the AVR processor program memory is 16-bits wide while data memory is only 8-bits.
You may have already noticed that when you single step your program in the simulator of AVR Studio the Program Counter is incremented by 1 each time most instructions are executed. No surprise there right? Wrong. The program memory of the AVR processor can also be accessed at the byte level. In most cases this apparent paradox is transparent to the operation of your program with one important exception. That important exception occurs when you want to access data stored in program memory. It is this ability of the AVR processor to access data stored in program memory that makes it a “Modified” Harvard Memory Model.
When you access data from program memory you will be working with byte addresses not words (16-bits). The assembler is not smart enough to know the difference and so when you ask for an address in program memory it returns its word address. To convert this word address into a byte address you need to multiply it by 2. Programmatically we do this by using the shift left syntax of C++ to explicitly tell the assembler to multiply the word address by 2. Remember, when you shift left one place you are effectively multiplying by 2.
With this in mind, we would interpret the following AVR instruction as telling the AVR assembler to convert the word address of label beeHives in program memory to a byte address and then to take the low-order byte of the resulting value and put it into the source operand of the instruction.
ldi ZL,low(beeHives<<1) // load low byte of the byte address of the beeHives look-up table
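The address arithmetic the assembler performs for low(label<<1) and high(label<<1) can be reproduced in a few lines of host-side C++. The word address 0x0123 used below is a made-up example value, and the function names are hypothetical.

```cpp
#include <cstdint>

// Convert a program-memory word address to a byte address (multiply by 2).
// The most significant bit of the word address is lost in the 16-bit result.
constexpr uint16_t byteAddress(uint16_t wordAddress) {
    return static_cast<uint16_t>(wordAddress << 1);
}

// Equivalents of the assembler's low() and high() operators.
constexpr uint8_t lowOf(uint16_t v)  { return static_cast<uint8_t>(v & 0xFFu); }
constexpr uint8_t highOf(uint16_t v) { return static_cast<uint8_t>(v >> 8); }

// A label at word address 0x0123 has byte address 0x0246,
// so ZL would be loaded with 0x46 and ZH with 0x02.
static_assert(byteAddress(0x0123) == 0x0246, "word address times 2");
static_assert(lowOf(byteAddress(0x0123)) == 0x46, "value for ZL");
static_assert(highOf(byteAddress(0x0123)) == 0x02, "value for ZH");
```

Adding an index to the resulting byte address then selects an individual byte within a look-up table.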
PROGRAM MEMORY INDIRECT – EXAMPLE 2
Program Memory Indirect is great for implementing look-up tables located in Flash program memory – including decoders (gray code → binary, hex → seven segment, …)
In this example I build a 7-segment decoder in software.
BCD_to_7SEG:
ldi r16, 0b00001111 // limit to least significant
and r0, r16 // nibble (4 bits)
ldi ZL,low(table<<1) // load address of look-up
ldi ZH,high(table<<1)
clr r1
add ZL, r0
adc ZH, r1
lpm spi7SEG, Z
ret
//__________ gfedcba ___ gfedcba ___ gfedcba
table: .DB 0b00111111, 0b00000110, 0b01011011, …
// ________________0 _________ 1 _________ 2
BIG ENDIAN VERSUS LITTLE ENDIAN – DEFINE BYTE
To help understand the difference between Big and Little Endian let’s take a closer look at how data is stored in Flash Program Memory. We will first look at the Define Byte (.DB) Assembly Directive and then at the Define Word (.DW) Assembly Directive.
Each table entry (.DB) contains one byte. If we look at the first table entry we see 0b00111111 which corresponds to 3f in hexadecimal. Comparing this with the corresponding address and data fields on the left… Wait a minute – where did 06 come from? That is the second entry in the table (0b00000110 = 0x06). The bytes are backwards and here is why.
There are two basic ways information can be saved in memory known as Big Endian and Little Endian. For Big Endian the most significant byte (big end) is saved at the lowest byte address; so 0x3f06 would be saved as bytes 0x3f and 0x06. For Little Endian the least significant byte (little end) is saved at the lowest byte address; so 0x3f06 is saved as bytes 0x06 and 0x3f. As you hopefully have guessed by now the AVR processor is designed to work with data words saved as Little Endian.
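The two byte orders can be made concrete with a short host-side C++ sketch. The function names are hypothetical; each returns the two bytes of a 16-bit word in the order they would appear at increasing byte addresses.

```cpp
#include <array>
#include <cstdint>

// Little Endian: least significant byte at the lowest address.
inline std::array<uint8_t, 2> toLittleEndian(uint16_t w) {
    return {{ static_cast<uint8_t>(w & 0xFFu), static_cast<uint8_t>(w >> 8) }};
}

// Big Endian: most significant byte at the lowest address.
inline std::array<uint8_t, 2> toBigEndian(uint16_t w) {
    return {{ static_cast<uint8_t>(w >> 8), static_cast<uint8_t>(w & 0xFFu) }};
}
```

For the word 0x3f06, toLittleEndian gives the bytes 0x06 then 0x3f, while toBigEndian gives 0x3f then 0x06.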
BIG ENDIAN VERSUS LITTLE ENDIAN – DEFINE WORD
Now let’s take a closer look at how data is saved in program memory using the Define Word (.DW) Assembly Directive. For illustrative purposes we will look at a look-up table named beeHives.
Each table entry (.DW) contains two bytes (one 16-bit word). These two bytes provide the row and column of a room containing bees. For example with respect to the maze, the room in row 00 column 04 contains 1 bee. If we look at the first entry we see it contains 0x0400. Comparing this with the corresponding Program Memory Window in AVR Studio… Wait a minute – that looks backward. From reading about the .DB assembly directive can you discover why?
Working with Bits and Bytes: Logic Instructions and Programs
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Sections: 5.3, 5.4, 5.5
Table of Contents
OVERVIEW
Clearing and Setting a Bit In …
Where | Instruction | Alternative | Notes |
---|---|---|---|
I/O (0 – 31) | cbi, sbi | | Use with I/O Ports |
SREG | cl{i,t,h,s,v,n,z,c}, se{i,t,h,s,v,n,z,c} | bclr, bset | |
Working with General Purpose Register Bits | | | |
Clearing and Setting a Byte | clr, ser | | |
Clearing Bits | and, cbr | andi | |
Testing Bits | and | | Also consider using sbrc, sbrs, sbic, sbis (see Control Transfer Lecture) |
Testing a Bit | bst, brts, brtc | | |
Testing a Byte | tst, breq, brne | | |
Setting Bits | or, sbr | ori | |
Inserting a Bit Pattern | cbr, sbr | and, or | |
Complementing (Toggling) Bits | eor | | |
Rotating Bits | rol, ror | | |
Shifting Bits | lsl, lsr, asr | | |
Swapping Nibbles | swap | | |
SAMPLE APPLICATION – KNIGHT RIDER
KnightRider:
; See page 5 and 6 – Clearing and Setting Bits
clr r16 // start with r16 bit 6 set – LED 6
sbr r16, 0b01000000
; Q1: How could we have done this using 1 instruction?
ldi r17,(1<<SREG_T) // equivalent to 0b01000000
; See page 7 – Clearing and Setting a Bit in the AVR Status Register
clt // initialize T = 0, scan right
; See page 8 – Testing Bits
loop:
ldi r19, 0b10000001
and r19,r16 // test if LED hit is at an edge
breq contScan // continue scan if Z = 1 (LED is not at an edge)
; See page 9 – Toggling Bits
in r19,SREG // toggle T bit; scratch register r19 is used so the LED pattern in r16 is preserved
eor r19, r17
out SREG,r19
; See page 10 – Rotating and Shifting Bits
contScan:
brts scanLeft // rotate right or left
lsr r16
rjmp cont
scanLeft:
lsl r16
cont:
mov spiLEDS, r16
call WriteDisplay
rcall Delay
rjmp loop
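The scanning logic is easy to check on a host PC before running it on the board. The C++ sketch below models only the bit manipulation, with a bool standing in for the SREG T flag, and assumes the scan starts with bit 6 set as the first comment in the routine indicates. The function name is hypothetical, and Delay and WriteDisplay are not modeled.

```cpp
#include <cstdint>
#include <vector>

// Host-side model of the KnightRider loop. Returns the LED pattern
// written to the display on each pass through the loop.
std::vector<uint8_t> knightRiderFrames(int passes) {
    uint8_t led = 0b01000000;   // start with bit 6 set
    bool scanLeft = false;      // models the T flag: false = scan right
    std::vector<uint8_t> frames;
    for (int i = 0; i < passes; ++i) {
        if (led & 0b10000001) {  // LED is at an edge: reverse direction
            scanLeft = !scanLeft;
        }
        led = scanLeft ? static_cast<uint8_t>(led << 1)   // lsl
                       : static_cast<uint8_t>(led >> 1);  // lsr
        frames.push_back(led);
    }
    return frames;
}
```

Stepping through the returned frames shows the single lit bit moving right to bit 0, reversing, moving left to bit 7, and reversing again.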
SAMPLE APPLICATION – BICYCLE LIGHT
A bicycle light has 5 LEDs.
BicycleLight1: A repeating pattern starts with the center LED turned ON. The center LED is then turned OFF, and the LEDs to the left and right of the center LED are turned ON. Each LED continues its scan to the left or right. Once the LEDs reach the end the pattern repeats itself. Using the CSULB shield, write a program to simulate this bicycle light.
BicycleLight2: Same as BicycleLight1 except that when the LEDs reach the edge, they scan back to the center.
BicycleLight1:
clr r7 // turn off 7 segment
begin: ldi r16, 0x04 // scan register r16 = 4
mov r17, r16 // scan register r17 = 4
scan: mov r8, r16 // do not modify r16
cbr r17, 0x20 // r17 bit 5 = 1 at end of cycle
or r8, r17 // combine scan registers
rcall Delay
call WriteDisplay
lsr r16 // scan r16 right
lsl r17 // scan r17 left, r17 = 0 at end of cycle
brne scan // if r17 <> 0 then continue scan
rjmp begin // else start next cycle
BicycleLight2:
ldi r16, 0x08 // 000|0_1000 start just in from edges
ldi r17, 0x02 // 000|0_0010
scan: mov r8, r16
or r8, r17
rcall Delay
call WriteDisplay
lsl r17 // scan r17 left
lsr r16 // scan r16 right
brcc scan
rjmp BicycleLight2
CLEARING BITS
To clear a bit, set the corresponding mask bit to 0.
and source/dest register, mask register
Problem: Convert numeric ASCII value (‘0’ – ‘9’) to its binary coded decimal (BCD) equivalent (0 – 9).
- What we have: ‘0’ to ‘9’ which equals 30₁₆ to 39₁₆
- What we want: 0 to 9 which equals 00₁₆ to 09₁₆
Solution: Mask out high-order nibble
lds r16, ascii_value
ldi r17, 0x0F
and r16, r17 // or simply andi
sts bcd_value, r16
An alternative to the and instruction is the Clear Bits in Register cbr instruction.
cbr source/dest register, mask bits
The cbr instruction clears the specified bits in the source/Destination Register (Rd). It performs the logical AND between the contents of register Rd and the complement of the constant mask (K). The result will be placed in register Rd.
Rd ← Rd ∙ (0xFF – K)
Here is how the previous problem would be solved using the cbr instruction.
lds r16, ascii_value
cbr r16, 0xF0
sts bcd_value, r16
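The equivalence between the and/andi form and the cbr form can be verified on a PC. The following is a minimal Python sketch, a host-side model rather than AVR code. The function names and_mask and cbr are made up for this illustration, and registers are modeled as 8-bit integers.

```python
def and_mask(rd, mask):
    """Model of `and Rd, Rr` / `andi Rd, K`: keep only the bits set in mask."""
    return rd & mask & 0xFF


def cbr(rd, k):
    """Model of `cbr Rd, K`: clear the bits set in K, i.e. Rd AND (0xFF - K)."""
    return rd & (0xFF - k) & 0xFF


ascii_value = ord('7')                   # 0x37
bcd_and = and_mask(ascii_value, 0x0F)    # mask out the high-order nibble
bcd_cbr = cbr(ascii_value, 0xF0)         # clear the high-order nibble
print(bcd_and, bcd_cbr)
```

Note that the two masks are complements of each other: the and mask names the bits to keep, while the cbr mask names the bits to clear.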
SETTING BITS
To set a bit, set the corresponding mask bit to 1.
or source/dest register, mask register
Example: Set to one (1) bits 4 and 2 in some port.
in r16, some_port
ldi r17, 0b00010100
or r16, r17 // or simply ori
out some_port, r16
An alternative to the or instruction is the Set Bits in Register sbr instruction.
sbr source/dest register, mask bits
The sbr instruction sets the specified bits in the source/Destination Register (Rd). It performs the logical OR between the contents of register Rd and the constant mask (K). The result will be placed in register Rd.
Rd ← Rd + K (where + denotes logical OR)
Here is how the previous problem would be solved using the sbr instruction.
in r16, some_port
sbr r16, 0b00010100
out some_port, r16
CLEARING AND SETTING A BIT IN THE AVR STATUS REGISTER
AVR Instructions for Clearing and Setting SREG bits
cl{i,t,h,s,v,n,z,c} or bclr SREG_{I,T,H,S,V,N,Z,C} // defined in m328Pdef.inc
se{i,t,h,s,v,n,z,c} or bset SREG_{I,T,H,S,V,N,Z,C} // defined in m328Pdef.inc
Examples:
Disable all Interrupts
cli
Set T bit
set
TESTING BITS
Use the andi instruction to test several bits at once. Note that andi overwrites the register being tested.
andi source/dest register, mask bits
Example 1: Branch if bit 7 or bit 0 is set
// 7654 3210
lds r16, some_bits // 1000 0000 example
andi r16, 0b10000001 // 1000 0001
brbc SREG_Z, bit_set // 1000 0000 (alt. brne)
Example 2: Branch if bit 4 and bit 2 are clear
// 7654 3210
lds r16, some_bits // 1101 1001 example
andi r16, 0b00010100 // 0001 0100
brbs SREG_Z, bits_zero // 0001 0000 (alt. breq)
Consider using one of the “Skip if Bit” instructions if you only need to test one bit.
Review “Control Transfer” lecture material for details.
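The two examples above can be traced by hand or with a short script. The following is a minimal Python sketch, a host-side model rather than AVR code. The function name andi is borrowed from the instruction for readability, and only the Z flag is modeled.

```python
def andi(rd, k):
    """Model of `andi Rd, K`: return (result, Z flag)."""
    result = rd & k & 0xFF
    z = 1 if result == 0 else 0
    return result, z


# Example 1: branch if bit 7 or bit 0 is set (brne, i.e. brbc SREG_Z)
result1, z1 = andi(0b10000000, 0b10000001)
branch_bit_set = (z1 == 0)

# Example 2: branch if bit 4 and bit 2 are both clear (breq, i.e. brbs SREG_Z)
result2, z2 = andi(0b11011001, 0b00010100)
branch_bits_zero = (z2 == 1)

print(f"{result1:08b} Z={z1} branch taken: {branch_bit_set}")
print(f"{result2:08b} Z={z2} branch taken: {branch_bits_zero}")
```

In Example 2 the sample value has bit 4 set, so the result is non-zero and the branch to bits_zero is not taken.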
Use the tst instruction to test if a register is Zero or Minus.
It performs a logical AND between the register and itself, so the register remains unchanged while the Z and N flags are updated.
Example: Branch if bear is in the forest
rcall inForest // returns false(r24 = 0) if bear is not in the forest
tst r24
breq not_in_forest // branch if r24 = 0
TOGGLING BITS
To toggle (complement) a bit set the corresponding mask bit to 1
eor source/dest register, mask register
Example: Toggle bits 5 and 3 of I/O-Port D.
//7654 3210
in r16, PORTD // 1101 1001 example
ldi r17, 0x28 // 0010 1000
eor r16, r17 // 1111 0001
out PORTD, r16
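The bit pattern in the comments above can be confirmed with a few lines of Python. This is a minimal host-side sketch, not AVR code, and the function name eor is borrowed from the instruction for readability. It also shows a useful property of exclusive-or: applying the same mask twice restores the original value.

```python
def eor(rd, rr):
    """Model of `eor Rd, Rr`: a mask bit of 1 toggles, a mask bit of 0 preserves."""
    return (rd ^ rr) & 0xFF


port = 0b11011001            # example value read from the port
mask = 0x28                  # bits 5 and 3
toggled = eor(port, mask)
restored = eor(toggled, mask)
print(f"{port:08b} -> {toggled:08b} -> {restored:08b}")
```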
When toggling an I/O-Port bit, consider writing a one to the corresponding bit of the port's PIN register.
Review “AVR Peripherals” lecture material for details.
Example: Toggle bits 5 and 3 of I/O-Port D.
sbi PIND, PIND5 // equivalent to sbi 0x09, 5
sbi PIND, PIND3
When toggling a byte (8 bits), use the Complement instruction.
Example: Write TurnAround code snippet (i.e., toggle SRAM variable dir)
// 7654 3210
lds r16, dir // 1101 1001 facing East
com r16 // 0010 0110 facing West
cbr r16, 0xFC // 1111 1100 mask: clear unused bits (optional)
sts dir, r16 // 0000 0010
Question: How could you have complemented dir without modifying the other 6 bits?
ROTATING AND SHIFTING BITS
Rotate instructions allow us to rearrange bits without losing information and to sequentially test bits (brcc, brcs). Shift instructions allow us to quickly multiply and/or divide signed and/or unsigned numbers by 2.
Rotate Left through Carry
rol Rd
Shifts all bits in Rd one place to the left. The C Flag is shifted into bit 0 of Rd. Bit 7 is shifted into the C Flag. This operation, combined with LSL, effectively multiplies multi-byte signed and unsigned values by two.
Rotate Right through Carry
ror Rd
Shifts all bits in Rd one place to the right. The C Flag is shifted into bit 7 of Rd. Bit 0 is shifted into the C Flag. This operation, combined with ASR, effectively divides multi-byte signed values by two. Combined with LSR it effectively divides multibyte unsigned values by two. The Carry Flag can be used to round the result.
Logical Shift Left (Arithmetic Shift Left)
lsl Rd
Shifts all bits in Rd one place to the left. Bit 0 is cleared. Bit 7 is loaded into the C Flag of the SREG. This operation effectively multiplies signed and unsigned values by two.
Logical Shift Right
lsr Rd
Shifts all bits in Rd one place to the right. Bit 7 is cleared. Bit 0 is loaded into the C Flag of the SREG. This operation effectively divides an unsigned value by two. The C Flag can be used to round the result.
Arithmetic Shift Right
asr Rd
Shifts all bits in Rd one place to the right. Bit 7 is held constant. Bit 0 is loaded into the C Flag of the SREG. This operation effectively divides a signed value by two without changing its sign. The Carry Flag can be used to round the result.
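The five instructions differ only in what enters the vacated bit and what lands in the C flag. The following is a minimal Python sketch of those rules, a host-side model rather than AVR code. Registers are assumed to hold values 0 to 255, each function returns a (result, C) pair, and the two multi-byte helpers are hypothetical names used to show how lsl/rol and asr/ror pair up.

```python
def lsl(rd):
    """Bit 7 -> C, bit 0 <- 0."""
    return (rd << 1) & 0xFF, (rd >> 7) & 1


def lsr(rd):
    """Bit 0 -> C, bit 7 <- 0."""
    return rd >> 1, rd & 1


def asr(rd):
    """Bit 0 -> C, bit 7 is held constant (sign preserved)."""
    return (rd >> 1) | (rd & 0x80), rd & 1


def rol(rd, c):
    """Bit 7 -> C, bit 0 <- old C."""
    return ((rd << 1) | c) & 0xFF, (rd >> 7) & 1


def ror(rd, c):
    """Bit 0 -> C, bit 7 <- old C."""
    return (rd >> 1) | (c << 7), rd & 1


def shift_left_16(high, low):
    """Multiply a 16-bit value by two: lsl low byte, then rol high byte."""
    low, c = lsl(low)
    high, c = rol(high, c)
    return high, low, c


def shift_right_16_signed(high, low):
    """Divide a signed 16-bit value by two: asr high byte, then ror low byte."""
    high, c = asr(high)
    low, c = ror(low, c)
    return high, low, c


high, low, carry = shift_left_16(0x12, 0xC0)      # 0x12C0 is 4800 decimal
print(f"0x{high:02X}{low:02X} C={carry}")
```

The carry returned by the right-shift helpers is the bit that was shifted out, which is what the text means by using the Carry Flag to round the result.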
CLEARING AND SETTING A BIT IN ONE OF THE FIRST 32 I/O REGISTERS
Example: Pulse Clock input of Proto-Shield Debounce D Flip-flop (PORTD5). Assume currently at logic 0.
sbi PORTD, 5
cbi PORTD, 5
SETTING A BIT PATTERN
Use the Clear Bits in Register cbr or functionally equivalent andi instruction in combination with the Set Bits in Register sbr to set a bit pattern in a register.
Problem: Convert a binary coded decimal (BCD) (0 – 9) number to its ASCII equivalent value (‘0’ – ‘9’).
- What we have: 0 to 9 which equals X0₁₆ to X9₁₆
The X indicates that we do not know what is contained in this nibble.
- What we want: ‘0’ to ‘9’ which equals 30₁₆ to 39₁₆
Solution: Set high-order nibble to 3₁₆
lds r16, bcd_value
andi r16, 0x0F // clear most significant nibble
sbr r16, 0x30 // set bits 5 and 4
sts ascii_value, r16
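The clear-then-set sequence works for any starting value of the high-order nibble. The following is a minimal Python sketch of the two steps, a host-side model rather than AVR code. The helper bcd_to_ascii is a hypothetical name, and andi and sbr are modeled as plain 8-bit AND and OR.

```python
def andi(rd, k):
    """Model of `andi Rd, K`."""
    return rd & k & 0xFF


def sbr(rd, k):
    """Model of `sbr Rd, K`."""
    return (rd | k) & 0xFF


def bcd_to_ascii(bcd_value):
    r16 = andi(bcd_value, 0x0F)   # clear most significant nibble
    r16 = sbr(r16, 0x30)          # set bits 5 and 4
    return r16


print(chr(bcd_to_ascii(0x07)), chr(bcd_to_ascii(0xA9)))
```

Skipping the andi step would only work if the high-order nibble were already known to be zero, because sbr can set bits but cannot clear them.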
QUESTIONS
- What instruction is used to divide a signed number by 2?
- What instruction is used to multiply an unsigned number by 2?
- What instruction(s) would be used to convert a word pointer into a byte pointer? A word pointer is a register pair like Z containing the address of a 16-bit (2 byte) data word in an SRAM table. A byte pointer is a register pair like Z containing the address of an 8-bit data byte in a corresponding SRAM table. Assuming there is a one-to-one relationship between each word in the first table and a byte in the second table, and remembering that SRAM is always addressed at the byte level, how would you convert a pointer defined for the word table into a pointer defined for the byte table?
Appendix
APPENDIX A: KNIGHT RIDER OPTIMIZED
.INCLUDE <m328Pdef.inc>
rjmp reset
.INCLUDE "spi_shield.inc"
reset:
call InitShield
// initialize knight rider
ldi r16, 0b10000000 // start with r9 bit 7 set – LED 7
mov spiLEDS, r16
// initialize roulette
ldi r19,0xE0
ldi r20,0x1F
ldi r16,0x01
mov spi7SEG,r16
loop:
// knight rider routine
ldi r16, 0b10000001
and r16, spiLEDS // test if LED hit is at an edge
breq contScan // continue scan if Z = 1 (LED is not at an edge)
bst spiLEDS, 0 // if right LED ON, then T = 1
contScan:
brts scanLeft // rotate right or left
lsr spiLEDS
rjmp cont
scanLeft:
lsl spiLEDS
cont:
// roulette routine
add spi7SEG, r19
and spi7SEG, r20
rol spi7SEG
rcall WriteDisplay
rcall Delay
// display routine
rcall WriteDisplay
rcall Delay
rjmp loop
APPENDIX B: KNIGHT RIDER ADDRESSING INDIRECT
begin:
ldi r16, 14 // loop 14 times
ldi ZH, high(Table<<1) // set base address
ldi ZL, low(Table<<1)
scan:
lpm r9, Z+ // load constant to LED display register
rcall WriteDisplay // display routine
rcall Delay
dec r16
brne scan // if r16 <> 0 then continue scan
rjmp begin // else start next cycle
Table: .DB 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02
.DB 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40
Knight Rider Dual Scan
ldi r18, 0x01
clr r19
ldi r20, 0b10000001
loop:
lsl r18
lsr r19
mov r8, r18
or r8, r19
rcall Delay
rcall WriteDisplay
and r8, r20
breq loop
push r18
push r19
pop r18
pop r19
rjmp loop
BICYCLE LIGHT SOLUTION BY ARTHUR KU FALL 2017
The idea behind this one is that the different numerical states of our LEDs have a difference from their neighbors by +6,+7,-7,-6 repeating, and so I can use the half-carry to decide when to toggle r16.
BicycleLight4:
ldi r16, 0x0A
mov r8, r16
ldi r16, 0x06
scan4:
inc r16
add r8, r16
rcall WriteDisplay
rcall Delay
brhc scan4
com r16
rjmp scan4
Introduction to AVR Assembly Language Programming II: Stack Operations
“Those who are last now will be first then, and those who are first will be last.” -Matthew 20:16
READING
The AVR Microcontroller and Embedded Systems using Assembly and C
by Muhammad Ali Mazidi, Sarmad Naimi, and Sepehr Naimi
Section: 3.2
AVRBeginners.Net
Jumps, Calls and the Stack
WORKING WITH STACKS
- Stacks
FIFO and LIFO
SP
Initialization
- LIFO Stack Operations (Push and Pop)
Explicit push and pop
Implicit rcall, call, icall, ret, reti
- Working with the Stack (3 questions and answers)
Word Size: 1 byte
Points At: Empty Byte
Direction: Decrements the stack pointer by 2 for implicit (call) and by 1 for explicit (push) stack operations
- The Program Counter byte ordering on the SRAM stack is Big Endian.
STACK OPERATION ON A CALL INSTRUCTION
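The stack rules listed earlier (1-byte word size, SP points at an empty byte, SP decrements on a push, return address stored Big Endian) can be modeled in a few lines. The following is a minimal Python sketch, a host-side model rather than AVR code. It assumes SP has been initialized to RAMEND (0x08FF on the ATmega328P) and a 2-byte return address. The class name Stack and its method names are made up for this illustration, and the return address 0x01A4 is the instruction following call spiTx in the display listing of the encoding lecture.

```python
RAMEND = 0x08FF   # last SRAM address on the ATmega328P


class Stack:
    def __init__(self):
        self.sram = {}       # sparse model of SRAM: address -> byte
        self.sp = RAMEND     # SP points at the next empty byte

    def push(self, byte):
        self.sram[self.sp] = byte & 0xFF   # store, then post-decrement
        self.sp -= 1

    def pop(self):
        self.sp += 1                       # pre-increment, then load
        return self.sram[self.sp]

    def call(self, return_address):
        self.push(return_address & 0xFF)   # low byte lands at the higher address
        self.push(return_address >> 8)     # high byte lands at the lower address

    def ret(self):
        high = self.pop()
        low = self.pop()
        return (high << 8) | low


stack = Stack()
stack.call(0x01A4)
for address in sorted(stack.sram):
    print(f"0x{address:04X}: 0x{stack.sram[address]:02X}")
```

Reading SRAM from low to high addresses shows the high byte first, which is what Big Endian ordering means here.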
AMAZING LAB DESIGN EXAMPLE
CALL INSTRUCTION ENCODING
- All control transfer addressing modes modify the program counter.
- The Program Counter byte ordering on the SRAM stack is Big Endian.
RCALL INSTRUCTION ENCODING
RET INSTRUCTION ENCODING
AVR Instruction Set Encoding
Reading
“AVR Instruction Set,” Section 6.4 “General Purpose Register File,” and Section 7.3 “SRAM Data Memory” in document doc0856 “The Program and Data Addressing Modes.”
Instruction Set Mapping
The Instruction Set of our AVR processor can be functionally divided (or classified) into: Data Transfer Instructions, Arithmetic and Logic Instructions, Bit and Bit-Test Instructions, Control Transfer (Branch) Instructions, and MCU Control Instructions.
While this functional division helps you quickly find the instruction you need when you are writing a program, it does not reflect how the designers of the AVR processor mapped an assembly instruction into a 16-bit machine instruction. For this task, a better way to look at the instructions is from the perspective of their addressing mode. We will divide AVR instructions into the following addressing mode types.
Data Addressing Modes
- Direct Register Addressing, Single Register
- Direct Register Addressing, Two of 32 General Purpose Registers Rd and Rr
- Direct Register Addressing, Two of 16 and 8 General Purpose Registers Rd and Rr
- Direct I/O Addressing (including SREG)
- Direct I/O Addressing, First 32 I/O Registers
- Direct SRAM Data Addressing
- Immediate 8-bit Constant
- Immediate 6-bit and 4-bit Constant
- Indirect SRAM Data Addressing with Pre-decrement and Post-increment
- Indirect Program Memory Addressing (Atmel Program Memory Constant Addressing)
Control Transfer
- Direct
- Relative, Unconditional
- Relative, Conditional
- Indirect
MCU Control Instructions
ATmega328P Operand Locations
When selecting an addressing mode, you should ask yourself where the operand (data) is located within the AVR processor.
DATA ADDRESSING MODES
DIRECT REGISTER ADDRESSING, SINGLE REGISTER
DIRECT REGISTER ADDRESSING, TWO OF 32 8-BIT GENERAL PURPOSE REGISTERS RD AND RR
Multiply
DIRECT I/O ADDRESSING (INCLUDING SREG)
DIRECT SRAM DATA ADDRESSING
IMMEDIATE
INDIRECT SRAM DATA WITH DISPLACEMENT
INDIRECT SRAM DATA ADDRESSING WITH PRE-DECREMENT AND POST-INCREMENT
INDIRECT PROGRAM MEMORY ADDRESSING (ATMEL PROGRAM MEMORY CONSTANT ADDRESSING)
CONTROL TRANSFER
DIRECT
All control transfer addressing modes modify the program counter.
INDIRECT
RELATIVE
MCU CONTROL INSTRUCTIONS
PROGRAM DECODING – WHO AM I?
Addr Machine Instruction
Who_Am_I #1:
0204 9a5d ____ ____, ____ // I/O direct
0205 985d ____ ____, ____ // I/O direct
0206 9508 ____
pulse: ← Who Am I #1
0204 9a5d sbi PORTD,dff_clk // Set clock (2 clock cycles)
0205 985d cbi PORTD,dff_clk // Clear clock (2 clock cycles)
0206 9508 ret
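Decoding the first two words by hand means splitting each 16-bit word into its fields. The following is a minimal Python sketch of that step, a host-side model rather than AVR code. It assumes the sbi and cbi bit layouts 1001 1010 AAAA Abbb and 1001 1000 AAAA Abbb from the 8-bit AVR Instruction Set manual, and the function name decode_sbi_cbi is made up for this illustration.

```python
def decode_sbi_cbi(opcode):
    """Split an sbi/cbi machine word into (mnemonic, I/O address, bit number)."""
    top = opcode & 0xFF00
    if top == 0x9A00:
        mnemonic = "sbi"
    elif top == 0x9800:
        mnemonic = "cbi"
    else:
        raise ValueError("not an sbi/cbi opcode")
    io_address = (opcode >> 3) & 0x1F   # AAAAA: one of the first 32 I/O registers
    bit = opcode & 0x07                 # bbb: bit number 0 to 7
    return mnemonic, io_address, bit


for word in (0x9A5D, 0x985D):
    mnemonic, io_address, bit = decode_sbi_cbi(word)
    print(f"{word:04x}  {mnemonic} 0x{io_address:02X}, {bit}")
```

I/O address 0x0B is PORTD on the ATmega328P, so both words operate on PORTD bit 5, which agrees with the decoded listing above.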
Who_Am_I #2:
01f8 934f ____ ____ // Indirect SRAM Data Addressing
01f9 b74f ____ ____, ____ // I/O Direct
01fa 930f ____ ____ // Indirect SRAM Data Addressing
01fb 9180 0103 ____ ____, ____ // Direct SRAM Data Addressing
01fd 9100 0102 ____ ____, ____ // Direct SRAM Data Addressing
01ff 2380 ____ ____, ____ // Direct Register Addressing,
0200 910f ____ ____ // Indirect SRAM Data Addressing
0201 bf4f ____ ____, ____ // I/O Direct
0202 914f ____ ____ // Indirect SRAM Data Addressing
0203 9508 ____
hitWall: ← Who Am I #2
01f8 934f push reg_F // push any flags or registers modified
01f9 b74f in reg_F,SREG
01fa 930f push work0
01fb 9180 0103 lds cppReg,imageD
01fd 9100 0102 lds work0,imageR
01ff 2380 and cppReg,work0
0200 910f pop work0 // pop any flags or registers placed on the stack
0201 bf4f out SREG, reg_F
0202 914f pop reg_F
0203 9508 ret
PROGRAM ENCODING – DISPLAY
display:
:
_________ lds work0, imageR
_________ lds spi7SEG, imageD
_________ or spi7SEG, work0
_________ call spiTx
:
_________ ret
display:
019a 934f push reg_F
019b b74f in reg_F,SREG
019c 930f push work0
019d 9100 0102 lds work0,imageR
019f 9080 0103 lds spi7SEG,imageD
01a1 2a80 or spi7SEG,work0
01a2 940e 0109 call spiTx
01a4 910f pop work0
01a5 bf4f out SREG,reg_F
01a6 914f pop reg_F
01a7 9508 ret
PROGRAM ENCODING – TURN LEFT
; --------------------------
; ------- Turn Left --------
turnLeft:
_________ push reg_F
_________ in reg_F,SREG
:
_________ lds work0, dir // x = work0 bit 1, y = work0 bit 0
_________ bst work0,0 // store y into T
_________ bld work1,1 // load dir.1 from T (dir.1 = y)
_________ com work0 // store /x into T
_________ bst work0,1
_________ bld work1,0 // load dir.0 from T (dir.0 = /x)
_________ sts dir, work1
:
_________ out SREG, reg_F
_________ pop reg_F
_________ ret
turnLeft:
01b9 934f push reg_F
01ba b74f in reg_F,SREG
01bb 930f push work0
01bc 931f push work1
01bd 9100 0100 lds work0, dir // x = work0 bit 1, y = work0 bit 0
01bf fb00 bst work0,0 // store y into T
01c0 f911 bld work1,1 // load dir.1 from T (dir.1 = y)
01c1 9500 com work0 // store /x into T
01c2 fb01 bst work0,1
01c3 f910 bld work1,0 // load dir.0 from T (dir.0 = /x)
01c4 9310 0100 sts dir, work1
01c6 911f pop work1
01c7 910f pop work0
01c8 bf4f out SREG, reg_F
01c9 914f pop reg_F
01ca 9508 ret
PROGRAM ENCODING – IN FOREST AND SPITXWAIT
inForest:
Address Machine Instruction
0131 _____ ldi ZL,low(table<<1) // load address of look-up
:
02e8 _____ lds work0, row // SRAM row address = 0101
02e9
02ea _____ cpi work0, 0xFF
02eb _____ breq yes
02ec _____ clr cppReg // Compare to eor cppReg, cppReg
02ed _____ rjmp endForest
yes:
02ee _____ ser cppReg // compare to ldi cppReg, 0xFF
endForest:
:
02f3 _____ ret
inForest:
02e5 92ff push reg_F // push any flags or registers modified
02e6 b6ff in reg_F,SREG
02e7 930f push work0
02e8 9100 0101 lds work0,row
02ea 3f0f cpi work0,0xFF
02eb f011 breq yes
02ec 2788 clr cppReg // no
02ed c001 rjmp endForest
yes:
02ee ef8f ser cppReg
endForest:
02ef 2799 clr r25 // zero-extended to 16-bits for C++ call
02f0 910f pop work0 // pop any flags or registers placed on the stack
02f1 beff out SREG,reg_F
02f2 90ff pop reg_F
02f3 9508 ret
spiTxWait:
0112 _____ in work0,SPSR
0113 _____ bst work0,SPIF
0114 _____ brtc spiTxWait
0115 _____ ret
spiTxWait:
; Wait for transmission complete
0112 b50d in r16,SPSR
0113 fb07 bst r16,SPIF
0114 f7ee brtc spiTxWait
0115 9508 ret
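The brtc word is a good check on relative addressing, because the offset is a signed 7-bit value measured from the instruction that follows the branch. The following is a minimal Python sketch of the decoding, a host-side model rather than AVR code. It assumes the brbs and brbc bit layouts 1111 00kk kkkk ksss and 1111 01kk kkkk ksss from the 8-bit AVR Instruction Set manual, and the function name decode_branch is made up for this illustration.

```python
def decode_branch(opcode, address):
    """Return (branch_if_set, sreg_bit, k, target) for a brbs/brbc word."""
    if opcode & 0xF800 != 0xF000:
        raise ValueError("not a conditional branch opcode")
    branch_if_set = (opcode & 0x0400) == 0   # bit 10 = 0 for brbs, 1 for brbc
    k = (opcode >> 3) & 0x7F                 # 7-bit offset field
    if k & 0x40:                             # sign-extend two's complement
        k -= 0x80
    sreg_bit = opcode & 0x07                 # 6 = T, 1 = Z, 0 = C
    target = address + 1 + k                 # relative to the next instruction
    return branch_if_set, sreg_bit, k, target


print(decode_branch(0xF7EE, 0x0114))   # brtc spiTxWait
print(decode_branch(0xF011, 0x02EB))   # breq yes
```

For f7ee the offset is negative and the target is 0x0112, the spiTxWait label. For f011 from the inForest listing the offset is positive and the target is 0x02EE, the yes label.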
PROGRAM ENCODING – BCD TO 7-SEGMENT DISPLAY
- Program Memory Indirect is great for implementing look-up tables located in Flash program memory – including decoders (gray code → binary, hex → seven segment, …)
- In this example I build a 7-segment decoder in software.
BCD_to_7SEG:
Address Machine Instruction
0131 _____ ldi ZL,low(table<<1) // load address of look-up
0132 _____ ldi ZH,high(table<<1)
0133 _____ clr r1
0134 _____ add ZL, r16
0135 _____ adc ZH, r1
0136 _____ lpm spi7SEG, Z
0137 _____ ret
0138 _____ table: .DB 0b01111110, 0b0110000, 0b1101101 …
BCD_to_7SEG:
0131 e7e0 ldi ZL,low(table<<1) // load address of look-up
0132 e0f2 ldi ZH,high(table<<1)
0133 2411 clr r1
0134 0fe0 add ZL, r16
0135 1df1 adc ZH, r1
0136 9084 lpm spi7SEG, Z
0137 9508 ret
0138 307e
0139 6d6d table: .DB 0b01111110, 0b0110000, 0b1101101, 0b1101101
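The table<<1 in the ldi operands converts the table's word address in Flash into the byte address that lpm expects in Z. The following is a minimal Python sketch of that address arithmetic and the lookup, a host-side model rather than AVR code. It copies the two table words from the listing above and assumes that the low byte of each Flash word sits at the even byte address. The names flash_words, lpm, and bcd_to_7seg are made up for this illustration.

```python
flash_words = {0x0138: 0x307E, 0x0139: 0x6D6D}   # word address -> 16-bit word


def lpm(z):
    """Model of `lpm Rd, Z`: Z is a byte address; bit 0 selects low or high byte."""
    word = flash_words[z >> 1]
    return (word >> 8) if (z & 1) else (word & 0xFF)


table = 0x0138              # word address of the table label
z_base = table << 1         # byte address loaded into Z
zl = z_base & 0xFF          # low(table<<1)
zh = z_base >> 8            # high(table<<1)


def bcd_to_7seg(digit):
    return lpm(z_base + digit)   # add ZL, r16 / adc ZH, r1


print(f"ZL=0x{zl:02X} ZH=0x{zh:02X}")
print([f"0x{bcd_to_7seg(d):02X}" for d in range(4)])
```

The values of ZL and ZH agree with the immediate fields in the e7e0 and e0f2 machine words of the listing.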
PROGRAM DECODING – SRAM INDIRECT
- Write and encode a program to set to ASCII Space Character (0x20), all the bytes in a 64-byte Buffer.