Advantech PCI-1243 (PCI-1243U) driver for Linux 2.6

By: Frank Rysanek of FCC Prumyslove systemy s.r.o., Czech Republic
e-mail: rysanek AT fccps DOT cz

Latest version

This description:
Driver and library:

Hardware recap

The Advantech PCI-1243 is a 4-axis motor control board for the PCI bus, based on a chip called PCD4541, by Nippon Pulse Motor (NPM). 4 axes = 4 independent channels = 4 motors.
The board is capable of the following, in hardware:

In addition, the PCD4541 chip is capable of the following (this feature is not available on the PCI-1243, probably for the sake of simplicity and low cost): a pattern-encoded "excitation" output, four TTL output pins per axis for a 2-phase motor (unipolar or bipolar) - requires an external power driver/switch.

In comparison to some other controller boards and IC's (namely the PCI-1240 containing a Nova MCX314), these features are missing:

The PCD4541 chip has been used on other Advantech products, namely the PCL-839+. Therefore, most software should be easily portable. Note that the PCI-1243 gets more out of the PCD4541 than the earlier ISA designs: the fourth axis is available, IRQ's finally function the way they should etc.

Along with the PCI-1243, Advantech provided a board manual, containing a basic description of the PCD4541 - but the Advantech manual is certainly not as comprehensive as the chipmaker's PCD4541 chip manual, which can also have some weaker parts. This HTML document should shed some light into some of the remaining dark corners. The information presented here was gathered from additional reading (the PCD4541 chip manual) and by experiments while debugging the Linux driver. Most of the detective work has in fact been carried out in the past, during driver development for the PCL-839+. Porting that to the PCI-1243 was a breeze.

Driver and library

Level of abstraction

The driver and library are rather thin wrappers around the PCI-1243 hardware functionality. To use this software package, the application programmer should know the PCI-1243 registers and semantics. This software package aims to shield the application programmer from the low-level coding details of the PC/Linux platform, but not much more. The functions and macros provided by this package accept the speed/acceleration etc. values in their native hardware types and ranges, related to the underlying hardware registers, as described by the PCI-1243/PCD4541 manuals. I.e., there's no "math behind the scenes" to convert the metric/physical values into the low-level hardware ranges and resolutions. This has to be done by the application (application programmer).

There are only a few areas in this software that provide some non-zero processing. These are usually related to the fact that the PCI-1243 per-axis configuration bits and status flags are mixed and multiplexed in various ways and thus somewhat cumbersome to work with in their native form. Let's have a few notable examples:

The basic PCD-4541 functionality is handled by reasonably mnemonic wrapper functions and macros. The more advanced functionality, such as individual bits in the configuration registers, is only covered at the level of bare chip commands, symbolic bit names and inb/outb wrappers. The application programmer will have to study the board+chip manuals and use these low-level functions and macros.

User space and kernel space

The software package described in this document consists of a driver in the form of a kernel-space module and a user-space library (for static or dynamic linking). The user-space library talks to the driver via a character device (opens an appropriate device node in the file system). The package is ready to work with multiple boards in the system, yet there's a single device node - multiple minor numbers would be more unix-like but perhaps unnecessarily complex to work with (depending on the particular real-world scenario at hand).

The header file is universal, all the function prototypes and macros work the same in kernel space and in user-space. In other words, the application programmer has freedom of choice, whether to compile the control application as a user-space executable binary, or as an insertable kernel-space module.

Obviously there are upsides and downsides to both kernel-space and user-space:

It's possible to strike a reasonable trade-off by splitting the application among kernel-space and user-space: timing-critical parts of the control task can run in a kernel module and the more relaxed parts can run as user-space apps, communicating among themselves from k-space to u-space via syscalls (read, write, ioctl) over device nodes (block or character type).

The header file


The header file contains a reasonable level of comments. If you're looking for a quick start, take a look at it.

The header file starts with mnemonic preprocessor macros (#defines) that substitute constant numeric identifiers of the various commands, registers, axis labels, IRQ source bits etc.
There are just a few core functions, implementing key parts of the functionality: read IO port, write IO port, issue a command, write a register, read a register etc. These in turn are used by a number of function-like wrapper macros, simplifying mnemonic access to the various PCI-1243 registers.

As mentioned above, the set of functions and macros is the same in kernel-space and in user-space. The only exception is the pci1243_opendev() prototype that takes no argument in kernel space and one argument in user space (the device node filename).

The identical function prototypes are implemented differently in the kernel-space module and in the user-space library. Any functionality is really implemented only in the kernel variety, the user-space library consists of mere wrapper functions that pass the arguments to their kernel-space counterparts via ioctl(). The bunch of function-like macros only live in the header file (no implementation in the C files) - hence they work the same way in kernel space and in user space.

The elements of the header file are stacked vaguely like this:

kernel space               user space
core functions       <=-.  core functions
outw,inw,kernel code    `-ioctl()

Except that, due to the inner dependencies of the header file, the contents flow from top to bottom, whereas the stack-style view above builds the layers from bottom to top.

In other words, in the header file, the most arcane functionality is located at the bottom. If you're after powerful macros and functions, try reading the header file from bottom to top :-)

Functions and macros, in order of appearance

For the named constants, take a look into the header file.


int pci1243_opendev(void);            // kernel space
int pci1243_opendev(char* name);  // user space
int pci1243_closedev(void);

int pci1243_write_port(u8 board, u8 port, u8 data, u8 delay);
int pci1243_read_port(u8 board, u8 port, u8* data);
int pci1243_cmd_raw(u8 board, u8 axis, u8 cmd, u8 delay);
int pci1243_cmd(u8 board, u8 axis, u8 cmd, u8 delay);
int pci1243_status0(u8 board, u8 axis, u8* data);
int pci1243_write_reg(u8 board, u8 axis, u8 reg, u32 data, u8 delay);
int pci1243_read_reg(u8 board, u8 axis, u8 reg, u32* data);
int pci1243_write_IO(u8 board, u8 data);
int pci1243_read_IO(u8 board, u8* data);

int pci1243_wait4irq(u8 board, u32* data);

int pci1243_readback_bit(u8 board, u8 axis, u8 bit);
int pci1243_readback_reg(u8 board, u8 axis, u8 reg, u32* data);
int pci1243_readback_port(u8 board, u8 port, u8* data);
int pci1243_bang_reg_bit(u8 board, u8 axis, u8 bit, u8 mode, u8 upload);
int pci1243_bang_port_bits(u8 board, u8 port, u8 bits, u8 mode, u8 upload);

int pci1243_reset_pcd4541(u8 board);

int pci1243_plx_read_reg(u8 board, u8 reg, u32 *data);
int pci1243_plx_write_reg(u8 board, u8 reg, u32 data);
int pci1243_plx_dump_regs(u8 board);  // via printk into dmesg


pci1243_start_stop_raw(board, axis, cmd, ramp_ena, use_fh, stop_irq_ena, use_sta)
pci1243_start_stop(board, axis, cmd)
pci1243_start(board, axis)
pci1243_stop(board, axis)
pci1243_decel_stop(board, axis)

pci1243_reg_slct(board, axis, reg, delay)
pci1243_op_mode(board, axis, mode)
pci1243_out_mode(board, axis, mode)

pci1243_preset_pulse_count(board, axis, data, delay)
pci1243_fl(board, axis, data, delay)
pci1243_fh(board, axis, data, delay)
pci1243_acceleration(board, axis, data, delay)
pci1243_multiplier(board, axis, data, delay)
pci1243_rampdown_point(board, axis, data, delay)
pci1243_idling_pulse(board, axis, data, delay)
pci1243_output_type(board, axis, data, delay)

pci1243_timer_set_data(board, data)


Compared to the PCD4541 manual, it's clear that most of the macros are mere shorthands for controller registers and commands. All of them really return an "int" error code (0 if success). For more information about the argument types and semantics, consult the comments in the header file and examples.

A recap of S-curve maths

The PCD4541 manual is exhaustive but sometimes messy about the maths. Let's have a few things sorted out.

A note on units

In this paper, wherever we speak of speed, we actually mean pulse rate, i.e. frequency (in Hz). Similarly, wherever we speak of distance or travel (abbreviated as "s"), we actually mean a total number of pulses (abbreviated as "p" or "P").

The sketches of ramps are plots of speed vs. time (i.e. not distance vs. time).

When ramping, the chip starts accelerating from an "initial speed" (V0) and accelerates up to the ultimate drive speed (V). The output then ticks at this ultimate speed (V), until the chip is told to decelerate (or decides to do that). Deceleration works vice versa.
During S-curve (quadratic) ramping, the acceleration is also variable. Each S-curve ramp consists of two parabolic (quadratic) areas. I.e., a whole accel/run/decel path contains four distinct parabolic areas. In each parabolic area, acceleration changes in a linear fashion (jerk is constant). I.e., acceleration is a derivative of speed, and jerk is a derivative of acceleration. One S-curve ramp is continuous in the second derivative of distance (acceleration), but the third derivative, called "jerk", is discontinuous right in the middle of the "S", where the speed has an inflexion point.

Let's distinguish between the machine units (register values) and the corresponding magnitudes used in general physics and mathematics.

Let's label the hardware values with capital letters:
M = multiplier
P = total pulse count (for preset pulse count operation)
V0 = initial speed (velocity) or pulse rate / frequency - the FL register
V = ultimate driving speed (velocity) or pulse rate / frequency - the FH register
A = acceleration (also used for deceleration)

Let's label the metric magnitudes (physical, SI, or whatever) with the corresponding lower-case letters - some of them are actual physical categories, some are not:
m = multiplier
s = p = distance traveled
v0 = initial speed
v = ultimate driving speed
a = acceleration
k = jerk

The application programmer will likely calculate with the real-world physical values.
The PCI-1243/PCD4541 manuals provide a set of formulas to transform these into the machine values.

Somewhat formal math and physics

The following picture is a generic driving path plot (pulse rate vs. time) with S-curve acceleration and deceleration. Note the six regions (a through f) identified on the curve. The unlabeled seventh region in the middle is a constant speed area. Each region is governed by its own mathematical function, though you may find useful analogies and shortcuts between a, c, d and f.

GIF: Generic S-curve accel, constant speed area, S-curve decel.

The PCD4541 is only capable of "pure quadratic" S-curves, i.e. the commonly seen intermediate region of linear acceleration is not possible. Combined with a maximum acceleration that characterizes the particular electro-mechanical system being driven, this limits the total acceleration time achievable over an S-curve ramp. It's not possible to strike a tradeoff between the speed of a linear ramp and the smoothness of an S-curve. The quadratic-only S-curve is ultimately smooth but slow - given the upper bound on instantaneous acceleration, the S-curve takes twice the time of a linear ramp with the same acceleration.

The essential set of formulas is best demonstrated on region A and it should really look like this:

GIF: v''=a'=k; v'=a=kt; v=1/2 kt^2; s=1/6 kt^3

The blue area contains the magnitudes as we know them (please substitute P for s). The green area contains the formulas for arriving at them. Please note that the essential coefficient is the "jerk". The red area can be omitted - the PCD4541 is not capable of a "constant acceleration" component in this sense. On the other hand, the yellow area is VERY useful - the "constant speed" component is our initial speed (please substitute v0 for vc). The last component, sc, is an initial offset of our pulse count or travel - very obvious but hard to say how useful (that's really up to the application programmer).

Obviously the acceleration math applies analogically to the deceleration area.

The application programmer will likely work with plots of speed vs. time. Consequently, the starting formula is perhaps the third one: v(t) = 1/2 kt^2 + vc. You know your initial speed, your terminal speed (beware, region A ends halfway between them!) and the time required to sweep from the former to the latter. Get the difference of speeds and divide by two. Divide the time difference by two. Enter that into this key formula and you get the jerk.
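
The recipe above can be sketched in C. This is only an illustration under stated assumptions: the function names scurve_jerk() and scurve_speed() are made up for this example and are not part of the driver package, and all values are physical ones (P/s, seconds), not register values.

```c
#include <assert.h>

/* Jerk (P/s^3) of a quadratic-only S-curve ramp from v0 to v (P/s) that
 * lasts ramp_time seconds. Region A covers half the speed difference in
 * half the time: (v - v0)/2 = 1/2 * k * (ramp_time/2)^2,
 * which gives k = 4 * (v - v0) / ramp_time^2.
 * Hypothetical helper, not a driver function. */
double scurve_jerk(double v0, double v, double ramp_time)
{
    assert(ramp_time > 0.0);
    return 4.0 * (v - v0) / (ramp_time * ramp_time);
}

/* Pulse rate (P/s) at time t along the same ramp; t is measured from the
 * start of the ramp. Outside 0..ramp_time the end values are returned. */
double scurve_speed(double v0, double v, double ramp_time, double t)
{
    double k = scurve_jerk(v0, v, ramp_time);

    if (t <= 0.0)
        return v0;
    if (t >= ramp_time)
        return v;
    if (t <= ramp_time / 2.0)
        return v0 + 0.5 * k * t * t;          /* region A: v = 1/2 kt^2 + v0 */

    double rest = ramp_time - t;              /* time left until the top */
    return v - 0.5 * k * rest * rest;         /* region C: upside-down parabola */
}
```

For a ramp from 1000 P/s to 5000 P/s in 0.5 s, the jerk comes out as 64000 P/s^3 and the speed halfway through the ramp (t = 0.25 s) is 3000 P/s, i.e. exactly halfway between the two speeds.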

Let's have another graph. It'll demonstrate a simple trick that may simplify the math for the four parabolic regions.

GIF: unity S-curve accel, unity S-curve decel - no const accel/decel, no const speed

The graph is a "unity curve" - regions A, C, D and F each have a duration of 1 s and a delta(v) of 1 P/s. There are no "constant accel/decel" regions (B and E) and the initial speed is zero. Hence, this parabolic bell curve has only four regions. As described in the manual, using some simple math you'll find that

k = 2 P/s^3
a(1) = 2 P/s^2 = -a(3)
s(0,1) = 1/3 P
s(1,2) = (2 - 1/3) P
s(0,4) = 1/3 + 1 + 2/3 + 2/3 + 1 + 1/3 = 4 P

This gives some hints for the generic S-curve ramps involving nonzero constant acceleration regions.

Let's give up looking for shortcuts for the moment - let's go through all the math required for region "C", i.e. the upper end of the acceleration S-curve. Here's a simple sketch that will say more:

GIF: Generic S-curve accel, constant speed area, S-curve decel. And more...

The red equation in upper left corner describes the upside-down parabola. Our region "C" is a part of it.

The last equation in the following set is a generic formula for the red square under that region (number of pulses between t2 and t3). The steps above it describe how it was arrived at. This is to say that only the last line matters - no need to go through all that symbolic math in software :-)

GIF: a generic formula for pulses across region C

Try applying the values from the unity curve to verify that they fit the formula.

Another hint: try substituting a zero for t2 and (t3 - t2) for t3. Much simpler, isn't it :-)
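
With that substitution, the travel across a single parabolic region reduces to two short expressions. The following is a minimal sketch, derived here by integrating v = v_start + 1/2 kt^2 (region A) and v = v_end - 1/2 k(T - t)^2 (region C) over a region of duration T. The function names are hypothetical and the formulas are not copied from the picture above.

```c
#include <assert.h>

/* Pulses emitted across region A (lower half of the "S"):
 * integral of v_start + 1/2 * k * t^2 over 0..duration
 * = v_start * duration + k * duration^3 / 6. */
double pulses_region_a(double v_start, double k, double duration)
{
    assert(duration >= 0.0);
    return v_start * duration + k * duration * duration * duration / 6.0;
}

/* Pulses emitted across region C (upper half of the "S"), which ends at
 * speed v_end: integral of v_end - 1/2 * k * (duration - t)^2 over
 * 0..duration = v_end * duration - k * duration^3 / 6. */
double pulses_region_c(double v_end, double k, double duration)
{
    assert(duration >= 0.0);
    return v_end * duration - k * duration * duration * duration / 6.0;
}
```

Feeding in the unity curve (k = 2 P/s^3, each region 1 s long, top speed 2 P/s) gives 1/3 P for region A and 2 - 1/3 P for region C, matching the values listed for s(0,1) and s(1,2).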

Back to the PCD-4541 and its quadratic-only ramps. A single ramp consists of two parabolic areas, each with a different curve equation. There are two ways to derive the number of pulses (travel) within a ramp. Firstly, there's the symbolic method - let's combine the two curve equations for region A and C derived above, substituting a single delta t for t3 and t2 as suggested above:

GIF: s(A+C) = (v0 + v) * delta_t/2

Secondly, there's the graphical method that also makes use of the "unity curve" tricks:

GIF: a graphical explanation of travel across a quadratic-only ramp

And some formulas that can be identified in the sketch:

GIF: s(A+C) = (v0 + v) * delta_t/2

Thirdly, the PCD4541 manual suggests the following formula. It looks different but is equivalent to the ones above: substitute the S-curve ramp time delta_t = 2 * (v - v0) / a(max) into s = (v0 + v) * delta_t/2 and you get:

GIF: s(A+C) = (v^2 - v0^2)/a(max)
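
A quick numeric cross-check of the two travel formulas may help here. This is a sketch with made-up helper names, tied together by the S-curve ramp time relation t = 2 * (v - v0) / a(max) from the conversion chapter; all values are physical (P/s, P/s^2, s).

```c
#include <assert.h>

/* Travel (pulses) across a whole quadratic-only ramp, from its duration:
 * s = (v0 + v) * delta_t / 2. */
double ramp_pulses_from_time(double v0, double v, double delta_t)
{
    return (v0 + v) * delta_t / 2.0;
}

/* The same travel from the peak acceleration, as the chip manual puts it:
 * s = (v^2 - v0^2) / a_max. */
double ramp_pulses_from_accel(double v0, double v, double a_max)
{
    assert(a_max > 0.0);
    return (v * v - v0 * v0) / a_max;
}

/* Duration of a quadratic-only ramp: twice that of a linear ramp with the
 * same (peak) acceleration. */
double scurve_ramp_time(double v0, double v, double a_max)
{
    assert(a_max > 0.0);
    return 2.0 * (v - v0) / a_max;
}
```

For v0 = 1000 P/s, v = 5000 P/s and a(max) = 16000 P/s^2, the ramp takes 0.5 s and both formulas give 1500 pulses.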

Conversion to machine units

The PCD4541 manual describes the math in limited detail on pages 22 through 24 (pages 34 through 36 of the PDF). The calculations are done straight in the machine units or, at the very best, there are conversion formulas given in a form where

real-world physics value = formula involving the machine value

which is perhaps not much use for the application programmer who needs to calculate the machine values from physical values in the first place, to set up the controller. At the end of this chapter, the conversion formulas are listed in the inverted form.

The "velocity" here is a pulse rate (pulses per time), rather than a distance per time. Let's label the unit of one pulse with a capital "P". Let's use a slash "/" instead of the word "per". Thus, the unit of pulse rate (velocity) becomes 1 P/s (=1 Hz). The unit of acceleration will be 1 P/s^2 (Hz/s). The unit of jerk will be 1 P/s^3 (Hz/s^2).

First, let's have the multiplier sorted out.
The nominal maximum output pulse rate of one PCD4541 axis is 200 kP/s. The external crystal ticks at 4.9152 MHz. Now the speed values are denominated in P/s, but the hardware registers are 16bit and only about 13 bits are effectively valid - the actual resolution of the speed values is 0 to 8000. So how do we reach those 200 kP/s?
Obviously there's some sort of a pre-scaler in the game. The coefficient here is called the "multiplier" (M), even though it's actually a divisor, inversely proportional to the theoretical multiplier "m". A value of 600 in the "MULTIPLIER" register actually means that indeed the FH and FL range of 0 - 8000 corresponds to 0 - 8000 P/s on the output. I.e., the physical "multiplier" is 1. The M register is inversely proportional to the multiplier, according to this formula:

      4915200          [Hz]
M = -----------
      8192 * m         [Hz?] [no unit]

This means that e.g. M = 300 (m = 2) implies a range of about 16 kPps, and M = 24 (m = 25) implies a range of about 200 kPps.
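
The multiplier formula boils down to M = 600 / m. Here is a minimal sketch in integer arithmetic; the names are hypothetical, and the full-scale figure assumes the 0 - 8000 speed range mentioned above.

```c
#include <assert.h>

#define PCD4541_XTAL_HZ 4915200UL   /* crystal frequency quoted in the text */

/* MULTIPLIER register value for a physical multiplier m:
 * M = 4915200 / (8192 * m) = 600 / m.
 * Integer division truncates, so pick an m that divides 600 evenly if an
 * exact multiplier is needed. */
unsigned int multiplier_reg(unsigned int m)
{
    assert(m >= 1);
    return (unsigned int)(PCD4541_XTAL_HZ / (8192UL * m));
}

/* Approximate full-scale output pulse rate (P/s) for a given m, assuming
 * a usable speed register range of 0 - 8000. */
unsigned long full_scale_rate(unsigned int m)
{
    return 8000UL * m;
}
```

multiplier_reg(1), multiplier_reg(2) and multiplier_reg(25) return 600, 300 and 24, i.e. the register values discussed above.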

The multiplier is clearly the first variable that needs to be sorted out, based on the required real-world values and hardware ranges of speed and acceleration. The speed registers (FL, FH) have a range of 0-8191, the acceleration register is limited to 0-1023.

The speed values are easy:

           v0         [P/s]
FL = V0 = ----
            m         [no unit]

           v          [P/s]
FH = V = -----
           m          [no unit]

It seems that the counting registers (the preset register and the ramp-down point register) are always in units of actual output pulses, no matter what the multiplier is. That would make sense :-)
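
The speed conversion is a plain division, but the result has to fit the register. A possible sketch follows; the function name, the rounding to the nearest integer and the error convention (0 on success, -1 if out of range) are choices made for this example, not something prescribed by the driver.

```c
#include <assert.h>
#include <stddef.h>

#define SPEED_REG_MAX 8191u   /* FL/FH range of 0-8191 quoted in the text */

/* Convert a pulse rate in P/s to an FL/FH register value for a given
 * physical multiplier m: V = v / m, rounded to the nearest integer.
 * Returns 0 and stores the value in *reg on success, -1 if the rate is
 * negative or does not fit the register with this multiplier. */
int speed_reg(double rate_pps, unsigned int m, unsigned int *reg)
{
    assert(m >= 1 && reg != NULL);

    if (rate_pps < 0.0)
        return -1;

    double raw = rate_pps / m + 0.5;          /* +0.5 rounds to nearest */
    if (raw >= SPEED_REG_MAX + 1.0)
        return -1;                            /* needs a bigger multiplier */

    *reg = (unsigned int)raw;
    return 0;
}
```

A rate of 16000 P/s with m = 2 yields a register value of 8000, whereas 20000 P/s with m = 1 is rejected, which is the hint to pick a bigger multiplier first.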

The acceleration register is somewhat peculiar. The hardware value is inversely proportional to the physical acceleration intended:

    m * 4915200       [P/s]
A = -----------
         a            [P/s^2]

Please note that the range of the ACCEL register is 1-1023 (10 bits). Thus, the actual minimum acceleration in "1x mode" (M == 600, m == 1) is about 5 kHz/s. Combined with a maximum pulse rate of 8 kHz, this is barely enough to be noticeable on a 'scope.
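
The acceleration conversion can be sketched the same way. The names are hypothetical; clamping to the 1-1023 range (rather than reporting an error) is an assumption made for this example.

```c
#include <assert.h>

#define ACCEL_XTAL_HZ  4915200.0   /* crystal frequency quoted in the text */
#define ACCEL_REG_MIN  1u
#define ACCEL_REG_MAX  1023u

/* Convert a physical acceleration in P/s^2 to an ACCEL register value for
 * a given multiplier m: A = m * 4915200 / a, rounded to the nearest
 * integer and clamped to 1..1023. A bigger register value means a slower
 * ramp. */
unsigned int accel_reg(double accel_pps2, unsigned int m)
{
    assert(accel_pps2 > 0.0 && m >= 1);

    double raw = m * ACCEL_XTAL_HZ / accel_pps2 + 0.5;
    if (raw < ACCEL_REG_MIN)
        return ACCEL_REG_MIN;
    if (raw > ACCEL_REG_MAX)
        return ACCEL_REG_MAX;
    return (unsigned int)raw;
}

/* The inverse: physical acceleration (P/s^2) implied by a register value. */
double accel_from_reg(unsigned int reg, unsigned int m)
{
    assert(reg >= ACCEL_REG_MIN && reg <= ACCEL_REG_MAX);
    return m * ACCEL_XTAL_HZ / reg;
}
```

accel_from_reg(1023, 1) evaluates to roughly 4805 P/s^2, which is the "about 5 kHz/s" minimum mentioned above.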

With the PCD4541, the physical magnitude called "jerk" is kept quite implicit - it's never mentioned in the manual. The slope of an S-curve ramp is influenced by the "ACCELERATION" register - the value entered is actually derived from the peak instantaneous acceleration (happens in the inflexion point halfway up the ramp), the conversion formula is the one above, the same as for linear ramps.

It can be easily proven that with the same values in FL, FH and ACCEL registers, a quadratic ramp takes twice the time of a linear ramp, and the S-curve's peak acceleration is the same as the linear curve's characteristic constant acceleration (i.e., its average acceleration is one half the value of a linear ramp).

Linear ramp:
    v - v0       [P/s]
t = ------
      a          [P/s^2]
S-curve ramp:
        v - v0       [P/s]
t = 2 * ------
        a(max)       [P/s^2]
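
Combining these ramp time formulas with the register conversions (v = m * FH, v0 = m * FL, a = m * 4915200 / A), the multiplier cancels out and the ramp time can be estimated straight from the register values. This is a sketch derived from the formulas in this chapter, with hypothetical function names; it has not been checked against the chip.

```c
#include <assert.h>

#define RAMP_XTAL_HZ 4915200.0   /* crystal frequency quoted in the text */

/* Duration (s) of a linear ramp, from raw register values:
 * t = (v - v0) / a = (FH - FL) * m / (m * 4915200 / A)
 *   = (FH - FL) * A / 4915200. */
double linear_ramp_time_regs(unsigned int fl, unsigned int fh,
                             unsigned int accel)
{
    assert(fh >= fl);
    return (double)(fh - fl) * accel / RAMP_XTAL_HZ;
}

/* Duration (s) of a quadratic-only S-curve ramp with the same register
 * values: twice the linear ramp time. */
double scurve_ramp_time_regs(unsigned int fl, unsigned int fh,
                             unsigned int accel)
{
    return 2.0 * linear_ramp_time_regs(fl, fh, accel);
}
```

With FL = 0, FH = 4096 and ACCEL = 600, the linear ramp takes 0.5 s and the S-curve ramp takes 1 s, whatever the multiplier is.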

Interestingly, during S-curve ramping, the minimum acceleration rate of 5 kHz/s only applies to the value set in the ACCEL register - it does not influence the instantaneous acceleration rate of the actual S-curve. In other words, the S-curve is not clipped to fit the range of the ACCEL register, and the S-curve ramp does take twice as long even with the ACCEL register set to 1023.

The implicit value of "jerk" can be extracted in several ways, e.g.:

GIF: k = 2 * a(max) / delta_t; k = a(max)^2 / delta_v
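
Both expressions for the jerk can be checked against each other in a few lines. The helper names are made up for this sketch; delta_t is the duration of the whole S-curve ramp and delta_v the speed difference across it.

```c
#include <assert.h>

/* Jerk (P/s^3) from the peak acceleration and the duration of the whole
 * ramp: the acceleration climbs from 0 to a_max in delta_t / 2,
 * hence k = 2 * a_max / delta_t. */
double jerk_from_ramp_time(double a_max, double delta_t)
{
    assert(delta_t > 0.0);
    return 2.0 * a_max / delta_t;
}

/* Jerk (P/s^3) from the peak acceleration and the speed difference
 * across the whole ramp: k = a_max^2 / delta_v. */
double jerk_from_speed_step(double a_max, double delta_v)
{
    assert(delta_v > 0.0);
    return a_max * a_max / delta_v;
}
```

For a(max) = 16000 P/s^2, delta_v = 4000 P/s and delta_t = 0.5 s, both functions return 64000 P/s^3. The unity curve (a(max) = 2, delta_v = 2, delta_t = 2) gives k = 2 either way.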

IRQ handling


The IRQ's are almost omitted in the Advantech PCI-1243 manuals, and barely described enough in the PCD4541 manual. There's a sweet secret or two, that have simply fallen below the radar screen, as far as the manuals are concerned. Yet the IRQ's do work and it's appropriate to use them if at all possible. It's not that complicated. The driver should very much simplify IRQ handling. As far as IRQ's are concerned, the PCI-1243 is certainly a major improvement, compared to the ISA-based PCL-839+, where the IRQ's were quite flawed... The fundamentally level-triggered PCI IRQ's are a key factor in that improvement, and Advantech has managed not to spoil the party in terms of their CPLD-based IRQ processing on the PCI-1243 :-)

General description

First of all, obviously it is perfectly possible to use the board without IRQ's. I.e., some application software would simply write a new speed value and maybe re-issue a start command every once in a while, based on an OS-based timer. In that case, the application programmer doesn't have to care about the peculiarities of IRQ handling on the board - just keep all the IRQ sources masked and you're fine.

On the PCI-1243 you can observe these IC's:

  1. the motor controller - one PCD4541
  2. a Lattice CPLD (universal gate array), providing some additional candy - see ports at IO_base+0x10 through IO_base+0x24 in the PCI-1243 manual
  3. a universal PCI slave bridge: PCI9030 by PLX.

The motor controller IC is an almost native ISA IC, and its access to the host system's PCI is mediated by the PLX slave bridge. IRQ handling for the PCD4541 is further interfered with by the CPLD - fortunately in a very sensible and appropriate way. The CPLD further adds some Advantech board specific functionality.

Both the motor controller and the CPLD are involved in IRQ handling. The PLX slave bridge merely forwards the IRQ to the PCI bus with no processing. The IRQ line passthru is enabled by default, no need to mess with its IRQ handling registers.

The IRQ's generated by the motor controller are level-triggered right from the start. As long as the interrupt source is active, the IRQ line sticks to the active level (low). The CPLD merely maintains the level-triggered nature of the IRQ line, and OR's the level-triggered output of the PCD4541 with other Advantech-specific IRQ sources, to produce one PCI IRQ line, still level-triggered. The CPLD contains board-global interrupt masking and status registers. The status register needs to be cleared (=ACKed) by the ISR for proper operation, but only after all the individual sources are believed to be handled (=muffled).

The driver deploys a generic ISR that aims to take a reasonable action in response to any IRQ that may occur - specifically, it properly ACKs (=masks) both ramp-down and "stop" interrupts. All the application programmer has to do is use the pci1243_wait4irq() function. See also the IRQ handler in pci-1243.c.

IRQ decoding, IRQ enabling and masking

In addition to the board-global IRQ status register, the motor controller chip provides three IRQ status bits per axis, all of them in one register (S0), mixed with other status bits that are not IRQ related. The OR'ed product of these sources per axis shows up as a single bit in another register (S2). Clearly, the IRQ status bits are interspersed with other useful data in the controller chip's internal registers.

There's more - on the PCD4541, it has been observed that the per-source IRQ status bits (as read from S0) don't follow the manual! The PCD4541 chip-level manual and the Advantech manuals say that S0.0 = stop int, S0.1 = ramp-down int, S0.2 = ext.start int. In reality, a rampdown int yields 0x05 in the last three bits, whereas a stop int yields 0x01 or 0x06. I.e., the three bits seem to return some sort of an enumerated result code, rather than a bit per IRQ source as the manuals would put it. Yet the IRQ masking bits do work and must be toggled as documented.
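
The observed codes can be decoded with a small lookup. This is a minimal sketch based solely on the values reported above. It assumes that "the last three bits" are bits 0 to 2 of S0; the enum and function names are hypothetical and do not come from the driver's header file.

```c
#include <assert.h>

/* IRQ cause as decoded from the observed S0 codes (not from the manuals). */
enum s0_irq_cause {
    S0_IRQ_UNKNOWN = 0,
    S0_IRQ_STOP,
    S0_IRQ_RAMPDOWN
};

/* Decode the low three bits of S0 as an enumerated result code:
 * 0x05 = ramp-down interrupt, 0x01 or 0x06 = stop interrupt.
 * Any other code is reported as unknown rather than guessed. */
enum s0_irq_cause decode_s0_irq(unsigned char s0)
{
    switch (s0 & 0x07) {
    case 0x05:
        return S0_IRQ_RAMPDOWN;
    case 0x01:
    case 0x06:
        return S0_IRQ_STOP;
    default:
        return S0_IRQ_UNKNOWN;
    }
}

/* Human-readable name of a cause, e.g. for log messages. */
const char *s0_irq_cause_name(enum s0_irq_cause cause)
{
    switch (cause) {
    case S0_IRQ_STOP:     return "stop";
    case S0_IRQ_RAMPDOWN: return "ramp-down";
    case S0_IRQ_UNKNOWN:  return "unknown";
    }
    assert(!"invalid s0_irq_cause value");
    return "invalid";
}
```

The S0 byte itself would typically come from pci1243_status0(); the upper bits are ignored by the decoder, so 0xF5 decodes to a ramp-down interrupt just like 0x05.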

Which brings about the very interesting subject of IRQ masking and ACKing on this chip.

While the interrupt generating condition is true, its respective interrupt source is active, and so are the respective status flags - there's no automagic "ack upon read". The interrupt source becomes inactive only when the trigger condition ceases to be true.
Hence, to prevent repeated ISR invocation, you have to mask the respective interrupt source within the ISR, and perhaps leave it up to the rest of the software to talk to the controller IC and re-enable the interrupt source only when appropriate.
The board-global IRQ enable bits in the CPLD need not be fiddled with at runtime - they are only toggled once at program startup, to enable interrupt generation globally for the whole board. The repetitive ACKing/masking is done using the per-axis/per-source masking bits, and also using the board-global IRQ status register, if you want to use its information for some ISR IO access optimization (which the driver's ISR does).

There's another peculiarity: the INT enable/disable bits are not a part of some regular register - they are a part of two important executive commands. Hence, you can't set/clear the "stop interrupt enable" bit without re-issuing a start/stop command, and you can't toggle the "ramp-down interrupt enable" bit without issuing another "register select" command.

Fortunately this means that the masking bits are tightly coupled to the commands that they'll likely be a part of - so this arrangement does make sense, even though it appears to decrease general programming transparency of the chip.
Still, the driver aims to assist in the bit-banging involved - all register writes are cached, so the application programmer doesn't have to keep track of all the bits when sending simple commands to the board.

A note on polarity: the IRQ signals generated by the motor controllers are active-low. Perhaps that's why the PCD4541 on-chip detailed IRQ source flags are also active-low (negative logic). The IRQ enabling bits use positive logic - set the bit to 0 to mask the IRQ source, set it to 1 to enable the IRQ. The board-global IRQ enable&status registers use positive logic.

The driver has a compile-time option in the kernel-space pci-1243.c, allowing you to select whether to disable the "board-global interrupt enable bit" at driver insmod (so that your custom code must enable interrupts explicitly for the board, if desired), or to enable interrupts on the board-global scale, with just the individual bottom-level sources being disabled for a clean start (so that your individual custom application threads / contexts only need to mind their own respective interrupt source). The latter option is selected by default, in the driver distribution tarball - the examples enclosed rely on that setup.

The generic ISR takes care of all the ACKing, both per axis (in the PCD-4541, by masking its interrupt sources right after they have fired), and in the board-global IRQ status register (to keep the status bits current). As an application programmer, all you need to do is to unmask the interrupt sources desired, and call pci1243_wait4irq(). The PCD4541's per-axis interrupt sources need to be unmasked repetitively, because the ISR masks them again as a way of ACKing the interrupt. The timer interrupt needn't be unmasked over and over - once the timer is preset, loaded=latched, and the clock gate is open, it keeps working in a periodic mode, generating a single interrupt whenever it counts down to zero. The timer also auto-reloads to the preset value, so the timer's periodic precision is not impaired by IRQ latency. The IDI interrupt sources are edge-triggered as well, so they fire just once on every edge of the pre-configured polarity.
You can also elect to hack the generic ISR to add your own code right into the handler - in that case you're on your own.

Example programs

The package contains a few example programs. Apart from the obvious simple stepper demos, there are two noteworthy code snippets: