Advantech PCI-1243 (PCI-1243U) driver for Linux 2.6

By: Frank Rysanek of FCC Prumyslove systemy s.r.o., Czech Republic
e-mail: rysanek AT fccps DOT cz

Latest version

This description: http://support.fccps.cz/download/adv/frr/pci-1243/pci-1243.html
Driver and library: http://support.fccps.cz/download/adv/frr/pci-1243/pci-1243-1.2.tgz

Hardware recap

The Advantech PCI-1243 is a 4-axis motor control board for the PCI bus, based on a chip called PCD4541, by Nippon Pulse Motor (NPM). 4 axes = 4 independent channels = 4 motors.
The board is capable of the following, in hardware:

Constant speed operation - also called "low speed" mode
Linear ramping ("trapezoidal", v = at) - also called "high speed" mode
Rudimentary S-curve ramping (parabolic, v = at²) - subcategory of the "high speed" mode
Any of the above can run in
- continuous mode - indefinite travel; ramping down or immediate stop is performed only upon the respective stop command given by software (or external signals).
- or preset mode - preset travel; deceleration and final stop are calculated and performed automatically by the controller chip.
Interrupts - one PCI level-triggered IRQ per board, merging several interrupt sources:
- three IRQ sources per axis: external start (STA input), rampdown started, stopped
- eight isolated digital inputs, each can optionally generate an interrupt, with individual per-IDI-pin masking and status
- onboard timer
There's one master IRQ status register per board (added by Advantech via a CPLD), another board-global CPLD-based status register (1 byte) for the IDI inputs, and four 3-bit IRQ status words, one per axis (a feature of the PCD4541).
8 general-purpose inputs and 8 general-purpose outputs (A feature added by Advantech via an onboard CPLD)
A couple of output options (pulse+dir/up+down)
An onboard timer, generating interrupts, running strictly in periodic mode (software gating and masking is possible) - 16bit resolution, timer clock = 100 kHz.
External STA input, for synchronous start of all axes - single external STA input per board, but software-configurable individually per axis for external or software-based STA triggering.
Single-ended, galvanically isolated output (and input).
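
The timer item in the list above implies a simple conversion between a desired interrupt period and the 16-bit count. Here is a minimal sketch of that arithmetic. The function name is hypothetical, and it assumes the timer fires once every `count` clock ticks; whether the hardware counts N or N+1 ticks has not been checked here.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER_CLOCK_HZ 100000u   /* timer clock stated in the list above */

/* Convert a desired period in microseconds into a 16-bit timer count.
 * Assumption: one interrupt per `count` ticks of the 100 kHz clock.
 * Returns 0 if the period cannot be represented in 16 bits. */
uint16_t timer_count_from_period_us(uint32_t period_us)
{
    assert(period_us > 0u);
    /* one tick lasts 10 us at 100 kHz */
    uint32_t ticks = period_us / (1000000u / TIMER_CLOCK_HZ);
    if (ticks < 1u || ticks > 65535u)
        return 0;
    return (uint16_t)ticks;
}
```

Under that assumption the usable range is 10 us to about 0.655 s. The result would be passed to something like pci1243_timer_set_data(), whose exact argument semantics are described in the header file.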

In addition, the PCD4541 chip is capable of the following (this feature is not available on the PCI-1243, probably for the sake of simplicity and low cost): a pattern-encoded "excitation" output, four TTL output pins per axis for a 2-phase motor (unipolar or bipolar) - requires an external power driver/switch.

In comparison to some other controller boards and IC's (namely the PCI-1240 containing a Nova MCX314), these features are missing:

Interpolations
Fine-grained position feedback - i.e., a quadrature decoder or even a counter input, to read position from an incremental rotary sensor.
Some younger relatives of the PCI-1240 (PCI-1241, PCI-1242, PCI-1247, PCI-1261) seem to feature specific output interface formats other than pulse+dir/up+down, targeted at particular types of motor or servo drives out there.
The S-curve ramping available with the PCI-1243 is rudimentary - the S-curve is quadratic only (lacks an intermediate region of linear acceleration).

The PCD4541 chip has been used on other Advantech products, namely the PCL-839+. Therefore, most software should be easily portable. Note that the PCI-1243 gets more out of the PCD4541 than the earlier ISA designs: the fourth axis is available, IRQ's finally function the way they should etc.

Along with the PCI-1243, Advantech provided a board manual, containing a basic description of the PCD4541 - but the Advantech manual is certainly not as comprehensive as the chipmaker's PCD4541 chip manual, which also has some weaker parts. This HTML document should shed some light on some of the remaining dark corners. The information presented here was gathered from additional reading (the PCD4541 chip manual) and by experiments while debugging the Linux driver. Most of the detective work has in fact been carried out in the past, during driver development for the PCL-839+. Porting that to the PCI-1243 was a breeze.

Driver and library

Level of abstraction

The driver and library are rather thin wrappers around the PCI-1243 hardware functionality. To use this software package, the application programmer should know the PCI-1243 registers and semantics. This software package aims to shield away the application programmer from the low-level coding details of the PC/Linux platform, but not much more. The functions and macros provided by this package accept the speed/acceleration etc. values in their native hardware types and ranges, related to the underlying hardware registers, as described by the PCI-1243/PCD4541 manuals. I.e., there's no "math behind the scenes" as to convert the metric/physical values into the low-level hardware ranges and resolutions. This has to be done by the application (application programmer).

There are only a few areas in this software that provide some non-zero processing. These are usually related to the fact that the PCI-1243 per-axis configuration bits and status flags are mixed and multiplexed in various ways and thus somewhat cumbersome to work with in their native form. Let's have a few notable examples:

The package provides an optional "cache" (stateful buffer) for any data that is written to write-only configuration registers, that can't be read back from the hardware (some even can, but it may be slow). Using the provided manipulator methods, it's possible to read back the last written value, toggle individual bits in the commands and registers (an easy way to sync the state between different control processes), mask an IRQ without affecting other bits in the respective hardware register etc.
There is a generic IRQ handler, and a function called wait4irq(), that allows processes to register to that handler with a mask specifying interrupt sources (per board) that the process is interested in. This mechanism uses a kernel-space list (per board) of processes waiting for the IRQ.
The package provides an option to read interrupt status in one 32bit word for all axes (returned by the wait4irq() function).
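
The caching idea in the first item can be pictured with a minimal sketch. This is not the driver's actual code: the struct, the enum, the function names and the write hook are all hypothetical. It only illustrates how a shadow copy lets software change a single bit of a write-only register and read the last written value back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shadow ("cache") of one write-only 8-bit register. */
struct shadow_reg {
    uint8_t last_written;              /* value most recently composed or sent */
    void (*hw_write)(uint8_t value);   /* stand-in for the real port write */
};

enum bit_mode { BIT_CLEAR, BIT_SET, BIT_TOGGLE };

/* Change the bits selected by `mask` in the cached value and optionally
 * upload the result to the hardware. Returns the new cached value. */
uint8_t shadow_bang_bits(struct shadow_reg *r, uint8_t mask,
                         enum bit_mode mode, int upload)
{
    assert(r != NULL);
    switch (mode) {
    case BIT_CLEAR:  r->last_written &= (uint8_t)~mask; break;
    case BIT_SET:    r->last_written |= mask;           break;
    case BIT_TOGGLE: r->last_written ^= mask;           break;
    }
    if (upload && r->hw_write != NULL)
        r->hw_write(r->last_written);
    return r->last_written;
}

/* Read back what was written last, without touching the hardware. */
uint8_t shadow_readback(const struct shadow_reg *r)
{
    assert(r != NULL);
    return r->last_written;
}
```

The `mode` and `upload` parameters loosely mirror the argument names of pci1243_bang_port_bits() and pci1243_bang_reg_bit(); their real semantics are documented in the header file.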

The basic PCD4541 functionality is handled by reasonably mnemonic wrapper functions and macros. The more advanced functionality, such as individual bits in the configuration registers, is only covered at the level of bare chip commands, symbolic bit names and inb/outb wrappers. To use it, the application programmer will have to study the board+chip manuals and use these low-level functions and macros.

User space and kernel space

The software package described in this document consists of a driver in the form of a kernel-space module and a user-space library (for static or dynamic linking). The user-space library talks to the driver via a character device (opens an appropriate device node in the file system). The package is ready to work with multiple boards in the system, yet there's a single device node - multiple minor numbers would be more unix-like but perhaps unnecessarily complex to work with (depending on the particular real-world scenario at hand).

The header file is universal, all the function prototypes and macros work the same in kernel space and in user-space. In other words, the application programmer has freedom of choice, whether to compile the control application as a user-space executable binary, or as an insertable kernel-space module.

Obviously there are upsides and downsides to both kernel-space and user-space:

In kernel space, the application programmer can't use a full libc and other user-space libraries.
Communication with various own and third-party programs, access to networking etc is affected by the choice of user-space vs. kernel-space.
There can be timing constraints and concerns:
- An application running in a kernel module (in IRQ or in a kthread) will have direct access to the kernel-space form of the pci-1243 API, which is arguably as fast as it gets, or can access the hardware directly (may have advantages in some special cases).
- In contrast, an application running in user-space will suffer from ioctl() overhead - this entails multiple copies of data among different memory pages, several more function calls (stack operations) and context switching. Also, a user-space app has its time scheduled strictly by the preemptive scheduler - another reason why its response may be more relaxed. Various real-time extensions come to mind here.
- Only a fairly little volume of code can run within an IRQ. Much more can run in a kthread. On the other hand, a kthread has the disadvantage of scheduling-introduced delays.
An application running in kernel-space is not encapsulated by any memory protection mechanisms - if there's a bug in our application, it can hang the whole machine. Most bugs in kernel-space modules (the well-behaved ones) tend to demonstrate themselves in this way fairly soon, which doesn't make them any easier to debug. In comparison to this issue, the risk that other kernel-space code could inadvertently break our kernel-space app is relatively unimportant (and, in embedded scenarios, malicious network-borne security attacks are perhaps even less of a problem). By contrast, memory bugs in a user-space application are a lot less likely to cause havoc in the whole system and are much easier to debug.

It's possible to strike a reasonable trade-off by splitting the application among kernel-space and user-space: timing-critical parts of the control task can run in a kernel module and the more relaxed parts can run as user-space apps, communicating among themselves from k-space to u-space via syscalls (read, write, ioctl) over device nodes (block or character type).

The header file

Overview

The header file contains a reasonable level of comments. If you're looking for a quick start, take a look at it.

The header file starts with mnemonic preprocessor macros (#defines) that substitute constant numeric identifiers of the various commands, registers, axis labels, IRQ source bits etc.
There are just a few core functions, implementing key parts of the functionality: read IO port, write IO port, issue a command, write a register, read a register etc. These in turn are used by a number of function-like wrapper macros, simplifying mnemonic access to the various PCI-1243 registers.

As mentioned above, the set of functions and macros is the same in kernel-space and in user-space. The only exception is the pci1243_opendev() prototype that takes no argument in kernel space and one argument in user space (the device node filename).

The identical function prototypes are implemented differently in the kernel-space module and in the user-space library. Any functionality is really implemented only in the kernel variety, the user-space library consists of mere wrapper functions that pass the arguments to their kernel-space counterparts via ioctl(). The bunch of function-like macros only live in the header file (no implementation in the C files) - hence they work the same way in kernel space and in user space.

The elements of the header file are stacked vaguely like this:

kernel space                   user space
macros                         macros
core functions         <=-.    core functions
outw,inw,kernel code      `-   ioctl()
hardware

Except that, due to the inner dependencies of the header file, the contents flow from top to bottom, whereas the stack-style view above builds the layers from bottom to top.

In other words, in the header file, the most arcane functionality is located at the bottom. If you're after powerful macros and functions, try reading the header file from bottom to top :-)

Functions and macros, in order of appearance

For the named constants, take a look into the header file.

Functions:

int pci1243_opendev(void);            // kernel space
int pci1243_opendev(char* name);  // user space
int pci1243_closedev(void);

int pci1243_write_port(u8 board, u8 port, u8 data, u8 delay);
int pci1243_read_port(u8 board, u8 port, u8* data);
int pci1243_cmd_raw(u8 board, u8 axis, u8 cmd, u8 delay);
int pci1243_cmd(u8 board, u8 axis, u8 cmd, u8 delay);
int pci1243_status0(u8 board, u8 axis, u8* data);
int pci1243_write_reg(u8 board, u8 axis, u8 reg, u32 data, u8 delay);
int pci1243_read_reg(u8 board, u8 axis, u8 reg, u32* data);
int pci1243_write_IO(u8 board, u8 data);
int pci1243_read_IO(u8 board, u8* data);

int pci1243_wait4irq(u8 board, u32* data);

int pci1243_readback_bit(u8 board, u8 axis, u8 bit);
int pci1243_readback_reg(u8 board, u8 axis, u8 reg, u32* data);
int pci1243_readback_port(u8 board, u8 port, u8* data);
int pci1243_bang_reg_bit(u8 board, u8 axis, u8 bit, u8 mode, u8 upload);
int pci1243_bang_port_bits(u8 board, u8 port, u8 bits, u8 mode, u8 upload);

int pci1243_reset_pcd4541(u8 board);

int pci1243_plx_read_reg(u8 board, u8 reg, u32 *data);
int pci1243_plx_write_reg(u8 board, u8 reg, u32 data);
int pci1243_plx_dump_regs(u8 board);  // via printk into dmesg

Macros:

pci1243_start_stop_raw(board, axis, cmd, ramp_ena, use_fh, stop_irq_ena, use_sta)
pci1243_start_stop(board, axis, cmd)
pci1243_start(board, axis)
pci1243_stop(board, axis)
pci1243_decel_stop(board, axis)

pci1243_reg_slct(board, axis, reg, delay)
pci1243_op_mode(board, axis, mode)
pci1243_out_mode(board, axis, mode)

pci1243_preset_pulse_count(board, axis, data, delay)
pci1243_fl(board, axis, data, delay)
pci1243_fh(board, axis, data, delay)
pci1243_acceleration(board, axis, data, delay)
pci1243_multiplier(board, axis, data, delay)
pci1243_rampdown_point(board, axis, data, delay)
pci1243_idling_pulse(board, axis, data, delay)
pci1243_output_type(board, axis, data, delay)

pci1243_timer_set_data(board, data)
pci1243_timer_latch_data(board)
pci1243_timer_gate_open(board)
pci1243_timer_gate_close(board)

pci1243_timer_irq_enable(board)
pci1243_timer_irq_disable(board)
pci1243_motion_irq_enable(board)
pci1243_motion_irq_disable(board)
pci1243_global_irq_enable(board)
pci1243_global_irq_disable(board)

Comparing this list with the PCD4541 manual, it's clear that most of the macros are mere shorthands for controller registers and commands. All of them return an "int" error code (0 on success). For more information about the argument types and semantics, consult the comments in the header file and the examples.

A recap of S-curve maths

The PCD4541 manual is exhaustive but sometimes messy about the maths. Let's have a few things sorted out.

A note on units

In this paper, wherever we speak of speed, we actually mean pulse rate, i.e. frequency (in Hz). Similarly, wherever we speak of distance or travel (abbreviated as "s"), we actually mean a total number of pulses (abbreviated as "p" or "P").

The sketches of ramps are plots of speed vs. time (i.e. not distance vs. time).

When ramping, the chip starts accelerating from an "initial speed" (V₀) and accelerates up to the ultimate drive speed (V). The output then ticks at this ultimate speed (V), until the chip is told to decelerate (or decides to do that). Deceleration works vice versa.
During S-curve (quadratic) ramping, the acceleration is also variable. Each S-curve ramp consists of two parabolic (quadratic) areas. I.e., a whole accel/run/decel path contains four distinct parabolic areas. In each parabolic area, acceleration changes in a linear fashion (jerk is constant). I.e., acceleration is a derivative of speed, and jerk is a derivative of acceleration. Taking the travel as the base function, one S-curve ramp is continuous in the second derivative (acceleration), but the third derivative, called "jerk", is discontinuous right in the middle of the "S", where the speed has an inflexion point.

Let's distinguish between the machine units (register values) and the corresponding magnitudes used in general physics and mathematics.

Let's label the hardware values with capital letters:
M = multiplier
P = total pulse count (for preset pulse count operation)
V0 = initial speed (velocity) or pulse rate / frequency - the FL register
V = ultimate driving speed (velocity) or pulse rate / frequency - the FH register
A = acceleration (also used for deceleration)

Let's label the metric magnitudes (physical, SI, or whatever) with the corresponding lower-case letters - some of them are actual physical categories, some are not:
m = multiplier
s = p = distance traveled
v0 = initial speed
v = ultimate driving speed
a = acceleration
k = jerk

The application programmer will likely calculate with the real-world physical values.
The PCI-1243/PCD4541 manuals provide a set of formulas to transform these into the machine values.

Somewhat formal math and physics

The following picture is a generic driving path plot (pulse rate vs. time) with S-curve acceleration and deceleration. Note the six regions (a through f) identified on the curve. The unlabeled seventh region in the middle is a constant speed area. Each region is governed by its own mathematical function, though you may find useful analogies and shortcuts between a, c, d and f.

GIF: Generic S-curve accel, constant speed area, S-curve decel.

The PCD4541 is only capable of "pure quadratic" S-curves, i.e. the commonly seen intermediate region of linear acceleration is not possible. Combined with a maximum acceleration that characterizes the particular electro-mechanical system being driven, this limits the total acceleration time achievable over an S-curve ramp. It's not possible to strike a tradeoff between the speed of a linear ramp and the smoothness of an S-curve. The quadratic-only S-curve is ultimately smooth but slow - given the upper bound on instantaneous acceleration, the S-curve takes twice the time of a linear ramp with the same acceleration.

The essential set of formulas is best demonstrated on region A and it should really look like this:

GIF: v''=a'=k; v'=a=kt; v=1/2 kt^2; s=1/6 kt^3

The blue area contains the magnitudes as we know them (please substitute P for s). The green area contains the formulas showing how to arrive at them. Please note that the essential coefficient is the "jerk". The red area can be omitted - the PCD4541 is not capable of a "constant acceleration" component in this sense. On the other hand, the yellow area is VERY useful - the "constant speed" component is our initial speed (please substitute v₀ for v_c). The last component, s_c, is an initial offset of our pulse count or travel - very obvious but hard to say how useful (that's really up to the application programmer).

Obviously the acceleration math applies analogously to the deceleration area.

The application programmer will likely work with plots of speed vs. time. Consequently, the starting formula is perhaps the third one: v(t) = 1/2 kt² + v_c. You know your initial speed, your terminal speed (beware, region A ends halfway between them!) and the time required to sweep from the former to the latter. Get the difference of speeds and divide by two. Divide the time difference by two. Enter that into this key formula and you get the jerk.
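
The recipe in the previous paragraph can be written down as a short function. This is only a sketch of that one calculation. The function name is made up for illustration, and the speeds and the time are in the physical units used in this chapter (P/s and s).

```c
#include <assert.h>

/* Jerk k [P/s^3] of a pure-quadratic S-curve ramp from v0 to v [P/s]
 * that lasts ramp_time [s] in total. Region A covers half of the speed
 * change in half of the time: (v - v0)/2 = 1/2 * k * (ramp_time/2)^2. */
double scurve_jerk(double v0, double v, double ramp_time)
{
    assert(ramp_time > 0.0);
    double half_dv = (v - v0) / 2.0;
    double half_dt = ramp_time / 2.0;
    return 2.0 * half_dv / (half_dt * half_dt);
}
```

For example, a ramp from 1000 P/s to 5000 P/s in 4 s gives k = 1000 P/s³.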

Let's have another graph. It'll demonstrate a simple trick that may simplify the math for the four parabolic regions.

GIF: unity S-curve accel, unity S-curve decel - no const accel/decel, no const speed

The graph is a "unity curve" - regions A, C, D and F each have a duration of 1 s, and their delta(v) is also equal to 1 P/s. There are no "constant accel/decel" regions (B and E) and the initial speed is zero. Hence, this parabolic bell curve has only four regions. As described in the manual, using some simple math you'll find that

k = 2 P/s³
a(1) = 2 P/s² = -a(3)
s(0,1) = 1/3 P
s(1,2) = (2 - 1/3) P
s(0,4) = 1/3 + 1 + 2/3 + 2/3 + 1 + 1/3 = 4 P

This gives some hints for the generic S-curve ramps involving nonzero constant acceleration regions.
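
The unity-curve values above can be cross-checked numerically. The sketch below writes the four parabolic regions as a piecewise speed function (with k = 2 P/s³) and integrates it with the midpoint rule. The function names are hypothetical, and the integration is approximate, so the results only match the exact values to within a small tolerance.

```c
#include <assert.h>

/* Speed [P/s] on the unity curve at time t [s], 0 <= t <= 4. */
double unity_speed(double t)
{
    assert(t >= 0.0 && t <= 4.0);
    if (t <= 1.0) return t * t;                        /* region A: 1/2*k*t^2 */
    if (t <= 2.0) return 2.0 - (2.0 - t) * (2.0 - t);  /* region C */
    if (t <= 3.0) return 2.0 - (t - 2.0) * (t - 2.0);  /* region D */
    return (4.0 - t) * (4.0 - t);                      /* region F */
}

/* Pulses between t_from and t_to, by the midpoint rule with n steps. */
double unity_pulses(double t_from, double t_to, int n)
{
    assert(n > 0);
    double h = (t_to - t_from) / n;
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += unity_speed(t_from + (i + 0.5) * h);
    return sum * h;
}
```

With 1000 steps per second of curve, unity_pulses(0, 1, 1000) comes out very close to 1/3 P, unity_pulses(1, 2, 1000) close to 5/3 P and unity_pulses(0, 4, 4000) close to 4 P.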

Let's give up looking for shortcuts for the moment - let's go through all the math required for region "C", i.e. the upper end of the acceleration S-curve. Here's a simple sketch that will say more:

GIF: Generic S-curve accel, constant speed area, S-curve decel. And more...

The red equation in upper left corner describes the upside-down parabola. Our region "C" is a part of it.

The last equation in the following set is a generic formula for the red square under that region (number of pulses between t2 and t3). The steps above it describe how it was arrived at. This is to say that only the last line matters - no need to go through all that symbolic math in software :-)

GIF: a generic formula for pulses across region C

Try applying the values from the unity curve to verify that they fit the formula.

Another hint: try substituting a zero for t2 and (t3 - t2) for t3. Much simpler, isn't it :-)

Back to the PCD-4541 and its quadratic-only ramps. A single ramp consists of two parabolic areas, each with a different curve equation. There are two ways to derive the number of pulses (travel) within a ramp. Firstly, there's the symbolic method - let's combine the two curve equations for region A and C derived above, substituting a single delta t for t3 and t2 as suggested above:

GIF: s(A+C) = (v0 + v) * delta_t/2

Secondly, there's the graphical method that also makes use of the "unity curve" tricks:

GIF: a graphical explanation of travel across a quadratic-only ramp

And some formulas that can be identified in the sketch:

GIF: s(A+C) = (v0 + v) * delta_t/2

Thirdly, the PCD4541 manual suggests the following formula. It gives the same result: substitute the S-curve ramp time delta_t = 2 (v - v0) / a(max) into the formula above, and (v0 + v) (v - v0) collapses into v^2 - v0^2:

GIF: s(A+C) = (v^2 - v0^2)/a(max)
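
The ramp travel formulas above are easy to compare in code. This is a small sketch with made-up function names. It assumes the S-curve ramp time t = 2 (v - v0) / a(max) given in the "Conversion to machine units" chapter, which is what ties the time-based form to the manual's form.

```c
#include <assert.h>

/* Pulses travelled across one quadratic-only ramp (regions A + C),
 * given the total ramp duration delta_t [s]. */
double ramp_pulses_from_time(double v0, double v, double delta_t)
{
    assert(delta_t >= 0.0);
    return (v0 + v) * delta_t / 2.0;
}

/* The same travel, given the peak acceleration a_max [P/s^2] instead
 * of the duration (the form suggested by the PCD4541 manual). */
double ramp_pulses_from_amax(double v0, double v, double a_max)
{
    assert(a_max > 0.0);
    return (v * v - v0 * v0) / a_max;
}
```

For v0 = 1000 P/s, v = 5000 P/s and a(max) = 4000 P/s², the ramp takes 2 s and both functions return 6000 pulses.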

Conversion to machine units

The PCD4541 manual describes the math in limited detail on pages 22 through 24 (pages 34 through 36 of the PDF). The calculations are done straight in the machine units or, at the very best, there are conversion formulas given in a form where

real-world physics value = formula involving the machine value

which is perhaps not much use for the application programmer who needs to calculate the machine values from physical values in the first place, to set up the controller. At the end of this chapter, the conversion formulas are listed in the inverted form.

The "velocity" here is a pulse rate (pulses per time), rather than a distance per time. Let's label the unit of one pulse with a capital "P". Let's use a slash "/" instead of the word "per". Thus, the unit of pulse rate (velocity) becomes 1 P/s (=1 Hz). The unit of acceleration will be 1 P/s² (Hz/s). The unit of jerk will be 1 P/s³ (Hz/s²).

First, let's have the multiplier sorted out.
The nominal maximum output pulse rate of one PCD4541 axis is 200 kP/s. The external crystal ticks at 4.9152 MHz. Now the speed values are denominated in P/s, but the hardware registers are 16bit and only about 13 bits are effectively valid - the actual resolution of the speed values is 0 to 8000. So how do we reach those 200 kP/s?
Obviously there's some sort of a pre-scaler in the game. The coefficient here is called the "multiplier" (M), even though it's actually a divisor, inversely proportional to the theoretical multiplier "m". A value of 600 in the "MULTIPLIER" register means that the FH and FL range of 0 - 8000 corresponds to 0 - 8000 P/s on the output. I.e., the physical "multiplier" is 1. The M register is inversely proportional to the multiplier, according to this formula:

      4915200          [Hz]
M = -----------
      8192 * m         [Hz?] [no unit]

This means that e.g. M = 300 (m = 2) implies a range of about 16 kPps, and M = 24 (m = 25) implies a range of about 200 kPps.
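
As a sketch, the multiplier formula translates into a one-line integer calculation. The function and macro names are hypothetical. The upper bound of 600 on m is an assumption made here so that the result never drops below 1; the valid range of the MULTIPLIER register itself should be taken from the PCD4541 manual.

```c
#include <assert.h>
#include <stdint.h>

#define PCD4541_CLOCK_HZ 4915200u   /* crystal frequency quoted above */

/* MULTIPLIER register value M for a physical multiplier m.
 * M = 4915200 / (8192 * m), i.e. 600 / m. Integer division truncates,
 * so m should divide 600 if an exact multiplier is needed. */
uint16_t multiplier_reg(unsigned m)
{
    assert(m >= 1u && m <= 600u);
    return (uint16_t)(PCD4541_CLOCK_HZ / (8192u * m));
}
```

This reproduces the values mentioned above: m = 1 gives 600, m = 2 gives 300 and m = 25 gives 24.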

The multiplier is clearly the first variable that needs to be sorted out, based on the required real-world values and hardware ranges of speed and acceleration. The speed registers (FL, FH) have a range of 0-8191, the acceleration register is limited to 0-1023.

The speed values are easy:

           v0         [P/s]
FL = V0 = ----
            m         [no unit]

           v          [P/s]
FH = V = -----
           m          [no unit]

It seems that the counting registers (the preset register and the ramp-down point register) are always in units of actual output pulses, no matter what the multiplier is. That would make sense :-)
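
The FL/FH formulas above, together with the 0-8191 register range, suggest a simple way to pick a multiplier and convert a pulse rate. The following is a sketch under those assumptions only, with hypothetical names. It picks the smallest integer m for the sake of speed resolution and ignores the acceleration range, which also depends on m.

```c
#include <assert.h>
#include <stdint.h>

#define SPEED_REG_MAX 8191u   /* FL/FH range quoted above */

/* Smallest integer multiplier m that lets v_max [P/s] fit into FH. */
unsigned smallest_multiplier(uint32_t v_max)
{
    unsigned m = (unsigned)((v_max + SPEED_REG_MAX - 1u) / SPEED_REG_MAX);
    return m ? m : 1u;
}

/* FL or FH register value for a physical pulse rate v [P/s]: V = v / m. */
uint16_t speed_reg(uint32_t v, unsigned m)
{
    assert(m >= 1u);
    uint32_t reg = v / m;          /* truncates towards zero */
    assert(reg <= SPEED_REG_MAX);
    return (uint16_t)reg;
}
```

For a top speed of 200 kP/s this yields m = 25 and FH = 8000. In practice you may prefer to round m up to a value that divides 600, so that the MULTIPLIER register value is exact.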

The acceleration register is somewhat peculiar. The hardware value is inversely proportional to the physical acceleration intended:

    m * 4915200       [P/s]
A = -----------
         a            [P/s^2]

Please note that the range of the ACCEL register is 1-1023 (10 bits). Thus, the actual minimum acceleration in "1x mode" (M == 600, m == 1) is about 5 kHz/s. Combined with a maximum pulse rate of 8 kHz, this is barely enough to be noticeable on a 'scope.
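
The acceleration conversion, including the 1-1023 range check, might look like this. It is a sketch with hypothetical names. Rounding to the nearest register value is a choice made here, not something the manuals prescribe.

```c
#include <assert.h>
#include <stdint.h>

#define PCD4541_CLOCK_HZ 4915200u
#define ACCEL_REG_MIN 1u
#define ACCEL_REG_MAX 1023u

/* ACCEL register value for acceleration a [P/s^2] and multiplier m:
 * A = m * 4915200 / a, rounded to the nearest integer.
 * Returns 0 if the result falls outside the range 1..1023. */
uint16_t accel_reg(uint32_t a, unsigned m)
{
    assert(a > 0u && m >= 1u);
    uint64_t reg = ((uint64_t)m * PCD4541_CLOCK_HZ + a / 2u) / a;
    if (reg < ACCEL_REG_MIN || reg > ACCEL_REG_MAX)
        return 0;
    return (uint16_t)reg;
}
```

With m = 1, an acceleration of 9600 P/s² maps to A = 512, whereas 1000 P/s² is rejected because it is below the minimum of about 5 kHz/s mentioned above.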

With the PCD4541, the physical magnitude called "jerk" is kept quite implicit - it's never mentioned in the manual. The slope of an S-curve ramp is influenced by the "ACCELERATION" register - the value entered is actually derived from the peak instantaneous acceleration (happens in the inflexion point halfway up the ramp), the conversion formula is the one above, the same as for linear ramps.

It can be easily proven that with the same values in FL, FH and ACCEL registers, a quadratic ramp takes twice the time of a linear ramp, and the S-curve's peak acceleration is the same as the linear curve's characteristic constant acceleration (i.e., its average acceleration is one half the value of a linear ramp).

Linear ramp:

    v - v0       [P/s]
t = ------
      a          [P/s^2]

S-curve ramp:

        v - v0       [P/s]
t = 2 * ------
        a(max)       [P/s^2]
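
The two ramp time formulas above, as a minimal sketch in physical units (function names are made up for illustration):

```c
#include <assert.h>

/* Duration [s] of a linear ramp from v0 to v [P/s] at acceleration a [P/s^2]. */
double linear_ramp_time(double v0, double v, double a)
{
    assert(a > 0.0);
    return (v - v0) / a;
}

/* Duration [s] of a quadratic-only S-curve ramp with peak acceleration a_max. */
double scurve_ramp_time(double v0, double v, double a_max)
{
    return 2.0 * linear_ramp_time(v0, v, a_max);
}
```

For a ramp from 0 to 8000 P/s with 8000 P/s² in the ACCEL conversion, the linear ramp takes 1 s and the S-curve ramp takes 2 s.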

Interestingly, during S-curve ramping, the minimum acceleration rate of 5 kHz/s only applies to the value set in the ACCEL register - it does not influence the instantaneous acceleration rate of the actual S-curve. In other words, the S-curve is not clipped to fit the range of the ACCEL register, and the S-curve ramp does take twice as long even with the ACCEL register set to 1023.

The implicit value of "jerk" can be extracted in several ways, e.g.:

GIF: k = 2 * a(max) / delta_t; k = a(max)^2 / delta_v
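
Going the other way, from register values back to physical magnitudes, is just the conversion formula solved for a. The sketch below uses hypothetical names and takes delta_v as the full speed change v - v0 of the ramp, which is the reading under which the second jerk formula above agrees with the unity curve (a(max) = 2, delta_v = 2, k = 2).

```c
#include <assert.h>

#define PCD4541_CLOCK_HZ 4915200.0

/* Peak acceleration [P/s^2] implied by ACCEL register value A and
 * multiplier m: a = m * 4915200 / A. */
double accel_from_reg(unsigned A, unsigned m)
{
    assert(A >= 1u && A <= 1023u && m >= 1u);
    return (double)m * PCD4541_CLOCK_HZ / (double)A;
}

/* Implicit jerk [P/s^3] of an S-curve ramp from v0 to v [P/s]:
 * k = a_max^2 / (v - v0). */
double jerk_from_amax(double a_max, double v0, double v)
{
    assert(v > v0);
    return a_max * a_max / (v - v0);
}
```

For example, A = 512 with m = 1 corresponds to a(max) = 9600 P/s², and a ramp from 0 to 8000 P/s at that setting has an implicit jerk of 11520 P/s³.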

IRQ handling

Introduction

The IRQ's are almost omitted in the Advantech PCI-1243 manuals, and barely described enough in the PCD4541 manual. There's a sweet secret or two, that have simply fallen below the radar screen, as far as the manuals are concerned. Yet the IRQ's do work and it's appropriate to use them if at all possible. It's not that complicated. The driver should very much simplify IRQ handling. As far as IRQ's are concerned, the PCI-1243 is certainly a major improvement, compared to the ISA-based PCL-839+, where the IRQ's were quite flawed... The fundamentally level-triggered PCI IRQ's are a key factor in that improvement, and Advantech has managed not to spoil the party in terms of their CPLD-based IRQ processing on the PCI-1243 :-)

General description

First of all, obviously it is perfectly possible to use the board without IRQ's. I.e., some application software would simply write a new speed value and maybe re-issue a start command every once in a while, based on an OS-based timer. In that case, the application programmer doesn't have to care about the peculiarities of IRQ handling on the board - just keep all the IRQ sources masked and you're fine.

On the PCI-1243 you can observe these IC's:

the motor controller - one PCD4541
a Lattice CPLD (universal gate array), providing some additional candy - see ports at IO_base+0x10 through IO_base+0x24 in the PCI-1243 manual
a universal PCI slave bridge: PCI9030 by PLX.

The motor controller IC is an almost native ISA IC, and its access to the host system's PCI is mediated by the PLX slave bridge. IRQ handling for the PCD4541 is further interfered with by the CPLD - fortunately in a very sensible and appropriate way. The CPLD further adds some Advantech board specific functionality.

Both the motor controller and the CPLD are involved in IRQ handling. The PLX slave bridge merely forwards the IRQ to the PCI bus with no processing. The IRQ line passthru is enabled by default, no need to mess with its IRQ handling registers.

The IRQ's generated by the motor controller are level-triggered right from the start. As long as the interrupt source is active, the IRQ line sticks to the active level (low). The CPLD merely maintains the level-triggered nature of the IRQ line, and OR's the level-triggered output of the PCD4541 with other Advantech-specific IRQ sources, to produce one PCI IRQ line, still level-triggered. The CPLD contains board-global interrupt masking and status registers. The status register needs to be cleared (=ACKed) by the ISR for proper operation, but only after all the individual sources are believed to be handled (=muffled).

The driver deploys a generic ISR that aims to take a reasonable action in response to any IRQ that may occur - specifically, it properly ACKs (=masks) both ramp-down and "stop" interrupts. All the application programmer has to do is use the wait4irq() function. See also the IRQ handler in pci-1243.c.

IRQ decoding, IRQ enabling and masking

In addition to the board-global IRQ status register, the motor controller chip provides three IRQ status bits per axis, all of them in one register (S0), mixed with other status bits that are not IRQ related. The OR'ed product of these sources per axis shows up as a single bit in another register (S2). Clearly, the IRQ status bits are interspersed with other useful data in the controller chip's internal registers.

There's more - on the PCD4541, it has been observed that the per-source IRQ status bits (as read from S0) don't follow the manual! The PCD4541 chip-level manual and the Advantech manuals say that S0.0 = stop int, S0.1 = ramp-down int, S0.2 = ext.start int. In reality, a rampdown int yields 0x05 in the last three bits, whereas a stop int yields 0x01 or 0x06. I.e., the three bits seem to return some sort of an enumerated result code, rather than a bit per IRQ source as the manuals would put it. Yet the IRQ masking bits do work and must be toggled as documented.
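
If you read S0 yourself, the observation above suggests decoding the low three bits as a code rather than as individual flags. The sketch below does exactly that and nothing more: the enum and function names are hypothetical, and only the three values reported above are recognised. Any other value is returned as "unknown" rather than guessed at.

```c
#include <assert.h>
#include <stdint.h>

enum s0_irq_cause { S0_IRQ_UNKNOWN, S0_IRQ_STOP, S0_IRQ_RAMPDOWN };

/* Zero-initialised storage should read as "unknown". */
static_assert(S0_IRQ_UNKNOWN == 0, "S0_IRQ_UNKNOWN must be the zero value");

/* Interpret the low three bits of S0 as the enumerated code observed on
 * the PCD4541, not as one bit per source as the manuals state. */
enum s0_irq_cause decode_s0_irq(uint8_t s0)
{
    switch (s0 & 0x07u) {
    case 0x05: return S0_IRQ_RAMPDOWN;   /* observed for a ramp-down int */
    case 0x01:
    case 0x06: return S0_IRQ_STOP;       /* observed for a stop int */
    default:   return S0_IRQ_UNKNOWN;
    }
}
```

The upper bits of S0 carry other status information and are simply ignored here.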

Which brings about the very interesting subject of IRQ masking and ACKing on this chip.

While the interrupt generating condition is true, its respective interrupt source is active, and so are the respective status flags - there's no automagic "ack upon read". The interrupt source becomes inactive only when the trigger condition ceases to be true anymore.
Hence, to prevent repeated ISR invocation, you have to mask the respective interrupt source within the ISR, and perhaps leave it up to the rest of the software to talk to the controller IC and re-enable the interrupt source only when appropriate.
The board-global IRQ enable bits in the CPLD need not be fiddled with at runtime - they are only toggled once at program startup, to enable interrupt generation globally for the whole board. The repetitive ACKing/masking is done using the per-axis/per-source masking bits, and also using the board-global IRQ status register, if you want to use its information for some ISR IO access optimization (which the driver's ISR does).

There's another peculiarity: the INT enable/disable bits are not a part of some regular register - they are a part of two important executive commands. Hence, you can't set/clear the "stop interrupt enable" bit without re-issuing a start/stop command, and you can't toggle the "ramp-down interrupt enable" bit without issuing another "register select" command.

Fortunately this means that the masking bits are tightly coupled to the commands that they'll likely be a part of - so this arrangement does make sense, even though it appears to decrease general programming transparency of the chip.
Still, the driver aims to assist in the bit-banging involved - all register writes are cached, so the application programmer doesn't have to keep track of all the bits when sending simple commands to the board.

A note on polarity: the IRQ signals generated by the motor controllers are active-low. Perhaps that's why the PCD4541 on-chip detailed IRQ source flags are also active-low (negative logic). The IRQ enabling bits use positive logic - set the bit to 0 to mask the IRQ source, set it to 1 to enable the IRQ. The board-global IRQ enable & status registers use positive logic.

The driver has a compile-time option in the kernel-space pci-1243.c, allowing you to select whether to disable the "board-global interrupt enable bit" at driver insmod (so that your custom code must enable interrupts explicitly for the board, if desired), or to enable interrupts on the board-global scale, with just the individual bottom-level sources being disabled for a clean start (so that your individual custom application threads / contexts only need to mind their own respective interrupt source). The latter option is selected by default, in the driver distribution tarball - the examples enclosed rely on that setup.

The generic ISR takes care of all the ACKing, both per axis (in the PCD-4541, by masking its interrupt sources right after they have fired), and in the board-global IRQ status register (to keep the status bits current). As an application programmer, all you need to do is to unmask the interrupt sources desired, and wait4irq(). The PCD4541's per-axis interrupt sources need to be unmasked repetitively, because the ISR masks them again as a way of ACKing the interrupt. The timer interrupt needn't be unmasked over and over - once the timer is preset, loaded=latched, and the clock gate is open, it keeps working in a periodic mode, generating a single interrupt whenever it counts down to zero. The timer also auto-reloads to the preset value, so the timer's periodic precision is not impaired by IRQ latency. The IDI interrupt sources are edge-triggered as well, so they fire just once on every edge of the pre-configured polarity.
You can also elect to hack the generic ISR to add your own code right into the handler - in that case you're on your own.

Example programs

The package contains a few example programs. Apart from the obvious simple stepper demos, there are two noteworthy code snippets:

a program to demonstrate the use of an external STA signal (not mentioned much in the Advantech docs)
a program to fiddle with some details of the PLX PCI9030 local bus timings (register 0x28). The library provides access to the PCI9030 local bus configuration registers. Note that the register 0x28 contents are pretty conservative, you can make the local bus transactions much shorter, but in the end it typically won't make your write transactions more tightly packed, as the minimum turnaround time is pegged by the PCI latency timers (or so I guess). Interestingly, the PCI9030 does not support a latency timer, but the upstream PCI bridge typically does, and its secondary-side latency timer tends to be pretty relaxed (32 or 64).

References

Advantech PCI-1243 Manual
Nippon Pulse Motors' PCD4541 User's Manual