Aviator on the BBC Micro

# Maths: Multiply16x16

```       Name: Multiply16x16                                           [Show more]
Type: Subroutine
Category: Maths
Summary: Multiply two unsigned 16-bit numbers
Deep dive: Times tables and nibble arithmetic
Context: See this subroutine in context in the source code
References: This subroutine is called as follows:
* Multiply16x16Bit0 calls Multiply16x16
* Multiply16x16Mix calls Multiply16x16

This routine multiplies two unsigned 16-bit numbers:

(H A) = (J I) * (S R) >> 16

The fractional part of the result is in W, so this is the full result, though
in practice W is ignored:

(H A W) = (J I) * (S R) >> 8

The fractional part is rounded up or down, to the nearest integer, by adding
128 at mult1. This part of the code is modified by the ApplyAerodynamics
routine to toggle this rounding behaviour.

If the C flag is set on return, then the result in H needs to be incremented,
so technically the result is:

(H+C A) = (J I) * (S R) >> 16

The multiplication is done using the following algorithm:

(J I) * (S R) = (J << 8 + I) * (S << 8 + R)
= (J << 8 * S << 8) + (J << 8 * R)
+ (I * S << 8)
+ (I * R)
= (J * S) << 16 + (J * R) << 8
+ (I * S) << 8
+ (I * R)

This returns a fractional result with the lowest byte containing the fraction.
The routine doesn't care about the very lowest byte, so it actually calculates
the following, effectively shifting the above to the right by 8 places and
dropping the (I * R) part:

(J I) * (S R) = (J * S) << 8 + (J * R) + (I * S)

before adding 128 to round the result up or down.

Arguments:

(S R)                An unsigned 16-bit number

(J I)                An unsigned 16-bit number

Returns:

(H A)                The result of the multiplication

C flag               If set, H should be incremented by the caller to get the
correct result

.Multiply16x16

LDX J                  \ Set (A V) = X * Y
LDY R                  \           = J * R
JSR Multiply8x8

STA G                  \ Set (G A) = (A V)
LDA V                  \           = J * R

CLC                    \ Clear the C flag for the following addition

.mult1

ADC #128               \ Set (G W) = (G A) + 128
STA W                  \
\ starting with the low bytes
\
\ The ADC #128 instruction gets modified by the
\ ApplyAerodynamics routine as follows:
\
\   * ADC #0 disables the rounding
\
\   * ADC #128 enables the rounding

BCC mult2              \ And then the high bytes, so we now have:
INC G                  \
\   (G W) = (J * R) + 128

.mult2

LDX S                  \ Set (A V) = X * Y
LDY J                  \           = S * J
JSR Multiply8x8

STA H                  \ Set (H A) = (A V)
LDA V                  \           = S * J

\ We now do the following addition:
\
\   (H G W) = (H A) << 8 + (G W)
\           = (H A 0) + (G W)
\
\ We don't need to do the lowest byte, as W + 0 is just
\ W again, so we can just do the following:
\
\   (H G) = (H A) + G
\
\ and then keep W as the lowest significant byte of the
\ result

CLC                    \ Set (H G) = (H A) + G
STA G                  \ starting with the low bytes

BCC mult3              \ And then the high bytes, so we now have:
INC H                  \
\   (H G) = (H A) + G
\
\ which is the same as:
\
\   (H G W) = (H A) << 8 + (G W)
\           = (S * J) << 8 + (J * R) + 128

.mult3

LDX S                  \ Set (A V) = X * Y
LDY I                  \           = S * I
JSR Multiply8x8

STA T                  \ Set (T A) = (A V)
LDA V                  \           = S * I

\ We now do the following addition:
\
\   (H A W) = (T A) + (H G W)
\
\ though we don't actually do the highest byte, but
\ instead return the C flag depending on whether the
\ addition of the middle bytes overflowed (in which case
\ the highest byte in H needs incrementing, a task we
\ leave to the caller)

CLC                    \ Set (A W) = (T A) + (G W)