The Multiply16x16 subroutine - Aviator on the BBC Micro

       Name: Multiply16x16                                           [Show more]
       Type: Subroutine
   Category: Maths
    Summary: Multiply two unsigned 16-bit numbers
  Deep dive: Times tables and nibble arithmetic
    Context: See this subroutine in context in the source code
 References: This subroutine is called as follows:
             * Multiply16x16Bit0 calls Multiply16x16
             * Multiply16x16Mix calls Multiply16x16


 This routine multiplies two unsigned 16-bit numbers:

   (H A) = (J I) * (S R) >> 16

 The fractional part of the result is in W, so this is the full result, though
 in practice W is ignored:

   (H A W) = (J I) * (S R) >> 8

 The fractional part is rounded up or down, to the nearest integer, by adding
 128 at mult1. This part of the code is modified by the ApplyAerodynamics
 routine to toggle this rounding behaviour.

 If the C flag is set on return, then the result in H needs to be incremented,
 so technically the result is:

   (H+C A) = (J I) * (S R) >> 16

 The multiplication is done using the following algorithm:

   (J I) * (S R) = (J << 8 + I) * (S << 8 + R)
                 = (J << 8 * S << 8) + (J << 8 * R)
                                     + (I * S << 8)
                                     + (I * R)
                 = (J * S) << 16 + (J * R) << 8
                                 + (I * S) << 8
                                 + (I * R)

 This returns a fractional result with the lowest byte containing the fraction.
 The routine doesn't care about the very lowest byte, so it actually calculates
 the following, effectively shifting the above to the right by 8 places and
 dropping the (I * R) part:

   (J I) * (S R) = (J * S) << 8 + (J * R) + (I * S)

 before adding 128 to round the result up or down.


 Arguments:

   (S R)                An unsigned 16-bit number

   (J I)                An unsigned 16-bit number


 Returns:

   (H A)                The result of the multiplication

   C flag               If set, H should be incremented by the caller to get the
                        correct result


.Multiply16x16

 LDX J                  \ Set (A V) = X * Y
 LDY R                  \           = J * R
 JSR Multiply8x8

 STA G                  \ Set (G A) = (A V)
 LDA V                  \           = J * R

 CLC                    \ Clear the C flag for the following addition

.mult1

 ADC #128               \ Set (G W) = (G A) + 128
 STA W                  \
                        \ starting with the low bytes
                        \
                        \ The ADC #128 instruction gets modified by the
                        \ ApplyAerodynamics routine as follows:
                        \
                        \   * ADC #0 disables the rounding
                        \
                        \   * ADC #128 enables the rounding

 BCC mult2              \ And then the high bytes, so we now have:
 INC G                  \
                        \   (G W) = (J * R) + 128

.mult2

 LDX S                  \ Set (A V) = X * Y
 LDY J                  \           = S * J
 JSR Multiply8x8

 STA H                  \ Set (H A) = (A V)
 LDA V                  \           = S * J

                        \ We now do the following addition:
                        \
                        \   (H G W) = (H A) << 8 + (G W)
                        \           = (H A 0) + (G W)
                        \
                        \ We don't need to do the lowest byte, as W + 0 is just
                        \ W again, so we can just do the following:
                        \
                        \   (H G) = (H A) + G
                        \
                        \ and then keep W as the lowest significant byte of the
                        \ result

 CLC                    \ Set (H G) = (H A) + G
 ADC G                  \
 STA G                  \ starting with the low bytes

 BCC mult3              \ And then the high bytes, so we now have:
 INC H                  \
                        \   (H G) = (H A) + G
                        \
                        \ which is the same as:
                        \
                        \   (H G W) = (H A) << 8 + (G W)
                        \           = (S * J) << 8 + (J * R) + 128

.mult3

 LDX S                  \ Set (A V) = X * Y
 LDY I                  \           = S * I
 JSR Multiply8x8

 STA T                  \ Set (T A) = (A V)
 LDA V                  \           = S * I

                        \ We now do the following addition:
                        \
                        \   (H A W) = (T A) + (H G W)
                        \
                        \ though we don't actually do the highest byte, but
                        \ instead return the C flag depending on whether the
                        \ addition of the middle bytes overflowed (in which case
                        \ the highest byte in H needs incrementing, a task we
                        \ leave to the caller)

 CLC                    \ Set (A W) = (T A) + (G W)
 ADC W                  \
 STA W                  \ starting with the low bytes

 LDA T                  \ And then the high bytes, so we now have:
 ADC G                  \
                        \   (H A W) = (T A) + (H G W)
                        \           = (S * I) + (S * J) << 8 + (J * R) + 128
                        \
                        \ which is the result we need. The C flag is set if the
                        \ last addition overflowed, in which case H needs to be
                        \ incremented by the caller to get the final result

 RTS                    \ Return from the subroutine

[X]

Variable G in workspace Zero page

Temporary storage, used in a number of places

[X]

Variable H in workspace Zero page

Temporary storage, used in a number of places

[X]

Variable I in workspace Zero page

Temporary storage, used in a number of places

[X]

Variable J in workspace Zero page

Temporary storage, used in a number of places

[X]

Subroutine Multiply8x8 (category: Maths)

Multiply two unsigned 8-bit numbers

[X]

Variable R in workspace Zero page

Temporary storage, used in a number of places

[X]

Variable S in workspace Zero page

Temporary storage, used in a number of places

[X]

Variable T in workspace Zero page

Temporary storage, used in a number of places

[X]

Variable V in workspace Zero page

Temporary storage, used in a number of places

[X]

Variable W in workspace Zero page

Temporary storage, used in a number of places

[X]

Label mult2 is local to this routine

[X]

Label mult3 is local to this routine

Maths: Multiply16x16