Skip to navigation


Maths: Multiply16x16

Name: Multiply16x16 [Show more] Type: Subroutine Category: Maths Summary: Multiply two unsigned 16-bit numbers Deep dive: Times tables and nibble arithmetic
Context: See this subroutine in context in the source code References: This subroutine is called as follows: * Multiply16x16Bit0 calls Multiply16x16 * Multiply16x16Mix calls Multiply16x16

This routine multiplies two unsigned 16-bit numbers: (H A) = (J I) * (S R) >> 16 The fractional part of the result is in W, so this is the full result, though in practice W is ignored: (H A W) = (J I) * (S R) >> 8 The fractional part is rounded up or down, to the nearest integer, by adding 128 at mult1. This part of the code is modified by the ApplyAerodynamics routine to toggle this rounding behaviour. If the C flag is set on return, then the result in H needs to be incremented, so technically the result is: (H+C A) = (J I) * (S R) >> 16 The multiplication is done using the following algorithm: (J I) * (S R) = (J << 8 + I) * (S << 8 + R) = (J << 8 * S << 8) + (J << 8 * R) + (I * S << 8) + (I * R) = (J * S) << 16 + (J * R) << 8 + (I * S) << 8 + (I * R) This returns a fractional result with the lowest byte containing the fraction. The routine doesn't care about the very lowest byte, so it actually calculates the following, effectively shifting the above to the right by 8 places and dropping the (I * R) part: (J I) * (S R) = (J * S) << 8 + (J * R) + (I * S) before adding 128 to round the result up or down.
Arguments: (S R) An unsigned 16-bit number (J I) An unsigned 16-bit number
Returns: (H A) The result of the multiplication C flag If set, H should be incremented by the caller to get the correct result
.Multiply16x16 LDX J \ Set (A V) = X * Y LDY R \ = J * R JSR Multiply8x8 STA G \ Set (G A) = (A V) LDA V \ = J * R CLC \ Clear the C flag for the following addition .mult1 ADC #128 \ Set (G W) = (G A) + 128 STA W \ \ starting with the low bytes \ \ The ADC #128 instruction gets modified by the \ ApplyAerodynamics routine as follows: \ \ * ADC #0 disables the rounding \ \ * ADC #128 enables the rounding BCC mult2 \ And then the high bytes, so we now have: INC G \ \ (G W) = (J * R) + 128 .mult2 LDX S \ Set (A V) = X * Y LDY J \ = S * J JSR Multiply8x8 STA H \ Set (H A) = (A V) LDA V \ = S * J \ We now do the following addition: \ \ (H G W) = (H A) << 8 + (G W) \ = (H A 0) + (G W) \ \ We don't need to do the lowest byte, as W + 0 is just \ W again, so we can just do the following: \ \ (H G) = (H A) + G \ \ and then keep W as the lowest significant byte of the \ result CLC \ Set (H G) = (H A) + G ADC G \ STA G \ starting with the low bytes BCC mult3 \ And then the high bytes, so we now have: INC H \ \ (H G) = (H A) + G \ \ which is the same as: \ \ (H G W) = (H A) << 8 + (G W) \ = (S * J) << 8 + (J * R) + 128 .mult3 LDX S \ Set (A V) = X * Y LDY I \ = S * I JSR Multiply8x8 STA T \ Set (T A) = (A V) LDA V \ = S * I \ We now do the following addition: \ \ (H A W) = (T A) + (H G W) \ \ though we don't actually do the highest byte, but \ instead return the C flag depending on whether the \ addition of the middle bytes overflowed (in which case \ the highest byte in H needs incrementing, a task we \ leave to the caller) CLC \ Set (A W) = (T A) + (G W) ADC W \ STA W \ starting with the low bytes LDA T \ And then the high bytes, so we now have: ADC G \ \ (H A W) = (T A) + (H G W) \ = (S * I) + (S * J) << 8 + (J * R) + 128 \ \ which is the result we need. The C flag is set if the \ last addition overflowed, in which case H needs to be \ incremented by the caller to get the final result RTS \ Return from the subroutine