3.2.2Best LeftShift: Algorithm LS3
The plusminus trick does not work with the leftshift algorithm: addition never clears the MS bit. If U and V are close, a subtraction might clear more than one MS bits, otherwise one could try 2U−V and 2V−U for the cases when 2U and V or 2V and U are close. (With the nth bit = 1 other two's power linear combinations, which can be calculated with only shifts, don't help.) Looking at only a few MS bits, one can determine which one of the 3 tested reductions is expected to give the largest length decrease (testing 3 reduction candidates is the reason to call the algorithm LS3). We could often clear extra MS bits this way. In general microprocessors the gain is not much, because computing 2x−y could take 2 instructions instead of one for x−y, but memory load and store steps can still be saved. With hardware for shifted operand fetch the doubling comes for free, giving a larger speedup.
3.3
if (a < m)
{U m; V a;
R 0; S 1;}
else
{V m; U a;
S 0; R 1;}
while (V > 1) {
f U  V
if(sgn(U)=sgn(V))
{U U–(V<
R R–(S<
else
{U U+(V<
R R+(S<
if (U < V)
{U ↔ V; R ↔ S;}
}
if (V = 0) return 0;
if (V < 0) S S;
if (S > m) return S  m;
if (S < 0) return S + m;
return S; // a^{1} mod m

Figure 4. Shifting Euclidean algorithm
 ShiftingEuclidean Modular Inverse: Algorithms SE
The original Euclidean GCD algorithm replaces the larger of the two parameters by subtracting the largest number of times the smaller parameter keeping the result nonnegative: x ← x^{ }−^{ }[x/y]·y. For this we need to calculate the quotient [x/y] and multiply it with y. In this paper we don't deal with algorithms, which perform division or multiplication. However, the Euclidean algorithm works with smaller coefficients q ≤ [x/y], too: x ← x^{ }−^{ }q·y. In particular, we can choose q to be the largest power of 2, such that q = 2^{k} ≤ [x/y]. The reductions can be performed with only shifts and subtractions, and they still clear the most significant bit of x, so the resulting algorithm will terminate in a reasonable number of iterations. It is well known (see [1]), that for random input, in the course of the algorithm, most of the time [x/y] = 1 or 2, so the shifting Euclidean algorithm performs only slightly more iterations than the original, but avoids multiplications and divisions. See Figure 4.
Repeat the above reduction steps until V = 0 or ±1, when U = GCD(m,a). If V = 0, there is no inverse, so we return 0, which is not an inverse of anything. (The pathological cases like m^{ }=^{ }a^{ }=^{ }1 need special handling, but these don't occur in cryptography.)
In the course of the algorithm two auxiliary variables, R and S are kept updated. At termination S is the modular inverse, or the negative of it, within ±m.
3.3.1Justification
The algorithm starts with U = m, V = a, R = 0, S = 1. If a > m, swap (U,V) and (R,S). U always denotes the longer of the just updated U and V. During the course of the algorithm U is decreased, keeping GCD(U,V) = GCD(m,a) true. The algorithm reduces U, swaps with V when U < V, until V = ±1 or 0: U is replaced by U−2^{k}V, with such a k, that reduces the length of U, leading eventually to 0 or ±1, when the iteration can stop. The binary length U is reduced by at least one bit in each iteration, guarantying that the procedure terminates in at most a+m iterations.
At termination of the algorithm either V = 0 (indicating that U = 2^{k}V was beforehand, and so there is no inverse) or V = ±1, otherwise a length reduction was still possible. In the later case 1 = GCD(U,V) = GCD(m,a). Furthermore, the calculations maintain the following two congruencies:
U ≡ Ra mod m, V ≡ Sa mod m. (4)
The weighted difference of the two congruencies in (4) gives U−2^{k}V ≡ (R−2^{k}S)·a mod m, which ensures that at the reduction steps (4) remains true after updating the corresponding variables: U ← U−2^{k}V, R ← R−2^{k}S. As in the proof of correctness of the original extended Euclidean algorithm, we can see that R and S remain less than 2m, so at the end we fix the sign of S to correspond to V, and add or subtract m to make 0 < S < m. Now 1 ≡ Sa mod m, and S is of the right magnitude, so S = a^{−1} mod m. []
3.3.2BestShift Euclidean Modular Inverse: Algorithm SE3
We can employ a similar speedup technique for the shifting Euclidean algorithm as with the leftshift algorithm LS3. If U and 2^{k}V are close, the shiftsubtraction might clear more than one MS bits, otherwise one could try U−2^{k}^{−1}V and U−2^{k}^{+1}V. (With k being the length difference. Other two's power linear combinations cannot clear more MS bits.) Looking at only a few MS bits one can determine, which one of the 3 tested reductions is expected to give the largest (length) decrease. (Testing 3 reduction candidates is the reason to call the algorithm SE3). We could often clear extra MS bits this way. This technique gives about 14% reduction in the number of iterations, and a similar speedup on most computational platforms, because the shift operation takes the same time, regardless of the amount of shift (except, when it is 0).
We have a choice: how to rank the expected reductions. In the SE3 code we picked the largest expected length reduction, because it is the simplest in hardware. Another possibility was to choose the shift amount, which leaves the smallest absolute value result. It is a little more complex, but gives about 0.2% speed increase.
Share with your friends: 