Modular Inverse Algorithms without Multiplications for Cryptographic Applications



Download 288.56 Kb.
Page8/11
Date28.01.2017
Size288.56 Kb.
#9598
1   2   3   4   5   6   7   8   9   10   11

4.Simulation Test Results


The simulation code was written in C, developed in MS Visual Studio 6. It is available at: http://www.hars.us/SW/ModInv.c. GMP Version 4.1.2, the GNU multi precision arithmetic library [9] was used for the long integer operations and for verifying the results. It is linked as an MS Windows DLL, available also at http://www.hars.us/SW/gmp-dll.zip.

We executed 1 million calls of each of the many variants of the modular inverse algorithms with 14 different lengths in the range of 16…1024 bit random inputs, so the experimental complexity results are expected to be accurate within 0.1…0.3% (Central Limit Theorem) at every operand length. The performed operations and their costs were counted separately for different kind of operations. The table at the end of the paper contains the binary costs of the additions and shifts the corresponding modular inverse algorithms performed, and the number of iterations and the number of shifts with the most frequent lengths. (Multiple shifts are combined together.) The computed curves fit to the data with less than 1% error at any operand length.

The right-shift algorithms are the slowest, because they halve two auxiliary variables (R, S) and if they happen to be odd, m is added or subtracted first, to make them even for the halving. Theoretical arguments and also our computational experiments showed that they are too slow at digit serial arithmetic. They were included in the discussions mainly, because there are surprisingly many systems deployed using some variant of the right-shift algorithm, although others are much better.

The addition steps are not needed in the left-shift- or in the shifting Euclidean algorithms. In all three groups of algorithms the length of U and V decreases bit-by-bit in each iteration, and in the left-shift- and shifting Euclidean algorithms the length of R and S increases steadily from 1. In the right shift case they get very soon as long as m, except in the delayed halving variant. In the average, the changing lengths roughly halve the work on those variables. Also, the necessary additions of m in the original right-shift algorithms prevent aggregation of the shift operations of R and S. On the other hand, in the other algorithms (including the delayed halving right shift algorithm) we can first determine by how many bits we have to shift all together in that phase. In the left-shift algorithms, dependent on the relative magnitude of u and v we need only one or two shifts by multiple bits, in the shifting Euclidean algorithm only one. This shift aggregation saves work at longer shifts than the most common lengths of 1 or 2.



On the other hand, the optimum shift lengths in the left-shift- and shifting Euclidean algorithms are only estimated from the MS bits. They are sometimes wrong, while in the right-shift algorithm only the LS bits play a role, so the optimum shift lengths can always be found exactly. Accordingly, the right-shift algorithms perform slightly fewer iterations (8.6…10%), but the large savings in additions in the other algorithms offset these savings.

4.1Software Running Time Comparisons


We did not measure execution times of SW implementations, because of the following reasons:

  1. The results are very much dependent on the characteristics of the hardware platforms (word length, instruction timings, available parallel instructions, length and function of the instruction pipeline, processor vs. memory speed, cache size and speed, number of levels of cache memory, page fault behavior, etc.).

  2. The results also depend on the operating system (multitasking, background applications, virtual/paging memory handling, etc.).

  3. The results are dependent on the code, the programming language and the compiler. For example, GMP [9] uses hand optimized assembler macros, and any other SW written in a higher level language is necessarily disadvantaged, like at handling carries.

In earlier publications running time measurements were reported, like in [5] Jebelean gave software execution time measurements of GCD algorithms on a DEC computer of RISC architecture. Our measurements on a 3 GHz Intel Pentium PC running Windows XP gave drastically different speed ratios. This large uncertainty was the reason why we decided to count the number of specific operations and sum up their time consumption dependent on the operand lengths, instead of the much easier running time measurements. This way the actual SW running time can be well estimated on many different computational platforms.

4.2Notes on the Simulation Results


  • The number of the different UV shifts, together, is the number of iterations, since there is one combined shift in each iteration.

  • In the left-shift algorithms the sum of RS shifts is larger than the number of iterations, because some shifts may cause the relationship between u and v to change, and in this case there are 2 shifts in one iteration.

  • In [14] there are evidences cited that the binary right shift GCD algorithm performs A·log 2||m|| iterations, with A = 1.0185… The RS1 algorithm performs the same number of iterations as the binary right shift GCD algorithm. Our experiments gave a very similar (only 0.2% smaller) result: A' = 0.7045/log 2 = 1.0164…

In the table at the end of the paper we listed the coefficients of the dominant terms of the best fit polynomials to the time consumption of the algorithms, in 3 typical computational models:

1. Shifts execute in a constant number of clock cycles

Algorithm LS3 is the fastest (0.6662 n2), followed by SE3 (0.6750 n2), with only a 1.3% lag. The best right-shift algorithm is RSDH+−, which is 1.66 times slower (1.1086 n2).



2. Shifts are 4 times faster than add/subtracts

Algorithm SE3 is the fastest (0.8114 n2), followed by LS3 (0.9043 n2), within 14%. The best right-shift algorithm (RSDH+−) is 1.71 times slower (1.3858 n2).



3. Shifts and add/subtracts take the same time

Again, algorithm SE3 is the fastest (1.2176 n2), followed by SE (1.3904 n2), within 14%. The best right-shift algorithm (RSDH+−) is 2.37 times slower (2.8804 n2).

Interestingly the Plus-Minus algorithm RS+−, which only assures that U or V are reduced by at least 2 bits, performs fewer iterations, but the overall running time is not improved. When R and S are also handled this way, the running time improves. It shows that speeding up the (R,S) halving steps is more important than speeding up the (U,V) reduction steps, because the later reduction steps operate on diminishing length numbers, while the (R,S) halving works mostly on more costly, full length numbers.


Directory: Papers
Papers -> From Warfighters to Crimefighters: The Origins of Domestic Police Militarization
Papers -> The Tragedy of Overfishing and Possible Solutions Stephanie Bellotti
Papers -> Prospects for Basic Income in Developing Countries: a comparative Analysis of Welfare Regimes in the South
Papers -> Weather regime transitions and the interannual variability of the North Atlantic Oscillation. Part I: a likely connection
Papers -> Fast Truncated Multiplication for Cryptographic Applications
Papers -> Reflections on the Industrial Revolution in Britain: William Blake and J. M. W. Turner
Papers -> This is the first tpb on this product
Papers -> Basic aspects of hurricanes for technology faculty in the United States
Papers -> Title Software based Remote Attestation: measuring integrity of user applications and kernels Authors

Download 288.56 Kb.

Share with your friends:
1   2   3   4   5   6   7   8   9   10   11




The database is protected by copyright ©ua.originaldll.com 2024
send message

    Main page