Anyone interested in participating? In terms of concrete benchmarks, I'm talking about a combination of SSL and NBench for Digital/Tru64 (a.k.a. OSF/1) UNIX and the fairly well-known ‘VUPS’ and Pi-calculation benchmarks for VMS AXP.
The most relevant results, from a performance perspective, would be for the DECchip 21164/EV5 (incl. e.g. 21164A/EV56), 21264/EV6 (incl. e.g. 21264A/EV67, 21264B/EV68AL and 21264C/EV68CB), 21364/EV7, etc. Most emulators nowadays seem to offer at least 21164/EV5 and upwards.
Make sure to mention what kind of system you benchmarked. For instance, the benchmarks in this thread were performed on a virtual quad-processor 833-MHz DECchip 21264B/EV68AL AlphaServer ES40 with 32 Gbytes RAM running Compaq Tru64 UNIX V5.1B (Rev. 2650) and HP OpenVMS AXP V8.4 (vanilla, without SYSGEN, SYSUAF, etc. tuning of quotas and such).
To start with the SSL MD5 and RSA speed tests, here are my own results; in my case it's OpenSSL 0.9.6g as prepackaged with Tru64 UNIX V5.1B:
Code:
$ openssl speed md5
Doing md5 for 3s on 8 size blocks: 6324114 md5's in 2.98s
Doing md5 for 3s on 64 size blocks: 3383849 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 1365261 md5's in 3.02s
Doing md5 for 3s on 1024 size blocks: 481872 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 68543 md5's in 3.00s
OpenSSL 0.9.6g [engine] 9 Aug 2002
built on: Thu Aug 15 02:59:23 EDT 2002
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx)
compiler: cc -DTHREADS -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_ASM -DNO_IDEA -DNO_RC5 -DNO_HW_KEYCLIENT -pthread -std1 -O4 -readonly_strings
The 'numbers' are in 1000s of bytes per second processed.
type 8 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md5 16958.52k 72188.78k 115858.61k 164478.98k 187168.09k
Code:
$ openssl speed rsa
Doing 512 bit private rsa's for 10s: 14981 512 bit private RSA's in 10.00s
Doing 512 bit public rsa's for 10s: 140036 512 bit public RSA's in 10.00s
Doing 1024 bit private rsa's for 10s: 3358 1024 bit private RSA's in 10.02s
Doing 1024 bit public rsa's for 10s: 20731 1024 bit public RSA's in 10.00s
Doing 2048 bit private rsa's for 10s: 589 2048 bit private RSA's in 10.00s
Doing 2048 bit public rsa's for 10s: 9983 2048 bit public RSA's in 10.00s
Doing 4096 bit private rsa's for 10s: 101 4096 bit private RSA's in 10.03s
Doing 4096 bit public rsa's for 10s: 7048 4096 bit public RSA's in 10.00s
OpenSSL 0.9.6g [engine] 9 Aug 2002
built on: Thu Aug 15 02:59:23 EDT 2002
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx)
compiler: cc -DTHREADS -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_ASM -DNO_IDEA -DNO_RC5 -DNO_HW_KEYCLIENT -pthread -std1 -O4 -readonly_strings
sign verify sign/s verify/s
rsa 512 bits 0.0007s 0.0001s 1498.1 14003.6
rsa 1024 bits 0.0030s 0.0005s 335.2 2073.1
rsa 2048 bits 0.0170s 0.0010s 58.9 998.3
rsa 4096 bits 0.0993s 0.0014s 10.1 704.8
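For anyone who wants to sanity-check their own numbers before posting: as far as I can tell, the summary rows are simply the raw counts divided by the elapsed time (times the block size for MD5). Below is a minimal Python sketch of that arithmetic. The function names are my own and hypothetical, not part of OpenSSL, and the regular expression only assumes the line format shown in the output above.

```python
import re

# Matches lines such as:
#   Doing md5 for 3s on 64 size blocks: 3383849 md5's in 3.00s
MD5_LINE = re.compile(
    r"Doing md5 for \d+s on (\d+) size blocks: (\d+) md5's in ([\d.]+)s"
)


def md5_throughput_k(line):
    """Return (block_size, thousands of bytes per second) for one timing line."""
    m = MD5_LINE.search(line)
    if m is None:
        raise ValueError("not an md5 timing line: " + line)
    block_size = int(m.group(1))
    count = int(m.group(2))
    seconds = float(m.group(3))
    return block_size, count * block_size / seconds / 1000.0


def ops_per_second(count, seconds):
    """Operations per second, as in the sign/s and verify/s columns."""
    return count / seconds


if __name__ == "__main__":
    size, k = md5_throughput_k(
        "Doing md5 for 3s on 64 size blocks: 3383849 md5's in 3.00s"
    )
    print(f"{size} bytes: {k:.2f}k")
    print(f"rsa 512 sign/s: {ops_per_second(14981, 10.00):.1f}")
```

For the 64-byte row this reproduces the 72188.78k reported above, and the 512-bit RSA line gives 1498.1 sign/s. Some rows (the 8-byte one, for instance) come out slightly different, because the elapsed time is printed rounded to two decimals.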
NBench is based on an old BYTE magazine benchmark; the sources are available and can be found here. Preferably build with DEC/Compaq C (“cc”) under Digital/Tru64 UNIX; you'll have to edit the makefile for that. If you build with something else (GNU C, say, in case that's all you have at your disposal), please do specify. Once built successfully, simply run it with or without the “-v” flag, like I did below:
Code:
$ ./nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 553.74 : 14.20 : 4.66
STRING SORT : 82.289 : 36.77 : 5.69
BITFIELD : 3.441e+08 : 59.03 : 12.33
FP EMULATION : 30.498 : 14.63 : 3.38
FOURIER : 10204 : 11.61 : 6.52
ASSIGNMENT : 11.731 : 44.64 : 11.58
IDEA : 2884.7 : 44.12 : 13.10
HUFFMAN : 953.06 : 26.43 : 8.44
NEURAL NET : 6.3537 : 10.21 : 4.29
LU DECOMPOSITION : 220.99 : 11.45 : 8.27
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 30.305
FLOATING-POINT INDEX: 11.068
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU :
L2 Cache :
OS : OSF1 V5.1
C compiler : cc
libc :
MEMORY INDEX : 9.331
INTEGER INDEX : 6.460
FLOATING-POINT INDEX: 6.139
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
(I only configured one CPU for this test, since it shouldn't be SMP/SMT optimized anyway.)
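When comparing NBench results it helps to know how the three summary indexes relate to the per-test column. As I understand nbench's index split, each summary index is the geometric mean of a group of per-test "New Index" values. The grouping below is my assumption from memory of the nbench sources rather than something stated in this thread, but it does reproduce the figures above. A small Python sketch:

```python
import math


def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# "New Index" column (AMD K6/233 baseline) from the run above.
NEW_INDEX = {
    "NUMERIC SORT": 4.66,
    "STRING SORT": 5.69,
    "BITFIELD": 12.33,
    "FP EMULATION": 3.38,
    "FOURIER": 6.52,
    "ASSIGNMENT": 11.58,
    "IDEA": 13.10,
    "HUFFMAN": 8.44,
    "NEURAL NET": 4.29,
    "LU DECOMPOSITION": 8.27,
}

# Assumed grouping of tests per summary index (not stated in the thread).
GROUPS = {
    "MEMORY INDEX": ["STRING SORT", "BITFIELD", "ASSIGNMENT"],
    "INTEGER INDEX": ["NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"],
    "FLOATING-POINT INDEX": ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"],
}


def summary_indexes(per_test):
    """Map each summary index name to the geometric mean of its group."""
    return {
        name: geometric_mean([per_test[t] for t in tests])
        for name, tests in GROUPS.items()
    }


if __name__ == "__main__":
    for name, value in summary_indexes(NEW_INDEX).items():
        print(f"{name:<21}: {value:.3f}")
```

The values agree with the reported 9.331 / 6.460 / 6.139 to within rounding of the displayed per-test figures. A practical consequence is that one unusually slow test drags its whole index down more than an arithmetic mean would.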
Now for the VUPS measurement, here's the DCL procedure source:
Code:
$! ------------------------------------[TOF]------------------------------------
$! CALCULATE_VUPS
$! Use at your own risk.
$!
$ SET NOON
$ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
$ cpu_round_add = 1 ! VAX = 1 - Alpha/AXP = 9
$ cpu_round_divide = cpu_round_add + 1
$ init_counter = cpu_multiplier * 525
$ speed_factor = 1 ! to increase no. of loops on fast CPUs
$ 9$:
$ init_loop_maximum = 205 * speed_factor
$ start_cputime = f$getjpi(0,"CPUTIM")
$ loop_index = 0
$ 10$:
$ loop_index = loop_index + 1
$ IF loop_index .NE. init_loop_maximum THEN GOTO 10$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ IF end_cputime .LE. start_cputime + 1 ! not enough clock-ticks = CPU too fast
$ THEN
$ speed_factor = speed_factor + 1 ! increase no. of loops
$ WRITE SYS$OUTPUT "INFO: Preventing endless loop (10$) on fast CPUs"
$ GOTO 9$
$ ENDIF
$ init_vups = ((init_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ IF init_vups .LE. 0
$ THEN
$ WRITE SYS$OUTPUT "Calibration error -> exiting (Please report this problem)"
$ SHOW SYMB speed_factor
$ SHOW SYMB init_vups
$ SHOW SYMB init_counter
$ SHOW SYMB end_cputime
$ SHOW SYMB start_cputime
$ SHOW SYMB cpu_multiplier
$ SHOW SYMB cpu_round_add
$ SHOW CPU
$ EXIT
$ ENDIF
$ WRITE SYS$OUTPUT " "
$ loop_maximum = (init_vups * init_loop_maximum) / ( 10 * speed_factor )
$ base_counter = (init_counter * init_vups) / 10
$ vups = 0
$ min_vups = %X7FFFFFFF
$ max_vups = 0
$ avg_vups = 0
$ times_through_loop = 0
$ 20$:
$ start_cputime = f$getjpi(0,"CPUTIM")
$ times_through_loop = times_through_loop + 1
$ loop_index = 0
$ 30$:
$ loop_index = loop_index + 1
$ IF loop_index .NE. loop_maximum THEN GOTO 30$
$ end_cputime = f$getjpi(0,"CPUTIM")
$ IF end_cputime .LE. start_cputime
$ THEN
$ new_vups = 0 ! can not calculate VUPS (CPU too fast)
$ WRITE SYS$OUTPUT "INFO: Loop too fast (20$) - ignoring VUPS data"
$ ELSE
$ new_vups = ((base_counter / (end_cputime - start_cputime) + -
cpu_round_add) / cpu_round_divide) * cpu_round_divide
$ ENDIF
$ IF new_vups .LT. min_vups THEN $ min_vups = new_vups
$ IF new_vups .GT. max_vups THEN $ max_vups = new_vups
$ avg_vups = avg_vups + new_vups
$ IF new_vups .eq. vups THEN GOTO 40$
$ vups = new_vups
$ IF times_through_loop .LE. 5 THEN GOTO 20$
$!! WRITE SYS$OUTPUT "INFO: Preventing endless loop 20$"
$ 40$:
$ vups = avg_vups / times_through_loop
$ write sys$output " Approximate System VUPs Rating : ", -
vups / 10,".", vups - ((vups / 10) * 10), -
" ( min: ", min_vups/10,".", min_vups - ((min_vups / 10) * 10), -
" max: ", max_vups/10,".", max_vups - ((max_vups / 10) * 10), " )"
$ EXIT
$! ------------------------------------[EOF]------------------------------------
Simply execute it as shown below (my results included):
Code:
$! "cpu_multiplier = 10" & "cpu_round_add = 1"
$ @VUPS
Approximate System VUPs Rating : 408.8 ( min: 408.8 max: 408.8 )
Code:
$! "cpu_multiplier = 40" & "cpu_round_add = 9"
$ @VUPS
Approximate System VUPs Rating : 1600.3 ( min: 1594.0 max: 1603.0 )
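A note on reading those numbers: the procedure works in integer tenths of a VUPS and rounds each measurement up to a multiple of cpu_round_add + 1, which is why min and max land on multiples of 0.2 (VAX settings) or 1.0 (Alpha/AXP settings) while the average does not have to. The Python sketch below mirrors just that rounding and formatting arithmetic from the DCL source. The function names are mine, and the input 15931 is a made-up value for illustration, not a measurement.

```python
def round_up_to_step(value, cpu_round_add):
    """Mirror of the DCL expression
        ((value + cpu_round_add) / cpu_round_divide) * cpu_round_divide
    using integer division, with cpu_round_divide = cpu_round_add + 1.
    For positive values this rounds up to the next multiple of the step.
    """
    cpu_round_divide = cpu_round_add + 1
    return ((value + cpu_round_add) // cpu_round_divide) * cpu_round_divide


def format_tenths(vups):
    """Mirror of the final WRITE: whole part, a dot, then the tenths digit."""
    return f"{vups // 10}.{vups - (vups // 10) * 10}"


if __name__ == "__main__":
    # Hypothetical raw value of 1593.1 VUPS (15931 tenths), Alpha/AXP settings.
    rounded = round_up_to_step(15931, 9)
    print(format_tenths(rounded))
    # The average is printed without further rounding.
    print(format_tenths(16003))
```

So when comparing results, small min/max differences below the rounding step are not meaningful.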
Lastly, the Pi-calculation benchmark: assemble SYS$EXAMPLES:MACRO64$PI.M64 (delivered with VMS), for instance as follows:
Code:
$ MACRO /ALPHA_AXP /OBJECT=PI SYS$EXAMPLES:MACRO64$PI
$ LINK PI
$ RUN PI
(You can also find a compiled and linked executable image, compatible with VMS AXP V7.3-2 and upward, here.)
40,000 digits is the goal, as proposed and used by Migration Specialties. Preferably use this DCL procedure:
Code:
$ T = F$CVTIME(F$TIME(),,"SECONDOFYEAR")
$ RUN PI
40000
$ T = F$CVTIME(F$TIME(),,"SECONDOFYEAR") - T
$ WRITE SYS$OUTPUT "Computed in ''T' sec"
Execute it as such; my own result is below:
Code:
$ @PI
How many digits do you want to compute? Computing PI with 40000 digits
Computed in 12 sec
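Keep in mind that the timing procedure above subtracts two whole-second SECONDOFYEAR values, so the true elapsed time can be off by just under a second either way. On fast systems that matters. Here is a rough Python sketch of the resulting range, assuming only that both timestamps are truncated to whole seconds; the function names are hypothetical.

```python
def elapsed_bounds(measured_seconds):
    """Range of true elapsed times consistent with a difference of two
    timestamps that were each truncated to whole seconds."""
    return max(0, measured_seconds - 1), measured_seconds + 1


def digits_per_second_bounds(digits, measured_seconds):
    """Lowest and highest digits-per-second rate consistent with the reading.
    A reading of 0 or 1 seconds leaves the upper bound undefined, so it is
    reported as infinity."""
    shortest, longest = elapsed_bounds(measured_seconds)
    fastest = float("inf") if shortest == 0 else digits / shortest
    return digits / longest, fastest


if __name__ == "__main__":
    low, high = digits_per_second_bounds(40000, 12)
    print(f"between {low:.1f} and {high:.1f} digits/s")
```

For the 12-second run above, that works out to somewhere between roughly 3077 and 3636 digits per second, so results that differ by a single second should be treated as equal.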
See also this resource as an additional reference.