
Alpha (AXP) benchmarking

Winnili
Who joined Sept. 15, 2012, 5:37 a.m.
and authored 178 notes

Wrote the following at April 28, 2013, 2:12 a.m...

Anyone interested in participating? As for concrete benchmarks, I have in mind a combination of the OpenSSL speed tests and NBench for Digital/Tru64 UNIX (a.k.a. OSF/1), plus the fairly well-known ‘VUPS’ and Pi-calculation benchmarks for VMS AXP.

From a performance perspective, results for the DECchip 21164/EV5 (including e.g. the 21164A/EV56), the 21264/EV6 (including e.g. the 21264A/EV67, 21264B/EV68AL and 21264C/EV68CB) and the 21364/EV7 would be the most relevant. Most emulators nowadays seem to emulate at least a 21164/EV5.

Make sure to mention what kind of system you benchmarked. For instance, my benchmarks in this post were run on a virtual quad-processor 833-MHz DECchip 21264B/EV68AL AlphaServer ES40 with 32 GB of RAM, running Compaq Tru64 UNIX V5.1B (Rev. 2650) and HP OpenVMS AXP V8.4 (vanilla, with no SYSGEN or SYSUAF tuning of quotas and the like).

To start with, here are the OpenSSL MD5 and RSA speed tests with my own results; in my case it's OpenSSL 0.9.6g as prepackaged with Tru64 UNIX V5.1B:

Code:

  $ openssl speed md5
  Doing md5 for 3s on 8 size blocks: 6324114 md5's in 2.98s
  Doing md5 for 3s on 64 size blocks: 3383849 md5's in 3.00s
  Doing md5 for 3s on 256 size blocks: 1365261 md5's in 3.02s
  Doing md5 for 3s on 1024 size blocks: 481872 md5's in 3.00s
  Doing md5 for 3s on 8192 size blocks: 68543 md5's in 3.00s
  OpenSSL 0.9.6g [engine] 9 Aug 2002
  built on: Thu Aug 15 02:59:23 EDT 2002
  options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx)
  compiler: cc -DTHREADS -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_ASM -DNO_IDEA -DNO_RC5 -DNO_HW_KEYCLIENT -pthread -std1 -O4 -readonly_strings
  The 'numbers' are in 1000s of bytes per second processed.
  type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
  md5              16958.52k    72188.78k   115858.61k   164478.98k   187168.09k
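
As a sanity check on the table above: each figure should be the number of MD5 operations times the block size, divided by the elapsed time and by 1000 (OpenSSL's "1000s of bytes per second"). A minimal Python sketch of that arithmetic, using the counts from my run; the function name is just for illustration:

```python
def md5_throughput_k(ops: int, block_bytes: int, seconds: float) -> float:
    """Throughput in units of 1000 bytes/s, as in OpenSSL's 'speed' table."""
    return ops * block_bytes / seconds / 1000.0

# (operations, block size in bytes, reported seconds) from 'openssl speed md5'
runs = [
    (6324114, 8, 2.98),
    (3383849, 64, 3.00),
    (1365261, 256, 3.02),
    (481872, 1024, 3.00),
    (68543, 8192, 3.00),
]

for ops, size, secs in runs:
    print(f"{size:5d} bytes: {md5_throughput_k(ops, size, secs):12.2f}k")
```

The 64-, 1024- and 8192-byte columns come out identical to OpenSSL's figures; the 8- and 256-byte ones differ slightly (e.g. 16977.49k versus 16958.52k), presumably because the printed times of 2.98s and 3.02s are rounded to two decimals.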

Code:

  $ openssl speed rsa
  Doing 512 bit private rsa's for 10s: 14981 512 bit private RSA's in 10.00s
  Doing 512 bit public rsa's for 10s: 140036 512 bit public RSA's in 10.00s
  Doing 1024 bit private rsa's for 10s: 3358 1024 bit private RSA's in 10.02s
  Doing 1024 bit public rsa's for 10s: 20731 1024 bit public RSA's in 10.00s
  Doing 2048 bit private rsa's for 10s: 589 2048 bit private RSA's in 10.00s
  Doing 2048 bit public rsa's for 10s: 9983 2048 bit public RSA's in 10.00s
  Doing 4096 bit private rsa's for 10s: 101 4096 bit private RSA's in 10.03s
  Doing 4096 bit public rsa's for 10s: 7048 4096 bit public RSA's in 10.00s
  OpenSSL 0.9.6g [engine] 9 Aug 2002
  built on: Thu Aug 15 02:59:23 EDT 2002
  options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,4,long) blowfish(idx)
  compiler: cc -DTHREADS -DDSO_DLFCN -DHAVE_DLFCN_H -DNO_ASM -DNO_IDEA -DNO_RC5 -DNO_HW_KEYCLIENT -pthread -std1 -O4 -readonly_strings
                      sign    verify    sign/s verify/s
  rsa  512 bits   0.0007s   0.0001s   1498.1  14003.6
  rsa 1024 bits   0.0030s   0.0005s    335.2   2073.1
  rsa 2048 bits   0.0170s   0.0010s     58.9    998.3
  rsa 4096 bits   0.0993s   0.0014s     10.1    704.8
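
In the summary table, sign/s and verify/s are the private and public operation counts divided by the elapsed time, and the sign and verify columns are simply their reciprocals (seconds per operation). A small Python sketch that rebuilds the table from the "Doing ..." lines above; the function and variable names are hypothetical:

```python
def rsa_rates(count: int, seconds: float) -> tuple[float, float]:
    """Return (operations per second, seconds per operation)."""
    per_second = count / seconds
    return per_second, 1.0 / per_second

# (key bits, private ops, private secs, public ops, public secs) from the run
results = [
    (512, 14981, 10.00, 140036, 10.00),
    (1024, 3358, 10.02, 20731, 10.00),
    (2048, 589, 10.00, 9983, 10.00),
    (4096, 101, 10.03, 7048, 10.00),
]

for bits, priv, priv_s, pub, pub_s in results:
    sign_rate, sign_time = rsa_rates(priv, priv_s)        # private key = sign
    verify_rate, verify_time = rsa_rates(pub, pub_s)      # public key = verify
    print(f"rsa {bits:4d} bits {sign_time:9.4f}s {verify_time:9.4f}s"
          f" {sign_rate:8.1f} {verify_rate:8.1f}")
```

The 1024-bit sign rate comes out at 335.1 rather than 335.2, again presumably because the reported 10.02s is rounded.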

NBench is based on an old BYTE magazine benchmark; the sources are available and can be found here. Build it with DEC/Compaq C (“cc”) under Digital/Tru64 UNIX; you'll have to edit the makefile for that. If you build it some other way (e.g. with GNU C, if for whatever reason that's all you have at your disposal), please say so. Once it has built successfully, simply run it with or without the “-v” flag, as I did below:

Code:

  $ ./nbench
  BYTEmark* Native Mode Benchmark ver. 2 (10/95)
  Index-split by Andrew D. Balsa (11/97)
  Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
  TEST                : Iterations/sec.  : Old Index   : New Index
                      :                  : Pentium 90* : AMD K6/233*
  --------------------:------------------:-------------:------------
  NUMERIC SORT        :          553.74  :      14.20  :       4.66
  STRING SORT         :          82.289  :      36.77  :       5.69
  BITFIELD            :       3.441e+08  :      59.03  :      12.33
  FP EMULATION        :          30.498  :      14.63  :       3.38
  FOURIER             :           10204  :      11.61  :       6.52
  ASSIGNMENT          :          11.731  :      44.64  :      11.58
  IDEA                :          2884.7  :      44.12  :      13.10
  HUFFMAN             :          953.06  :      26.43  :       8.44
  NEURAL NET          :          6.3537  :      10.21  :       4.29
  LU DECOMPOSITION    :          220.99  :      11.45  :       8.27
  ==========================ORIGINAL BYTEMARK RESULTS==========================
  INTEGER INDEX       : 30.305
  FLOATING-POINT INDEX: 11.068
  Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
  ==============================LINUX DATA BELOW===============================
  CPU                 :
  L2 Cache            :
  OS                  : OSF1 V5.1
  C compiler          : cc
  libc                :
  MEMORY INDEX        : 9.331
  INTEGER INDEX       : 6.460
  FLOATING-POINT INDEX: 6.139
  Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
  * Trademarks are property of their respective holder.

(I configured only one CPU for this test, since NBench shouldn't be SMP/SMT-optimized anyway.)
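
The summary indices can be reproduced from the per-test figures: each appears to be the geometric mean of a group of per-test indices. The grouping used below (the seven non-FP tests for the original integer index; string sort, bitfield and assignment for the memory index; numeric sort, FP emulation, IDEA and Huffman for the Linux-style integer index; Fourier, neural net and LU decomposition for both floating-point indices) is an assumption on my part, but it reproduces the figures above:

```python
from math import prod

def geometric_mean(values: list[float]) -> float:
    """n-th root of the product of n values."""
    return prod(values) ** (1.0 / len(values))

# Per-test indices copied from the NBench run above
old_index = {  # "Old Index" column, Pentium 90 baseline
    "NUMERIC SORT": 14.20, "STRING SORT": 36.77, "BITFIELD": 59.03,
    "FP EMULATION": 14.63, "FOURIER": 11.61, "ASSIGNMENT": 44.64,
    "IDEA": 44.12, "HUFFMAN": 26.43, "NEURAL NET": 10.21,
    "LU DECOMPOSITION": 11.45,
}
new_index = {  # "New Index" column, AMD K6/233 baseline
    "NUMERIC SORT": 4.66, "STRING SORT": 5.69, "BITFIELD": 12.33,
    "FP EMULATION": 3.38, "FOURIER": 6.52, "ASSIGNMENT": 11.58,
    "IDEA": 13.10, "HUFFMAN": 8.44, "NEURAL NET": 4.29,
    "LU DECOMPOSITION": 8.27,
}

# Assumed test groupings (NBench does not print these)
FP_TESTS = ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"]
ORIGINAL_INT_TESTS = [t for t in old_index if t not in FP_TESTS]
MEMORY_TESTS = ["STRING SORT", "BITFIELD", "ASSIGNMENT"]
LINUX_INT_TESTS = ["NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"]

def index_of(table: dict[str, float], tests: list[str]) -> float:
    return geometric_mean([table[t] for t in tests])

print(f"Original INTEGER INDEX       : {index_of(old_index, ORIGINAL_INT_TESTS):.3f}")
print(f"Original FLOATING-POINT INDEX: {index_of(old_index, FP_TESTS):.3f}")
print(f"Linux MEMORY INDEX           : {index_of(new_index, MEMORY_TESTS):.3f}")
print(f"Linux INTEGER INDEX          : {index_of(new_index, LINUX_INT_TESTS):.3f}")
print(f"Linux FLOATING-POINT INDEX   : {index_of(new_index, FP_TESTS):.3f}")
```

The largest gap is about 0.004 (11.072 versus 11.068 for the original floating-point index), which is consistent with the per-test indices being printed to only two decimals.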

Now for the VUPS measurement, here's the DCL procedure source:

Code:

  $! ------------------------------------[TOF]------------------------------------
  $! CALCULATE_VUPS
  $! Use at your own risk.
  $!
  $ SET NOON
  $ cpu_multiplier = 10 ! VAX = 10 - Alpha/AXP = 40
  $ cpu_round_add = 1 ! VAX = 1 - Alpha/AXP = 9
  $ cpu_round_divide = cpu_round_add + 1
  $ init_counter = cpu_multiplier * 525
  $ speed_factor = 1 ! to increase no. of loops on fast CPUs
  $ 9$:
  $ init_loop_maximum = 205 * speed_factor
  $ start_cputime = f$getjpi(0,"CPUTIM")
  $ loop_index = 0
  $ 10$:
  $ loop_index = loop_index + 1
  $ IF loop_index .NE. init_loop_maximum THEN GOTO 10$
  $ end_cputime = f$getjpi(0,"CPUTIM")
  $ IF end_cputime .LE. start_cputime + 1 ! not enough clock-ticks = CPU too fast
  $ THEN
  $   speed_factor = speed_factor + 1 ! increase no. of loops
  $   WRITE SYS$OUTPUT "INFO: Preventing endless loop (10$) on fast CPUs"
  $   GOTO 9$
  $ ENDIF
  $ init_vups = ((init_counter / (end_cputime - start_cputime) + -
      cpu_round_add) / cpu_round_divide) * cpu_round_divide
  $ IF init_vups .LE. 0
  $ THEN
  $   WRITE SYS$OUTPUT "Calibration error -> exiting (Please report this problem)"
  $   SHOW SYMB speed_factor
  $   SHOW SYMB init_vups
  $   SHOW SYMB init_counter
  $   SHOW SYMB end_cputime
  $   SHOW SYMB start_cputime
  $   SHOW SYMB cpu_multiplier
  $   SHOW SYMB cpu_round_add
  $   SHOW SYMB cpu_round_divide
  $   SHOW CPU
  $   EXIT
  $ ENDIF
  $ WRITE SYS$OUTPUT " "
  $ loop_maximum = (init_vups * init_loop_maximum) / ( 10 * speed_factor )
  $ base_counter = (init_counter * init_vups) / 10
  $ vups = 0
  $ min_vups = %X7FFFFFFF
  $ max_vups = 0
  $ avg_vups = 0
  $ times_through_loop = 0
  $ 20$:
  $ start_cputime = f$getjpi(0,"CPUTIM")
  $ times_through_loop = times_through_loop + 1
  $ loop_index = 0
  $ 30$:
  $ loop_index = loop_index + 1
  $ IF loop_index .NE. loop_maximum THEN GOTO 30$
  $ end_cputime = f$getjpi(0,"CPUTIM")
  $ IF end_cputime .LE. start_cputime
  $ THEN
  $   new_vups = 0 ! can not calculate VUPS (CPU too fast)
  $   WRITE SYS$OUTPUT "INFO: Loop too fast (20$) - ignoring VUPS data"
  $ ELSE
  $   new_vups = ((base_counter / (end_cputime - start_cputime) + -
        cpu_round_add) / cpu_round_divide) * cpu_round_divide
  $ ENDIF
  $ IF new_vups .LT. min_vups THEN $ min_vups = new_vups
  $ IF new_vups .GT. max_vups THEN $ max_vups = new_vups
  $ avg_vups = avg_vups + new_vups
  $ IF new_vups .eq. vups THEN GOTO 40$
  $ vups = new_vups
  $ IF times_through_loop .LE. 5 THEN GOTO 20$
  $!! WRITE SYS$OUTPUT "INFO: Preventing endless loop 20$"
  $ 40$:
  $ vups = avg_vups / times_through_loop
  $ write sys$output " Approximate System VUPs Rating : ", -
      vups / 10,".", vups - ((vups / 10) * 10), -
      " ( min: ", min_vups/10,".", min_vups - ((min_vups / 10) * 10), -
      " max: ", max_vups/10,".", max_vups - ((max_vups / 10) * 10), " )"
  $ EXIT
  $! ------------------------------------[EOF]------------------------------------

Save it as VUPS.COM and simply execute it as shown below, once with the VAX settings and once with the Alpha/AXP settings (my results included):

Code:

  $! "cpu_multiplier = 10" & "cpu_round_add = 1"
  $ @VUPS
  Approximate System VUPs Rating : 408.8 ( min: 408.8 max: 408.8 )

Code:

  $! "cpu_multiplier = 40" & "cpu_round_add = 9"
  $ @VUPS
  Approximate System VUPs Rating : 1600.3 ( min: 1594.0 max: 1603.0 )
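
The procedure does all its work in DCL integer arithmetic and keeps ratings in tenths of a VUP, hence the division by 10 when printing. Its rounding step, ((x + cpu_round_add) / cpu_round_divide) * cpu_round_divide, rounds each per-pass result up to a multiple of cpu_round_divide: to an even number of tenths with the VAX settings, and to a whole VUP with the Alpha/AXP settings, which is why the min and max figures above end in .0. The average is not rounded again, so it can end in any digit. A minimal Python sketch of that arithmetic; the function names and the tick count of 13 are hypothetical, chosen only for illustration:

```python
def round_up_to_step(counter: int, ticks: int, round_add: int) -> int:
    """Mimic ((counter / ticks + round_add) / (round_add + 1)) * (round_add + 1).

    DCL integer division truncates; for the positive values used here
    that matches Python's floor division.
    """
    step = round_add + 1
    return ((counter // ticks + round_add) // step) * step

def as_tenths(vups: int) -> str:
    """Print a rating held in tenths the way the procedure's WRITE does."""
    return f"{vups // 10}.{vups - (vups // 10) * 10}"

# Hypothetical tick count, just to show the effect of each setting
for label, multiplier, round_add in [("VAX", 10, 1), ("Alpha/AXP", 40, 9)]:
    counter = multiplier * 525          # init_counter in the procedure
    vups = round_up_to_step(counter, 13, round_add)
    print(label, as_tenths(vups))
```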

Lastly, for the Pi-calculation benchmark, assemble SYS$EXAMPLES:MACRO64$PI.M64 (delivered with VMS), for instance as follows:

Code:

  $ MACRO /ALPHA_AXP /OBJECT=PI SYS$EXAMPLES:MACRO64$PI
  $ LINK PI
  $ RUN PI

(You can also find a compiled and linked executable image, compatible with VMS AXP V7.3-2 and later, here.)

The goal is 40,000 digits, as proposed and used by Migration Specialties. Preferably, time the run with this DCL procedure (e.g. saved as PI.COM):

Code:

  $ T = F$CVTIME(F$TIME(),,"SECONDOFYEAR")
  $ RUN PI
  40000
  $ T = F$CVTIME(F$TIME(),,"SECONDOFYEAR") - T
  $ WRITE SYS$OUTPUT "Computed in ''T' sec"

Execute it as follows (my own result below):

Code:

  $ @PI
  How many digits do you want to compute? Computing PI with 40000 digits
  Computed in 12 sec
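
For a rough point of comparison on other platforms, here is a Python analogue of PI.COM. It is only a sketch: which algorithm MACRO64$PI uses isn't stated here, so this swaps in Machin's formula (pi = 16 arctan(1/5) - 4 arctan(1/239)) evaluated in integer arithmetic, and its timings shouldn't be compared directly with the MACRO-64 results. Like the F$CVTIME difference above, it reports whole seconds; using a monotonic clock also sidesteps the SECONDOFYEAR value resetting at the start of a new year. The function names are hypothetical:

```python
import time

def arctan_inv(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, summed as a Taylor series in integers."""
    total = term = unity // x
    x_squared = x * x
    n, sign = 1, -1
    while term:
        term //= x_squared                  # unity // x**(2n+1), exact floor
        total += sign * (term // (2 * n + 1))
        sign, n = -sign, n + 1
    return total

def pi_digits(digits: int, guard: int = 10) -> str:
    """'3' followed by `digits` decimals; `guard` extra digits absorb truncation."""
    unity = 10 ** (digits + guard)
    pi = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi // 10 ** guard)

def timed_whole_seconds(digits: int) -> tuple[str, int]:
    """Compute the digits and report elapsed time in whole seconds."""
    start = int(time.monotonic())
    result = pi_digits(digits)
    return result, int(time.monotonic()) - start
```

Usage would be along the lines of `digits, secs = timed_whole_seconds(40000)` followed by printing `f"Computed in {secs} sec"`.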

See also this resource as an additional reference.

_________________
:Tezro:

• Offering various remaining systems and parts, several interestingly compatible with both IRIX and OpenVMS (AXP and I64);
• Looking for an SGI O3000 IP59 1-GHz MIPS R16000 quad-processor node board (for a Tezro).

urbancamo
Who joined Jan. 13, 2011, 1:33 p.m.
and authored 96 notes

Wrote the following at April 28, 2013, 3:58 a.m...

AlphaServer 1000A

SLAVE$$ @pi
How many digits do you want to compute? Computing PI with 40000 digits

Computed in 189 sec

SLAVE$$ @vups (with multiplier set to 10 for VAX, cpu_round_add set to 1)

Approximate System VUPs Rating : 85.5 ( min: 82.4 max: 87.6 )

SLAVE$$ @vups (with multiplier set to 40 for Alpha, cpu_round_add set to 9)

Approximate System VUPs Rating : 343.5 ( min: 341.0 max: 350.0 )
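
Both sets of results show the Alpha/AXP settings reporting roughly four times the VAX-settings rating on the same machine. Working through the procedure's arithmetic (and ignoring integer truncation), the calibration value init_vups cancels out of the final new_vups, leaving a rating proportional to cpu_multiplier divided by the CPU time per DCL loop iteration; switching cpu_multiplier from 10 to 40 should therefore scale the result by about 4. A quick Python check against the numbers posted in this thread (the dictionary labels are just descriptive):

```python
# VUPS ratings posted in this thread: (VAX settings, Alpha/AXP settings)
posted = {
    "virtual ES40 (21264B/EV68AL, 833 MHz)": (408.8, 1600.3),
    "AlphaServer 1000A 5/300": (85.5, 343.5),
}

EXPECTED_RATIO = 40 / 10  # cpu_multiplier: Alpha/AXP setting over VAX setting

def ratio(vax_setting: float, alpha_setting: float) -> float:
    return alpha_setting / vax_setting

for system, (vax, alpha) in posted.items():
    print(f"{system}: {ratio(vax, alpha):.2f} (expected about {EXPECTED_RATIO:.0f})")
```

Both ratios land close to 4 (about 3.91 and 4.02), with the remaining spread down to rounding and run-to-run variation.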

_________________

, Fuel, VAXstation 4000/90 x2, VAXstation 4000/60, VAXstation 4000/VLC x2, AlphaServer 1000A, DEC AXP 3000/600 (desktop), DEC AXP 3000/600 x2 (rackmount), DEC AXP 3000/800 (rackmount), AlphaServer 300 4/266, DEC GIGI, Sun Ultra 5, HP ZX6000, DECstation 5000/240, VAXstation 3100s, MVII, Commodore 64 & Flyer, LA75, PP404, Juki 6100, Brother HR10

Winnili

Wrote the following at April 28, 2013, 4:45 a.m...

Thanks for posting those results. What kind of processor and how much RAM do you have in that AlphaServer 1000A? Also, which version of VMS do you run?

You also reminded me that I forgot to run VUPS with both the VAX and the Alpha settings. I just did and updated my original post.


urbancamo

Wrote the following at April 28, 2013, 5:34 a.m...

SLAVE$$ show cpu

System: SLAVE, AlphaServer 1000A 5/300

CPU ownership sets:
   Active               0
   Configure            0

CPU state sets:
   Potential            0
   Autostart            0
   Powered Down         None
   Not Present          None
   Hard Excluded        None
   Failover             None

SLAVE$$ show mem
System Memory Resources on 28-APR-2013 13:32:23.37

Physical Memory Usage (pages):       Total      Free    In Use  Modified
  Main Memory (1024.00MB)           131072    105564     23285      2223

Extended File Cache (Time of last reset: 28-APR-2013 08:12:18.63)
  Allocated (MBytes)         75.44    Maximum size (MBytes)          512.00
  Free (MBytes)               0.05    Minimum size (MBytes)            3.12
  In use (MBytes)            75.39    Percentage Read I/Os              32%
  Read hit rate                23%    Write hit rate                     0%
  Read I/O count            313581    Write I/O count                643384
  Read hit count             72620    Write hit count                     0
  Reads bypassing cache     235898    Writes bypassing cache         639191
  Files cached open            607    Files cached closed               673
  Vols in Full XFC mode          0    Vols in VIOC Compatible mode        4
  Vols in No Caching mode        0    Vols in Perm. No Caching mode       0

SLAVE$$ show sys
OpenVMS V8.3 on node SLAVE 28-APR-2013 13:32:42.29 Uptime 0 05:21:30
  Pid    Process Name    State Pri      I/O       CPU      Page flts  Pages
20200201 SWAPPER         HIB    16        0  0 00:00:03.44         0      0
20200207 CLUSTER_SERVER  HIB    13       14  0 00:00:00.03        83    103
20200208 SHADOW_SERVER   HIB     6        8  0 00:00:00.01        63     82
20200209 CONFIGURE       HIB    10       12  0 00:00:00.04        40     20
2020020A LANACP          HIB    14       83  0 00:00:00.59       122    151
2020020C IPCACP          HIB    10       10  0 00:00:00.01        37     50
2020020D ERRFMT          HIB     8      464  0 00:00:00.52       111    131
2020020E CACHE_SERVER    HIB    16        3  0 00:00:00.01        31     43
2020020F OPCOM           HIB     9     1358  0 00:00:01.19       238     50
20200210 AUDIT_SERVER    HIB    10      398  0 00:00:00.99       129    160
20200211 JOB_CONTROL     HIB    10     1033  0 00:00:00.85        66     96
20200213 QUEUE_MANAGER   HIB    10     1126  0 00:00:01.82       163    202
20200214 SECURITY_SERVER HIB    10      148  0 00:00:03.36       443    393
20200215 ACME_SERVER     HIB     9       70  0 00:00:00.18       429    452 M


Winnili

Wrote the following at May 4, 2013, 7:09 a.m...

Thanks for the additional information.
