Differences between revisions of "Timing"
From WikiLICC
Current revision as of 02:22, 20 June 2012
Testing vectorization:
 ! http://goparallel.sourceforge.net/optimizing-loops-vectorization/
 program Vectorization
   use portlib
   implicit none
   real(4), dimension(:), allocatable :: x, y, z
   integer :: len = 150*1024*1024        ! 150 Mi elements (157,286,400) per array
   integer :: i, j, ierr
   real(4) :: timing

   allocate( x(len), stat=ierr )
   allocate( y(len), stat=ierr )
   allocate( z(len), stat=ierr )
   x = 1.0                               ! initialize inputs so sqrt reads defined values
   y = 2.0

   do j = 1, 10
     timing = secnds(0.0)
     do i = 1, len
       z(i) = sqrt(x(i)) + sqrt(y(i))
     end do
     timing = secnds(timing)*1000        ! elapsed time in milliseconds
     print *, ' Timing =', timing, '/1000 s'
   end do
 end program
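`secnds` and the `portlib` module are not part of the Fortran standard, so the test may not build with other compilers. The following is a minimal sketch of the same loop timed with the standard `system_clock` intrinsic. The program name, the smaller array size `n`, and the input values are assumptions made for illustration; they are not taken from the original test.

```fortran
program vectorization_portable
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  ! Assumption: a smaller problem than the original 150*1024*1024 elements,
  ! so the sketch also runs on machines with little memory.
  integer, parameter :: n = 16*1024*1024
  real(4), allocatable :: x(:), y(:), z(:)
  integer(int64) :: t0, t1, rate
  integer :: i, j, ierr

  allocate(x(n), y(n), z(n), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'
  x = 1.0
  y = 2.0

  call system_clock(count_rate=rate)     ! clock ticks per second
  do j = 1, 10
    call system_clock(t0)
    do i = 1, n
      z(i) = sqrt(x(i)) + sqrt(y(i))
    end do
    call system_clock(t1)
    ! Printing z(n) keeps the compiler from discarding the loop as dead code.
    print '(a,f10.3,a,es12.4)', ' Timing =', &
      1000.0_real64*real(t1 - t0, real64)/real(rate, real64), ' ms   z(n) =', z(n)
  end do
end program vectorization_portable
```

Timings from this sketch are not comparable with the results on this page unless `n` is set back to the original length.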
- Memory: measured with the Windows Performance Monitor
Largest allocatable problem: 150 Mi elements × 3 arrays × 4 bytes ≈ 1.76 GiB ≈ 1.89 GB
 real(4)  = 1.85 GB
 real(8)  = 3.69 GB
 real(16) = 7.39 GB
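The nominal size of the three arrays can be checked with a few lines of arithmetic. This is a sketch that assumes a `real(k)` element occupies `k` bytes, as the kind numbers used in the test suggest; the program and variable names are illustrative only.

```fortran
program memory_footprint
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  ! Elements per array, as in the test program.
  integer(int64), parameter :: n = 150_int64*1024_int64*1024_int64
  ! Assumption: real(4), real(8) and real(16) take 4, 8 and 16 bytes each.
  integer, parameter :: elem_bytes(3) = [4, 8, 16]
  integer(int64) :: total
  integer :: k

  do k = 1, 3
    total = 3_int64 * n * elem_bytes(k)   ! three arrays: x, y, z
    print '(a,i2,a,f7.3,a,f7.3,a)', ' real(', elem_bytes(k), '): ', &
      real(total, real64)/1024.0_real64**3, ' GiB = ', &
      real(total, real64)/1.0e9_real64, ' GB'
  end do
end program memory_footprint
```

For `real(4)` the nominal size is 1,887,436,800 bytes. The values read from the Performance Monitor are slightly lower than these nominal sizes.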
- Results real(4):
 Debug   (x32)                         2.13 s
 Debug   (x64)                         2.00 s
 Release (x32) /O2                     0.143 s
 Release (x64) /O2                     0.140 s  <=========

 Release (x64)
 Threshold for vectorization 0         0.140 s
 Threshold for parallelization 0       0.140 s
 /Qvec-                                0.909 s
 /Qvec- /Qparallelization              0.171 s  uses 8 processors
 Inline directive                      0.145 s
 /Ob1                                           uses 4 processors
 /Qvec- /Qparallelization              0.232 s
- Results real(8):
 Release (x64)                         0.461 s
 /Qvec-                                0.911 s
 /Qvec- /Qparallelization              0.356 s
 /Qparallelization                     0.342 s  <==========
- real(16): sloooow
 Release (x64)                         7.00 s
 /Qvec-                                7.02 s
 /Qvec- /Qparallelization              1.75 s
 /Qparallelization                     1.75 s
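To compare the three precisions, the timings can be turned into ratios. The sketch below only divides the Release (x64) numbers reported on this page; the program and constant names are illustrative.

```fortran
program speedups
  implicit none
  ! Timings in seconds, copied from the Release (x64) results.
  real, parameter :: r4_default  = 0.140, r4_novec  = 0.909
  real, parameter :: r8_default  = 0.461, r8_novec  = 0.911, r8_par  = 0.342
  real, parameter :: r16_default = 7.00,  r16_novec = 7.02,  r16_par = 1.75

  ! Ratio > 1 means the first configuration is slower than the second.
  print '(a,f6.2)', ' real(4)   /Qvec- vs default            : ', r4_novec  / r4_default
  print '(a,f6.2)', ' real(8)   /Qvec- vs default            : ', r8_novec  / r8_default
  print '(a,f6.2)', ' real(8)   default vs /Qparallelization : ', r8_default / r8_par
  print '(a,f6.2)', ' real(16)  /Qvec- vs default            : ', r16_novec / r16_default
  print '(a,f6.2)', ' real(16)  default vs /Qparallelization : ', r16_default / r16_par
end program speedups
```

Disabling vectorization costs a factor of about 6.5 for real(4) and about 2.0 for real(8), and makes essentially no difference for real(16), where only parallelization helps (a factor of 4.0).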