Timing
From WikiLICC
Current revision as of 02:22, 20 June 2012
Testing vectorization:
 ! http://goparallel.sourceforge.net/optimizing-loops-vectorization/
 program Vectorization
   use portlib
   implicit none
   real(4), dimension(:), allocatable :: x, y, z
   integer :: len = 150*1024*1024   ! 150 Mi elements = 600 MiB per real(4) array
   real(4) :: timing
   integer :: i, j, ierr
   allocate( x(len), stat=ierr )
   allocate( y(len), stat=ierr )
   allocate( z(len), stat=ierr )
   x = 1.0                          ! define the inputs before they are read
   y = 2.0
   do j = 1, 10
     timing = secnds(0.0)           ! start of this repetition
     do i = 1, len
       z(i) = sqrt(x(i)) + sqrt(y(i))
     end do
     timing = secnds(timing)*1000   ! elapsed time in milliseconds
     print *, ' Timing =', timing, '/1000 s'
   end do
 end program
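The listing above relies on secnds, which is a vendor extension. As a hedged alternative, the sketch below times the same loop with the standard system_clock intrinsic. The program name, the variable n, and the smaller array size are assumptions made for illustration; they are not the settings used for the results on this page.

```fortran
! Sketch only: same kernel, timed with the standard system_clock intrinsic.
! The size n is an assumed, smaller value (32 MiB per real(4) array).
program vectorization_clock
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 8*1024*1024
  real(4), allocatable :: x(:), y(:), z(:)
  integer(int64) :: t0, t1, rate
  integer :: i, j, ierr
  real(real64) :: ms

  allocate(x(n), y(n), z(n), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'

  ! Fill the inputs with values in [0,1) so sqrt is well defined.
  call random_number(x)
  call random_number(y)

  ! Clock ticks per second; int64 arguments give the finest resolution.
  call system_clock(count_rate=rate)

  do j = 1, 10
     call system_clock(t0)
     do i = 1, n
        z(i) = sqrt(x(i)) + sqrt(y(i))
     end do
     call system_clock(t1)
     ms = 1000.0_real64*real(t1 - t0, real64)/real(rate, real64)
     ! Printing one element of z keeps the compiler from discarding the loop.
     print '(a,f10.3,a,es12.4)', ' Timing =', ms, ' ms   z(n) =', z(n)
  end do

  deallocate(x, y, z)
end program vectorization_clock
```

Timing several repetitions, as the original does, helps separate the first pass (which may include page-touching costs) from the steady-state passes.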
- Memory: measured with the Windows Performance Monitor
Largest allocatable problem: 150 Mi elements * 3 arrays * 4 bytes = 1.76 GiB = 1.89 GB
 real(4)  = 1.85 GB
 real(8)  = 3.69 GB
 real(16) = 7.39 GB
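The size arithmetic above can be reproduced for each real kind with a few lines of Fortran. This is a minimal sketch: the program name and the variables n, narrays, and elem_bytes are hypothetical, and it computes only the nominal array sizes, not what the Performance Monitor reports.

```fortran
! Sketch only: nominal memory needed by three arrays of 150 Mi elements.
program footprint
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64), parameter :: n = 150_int64*1024*1024  ! elements per array
  integer(int64), parameter :: narrays = 3              ! x, y and z
  integer, parameter :: elem_bytes(3) = [4, 8, 16]      ! real(4), real(8), real(16)
  integer(int64) :: bytes
  integer :: k

  do k = 1, 3
     ! 64-bit arithmetic: the byte count overflows a 32-bit integer.
     bytes = n*narrays*elem_bytes(k)
     print '(a,i2,a,f6.2,a,f6.2,a)', ' real(', elem_bytes(k), '): ', &
          real(bytes, real64)/1024.0_real64**3, ' GiB = ', &
          real(bytes, real64)/1.0e9_real64, ' GB'
  end do
end program footprint
```

For real(4) this gives 1.76 GiB = 1.89 GB, and the figure doubles with each step up in kind.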
- Results real(4):
 Debug (x32)                        2.13 s
 Debug (x64)                        2.00 s
 Release (x32) /O2                  0.143 s
 Release (x64) /O2                  0.140 s <=========
 Release (x64)
 Threshold for vectorization 0      0.140 s
 Threshold for parallelization 0    0.140 s
 /Qvec-                             0.909 s
 /Qvec- /Qparallelization           0.171 s  uses 8 processors
 Inline directive                   0.145 s
 /Ob1 use 4 processors
 /Qvec- /Qparallelization           0.232 s
- Results real(8):
 Release (x64)                      0.461 s
 /Qvec-                             0.911 s
 /Qvec- /Qparallelization           0.356 s
 /Qparallelization                  0.342 s <==========
- real(16): sloooow
 Release (x64)                      7.00 s
 /Qvec-                             7.02 s
 /Qvec- /Qparallelization           1.75 s
 /Qparallelization                  1.75 s