juRS is developed at Forschungszentrum Jülich. Densities, potentials and Kohn-Sham wave functions are represented on uniform three-dimensional grids, which enables a very straightforward parallelization on distributed-memory architectures. Special features are the approximation of derivatives with a finite-difference (FD) scheme, in particular for the kinetic energy operator, and the accurate modeling of the ionic scattering properties through the Projector Augmented Wave (PAW) method (P. E. Blöchl, PRB 50, 17953 (1994)). The PAW method provides full-potential accuracy and, in particular, enables the calculation of otherwise difficult first-row and transition-metal elements (T. Ono et al., PRB 82, 205115 (2010)).

Although designed for very large systems, juRS supports k-point sampling along spatial directions with periodic boundary conditions. The spectrum at each k-point can be solved independently, so this level parallelizes with close-to-perfect efficiency. However, the special strength of the real-space grid approach is the grid parallelization. Because the effective potential is inherently local and the FD kinetic energy and the PAW potential have a relatively short interaction range, applying the real-space Hamiltonian scales linearly with the system size: it does not involve costly Fourier transforms back and forth between real and reciprocal space. Furthermore, the grid can be parallelized by domain decomposition, which reaches close-to-perfect scaling on massively parallel supercomputers equipped with an (at least) three-dimensional inter-node communication network.

To obtain results for huge systems within reasonable wall-clock times, juRS tackles the cubic scaling of (most) DFT algorithms by adding a third level of parallelism. The distributed storage and treatment of bands is, in contrast to k-point parallelization and domain decomposition, a strongly communicating task.
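To illustrate why a finite-difference kinetic energy operator is short-ranged, here is a minimal one-dimensional sketch in Python. It is not taken from juRS: the function names, the periodic 1D grid and the choice of stencil widths are assumptions made for illustration only. The coefficients are the standard central-difference weights for the second derivative.

```python
import math

# Standard central finite-difference weights for the second derivative,
# keyed by the number of neighbours used on each side (illustrative choice).
FD_COEFFS = {
    1: [-2.0, 1.0],
    2: [-5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0],
    4: [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0],
}


def apply_kinetic(psi, h, nn=4):
    """Apply T = -1/2 d^2/dx^2 to psi on a periodic 1D grid with spacing h.

    Each output value depends on only 2*nn + 1 grid points, so the cost
    grows linearly with the number of grid points.
    """
    c = FD_COEFFS[nn]
    n = len(psi)
    out = []
    for i in range(n):
        acc = c[0] * psi[i]
        for j in range(1, nn + 1):
            # Periodic wrap-around stands in for a halo exchange between domains.
            acc += c[j] * (psi[(i + j) % n] + psi[(i - j) % n])
        out.append(-0.5 * acc / (h * h))
    return out


def max_error(nn, n=32, length=2.0 * math.pi, k=1.0):
    """Largest deviation from the exact result k^2/2 * cos(k x) for psi = cos(k x)."""
    h = length / n
    psi = [math.cos(k * i * h) for i in range(n)]
    t_psi = apply_kinetic(psi, h, nn)
    return max(abs(t - 0.5 * k * k * p) for t, p in zip(t_psi, psi))


if __name__ == "__main__":
    for nn in (1, 2, 4):
        print(nn, max_error(nn))
```

In a domain-decomposed setting, the `nn` outermost grid layers of each domain would be exchanged with the neighbouring domains before the stencil is applied; the modulo indexing above only mimics that step on a single periodic grid.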
The code gains speed as the degree of band parallelization increases from 1 (no band parallelization) to 32 band sets. At 32 band sets and beyond, the SCF iteration speed does not grow further, but the memory per node keeps falling, which may be necessary to hold the KS wave functions in memory and thus avoid I/O bottlenecks. The upper axis of the graph shows the total number of MPI processes used in these calculations; the factor 4096 corresponds to the underlying 16x16x16 domain decomposition of the real-space grids. At the top end of the curve, 300 nodes per atom were running, exploiting 88% of the capacity of the installed IBM BlueGene/P system for a relatively modest system size (875 atoms, GeSbTe).
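The relation between band sets and total process count can be sketched as simple arithmetic. The snippet below assumes that the levels of parallelism multiply, that is, each band set uses its own full 16x16x16 domain grid; the function name, the `kpoint_sets` parameter and the list of band-set counts are hypothetical and chosen only for illustration.

```python
def total_processes(domain_grid, band_sets, kpoint_sets=1):
    """Total MPI processes, assuming the parallelization levels multiply."""
    nx, ny, nz = domain_grid
    return nx * ny * nz * band_sets * kpoint_sets


if __name__ == "__main__":
    domain = (16, 16, 16)  # domain decomposition quoted in the text
    atoms = 875            # GeSbTe system size quoted in the text
    # Band-set counts below are illustrative, not the measured data points.
    for band_sets in (1, 2, 4, 8, 16, 32):
        procs = total_processes(domain, band_sets)
        print(band_sets, procs, round(procs / atoms, 1))
```

Under this assumption, every doubling of the number of band sets doubles the process count while the domain grid, and therefore the grid communication pattern within each band set, stays the same.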
Forschungszentrum Jülich, D-52425 Jülich
Institute for Advanced Simulation
Page last modified: September 13, 2013