PDSTEIN - compute the eigenvectors of a symmetric tridiagonal
matrix in parallel, using inverse iteration
- SUBROUTINE
PDSTEIN(
- N, D, E, M, W, IBLOCK, ISPLIT, ORFAC, Z, IZ, JZ, DESCZ, WORK, LWORK,
IWORK, LIWORK, IFAIL, ICLUSTR, GAP, INFO )
INTEGER INFO, IZ, JZ, LIWORK, LWORK, M, N DOUBLE PRECISION ORFAC
INTEGER DESCZ( * ), IBLOCK( * ), ICLUSTR( * ), IFAIL( * ), ISPLIT( * ),
IWORK( * ) DOUBLE PRECISION D( * ), E( * ), GAP( * ), W( * ), WORK( * ), Z(
* )
PDSTEIN computes the eigenvectors of a symmetric tridiagonal
matrix in parallel, using inverse iteration. The eigenvectors found
correspond to user specified eigenvalues. PDSTEIN does not orthogonalize
vectors that are on different processes. The extent of orthogonalization is
controlled by the input parameter LWORK. Eigenvectors that are to be
orthogonalized are computed by the same process. PDSTEIN decides on the
allocation of work among the processes and then calls DSTEIN2 (modified
LAPACK routine) on each individual process. If insufficient workspace is
allocated, the expected orthogonalization may not be done.
Note : If the eigenvectors obtained are not orthogonal, increase
LWORK and run the code again.
Notes
=====
Each global data object is described by an associated description
vector. This vector stores the information required to establish the mapping
between an object element and its corresponding process and memory
location.
Let A be a generic term for any 2D block cyclicly distributed
array. Such a global array has an associated description vector DESCA. In
the following comments, the character _ should be read as "of the
global array".
NOTATION STORED IN EXPLANATION
--------------- -------------- --------------------------------------
DTYPE_A(global) DESCA( DTYPE_ )The descriptor type. In this case,
DTYPE_A = 1.
CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
the BLACS process grid A is distribu-
ted over. The context itself is glo-
bal, but the handle (the integer
value) may vary.
M_A (global) DESCA( M_ ) The number of rows in the global
array A.
N_A (global) DESCA( N_ ) The number of columns in the global
array A.
MB_A (global) DESCA( MB_ ) The blocking factor used to distribute
the rows of the array.
NB_A (global) DESCA( NB_ ) The blocking factor used to distribute
the columns of the array.
RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
row of the array A is distributed. CSRC_A (global) DESCA( CSRC_ ) The process
column over which the
first column of the array A is
distributed.
LLD_A (local) DESCA( LLD_ ) The leading dimension of the local
array. LLD_A >= MAX(1,LOCr(M_A)).
Let K be the number of rows or columns of a distributed matrix,
and assume that its process grid has dimension p x q.
LOCr( K ) denotes the number of elements of K that a process would receive if
K were distributed over the p processes of its process column.
Similarly, LOCc( K ) denotes the number of elements of K that a process would
receive if K were distributed over the q processes of its process row.
The values of LOCr() and LOCc() may be determined via a call to the ScaLAPACK
tool function, NUMROC:
LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ). An upper bound for these
quantities may be computed by:
LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
P = NPROW * NPCOL is the total number of processes
- N (global input) INTEGER
- The order of the tridiagonal matrix T. N >= 0.
- D (global input) DOUBLE PRECISION
array, dimension (N)
- The n diagonal elements of the tridiagonal matrix T.
- E (global input) DOUBLE PRECISION
array, dimension (N-1)
- The (n-1) off-diagonal elements of the tridiagonal matrix T.
- M (global input) INTEGER
- The total number of eigenvectors to be found. 0 <= M <= N.
- W (global input/global output)
DOUBLE PRECISION array, dim (M)
- On input, the first M elements of W contain all the eigenvalues for which
eigenvectors are to be computed. The eigenvalues should be grouped by
split-off block and ordered from smallest to largest within the block (The
output array W from PDSTEBZ with ORDER='b' is expected here). This array
should be replicated on all processes. On output, the first M elements
contain the input eigenvalues in ascending order.
Note : To obtain orthogonal vectors, it is best if eigenvalues
are computed to highest accuracy ( this can be done by setting ABSTOL to
the underflow threshold = DLAMCH('U') --- ABSTOL is an input parameter
to PDSTEBZ )
- IBLOCK (global input)
INTEGER array, dimension (N)
- The submatrix indices associated with the corresponding eigenvalues in W
-- 1 for eigenvalues belonging to the first submatrix from the top, 2 for
those belonging to the second submatrix, etc. (The output array IBLOCK
from PDSTEBZ is expected here).
- ISPLIT (global input)
INTEGER array, dimension (N)
- The splitting points, at which T breaks up into submatrices. The first
submatrix consists of rows/columns 1 to ISPLIT(1), the second of
rows/columns ISPLIT(1)+1 through ISPLIT(2), etc., and the NSPLIT-th
consists of rows/columns ISPLIT(NSPLIT-1)+1 through ISPLIT(NSPLIT)=N (The
output array ISPLIT from PDSTEBZ is expected here.)
- ORFAC (global input) DOUBLE
PRECISION
- ORFAC specifies which eigenvectors should be orthogonalized. Eigenvectors
that correspond to eigenvalues which are within ORFAC*||T|| of each other
are to be orthogonalized. However, if the workspace is insufficient (see
LWORK), this tolerance may be decreased until all eigenvectors to be
orthogonalized can be stored in one process. No orthogonalization will be
done if ORFAC equals zero. A default value of 10^-3 is used if ORFAC is
negative. ORFAC should be identical on all processes.
- Z (local output) DOUBLE PRECISION
array,
- dimension (DESCZ(DLEN_), N/npcol + NB) Z contains the computed
eigenvectors associated with the specified eigenvalues. Any vector which
fails to converge is set to its current iterate after MAXITS iterations (
See DSTEIN2 ). On output, Z is distributed across the P processes in block
cyclic format.
- IZ (global input) INTEGER
- Z's global row index, which points to the beginning of the submatrix which
is to be operated on.
- JZ (global input) INTEGER
- Z's global column index, which points to the beginning of the submatrix
which is to be operated on.
- DESCZ (global and local
input) INTEGER array of dimension DLEN_.
- The array descriptor for the distributed matrix Z.
- WORK (local workspace/global
output) DOUBLE PRECISION array,
- dimension ( LWORK ) On output, WORK(1) gives a lower bound on the
workspace ( LWORK ) that guarantees the user desired orthogonalization
(see ORFAC). Note that this may overestimate the minimum workspace
needed.
- LWORK (local input)
integer
- LWORK controls the extent of orthogonalization which can be done. The
number of eigenvectors for which storage is allocated on each process is
NVEC = floor(( LWORK- max(5*N,NP00*MQ00) )/N). Eigenvectors corresponding
to eigenvalue clusters of size NVEC - ceil(M/P) + 1 are guaranteed to be
orthogonal ( the orthogonality is similar to that obtained from DSTEIN2).
Note : LWORK must be no smaller than: max(5*N,NP00*MQ00) + ceil(M/P)*N,
and should have the same input value on all processes. It is the minimum
value of LWORK input on different processes that is significant.
If LWORK = -1, then LWORK is global input and a workspace
query is assumed; the routine only calculates the minimum and optimal
size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
PXERBLA.
- IWORK (local
workspace/global output) INTEGER array,
- dimension ( 3*N+P+1 ) On return, IWORK(1) contains the amount of integer
workspace required. On return, the IWORK(2) through IWORK(P+2) indicate
the eigenvectors computed by each process. Process I computes eigenvectors
indexed IWORK(I+2)+1 thru' IWORK(I+3).
- LIWORK (local input)
INTEGER
- Size of array IWORK. Must be >= 3*N + P + 1
If LIWORK = -1, then LIWORK is global input and a workspace
query is assumed; the routine only calculates the minimum and optimal
size for all work arrays. Each of these values is returned in the first
entry of the corresponding work array, and no error message is issued by
PXERBLA.
- IFAIL (global output)
integer array, dimension (M)
- On normal exit, all elements of IFAIL are zero. If one or more
eigenvectors fail to converge after MAXITS iterations (as in DSTEIN), then
INFO > 0 is returned. If mod(INFO,M+1)>0, then for I=1 to
mod(INFO,M+1), the eigenvector corresponding to the eigenvalue W(IFAIL(I))
failed to converge ( W refers to the array of eigenvalues on output ).
ICLUSTR (global output) integer array, dimension (2*P) This
output array contains indices of eigenvectors corresponding to a cluster
of eigenvalues that could not be orthogonalized due to insufficient
workspace (see LWORK, ORFAC and INFO). Eigenvectors corresponding to
clusters of eigenvalues indexed ICLUSTR(2*I-1) to ICLUSTR(2*I), I = 1 to
INFO/(M+1), could not be orthogonalized due to lack of workspace. Hence
the eigenvectors corresponding to these clusters may not be orthogonal.
ICLUSTR is a zero terminated array --- ( ICLUSTR(2*K).NE.0 .AND.
ICLUSTR(2*K+1).EQ.0 ) if and only if K is the number of clusters.
- GAP (global output) DOUBLE
PRECISION array, dimension (P)
- This output array contains the gap between eigenvalues whose eigenvectors
could not be orthogonalized. The INFO/M output values in this array
correspond to the INFO/(M+1) clusters indicated by the array ICLUSTR. As a
result, the dot product between eigenvectors corresponding to the I^th
cluster may be as high as ( O(n)*macheps ) / GAP(I).
- INFO (global output)
INTEGER
- = 0: successful exit
< 0: If the i-th argument is an array and the j-entry had an illegal
value, then INFO = -(i*100+j), if the i-th argument is a scalar and had an
illegal value, then INFO = -i. < 0 : if INFO = -I, the I-th argument
had an illegal value
> 0 : if mod(INFO,M+1) = I, then I eigenvectors failed to converge in
MAXITS iterations. Their indices are stored in the array IFAIL. if
INFO/(M+1) = I, then eigenvectors corresponding to I clusters of
eigenvalues could not be orthogonalized due to insufficient workspace. The
indices of the clusters are stored in the array ICLUSTR.