Connection between phase retrieval, seriation and ranking

Get Complete Project Material File(s) Now! »

Phase Retrieval for Imaging Problems

Chapter abstract: We study convex relaxation algorithms for phase retrieval on imaging problems. We show that exploiting structural assumptions on the signal and the observations, such as sparsity, smoothness or positivity, can significantly speed-up convergence and im-prove recovery performance. We detail numerical results in molecular imaging experiments simulated using data from the Protein Data Bank (PDB).
The material of this part is based on the following publication:
F. Fogel, I. Waldspurger, A. d’Aspremont, Phase retrieval for imaging problems. To appear in Mathematical Programming Computation.

Introduction

Phase retrieval seeks to reconstruct a complex signal, given a number of observations on the magnitude of linear measurements, i.e., solve
find x
such that (2.1)
jAxj = b
in the variable x 2 Cp, where A 2 Rn p and b 2 Rn. This problem has direct applications in X-ray and crystallography imaging, diffraction imaging, Fourier optics or microscopy for example, in problems where physical limitations mean detectors usually capture the intensity of observations but cannot recover their phase. In what follows, we will focus on problems arising in diffraction imaging, where A is usually a Fourier transform, often composed with one or multiple masks. The Fourier structure, through the FFT, considerably speeds up basic linear operations, which allows us to solve large scale convex relaxations on realistically large imaging problems. We will also observe that in many of the imaging problems we consider, the Fourier transform is very sparse, with known support (we lose the phase but observe the magnitude of Fourier coefficients), which allows us to considerably reduce the size of our convex phase retrieval relaxations.
Because the phase constraint jAxj = b is non-convex, the phase recovery problem (2.1) is non-convex. Several greedy algorithms have been developed (see Gerchberg and Saxton, 1972; Fienup, 1982; Griffin and Lim, 1984; Bauschke et al., 2002, among others), which alternate pro-jections on the range of A and on the non-convex set of vectors y such that jyj = jAxj. While empirical performance is often good, these algorithms can stall in local minima. A convex relax-ation was introduced in (Chai et al., 2011) and (Candes et al., 2015a) (who call it PhaseLift) by observing that jAxj2 is a linear function of X = xx , which is a rank one Hermitian matrix, us-ing the classical lifting argument for non-convex quadratic programs developed in (Shor, 1987; Lovász and Schrijver, 1991). The recovery of x is thus expressed as a rank minimization prob-lem over positive semidefinite Hermitian matrices X satisfying some linear conditions, i.e., a matrix completion problem. This last problem has received a significant amount of attention be-cause of its link to compressed sensing and the NETFLIX collaborative filtering problem. This minimum rank matrix completion problem is approximated by a semidefinite program which has been shown to recover x for several (random) classes of observation operators A (Candes et al., 2013, 2014, 2015a).
On the algorithmic side, (Waldspurger et al., 2015) showed that the phase retrieval problem (2.1) can be reformulated in terms of a single phase variable, which can be read as an extension of the MAXCUT combinatorial graph partitioning problem over the unit complex torus, allowing fast algorithms designed for solving semidefinite relaxations of MAXCUT to be applied to the phase retrieval problem.
On the experimental side, phase recovery is a classical problem in Fourier optics for example (Goodman, 2008), where a diffraction medium takes the place of a lens. This has direct applica-tions in X-ray and crystallography imaging, diffraction imaging or microscopy (Harrison, 1993; Bunk et al., 2007; Johnson et al., 2008; Miao et al., 2008; Dierolf et al., 2010).
Here, we implement and study several efficient convex relaxation algorithms for phase retrieval on imaging problem instances where A is based on a Fourier operator. We show in particu-lar how structural assumptions on the signal and the observations (e.g., sparsity, smoothness, positivity, known support, oversampling, etc.) can be exploited to both speed-up convergence and improve recovery performance. While no experimental data is available from diffraction imaging problems with multiple randomly coded illuminations, we simulate numerical exper-iments of this type using molecular density information from the protein data bank (Berman et al., 2002). Our results show in particular that the convex relaxation is stable and that in some settings, as few as two random illuminations suffice to reconstruct the image.
This chapter is organized as follows. Section 3.4 briefly recalls the structure of some key algo-rithms used in phase retrieval. Section 2.3 describes applications to imaging problems and how structural assumptions can significantly reduce the cost of solving large-scale instances and im-prove recovery performance. Section 4.6 details some numerical experiments while Section 2.6 describes the interface to the numerical library developed for these problems.
Notations
We write Sp (resp. Hp) the cone of symmetric (resp. Hermitian) matrices of dimension p ; S+p (resp. H+p) denotes the set of positive symmetric (resp. Hermitian) matrices. We write k kp the Schatten p-norm of a matrix, that is the p-norm of the vector of its eigenvalues (in particular, k k1 is the spectral norm). We write Ay the (Moore-Penrose) pseudoinverse of a matrix A, and A B the Hadamard (or componentwise) product of the matrices A and B. For x 2 Rp, we write diag(x) the matrix with diagonal x. When X 2 Hp however, diag(X) is the vector containing the diagonal elements of X. For X 2 Hp, X is the Hermitian transpose of X, with X = (X) . refers to the complex modulus. When x is a vector in C , x is the vector with coefficients (jx1j; : : : ; jxpj). Finally, we write b2 the vector with components b2i, i = 1; : : : ; n.

Algorithms

In this section, we briefly recall several basic algorithmic approaches to solve the phase re-trieval problem (2.1). Early methods were all based on extensions of an alternating projection algorithm. However, recent results showed that phase retrieval could be interpreted as a matrix completion problem similar to the NETFLIX problem, a formulation which yields both efficient convex relaxations and recovery guarantees.

Greedy algorithms

The phase retrieval problem (2.1) can be rewritten minimize kAx yk22 (2.2) subject to jyj = b; where we now optimize over both phased observations y 2 Cn and signal x 2 Cp. Several greedy algorithms attempt to solve this problem using variants of alternating projections, one iteration minimizing the quadratic error (the objective of (2.2)), the next normalizing the moduli (to satisfy the constraint). We detail some of the most classical examples in the paragraphs that follow.
The algorithm Gerchberg-Saxton in (Gerchberg and Saxton, 1972) for instance seeks to recon-struct y = Ax and alternates between orthogonal projections on the range of A and normaliza-tion of the magnitudes jy j to match the observations b. The cost per iteration of this method is minimal but convergence (when it happens) is often slow.
Algorithm 5 Gerchberg-Saxton.
Input: An initial y1 2 F,
1: for k = 1; : : : ; N 1
i.e., such that jy1j = b.
do
2: Set
yik+1 = bi (AAyyk)i ; i = 1; : : : ; n: (Gerchberg-Saxton)
j(AAyyk)ij
3: end for Output: yN 2 F.
A classical “input-output” variant, detailed here as algorithm Fienup, introduced in (Fienup, 1982), adds an extra penalization step which usually speeds up convergence and improves re-covery performance when additional information is available on the support of the signal. Over-sampling the Fourier transform forming A in imaging problems usually helps performance as well. Of course, in all these cases, convergence to a global optimum cannot be guaranteed but empirical recovery performance is often quite good.
Algorithm 6 Fienup
Input: An initial y1 2 F, i.e., such that jy1j = b, a parameter > 0.
1: for k = 1; : : : ; N 1 do
2: Set
wi = (AAyyk)i ; i = 1; : : : ; n:
j(AAyyk)ij
3:Set
yik+1 = yik (yik biwi) (Fienup)
4: end for Output: yN 2 F.

PhaseLift: semidefinite relaxation in signal

Using a classical lifting argument by (Shor, 1987), and writing
jai xj2 = b2i () Tr(aiai xx ) = b2i
(Chai et al., 2011; Candes et al., 2015a) reformulate the phase recovery problem (2.1) as a matrix completion problem, written
minimize Rank(X)
subject to Tr(aiai X) = bi2; i = 1; : : : ; n
X 0;
in the variable X 2 Hp, where X = xx when exact recovery occurs. This last problem can be relaxed as
minimize Tr(X)
subject to Tr(aiai X) = bi2; i = 1; : : : ; n (PhaseLift)
X 0;
which is a semidefinite program (called PhaseLift by Candes et al. (2015a)) in the variable X 2 Hp. This problem is solved in (Candes et al., 2015a) using first-order algorithms implemented in (Becker et al., 2011). This semidefinite relaxation has been shown to recover the true signal x exactly for several classes of observation operators A (Candes et al., 2015a, 2013, 2014).

PhaseCut: semidefinite relaxation in phase

As in (Waldspurger et al., 2015) we can rewrite the phase reconstruction problem (2.1) in terms of a phase variable u (such that juj = 1) instead of the signal x. In the noiseless case, we then write the constraint jAxj = b as Ax = diag(b)u, where u 2 Cn is a phase vector, satisfying juij = 1 for i = 1; : : : ; n, so problem (2.1) becomes
minimize kAx diag(b)uk22 (2.3)
subject to juij = 1
where we optimize over both phase u 2 Cn and signal x 2 Cp. While the objective of this last problem is jointly convex in (x; u), the phase constraint juij = 1 is not.
Now, given the phase, signal reconstruction is a simple least squares problem, i.e., given u we obtain x as x = Ay diag(b)u (2.4) where Ay is the pseudo inverse of A. Replacing x by its value in problem (2.3), the phase recovery problem becomes
minimize u M u
subject to (2.5)
juij = 1; i = 1; : : : ; n;
in the variable u 2 Cn, where the Hermitian matrix
M = diag(b)(I AAy) diag(b)
is positive semidefinite. This problem is non-convex in the phase variable u. Waldspurger et al. (2015) detailed greedy algorithm Greedy to locally optimize (2.5) in the phase variable.
Algorithm 7 Greedy algorithm in phase.
Input: An initial u 2 Cn such that juij = 1, i = 1; : : : ; n. An integer N > 1.
1: for k = 1; : : : ; N do
2: for i = 1; : : : ; n do
3: Set
Mjiuj
ui = Pj6=iMjiuj (Greedy)
Pj6=i
4: end for
5: end for
Output: u 2 Cn such that juij = 1, i = 1; : : : ; n.
A convex relaxation to (2.5) was also derived in (Waldspurger et al., 2015) using the classi-cal lifting argument for non-convex quadratic programs developed in (Shor, 1987; Lovász and Schrijver, 1991). This relaxation is written
minimize Tr(U M)
(PhaseCut)
subject to diag(U) = 1; U 0;
which is a semidefinite program (SDP) in the matrix U 2 Hn. This problem has a structure similar to the classical MAXCUT relaxation and instances of reasonable size can be solved using specialized implementations of interior point methods designed for that problem (Helmberg et al., 1996). Larger instances are solved in (Waldspurger et al., 2015) using the block-coordinate descent algorithm BlockPhaseCut.
Ultimately, algorithmic choices heavily depend on problem structure, and these will be dis-cussed in detail in the section that follows. In particular, we will study how to exploit structural information on the signal (nonnegativity, sparse 2D Fast Fourier Transform, etc.), to solve real-istically large instances formed in diffraction imaging applications.
2.3 Imaging problems
In the imaging problems we study here, various illuminations of a single object are performed through randomly coded masks, hence the matrix A is usually formed using a combination
Chapter II. Phase Retrieval for Imaging Problems 39
Algorithm 8 Block Coordinate Descent Algorithm for PhaseCut.
Input: An initial U0 = In and > 0 (typically small). An integer N > 1.
1: for k = 1; : : : ; N do
2: Pick i 2 [1; n].
3: Compute
u = Ukc ;i c Mic;i and = u Mic;i (BlockPhaseCut)
i
4:If > 0, set r
Uikc+1;i = Ui;ik+1c = 1 x
else = 0:
Ukc+1 = Uk+1c
i ;i i;i
5: end for
Output: A matrix U 0 with diag(U) = 1.
of random masks and Fourier transforms, and we have significant structural information on both the signal we seek to reconstruct (regularity, etc.) and the observations (power law decay in frequency domain, etc.). Many of these additional structural hints can be used to speedup numerical operations, convergence and improve phase retrieval performance. The paragraphs that follow explore these points in more detail.

READ STANDING CONTACT FATIGUE TESTS

Fourier operators

In practical applications, because of the structure of the linear operator A, we may often reduce numerical complexity, using the Fourier structure of A to speedup the single matrix-vector prod-uct in algorithm BlockPhaseCut. We detail the case where A corresponds to a Fourier transform combined with k random masks, writing I1; :::; Ik 2 Cp the illumination masks. The image by A of some signal x 2 Cp is then and the pseudo-inverse of A also has a simple structure, with
y1 ! = k
Ay y…k F 1(yl) Il0
Xl
=1
where Il0 is the dual filter of Il, which is
!
X
Il0 = Il= jIsj2 :
s
With the fast Fourier transform, computing the image of a vector by A or Ay only requires O(kp log(p)) floating-point operations. For any v 2 Cn, M v = diag(b)(I AAy) diag(b)v may then be computed using O(kp log p) operations instead of O(k2p2) for naive matrix-vector multiplications.
In algorithms Greedy and BlockPhaseCut, we also need to extract quickly columns from M without having to store the whole matrix. Extracting the column corresponding to index i in block l k reduces to the computation of AAy i;l where i;l 2 Ckp is the vector whose coor-dinates are all zero, except the i-th one of l-th block. If we write i 2 Cp the Dirac in i, the preceding formulas yields
0 1
i ? F(I1 Il0)
AAy i;l = B … C :
B C
@ A
i ? F(Ik Il0)
Convolution with i is only a shift and vectors F(Is Il0) may be precomputed so this operation is very fast.

Low rank iterates

In instances where exact recovery occurs, the solution to the semidefinite programming re-laxation (PhaseCut) has rank one. It is also likely to have low rank in a neighborhood of the optimum. This means that we can often store a compressed version of the iterates U in algorithm BlockPhaseCut in the form of their low rank approximation U = V V where V 2 Cn k. Each iteration updates a single row/column of U which corresponds to a rank two update of U, hence updating the SVD means computing a few leading eigenvalues of the matrix V V +LL where L 2 Cn 2. This update can be performed using Lanczos type algorithms and has complexity O(kn log n). Compressed storage of U saves memory and also speeds-up the evaluation of the vector matrix product Uic;ic Mic;i which costs O(nk) given a decomposition Uic;ic = V V , instead of O(n2) using a generic representation of the matrix U. We refer to table 2.3 for experimental comparison between full rank and low rank iterates.

Bounded support

In many inverse problems the signal we are seeking to reconstruct is known to be sparse in some basis and exploiting this structural information explicitly usually improves signal recovery performance. This is for example the basis of compressed sensing where ‘1 penalties encourage sparsity and provide recovery guarantees when the true signal is actually sparse.
The situation is a lot simpler in some of the molecular imaging problems we are studying below since the electron density we are trying to recover is often smooth, which means that its Fourier transform will be sparse, with known support. While we lose the phase, we do observe the magnitude of the Fourier coefficients so we can rank them by magnitude. This allows us to considerably reduce the size of the SDP relaxation without losing much reconstruction fidelity, i.e., in many cases we observe that a significant fraction of the coefficients of b are close to zero. From a computational point of view, sparsity in b allows us to solve a truncated semidefinite relaxation (PhaseCut). See Figure 2.1 for an illustration of this phenomenon on the caffeine molecule.
Orig FT OrigFT
FIGURE 2.1: Electronic density for the caffeine molecule (left), its 2D FFT transform (diffrac-tion pattern, center), the density reconstructed using 2% of the coefficients with largest magni-tude in the FFT (right).
Indeed, without loss of generality, we can reorder the observations b such that we approximately have b = (bT1 ; 0)T . Similarly, we note
u = u2! ; A = A2! ;Ay = (Ay)1 (Ay)2 :
u1 A1
Using the fact that b2 = 0, the matrix M in the objective of (2.5) can itself be written in blocks, that is
M = 0 1 0 ! :
M 0
Since b2 = 0, any complex vector with coefficients of magnitude one can be taken for u2 and the optimization problem (2.5) is equivalent to
minimize u1M1u1 (2.6)
subject to ju1i j = 1; i = 1; : : : n;
in the variable u1 2 Cn1 , where the Hermitian matrix
M1 = diag(b1)(I A1(Ay)1) diag(b1)
is positive semidefinite. This problem can in turn be relaxed into a PhaseCut problem which is usually considerably smaller than the original (PhaseCut) problem since M1 is typically a fraction of the size of M.

Real, positive densities

In some cases, such as imaging experiments where a random binary mask is projected on an object for example, we know that the linear observations are formed as the Fourier transform of a positive measure. This introduces additional restrictions on the structure of these observations, which can be written as convex constraints on the phase vector. We detail two different ways of accounting for this positivity assumption.

Direct nonnegativity constraints on the density

In the case where the signal is real and nonnegative, (Waldspurger et al., 2015) show that prob-lem (2.3) can modified to specifically account for the fact that the signal is real, by writing it
min kAx diag(b)uk2;
u2Cn; juij=1; 2
x2Rp
using the operator T ( ) defined as
T(Z)= Im(Z) Re(Z) ! (2.7)
Re(Z) Im(Z)
we can rewrite the phase problem on real valued signal as
minimize T (A)
0 ! diag b ! Im(u) ! 2
x b Re(u) 2
subject to u 2 Cn; juij = 1
x 2 Rp:
The optimal solution of the inner minimization problem in x is given by x = Ay2B2v, where
! ! !
Re(A) b Re(u)
A2 = ; B2 = diag ; and v =
Im(A) b Im(u)
hence the problem is finally rewritten
minimize k(A2A2yB2 B2)vk22
subject to v2 + v2 = 1; i = 1; : : : ; n;
i n+i
in the variable v 2 R2n. This can be relaxed as above by the following problem
minimize Tr(V M2)
subject to Vii + Vn+i;n+i = 1; i = 1; : : : ; n; (PhaseCutR)
V 0;
which is a semidefinite program in the variable V 2 S2n, where
M2 = (A2Ay2B2 B2)T (A2Ay2B2 B2) = B2T (I A2Ay2)B2:
Because x = Ay2B2v for real instances, we can add a nonnegativity constraint to this relaxation, using
xxT = (Ay2B2)uuT (Ay2B2)T
and the relaxation becomes
minimize Tr(V M2)
subject to (A2yB2)V (A2yB2)T 0;
Vii + Vn+i;n+i = 1; i = 1; : : : ; n;
V 0;
which is a semidefinite program in V 2 S2n.

Bochner’s theorem and the Fourier transform of positive measures

Another way to include nonnegativity constraints on the signal, which preserves some of the problem structure, is to use Bochner’s theorem. Recall that a function f : Rs 7!C is positive semidefinite if and only if the matrix B with coefficients Bij = f(xi xj) is Hermitian positive semidefinite for any sequence xi 2 Rs. Bochner’s theorem then characterizes Fourier transforms of positive measures.
Theorem 2.1. (Bochner) A function f : Rs 7!C is positive semidefinite if and only if it is the Fourier transform of a (finite) nonnegative Borel measure.
Proof. See (Berg et al., 1984) for example.
For simplicity, we first illustrate this in dimension one. Suppose that we observe the magnitude of the Fourier transform of a discrete nonnegative signal x 2 Rp so that
jFxj = b
with b 2 Rn. Our objective now is to reconstruct a phase vector u 2 Cn such that juj = 1 and
Fx = diag(b)u:
If we define the Toeplitz matrix
Bij(y) = yji
so that
jj+1; 1 j i p;
y2 yn 1
y1 y2 y2 .. C
y2 y1 . C
then when Fx = diag(b)u, Bochner’s theorem states that B(diag(b)u) 0 if and only if x 0. The contraint B(diag(b)u) 0 is a linear matrix inequality in u, hence is convex.
Suppose that we observe multiple illuminations and that the k masks I1; : : : ; Ik 2 Rp p are also nonnegative (e.g., random coherent illuminations), we have Ax = B @ F(I1 x) …
and the phase retrieval problem (2.5) for positive signals x is now written
minimize u M u
subject to Bj(diag(b)u) 0; j = 1; : : : ; k
juij = 1; i = 1; : : : n;
where Bj(y) is the matrix B(y(j)), where y(j) 2 Cp is the jth subvector of y (one for each of the k masks). We can then adapt the PhaseCut relaxation to incorporate the positivity requirement. In the one dimensional case, using again the classical lifting argument in (Shor, 1987; Lovász and Schrijver, 1991).

Table of contents :

1 Introduction
1.1 Contributions
1.2 Notations
1.3 Phase retrieval, seriation and ranking problems
1.4 Challenges
1.5 Connection between phase retrieval, seriation and ranking
1.6 Standard convex and spectral relaxation techniques
2 Phase Retrieval for Imaging Problems
2.1 Introduction
2.2 Algorithms
2.3 Imaging problems
2.4 Numerical Experiments
2.5 Discussion
2.6 User guide
3 Convex Relaxations for Permutation Problems
3.1 Introduction
3.2 Seriation, 2-SUM & consecutive ones
3.3 Convex relaxations
3.4 Algorithms
3.5 Applications & numerical experiments
3.6 Discussion
3.7 Appendix
4 Spectral Ranking Using Seriation
4.1 Introduction
4.2 Seriation, Similarities & Ranking
4.3 Spectral Algorithms
4.4 Exact Recovery with Corrupted and Missing Comparisons
4.5 Spectral Perturbation Analysis
4.6 Numerical Experiments
4.7 Discussion
4.8 Appendix
5 Conclusion