Notes for Advanced Algorithms

# Chapter.1 Min-Cut and Max-Cut

## Min-Cut

### Randomized Algorithm

• Karger’s Contraction Algorithm

• RandomContract : while $|V| \gt 2$, choose an edge $e$ uniformly at random and contract it

• $O(n)$ per contraction, $n-2$ contractions in total, so $O(n^2)$ for one run of Karger’s algorithm

• For any multigraph with $n$ vertices, the RandomContract algorithm returns a minimum cut with probability at least $\frac{2}{n(n-1)}$

• Run RandomContract independently for $t=\frac{n(n-1)\ln n}{2}$ times, the probability that a minimum cut is found is at least $1-\frac{1}{n}$

• $O(n^2)$ for 1 run, $O(n^4\log n)$ for all runs ($\text{Pr} \geq 1-\frac{1}{n}$)

• For any graph $G(V,E)$ of $n$ vertices, the number of distinct minimum cuts in $G$ is at most $\frac{n(n-1)}{2}$
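As a concrete illustration, here is a minimal Python sketch of RandomContract, using a union-find structure to track contracted super-vertices (the representation and helper names are my own, not from the notes):

```python
import random

def random_contract(edges, n):
    """One run of Karger's RandomContract on a multigraph with vertices
    0..n-1; returns the multiset of edges crossing the final cut."""
    parent = list(range(n))

    def find(v):                              # union-find representative
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)           # choose e uniformly at random
        ru, rv = find(u), find(v)
        if ru == rv:                          # e became a self-loop; drop self-loops
            edges = [(a, b) for a, b in edges if find(a) != find(b)]
            continue
        parent[ru] = rv                       # contract e
        remaining -= 1
    return [(a, b) for a, b in edges if find(a) != find(b)]
```

Repeating the run $t$ times and keeping the smallest cut found boosts the success probability exactly as analyzed above.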

• Fast Min-Cut

• Random Contract

• FastCut

• $A$ : the minimum cut $C$ survives all contractions in $RandomContract(G,t)$

$B$ : the size of the minimum cut is unchanged after $RandomContract(G,t)$

$\text{Pr}[B]\geq \text{Pr}[A]\geq \frac{t(t-1)}{n(n-1)}\geq \frac{1}{2}$ when $t=\lceil 1+\frac{n}{\sqrt2}\rceil$

• For any multigraph with $n$ vertices, the FastCut algorithm returns a minimum cut with probability $\Omega(\frac{1}{\log n})$ in time $O(n^2\log n)$

• $O(n^2\log n)$ for 1 run, $O(n^2\log^3n)$ for all runs ($\text{Pr} \geq 1-O(\frac{1}{n})$)

## Max-Cut

### Greedy Algorithm

• Greedy Heuristics • Approximation Ratio

• $|E|=\sum_{i=1}^n(|E(S_i,\lbrace v_i\rbrace)|+|E(T_i,\lbrace v_i\rbrace)|)$
• $SOL_G=\sum_{i=1}^n\max (|E(S_i,\lbrace v_i\rbrace)|,|E(T_i,\lbrace v_i\rbrace)|)$
• $SOL_G\geq \frac{|E|}{2}\geq\frac{OPT_G}{2}$
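The greedy heuristic above can be sketched as follows (function and variable names are mine); each vertex $v_i$ is placed on the side opposite to where it has more edges into already-placed vertices:

```python
def greedy_max_cut(n, edges):
    """Greedy heuristic for max-cut: vertex v goes to the side that cuts
    more of its edges to already-placed vertices; SOL_G >= |E|/2."""
    S, T = set(), set()
    for v in range(n):
        to_S = sum(1 for u, w in edges if (u == v and w in S) or (w == v and u in S))
        to_T = sum(1 for u, w in edges if (u == v and w in T) or (w == v and u in T))
        # placing v opposite the side it has more edges into cuts more edges
        (T if to_S >= to_T else S).add(v)
    cut = sum(1 for u, w in edges if (u in S) != (w in S))
    return S, T, cut
```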

### Derandomization by Conditional Expectation

• For every vertex $v\in V$ let $X_v\in{0, 1}$ be a uniform and independent random bit which indicates whether $v$ joins $S$ or $T$.
• $\mathbb{E}[|E(S,T)|]=\sum_{uv\in E}\mathbb{E}[I[X_u\neq X_v]]=\sum_{uv\in E}\text{Pr}[X_u\neq X_v]=\frac{|E|}{2}\geq \frac{OPT_G}{2}$
• Find a solution satisfying the above bound (☆)

### Derandomization by Pairwise Independence

• Use $\log n$ mutually independent random bits; XORing the bits indexed by each nonempty subset yields $n$ pairwise independent bits, so only $O(n)$ seeds need to be enumerated
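A sketch of this derandomization (names are mine): from $b$ truly random bits, XORs over nonempty subsets give $2^b-1$ pairwise independent uniform bits, and since $\mathbb{E}[|E(S,T)|]=|E|/2$ holds under pairwise independence, enumerating all $2^b$ seeds finds a cut of size at least $|E|/2$:

```python
from itertools import product

def pairwise_independent_bits(seed_bits):
    """From b independent uniform bits, produce 2^b - 1 pairwise
    independent uniform bits: one per nonempty subset, via XOR."""
    b = len(seed_bits)
    out = []
    for mask in range(1, 2 ** b):
        x = 0
        for i in range(b):
            if mask >> i & 1:
                x ^= seed_bits[i]
        out.append(x)
    return out

def derandomized_max_cut(n, edges):
    """Try all 2^b seeds (b ~ log2 n) and keep the best cut found."""
    b = 1
    while 2 ** b - 1 < n:
        b += 1
    best = None
    for seed in product([0, 1], repeat=b):
        X = pairwise_independent_bits(list(seed))  # X[v] decides the side of v
        cut = sum(1 for u, w in edges if X[u] != X[w])
        if best is None or cut > best[0]:
            best = (cut, X[:n])
    return best
```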

# Chapter.2 Fingerprinting

## Polynomial Identity Testing (PIT)

• Given as input two polynomials, determine whether they are identical.
• $f\equiv g \Leftrightarrow f-g\equiv 0$

### Deterministic Algorithm

• Evaluate $f(x_1), f(x_2), \dots, f(x_{d+1})$ over $d+1$ distinct elements $x_1, x_2, \dots, x_{d+1}$ from the field $\mathbb{F}$ and check whether they are all zero.

• Fundamental Theorem of Algebra

Any non-zero univariate polynomial of degree $d$ has at most $d$ roots.

### Randomized Algorithm

• $\text{Pr}[f(r)=0]\leq\frac{d}{|S|}$ for $r$ chosen uniformly at random from a finite set $S\subset\mathbb{F}$

## Communication Complexity of Equality

• $EQ(a,b)=\begin{cases}1 & \text{if a=b,} \\ 0 & \text{otherwise.}\end{cases}$

• Theorem (Yao 1979)

Any deterministic communication protocol computing EQ on two $n$-bit strings costs $n$ bits of communication in the worst-case.

• $f(x)=\sum_{i=0}^{n-1}a_ix^i$ and $g(x)=\sum_{i=0}^{n-1}b_ix^i$, defined over finite field $\mathbb{Z}_p=\lbrace 0,1,\dots,p-1\rbrace$ for some suitable $p$

• Communication complexity is bounded by $O(\log p)$; $\text{Pr}[f(r)=g(r)]\leq\frac{n-1}{p}$ when $f\neq g$; choosing $p$ to be a prime in the interval $[n^2,2n^2]$ makes the error probability of a false positive at most $O(\frac{1}{n})$
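A sketch of the protocol in Python (the prime $p=101$ in the test is just an illustrative choice for $n=8$, since $101\in[n^2,2n^2]$):

```python
import random

def poly_eval(bits, r, p):
    """Evaluate f(r) = sum_i a_i r^i mod p by Horner's rule."""
    val = 0
    for a in reversed(bits):
        val = (val * r + a) % p
    return val

def eq_protocol(a_bits, b_bits, p):
    """Alice sends (r, f(r)) -- O(log p) bits; Bob accepts iff g(r) = f(r).
    A false positive occurs w.p. at most (n-1)/p."""
    r = random.randrange(p)
    return poly_eval(a_bits, r, p) == poly_eval(b_bits, r, p)
```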

• Schwartz-Zippel Theorem

Let $f\in\mathbb{F}[x_1,x_2,\dots,x_n]$ be a multivariate polynomial of degree $d$ over a field $\mathbb{F}$. If $f \not\equiv0$, then for any finite set $S \subset\mathbb{F}$, and $r_1,r_2,\dots,r_n\in S$ chosen uniformly and independently at random, $\text{Pr}[f(r_1,r_2,\dots,r_n)=0]\leq\frac{d}{|S|}$.

Proof(★) of induction

Notes :

1. The proof of the base case $n=1$ is based on the fundamental theorem of algebra.
2. It is wrong to directly treat the multivariate polynomial $f(x_1,x_2,\dots,x_n)$ as the univariate polynomial $f_{r_1,r_2,\dots,r_{n-1}}(x_n)$ and claim $\text{Pr}[f_{r_1,r_2,\dots,r_{n-1}}(r_n)=0]\leq\frac{d}{|S|}$, because this univariate polynomial may be identically zero: the event $f_{r_1,r_2,\dots,r_{n-1}}(r_n)=0$ splits into the event that the univariate polynomial $\equiv0$ (all its coefficients vanish) and the event that it is nonzero but $r_n$ hits one of its roots, and only the latter is bounded by $\frac{d}{|S|}$.

## Fingerprinting

• Fingerprints function $FING(\cdot)$
• $FING(\cdot)$ is a function, meaning that if $X=Y$ then $FING(X)=FING(Y)$.
• It is much easier to compute and compare the fingerprints.
• Ideally, the domain of fingerprints is much smaller than the domain of original objects, so storing and comparing fingerprints are easy. This means the fingerprint function $FING(\cdot)$ cannot be an injection (one-to-one mapping), so it’s possible that different $X$ and $Y$ are mapped to the same fingerprint. We resolve this by making fingerprint function randomized, and for $X\neq Y$, we want the probability $\text{Pr}[FING(X)=FING(Y)]$ to be small.

### Fingerprinting by randomized checksum

• Now we consider a new fingerprint function: We treat each input string $x\in\lbrace 0,1\rbrace ^n$ as the binary representation of a number, and let $FING(x)=x\text{ mod }p$ for some random prime $p$ chosen from $[k]=\lbrace 0,1,\dots,k-1\rbrace$.

• The number of bits to be communicated is obviously $O(\log k)$.

• Let $z=x-y$. $\text{Pr}[z\text{ mod }p=0]\leq\frac{\text{the number of prime divisors of }z}{\text{the number of primes in }[k]}$.

• Prime Number Theorem

Let $\pi(k)$ denote the number of primes less than $k$. Then $\pi(k) \sim\frac{k}{\ln k}$ as $k\to\infty$.

• Therefore, by choosing $k=2n^2\ln n$, $0\lt z\lt 2^n$, and a random prime $p\in[k]$, $\text{Pr}[z\text{ mod }p=0]\leq\frac{n}{\pi(k)}\sim\frac{1}{n}$.

$\text{Pr}[FING(x)=FING(y)]\leq \text{Pr}[|x-y|\text{ mod }p=0]\leq\frac{1}{n}$.
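A sketch of the randomized checksum (the prime sampler below uses naive trial division, which is fine for the small $k$ in this illustration):

```python
import math
import random

def random_prime(k):
    """Rejection-sample a uniform random prime from [k]."""
    def is_prime(m):
        if m < 2:
            return False
        return all(m % d for d in range(2, math.isqrt(m) + 1))
    while True:
        p = random.randrange(2, k)
        if is_prime(p):
            return p

def checksum_eq(x, y, n):
    """Compare n-bit numbers x, y via FING(z) = z mod p,
    for a random prime p from [k] with k = 2 n^2 ln n."""
    k = int(2 * n * n * math.log(n))
    p = random_prime(k)
    return x % p == y % p
```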

## Checking distinctness

• Given a sequence $x_1,x_2,\dots,x_n\in\lbrace 1,2,\dots,n\rbrace$, check whether every element of $\lbrace 1,2,\dots,n\rbrace$ appears exactly once.

### Fingerprinting multisets

• $FING(A)=\prod_{i=1}^n(r-a_i)$ for a uniformly random $r$

• Theorem (Lipton 1989)

Let $A=\lbrace a_1,a_2,\dots,a_n\rbrace$ and $B=\lbrace b_1,b_2,\dots,b_n\rbrace$ be two multisets whose elements are from $\lbrace 1,2,\dots,n\rbrace$. If $A\neq B$, then $\text{Pr}[FING(A)=FING(B)]=O(\frac{1}{n})$.

Proof(★)
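A sketch of this multiset fingerprint (the choice $p=10007$ in the test is illustrative):

```python
def multiset_fingerprint(multiset, r, p):
    """FING(A) = prod_i (r - a_i) mod p: invariant under reordering, and
    for two different n-element multisets the fingerprints collide w.p.
    at most n/p over a uniform random r (Schwartz-Zippel, degree n)."""
    val = 1
    for a in multiset:
        val = val * (r - a) % p
    return val
```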

# Chapter.3 Hashing and Sketching

## Distinct Elements

• Counting distinct elements.
• Input : a sequence of (not necessarily distinct) elements $x_1,x_2,\dots, x_n\in\Omega$
• Output : an estimation of the total number of distinct elements $z=|\lbrace x_1,x_2,\dots,x_n\rbrace|$

### Straightforward Method

• Maintain a dictionary data structure, costs $O(n)$ space.
• Due to an information-theoretical argument, linear space is necessary if you want to compute the exact value of $z$.

### $(\epsilon,\delta)$-estimator

• Relax the problem a little bit to significantly reduce the space cost by tolerating approximate answers.
• $\epsilon$ : approximation error & $\delta$ : confidence error

### An estimator by hashing

• Random hash function : $h:\Omega\rightarrow[0,1]$ (uniformly distributed)

• $\mathbb{E}[\min_{1\leq i\leq n}h(x_i)]=\frac{1}{z+1}$

Proof(★)

• $Y=\min_{1\leq i\leq n}h(x_i), \text{Answer}=\frac{1}{Y}-1$

### Flajolet-Martin algorithm

• $k$ random hash functions $h_1,h_2,\dots,h_k:\Omega\rightarrow[0,1]$ (uniformly distributed)

• For any $\epsilon,\delta\lt\frac{1}{2}$, if $k\geq\lceil\frac{4}{\epsilon^2\delta}\rceil$, then the output $\hat Z$ gives an $(\epsilon, \delta)$-estimator of the correct answer $z$.

Proof(★)
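A simulation sketch of Flajolet-Martin (under UHA each random hash $h:\Omega\to[0,1]$ is simulated by a per-element table of fresh uniform reals; parameters in the test are illustrative):

```python
import random

def flajolet_martin(stream, k):
    """Keep the minimum of k independent 'hashes' of the stream, average
    the k minima (whose expectation is 1/(z+1)), and output 1/avg - 1."""
    tables = [dict() for _ in range(k)]   # lazily-built random hash tables
    minima = [1.0] * k
    for x in stream:
        for j in range(k):
            if x not in tables[j]:
                tables[j][x] = random.random()
            minima[j] = min(minima[j], tables[j][x])
    avg = sum(minima) / k
    return 1.0 / avg - 1.0
```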

### Uniform Hash Assumption (UHA)

• Hash functions with discrete values, $h:\Omega\rightarrow[M]$ where $M=\text{poly}(n)$; the hash values are strings of $O(\log n)$ bits.
• By an information-theoretical argument, it takes at least $\Omega(N\log M)$ bits (with $N=|\Omega|$) to represent such a random hash function, because this is the entropy of a uniform random function.

## Set Membership

• space cost + time cost
• sorted table / balanced search tree : Space : $O(n\log N)$ bits & Time : $O(\log n)$
• perfect hashing : Space : $O(n\log N)$ bits & Time : $O(1)$

### Bloom Filter

• The Bloom filter is a space-efficient hash table that solves the approximate membership problem with one-sided error (false positive).
• A Bloom filter consists of an array $A$ of $cn$ bits, and $k$ hash functions $h_1,h_2,\cdots,h_k$ map $\Omega$ to $[cn]$.
• Probability calculation (★)
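A minimal Bloom filter sketch (the per-element seeded RNG simulates the $k$ independent hash functions under UHA; parameter choices in the test are illustrative):

```python
import random

class BloomFilter:
    """Array of cn bits with k hash functions; membership queries have
    one-sided error: no false negatives, occasional false positives."""
    def __init__(self, c, n, k):
        self.m, self.k = c * n, k
        self.bits = [0] * self.m

    def _hashes(self, x):
        rng = random.Random(f"bloom-{x}")     # simulates k random hashes of x
        return [rng.randrange(self.m) for _ in range(self.k)]

    def add(self, x):
        for h in self._hashes(x):
            self.bits[h] = 1

    def query(self, x):
        return all(self.bits[h] for h in self._hashes(x))
```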

## Frequency Estimation

• Problem define

• Data : a sequence of (not necessarily distinct) elements $x_1,x_2,\dots,x_n\in\Omega$
• Query : an element $x\in\Omega$
• Output : an estimation $\hat f_x$ of the frequency $f_x\triangleq |\lbrace i|x_i=x\rbrace|$ of $x$ in input data
• Heavy Hitters : elements $x$ with high frequencies $f_x>\epsilon n$
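For heavy hitters the notes point to the Count-Min sketch below; a minimal sketch of it (the width/depth parameters and the seeded hash simulation are my own choices):

```python
import random

class CountMinSketch:
    """d rows of w counters; the estimate min_j table[j][h_j(x)] never
    underestimates f_x, and each row overestimates by <= n/w in expectation."""
    def __init__(self, w, d):
        self.w, self.d = w, d
        self.table = [[0] * w for _ in range(d)]

    def _h(self, j, x):
        return random.Random(f"cms-{j}-{x}").randrange(self.w)

    def add(self, x):
        for j in range(self.d):
            self.table[j][self._h(j, x)] += 1

    def estimate(self, x):
        return min(self.table[j][self._h(j, x)] for j in range(self.d))
```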

• Count-Min Sketch

# Chapter.4 Balls into bins and Chernoff bound

## Balls into Bins

### Birthday Problem

• There are $m$ students in the class. Assume that each student’s birthday is uniformly and independently distributed over the 365 days in a year. What is the probability that no two students share a birthday?
• Pigeonhole Principle : obviously there must be two students with the same birthday if $m \gt 365$.
• Birthday Paradox : surprisingly, for any $m \gt 57$ two students share a birthday with more than 99% probability.
• Model : $m$ different balls (students) are uniformly and independently thrown into 365 bins (days). More generally, let $n$ be the number of bins.
• $\mathcal{E}$ : there is no bin with more than one ball (i.e. no two students share a birthday).
• $\text{Pr}[\mathcal{E}]=\frac{\binom{n}{m}m!}{n^m}=\prod_{k=1}^{m-1}(1-\frac{k}{n})\approx e^{-\frac{m^2}{2n}}$. For $m=\sqrt{2n\ln \frac{1}{\epsilon}}$, $\text{Pr}[\mathcal{E}]\approx\epsilon$.
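The exact product and the $e^{-m^2/(2n)}$ approximation can be checked numerically:

```python
import math

def birthday_no_collision(m, n=365):
    """Pr[E] = prod_{k=1}^{m-1} (1 - k/n): no two of m people share a birthday."""
    p = 1.0
    for k in range(1, m):
        p *= 1 - k / n
    return p
```

For $m=23$ the probability drops just below $\frac{1}{2}$, and for $m=57$ below 1%, matching the birthday paradox.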

### Coupon Collector

• We keep throwing balls one-by-one into $n$ bins (coupons), such that each ball is thrown into a bin uniformly and independently at random. Each ball corresponds to a box of chocolate, and each bin corresponds to a type of coupon. Thus, the number of boxes bought to collect $n$ coupons is just the number of balls thrown until none of the $n$ bins is empty.

• Theorem :

Let $X$ be the number of balls thrown uniformly and independently to $n$ bins until no bin is empty. Then $\mathbb{E}[X]=nH(n)$, where $H(n)$ is the $n$-th harmonic number.

$H(n)=\ln n+O(1)$, the expected number of coupons required to obtain all $n$ types of coupons is $n\ln n+O(n)$.

Proof(★)

• Let $X$ be the number of balls thrown uniformly and independently to $n$ bins until no bin is empty. Then $\text{Pr}[X\geq n\ln n+cn]\lt e^{-c}$ for any $c \gt 0$.
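A quick simulation confirms the $nH(n)$ expectation (the trial counts below are arbitrary choices):

```python
import random

def coupon_collector(n, rng):
    """Number of uniform independent throws into n bins until no bin is empty."""
    seen, thrown = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        thrown += 1
    return thrown
```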

### Occupancy Problem

• An easy analysis shows that for every bin $i$, the expected load $\mathbb{E}[X_i]$ is equal to the average load $\frac{m}{n}$.

• With High Probability (w.h.p.) : an event $A$ happens with high probability if $\text{Pr}[A]\gt 1-\frac{1}{n^c}$, where $n$ is the problem size and $c\gt 1$ is a constant.

• The distribution of the maximum load.

• Theorem : Suppose that $n$ balls are thrown independently and uniformly at random into $n$ bins. For $1\leq i \leq n$, let $X_i$ be the random variable denoting the number of balls in the $i$-th bin. Then $\text{Pr}[\max_{1\leq i\leq n}X_i \geq \frac{3\ln n}{\ln\ln n}] \lt \frac{1}{n}$.

Proof(★)

• Formally, it can be proved that for $m=\Omega(n\log n)$, with high probability, the maximum load is within $O(\frac{m}{n})$, which is asymptotically equal to the average load.

## Chernoff Bound

### Moment generating functions

• The more we know about the moments of a random variable $X$, the more information we would have about $X$. There is a so-called moment generating function, which “packs” all the information about the moments of $X$ into one function.

• Definition : The moment generating function of a random variable $X$ is defined as $\mathbb{E}[e^{\lambda X}]$ where $\lambda$ is the parameter of the function.

By Taylor’s expansion and the linearity of expectation, $\mathbb{E}[e^{\lambda X}]=\mathbb{E}[\sum_{k=0}^\infty \frac{\lambda^k}{k!}X^k]=\sum_{k=0}^\infty \frac{\lambda^k}{k!}\mathbb{E}[X^k]$.

### The Chernoff bound

• Let $X=\sum_{i=1}^n X_i$, where $X_1,X_2,\cdots,X_n$ are independent Poisson trials. Let $\mu=\mathbb{E}[X]$.

• Standard Forms :

• Upper Tail : for any $\delta\gt 0$, $\text{Pr}[X\geq (1+\delta)\mu]\leq (\frac{e^\delta}{(1+\delta)^{1+\delta}})^\mu$.
• Lower Tail : for any $0 \lt \delta \lt 1$, $\text{Pr}[X \leq (1-\delta)\mu] \leq (\frac{e^{-\delta}}{(1-\delta)^{1-\delta}})^\mu$.
• Proof(★)

Idea : apply Markov’s inequality to $e^{\lambda X}$, and for the rest estimate the moment generating function $\mathbb{E}[e^{\lambda X}]$. To make the bound as tight as possible, minimize $\frac{e^{(e^{\lambda}-1)\mu}}{e^{\lambda (1+\delta)\mu}}$ by setting $\lambda = \ln (1 + \delta)$, which can be justified by taking the derivative with respect to $\lambda$.

• Useful Forms :

• for $0 \lt \delta \leq 1$,

$\text{Pr}[X \geq (1+\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{3})$

$\text{Pr}[X \leq (1-\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{2})$

• for $t \geq 2e\mu$,

$\text{Pr}[X \geq t] \leq 2^{-t}$

## Hoeffding

• Let $X$ be a real-valued random variable with expected value $\mathbb{E}[X]=0$, and such that $a \leq X \leq b$ almost surely. Then for all $\lambda \in \mathbb{R}$, $\mathbb{E}[e^{\lambda X}]\leq \exp(\frac{\lambda^2(b-a)^2}{8})$.

• General case of bounded random variables :
• $\overline{X}=\frac{1}{n}\sum_{i=1}^n X_i$, $a_i \leq X_i \leq b_i$
• $P(\overline{X}-\mathbb{E}[\overline{X}] \geq t) \leq \exp(-\frac{2n^2t^2}{\sum_{i=1}^n(b_i-a_i)^2})$
• $P(|\overline{X}-\mathbb{E}[\overline{X}]| \geq t) \leq 2\exp(-\frac{2n^2t^2}{\sum_{i=1}^n(b_i-a_i)^2})$

# Chapter.5 Concentration of Measure

## Conditional Expectations

• conditional expectation of a random variable $Y$ with respect to an event $\mathcal{E}$ :

$\mathbb{E}[Y\mid \mathcal{E}]=\sum_y y\text{Pr}[Y=y \mid \mathcal{E}]$

• In particular, if the event $\mathcal{E}$ is $X=a$, the conditional expectation $\mathbb{E}[Y \mid X=a]$ defines a function $f(a)=\mathbb{E}[Y \mid X=a]$. Thus, $\mathbb{E}[Y \mid X]$ can be regarded as a random variable $f(X)$.

• Proposition (fundamental facts about conditional expectation) :

Let $X,Y,Z$ be arbitrary random variables. Let $f, g$ be arbitrary functions.

• $\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]$
• $\mathbb{E}[X \mid Z]=\mathbb{E}[\mathbb{E}[X \mid Y,Z] \mid Z]$
• $\mathbb{E}[f(X)g(X, Y)]=\mathbb{E}[f(X) \cdot \mathbb{E}[g(X,Y) \mid X]]$

## Martingales

• Definition :

A sequence of random variables $X_0, X_1, \cdots$ is a martingale if for all $i \gt 0$, $\mathbb{E}[X_i \mid X_0,\cdots, X_{i-1}]=X_{i-1}$

• Definition of martingale’s general version :

A sequence of random variables $Y_0, Y_1, \cdots$ is a martingale with respect to the sequence $X_0, X_1, \cdots$ if, for all $i \gt 0$, the following conditions hold :

• $Y_i$ is a function of $X_0,X_1,\cdots, X_i$
• $\mathbb{E}[Y_{i+1} \mid X_0,\cdots, X_i] = Y_i$.

## Azuma’s Inequality

• Azuma’s Inequality :

Let $X_0, X_1, \cdots$ be a martingale such that, for all $k \geq 1$, $|X_k-X_{k-1}| \leq c_k$,

Then $\text{Pr}[|X_n-X_0| \geq t] \leq 2 \exp(-\frac{t^2}{2\sum_{k=1}^n c_k^2})$.

• Corollary :

Let $X_0, X_1, \cdots$ be a martingale such that, for all $k \geq 1$, $|X_k-X_{k-1}| \leq c$,

Then $\text{Pr}[|X_n-X_0| \geq ct\sqrt{n}] \leq 2 \exp(-\frac{t^2}{2})$.

• Azuma’s Inequality ( general version ) :

Let $Y_0, Y_1, \cdots$ be a martingale with respect to the sequence $X_0, X_1, \cdots$ such that, for all $k \geq 1$, $|Y_k-Y_{k-1}| \leq c_k$.

Then, $\text{Pr}[|Y_n-Y_0| \geq t] \leq 2 \exp(-\frac{t^2}{2\sum_{k=1}^n c_k^2})$.

Proof(★)

Lemma : Let $X$ be a random variable such that $\mathbb{E}[X] = 0$ and $|X| \leq c$. Then for $\lambda \gt 0$, $\mathbb{E}[e^{\lambda X}] \leq \exp(\frac{\lambda^2 c^2}{2})$.

## The Doob Martingales

• Definition :

The Doob sequence of a function $f$ with respect to a sequence of random variables $X_1, \cdots, X_n$ is defined by

$Y_i = \mathbb{E}[f(X_1, \cdots, X_n) \mid X_1, \cdots, X_i], 0 \leq i \leq n$.

In particular, $Y_0 = \mathbb{E}[f(X_1, \cdots, X_n)]$ and $Y_n=f(X_1, \cdots, X_n)$.

• See in Wiki and Slides :

• edge exposure martingale
• vertex exposure martingale
• Chromatic number

Let $G \sim G(n,p)$. Let $\chi(G)$ be the chromatic number of $G$. Then $\text{Pr}[|\chi(G) - \mathbb{E}[\chi(G)]| \geq t \sqrt{n}] \leq 2 \exp(-t^2 / 2)$.

Proof(☆) ( $Y_i = \mathbb{E}[\chi(G) \mid X_1, \cdots, X_i]$, $|Y_i - Y_{i-1}| \leq 1$ )

• Hoeffding’s Inequality

Let $X=\sum_{i=1}^n X_i$, where $X_1, \cdots, X_n$ are independent random variables with $a_i \leq X_i \leq b_i$ for each $1 \leq i \leq n$.

Let $\mu = \mathbb{E}[X]$. Then $\text{Pr}[|X-\mu| \geq t] \leq 2 \exp(-\frac{t^2}{2 \sum_{i=1}^n (b_i-a_i)^2})$.

Proof(☆) ( Use the Doob Martingale, $Y_i = \mathbb{E}[\sum_{j=1}^n X_j \mid X_1, \cdots, X_i]$ )

## The Bounded Difference Method

• For arbitrary random variables :

Let $X=(X_1, \cdots, X_n)$ be arbitrary random variables and let $f$ be a function of $X_1, \cdots, X_n$ satisfying that,

for all $1 \leq i \leq n$, $|\mathbb{E}[f(X) \mid X_1, \cdots, X_i] - \mathbb{E}[f(X) \mid X_1, \cdots, X_{i-1}]| \leq c_i$,

Then $\text{Pr}[|f(X)-\mathbb{E}[f(X)]| \geq t] \leq 2 \exp(-\frac{t^2}{2 \sum_{i=1}^n c_i^2})$.

Proof(☆) ( $Y_i = \mathbb{E}[f(X_1, \cdots, X_n) \mid X_1, \cdots, X_i]$, use Azuma’s Inequality )

• For independent random variables :

• Definition (Lipschitz condition) :

A function $f(x_1, \cdots, x_n)$ satisfies the Lipschitz condition, if for any $x_1, \cdots, x_n$ and any $y_i$,

$|f(x_1, \cdots, x_{i-1}, x_i, x_{i+1}, \cdots, x_n) - f(x_1, \cdots, x_{i-1}, y_i, x_{i+1}, \cdots, x_n)| \leq 1$.

• Definition (Lipschitz condition, general version) :

A function $f(x_1, \cdots, x_n)$ satisfies the Lipschitz condition, if for any $x_1, \cdots, x_n$ and any $y_i$,

$|f(x_1, \cdots, x_{i-1}, x_i, x_{i+1}, \cdots, x_n) - f(x_1, \cdots, x_{i-1}, y_i, x_{i+1}, \cdots, x_n)| \leq c_i$.

• Corollary ( Method of bounded differences ) :

Let $X=(X_1, \cdots, X_n)$ be $n$ independent random variables and let $f$ be a function satisfying the Lipschitz condition with constants $c_i$, $1 \leq i \leq n$.

Then $\text{Pr}[|f(X)-\mathbb{E}[f(X)]| \geq t] \leq 2 \exp(-\frac{t^2}{2 \sum_{i=1}^n c_i^2})$.

Proof(★)

• Applications (See in Wiki and Slides).

# Chapter.6 Dimension Reduction and Nearest Neighbor Search

## Dimension Reduction

• Definition :

• Input : n points $\vec{x_1}, \vec{x_2}, \cdots, \vec{x_n} \in \mathbb{R}^d$
• Output : n points $\vec{y_1}, \vec{y_2}, \cdots, \vec{y_n} \in \mathbb{R}^k$
• Require : $\forall i, j : (1-\epsilon) |\vec{x_i}-\vec{x_j}| \leq |\vec{y_i}-\vec{y_j}| \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |$
• Usually we want $k \ll d$.
• The JLT ( Johnson-Lindenstrauss Theorem 1984 )

Euclidean Distance

$\forall 0 \lt \epsilon \lt 1$, for any set $S$ of $n$ points from $\mathbb{R}^d$, there is a $\phi : \mathbb{R}^d \to \mathbb{R}^k$ with $k \in O(\epsilon^{-2} \cdot \log n)$, such that $\forall \vec{x_i}, \vec{x_j} \in S : (1-\epsilon) |\vec{x_i}-\vec{x_j}|_2^2 \leq |\phi(\vec{x_i})-\phi(\vec{x_j})|_2^2 \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |_2^2$.

Easy to find $\phi$, just sample a random $k \times d$ matrix $A$.

$\text{Pr}[ \forall \vec{x_i}, \vec{x_j} \in S : (1-\epsilon) |\vec{x_i}-\vec{x_j}|_2^2 \leq |A\vec{x_i}-A\vec{x_j}|_2^2 \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |_2^2 ] \geq 1-O(\frac{1}{n})$.

• How to construct (sample) $A$?

• Choose some suitable $k \in O(\epsilon^{-2} \cdot \log n)$.
• For each entry $a_{i,j}$ of $A \in \mathbb{R}^{k \times d}$, independently sample $a_{i,j}$ from the Gaussian distribution $N(0,\frac{1}{k})$.
• For each $l \in [n]$ : let $\vec{y_l} = A \vec{x_l}$.
• Consider $(1-\epsilon) |\vec{x_i}-\vec{x_j}|_2^2 \leq |A\vec{x_i}-A\vec{x_j}|_2^2 \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |_2^2 \Leftrightarrow 1-\epsilon \leq | A\frac{\vec{x_i}-\vec{x_j}}{| \vec{x_i}-\vec{x_j} |_2} |_2^2 \leq 1+\epsilon$

$\frac{\vec{x_i}-\vec{x_j}}{| \vec{x_i}-\vec{x_j} |_2}$ is a unit vector!

• Union bound over $O(n^2)$ pairs of $\vec{x_i}, \vec{x_j} \in S$ : For any unit vector $\vec{u} \in \mathbb{R}^d : \text{Pr}[| | A\vec{u} |^2_2 -1| \gt \epsilon] \lt \frac{1}{n^3}$
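A stdlib-only sketch of sampling $A$ with i.i.d. $N(0,\frac{1}{k})$ entries and projecting (the dimensions and point counts in the test are illustrative; with so few points, a distortion check at $\epsilon=0.25$ passes comfortably):

```python
import math
import random

def jl_project(points, k, rng):
    """Map each x in R^d to Ax in R^k, where A has i.i.d. N(0, 1/k) entries."""
    d = len(points[0])
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * x[j] for j in range(d)) for row in A] for x in points]
```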

• Definition :

• Setup : metric space ($X$, dist)
• Data : $n$ points $\vec{y_1}, \vec{y_2}, \cdots, \vec{y_n} \in X$
• Query : given a point $\vec{x} \in X$, find the $\vec{y}$ which is closest to $\vec{x}$
• c-ANN (Approximate Nearest Neighbor) :

• Return a $\vec{y_i}$ s.t. $\text{dist}(\vec{x}, \vec{y_i}) \leq c \cdot \min_{1 \leq j \leq n} \text{dist}(\vec{x}, \vec{y_j})$
• (c,r)-ANN (Approximate Near Neighbor) :

• Return a $\vec{y_i}$ s.t. $\text{dist}(\vec{x}, \vec{y_i}) \leq cr$ if $\exists \vec{y_j} : \text{dist}(\vec{x}, \vec{y_j}) \leq r$
• Return “NO” if $\forall \vec{y_j} : \text{dist}(\vec{x}, \vec{y_j}) \gt r$
• Arbitrary answer otherwise
• Hamming Space based on $\lbrace 0, 1\rbrace ^d$ and $\lbrace 0, 1\rbrace ^k$

• For $i = 1, 2, \cdots, n$ : let $\vec{z_i} = A\vec{y_i} \in \lbrace 0, 1 \rbrace^k$ on finite field GF(2).

• Store all $s$-balls $B_s(\vec{u}) = \lbrace\vec{y_i} \mid \text{dist}(\vec{u}, \vec{z_i}) \lt s \rbrace$ for all $\vec{u} \in \lbrace 0, 1 \rbrace^k$

• Now, upon a query $\vec{x} \in \lbrace 0, 1 \rbrace^d$ :

Retrieve $B_s(A\vec{x})$

If $B_s(A\vec{x}) = \emptyset$, return “no”; else return any $\vec{y_i} \in B_s(A\vec{x})$.

• Space : $O(n \cdot 2^k)$

Query Time : $O(dk)$ computation + $O(1)$ memory access

• For suitable $k \in O(\log n), p, s$; $\forall \vec{x}, \vec{y} \in \lbrace 0, 1 \rbrace^d$ :

$\text{dist}(\vec{x}, \vec{y}) \leq r \Rightarrow \text{Pr}[\text{dist}(A\vec{x}, A\vec{y}) \gt s] \leq \frac{1}{n^2}$

$\text{dist}(\vec{x}, \vec{y}) \gt cr \Rightarrow \text{Pr}[\text{dist}(A\vec{x}, A\vec{y}) \leq s] \leq \frac{1}{n^2}$

(c,r)-ANN is solved w.h.p.

• More general form, using the Chernoff bound :

Let independent r.v.s $X_1, X_2, \cdots, X_k \in \lbrace 0,1 \rbrace$, let $X = \sum_{i=1}^k X_i$; then for $s \gt 0$ :

$\text{Pr}[X \geq \mathbb{E}[X] + s] \leq \exp(- \frac{2s^2}{k}) \leq \exp(- \Omega(k))$

$\text{Pr}[X \leq \mathbb{E}[X] - s] \leq \exp(- \frac{2s^2}{k}) \leq \exp(- \Omega(k))$

## Locality-Sensitive Hashing

• Definition :

Given a metric space $(X, \text{dist})$,

a random $h : X \to U$ drawn from $\mathcal{H}$ is an (r, cr, p, q)-LSH if, for all $\vec{x}, \vec{y} \in X$ :

$\text{dist}(\vec{x}, \vec{y}) \leq r \Rightarrow \text{Pr}[h(\vec{x}) = h(\vec{y})] \geq p$

$\text{dist}(\vec{x}, \vec{y}) \gt cr \Rightarrow \text{Pr}[h(\vec{x}) = h(\vec{y})] \leq q$

• We hope $p \gt q$.

• If There exists an (r, cr, p, q)-LSH $h : X \to U$,

then there exists an (r, cr, $p^k$, $q^k$)-LSH $g : X \to U^k$

• Independently draw $h_1, h_2, \cdots, h_k$ according to the distribution of $h$

$g(x) = ( h_1(x), h_2(x), \cdots, h_k(x) ) \in U^k$

•  If the real answer is “no” : always correct.

If there exists $\vec{y_s}$ such that $\text{dist}(\vec{x}, \vec{y_s}) \leq r$, then $\text{Pr}[\text{answer “no”}] \leq \text{Pr}_1 + \text{Pr}_2$

$\text{Pr}_1 = \text{Pr}[\forall j, g_j(\vec{x}) \neq g_j(\vec{y_s})] \leq (1 - p^*)^l \leq 1/e$

$\text{Pr}_2 = \text{Pr}[\text{at least } 10l \text{ bad } \vec{y_i} \text{ with dist}(\vec{x}, \vec{y_i}) \gt cr \text{ yet } \exists j \text{ s.t. } g_j(\vec{x}) = g_j(\vec{y_i})] \leq \frac{\mathbb{E}[\text{number of such bad } \vec{y_i}]}{10l} \leq \frac{n \cdot l \cdot (1/n)}{10l} = 0.1$

$\text{Pr}_1 + \text{Pr}_2 \leq 1/e + 0.1 \lt 0.5$

Space : $O(nl) = O(n / p^*)$, Time : $O(l \log n) = O((\log n) / p^*)$, where $l = O(1/p^*)$

# Chapter.7 Greedy and Local Search

## Set Cover

• Set Cover Problem :
• Input : $m$ subsets $S_1, S_2, \cdots, S_m \subseteq U$ of universe $U$ of size $n$.
• Output : the smallest $C \subseteq \lbrace 1, 2, \cdots, m \rbrace$ such that $U=\bigcup_{i \in C} S_i$.
• Hitting Set Problem :
• Input : $n$ subsets $S_1, S_2, \cdots, S_n \subseteq U$ of universe $U$ of size $m$.
• Output : the smallest subset $C \subseteq U$ of elements such that $C$ intersects with every $S_i$ for $1 \leq i \leq n$.

### Frequency and Vertex Cover

• $\text{frequency}(x) = | \lbrace i \mid x \in S_i \rbrace |$.
• Vertex Cover Problem :
• Input : an undirected graph $G(V, E)$
• Output : the smallest $C \subseteq V$ such that every edge $e \in E$ is incident to at least one vertex in $C$.
• The vertex cover problem is NP-Hard.

### Greedy Algorithm

• Greedy Cover Analyze :

• Notations :

• We enumerate all elements of the universe $U$ as $x_1, x_2, \cdots, x_n$, in the order in which they are covered in the algorithm.

• For $t = 1, 2, \cdots,$ let $U_t$ denote the set of uncovered elements in the beginning of the $t$-th iteration of the algorithm.

• For the $k$-th element $x_k$ covered, suppose that it is covered by $S_i$ in the $t$-th iteration, define

$\text{price}(x_k) = 1 / |S_i \cap U_t|$

to be the average “price” to cover element $x_k$ in the algorithm.

• Lemma 1 :

For the set cover $C$ returned by the algorithm, $|C|=\sum_{k=1}^n \text{price}(x_k)$.

• Lemma 2 :

For each $x_k$, $\text{price}(x_k) \leq \frac{\text{OPT}}{n-k+1}$, where $\text{OPT}$ is the size of the optimal set cover.

Proof ( ★ )

• Combine Lemma1 and Lemma2 :

$|C|=\sum_{k=1}^n \text{price}(x_k) \leq \sum_{k=1}^n \frac{\text{OPT}}{n-k+1} = \sum_{k=1}^n \frac{\text{OPT}}{k} = H_n \cdot \text{OPT}$.

$H_n$ denotes the $n$-th Harmonic number, $H_n = \ln n + O(1)$.

• Theorem :

For any set cover instance $S_1, S_2, \cdots, S_m \subseteq U$ with an optimal set cover of size $\text{OPT}$, GreedyCover returns a set cover $C$ with $|C| \leq H_n \cdot \text{OPT}$, where $n=|U|$ is the size of the universe and $H_n \simeq\ln n$ is the $n$-th Harmonic number.
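GreedyCover itself is short (a sketch; the names are mine):

```python
def greedy_cover(universe, sets):
    """Repeatedly pick the set covering the most still-uncovered elements;
    the returned cover has size at most H_n * OPT."""
    uncovered = set(universe)
    C = []
    while uncovered:
        i = max(range(len(sets)), key=lambda j: len(sets[j] & uncovered))
        if not sets[i] & uncovered:
            raise ValueError("no set cover exists")
        C.append(i)
        uncovered -= sets[i]
    return C
```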

### Primal-Dual Algorithm

• Definition :

• Theorem ( Weak Duality ) :

For every matching $M$ and every vertex cover $C$ for $S_1, S_2, \cdots, S_m \subseteq U$, it holds that $|M| \leq |C|$.

Proof

• Corollary :

Let $S_1, S_2, \cdots, S_m \subseteq U$ be an instance for set cover, and $\text{OPT}$ the size of the optimal set cover.

For every matching $M$ for $S_1, S_2, \cdots, S_m$, it holds that $|M| \leq \text{OPT}$.

• Dual Cover :

## Scheduling

### Background

• Pre-Definition :

• There are $n$ jobs to be processed.
• There are $m$ identical parallel machines. Each machine can start processing jobs at time $0$ and can process at most one job at a time.
• Each job $j = 1, 2, \cdots, n$ must be processed on one of these machines for $p_j$ time units without interruption. $p_j$ is called the processing time of job $j$.
• Minimum Makespan on Identical Parallel Machines( load balancing version ) :

• Input :

$n$ positive integers $p_1, p_2, \cdots, p_n$ and a positive integer $m$

• Output :

an assignment $\sigma : [n] \to [m]$ which minimizes $C_{\text{max}}=\max_{i \in [m]} \sum_{j:i=\sigma(j)}p_j$.

• Partition Problem ($m = 2$) :

Input : a set of $n$ positive integers $S=\lbrace x_1, x_2, \cdots, x_n \rbrace$

Determine whether there is a partition of $S$ into $A$ and $B$ such that $\sum_{x \in A} x = \sum_{x \in B} x$.

### Graham’s List algorithm

• The List algorithm ( Graham 1966 )

Input : a list of jobs $j=1,2, \cdots, n$ with processing times $p_1, p_2, \cdots, p_n$.

for $j=1, 2, \cdots, n$, assign job $j$ to the machine that currently has the smallest load.

It is well known that the List algorithm has approximation ratio $(2 - \frac{1}{m})$.

• Theorem :

For every instance of scheduling $n$ jobs with processing times $p_1, p_2, \cdots, p_n$ on $m$ parallel identical machines, the List algorithm finds a schedule with makespan $C_{\text{max}} \leq (2 - \frac{1}{m}) \cdot \text{OPT}$, where $\text{OPT}$ is the makespan of the optimal schedule.

Proof(★)
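A heap-based sketch of the List algorithm (names mine); the second test instance is a classic tight case where List achieves exactly $(2-\frac{1}{m})\cdot\text{OPT}$:

```python
import heapq

def list_schedule(p, m):
    """Graham's List algorithm: assign each job, in list order, to the
    machine with the currently smallest load; returns (makespan, assignment)."""
    heap = [(0, i) for i in range(m)]        # (load, machine) pairs
    assignment = [0] * len(p)
    for j, pj in enumerate(p):
        load, i = heapq.heappop(heap)        # least-loaded machine
        assignment[j] = i
        heapq.heappush(heap, (load + pj, i))
    return max(load for load, _ in heap), assignment
```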

Another natural idea for solving optimization problems is the local search. Given an instance of optimization problem, the principle of local search is as follows:

• We start with a feasible solution.
• At each step, we try to improve the solution by modifying it locally (changing the assignment of a constant number of variables), until nothing can be further improved (thus a local optimum is reached).

See at wiki

### Longest Processing Time ( LPT )

• Definition :

Input : a list of jobs $j=1,2,\cdots, n$ with processing times $p_1 \geq p_2 \geq \cdots \geq p_n$ in non-increasing order.

for $j=1,2,\cdots,n$, assign job $j$ to the machine that currently has the smallest load.

• Analysis ( See at wiki )

See at wiki

# Assignments

## Problem Set 1

### Solution 1

• My answer is “no”.

• We can easily find that :

• The number of distinct min-cuts in a connected acyclic graph (a tree) is $O(n)$.
• The number of distinct min-cuts in a cycle graph with $n$ vertices and $n$ edges is $\frac{n(n-1)}{2}$.
• These examples show that $O(\log n)$ cannot be an upper bound on the number of distinct min-cuts in an arbitrary multigraph $G$ with $n$ vertices.

• We know that $p(n)=\text{Pr}[FastCut(G)\text{ returns a min-cut in }G]=\Omega(\frac{1}{\log n})$, and for any fixed min-cut $C$ it is obvious that $p_C=\text{Pr}[C\text{ is returned by }FastCut(G)]\leq \text{Pr}[FastCut(G)\text{ returns a min-cut in }G]$.

• Since $p_{\text{correct}}$ is a well-defined probability, by the unitarity of probability it must hold that $p_{\text{correct}}\leq 1$.

• Therefore, the only facts we know are

• $1 \geq p_{\text{correct}}=\sum_{C\in\mathcal{C}}p_C$
• for each $C\in\mathcal{C}$, $p_C \leq \text{Pr}[FastCut(G)\text{ returns some min-cut}]$, and only this total probability is bounded below by $\Omega(\frac{1}{\log n})$
• we cannot get an upper bound on the number of distinct min-cuts in an arbitrary multigraph $G$ with $n$ vertices from the probability analysis of $FastCut$ : $\Omega(\frac{1}{\log n})$ is not a lower bound on each individual $p_C$, so it cannot be combined with $\sum_C p_C \leq 1$ to bound $|\mathcal{C}|$.

Actually, we already know that the number of distinct min-cuts in any multigraph $G$ with $n$ vertices is at most $\frac{n(n-1)}{2}$, and the cycle graph attains this bound, so no tighter bound exists.

• In conclusion, my answer to this problem is “no”.

### Solution 2

• We can use a Tree-Hash algorithm to judge whether two rooted trees are isomorphic.

• Consider the polynomial $h(x)=\sum_{i=0}^{n-1}a_i x^i$ over $\mathbb{Z}_p=\lbrace 0, 1,\dots, p-1 \rbrace$, in which $n$ is the size of the multiset $S$ and $a_i$ is the $i$-th smallest number in $S$.

• The Tree-Hash algorithm is as follows :

• For a rooted tree, we will calculate its hash value which can be used to judge whether two trees are isomorphic.

• We will use Depth-First-Search to travel the tree, and calculate every node’s hash value from bottom to top. We call node $u$’s hash value $V(u)$.

• If the current node is a leaf, its hash value can be a number $x\in [M]$, but the same value $x$ must be used for all leaf nodes.

• If the current node is not a leaf, i.e. it has a nonempty set of children $S\neq\emptyset$, we first form the multiset $T$ of all its children’s hash values, i.e. $T=\cup_{s\in S}\lbrace V(s)\rbrace$.

• Then we use the result of applying $h$ to the multiset $T$ as the hash value of the current node $a$, i.e. $V(a)=h(T)$.

• After all nodes’ hash values have been calculated, we use the hash value of the root $r$ as the tree’s hash value, i.e. $V(r)$.

• Assuming $r_1$ and $r_2$ are the roots of rooted trees $T_1$ and $T_2$, we compare the two hash values $V(r_1)$ and $V(r_2)$.

$$\text{The result we derive}=\begin{cases}yes & \text{if } V(r_1)=V(r_2) \\ no & \text{if }V(r_1)\neq V(r_2)\end{cases}$$

• Analysis :

• Space Cost :

Since we must store all nodes’ hash values, the space cost of Tree-Hash is $O(n)$.

• Time Cost :

If we need to sort the members of multiset $T$, the time cost is $O(n\log n)$.

If we have some methods to calculate $h(T)$ in linear time of $T$’s size, the time cost is $O(n)$.

• Error Analysis :

Obviously, if the result is “no”, the two trees are not isomorphic. But it may happen that two trees are non-isomorphic yet Tree-Hash returns “yes”. In other words, Tree-Hash has one-sided error (false positive).

The degree of $h$ is at most $n-1$, and $r$ is chosen among $p$ distinct values, so $\text{Pr}[V(r_1)=V(r_2)]\leq \frac{n-1}{p}$.

By choosing $p$ as a prime in the interval $[n^2, 2n^2]$, the one-sided error of a false positive is at most $O(\frac{1}{n})$. We can also repeat the test with independent random choices to make the error probability lower.
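As an illustration, here is a minimal Python sketch of the Tree-Hash procedure described above; the prime `P` and the tuple-based tree representation (a leaf is the empty tuple `()`) are illustrative choices, not fixed by the notes:

```python
import random

# A sketch of Tree-Hash: the prime P and the random value R are shared by
# every hash computation, for both trees being compared.
P = 999983                 # a prime; the notes suggest a prime in [n^2, 2n^2]
R = random.randrange(1, P)

def tree_hash(node):
    """Return V(node): leaves hash to R; an internal node hashes the sorted
    multiset of its children's hashes via h(T) = sum a_i * R^i mod P."""
    if not node:                          # leaf node
        return R
    coeffs = sorted(tree_hash(c) for c in node)
    h, power = 0, 1
    for a in coeffs:                      # evaluate h at the point R
        h = (h + a * power) % P
        power = (power * R) % P
    return h

def maybe_isomorphic(t1, t2):
    """One-sided test: 'False' is always correct; 'True' can err with
    probability at most (n-1)/P."""
    return tree_hash(t1) == tree_hash(t2)
```

For example, `(( ), (( ), ( )))` and `((( ), ( )), ( ))` are the same tree with children reordered, so their hashes always agree.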

### Solution 3

• Let $F_C’$ denote the Bloom filter that would be created directly from $A\cap B$.
• We write $F_A(i)$ for the $i$-th bit of Bloom filter $F_A$, and $A_j$ for the $j$-th element of set $A$.
• Obviously, for $\forall 1\leq i\leq k, 1\leq j \leq n,\text{ all }F_C’(h_i(C_j))=1\Rightarrow \text{ all }F_C(h_i(C_j))=1$.
• Assume that $\exists u\in A, u\not\in B$ and $\exists v\in B, v\not\in A$; then $u,v\not\in A\cap B$.
• Consider the case $\exists 1\leq i, j \leq k$ with $h_i(u)=h_j(v)=p$; then $F_A(p)=F_B(p)=1 \Rightarrow F_C(p)=1$.
• Since $u,v\not\in A\cap B$, $F_C’(p)=0$ happens with high probability.
• In this case, $F_C$ is not the same as the Bloom filter created for $A\cap B$.

• Assuming we have 3 sets $X,Y,Z$, which satisfies

• $X\cup Y = A$
• $Y \cup Z = B$
• $X \cap Y = X \cap Z = Y \cap Z = \emptyset$
• We can easily derive that $Y=A \cap B, X=A-Y, Z=B-Y$.

• For a set of size $p$, the expected number of positions equal to $1$ when using $k$ hash functions is $\mathbb{E}[\text{number of 1s}]= \frac{m^{kp}-(m-1)^{kp}}{m^{kp-1}}$.

• Let $a_i$ denote the expected number of positions equal to $1$ when using one hash function on a set of size $i$.
• We obviously have $a_1=1$, and can derive that $a_i=a_{i-1}+\frac{m-a_{i-1}}{m}=\frac{m-1}{m}\cdot a_{i-1}+1$.
• Then we find that $a_i=\frac{m^i-(m-1)^i}{m^{i-1}}$; meanwhile, $k$ hash functions can be viewed as $ki$ independent and uniform hashing steps.
• So $\mathbb{E}[{\text{number of positions equal to }1 \text{ using }k \text{ hash functions on a set of size }i}]=\frac{m^{ki}-(m-1)^{ki}}{m^{ki-1}}$; we record it as $s_i$.
• Consider that the positions where $F_A$ and $F_B$ differ come from two cases : positions that are $1$ in $F_A$ but $0$ in $F_B$, and positions that are $0$ in $F_A$ but $1$ in $F_B$.

The above two conditions are symmetric, so we can consider one of them.

Now we compute the expected number of positions that are $1$ in $F_A$ but $0$ in $F_B$; we call it $E_h$.

• Assume that $|A\cap B|=c$; then easily $|Y|=c$ and $|X|=|Z|=n-c=d$.

• We consider several kinds of $1$s below :

• $E_1$ means the expected number of positions that are $1$ in $F_X$ but $0$ in $F_Z$.

We find that $E_1=s_d \times (m-s_d) / m$, because $X$ and $Z$ are disjoint, so the corresponding bits of $F_X$ and $F_Z$ are independent.

• $E_2$ means the expected number of positions that are $1$ in both $F_X$ and $F_Y$.

We find that $E_2=s_d \times s_c / m$, because $X$ and $Y$ are disjoint, so the corresponding bits of $F_X$ and $F_Y$ are independent.

• $E_3$ means the expected number of positions that are $1$ in $F_X$, $1$ in $F_Y$, and $0$ in $F_Z$.

We find that $E_3=\frac{s_d\times (m-s_d) \times s_c}{m^2}$, because $X$, $Y$, $Z$ are pairwise disjoint, so the corresponding bits of $F_X$, $F_Y$, $F_Z$ are independent.

• Consider $X \cap Y = X \cap Z = Y \cap Z = \emptyset$ and $F_A=F_X \bigvee F_Y,F_B=F_Z \bigvee F_Y$.

We will derive that $E_h=E_1-E_3=\frac{m\times s_d\times (m-s_d)-s_d\times (m-s_d)\times s_c}{m^2}$.

• The answer $\mathbb{E}=2E_h=2\times\frac{(m-s_c)\times s_d\times (m-s_d)}{m^2}$.

Simplifying the formula, $\mathbb{E}=2m\times\frac{(m-1)^{kn}m^{k(n-c)}-(m-1)^{k(2n-c)}}{m^{k(2n-c)}}=2m\times((1-\frac{1}{m})^{kn}-(1-\frac{1}{m})^{k(2n-c)})$, in which $c=|A\cap B|$.
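The closed form above can be sanity-checked by simulation; the sketch below models the $k$ hash functions as idealized independent uniform positions (the assumption of the analysis) and compares the empirical mean with the formula. All parameters are illustrative:

```python
import random

def expected_diff(m, k, n, c):
    """Closed form derived above: 2m((1-1/m)^{kn} - (1-1/m)^{k(2n-c)})."""
    q = 1.0 - 1.0 / m
    return 2 * m * (q ** (k * n) - q ** (k * (2 * n - c)))

def simulate_diff(m, k, n, c, trials=2000, seed=1):
    """Empirical mean number of positions where F_A and F_B differ."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        fa, fb = [0] * m, [0] * m
        # universe of 2n - c elements: the first c are shared by A and B
        for elem in range(2 * n - c):
            pos = [rng.randrange(m) for _ in range(k)]
            if elem < n:                      # elements of A
                for p in pos:
                    fa[p] = 1
            if elem < c or elem >= n:         # elements of B
                for p in pos:
                    fb[p] = 1
        total += sum(a != b for a, b in zip(fa, fb))
    return total / trials
```

Shared elements hash to the same positions in both filters, matching the decomposition into $X$, $Y$, $Z$ above.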

## Problem Set 2

### Solution 1

• Consider $P_i=\text{Pr}[\text{The number of components will decrease from }i \text{ to }i-1 \text{ when adding an edge to graph}]$.

• When there are $i$ components, every node $x$ has at least $i-1$ incident potential edges whose addition would merge two components,

so over the whole graph at least $\frac{n(i-1)}{2}$ edges satisfy the condition.

• Meanwhile, there are at most $\frac{n(n-1)}{2}-(i-1)$ edges still not being added to the graph,

so $P_i \geq \frac{n(i-1)}{2} / (\frac{n(n-1)}{2}-(i-1))=\frac{ni-n}{n^2-n-2i+2} \geq \frac{i-1}{n-1}$, because of $i \gt 1$.

• Let $e_i$ be the number of edges we need to make number of components decrease from $i$ to $i-1$, which means $\mathbb{E}[e_i]=1/P_i \leq \frac{n-1}{i-1}$.

• Obviously we have $\mathbb{E}=\mathbb{E}[\sum_{i=2}^ne_i]=\sum_{i=2}^n\mathbb{E}[e_i]\leq \sum_{i=2}^n\frac{n-1}{i-1} = (n-1)H(n-1)$.

That is, $\mathbb{E} \leq (n-1)\ln (n-1)+O(n)$.
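A quick simulation of the process confirms that the empirical mean stays below $(n-1)H(n-1)$; here edges are drawn as uniformly random pairs of distinct vertices (one convenient model consistent with the bound, since $P_i \geq \frac{i-1}{n-1}$ also holds for it):

```python
import random

# Monte-Carlo check of E[#edges until connected] <= (n-1)H(n-1): repeatedly
# draw a uniformly random pair of distinct vertices and add that edge,
# until a single component remains (union-find tracks components).
def edges_until_connected(n, rng):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    components, edges = n, 0
    while components > 1:
        u, v = rng.randrange(n), rng.randrange(n)
        if u == v:
            continue
        edges += 1
        ru, rv = find(u), find(v)
        if ru != rv:                         # this edge merges two components
            parent[ru] = rv
            components -= 1
    return edges
```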

### Solution 2

• Let $\beta_i$ denote the number of bins each containing at least $i$ balls in the end.

• Consider that for a single operation, we choose the bin with $k$ balls in it.

• If we choose the uniform paradigm, then $\beta_{k+1}$ will increase.

• If we choose the “Two-Choices” paradigm, then we also need to choose a second bin, with $p$ balls in it, uniformly at random.

If $p \leq k$ then $\beta_{k+1}$ will increase, otherwise it will not increase.

$\text{Pr}[\beta_{k+1}\text{ will increase}]=1 - \frac{\text{numbers of bins which contains more than }k\text{ balls} }{n} \leq 1$ .

• We can derive the conclusion that

the uniform paradigm always yields the higher maximum load; replacing a “Two-Choices” step with a uniform step can only increase the maximum.

• So for combinations of the two paradigms, we define

$\mathbb{E}_1$ as the expected maximum number of balls in one bin counting only the balls placed by the uniform paradigm,

$\mathbb{E}_2$ as the expected maximum number of balls in one bin when every “Two-Choices” step is replaced by a uniform step,

and $\mathbb{E}$ as the expected maximum number of balls in one bin for the original paradigm.

• For all 3 subproblems, we can derive the same conclusion :

• $\mathbb{E}_1=\Theta(\frac{\log n }{\log \log n})$.
• $\mathbb{E}_2=\Theta(\frac{\log n }{\log \log n})$.
• $\mathbb{E}_1 \leq \mathbb{E} \leq \mathbb{E}_2$.
• So for all 3 subproblems, the asymptotically tight bound is $\Theta(\frac{\log n }{\log \log n})$.
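A small simulation contrasting the two paradigms; the instance size and seed are illustrative, and the comparison reflects the domination claim above:

```python
import random

# n balls into n bins under the two placement paradigms discussed above.
def max_load_uniform(n, rng):
    load = [0] * n
    for _ in range(n):
        load[rng.randrange(n)] += 1          # one uniformly random bin
    return max(load)

def max_load_two_choices(n, rng):
    load = [0] * n
    for _ in range(n):
        i, j = rng.randrange(n), rng.randrange(n)
        load[i if load[i] <= load[j] else j] += 1   # pick the lighter bin
    return max(load)
```

For large $n$ the uniform maximum is noticeably larger, in line with the stochastic-domination argument.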

### Solution 3

• $\text{Pr}[X \geq t] = \text{Pr}[e^{\lambda X} \geq e^{\lambda t}] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda t}} = e^{\ln \mathbb{E}[e^{\lambda X}] - \lambda t}$.

• For every $\lambda \geq 0$ we have $\lambda t - \ln \mathbb{E}[e^{\lambda X}] \leq \sup_{\lambda \geq 0} (\lambda t - \ln \mathbb{E}[e^{\lambda X}]) = \Psi^*_X(t)$,

and optimizing the bound over $\lambda \geq 0$ gives $\text{Pr}[X \geq t] \leq \inf_{\lambda \geq 0} e^{\ln \mathbb{E}[e^{\lambda X}] - \lambda t} = \exp(-\Psi^*_X(t))$.

• Consider $F(\lambda) = \lambda t - \Psi_X(\lambda)$, so $F’(\lambda) = t - \Psi’_X(\lambda)$,

which means $F(\lambda)$ attains an extremum at a $\lambda \geq 0$ satisfying $\Psi’_X(\lambda) = t$.

• Because $\Psi_X(\lambda)$ is strictly convex, as the problem statement says, $F(\lambda)$ is strictly concave,

so $F(\lambda)$ attains its supremum at this point, uniquely.

• $\mathbb{E}[e^{\lambda X}]=\int_{-\infty}^{+\infty}\exp(\lambda x)\frac{1}{\sigma \sqrt{2\pi}}\exp(-\frac{(x-\mu)^2}{2\sigma^2})\,\mathrm{d}x=\exp(\lambda\mu+\sigma^2\lambda^2/2)$.
• $\Psi_X(\lambda)=\ln \mathbb{E}[e^{\lambda X}]=\lambda\mu+\sigma^2\lambda^2/2$.
• $\Psi^*_X(t)=\sup_{\lambda \geq 0}(\lambda t - \Psi_X(\lambda))$; substituting $\Psi’_X(\lambda)=t \Leftrightarrow \mu + \sigma^2\lambda=t \Leftrightarrow \lambda=\frac{t-\mu}{\sigma^2}$, we can derive that $\Psi^*_X(t)=\frac{(t-\mu)^2}{2\sigma^2}$.
• $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t)) = \exp(-\frac{(t-\mu)^2}{2\sigma^2})$.

• $\mathbb{E}[e^{\lambda X}] = \sum_{k=0}^{+\infty} e^{\lambda k}\frac{e^{-\nu}\nu^k}{k!} = \exp((e^\lambda - 1)\nu) \sum_{k=0}^{+\infty}\frac{\exp(-\nu e^\lambda)(\nu e^\lambda)^k}{k!} = \exp(\nu(e^\lambda - 1)) = \exp(\nu e^\lambda - \nu)$.
• $\Psi_X(\lambda) = \ln \mathbb{E}[e^{\lambda X}] = \nu e^\lambda - \nu$.
• $\Psi’_X(\lambda) = t \Leftrightarrow \lambda = \ln \frac{t}{\nu}$.
• $\Psi^*_X(t)=t\ln \frac{t}{\nu} - t + \nu$.
• $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t)) = e^{t - \nu}(\frac{\nu}{t})^t$.

• Obviously, $\mathbb{E}[e^{\lambda X}] = pe^\lambda + (1-p)$, and $\Psi_X(\lambda) = \ln (pe^\lambda + (1-p))$.
• Consider $\Psi’_X(\lambda) = t \Leftrightarrow \lambda = \ln \frac{(1-p)t}{(1-t)p}$.
• $\Psi^*_X(t) = t\ln \frac{(1-p)t}{(1-t)p} - \ln ( \frac{t(1-p)}{1-t} + 1-p ) = (1-t)\ln \frac{1-t}{1-p} + t\ln \frac{t}{p}$.

• Consider each $X_i$ are independent, $\Psi_X(\lambda) = \ln( \prod_{i=1}^n\mathbb{E}[e^{\lambda X_i}] ) = \sum_{i=1}^n \ln(\mathbb{E}[e^{\lambda X_i}]) = \sum_{i=1}^n \Psi_{X_i}(\lambda)$.
• $\Psi^*_X(t) = \sup_{\lambda \geq 0}(\lambda t - \Psi_X(\lambda)) = \sup_{\lambda \geq 0}(\lambda t - \sum_{i=1}^n\Psi_{X_i}(\lambda)) = \sum_{i=1}^n \sup_{\lambda \geq 0}(\lambda t/n - \Psi_{X_i}(\lambda)) = n\Psi^*_{X_i}(\frac{t}{n})$ ( the $X_i$ being i.i.d., the $n$ identical suprema coincide ).
• For $X \sim \text{Bin}(n,p)$, $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t))=\exp(-n\Psi^*_{X_i}(\frac{t}{n}))=(\frac{n-np}{n-t})^{n-t}(\frac{np}{t})^t$.
• When $X_i$ comply with the geometric distribution with a probability $p$ of success :
• $\Psi_{X_i}(\lambda)=\ln\mathbb{E}[e^{\lambda X_i}] = \ln( \sum_{k=0}^{+\infty} e^{\lambda k}(1-p)^kp) = \ln p -\ln(1+(p-1)e^\lambda)$.
• The above is well defined when $\lambda \lt -\ln(1-p)$.
• $\Psi’_{X_i}(\lambda) = t \Leftrightarrow \lambda = \ln \frac{t}{(1-p)(t+1)}$.
• $\Psi^*_{X_i}(t)=t\ln\frac{t}{(1-p)(t+1)}-\ln\left(p(t+1)\right)$.
• $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t))=\exp(-n\Psi^*_{X_i}(\frac{t}{n}))=(\frac{(1-p)(t+n)}{t})^t(\frac{p(t+n)}{n})^n$.
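The Poisson case above can be checked numerically: the sketch below compares the bound $e^{t-\nu}(\nu/t)^t$ against the exact tail probability (the parameters in the usage are illustrative):

```python
import math

def poisson_tail(nu, t):
    """Exact Pr[X >= t] for X ~ Poisson(nu) and integer t >= 1,
    computed as 1 minus the CDF at t - 1."""
    pmf = math.exp(-nu)          # Pr[X = 0]
    total = pmf
    for k in range(1, t):
        pmf *= nu / k            # Pr[X = k] from Pr[X = k - 1]
        total += pmf
    return 1.0 - total

def poisson_chernoff(nu, t):
    """The bound derived above: exp(t - nu) * (nu / t)^t."""
    return math.exp(t - nu) * (nu / t) ** t
```

For instance, with $\nu = 5$ and $t = 10$ the exact tail is about $0.03$ while the bound evaluates to about $0.14$.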

### Solution 4

• If $n \lt k$, according to the pigeonhole principle, it is obvious that $d = 0$.

So we only consider the condition that $n \geq k$.

• We assume that the distance result is $(\frac{1}{2}-u\sqrt{r})n=D$.

• Then we want to prove that if we choose all $2^k$ codewords from $\lbrace 0, 1\rbrace ^n$ uniformly at random,

the expected number of pairs with distance $d \leq D$ is less than 1.

• For a pair of codewords $x_1, x_2 \in \lbrace 0, 1\rbrace ^n$, let $X_i=I[x_1(i) \neq x_2(i)]$, so that $\text{Pr}[X_i=1]=\frac{1}{2}$;

the $X_i$ are $n$ independent random variables, and for $X=\sum_{i=1}^n X_i$ we have $\mu=\mathbb{E}[X] = \frac{n}{2}$.

• Using Chernoff Bounds, we can derive that $\text{Pr}[X \leq (1-\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{2})$.

• Because there are $\frac{2^k(2^k-1)}{2}$ different pairs, so we want $\exp(-\frac{\mu\delta^2}{2}) \leq 4^{-k} \leq \frac{2}{2^k(2^k-1)}$.

We take $\mu=\frac{n}{2}$ and $(1-\delta)\mu = D \Leftrightarrow \delta = 2u\sqrt{r}$.

Then we have $\exp(-u^2k) \leq 4^{-k} \Leftrightarrow u \geq \sqrt{\ln 4}$.

• Concluding the above, we can derive that :

There exists a boolean code $C : \lbrace 0, 1 \rbrace^k \to \lbrace 0, 1 \rbrace ^ n$ of code rate $r$ and distance $(\frac{1}{2}-\Theta(\sqrt{r}))n$.

The constant in $\Theta(\cdot)$ can be $\sqrt{\ln 4}$.

• If $n \lt k$, according to the pigeonhole principle, it is obvious that $d = 0$.

So we only consider the condition that $n \geq k$.

• We assume that the distance result is $(\frac{1}{2}-u\sqrt{r})n=D$.

• Then we want to prove that if we choose the matrix $A$ uniformly at random,

the expected number of message pairs with distance $d \leq D$ is less than 1.

In other words, there exists a matrix that makes $d \gt D$.

• Consider an $n\times k$ matrix $A$ and $x_1, x_2 \in \lbrace 0, 1\rbrace ^k$; let $S = \lbrace i \mid 1 \leq i \leq k, x_1(i) \neq x_2(i) \rbrace$,

then $d \leq D$ means $\sum_{i=1}^n [\sum_{j \in S} A_{i,j} \equiv 1 \pmod 2] \leq D$.

• For each valid $i$, let $X_i=I[\sum_{j \in S} A_{i,j} \equiv 1 \pmod 2]$, so that $\text{Pr}[X_i=1]=\frac{1}{2}$;

the $X_i$ are $n$ independent random variables, and for $X=\sum_{i=1}^n X_i$ we have $\mu=\mathbb{E}[X] = \frac{n}{2}$.

• Using Chernoff Bounds, we can derive that $\text{Pr}[X \leq (1-\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{2})$.

• Because there are $2^k$ different sets $S$, so we want $\exp(-\frac{\mu\delta^2}{2}) \leq 2^{-k}$.

We take $\mu=\frac{n}{2}$ and $(1-\delta)\mu = D \Leftrightarrow \delta = 2u\sqrt{r}$.

Then we have $\exp(-u^2k) \leq 2^{-k} \Leftrightarrow u \geq \sqrt{\ln 2}$.

• Concluding the above, we can derive that :

There exists a linear boolean code $C : \lbrace 0, 1 \rbrace^k \to \lbrace 0, 1 \rbrace ^ n$ of code rate $r$, and distance $(\frac{1}{2}-\Theta(\sqrt{r}))n$.

The constant in $\Theta(\cdot)$ can be $\sqrt{\ln 2}$.
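For intuition, the distance of a concrete random linear code can be computed by brute force on small parameters (the instance sizes below are illustrative):

```python
import random

def linear_code_distance(A):
    """Distance of the linear code x -> Ax (mod 2), i.e. the minimum
    Hamming weight over all 2^k - 1 nonzero messages."""
    n, k = len(A), len(A[0])
    best = n
    for msg in range(1, 2 ** k):
        x = [(msg >> j) & 1 for j in range(k)]
        weight = sum(sum(a * b for a, b in zip(row, x)) % 2 for row in A)
        best = min(best, weight)
    return best

def random_linear_code_distance(n, k, rng):
    """Draw a uniformly random n x k binary matrix and return its distance."""
    A = [[rng.randrange(2) for _ in range(k)] for _ in range(n)]
    return linear_code_distance(A)
```

As sanity checks, the length-6 repetition code has distance 6, and the $[3,2]$ parity-check code has distance 2.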

## Problem Set 3

### Solution 1

• Suppose the random variables $X_1, \cdots, X_n$ are negatively associated.

• Define $f(X_i, i \in [k-1] ) = \prod_{i \in [k-1]} f_i(X_i)$, $g(X_j, j \in \lbrace k \rbrace) = f_k(X_k)$.

• Obviously $f$ and $g$ are non-negative and non-decreasing, from the definition of negatively associated,

$\mathbb{E}[f(X_i, i \in [k-1])g(X_j, j \in \lbrace k \rbrace)] \leq \mathbb{E}[f(X_i, i \in [k-1])] \mathbb{E}[g(X_j, j \in \lbrace k \rbrace)]$

$\Rightarrow \mathbb{E}[ \prod_{i \in [k] } f_i(X_i) ] \leq \mathbb{E}[f_k(X_k)] \cdot \mathbb{E}[\prod_{i \in [k-1]} f_i(X_i)]$

• By applying this inequality repeatedly, we can easily derive that :

$\mathbb{E}[\prod_{i \in [n]} f_i(X_i)] \leq \prod_{i \in [n]} \mathbb{E}[f_i(X_i)]$

• Consider $X = \sum_{i=1}^n X_i$, $X_1, X_2, \cdots, X_n$ are negatively associated, $\mu = \mathbb{E}[X]$.

• Upper Tail :

For $\lambda \geq 0$, $\text{Pr}[X \geq (1+\delta)\mu] = \text{Pr}[e^{\lambda X} \geq e^{\lambda (1+\delta)\mu}] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda (1+\delta)\mu}}$

Proof :

• $\mathbb{E}[e^{\lambda X}] = \mathbb{E}[ \exp( \lambda \sum_{i=1}^n X_i )] = \mathbb{E}[ \prod_{i=1}^n \exp(\lambda X_i) ] \leq \prod_{i=1}^n \mathbb{E}[\exp(\lambda X_i)]$

( $e^{\lambda X}$ is a non-decreasing function )

• Let $p_i = \text{Pr}[X_i = 1]$ for $i = 1, 2, \cdots, n$. Then $\mu = \sum_{i=1}^n p_i$.

• $\mathbb{E}[e^{\lambda X_i}] = p_i \cdot e^{\lambda \cdot 1} + (1-p_i) \cdot e^{\lambda \cdot 0} = 1 + p_i(e^{\lambda} - 1) \leq e^{p_i(e^\lambda - 1)}$.

• Therefore, $\mathbb{E}[e^{\lambda X}] \leq e^{\mu (e^{\lambda}-1)}$.

• Then we have $\text{Pr}[X \geq (1+\delta)\mu] \leq \left( \frac{\exp(e^\lambda - 1)}{\exp(\lambda(1+\delta))} \right)^\mu = \left( \frac{e^\delta}{(1+\delta)^{1+\delta}} \right)^\mu$, when $\lambda = \ln(1 + \delta) \gt 0$.

• Lower Tail :

Consider $\forall 1 \leq i \leq n, Y_i = 1 - X_i, Y = \sum_{i=1}^n Y_i = n - X$.

Proof :

• $\text{Pr}[ X \leq (1-\delta)\mu ] = \text{Pr}[n-X \geq n - (1-\delta)\mu] = \text{Pr}[e^{\lambda Y} \geq e^{\lambda(n-(1-\delta)\mu)}] \leq \frac{\mathbb{E}[e^{\lambda Y}]}{e^{\lambda (n-(1-\delta)\mu)}}$.
• $\mathbb{E}[e^{\lambda Y}] \leq \prod_{i=1}^n \mathbb{E}[e^{\lambda Y_i}]$ ( $\lambda \gt 0$ $\Rightarrow e^{\lambda X}$is a non-decreasing function ).
• $\mathbb{E}[e^{\lambda Y_i}] = p_i\exp(\lambda (1-1)) + (1-p_i)\exp(\lambda(1-0)) = e^\lambda + p_i(1 - e^\lambda) \leq e^\lambda \exp(p_i(e^{-\lambda}-1))$.
• Then we have $\text{Pr}[X \leq (1-\delta)\mu] \leq \frac{\exp(n\lambda)\exp(\mu(e^{-\lambda} - 1))}{\exp(\lambda(n-(1-\delta)\mu))} = \left( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right)^\mu$, when $\lambda = - \ln(1 - \delta) \gt 0$.

• Suppose that for a fixed ball $l$, we consider $B_{1,l}, \cdots, B_{n,l}$ as $H_1, \cdots, H_n$.

Now we want to prove $H=(H_1, \cdots, H_n)$ are negatively associated first.

• First of all, $H=(H_1, \cdots, H_n)$ satisfies $\forall 1 \leq i \leq n, H_i \in \lbrace 0, 1 \rbrace$, and $\sum_{i=1}^n H_i = 1$.

• For any non-decreasing function $f$, consider $f(H_i, i \in I)$.

We define $f(\underbrace{0, \cdots, 0}_{|I|}) = a$, and $f(\underbrace{0, \cdots, 1, \cdots, 0}_{|I|}) = a + u(i)$ ( the $1$ in position $i$ ).

($H_i = 1$, $f$ is a non-decreasing function $\Rightarrow u(i) \geq 0$ ).

• For any non-decreasing function $g$, consider $g(H_j, j \in J)$.

We define $g(\underbrace{0, \cdots, 0}_{|J|}) = b$, and $g(\underbrace{0, \cdots, 1, \cdots, 0}_{|J|}) = b + v(j)$ ( the $1$ in position $j$ ).

( $H_j = 1$, $g$ is a non-decreasing function $\Rightarrow v(j) \geq 0$ ).

• There exists exactly one $i \in [n]$ satisfying $H_i = 1$, and $\forall j \in [n], j \neq i$, $H_j = 0$.

• We can calculate that :

$\mathbb{E}[f(H_i, i \in I)] = a + \sum_{i \in I} u(i)p_{i, l}$

$\mathbb{E}[g(H_j, j \in J)] = b + \sum_{j \in J} v(j)p_{j, l}$

$\mathbb{E}[f(H_i, i \in I)g(H_j, j \in J)] = ab + b\sum_{i \in I} u(i)p_{i, l} + a\sum_{j \in J} v(j)p_{j, l}$

( Because $\exists i \in I$ with $H_i = 1 \Rightarrow \forall j \in J, H_j = 0$, and vice versa )

• Obviously we have :

$\sum_{i \in I} u(i)p_{i, l} \times \sum_{j \in J} v(j)p_{j, l} \geq 0$

$\Rightarrow \mathbb{E}[f(H_i, i \in I)g(H_j, j \in J)] \leq \mathbb{E}[f(H_i, i \in I)] \mathbb{E}[g(H_j, j \in J)]$

• We can derive that $H=(H_1, \cdots, H_n)$ are negatively associated.

• Because $\forall l \in [m]$, $B_{1,l}, \cdots, B_{n,l}$ are negatively associated.

Then we can get $B_{1,1}, \cdots, B_{n,m}$ are negatively associated. ( Closure under products )

• Then we define $f_i(B_{i,1}, \cdots, B_{i,m}) = \sum_{t=1}^m B_{i,t}$. Obviously $f_i$ is a non-decreasing function.

• $B=(B_1, \cdots, B_n) = (f_1(B_{1,1}, \cdots, B_{1,m}), \cdots, f_n(B_{n,1}, \cdots, B_{n,m}))$ are negatively associated.

( Disjoint monotone aggregation )

• Finally, we can get the conclusion that $B_1, \cdots, B_n$ are negatively associated.
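The conclusion can be illustrated by a tiny brute-force check of the consequence $\mathbb{E}[fg] \leq \mathbb{E}[f]\mathbb{E}[g]$ for the bin loads, using the non-decreasing functions $f = B_1$ and $g = B_2$ (a minimal sketch; the instance size is arbitrary):

```python
from itertools import product

def bin_load_product_check(n_bins=3, n_balls=3):
    """Exact E[B_1 * B_2] and E[B_1] * E[B_2] over all n_bins^n_balls
    equally likely outcomes; negative association predicts lhs <= rhs."""
    outcomes = list(product(range(n_bins), repeat=n_balls))

    def expect(h):
        return sum(h(o) for o in outcomes) / len(outcomes)

    def load(o, i):                     # B_i: number of balls in bin i
        return sum(1 for b in o if b == i)

    lhs = expect(lambda o: load(o, 0) * load(o, 1))
    rhs = expect(lambda o: load(o, 0)) * expect(lambda o: load(o, 1))
    return lhs, rhs
```

For 3 balls into 3 bins this gives $\mathbb{E}[B_1 B_2] = \frac{2}{3} \lt 1 = \mathbb{E}[B_1]\mathbb{E}[B_2]$, matching the multinomial covariance $-m p_i p_j$.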

### Solution 2

• Suppose that $n$ is the degree of node $u$.

Actually because the graph is a $\Delta$-regular bipartite graph, $n = \Delta$.

• Consider $n$ independent $\text{2-d}$ random variables $x_1, \cdots, x_n$, in which $\forall 1 \leq i \leq n, x_i = ( c_i, s_i )$.

$c_i$ means the color of $e_i$, $e_i$ means the $i$-th edge which incident to node $u$,

$s_i \in \lbrace 0, 1 \rbrace$, $s_i = \begin{cases} 0, & \text{no edge incident to }v_i \text{ has the same color as }e_i \\ 1, & \text{some edge incident to }v_i \text{ has the same color as }e_i \end{cases}$.

$v_i$ means the other endpoint of $e_i$ except $u$.

• We can easily see that

$Z = f(x_1, \cdots, x_n)$, the number of edges that receive their final color after the first step,

can be computed using $x_1, \cdots, x_n$ only.

• Additionally, we need to prove that $x_1, \cdots, x_n$ are independent :

Actually, for an edge $e_i$, $s_i$ is determined by $c_i$ and the colors of the edges incident to node $v_i$.

• Suppose we change $c_i$, and some $x_j$ with $j \neq i$ changes; then $s_j$ changes.

This means $e_i$ and $e_j$ are adjacent not only at node $u$,

i.e. they share both endpoints, so the graph is not a simple graph, which contradicts the definition.

• Consider if we change the color of one edge which is not incident to node $u$ :

Then some $s_i$, $1\leq i \leq n$, may change.

If there $\exists 1 \leq i,j \leq n, i \neq j$, such that both $s_i$ and $s_j$ change, that means

there exists an edge connecting nodes $v_i$ and $v_j$.

This is impossible : otherwise $u$, $v_i$, $v_j$ would form a triangle (an odd cycle), so the graph would not be bipartite.

To conclude the above, at most one element of $x_1, \cdots, x_n$ changes if we change any single edge’s color, which means $x_1, \cdots, x_n$ are independent.

• Now we find that $\forall i, 1 \leq i \leq n$, if we change $x_i$ to $y_i$, the $f(x_1, x_2, \cdots, x_n)$ may change too.

• Consider the worst case :

Before the change, edge $e_i$ and the unique edge colored with the first component of $y_i$ could both receive their final color in the first step.

After the change neither can, so the answer declines by 2.

The best case can be analyzed in the same way.

• Then we can derive that, for any $x_1, x_2, \cdots, x_n$ and any $y_i$ :

$|f(x_1, \cdots, x_{i-1}, x_i, x_{i+1}, \cdots, x_n) - f(x_1, \cdots, x_{i-1}, y_i, x_{i+1}, \cdots, x_n)| \leq 2$.

Which means $f$ satisfies the Lipschitz condition with constant $2$.

• Consider the Bounded Difference Method :

• $\text{Pr}[ | Z - \mathbb{E}[Z] | \gt t] \leq 2 \exp(- \frac{t^2}{2 \sum_{i=1}^n 4}) = 2 \exp(- \frac{t^2}{8n}) = 2 \exp(- \frac{t^2}{8 \Delta})$ ( $n = \Delta$ )

Which means $\text{Pr}[ | Z - \mathbb{E}[Z] | \gt t] \leq 2 \exp( - \Theta(\frac{t^2}{\Delta}) )$.

### Solution 3

• We write $x_i[p]$ for the $p$-th dimension coordinate of point $x_i$.

• $\text{cost}(x_1, \cdots, x_n, C_1, \cdots, C_k) = \sum_{i=1}^k \sum_{j \in C_i} | x_j - \mu_i |_2^2$

$= \sum_{i=1}^k \frac{1}{|C_i|^2} \sum_{j \in C_i} | |C_i|x_j - \sum_{u \in C_i}x_u |_2^2$

$= \sum_{i=1}^k \frac{1}{|C_i|^2} \sum_{j \in C_i} \sum_{t=1}^d (|C_i|x_j[t] - \sum_{l \in C_i} x_l[t])^2$

$=\sum_{i=1}^k \frac{1}{|C_i|} \sum_{t=1}^d \lbrace |C_i|\sum_{j \in C_i} x_j[t]^2 - (\sum_{j \in C_i} x_j[t]) ^2\rbrace$

$= \sum_{i=1}^k \frac{1}{|C_i|} \sum_{t=1}^d \lbrace (|C_i|-1)\sum_{j \in C_i} x_j[t]^2 - 2\sum_{j,l \in C_i, j \lt l} x_j[t]x_l[t] \rbrace$

$= \sum_{i=1}^k \frac{1}{|C_i|} \sum_{t=1}^d \sum_{j,l \in C_i, j \lt l} ( x_j[t] - x_l[t] )^2$

$= \sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2$

• We have already known that

For the matrix $A \in \mathbb{R}^{k \times d}$, let each entry of $A$ be chosen i.i.d. from $N(0, 1 / k)$;

then for any unit vector $\vec{u} \in \mathbb{R}^d : \text{Pr}[| | A\vec{u} |_2^2 - 1 | \gt \epsilon] \lt 1 / n^3$,

Which means $\text{Pr}[| | \tilde{x}_i - \tilde{x}_j |_2^2 - | x_i - x_j |_2^2 | \gt \epsilon \cdot | x_i - x_j |_2^2] \lt 1 / n^3$.

• $\text{Pr}[| | \tilde{x}_i - \tilde{x}_j |_2^2 - | x_i - x_j |_2^2 | \gt \epsilon \cdot | x_i - x_j |_2^2]$

$= \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \gt (1 + \epsilon) | x_i - x_j |_2^2 ] + \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \lt (1 - \epsilon) | x_i - x_j |_2^2 ]$

$\Rightarrow \begin{cases} \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \gt (1 + \epsilon) | x_i - x_j |_2^2 ] \lt 1 / n^3 \\ \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \lt (1 - \epsilon) | x_i - x_j |_2^2 ] \lt 1 / n^3 \end{cases}$

• Then we use union bound :

$\text{Pr}[ \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \gt (1 + \epsilon) \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt |C_i|^2 \cdot 1 / n^3$

$\text{Pr}[ \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \lt (1 - \epsilon) \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt |C_i|^2 \cdot 1 / n^3$

• Additionally we have : $\sum_{i=1}^k |C_i|^2 \leq (\sum_{i=1}^k |C_i|)^2 = n^2$

• Then we can derive that :

$\text{Pr}[ \sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \gt (1 + \epsilon)\sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt n^2 \cdot 1 / n^3 = 1 / n$

$\text{Pr}[ \sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \lt (1 - \epsilon)\sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt n^2 \cdot 1 / n^3 = 1 / n$

• That means :

$(1 - \epsilon) \text{cost}\left( x_1, \cdots, x_n, C_1, \cdots, C_k \right) \leq \text{cost}\left( \tilde{x}_1, \cdots, \tilde{x}_n, C_1, \cdots, C_k \right) \leq (1 + \epsilon) \text{cost}\left( x_1, \cdots, x_n, C_1, \cdots, C_k \right)$

w.h.p.


• So we can derive that :

$\text{cost}\left( x_1, \cdots, x_n, \tilde{C}_1, \cdots, \tilde{C}_k \right) \leq \frac{1}{1 - \epsilon} \text{cost}\left( \tilde{x}_1, \cdots, \tilde{x}_n, \tilde{C}_1, \cdots, \tilde{C}_k \right)$ w.h.p.

$\text{cost}\left( \tilde{x}_1, \cdots, \tilde{x}_n, \tilde{C}_1, \cdots, \tilde{C}_k \right) \leq \gamma \cdot \text{cost}\left( \tilde{x}_1, \cdots, \tilde{x}_n, \tilde{C}^*_1, \cdots, \tilde{C}^*_k \right)$ ( we already knew )

$\text{cost}\left( \tilde{x}_1, \cdots, \tilde{x}_n, \tilde{C}^*_1, \cdots, \tilde{C}^*_k \right) \leq \text{cost}\left( \tilde{x}_1, \cdots, \tilde{x}_n, C^*_1, \cdots, C^*_k \right)$ ( the nature of optimal partition )

$\text{cost}\left( \tilde{x}_1, \cdots, \tilde{x}_n, C^*_1, \cdots, C^*_k \right) \leq (1+\epsilon)\text{cost}\left( x_1, \cdots, x_n, C^*_1, \cdots, C^*_k \right)$ w.h.p.

• Consider the inequalities above :

$\text{cost}\left( x_1, \cdots, x_n, \tilde{C}_1, \cdots, \tilde{C}_k \right) \leq \frac{1+\epsilon}{1-\epsilon} \cdot \gamma \cdot \text{cost}\left( x_1, \cdots, x_n, C^*_1, \cdots, C^*_k \right)$ w.h.p.

• In the condition $\epsilon \leq 1/2$, $\frac{1+\epsilon}{1-\epsilon} \leq 1 + 4\epsilon$.

• By concluding above, we can get the conclusion :

$\text{cost}\left( x_1, \cdots, x_n, \tilde{C}_1, \cdots, \tilde{C}_k \right) \leq (1 + O(\epsilon)) \cdot \gamma \cdot \text{cost}\left( x_1, \cdots, x_n, C^*_1, \cdots, C^*_k \right)$ w.h.p.
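A pure-Python sketch of the random projection used above (the dimensions and seed in the usage are illustrative); squared distances are approximately preserved, as the JL statement guarantees:

```python
import math
import random

def random_projection(points, k, rng):
    """Project d-dimensional points to k dimensions using a matrix whose
    entries are i.i.d. N(0, 1/k), as in the statement above."""
    d = len(points[0])
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(d)]
         for _ in range(k)]
    return [[sum(A[r][c] * p[c] for c in range(d)) for r in range(k)]
            for p in points]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))
```

The ratio of projected to original squared distance concentrates around 1 with relative error on the order of $\sqrt{2/k}$.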

## Problem Set 4

### Solution 1

• Let $n=|V|$ and $m=|E|$ denote the numbers of vertices and edges of graph $G$.

• Obviously when $k \geq n$, we can put vertices in different sets, so we only consider $k \lt n$.

• Firstly, let us see a random algorithm :

• Algorithm :

• For all $i \in [n]$, we generate independent uniformly distributed random variables $x_1, \cdots, x_n \in [k]$.
• For node $i$ in $G$, we put vertex $i$ to subset $S_{x_i}$.
• We use this partition as the result of $k$-cut of graph $G$.
• Calculate the cost $W=\sum_{\left(uv\right) \in E, \exists i \neq j, u \in S_i, v \in S_j} w\left(uv\right)$ and return it.
• Denoting the value of the optimal solution by $\text{OPT}$, we now want to prove that $\mathbb{E}[W] \geq \left(1 - \frac{1}{k}\right)\text{OPT}$.

• Proof :

\begin{align} \mathbb{E}[W] & = \sum_{\left(uv\right) \in E} w\left(uv\right)\cdot\text{Pr}[\exists i,j \in [k], i\neq j, \text{ s.t. } u \in S_i, v \in S_j] \\ & = \sum_{\left(uv\right) \in E} w\left(uv\right)\cdot \left(1 - \text{Pr}[\exists p \in [k], \text{ s.t. } u \in S_p, v \in S_p]\right) \\ & = \sum_{\left(uv\right) \in E} w\left(uv\right)\cdot \left(1 - \sum_{p=1}^k \text{Pr}[u \in S_p, v \in S_p]\right) \\ & = \sum_{\left(uv\right) \in E} w\left(uv\right)\cdot \left(1 - \sum_{p=1}^k \frac{1}{k^2}\right) \\ & = \frac{k-1}{k}\sum_{\left(u,v\right) \in E} w\left(u,v\right) \\ & \geq \frac{k-1}{k}\text{OPT} \end{align}

• Now the greedy algorithm comes :

• Algorithm :

• For $i = 1$ to $n$ :

​ Select $x_i \in [k]$ satisfies that :

​ $\mathbb{E}[W \mid \forall j \in [i-1] \text{ vertex } j \text{ are put in subset } S_{x_j}, \text{ vertex } i \text{ are put in subset } S_{x_i}]$

​ $= \max_{p \in [k]} \mathbb{E}[W \mid \forall j \in [i-1] \text{ vertex } j \text{ are put in subset } S_{x_j}, \text{ vertex } i \text{ are put in subset } S_{p}]$.

• For node $i$ in $G$, we put vertex $i$ to subset $S_{x_i}$.

• We use this partition as the result of $k$-cut of graph $G$.

• Calculate the cost $W_g=\sum_{\left(uv\right) \in E, \exists i \neq j, u \in S_i, v \in S_j} w\left(uv\right)$ and return it.

• We can find that :

$\forall i \in [n], \max_{p \in [k]} \mathbb{E}[W \mid x_1, \cdots, x_{i-1}, p] \geq \mathbb{E}[W \mid x_1, \cdots, x_{i-1}]$

$\Rightarrow W_g \geq \mathbb{E}[W] \geq \left(1-\frac{1}{k}\right)\text{OPT}$.

• Obviously, calculating each conditional expectation takes polynomial time, so this algorithm is poly-time.

• The blank can be filled with “The cut will become larger if we transfer $v$ from $S_i$ to $S_{1-i}$”.

• Running Time Analysis :

• In some situations the algorithm may not run in polynomial time.

• For a fixed node $v$,

if we store $A_i$, the total weight of edges connecting $v$

to vertices in set $S_i$,

we can check whether moving a given node makes the result greater

in $O(1)$ time per node, and $O(|V|)$ for all nodes.

• When we transfer a node $v$ from one set to the other,

we need to update the $A_x$ values of the nodes sharing an edge with $v$.

This costs $O(\text{degree}(v))$ time, which can be bounded by $O(|V|)$.

• For a single improving step,

because $w \in \mathbb{Z}^+$, the answer increases by at least $1$,

while the maximum possible answer is $\sum_{e \in E}w(e)$.

So the upper bound on the running time is $O(|V|\sum_{e \in E}w(e))$.

• Proof of Approximation Ratio :

• For any vertex $v$, we consider the edges incident to it.

• Let $S_v$ be the set of edges incident to $v$ whose other endpoint lies in the same set as $v$,

and $T_v$ the set of edges incident to $v$ whose other endpoint lies in a different set.

• Obviously $S_v \cap T_v = \emptyset$.

We write $W_v$ for the total weight of the edges incident to $v$,

while $W_{S_v}$ and $W_{T_v}$ denote the total weights of $S_v$ and $T_v$.

Then we have : $W_v = W_{S_v} + W_{T_v}$, and the cut value $W = \frac{1}{2} \sum_{i \in [n]} W_{T_i}$

• If $W_{S_v} \gt W_{T_v}$, we can transfer vertex $v$ to the other set; then $S_v$ and $T_v$ swap, making $W$ larger.

So at a local optimum, $W_{S_v} \leq W_{T_v}$.

• To conclude the above, we can derive that :

$W = \frac{1}{2} \sum_{i \in [n]} W_{T_i} \geq \frac{1}{2} \sum_{i \in [n]} \frac{W_{S_i} + W_{T_i}}{2} \geq \frac{1}{2} \sum_{e \in E} w\left(e\right) \geq \frac{1}{2}\text{OPT}$.
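The local-search algorithm analyzed above can be sketched as follows; the flipping rule corresponds exactly to the condition $W_{S_v} \gt W_{T_v}$, and termination follows from the integer-weight argument:

```python
# Local search for Max-Cut with positive integer edge weights: flip any
# vertex whose same-side edge weight exceeds its cross-side edge weight,
# until no such vertex exists.
def local_search_max_cut(n, edges):
    """edges: list of (u, v, w) with w a positive integer."""
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(w for a, b, w in edges
                       if v in (a, b) and side[a] == side[b])
            diff = sum(w for a, b, w in edges
                       if v in (a, b) and side[a] != side[b])
            if same > diff:          # flipping v strictly increases the cut
                side[v] = 1 - side[v]
                improved = True
    cut = sum(w for a, b, w in edges if side[a] != side[b])
    return cut, side
```

At termination every vertex satisfies $W_{S_v} \leq W_{T_v}$, so the returned cut is at least half of the total edge weight, hence at least $\frac{1}{2}\text{OPT}$.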

### Solution 2

• Algorithm :

• For $i = 1$ to $k$ :

​ Choose $p_i \in [m]$ such that $S_{p_i}$ has not been used and $|S_{p_i} \cap U|$ has the maximum size.

​ Mark $S_{p_i}$ as used, $U = U \backslash S_{p_i}$.

• Return $\left(S_{p_1}, \cdots, S_{p_k}\right)$ as results.

• Proof :

• Suppose that $a_i = |\cup_{j=1}^i S_{p_j}|$.
• Consider that at any time, the number of subsets used in the optimal solution but not yet chosen is at most $k$.
• Then we have $a_{i+1} - a_i \geq \frac{\text{OPT} - a_i}{k - i} \geq \frac{\text{OPT} - a_i}{k} \Leftrightarrow \left(1-\frac{1}{k}\right)\left(\text{OPT}-a_i\right) \geq \text{OPT}-a_{i+1}$
• With $a_0 = 0$, iterating and multiplying the inequality above gives $\left(1 - \frac{1}{k}\right)^k \text{OPT} \geq \text{OPT} - a_k$.
• Then we have $a_k \geq \text{OPT} \cdot \left(1 - \left(1 - 1 / k\right)^k\right) \gt \text{OPT} \cdot (1 - 1 / e)$.
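The greedy algorithm above, as a short sketch (the example sets in the test are hypothetical):

```python
# Greedy max coverage: repeatedly pick the unused set covering the most
# still-uncovered elements, for k rounds.
def greedy_max_coverage(sets, k):
    """sets: list of Python sets. Returns (chosen indices, covered elements)."""
    chosen, covered = [], set()
    for _ in range(min(k, len(sets))):
        best = max((i for i in range(len(sets)) if i not in chosen),
                   key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

By the analysis above, the number of covered elements is more than $(1 - 1/e)$ times the optimum.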

### Solution 3

• For an edge $e = \left(u, v\right) \in E$, $\text{Pr}[e \text{ in the cut}] = x_u^*\left(1 - x_v^*\right) \geq \left(y_{u,v}^*\right)^2$, since the LP constraints give $y_{u,v}^* \leq x_u^*$ and $y_{u,v}^* \leq 1 - x_v^*$.

• $\mathbb{E} = \sum_{\left(u,v\right) \in E} \text{Pr}[\left(u,v\right) \text{ in the cut}] \geq \sum_{\left(u,v\right) \in E} \left(y_{u,v}^*\right)^2$.

• Consider the Cauchy–Schwarz inequality :

$\sum_{\left(u,v\right) \in E} \left(y_{u,v}^*\right)^2 \cdot \sum_{i \in [|E|]} 1^2 \geq \left( \sum_{\left(u,v\right) \in E} y_{u,v}^* \cdot 1 \right)^2 = \text{OPT}_{\text{LP}}^2 \geq \text{OPT}^2$.

• Then we have $\mathbb{E} \geq \frac{\text{OPT}^2}{|E|}$.

• Consider a maximum cut of the underlying undirected graph;

by orienting the partition suitably, at least half of its cut edges are also cut edges of the directed graph.

• So we have $\text{OPT} \geq \frac{1}{2} \lbrace \text{OPT in undirected graph} \rbrace \geq \frac{1}{2} \cdot \frac{1}{2} |E| = \frac{1}{4}|E|$.

• Then we have $\mathbb{E} \geq \frac{1}{4}\text{OPT}$.

• \begin{align} \mathbb{E} & = \sum_{\left(u,v\right) \in E} \text{Pr}[\left(u,v\right) \text{ in the cut}] \\ & = \sum_{\left(u,v\right) \in E}\left(\frac{1}{4}+\frac{x_u^*}{2}\right)\left(1-\frac{1}{4}-\frac{x_v^*}{2}\right) \\ & = \sum_{\left(u,v\right) \in E}\left(\frac{1}{4}+\frac{x_u^*}{2}\right)\left(\frac{1}{4}+\frac{1-x_v^*}{2}\right) \\ & \geq \sum_{\left(u,v\right) \in E} \left(\frac{1}{4}+\frac{y_{\left(u,v\right)}^*}{2}\right)^2 \\ & = \sum_{\left(u,v\right) \in E} \left(\frac{1}{4}-\frac{y_{\left(u,v\right)}^*}{2}\right)^2 + \sum_{\left(u,v\right) \in E} \frac{1}{2}y_{\left(u,v\right)}^* \\ & \geq \sum_{\left(u,v\right) \in E} \frac{1}{2}y_{\left(u,v\right)}^* \\ & = \frac{1}{2}\text{OPT}_{\text{LP}} \geq \frac{1}{2}\text{OPT} \end{align}

### Solution 4

• \begin{align} \text{Pr}[C_j \text{ satisfied in SOL}] & = 1 - \left( \prod_{i \in S_j^+} (1-f(x_i^*)) \right) \left( \prod_{i \in S_j^-} f(x_i^*) \right) \\ & \geq 1 - \left( \prod_{i \in S_j^+} 4^{-x_i^*} \right) \left( \prod_{i \in S_j^-} 4^{x_i^*-1} \right) \\ & = 1 - 4^{- \left( \sum_{i \in S_j^+} x_i^* + \sum_{i \in S_j^-} (1 - x_i^*) \right)} \\ & \geq 1 - 4^{-y_j^*} \geq \frac{3}{4}y_j^* \end{align}

(It is easy to check that $\forall 0 \leq x \leq 1, \ 1 - 4^{-x} \geq \frac{3}{4}x$: both sides agree at $x = 0$ and $x = 1$, and the left-hand side is concave, so it stays above the chord in between.)
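A quick grid check of this inequality on $[0, 1]$:

```python
# grid check of 1 - 4^{-x} >= (3/4) x on [0, 1]; equality holds at x = 0 and x = 1,
# and the left-hand side is concave, so it stays above the chord in between
check = all(1 - 4 ** (-i / 1000) >= 0.75 * (i / 1000) - 1e-12 for i in range(1001))
```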

• $\mathbb{E} = \sum_{j=1}^m \text{Pr}[C_j \text{ satisfied in SOL}] \geq \sum_{j=1}^m \frac{3}{4}y_j^* = \frac{3}{4}\text{OPT}_{\text{LP}} \geq \frac{3}{4}\text{OPT}$.

• Algorithm :

• Let $W$ denote # of satisfied clauses. We have $\mathbb{E}[W] \geq \frac{3}{4}\text{OPT}$.

• For $i = 1$ to $n$ :

​ IF $\mathbb{E}[W \mid x_1=a_1, \cdots, x_{i-1}=a_{i-1}, x_i = 0] \geq \mathbb{E}[W \mid x_1=a_1, \cdots, x_{i-1}=a_{i-1}, x_i = 1]$ :

​ $a_i = 0$

​ ELSE :

​ $a_i=1$

​ ENDIF
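The conditional-expectation loop above can be sketched as follows. The clause encoding and the toy instance are illustrative assumptions; `probs` plays the role of the rounding probabilities (e.g. $f(x_i^*)$ from the LP, or all $\frac{1}{2}$ for the plain randomized version):

```python
def expected_satisfied(clauses, probs, fixed):
    """E[# satisfied clauses] when each unfixed variable v is True with
    probability probs[v], independently; variables in `fixed` are pinned to 0/1."""
    total = 0.0
    for clause in clauses:                  # clause: list of (var, is_positive_literal)
        p_unsat = 1.0
        for v, positive in clause:
            p_true = fixed[v] if v in fixed else probs[v]
            p_lit = p_true if positive else 1.0 - p_true
            p_unsat *= 1.0 - p_lit          # clause fails iff every literal fails
        total += 1.0 - p_unsat
    return total

def derandomize(clauses, probs, n):
    """Fix variables one by one, keeping the larger conditional expectation;
    the final assignment satisfies at least E[W] clauses."""
    fixed = {}
    for i in range(n):
        e0 = expected_satisfied(clauses, probs, {**fixed, i: 0})
        e1 = expected_satisfied(clauses, probs, {**fixed, i: 1})
        fixed[i] = 0 if e0 >= e1 else 1
    return fixed

# toy instance: (x0 or x1), (not x0 or x1), (not x1); E[W] = 2 under uniform probs
clauses = [[(0, True), (1, True)], [(0, False), (1, True)], [(1, False)]]
assignment = derandomize(clauses, [0.5, 0.5], 2)
satisfied = sum(any((assignment[v] == 1) == pos for v, pos in c) for c in clauses)
```

Since $\mathbb{E}[W] = 2$ here, the returned assignment is guaranteed to satisfy at least 2 of the 3 clauses.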

• No: the integrality gap of the LP relaxation for MAX-SAT is $\frac{3}{4}$, so any analysis of the form $\text{SOL} \geq \alpha \, \text{OPT}_{\text{LP}} \geq \alpha \, \text{OPT}$ can only achieve $\alpha \leq \frac{3}{4}$.

### Solution 5

• Integer Program :

Minimize $\sum_{i=1}^m w_i \cdot x_i$

subject to $\sum_{i=1}^m A_{i,j}x_i \geq 1$, $1 \leq j \leq n$

​ $x_i \in \lbrace 0,1 \rbrace$, $1\leq i \leq m$

In which $\forall 1 \leq i \leq m, 1 \leq j \leq n$, $A_{i,j} \in \lbrace 0,1 \rbrace$ :

​ $\begin{cases} A_{i,j} = 0, & j \not\in S_i \\ A_{i,j} = 1, & j \in S_i \end{cases}$

• LP Relaxation :

Minimize $\sum_{i=1}^m w_i \cdot x_i$

subject to $\sum_{i=1}^m A_{i,j}x_i \geq 1$, $1 \leq j \leq n$

​ $0 \leq x_i \leq 1$, $1\leq i \leq m$

In which $\forall 1 \leq i \leq m, 1 \leq j \leq n$, $A_{i,j} \in \lbrace 0,1 \rbrace$ :

​ $\begin{cases} A_{i,j} = 0, & j \not\in S_i \\ A_{i,j} = 1, & j \in S_i \end{cases}$

• Consider $\forall 0 \leq x \leq 1, e^{-x} \geq 1- x$.

• $\text{Pr}[\text{element } x \text{ not covered in one round}] = \prod_{i=1}^m (1 - A_{i,x}x_i^*) \leq \prod_{i=1}^m e^{-A_{i,x}x_i^*} = e^{-\sum_{i=1}^m A_{i,x}x_i^*} \leq e^{-1}$.

• Consider Union Bound : $\text{Pr}[\text{not all elements are covered after } k \text{ rounds}] \leq ne^{-k}$.

• Let $K$ denote the first round by which all elements are covered,

we have $\text{Pr}[K \geq k] \leq \sum_{i=k-1}^{+\infty}\text{Pr}[\text{not all elements are covered after } i \text{ rounds}] \leq \sum_{i=k-1}^{+\infty} ne^{-i} = \frac{e}{e-1}\, n e^{-(k-1)}$.

• Let $C_i$ denote the total cost after $i$ rounds; then $\mathbb{E}[C_{i+1} - C_i]$ is non-increasing as $i$ increases.

( In each round every subset is chosen with the same probability,

but re-choosing an already chosen subset adds no new cost, so the expected cost added per round can only decrease. )

• $\mathbb{E}[C_1] = \sum_{i=1}^m w_i \cdot x_i^* = \text{OPT}_{\text{LP}} \leq \text{OPT} \Rightarrow \mathbb{E}[C_k] \leq k \cdot \text{OPT}$.

• By Markov's Inequality, $\text{Pr}[C_k \geq \alpha k \text{OPT}] \leq \frac{\mathbb{E}[C_k]}{\alpha k \text{OPT}} \leq \frac{1}{\alpha}$.

• Let event $A$ denote that this algorithm returns a set cover with approximation ratio $O(\log n)$.

By the Union Bound (the two bad events need not be independent),

$\text{Pr}[A] \geq 1 - \text{Pr}[K \geq k] - \text{Pr}[C_k \geq \alpha k \text{OPT}] \geq 1 - \frac{e}{e-1}\, n e^{-(k-1)} - \frac{1}{\alpha} = P$.

If we want $P \geq 0.99$, take $\alpha = 10000$; then we need $\frac{e}{e-1}\, n e^{-(k-1)} \leq 0.0099$, which holds once $k \geq \ln n + O(1)$. With such $k$ the cost is at most $\alpha k \, \text{OPT} = O(\log n) \cdot \text{OPT}$.
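A small simulation of the rounding scheme; the 3-element instance and its fractional solution are illustrative assumptions:

```python
import math
import random

random.seed(1)
sets = [{0, 1}, {1, 2}, {0, 2}]   # hypothetical instance: 3 elements, 3 sets
x = [0.5, 0.5, 0.5]               # feasible fractional cover: each element has total 1
n = 3
k = int(math.log(n)) + 4          # k = ln n + O(1) rounds

def round_cover():
    """One run: in each of k rounds pick every set i independently w.p. x_i."""
    chosen = set()
    for _ in range(k):
        for i, xi in enumerate(x):
            if random.random() < xi:
                chosen.add(i)
    return chosen

trials = 2000
fails = sum(1 for _ in range(trials)
            if set().union(*[sets[i] for i in round_cover()]) != {0, 1, 2})
failure_rate = fails / trials     # the union bound predicts at most n * e^{-k}
```

The observed failure rate stays well below the union-bound guarantee $n e^{-k}$.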

### Solution 1

Reference :

• We consider that :

• $x_{i, j} \in \lbrace 0, 1 \rbrace$ denotes whether client $j$ is assigned to facility $i$, and $d_{i, j}$ denotes the connection cost,

$0$ - not assigned, $1$ - assigned

• $z_j \in \lbrace 0, 1 \rbrace$ denotes whether client $j$ is connected to some facility, and $p_j$ denotes the penalty of client $j$,

0 - connected, 1 - not connected

• $y_i \in \lbrace 0, 1 \rbrace$ denotes whether facility $i$ is opened, and $f_i$ denotes the open cost of facility $i$,

0 - not opened, 1 - opened

• Then we have the Primal ILP :

$$\begin{array}{lll} \text{minimize} & \sum_{i, j} d_{i, j} \cdot x_{i, j} + \sum_j p_j \cdot z_j + \sum_i f_i \cdot y_i & \\ \text{s.t.} & x_{i,j} \leq y_i, & i \in F, j \in C \\ & \sum_{i \in F} x_{i, j} + z_j \geq 1, & j \in C \\ & x_{i,j}, z_j, y_i \in \lbrace 0, 1 \rbrace, & i \in F, j \in C \end{array}$$

• By using LP-relaxation, we can write the Primal LP :
$$\begin{array}{lll} \text{minimize} & \sum_{i, j} d_{i, j} \cdot x_{i, j} + \sum_j p_j \cdot z_j + \sum_i f_i \cdot y_i & \\ \text{s.t.} & x_{i,j} \leq y_i, & i \in F, j \in C \\ & \sum_{i \in F} x_{i, j} + z_j \geq 1, & j \in C \\ & x_{i,j}, z_j, y_i \geq 0, & i \in F, j \in C \end{array}$$
• Meanwhile, we also have the Dual LP :
$$\begin{array}{lll} \text{maximize} & \sum_j \alpha_j & \\ \text{s.t.} & \alpha_j - \beta_{i, j} \leq d_{i, j}, & i \in F, j \in C \\ & \alpha_j \leq p_j, & j \in C \\ & \sum_{j \in C} \beta_{i, j} \leq f_i, & i \in F \\ & \alpha_j, \beta_{i, j} \geq 0, & i \in F, j \in C \end{array}$$

• $\text{Phase} \ \mathrm{I}$

• Initially $\alpha = 0, \ \beta = 0$, no facility is open, no client is served.

• Raise $\alpha_j$ for all clients $j$ simultaneously at a uniform continuous rate:

• upon $\alpha_j = d_{i, j}$ for a closed facility $i$:

edge $(i, j)$ is tight;

raise $\beta_{i, j} = \alpha_j - d_{i, j}$ at the same rate as $\alpha_j$ is raised;

• upon $\sum_{j \in C} \beta_{i, j} = f_i$ :

tentatively open facility $i$;

connect all unserved clients $j$ with tight $(i, j)$ to facility $i$ and stop raising $\alpha_j$;

all such clients $j$ are served by facility $i$;

• upon $\alpha_j = d_{i, j}$ for a tentatively open facility $i$:

connect client $j$ to facility $i$ and stop raising $\alpha_j$;

such client $j$ is served by facility $i$;

• upon $\alpha_j = p_j$ for a not-served client $j$:

stop raising $\alpha_j$;

client $j$ may still be connected later if some facility $i$ with a tight edge $(i, j)$ is opened;

• $\text{Phase} \ \mathrm{II}$

• Construct graph $G(V, E)$ where $V=\lbrace \text{tentatively open facilities} \rbrace$ and $(i_1, i_2) \in E$ if $\beta_{i_1, j}, \beta_{i_2, j} \gt 0$ for some client $j$.
• Find a maximal independent set $I$ of $G$ and permanently open facilities in $I$;
• directly connect every facility $i \in I$ to all clients $j$ that $i$ serves or with $\beta_{i, j} \gt 0$;
• for every remaining client $j$ : connect $j$ to the nearest open facility; call such client indirectly connected.
• For every client $j$ satisfying $\alpha_j = p_j$ that has not been connected to any facility : leave it unconnected and pay the penalty $p_j$.

• For the set $C_1$ of directly connected clients :

• $\sum_{i \in F, j \in C_1} d_{i, j} \cdot x_{i, j} + \sum_{i \in F} f_i \cdot y_i = \sum_{j \in C_1} \alpha_j$
• For the set $C_2$ of indirectly connected clients :

• Consider a client $j \in C_2$ that was served by some facility $i$ in $\text{Phase} \ \mathrm{I}$; write $t_i$ for the time at which facility $i$ was tentatively opened.

• Obviously we have $\alpha_j = t_i \geq d_{i, j}$, but unfortunately facility $i$ was not permanently opened in the end.

• Then there exist another facility $i^* \in I$ and another client $j^*$ satisfying $\beta_{i, j^*}, \beta_{i^*, j^*} \gt 0$.

• By the rules of $\text{Phase} \ \mathrm{I}$, we have $\alpha_{j^*} \leq t_i$, $\alpha_{j^*} \gt d_{i^*, j^*}$, $\alpha_{j^*} \gt d_{i, j^*}$.

• Combining the above, we have $d_{i, j} + d_{i, j^*} + d_{i^*, j^*} \leq 3 t_i \leq 3 \alpha_j$.

• With the Triangle Inequality, $d_{i^*, j} \leq d_{i, j} + d_{i, j^*} + d_{i^*, j^*} \leq 3 t_i \leq 3 \alpha_j$.

• Client $j$ is finally indirectly connected to some open facility $i'$; since $i'$ is the nearest open facility, we have

$d_{i', j} \leq d_{i^*, j} \leq d_{i, j} + d_{i, j^*} + d_{i^*, j^*} \leq 3 t_i \leq 3 \alpha_j$.

• We have $\sum_{i \in F, j \in C_2} d_{i, j} \cdot x_{i, j} \leq 3 \sum_{j \in C_2} \alpha_j$

• For the set $C_3$ of clients left unconnected :

• Obviously $\forall j \in C_3, \alpha_j = p_j$,

which means $\sum_{j \in C} p_j \cdot z_j = \sum_{j \in C_3} p_j \leq \sum_{j \in C_3} \alpha_j$ and $\sum_{i \in F, j \in C_3} d_{i, j} \cdot x_{i, j} = 0$.

• Concluding all of the above :

$\sum_{i \in F, j \in C} d_{i, j} \cdot x_{i, j} + \sum_{i \in F} f_i \cdot y_i + \sum_{j \in C} p_j \cdot z_j \leq \sum_{j \in C_1} \alpha_j + 3 \sum_{j \in C_2} \alpha_j + \sum_{j \in C_3} \alpha_j \leq 3 \sum_{j \in C} \alpha_j \leq 3 \cdot \text{OPT}$

(the last step uses weak LP duality: $\sum_{j \in C} \alpha_j \leq \text{OPT}_{\text{LP}} \leq \text{OPT}$).

### Solution 2

• Consider every image $i$ as a one-hot vector $\vec{x}_i$ of the classification, $\vec{x}_i^j = 1$ means image $i$ belongs to cluster $j$.

• There can exist at most $n$ clusters.

• $<\vec{x}_i, \vec{x}_j> = 1 \Leftrightarrow \text{images } i \text{ and } j \text{ belong to the same cluster}$.

• We can see that these $n$ vectors form a feasible solution of the SDP.

• The SDP is a relaxation of the original problem,

so the optimal value of the SDP is an upper bound on the optimum of the original problem.

• Suppose that we have obtained the SDP's optimal solution $\vec{x}_1, \cdots, \vec{x}_n$.

• We then generate two vectors $\vec{y}_1, \vec{y}_2 \in \mathbb{R}^n$ independently and uniformly at random,

with $|\vec{y}_1|=|\vec{y}_2|=1$.

• For any $i \in [n]$, the pair $(<\vec{x}_i, \vec{y}_1> \geq 0, \ <\vec{x}_i, \vec{y}_2> \geq 0)$ takes one of the four values $(F, F), (T, F), (F, T), (T, T)$, which splits the vectors into at most four clusters.

• Write $A_{i, j}$ for the event that $\vec{x}_i$ and $\vec{x}_j$ have not been separated, i.e. they end up in the same cluster.

• We have $\text{Pr}[A_{i, j}] = (1 - \frac{\theta_{i, j}}{\pi})^2 \geq \frac{3}{4}\cos \theta_{i, j}$, and $1 - \text{Pr}[A_{i, j}] = 1 - (1 - \frac{\theta_{i, j}}{\pi})^2 \geq \frac{3}{4}(1 - \cos \theta_{i, j})$.

• The two bounds above are equivalent to: $\forall x \in [0, \frac{\pi}{2}]$, $0 \leq (1 - \frac{x}{\pi})^2 - \frac{3}{4}\cos x \leq \frac{1}{4}$.
• Let $f(x) = (1 - \frac{x}{\pi})^2 - \frac{3}{4}\cos x$; then $f'(x) = -\frac{2}{\pi} + \frac{2x}{\pi^2} + \frac{3}{4}\sin x$.
• Since $f''(x) = \frac{2}{\pi^2} + \frac{3}{4}\cos x \gt 0$ on $[0, \frac{\pi}{2}]$, $f'$ is increasing; with $f'(0) \lt 0$ and $f'(\frac{\pi}{2}) \gt 0$, $f$ first decreases and then increases.
• $\max_{x \in [0, \frac{\pi}{2}]} f(x) = \max(f(0), f(\frac{\pi}{2})) = \frac{1}{4}$.
• $\min_{x \in [0, \frac{\pi}{2}]} f(x)$ is attained at the root of $f'(x)=0 \Leftrightarrow \frac{3\pi}{8}\sin x = 1 - \frac{x}{\pi}$; numerically this minimum is $\approx 0.03 \geq 0$.
• Then we have :
\begin{align} \mathbb{E} & = \sum_{1 \leq i < j \leq n} w_{(i, j)}^+\text{Pr}[A_{i, j}] + w_{(i, j)}^-(1 - \text{Pr}[A_{i, j}]) \\ & \geq \frac{3}{4}\sum_{1 \leq i < j \leq n} w_{(i, j)}^+\cos \theta_{i, j} + w_{(i, j)}^-(1 - \cos \theta_{i, j}) \\ & = \frac{3}{4}\text{OPT}_{\text{SDP}} \geq \frac{3}{4}\text{OPT}_{\text{ORIGIN}}. \end{align}
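The claimed bounds on $f(x) = (1 - x/\pi)^2 - \frac{3}{4}\cos x$ can be verified on a grid:

```python
import math

def f(x):
    # f(x) = (1 - x/pi)^2 - (3/4) cos x, as in the bullets above
    return (1 - x / math.pi) ** 2 - 0.75 * math.cos(x)

xs = [i * (math.pi / 2) / 10000 for i in range(10001)]
vals = [f(x) for x in xs]
fmax, fmin = max(vals), min(vals)   # endpoints give exactly 1/4; interior minimum ~0.03
```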

### Solution 3

• Let the bad event $A_i$ be that all vertices of edge $e_i$ receive the same color; then $\text{Pr}[A_i] = \frac{2}{2^k} = \frac{1}{2^{k-1}}$.
• Each edge contains $k$ vertices and each vertex has degree at most $k$, so $e_i$ intersects at most $k(k-1)$ other edges, i.e. $A_i$ depends on at most $k(k-1)$ other events.
• By the Lovász Local Lemma, it suffices that $\text{Pr}[A_i] = \frac{1}{2^{k-1}} \leq \frac{1}{e(k(k-1)+1)} \Leftrightarrow 2^{k-1}-e(k(k-1)+1) \geq 0$.

• Let $f(k) = 2^{k-1}-e(k(k-1)+1)$; we have $f(8) \lt 0$ and $f(9) \gt 0$.

• One can check by differentiation that $f(k)$ is increasing for $k \geq 9$.
• So the smallest $k$ I can achieve is $k = 9$.
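A one-liner check that $k = 9$ is indeed the smallest value with $f(k) \geq 0$:

```python
import math

def f(k):
    # f(k) = 2^{k-1} - e(k(k-1) + 1), the LLL feasibility condition from above
    return 2 ** (k - 1) - math.e * (k * (k - 1) + 1)

smallest_k = next(k for k in range(2, 64) if f(k) >= 0)
```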