Advanced Algorithms Notes

Notes for Advanced Algorithms


Chapter.1 Min-Cut and Max-Cut

Min-Cut

Deterministic Algorithm

Randomized Algorithm

  • Karger’s Contraction Algorithm

    • Random Contract (Choose $e$ randomly)

      fig1_1

    • While($|V| \gt 2$) Repeat

    • $O(n)$ per contraction, $n-2$ contractions in total, so $O(n^2)$ for one run of Karger's algorithm

    • For any multigraph with $n$ vertices, the RandomContract algorithm returns a minimum cut with probability at least $\frac{2}{n(n-1)}$

    • Run RandomContract independently $t=\frac{n(n-1)\ln n}{2}$ times; the probability that a minimum cut is found is at least $1-\frac{1}{n}$

    • $O(n^2)$ for one run, $O(n^4\log n)$ for all runs ($\text{Pr} \geq 1-\frac{1}{n}$); a code sketch of RandomContract appears after this list

  • For any graph $G(V,E)$ of $n$ vertices, the number of distinct minimum cuts in $G$ is at most $\frac{n(n-1)}{2}$

  • Fast Min-Cut

    • Random Contract

    • Fastcut

      fig1_2

    • A : The minimum-cut C survives all contractions in RandomContract(G,t)

      B : size of min-cut is unchanged after RandomContract(G,t)

      $\text{Pr}[B]\geq \text{Pr}[A]\geq \frac{t(t-1)}{n(n-1)}\geq \frac{1}{2} (t=\lceil 1+\frac{n}{\sqrt2}\rceil)$

    • For any multigraph with $n$ vertices, the FastCut algorithm returns a minimum cut with probability $\Omega(\frac{1}{\log n})$ in time $O(n^2\log n)$

    • $O(n^2\log n)$ for one run, $O(n^2\log^3 n)$ for all runs ($\text{Pr} \geq 1-O(\frac{1}{n})$)
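
A minimal sketch of RandomContract in Python, assuming the multigraph is given as an edge list over hashable vertex labels and is connected; the union-find bookkeeping and the example graph are illustrative choices, not part of the notes.

```python
import random

def random_contract(edges):
    """Contract random edges until two super-vertices remain; return the
    edges crossing the resulting cut (its length is the cut size)."""
    parent = {}                                   # union-find over original vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]         # path halving
            v = parent[v]
        return v

    vertices = {u for e in edges for u in e}
    for v in vertices:
        parent[v] = v
    remaining = len(vertices)

    while remaining > 2:
        u, v = random.choice(edges)               # pick a uniformly random edge
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                       # contract the edge
            remaining -= 1
        # drop edges that became self-loops inside a super-vertex
        edges = [e for e in edges if find(e[0]) != find(e[1])]
    return edges

def karger_min_cut(edges, repetitions):
    """Repeat RandomContract and keep the smallest cut found."""
    return min((random_contract(list(edges)) for _ in range(repetitions)), key=len)

if __name__ == "__main__":
    g = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]  # 4-cycle plus a chord: min cut = 2
    print(len(karger_min_cut(g, 50)))
```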

Max-Cut

Greedy Algorithm

  • Greedy Heuristics

    fig1_3

  • Approximation Ratio

    • $|E|=\sum_{i=1}^n(|E(S_i,\lbrace v_i\rbrace)|+|E(T_i,\lbrace v_i\rbrace)|)$
    • $SOL_G=\sum_{i=1}^n\max (|E(S_i,\lbrace v_i\rbrace)|, |E(T_i,\lbrace v_i\rbrace)|)$
    • $SOL_G\geq \frac{|E|}{2}\geq\frac{OPT_G}{2}$
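
The greedy heuristic of fig1_3 can be sketched in a few lines of Python; the adjacency-dict representation and the tie-breaking rule below are assumptions of this sketch.

```python
def greedy_max_cut(vertices, adj):
    """Process v_1, ..., v_n in order; put each v_i on the side that cuts
    more edges to the already-placed vertices."""
    S, T = set(), set()
    for v in vertices:
        to_S = sum(1 for u in adj[v] if u in S)   # |E(S_i, {v_i})|
        to_T = sum(1 for u in adj[v] if u in T)   # |E(T_i, {v_i})|
        (T if to_S >= to_T else S).add(v)         # keep the larger of the two
    cut = sum(1 for v in S for u in adj[v] if u in T)
    return S, T, cut

# Example: a triangle plus a pendant edge; the greedy cut has size 3 >= |E|/2 = 2.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(greedy_max_cut([1, 2, 3, 4], adj))
```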

Derandomization by Conditional Expectation

  • For every vertex $v\in V$ let $X_v\in\lbrace 0, 1\rbrace$ be a uniform and independent random bit which indicates whether $v$ joins $S$ or $T$.
  • $\mathbb{E}[|E(S,T)|]=\sum_{uv\in E}\mathbb{E}[I[X_u\neq X_v]]=\sum_{uv\in E}\text{Pr}[X_u\neq X_v]=\frac{|E|}{2}\geq \frac{OPT_G}{2}$
  • Find a solution achieving the bound above (☆) : fix the bits one by one, always keeping the conditional expectation at least $\frac{|E|}{2}$ (a sketch follows).
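
A minimal sketch of the conditional-expectation argument in Python, assuming the same adjacency-dict input as in the greedy sketch above; fixing each bit to the better value never lets the conditional expectation drop below $\frac{|E|}{2}$.

```python
def derandomized_cut(vertices, adj):
    """Fix X_v one vertex at a time.  Undecided neighbors contribute 1/2 to the
    conditional expectation either way, so the comparison reduces to counting
    decided neighbors on each side."""
    side = {}                                     # v -> 0 (S) or 1 (T)
    for v in vertices:
        cut_if_S = sum(1 for u in adj[v] if side.get(u) == 1)
        cut_if_T = sum(1 for u in adj[v] if side.get(u) == 0)
        side[v] = 0 if cut_if_S >= cut_if_T else 1
    S = {v for v in vertices if side[v] == 0}
    T = {v for v in vertices if side[v] == 1}
    cut = sum(1 for v in S for u in adj[v] if u in T)
    return S, T, cut
```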

Derandomization by Pairwise Independence

  • Use $\lceil\log_2 (n+1)\rceil$ truly independent random bits; XORing them over the non-empty index subsets yields $n$ pairwise independent bits.

Chapter.2 Fingerprinting

Polynomial Identity Testing (PIT)

  • Given as input two polynomials, determine whether they are identical.
  • $f\equiv g \Leftrightarrow f-g\equiv 0$

Deterministic Algorithm

  • Evaluate $f(x_1), f(x_2), \dots, f(x_{d+1})$ over $d+1$ distinct elements $x_1, x_2, \dots, x_{d+1}$ from the field $\mathbb{F}$ and check whether they are all zero.

  • Fundamental Theorem of Algebra

    Any non-zero univariate polynomial of degree $d$ has at most $d$ roots.

Randomized Algorithm

  • fig2_1

  • $\text{Pr}[f(r)=0]\leq\frac{d}{|S|}$
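
A minimal sketch of the randomized test (fig2_1) for univariate polynomials given as coefficient lists; evaluating at a point drawn from a set of size $|S|$ gives one-sided error at most $\frac{d}{|S|}$. The coefficient-list input format is an assumption of this sketch.

```python
import random

def evaluate(coeffs, x):
    """Horner evaluation of sum_i coeffs[i] * x**i."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def polynomials_identical(f, g, set_size):
    """Claim f == g iff they agree at a random point of {0, ..., set_size-1};
    a false 'yes' happens with probability at most deg(f - g) / set_size."""
    r = random.randrange(set_size)
    return evaluate(f, r) == evaluate(g, r)
```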

Communication Complexity of Equality

  • $EQ(a,b)=\begin{cases}1 & \text{if a=b,} \\ 0 & \text{otherwise.}\end{cases}$

  • Theorem (Yao 1979)

    Any deterministic communication protocol computing EQ on two $n$-bit strings costs $n$ bits of communication in the worst-case.

  • $f(x)=\sum_{i=0}^{n-1}a_ix^i$ and $g(x)=\sum_{i=0}^{n-1}b_ix^i$, defined over finite field $\mathbb{Z}_p=\lbrace 0,1,\dots,p-1\rbrace $ for some suitable $p$

  • fig2_2

  • Communication complexity is bounded by $O(\log p)$; $\text{Pr}[f(r)=g(r)]\leq\frac{n-1}{p}$; choosing $p$ to be a prime in the interval $[n^2,2n^2]$, the error probability of a false positive is at most $O(\frac{1}{n})$

  • Schwartz-Zippel Theorem

    Let $f\in\mathbb{F}[x_1,x_2,\dots,x_n]$ be a multivariate polynomial of degree $d$ over a field $\mathbb{F}$. If $f \not\equiv0$, then for any finite set $S \subset\mathbb{F}$, and $r_1,r_2,\dots,r_n\in S$ chosen uniformly and independently at random, $\text{Pr}[f(r_1,r_2,\dots,r_n)=0]\leq\frac{d}{|S|}$.

    Proof(★) by induction on $n$

    Notes :

    1. The proof for $n=1$ is based on the fundamental theorem of algebra.
    2. It is wrong to directly treat the multivariate polynomial $f(x_1,x_2,\dots,x_n)$ as a univariate polynomial $f_{r_1,r_2,\dots,r_{n-1}}(x_n)$ and claim $\text{Pr}[f_{r_1,r_2,\dots,r_{n-1}}(r_n)=0]\leq\frac{d}{|S|}$, because the event $f_{r_1,r_2,\dots,r_{n-1}}(r_n)=0$ also includes the case that this univariate polynomial is identically zero (all of its coefficients vanish at $r_1,\dots,r_{n-1}$); only conditioned on the univariate polynomial being non-zero does the bound $\frac{d}{|S|}$ hold, and the identically-zero case has to be handled by the induction hypothesis.

Fingerprinting

  • Fingerprints function $FING(\cdot)$
  • $FING(\cdot)$ is a function, meaning that if $X=Y$ then $FING(X)=FING(Y)$.
  • It is much easier to compute and compare the fingerprints.
  • Ideally, the domain of fingerprints is much smaller than the domain of original objects, so storing and comparing fingerprints are easy. This means the fingerprint function $FING(\cdot)$ cannot be an injection (one-to-one mapping), so it’s possible that different $X$ and $Y$ are mapped to the same fingerprint. We resolve this by making fingerprint function randomized, and for $X\neq Y$, we want the probability $\text{Pr}[FING(X)=FING(Y)]$ to be small.

Communication protocols for Equality by fingerprinting

Fingerprinting by randomized checksum

  • Now we consider a new fingerprint function: We treat each input string $x\in\lbrace 0,1\rbrace ^n$ as the binary representation of a number, and let $FING(x)=x\text{ mod }p$ for some random prime $p$ chosen from $[k]=\lbrace 0,1,\dots,k-1\rbrace $.

  • fig2_3

  • The number of bits to be communicated is obviously $O(\log k)$.

  • Let $z=x-y$. $\text{Pr}[z\text{ mod }p=0]\leq\frac{\text{the number of prime divisors of }z}{\text{the number of primes in }[k]}$.

  • Prime Number Theorem

    Let $\pi(k)$ denote the number of primes less than $k$. Then $\pi(k) \sim\frac{k}{\ln k}$ as $k\to\infty$.

  • Therefore, by choosing $k=2n^2\ln n$, $0\lt z\lt 2^n$, and a random prime $p\in[k]$, $\text{Pr}[z\text{ mod }p=0]\leq\frac{n}{\pi(k)}\sim\frac{1}{n}$.

    $\text{Pr}[FING(x)=FING(y)]\leq \text{Pr}[|x-y|\text{ mod }p=0]\leq\frac{1}{n}$.
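
A minimal sketch of the randomized checksum in Python, assuming Alice and Bob share the random prime $p$ (e.g. via public randomness); the naive trial-division primality test is only for illustration.

```python
import math, random

def random_prime(k):
    """Rejection-sample a prime from [k] using trial division (fine for a sketch)."""
    def is_prime(m):
        return m >= 2 and all(m % d for d in range(2, math.isqrt(m) + 1))
    while True:
        p = random.randrange(2, k)
        if is_prime(p):
            return p

def checksum_equal(x, y, n):
    """x, y are n-bit integers; FING(x) = x mod p with p random in [k], k = 2 n^2 ln n.
    False positives occur with probability O(1/n)."""
    k = max(4, int(2 * n * n * math.log(max(n, 2))))
    p = random_prime(k)
    return x % p == y % p
```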

Checking distinctness

  • Given a sequence $x_1,x_2,\dots,x_n\in\lbrace 1,2,\dots,n\rbrace $, check whether every element of $\lbrace 1,2,\dots,n\rbrace $ appears exactly once.

Fingerprinting multisets

  • fig2_4

  • $FING(A)=\prod_{i=1}^n(r-a_i)$

  • Theorem (Lipton 1989)

    Let $A=\lbrace a_1,a_2,\dots,a_n\rbrace $ and $B=\lbrace b_1,b_2,\dots,b_n\rbrace $ be two multisets whose elements are from $\lbrace 1,2,\dots,n\rbrace $. If $A\neq B$, then $\text{Pr}[FING(A)=FING(B)]=O(\frac{1}{n})$.

    Proof(★)
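
A minimal sketch of the multiset fingerprint $FING(A)=\prod_{i=1}^n(r-a_i)$ over $\mathbb{Z}_p$; the shared random point $r$ follows the theorem's setup, while the concrete default prime is an arbitrary illustrative choice.

```python
import random

def multiset_fingerprint(A, p, r):
    """FING(A) = prod_i (r - a_i) mod p, for a shared random point r in Z_p."""
    fing = 1
    for a in A:
        fing = fing * (r - a) % p
    return fing

def same_multiset(A, B, p=10**9 + 7):
    """If A != B, the fingerprints collide with probability at most n/p."""
    r = random.randrange(p)               # the same r must be used for both multisets
    return multiset_fingerprint(A, p, r) == multiset_fingerprint(B, p, r)
```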


Chapter.3 Hashing and Sketching

Distinct Elements

  • Counting distinct elements.
  • Input : a sequence of (not necessarily distinct) elements $x_1,x_2,\dots, x_n\in\Omega$
  • Output : an estimation of the total number of distinct elements $z=|\lbrace x_1,x_2,\dots,x_n\rbrace|$

Straightforward Method

  • Maintain a dictionary data structure, costs $O(n)$ space.
  • Due to an information-theoretical argument, linear space is necessary if you want to compute the exact value of $z$.

$(\epsilon,\delta)$-estimator

  • Relax the problem a little bit to significantly reduce the space cost by tolerating approximate answers.
  • fig3_1
  • $\epsilon$ : approximation error & $\delta$ : confidence error

An estimator by hashing

  • Random hash function : $h:\Omega\rightarrow[0,1]$ (uniformly distributed)

  • $\mathbb{E}[\min_{1\leq i\leq n}h(x_i)]=\frac{1}{z+1}$

    Proof(★)

  • $Y=\min_{1\leq i\leq n}h(x_i), \text{Answer}=\frac{1}{Y}-1$
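
A minimal sketch of this estimator in Python, simulating the idealized uniform hash $h:\Omega\rightarrow[0,1]$ with a seeded pseudorandom generator; the memoization dict only makes the "random function" consistent and is not part of the space analysis.

```python
import random

def distinct_estimate(stream, seed=0):
    """Return 1/Y - 1 where Y = min_i h(x_i); E[Y] = 1/(z+1) for z distinct elements."""
    rng = random.Random(seed)
    h = {}                                # memoized "uniform random function"
    y = 1.0
    for x in stream:
        if x not in h:
            h[x] = rng.random()           # the hash value of x, fixed once drawn
        y = min(y, h[x])
    return 1.0 / y - 1.0

# The Flajolet-Martin algorithm repeats this with k independent hash functions and
# averages the k minima before inverting, which drives the variance down.
```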

Flajolet-Martin algorithm

  • $k$ random hash functions $h_1,\dots,h_k:\Omega\rightarrow[0,1]$ (uniformly distributed)

  • fig3_2

  • For any $\epsilon,\delta\lt\frac{1}{2}$, if $k\geq\lceil\frac{4}{\epsilon^2\delta}\rceil$, then the output $\hat Z$ always gives an $(\epsilon, \delta)$-estimator of the correct answer $z$.

    Proof(★)

  • fig3_3

Uniform Hash Assumption (UHA)

  • Hash functions with discrete values, $h:\Omega\rightarrow[M]$ where $M=\text{poly}(n)$; the hash values are strings of $O(\log n)$ bits.
  • By an information-theoretical argument, it takes at least $\Omega(N\log M)$ bits to represent such a random hash function, because this is the entropy of a uniform random function.

Set Membership

  • space cost + time cost
  • sorted table / balanced search tree : Space : $O(n\log N)$ bits & Time : $O(\log n)$
  • perfect hashing : Space : $O(n\log N)$ bits & Time : $O(1)$

Bloom Filter

  • The Bloom filter is a space-efficient hash table that solves the approximate membership problem with one-sided error (false positive).
  • A Bloom filter consists of an array $A$ of $cn$ bits, and $k$ hash functions $h_1,h_2,\cdots,h_k$ map $\Omega$ to $[cn]$.
  • fig3_4
  • Probability calculation (★)
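
A minimal sketch of a Bloom filter in Python; SHA-256 with a per-function salt stands in for the $k$ idealized hash functions, and the one-byte-per-bit array is for clarity rather than space efficiency.

```python
import hashlib

class BloomFilter:
    def __init__(self, n, c=8, k=5):
        self.m = c * n                        # cn bits
        self.k = k
        self.bits = bytearray(self.m)

    def _positions(self, x):
        for i in range(self.k):               # h_1, ..., h_k : Omega -> [cn]
            digest = hashlib.sha256(f"{i}:{x}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, x):
        for pos in self._positions(x):
            self.bits[pos] = 1

    def __contains__(self, x):                # one-sided error: false positives only
        return all(self.bits[pos] for pos in self._positions(x))
```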

Frequency Estimation

  • Problem define

    • Data : a sequence of (not necessarily distinct) elements $x_1,x_2,\dots,x_n\in\Omega$
    • Query : an element $x\in\Omega$
    • Output : an estimation $\hat f_x$ of the frequency $f_x\triangleq |\lbrace i|x_i=x\rbrace|$ of $x$ in the input data
  • Heavy Hitters : (elements $x$ with high frequencies $f_x>\epsilon n$)

  • Count-min Sketch

    fig3_5
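
A minimal sketch of the count-min sketch (fig3_5) in Python; the salted SHA-256 hashes stand in for the independent hash functions, and taking the minimum over the rows means the estimate $\hat f_x$ never underestimates $f_x$.

```python
import hashlib

class CountMin:
    def __init__(self, width, depth):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _h(self, row, x):
        digest = hashlib.sha256(f"{row}:{x}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, x):                          # process one stream element
        for row in range(self.depth):
            self.table[row][self._h(row, x)] += 1

    def estimate(self, x):                     # estimate >= f_x always
        return min(self.table[row][self._h(row, x)] for row in range(self.depth))
```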


Chapter.4 Balls into bins and Chernoff bound

Balls into Bins

Birthday Problem

  • There are $m$ students in the class. Assume that each student's birthday is uniformly and independently distributed over the 365 days in a year. We wonder what the probability is that no two students share a birthday.
  • Pigeonhole Principle : Obviously there must be two students with the same birthday if $m \gt 365$.
  • Birthday Paradox : Surprisingly, for any $m \gt 57$, two students share a birthday with more than 99% probability
  • Model : $m$ different balls (students) are uniformly and independently thrown into 365 bins (days). More generally, let $n$ be the number of bins.
  • $\mathcal{E}$ : there is no bin with more than one ball (i.e. no two students share a birthday).
  • $\text{Pr}[\mathcal{E}]=\frac{\binom{n}{m}m!}{n^m}=\prod_{k=1}^{m-1}(1-\frac{k}{n})\approx e^{-\frac{m^2}{2n}}$. For $m=\sqrt{2n\ln \frac{1}{\epsilon}}, \text{Pr}[\mathcal{E}]\approx\epsilon$.
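
A quick numerical check of the formula above (the exact product against the $e^{-m^2/2n}$ approximation); the specific values $m=57$, $n=365$ are just the example from the birthday paradox.

```python
import math

def birthday_no_collision(m, n=365):
    exact = math.prod(1 - k / n for k in range(1, m))   # prod_{k=1}^{m-1} (1 - k/n)
    approx = math.exp(-m * m / (2 * n))                 # e^{-m^2 / 2n}
    return exact, approx

print(birthday_no_collision(57))   # both values come out around 1%, matching the 99% claim
```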

Coupon Collector

  • We keep throwing balls one-by-one into $n$ bins (coupons), such that each ball is thrown into a bin uniformly and independently at random. Each ball corresponds to a box of chocolate, and each bin corresponds to a type of coupon. Thus, the number of boxes bought to collect $n$ coupons is just the number of balls thrown until none of the $n$ bins is empty.

  • Theorem :

    Let $X$ be the number of balls thrown uniformly and independently to $n$ bins until no bin is empty. Then $\mathbb{E}[X]=nH(n)$, where $H(n)$ is the $n$-th harmonic number.

    $H(n)=\ln n+O(1)$, the expected number of coupons required to obtain all $n$ types of coupons is $n\ln n+O(n)$.

    Proof(★)

  • Let $X$ be the number of balls thrown uniformly and independently to $n$ bins until no bin is empty. Then $\text{Pr}[X\geq n\ln n+cn]\lt e^{-c}$ for any $c \gt 0$.

Occupancy Problem

  • An easy analysis shows that for every bin $i$, the expected load $\mathbb{E}[X_i]$ is equal to the average load $\frac{m}{n}$.

  • With High Probability (w.h.p.) : an event $A$ happens with high probability if $\text{Pr}[A]\gt 1-\frac{1}{n^c}$ for some constant $c\gt 1$, where $n$ is the problem size.

  • The distribution of the maximum load.

    • Theorem : Suppose that $n$ balls are thrown independently and uniformly at random into $n$ bins. For $1\leq i \leq n$, let $X_i$ be the random variable denoting the number of balls in the $i$-th bin. Then $\text{Pr}[\max_{1\leq i\leq n}X_i \geq \frac{3\ln n}{\ln\ln n}] \lt \frac{1}{n}$.

      Proof(★)

    • Formally, it can be proved that for $m=\Omega(n\log n)$, with high probability, the maximum load is within $O(\frac{m}{n})$, which is asymptotically equal to the average load.

Chernoff Bound

Moment generating functions

  • The more we know about the moments of a random variable $X$, the more information we would have about $X$. There is a so-called moment generating function, which “packs” all the information about the moments of $X$ into one function.

  • Definition : The moment generating function of a random variable $X$ is defined as $\mathbb{E}[e^{\lambda X}]$ where $\lambda$ is the parameter of the function.

    By Taylor's expansion and the linearity of expectations, $\mathbb{E}[e^{\lambda X}]=\mathbb{E}[\sum_{k=0}^\infty \frac{\lambda^k}{k!}X^k]=\sum_{k=0}^\infty \frac{\lambda^k}{k!}\mathbb{E}[X^k]$.

The Chernoff bound

  • Let $X=\sum_{i=1}^n X_i$, where $X_1,X_2,\cdots,X_n$ are independent Poisson trials. Let $\mu=\mathbb{E}[X]$.

  • Standard Forms :

    • Upper Tail : for any $\delta\gt 0$, $\text{Pr}[X\geq (1+\delta)\mu]\leq (\frac{e^\delta}{(1+\delta)^{1+\delta}})^\mu$.
    • Lower Tail : for any $0 \lt \delta \lt 1$, $\text{Pr}[X \leq (1-\delta)\mu] \leq (\frac{e^{-\delta}}{(1-\delta)^{1-\delta}})^\mu$.
  • Proof(★)

    Idea : apply Markov's inequality to $e^{\lambda X}$, and for the rest we just estimate the moment generating function $\mathbb{E}[e^{\lambda X}]$. To make the bound as tight as possible, we minimize $\left(\frac{e^{e^{\lambda}-1}}{e^{\lambda (1+\delta)}}\right)^\mu$ by setting $\lambda = \ln (1 + \delta)$, which can be justified by taking the derivative with respect to $\lambda$.

  • Useful Forms :

    • for $0 \lt \delta \leq 1$,

      $\text{Pr}[X \geq (1+\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{3})$

      $\text{Pr}[X \leq (1-\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{2})$

    • for $t \geq 2e\mu$,

      $\text{Pr}[X \geq t] \leq 2^{-t}$

Hoeffding

  • Hoeffding’s Lemma :

    Let $X$ be a real-valued random variable with expected value $\mathbb{E}[X]=0$, and such that $a \leq X \leq b$ almost surely. Then for all $\lambda \in \mathbb{R}$, $\mathbb{E}[e^{\lambda X}]\leq \exp(\frac{\lambda^2(b-a)^2}{8})$.

  • Hoeffding’s Inequality :

    • General case of bounded random variables :
      • $\overline{X}=\frac{1}{n}\sum_{i=1}^n X_i$ (the empirical mean), $a_i \leq X_i \leq b_i$
      • $P(\overline{X}-\mathbb{E}[\overline{X}] \geq t) \leq \exp(-\frac{2n^2t^2}{\sum_{i=1}^n(b_i-a_i)^2})$
      • $P(|\overline{X}-\mathbb{E}[\overline{X}]| \geq t) \leq 2\exp(-\frac{2n^2t^2}{\sum_{i=1}^n(b_i-a_i)^2})$

Applications to balls-into-bins

Power of Two Choices (See on slides)

Chapter.5 Concentration of Measure

Conditional Expectations

  • conditional expectation of a random variable $Y$ with respect to an event $\mathcal{E}$ :

    $\mathbb{E}[Y\mid \mathcal{E}]=\sum_y y\text{Pr}[Y=y \mid \mathcal{E}]$

  • In particular, if the event $\mathcal{E}$ is $X=a$, the conditional expectation $\mathbb{E}[Y \mid X=a]$ defines a function $f(a)=\mathbb{E}[Y \mid X=a]$. Thus, $\mathbb{E}[Y \mid X]$ can be regarded as a random variable $f(X)$.

  • Proposition (fundamental facts about conditional expectation) :

    Let $X,Y,Z$ be arbitrary random variables. Let $f, g$ be arbitrary functions.

    • $\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]$
    • $\mathbb{E}[X \mid Z]=\mathbb{E}[\mathbb{E}[X \mid Y,Z] \mid Z]$
    • $\mathbb{E}[\mathbb{E}[f(X)g(X, Y) \mid X]]=\mathbb{E}[f(X) \cdot \mathbb{E}[g(X,Y) \mid X]]$

Martingales

  • Definition :

    A sequence of random variables $X_0, X_1, \cdots$ is a martingale if for all $i \gt 0$, $\mathbb{E}[X_i \mid X_0,\cdots, X_{i-1}]=X_{i-1}$

  • Definition of martingale’s general version :

    A sequence of random variables $Y_0, Y_1, \cdots$ is a martingale with respect to the sequence $X_0, X_1, \cdots$ if, for all $i \gt 0$, the following conditions hold :

    • $Y_i$ is a function of $X_0,X_1,\cdots, X_i$
    • $\mathbb{E}[Y_{i+1} \mid X_0,\cdots, X_i] = Y_i$.

Azuma’s Inequality

  • Azuma’s Inequality :

    Let $X_0, X_1, \cdots$ be a martingale such that, for all $k \geq 1$, $|X_k-X_{k-1}| \leq c_k$,

    Then $\text{Pr}[|X_n-X_0| \geq t] \leq 2 \exp(-\frac{t^2}{2\sum_{k=1}^n c_k^2})$.

  • Corollary :

    Let $X_0, X_1, \cdots$ be a martingale such that, for all $k \geq 1$, $|X_k-X_{k-1}| \leq c$,

    Then $\text{Pr}[|X_n-X_0| \geq ct\sqrt{n}] \leq 2 \exp(-\frac{t^2}{2})$.

  • Azuma’s Inequality ( general version ) :

    Let $Y_0, Y_1, \cdots$ be a martingale with respect to the sequence $X_0, X_1, \cdots$ such that, for all $k \geq 1$, $|Y_k-Y_{k-1}| \leq c_k$.

    Then, $\text{Pr}[|Y_n-Y_0| \geq t] \leq 2 \exp(-\frac{t^2}{2\sum_{k=1}^n c_k^2})$.

    Proof(★)

    Lemma : Let $X$ be a random variable such that $\mathbb{E}[X] = 0$ and $|X| \leq c$. Then for $\lambda \gt 0$, $\mathbb{E}[e^{\lambda X}] \leq \exp(\frac{\lambda^2 c^2}{2})$.

The Doob Martingales

  • Definition :

    The Doob sequence of a function $f$ with respect to a sequence of random variables $X_1, \cdots, X_n$ is defined by

    $Y_i = \mathbb{E}[f(X_1, \cdots, X_n) \mid X_1, \cdots, X_i], 0 \leq i \leq n$.

    In particular, $Y_0 = \mathbb{E}[f(X_1, \cdots, X_n)]$ and $Y_n=f(X_1, \cdots, X_n)$.

  • See in Wiki and Slides :

    • edge exposure martingale
    • vertex exposure martingale
  • Chromatic number

    Let $G \sim G(n,p)$. Let $\chi(G)$ be the chromatic number of $G$. Then $\text{Pr}[|\chi(G) - \mathbb{E}[\chi(G)]| \geq t \sqrt{n}] \leq 2 \exp(-t^2 / 2)$.

    Proof(☆) ( $Y_i = \mathbb{E}[\chi(G) \mid X_1, \cdots, X_i]$, $|Y_i - Y_{i-1}| \leq 1$ )

  • Hoeffding’s Inequality

    Let $X=\sum_{i=1}^n X_i$, where $X_1, \cdots, X_n$ are independent random variables with $a_i \leq X_i \leq b_i$ for each $1 \leq i \leq n$.

    Let $\mu = \mathbb{E}[X]$. Then $\text{Pr}[|X-\mu| \geq t] \leq 2 \exp(-\frac{t^2}{2 \sum_{i=1}^n (b_i-a_i)^2})$.

    Proof(☆) ( Use the Doob Martingale, $Y_i = \mathbb{E}[\sum_{j=1}^n X_j \mid X_1, \cdots, X_i]$ )

The Bounded Difference Method

  • For arbitrary random variables :

    Let $X=(X_1, \cdots, X_n)$ be arbitrary random variables and let $f$ be a function of $X_1, \cdots, X_n$ satisfying that,

    for all $1 \leq i \leq n$, $|\mathbb{E}[f(X) \mid X_1, \cdots, X_i] - \mathbb{E}[f(X) \mid X_1, \cdots, X_{i-1}]| \leq c_i$,

    Then $\text{Pr}[|f(X)-\mathbb{E}[f(X)]| \geq t] \leq 2 \exp(-\frac{t^2}{2 \sum_{i=1}^n c_i^2})$.

    Proof(☆) ( $Y_i = \mathbb{E}[f(X_1, \cdots, X_n) \mid X_1, \cdots, X_i]$, use Azuma’s Inequality )

  • For independent random variables :

    • Definition (Lipschitz condition) :

      A function $f(x_1, \cdots, x_n)$ satisfies the Lipschitz condition, if for any $x_1, \cdots, x_n$ and any $y_i$,

      $|f(x_1, \cdots, x_{i-1}, x_i, x_{i+1}, \cdots, x_n) - f(x_1, \cdots, x_{i-1}, y_i, x_{i+1}, \cdots, x_n)| \leq 1$.

    • Definition (Lipschitz condition, general version) :

      A function $f(x_1, \cdots, x_n)$ satisfies the Lipschitz condition, if for any $x_1, \cdots, x_n$ and any $y_i$,

      $|f(x_1, \cdots, x_{i-1}, x_i, x_{i+1}, \cdots, x_n) - f(x_1, \cdots, x_{i-1}, y_i, x_{i+1}, \cdots, x_n)| \leq c_i$.

    • Corollary ( Method of bounded differences ) :

      Let $X=(X_1, \cdots, X_n)$ be $n$ independent random variables and let $f$ be a function satisfying the Lipschitz condition with constants $c_i$, $1 \leq i \leq n$.

      Then $\text{Pr}[|f(X)-\mathbb{E}[f(X)]| \geq t] \leq 2 \exp(-\frac{t^2}{2 \sum_{i=1}^n c_i^2})$.

      Proof(★)

  • Applications (See in Wiki and Slides).


Chapter.6 Dimension Reduction and Nearest Neighbor Search

Dimension Reduction

  • Definition :

    • Input : n points $\vec{x_1}, \vec{x_2}, \cdots, \vec{x_n} \in \mathbb{R}^d$
    • Output : n points $\vec{y_1}, \vec{y_2}, \cdots, \vec{y_n} \in \mathbb{R}^k$
    • Require : $\forall i, j : (1-\epsilon) |\vec{x_i}-\vec{x_j}| \leq |\vec{y_i}-\vec{y_j}| \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |$
    • Usually we want $k \ll d$.
  • The JLT ( Johnson-Lindenstrauss Theorem 1984 )

    Euclidean Distance

    $\forall 0 \lt \epsilon \lt 1$, for any set $S$ of $n$ points from $\mathbb{R}^d$, there is a $\phi : \mathbb{R}^d \to \mathbb{R}^k$ with $k \in O(\epsilon^{-2} \cdot \log n)$, such that $\forall \vec{x_i}, \vec{x_j} \in S : (1-\epsilon) |\vec{x_i}-\vec{x_j}|_2^2 \leq |\phi(\vec{x_i})-\phi(\vec{x_j})|_2^2 \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |_2^2$.

    It is easy to find such a $\phi$ : just sample a random $k \times d$ matrix $A$.

    $\text{Pr}[ \forall \vec{x_i}, \vec{x_j} \in S : (1-\epsilon) |\vec{x_i}-\vec{x_j}|_2^2 \leq |A\vec{x_i}-A\vec{x_j}|_2^2 \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |_2^2 ] \geq 1-O(\frac{1}{n})$.

  • How to construct (sample) $A$?

    • Choose some suitable $k \in O(\epsilon^{-2} \cdot \log n)$.
    • For each entry $a_{i,j}$ of $A \in \mathbb{R}^{k \times d}$, independently sample $a_{i,j}$ from the Gaussian distribution $N(0,\frac{1}{k})$.
    • For each $l \in [n]$, let $\vec{y_l} = A \vec{x_l}$ (see the code sketch after the union-bound bullet below).
  • Consider $(1-\epsilon) |\vec{x_i}-\vec{x_j}|_2^2 \leq |A\vec{x_i}-A\vec{x_j}|_2^2 \leq (1+\epsilon) |\vec{x_i}-\vec{x_j} |_2^2 \Leftrightarrow 1-\epsilon \leq | A\frac{\vec{x_i}-\vec{x_j}}{| \vec{x_i}-\vec{x_j} |_2} |_2^2 \leq 1+\epsilon$

    $\frac{\vec{x_i}-\vec{x_j}}{| \vec{x_i}-\vec{x_j} |_2}$ means unit vector!

  • Union bound over $O(n^2)$ pairs of $\vec{x_i}, \vec{x_j} \in S$ : For any unit vector $\vec{u} \in \mathbb{R}^d : \text{Pr}[| | A\vec{u} |^2_2 -1| \gt \epsilon] \lt \frac{1}{n^3}$
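
A minimal sketch of the projection in Python/NumPy, following the construction above; the constant 8 in the choice of $k$ is an arbitrary illustrative choice, not the one fixed by the theorem.

```python
import numpy as np

def jl_project(X, eps, rng=np.random.default_rng(0)):
    """X: n x d array of points.  Returns Y = X A^T with A sampled entrywise
    from N(0, 1/k) and k = O(eps^-2 log n), so pairwise squared distances are
    preserved up to (1 +- eps) with high probability."""
    n, d = X.shape
    k = int(np.ceil(8 * np.log(n) / eps ** 2))
    A = rng.normal(loc=0.0, scale=np.sqrt(1.0 / k), size=(k, d))
    return X @ A.T                                  # row l is A x_l
```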

  • Definition :

    • Setup : metric space ($X$, dist)
    • Data : $n$ points $\vec{y_1}, \vec{y_2}, \cdots, \vec{y_n} \in X$
    • Query : given a point $\vec{x} \in X$, find the $\vec{y}$ which is closest to $\vec{x}$
  • c-ANN (Approximate Nearest Neighbor) :

    • Return a $\vec{y_i}$ s.t. $\text{dist}(\vec{x}, \vec{y_i}) \leq c \cdot \min_{1 \leq j \leq n} \text{dist}(\vec{x}, \vec{y_j})$
  • (c,r)-ANN (Approximate Near Neighbor) :

    • Return a $\vec{y_i}$ s.t. $\text{dist}(\vec{x}, \vec{y_i}) \leq cr$ if $\exists \vec{y_j} : \text{dist}(\vec{x}, \vec{y_j}) \leq r$
    • Return “NO” if $\forall \vec{y_j} : \text{dist}(\vec{x}, \vec{y_j}) \gt r$
    • Arbitrary answer otherwise
  • Hamming Space based on $\lbrace 0, 1\rbrace ^d$ and $\lbrace 0, 1\rbrace ^k$

    • For $i = 1, 2, \cdots, n$ : let $\vec{z_i} = A\vec{y_i} \in \lbrace 0, 1 \rbrace^k$ on finite field GF(2).

    • Store all $s$-balls $B_s(\vec{u}) = \lbrace\vec{y_i} \mid \text{dist}(\vec{u}, \vec{z_i}) \lt s \rbrace$ for all $\vec{u} \in \lbrace 0, 1 \rbrace^k$

    • Now, upon a query $\vec{x} \in \lbrace 0, 1 \rbrace^k$ :

      Retrieve $B_s(A\vec{x})$

      If $B_s(A\vec{x}) = \emptyset$, return “no”; else return any $\vec{y_i} \in B_s(A\vec{x})$.

    • Space : $O(n \cdot 2^k)$

      Query Time : $O(dk)$ computation + $O(1)$ memory access

    • For suitable $k \in O(\log n), p, s$; $\forall \vec{x}, \vec{y} \in \lbrace 0, 1 \rbrace^d$ :

      $\text{dist}(\vec{x}, \vec{y}) \leq r \Rightarrow \text{Pr}[\text{dist}(A\vec{x}, A\vec{y}) \gt s] \leq \frac{1}{n^2}$

      $\text{dist}(\vec{x}, \vec{y}) \gt cr \Rightarrow \text{Pr}[\text{dist}(A\vec{x}, A\vec{y}) \leq s] \leq \frac{1}{n^2}$

      (c,r)-ANN is solved w.h.p.

    • More Normal Condition :

      fig6_1

      fig6_2

      Using Chernoff Bound :

      Let independent r.v. $X_1, X_2, \cdots, X_k \in \lbrace 0,1 \rbrace$, let $X = \sum_{i=1}^k X_i$, then for $s \gt 0$ :

      $\text{Pr}[X \geq \mathbb{E}[X] + s] \leq \exp(- \frac{2s^2}{k}) \leq \exp(- \Omega(k))$

      $\text{Pr}[X \leq \mathbb{E}[X] - s] \leq \exp(- \frac{2s^2}{k}) \leq \exp(- \Omega(k))$

    • fig6_3

Locality-Sensitive Hashing

  • Definition :

    Given a metric space $(X, \text{dist})$,

    a random $h : X \to U$ drawn from $\mathcal{H}$ is an (r, cr, p, q)-LSH if, for all $\vec{x}, \vec{y} \in X$ :

    $\text{dist}(\vec{x}, \vec{y}) \leq r \Rightarrow \text{Pr}[h(\vec{x}) = h(\vec{y})] \geq p$

    $\text{dist}(\vec{x}, \vec{y}) \gt cr \Rightarrow \text{Pr}[h(\vec{x}) = h(\vec{y})] \leq q$

    • We hope $p \gt q$.

    • If there exists an (r, cr, p, q)-LSH $h : X \to U$,

      then there exists an (r, cr, $p^k$, $q^k$)-LSH $g : X \to U^k$ (see the sketch below)

    • Independently draw $h_1, h_2, \cdots, h_k$ according to the distribution of $h$

      $g(x) = ( h_1(x), h_2(x), \cdots, h_k(x) ) \in U^k$
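
A minimal sketch of the amplification step just described; `draw_h` is an assumed sampler for the base LSH family, and the bit-sampling family for Hamming distance is included only as a concrete example.

```python
import random

def amplify(draw_h, k):
    """Concatenate k independent draws: g(x) = (h_1(x), ..., h_k(x)).
    Collision probabilities p and q both get raised to the k-th power."""
    hs = [draw_h() for _ in range(k)]
    return lambda x: tuple(h(x) for h in hs)

def bit_sampler(d):
    """Classic bit-sampling LSH on {0,1}^d: Pr[h(x) = h(y)] = 1 - dist(x, y)/d."""
    def draw_h():
        i = random.randrange(d)
        return lambda x, i=i: x[i]
    return draw_h

g = amplify(bit_sampler(16), k=4)       # an (r, cr, p^4, q^4)-LSH for 16-bit strings
```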

  • fig6_4

    fig6_5

    If the real answer is “no” : always correct.

    If there exists $\vec{y_s}$ such that $\text{dist}(\vec{x}, \vec{y_s}) \leq r$, then $\text{Pr[answer “no”]} \leq \text{Pr}_1 + \text{Pr}_2$

    $\text{Pr}_1 = \text{Pr}[\forall j, g_j(\vec{x}) \neq g_j(\vec{y_s})] \leq (1 - p^*)^l \leq 1/e$

    $\text{Pr}_2 = \text{Pr}[\text{exists } 10l \text{ bad } \vec{y_i} \text{ that dist}(\vec{x}, \vec{y_i}) \gt cr \text{ yet } \exists j \text{ s.t. } g_j(\vec{x}) = g_j(\vec{y_i})] \leq \frac{\mathbb{E}[\text{number of such bad } \vec{y_i}]}{10l} \leq \frac{n \cdot l \cdot (1/n)}{10l} = 0.1$

    $\text{Pr}_1 + \text{Pr}_2 \leq 1/e + 0.1 \lt 0.5$

    Space : $O(nl) = O(n / p^*)$ Time : $O(l \log n) = O((\log n) / p^*)$

  • fig6_6

    fig6_7

  • Recap :

    fig6_8

Chapter.7 Greedy and Local Search

Set Cover

  • Set Cover Problem :
    • Input : $m$ subsets $S_1, S_2, \cdots, S_m \subseteq U$ of universe $U$ of size $n$.
    • Output : the smallest $C \subseteq \lbrace 1, 2, \cdots, m \rbrace$ such that $U=\bigcup_{i \in C} S_i$.
  • Hitting Set Problem :
    • Input : $n$ subsets $S_1, S_2, \cdots, S_n \subseteq U$ of a universe $U$ of size $m$.
    • Output : the smallest subset $C \subseteq U$ of elements such that $C$ intersects with every $S_i$ for $1 \leq i \leq n$.

Frequency and Vertex Cover

  • $\text{frequency}(x) = | \lbrace i \mid x \in S_i \rbrace |$.
  • Vertex Cover Problem :
    • Input : an undirected graph $G(V, E)$
    • Output : the smallest $C \subseteq V$ such that every edge $e \in E$ is incident to at least one vertex in $C$.
  • The vertex cover problem is NP-Hard.

Greedy Algorithm

  • Greedy Cover

    fig7_1

    Analyze :

    • Notations :

      • We enumerate all elements of the universe $U$ as $x_1, x_2, \cdots, x_n$, in the order in which they are covered in the algorithm.

      • For $t = 1, 2, \cdots, $ let $U_t$ denote the set of uncovered elements in the beginning of the $t$-th iteration of the algorithm.

      • For the $k$-th element $x_k$ covered, suppose that it is covered by $S_i$ in the $t$-th iteration, define

        $\text{price}(x_k) = 1 / |S_i \cap U_t|$

        to be the average “price” to cover element $x_k$ in the algorithm.

    • Lemma 1 :

      For the set cover $C$ returned by the algorithm, $|C|=\sum_{k=1}^n \text{price}(x_k)$.

    • Lemma 2 :

      For each $x_k$, $\text{price}(x_k) \leq \frac{\text{OPT}}{n-k+1}$, where $\text{OPT}$ is the size of the optimal set cover.

      Proof ( ★ )

    • Combine Lemma1 and Lemma2 :

      $|C|=\sum_{k=1}^n \text{price}(x_k) \leq \sum_{k=1}^n \frac{\text{OPT}}{n-k+1} = \sum_{k=1}^n \frac{\text{OPT}}{k} = H_n \cdot \text{OPT}$.

      $H_n$ means the $n$-th Harmonic number, $H_n \simeq \ln n + O(1)$.

    • Theorem :

      For any set cover instance $S_1, S_2, \cdots, S_m \subseteq U$ with optimal set cover of size $\text{OPT}$, GreedyCover returns a set cover of size $|C| \leq H_n \cdot \text{OPT}$, where $n=|U|$ is the size of the universe and $H_n \simeq \ln n$ is the $n$-th Harmonic number.
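
A minimal sketch of GreedyCover (fig7_1) in Python; the dict-of-frozensets input format is an assumption of this sketch.

```python
def greedy_cover(universe, sets):
    """Repeatedly pick the set covering the most still-uncovered elements;
    by the theorem above the result has size at most H_n * OPT."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen
```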

Primal-Dual Algorithm

  • Definition :

    fig7_2

  • Theorem ( Weak Duality ) :

    For every matching $M$ and every vertex cover $C$ for $S_1, S_2, \cdots, S_m \subseteq U$, it holds that $|M| \leq |C|$.

    Proof

  • Corollary :

    Let $S_1, S_2, \cdots, S_m \subseteq U$ be an instance for set cover, and $\text{OPT}$ the size of the optimal set cover.

    For every matching $M$ for $S_1, S_2, \cdots, S_m$, it holds that $|M| \leq \text{OPT}$.

  • Dual Cover :

    fig7_3

Scheduling

Background

  • Pre-Definition :

    • There are $n$ jobs to be processed.
    • There are $m$ identical parallel machines. Each machine can start processing jobs at time $0$ and can process at most one job at a time.
    • Each job $j = 1, 2, \cdots, n$ must be processed on one of these machines for $p_j$ time units without interruption . $p_j$ is called the processing time of job $j$.
  • Minimum Makespan on Identical Parallel Machines( load balancing version ) :

    • Input :

      $n$ positive integers $p_1, p_2, \cdots, p_n$ and a positive integer $m$

    • Output :

      an assignment $\sigma : [n] \to [m]$ which minimizes $C_{\text{max}}=\max_{i \in [m]} \sum_{j:i=\sigma(j)}p_j$.

  • Partition Problem ($m = 2$) :

    Input : a set of $n$ positive integers $S=\lbrace x_1, x_2, \cdots, x_n \rbrace$

    Determine whether there is a partition of $S$ into $A$ and $B$ such that $\sum_{x \in A} x = \sum_{x \in B} x$.

Graham’s List algorithm

  • The List algorithm ( Graham 1966 )

    Input : a list of jobs $j=1,2, \cdots, n$ with processing times $p_1, p_2, \cdots, p_n$.

    for $j=1, 2, \cdots, n$, assign job $j$ to the machine that currently has the smallest load.

    It is well known that the List algorithm has approximation ratio $(2 - \frac{1}{m})$.

  • Theorem :

    For every instance of scheduling $n$ jobs with processing times $p_1, p_2, \cdots, p_n$ on $m$ parallel identical machines, the List algorithm finds a schedule with makespan $C_{\text{max}} \leq (2 - \frac{1}{m}) \cdot \text{OPT}$, where $\text{OPT}$ is the makespan of an optimal schedule.

    Proof(★)
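
A minimal sketch of the List algorithm in Python using a heap of machine loads; feeding it the processing times sorted in non-increasing order turns it into LPT.

```python
import heapq

def list_schedule(processing_times, m):
    """Assign each job to the currently least-loaded of the m machines;
    returns the assignment and the makespan C_max."""
    loads = [(0, machine) for machine in range(m)]
    heapq.heapify(loads)
    assignment = []
    for p in processing_times:
        load, machine = heapq.heappop(loads)      # machine with the smallest load
        assignment.append(machine)
        heapq.heappush(loads, (load + p, machine))
    makespan = max(load for load, _ in loads)
    return assignment, makespan
```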

Another natural idea for solving optimization problems is the local search. Given an instance of optimization problem, the principle of local search is as follows:

  • We start with a feasible solution.
  • At each step, we try to improve the solution by modifying it locally (changing the assignment of a constant number of variables), until nothing can be further improved (thus a local optimum is reached).

See at wiki

Longest Processing Time ( LPT )

  • Definition :

    Input : a list of jobs $j=1,2,\cdots, n$ with processing times $p_1 \geq p_2 \geq \cdots \geq p_n$ in non-increasing order.

    for $j=1,2,\cdots,n$, assign job $j$ to the machine that currently has the smallest load.

  • Analysis ( See at wiki )

Online Algorithms and Competitive Ratio

See at wiki

Assignments

Problem Set 1

Solution 1

  • My answer is “no”.

  • We can easily find that :

    • The number of distinct min-cuts in a connected acyclic graph is $O(n)$.
    • The number of distinct min-cuts in a cycle graph consisting of $n$ vertices and $n$ edges is $\frac{n(n-1)}{2}$.
  • These examples show that $O(\log n)$ cannot be an upper bound on the number of distinct min-cuts in an arbitrary multigraph $G$ with $n$ vertices.

  • We know that $p(n)=\text{Pr}[FastCut(G)\text{ returns a min-cut in G}]=\Omega(\frac{1}{\log n})$, and for any particular min-cut $C$ it is obvious that $p_C=\text{Pr}[C\text{ is returned by }FastCut(G)]\leq \text{Pr}[FastCut(G)\text{ returns a min-cut in G}]=p(n)$.

  • However, $\Omega(\frac{1}{\log n})$ is only a lower bound on $p(n)$, so it gives no upper bound on any $p_C$. Since $p_{\text{correct}}$ is a well-defined probability, by the unitarity of probability it must hold that $p_{\text{correct}}\leq 1$.

  • Therefore, the only facts we know are

    • $1 \geq p_{\text{correct}}=\sum_{C\in\mathcal{C}}p_C$
    • $\forall C\in\mathcal{C}, p_C \leq p(n)$, and for $p(n)$ we only know the lower bound $p(n)=\Omega(\frac{1}{\log n})$
  • We cannot obtain an upper bound on the number of distinct min-cuts in a multigraph $G$ with $n$ vertices from the probability analysis of $FastCut$ : $\Omega(\frac{1}{\log n})$ is only a lower bound on each $p_C$, so it cannot be used to upper-bound the number of distinct min-cuts.

    Actually, we already know that the number of distinct min-cuts in any multigraph $G$ with $n$ vertices is at most $\frac{n(n-1)}{2}$, and the cycle graph above attains this bound, so no tighter bound is possible.

  • In conclusion, my answer of this problem is “no”.


Solution 2

  • We can use a Tree-Hash algorithm to decide whether two rooted trees are isomorphic.

  • Consider the polynomial $h(x)=\sum_{i=0}^{n-1}a_i x^i$ over $\mathbb{Z}_p=\lbrace 0, 1,\dots, p-1 \rbrace$, in which $n$ is the size of the multiset $S$ and $a_i$ is the $i$-th smallest element of $S$.

  • The Tree-Hash algorithm is as follows :

    • For a rooted tree, we will calculate its hash value which can be used to judge whether two trees are isomorphic.

    • We will use Depth-First-Search to traverse the tree and calculate every node's hash value from bottom to top. We call node $u$'s hash value $V(u)$.

    • If the current node is a leaf, its hash value can be a random number $r\in [M]$, but the same value must be used for every leaf node.

    • If the current node is not a leaf, i.e. it has a non-empty set of children $S\neq\emptyset$, we first form the multiset $T$ of all its children's hash values, i.e. $T=\cup_{s\in S}\lbrace V(s)\rbrace $.

    • Then we use the result of applying $h$ to the multiset $T$ as the hash value of the current node $a$, i.e. $V(a)=h(T)$.

    • After all nodes' hash values have been calculated, we use the hash value of the root $r$ as the hash value of the whole tree, i.e. $V(r)$.

    • Assume $r_1$ and $r_2$ are the roots of the rooted trees $T_1$ and $T_2$. We compare the two hash values $V(r_1)$ and $V(r_2)$.

      $$\text{The result we derive}=\begin{cases}yes & \text{if } V(r_1)=V(r_2) \\ no & \text{if }V(r_1)\neq V(r_2)\end{cases}$$

  • Analysis :

    • Space Cost :

      Since we store all nodes' hash values, the space cost of Tree-Hash is $O(n)$.

    • Time Cost :

      If we need to sort the members of multiset $T$, the time cost is $O(n\log n)$.

      If we have some methods to calculate $h(T)$ in linear time of $T$’s size, the time cost is $O(n)$.

    • Error Analysis :

      Obviously, if the result is “no”, the two trees are not isomorphic. But it can happen that two trees are non-isomorphic while Tree-Hash returns “yes”. In other words, Tree-Hash has one-sided error (false positive).

      The degree of $h$ is at most $n-1$, and $r$ is chosen among $p$ distinct values, so $\text{Pr}[V(r_1)=V(r_2)]\leq \frac{n-1}{p}$.

      By choosing $p$ as a prime in the interval $[n^2, 2n^2]$, the one-sided error of false positive is at most $O(\frac{1}{n})$. We can also use different polynomials to make error probability lower.
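
A minimal sketch of a Tree-Hash in the spirit of this solution. It deviates from the sorted-coefficient polynomial above in one place, which is my own assumption: each internal node is hashed with the multiset fingerprint $\prod(r_h - V(\text{child}))$ from Chapter 2, using an independent random point $r_h$ per tree level, so that chains of single-child nodes do not collide by construction.

```python
import random

def tree_hash(children, root, p, rs):
    """children: dict node -> list of children; rs[h]: shared random point for height h.
    Returns V(root)."""
    def go(u):
        if not children.get(u):
            return 1, 0                           # (hash value, height); every leaf gets 1
        sub = [go(c) for c in children[u]]
        h = 1 + max(hc for _, hc in sub)
        v = 1
        for val, _ in sub:                        # multiset fingerprint at level h
            v = v * (rs[h] - val) % p
        return v, h
    return go(root)[0]

def rooted_isomorphic(children1, root1, children2, root2, n, p=10**9 + 7):
    """One-sided error: "no" answers are always correct."""
    rs = [random.randrange(1, p) for _ in range(n + 1)]   # same randomness for both trees
    return tree_hash(children1, root1, p, rs) == tree_hash(children2, root2, p, rs)
```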


Solution 3

Task 1

  • Denote by $F_C’$ the Bloom filter that would be created directly for $A\cap B$.
  • We write $F_A(i)$ for the $i$-th bit of the Bloom filter $F_A$, and $A_j$ for the $j$-th element of the set $A$.
  • Obviously, for $\forall 1\leq i\leq k, 1\leq j \leq n,\text{ all }F_C’(h_i(C_j))=1\Rightarrow \text{ all }F_C(h_i(C_j))=1$.
  • Assuming that $\exists u\in A, u\not\in B, v\in B, v\not\in A$, we know that $u,v\not\in A\cap B $.
  • Consider if $\exists 1\leq i\leq k, 1\leq j \leq k, h_i(u)=h_j(v)=p \Rightarrow F_A(p)=F_B(p)=1 \Rightarrow F_C(p)=1$.
  • Since $u,v\not\in A\cap B $, $F_C’(p)=0$ may happen with high probability.
  • In this condition, $F_C$ is not the same as the Bloom Filter that are created for $A\cap B$.

Task2

  • Assuming we have 3 sets $X,Y,Z$, which satisfies

    • $X\cup Y = A$
    • $Y \cup Z = B$
    • $X \cap Y = X \cap Z = Y \cap Z = \emptyset$
  • We can easily derive that $Y=A \cap B, X=A-Y, Z=B-Y$.

  • For a set of size $p$, the expected number of positions set to $1$ by $k$ hash functions is $\mathbb{E}[\text{number of 1s}]= \frac{m^{kp}-(m-1)^{kp}}{m^{kp-1}}$.

    • Let $a_i$ be the expected number of positions set to $1$ when one hash function is applied to a set of size $i$.
    • We know that obviously $a_1=1$ and can derive that $a_i=a_{i-1}+\frac{m-a_{i-1}}{m}=\frac{m-1}{m}*a_{i-1}+1$.
    • Then we find that $a_i=\frac{m^i-(m-1)^i}{m^{i-1}}$; meanwhile, the $k$ hash functions perform $ki$ hash evaluations independently and uniformly.
    • So $\mathbb{E}[{\text{number of positions which are }1 \text{ using }k \text{ hash functions for a set of size }i}]=\frac{m^{ki}-(m-1)^{ki}}{m^{ki-1}}$, which we denote by $s_i$.
  • The positions where $F_A$ and $F_B$ differ arise from two cases : positions that are $1$ in $F_A$ but $0$ in $F_B$, and positions that are $0$ in $F_A$ but $1$ in $F_B$.

    The two cases are symmetric, so it suffices to consider one of them.

    We now compute the expected number of positions that are $1$ in $F_A$ and $0$ in $F_B$, which we call $E_h$.

  • Assuming $|A\cap B|=c$, we easily find that $|Y|=c, |X|=|Z|=n-c=d$.

  • We consider several kinds of $1$s below :

    • $E_1$ denotes the expected number of positions that are 1 in $F_X$ and 0 in $F_Z$.

      We find that $E_1=s_d \times (m-s_d) / m$, because no element belongs to both $X$ and $Z$, so $F_X$ and $F_Z$ are independent.

    • $E_2$ denotes the expected number of positions that are 1 in $F_X$ and 1 in $F_Y$.

      We find that $E_2=s_d \times s_c / m$, because no element belongs to both $X$ and $Y$.

    • $E_3$ denotes the expected number of positions that are 1 in $F_X$, 1 in $F_Y$, and 0 in $F_Z$.

      We find that $E_3=\frac{s_d\times (m-s_d) \times s_c}{m^2}$, because $X$, $Y$, $Z$ are pairwise disjoint, so $F_X$, $F_Y$, $F_Z$ are independent.

    • Consider $X \cap Y = X \cap Z = Y \cap Z = \emptyset$ and $F_A=F_X \bigvee F_Y,F_B=F_Z \bigvee F_Y$.

      We will derive that $E_h=E_1-E_3=\frac{m\times s_d\times (m-s_d)-s_d\times (m-s_d)\times s_c}{m^2}$.

  • The answer $\mathbb{E}=2E_h=2\times\frac{(m-s_c)\times s_d\times (m-s_d)}{m^2}$.

    Simplifying the formula, $\mathbb{E}=2m\times\frac{(m-1)^{kn}m^{k(n-c)}-(m-1)^{k(2n-c)}}{m^{k(2n-c)}}=2m\times((1-\frac{1}{m})^{kn}-(1-\frac{1}{m})^{k(2n-c)})$, in which $c=|A\cap B|$.

Problem Set 2

Solution 1

  • Consider $P_i=\text{Pr}[\text{The number of components will decrease from }i \text{ to }i-1 \text{ when adding an edge to graph}]$.

  • For every node $x$, there exist at least $i-1$ potential edges incident to $x$ that satisfy the condition (at least one to each of the other $i-1$ components),

    so for the whole graph there exist at least $\frac{n(i-1)}{2}$ edges satisfying the condition.

  • Meanwhile, there are at most $\frac{n(n-1)}{2}-(i-1)$ edges still not being added to the graph,

    so $P_i \geq \frac{n(i-1)}{2} / (\frac{n(n-1)}{2}-(i-1))=\frac{ni-n}{n^2-n-2i+2} \geq \frac{i-1}{n-1}$, because of $i \gt 1$.

  • Let $e_i$ be the number of edges we need to make number of components decrease from $i$ to $i-1$, which means $\mathbb{E}[e_i]=1/P_i \leq \frac{n-1}{i-1}$.

  • Obviously we have $\mathbb{E}=\mathbb{E}[\sum_{i=2}^ne_i]=\sum_{i=2}^n\mathbb{E}[e_i]\leq \sum_{i=2}^n\frac{n-1}{i-1} = (n-1)H(n-1)$.

    That is, $\mathbb{E} \leq (n-1)\ln (n-1)+O(n-1)$.

Solution 2

  • We define $\beta_i$ so that at most $\beta_i$ bins each contain at least $i$ balls in the end.

  • Consider that for a single operation, we choose the bin with $k$ balls in it.

    • If we choose the uniform paradigm, then $\beta_{k+1}$ will increase.

    • If we choose “Two-Choices” paradigm, then we need to choose a bin with $p$ balls in it uniformly.

      If $p \leq k$ then $\beta_{k+1}$ will increase, otherwise it will not increase.

      $\text{Pr}[\beta_{k+1}\text{ will increase}]=1 - \frac{\text{numbers of bins which contains more than }k\text{ balls} }{n} \leq 1$ .

  • We can derive the conclusion that

    the uniform paradigm is always at least as likely as the “Two-Choices” paradigm to produce a higher maximum load.

  • So for combinations of two paradigms, we define

    $\mathbb{E}_1$ denotes the expected maximum number of balls in one bin when only the balls thrown with the uniform paradigm are considered,

    $\mathbb{E}_2$ denotes the expected maximum number of balls in one bin when every “Two-Choices” step is replaced by a uniform step,

    and $\mathbb{E}$ denotes the expected maximum number of balls in one bin for the original combination of paradigms.

  • For all 3 subproblems, we can derive the same conclusion :

    • $\mathbb{E}_1=\Theta(\frac{\log n }{\log \log n})$.
    • $\mathbb{E}_2=\Theta(\frac{\log n }{\log \log n})$.
    • $\mathbb{E}_1 \leq \mathbb{E} \leq \mathbb{E}_2$.
  • So for all 3 subproblems, the asymptotically tight bound is $\Theta(\frac{\log n }{\log \log n})$.

Solution 3

Task1

  • $\text{Pr}[X \geq t] = \text{Pr}[e^{\lambda X} \geq e^{\lambda t}] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda t}} = e^{\ln \mathbb{E}[e^{\lambda X}] - \lambda t}$.

  • The bound holds for every $\lambda \geq 0$, so we may take the best one : $\text{Pr}[X \geq t] \leq \inf_{\lambda \geq 0} e^{\ln \mathbb{E}[e^{\lambda X}] - \lambda t} = \exp(-\sup_{\lambda \geq 0} (\lambda t - \ln \mathbb{E}[e^{\lambda X}]))$,

    which means $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t))$.

  • Consider $F(\lambda) = \lambda t - \Psi_X(\lambda)$, $F’(\lambda) = t - \Psi’_X(\lambda)$,

    which means $F(\lambda)$ attains an extremum over $\lambda \geq 0$ at the $\lambda$ satisfying $\Psi’_X(\lambda) = t$.

  • Because $\Psi_X(\lambda)$ is strictly convex as the problem statement says, $F(\lambda)$ is strictly concave,

    so we can derive that $F(\lambda)$ attains its supremum there, and the maximizer is unique.

Task 2

  • $\mathbb{E}[e^{\lambda X}]=\int_{-\infty}^{+\infty}\exp(\lambda x)\frac{1}{\sigma \sqrt{2\pi}}\exp(-\frac{(x-\mu)^2}{2\sigma^2})\,dx=\exp(\lambda\mu+\sigma^2\lambda^2/2)$.
  • $\Psi_X(\lambda)=\ln \mathbb{E}[e^{\lambda X}]=\lambda\mu+\sigma^2\lambda^2/2$.
  • $\Psi^*_X(t)=\sup_{\lambda \geq 0}(\lambda t - \Psi_X(\lambda))$; plugging in $\Psi’_X(\lambda)=t \Leftrightarrow \mu + \sigma^2\lambda=t \Leftrightarrow \lambda=\frac{t-\mu}{\sigma^2}$, we can derive that $\Psi^*_X(t)=\frac{(t-\mu)^2}{2\sigma^2}$.
  • $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t)) = \exp(-\frac{(t-\mu)^2}{2\sigma^2})$.

Task 3

  • $\mathbb{E}[e^{\lambda X}] = \sum_{k=0}^{+\infty} e^{\lambda k}\frac{e^{-\nu}\nu^k}{k!} = \exp((e^\lambda - 1)\nu) \sum_{k=0}^{+\infty}\frac{\exp(-\nu e^\lambda)(\nu e^\lambda)^k}{k!} = \exp(\nu(e^\lambda - 1)) = \exp(\nu e^\lambda - \nu)$.
  • $\Psi_X(\lambda) = \ln \mathbb{E}[e^{\lambda X}] = \nu e^\lambda - \nu$.
  • $\Psi’_X(\lambda) = t \Leftrightarrow \lambda = \ln \frac{t}{\nu}$.
  • $\Psi^*_X(t)=t\ln \frac{t}{\nu} - t + \nu$.
  • $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t)) = e^{t - \nu}(\frac{\nu}{t})^t$.

Task 4

  • Obviously, $\mathbb{E}[e^{\lambda X}] = pe^\lambda + (1-p)$, and $\Psi_X(\lambda) = \ln (pe^\lambda + (1-p))$.
  • Consider $\Psi’_X(\lambda) = t \Leftrightarrow \lambda = \ln \frac{(1-p)t}{(1-t)p}$.
  • $\Psi^*_X(t) = t\ln \frac{(1-p)t}{(1-t)p} - \ln ( \frac{t(1-p)}{1-t} + 1-p ) = (1-t)\ln \frac{1-t}{1-p} + t\ln \frac{t}{p}$.

Task 5

  • Consider each $X_i$ are independent, $\Psi_X(\lambda) = \ln( \prod_{i=1}^n\mathbb{E}[e^{\lambda X_i}] ) = \sum_{i=1}^n \ln(\mathbb{E}[e^{\lambda X_i}]) = \sum_{i=1}^n \Psi_{X_i}(\lambda)$.
  • $\Psi^*_X(t) = \sup_{\lambda \geq 0}(\lambda t - \Psi_X(\lambda)) = \sup_{\lambda \geq 0}(\lambda t - \sum_{i=1}^n\Psi_{X_i}(\lambda)) = \sum_{i=1}^n \sup_{\lambda \geq 0}(\lambda t/n - \Psi_{X_i}(\lambda)) = n\Psi^*_{X_i}(\frac{t}{n})$, using that the $X_i$ are i.i.d.
  • For $X \sim Bin(n,p)$, $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t))=\exp(-n\Psi^*_{X_i}(\frac{t}{n}))=(\frac{n-np}{n-t})^{n-t}(\frac{np}{t})^t$.
  • When $X_i$ comply with the geometric distribution with a probability $p$ of success :
    • $\Psi_{X_i}(\lambda)=\ln\mathbb{E}[e^{\lambda X_i}] = \ln( \sum_{k=0}^{+\infty} e^{\lambda k}(1-p)^kp) = \ln p -\ln(1+(p-1)e^\lambda)$.
    • The above is well defined when $\lambda \lt -\ln(1-p)$.
    • $\Psi’_{X_i}(\lambda) = t \Leftrightarrow \lambda = \ln \frac{t}{(1-p)(t+1)}$.
    • $\Psi^*_{X_i}(t)=t\ln\frac{t}{(1-p)(t+1)}-\ln p(t+1)$.
    • $\text{Pr}[X \geq t] \leq \exp(-\Psi^*_X(t))=\exp(-n\Psi^*_{X_i}(\frac{t}{n}))=(\frac{(1-p)(t+n)}{t})^t(\frac{p(t+n)}{n})^n$.

Solution 4

Task 1

  • If $n \lt k$, according to the Pigeonhole Principle, it is obvious that $d = 0$.

    So we only consider the condition that $n \geq k$.

  • We assume that the distance result is $(\frac{1}{2}-u\sqrt{r})n=D$.

  • Then we want to prove that if we choose a pair $x_1, x_2 \in \lbrace 0, 1\rbrace ^n$ randomly,

    the expectation number of pairs which satisfies $d \leq D$ is less than 1.

  • Consider two random codewords $x_1, x_2 \in \lbrace 0, 1\rbrace ^n$, and let $X_i$ be the indicator of $x_1(i) \neq x_2(i)$, so $\text{Pr}[X_i=1]=\frac{1}{2}$;

    the $X_i$ are $n$ independent random variables, $X=\sum_{i=1}^n X_i$ is the Hamming distance, and $\mu=\mathbb{E}[X] = \frac{n}{2}$.

  • Using Chernoff Bounds, we can derive that $\text{Pr}[X \leq (1-\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{2})$.

  • Because there are $\frac{2^k(2^k-1)}{2}$ different pairs, we want $\exp(-\frac{\mu\delta^2}{2}) \leq 4^{-k} \leq \frac{2}{2^k(2^k-1)}$.

    We take $\mu=\frac{n}{2}$ and $(1-\delta)\mu = D \Leftrightarrow \delta = 2u\sqrt{r}$.

    Then we have $\exp(-u^2k) \leq 4^{-k} \Leftrightarrow u \geq \sqrt{\ln 4}$.

  • Concluding the above, we derive that :

    There exists a linear boolean code $C : \lbrace 0, 1 \rbrace^k \to \lbrace 0, 1 \rbrace ^ n$ of code rate $r$, and distance $(\frac{1}{2}-\Theta(\sqrt{r}))n$.

    The constant in $\Theta(\cdot)$ can be $\sqrt{\ln 4}$.

Task 2

  • If $n \lt k$, according to the Pigeonhole Principle, it is obvious that $d = 0$.

    So we only consider the condition that $n \geq k$.

  • We assume that the distance result is $(\frac{1}{2}-u\sqrt{r})n=D$.

  • Then we want to prove that if we choose the Matrix $A$ randomly,

    the expectation number of Matrix which satisfies $d \leq D$ is less than 1.

    In other words, there exists a Matrix can make $d \gt D$.

  • Consider an $n\times k$ matrix $A$ and $x_1, x_2 \in \lbrace 0, 1\rbrace ^k$, $S = \lbrace i \mid 1 \leq i \leq k, x_1(i) \neq x_2(i) \rbrace$,

    then $d \leq D$ means $\sum_{i=1}^n [\sum_{j \in S} A_{i,j} \equiv 1 \pmod 2] \leq D$.

  • For each valid $i$, let $X_i$ be the indicator of $\sum_{j \in S} A_{i,j} \equiv 1 \pmod 2$, so $\text{Pr}[X_i=1]=\frac{1}{2}$;

    the $X_i$ are $n$ independent random variables, $X=\sum_{i=1}^n X_i$, and $\mu=\mathbb{E}[X] = \frac{n}{2}$.

  • Using Chernoff Bounds, we can derive that $\text{Pr}[X \leq (1-\delta)\mu] \lt \exp(-\frac{\mu\delta^2}{2})$.

  • Because there are $2^k$ different sets $S$, we want $\exp(-\frac{\mu\delta^2}{2}) \leq 2^{-k}$.

    We take $\mu=\frac{n}{2}$ and $(1-\delta)\mu = D \Leftrightarrow \delta = 2u\sqrt{r}$.

    Then we have $\exp(-u^2k) \leq 2^{-k} \Leftrightarrow u \geq \sqrt{\ln 2}$.

  • Concluding the above, we derive that :

    There exists a linear boolean code $C : \lbrace 0, 1 \rbrace^k \to \lbrace 0, 1 \rbrace ^ n$ of code rate $r$, and distance $(\frac{1}{2}-\Theta(\sqrt{r}))n$.

    The constant in $\Theta(\cdot)$ can be $\sqrt{\ln 2}$.

Problem Set 3

Solution 1

Task a

  • Recall that the random variables $X = (X_1, \cdots, X_n)$ are negatively associated.

  • Define $f(X_i, i \in [k-1] ) = \prod_{i \in [k-1]} f_i(X_i)$, $g(X_j, j \in \lbrace k \rbrace) = f_k(X_k)$.

  • Obviously $f$ and $g$ are non-negative and non-decreasing, so from the definition of negative association,

    $\mathbb{E}[f(X_i, i \in [k-1])g(X_j, j \in \lbrace k \rbrace)] \leq \mathbb{E}[f(X_i, i \in [k-1])] \mathbb{E}[g(X_j, j \in \lbrace k \rbrace)]$

    $\Rightarrow \mathbb{E}[ \prod_{i \in [k] } f_i(X_i) ] \leq \mathbb{E}[f_k(X_k)] \cdot \mathbb{E}[\prod_{i \in [k-1]} f_i(X_i)]$

  • By expanding the inequality, we can easily derive that :

    $\mathbb { E } [ \prod _ { i \in [ n ] } f _ { i } ( X _ { i } ) ] \leq \prod _ { i \in [ n ] } \mathbb { E } [ f _ { i } ( X _ { i } ) ]$

Task b

  • Consider $X = \sum_{i=1}^n X_i$, $X_1, X_2, \cdots, X_n$ are negatively associated, $\mu = \mathbb{E}[X]$.

  • Upper Tail :

    For $\lambda \geq 0$, $\text{Pr}[X \geq (1+\delta)\mu] = \text{Pr}[e^{\lambda X} \geq e^{\lambda (1+\delta)\mu}] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda (1+\delta)\mu}}$

    Proof :

    • $\mathbb{E}[e^{\lambda X}] = \mathbb{E}[ \exp( \lambda \sum_{i=1}^n X_i )] = \mathbb{E}[ \prod_{i=1}^n \exp(\lambda X_i) ] \leq \prod_{i=1}^n \mathbb{E}[\exp(\lambda X_i)]$

      ( $e^{\lambda X}$ is a non-decreasing function )

    • Let $p_i = \text{Pr}[X_i = 1]$ for $i = 1, 2, \cdots, n$. Then $\mu = \sum_{i=1}^n p_i$.

    • $\mathbb{E}[e^{\lambda X_i}] = p_i \cdot e^{\lambda \cdot 1} + (1-p_i) \cdot e^{\lambda \cdot 0} = 1 + p_i(e^{\lambda} - 1) \leq e^{p_i(e^\lambda - 1)}$.

    • Therefore, $\mathbb{E}[e^{\lambda X}] \leq e^{\mu (e^{\lambda}-1)}$.

    • Then we have $\text{Pr}[X \geq (1+\delta)\mu] \leq \left( \frac{\exp(e^\lambda - 1)}{\exp(\lambda(1+\delta))} \right)^\mu = \left( \frac{e^\delta}{(1+\delta)^{1+\delta}} \right)^\mu$, when $\lambda = \ln(1 + \delta) \gt 0$.

  • Lower Tail :

    Consider $\forall 1 \leq i \leq n, Y_i = 1 - X_i, Y = \sum_{i=1}^n Y_i = n - X$.

    Proof :

    • $\text{Pr}[ X \leq (1-\delta)\mu ] = \text{Pr}[n-X \geq n - (1-\delta)\mu] = \text{Pr}[e^{\lambda Y} \geq e^{\lambda(n-(1-\delta)\mu)}] \leq \frac{\mathbb{E}[e^{\lambda Y}]}{e^{\lambda (n-(1-\delta)\mu)}}$.
    • $\mathbb{E}[e^{\lambda Y}] \leq \prod_{i=1}^n \mathbb{E}[e^{\lambda Y_i}]$ ( $\lambda \gt 0$ $\Rightarrow e^{\lambda X}$is a non-decreasing function ).
    • $\mathbb{E}[e^{\lambda Y_i}] = p_i\exp(\lambda (1-1)) + (1-p_i)\exp(\lambda(1-0)) = e^\lambda + p_i(1 - e^\lambda) \leq e^\lambda \exp(p_i(e^{-\lambda}-1))$.
    • Then we have $\text{Pr}[X \leq (1-\delta)\mu] \leq \frac{\exp(n\lambda)\exp(\mu(e^{-\lambda} - 1))}{\exp(\lambda(n-(1-\delta)\mu))} = \left( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right)^\mu$, when $\lambda = - \ln(1 - \delta) \gt 0$.

Task c

  • Suppose that for a fixed bin $l$, we consider $B_{1,l}, \cdots, B_{n,l}$ as $H_1, \cdots, H_n$.

    Now we want to prove $H=(H_1, \cdots, H_n)$ are negatively associated first.

    • First of all, $H=(H_1, \cdots, H_n)$ satisfies $\forall 1 \leq i \leq n, H_i \in \lbrace 0, 1 \rbrace$, and $\sum_{i=1}^n H_i = 1$.

    • For any non-decreasing function $f$, consider $f(H_i, i \in I)$.

      We define $f(\underbrace{0, \cdots, 0}_{|I|}) = a$, and $f(\underbrace{0, \cdots, 1, \cdots, 0}_{|I|}) = a + u(i)$ (with the single $1$ in position $i$).

      ($H_i = 1$, $f$ is a non-decreasing function $\Rightarrow u(i) \geq 0$ ).

    • For any non-decreasing function $g$, consider $g(H_j, j \in J)$.

      We define $g(\underbrace{0, \cdots, 0}_{|J|}) = b$, and $g(\underbrace{0, \cdots, 1, \cdots, 0}_{|J|}) = b + v(j)$ (with the single $1$ in position $j$).

      ( $H_j = 1$, $g$ is a non-decreasing function $\Rightarrow v(j) \geq 0$ ).

    • There exists exactly one $i \in [n]$ with $H_i = 1$, and $H_j = 0$ for all $j \in [n], j \neq i$.

    • We can calculate that :

      $\mathbb{E}[f(H_i, i \in I)] = a + \sum_{i \in I} u(i)p_{i, l}$

      $\mathbb{E}[g(H_j, j \in J)] = b + \sum_{j \in J} v(j)p_{j, l}$

      $\mathbb{E}[f(H_i, i \in I)g(H_j, j \in J)] = ab + b\sum_{i \in I} u(i)p_{i, l} + a\sum_{j \in J} v(j)p_{j, l}$

      ( Because $\exists i \in I$, $H_i = 1 \Rightarrow \forall j \in J, H_j = 0$, vice versa )

    • Obviously we have :

      $\sum_{i \in I} u(i)p_{i, l} \times \sum_{j \in J} v(j)p_{j, l} \geq 0$

      $\Rightarrow \mathbb{E}[f(H_i, i \in I)g(H_j, j \in J)] \leq \mathbb{E}[f(H_i, i \in I)] \mathbb{E}[g(H_j, j \in J)]$

    • We can derive that $H=(H_1, \cdots, H_n)$ are negatively associated.

  • Because $\forall l \in [m]$, $B_{1,l}, \cdots, B_{n,l}$ are negatively associated.

    Then we can get $B_{1,1}, \cdots, B_{n,m}$ are negatively associated. ( Closure under products )

  • Then we define $f_i(B_{i,1}, \cdots, B_{i,m}) = \sum_{t=1}^m B_{i,t}$. Obviously $f_i$ is a non-decreasing function.

  • $B=(B_1, \cdots, B_n) = (f_1(B_{1,1}, \cdots, B_{1,m}), \cdots, f_n(B_{n,1}, \cdots, B_{n,m}))$ are negatively associated.

    ( Disjoint monotone aggregation )

  • Finally, we can get the conclusion that $B_1, \cdots, B_n$ are negatively associated.

Solution 2

Task a

  • Suppose that $n$ is the degree of node $u$.

    Actually because the graph is a $\Delta$-regular bipartite graph, $n = \Delta$.

  • Consider $n$ independent $\text{2-d}$ random variables $x_1, \cdots, x_n$, in which $\forall 1 \leq i \leq n, x_i = ( c_i, s_i )$.

    $c_i$ means the color of $e_i$, $e_i$ means the $i$-th edge which incident to node $u$,

    $s_i = \begin{cases} 0, & \text{if no edge incident to }v_i \text{ has the same color as }e_i \\ 1, & \text{if some edge incident to }v_i \text{ has the same color as }e_i \end{cases}$.

    $v_i$ means the other endpoint of $e_i$ except $u$.

  • We can define

    $Z = f(x_1, \cdots, x_n)$ to be the number of edges that get their final color after the first step,

    and $Z$ can be calculated using only $x_1, \cdots, x_n$.

  • Additionally, we need to prove that $x_1, \cdots, x_n$ are independent :

    Actually, for an edge $e_i$, $s_i$ is determined by $c_i$ and the colors of the edges incident to node $v_i$.

    • Suppose that changing $c_i$ causes some $x_j$ with $1 \leq j \leq n, j \neq i$ to change; then $s_j$ changes.

      This means $e_i$ and $e_j$ are adjacent not only at node $u$,

      which means the graph is not a simple graph, contradicting the definition.

    • Now consider changing the color of one edge that is not incident to node $u$ :

      Some $s_i$ with $1\leq i \leq n$ may change.

      If both $s_i$ and $s_j$ change for some $1 \leq i,j \leq n, i \neq j$, that means :

      there exists an edge connecting the nodes $v_i$ and $v_j$.

      This is impossible, because otherwise the graph would not be bipartite : that edge together with $e_i$ and $e_j$ forms a triangle (an odd cycle) through $u$, $v_i$, $v_j$.

    To conclude, at most one element of $x_1, \cdots, x_n$ changes if we change any single edge's color, which means $x_1, \cdots, x_n$ are independent.

  • Now we find that $\forall i, 1 \leq i \leq n$, if we change $x_i$ to $y_i$, the $f(x_1, x_2, \cdots, x_n)$ may change too.

  • Consider the worst case :

    Before the change, both the edge $e_i$ and the unique edge whose color equals the first component of $y_i$ could get their final colors in the first step.

    After the change, neither of them can, so the value of $f$ declines by 2.

    The best case can be analyzed in the same way.

  • Then we can derive that, for any $x_1, x_2, \cdots, x_n$ and any $y_i$ :

    $|f(x_1, \cdots, x_{i-1}, x_i, x_{i+1}, \cdots, x_n) - f(x_1, \cdots, x_{i-1}, y_i, x_{i+1}, \cdots, x_n)| \leq 2$.

    Which means $f$ satisfies the Lipschitz condition with constant $2$.

  • Consider the Bounded Difference Method :

  • $\text{Pr}[ | Z - \mathbb{E}[Z] | \gt t] \leq 2 \exp(- \frac{t^2}{2 \sum_{i=1}^n 4}) = 2 \exp(- \frac{t^2}{8n}) = 2 \exp(- \frac{t^2}{8 \Delta})$ ( $n = \Delta$ )

    Which means $\text{Pr}[ | Z - \mathbb{E}[Z] | \gt t] \leq 2 \exp( - \Theta(\frac{t^2}{\Delta}) )$.

Task b ( Bonus )

Solution 3

Task a

  • We write $x_i[p]$ for the $p$-th coordinate of the point $x_i$.

  • $\text{cost}(x_1, \cdots, x_n, C_1, \cdots, C_k) = \sum_{i=1}^k \sum_{j \in C_i} | x_j - \mu_i |_2^2$

    $= \sum_{i=1}^k \frac{1}{|C_i|^2} \sum_{j \in C_i} | |C_i|x_j - \sum_{u \in C_i}x_u |_2^2$

    $= \sum_{i=1}^k \frac{1}{|C_i|^2} \sum_{j \in C_i} \sum_{t=1}^d (|C_i|x_j[t] - \sum_{l \in C_i} x_l[t])^2$

    $=\sum_{i=1}^k \frac{1}{|C_i|} \sum_{t=1}^d \lbrace |C_i|\sum_{j \in C_i} x_j[t]^2 - (\sum_{j \in C_i} x_j[t]) ^2\rbrace$

    $= \sum_{i=1}^k \frac{1}{|C_i|} \sum_{t=1}^d \lbrace (|C_i|-1)\sum_{j \in C_i} x_j[t]^2 - 2\sum_{j,l \in C_i, j \lt l} x_j[t]x_l[t] \rbrace$

    $= \sum_{i=1}^k \frac{1}{|C_i|} \sum_{t=1}^d \sum_{j,l \in C_i, j \lt l} ( x_j[t] - x_l[t] )^2$

    $= \sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2$

Task b

  • We already know that

    for the matrix $A \in \mathbb{R}^{k \times d}$, let each entry of $A$ be chosen i.i.d. from $N(0, 1 / k)$;

    then for any unit vector $\vec{u} \in \mathbb{R}^d : \text{Pr}[| | A\vec{u} |_2^2 - 1 | \gt \epsilon] \lt 1 / n^3$,

    Which means $\text{Pr}[| | \tilde{x}_i - \tilde{x}_j |_2^2 - | x_i - x_j |_2^2 | \gt \epsilon \cdot | x_i - x_j |_2^2] \lt 1 / n^3$.

  • $\text{Pr}[| | \tilde{x}_i - \tilde{x}_j |_2^2 - | x_i - x_j |_2^2 | \gt \epsilon \cdot | x_i - x_j |_2^2]$

    $= \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \gt (1 + \epsilon) | x_i - x_j |_2^2 ] + \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \lt (1 - \epsilon) | x_i - x_j |_2^2 ]$

    $\Rightarrow \begin{cases} \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \gt (1 + \epsilon) | x_i - x_j |_2^2 ] \lt 1 / n^3 \\ \text{Pr}[ | \tilde{x}_i - \tilde{x}_j |_2^2 \lt (1 - \epsilon) | x_i - x_j |_2^2 ] \lt 1 / n^3 \end{cases}$

  • Then we use union bound :

    $\text{Pr}[ \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \gt (1 + \epsilon) \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt |C_i|^2 \cdot 1 / n^3$

    $\text{Pr}[ \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \lt (1 - \epsilon) \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt |C_i|^2 \cdot 1 / n^3$

  • Additionaly we have : $\sum_{i=1}^k |C_i|^2 \leq (\sum_{i=1}^k |C_i|)^2 = n^2$

  • Then we can derive that :

    $\text{Pr}[ \sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \gt (1 + \epsilon)\sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt n^2 \cdot 1 / n^3 = 1 / n$

    $\text{Pr}[ \sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | \tilde{x}_j - \tilde{x}_l |_2^2 \lt (1 - \epsilon)\sum_{i=1}^k \frac{1}{|C_i|} \sum_{j,l \in C_i, j \lt l} | x_j - x_l |_2^2] \lt n^2 \cdot 1 / n^3 = 1 / n$

  • That means :

    $( 1 - \epsilon ) \text {cost} \left( x_ { 1 } , \cdots , x_ { n } , C _ { 1 } , \cdots , C _ { k } \right)\leq \text {cost} \left( \tilde { x } _ { 1 } , \cdots , \tilde { x } _ { n } , C _ { 1 } , \cdots , C _ { k } \right) \leq ( 1 + \epsilon ) \text {cost} \left( x_ { 1 } , \cdots , x _ { n } , C _ { 1 } , \cdots , C _ { k } \right)$

    w.h.p.

Task c

  • Consider the above, we have :

    $( 1 - \epsilon ) \text {cost} \left( x_ { 1 } , \cdots , x_ { n } , C _ { 1 } , \cdots , C _ { k } \right)\leq \text {cost} \left( \tilde { x } _ { 1 } , \cdots , \tilde { x } _ { n } , C _ { 1 } , \cdots , C _ { k } \right) \leq ( 1 + \epsilon ) \text {cost} \left( x_ { 1 } , \cdots , x _ { n } , C _ { 1 } , \cdots , C _ { k } \right)$

    w.h.p.

  • So we can derive that :

    $\text {cost} \left( x_ { 1 } , \cdots , x_ { n } , \tilde{C} _ { 1 } , \cdots , \tilde{C} _ { k } \right) \leq \frac{1}{1 - \epsilon} \text {cost} \left( \tilde{x}_ { 1 } , \cdots , \tilde{x}_ { n } , \tilde{C} _ { 1 } , \cdots , \tilde{C} _ { k } \right)$ w.h.p.

    $\text {cost} \left( \tilde{x}_ { 1 } , \cdots , \tilde{x}_ { n } , \tilde{C} _ { 1 } , \cdots , \tilde{C} _ { k } \right) \leq \gamma \cdot \text{cost}\left( \tilde{x}_ { 1 } , \cdots , \tilde{x}_ { n } , \tilde{C}^* _ { 1 } , \cdots , \tilde{C}^* _ { k } \right)$ ( we already knew )

    $\text{cost}\left( \tilde{x}_ { 1 } , \cdots , \tilde{x}_ { n } , \tilde{C}^* _ { 1 } , \cdots , \tilde{C}^* _ { k } \right) \leq \text{cost}\left( \tilde{x}_ { 1 } , \cdots , \tilde{x}_ { n } , C^* _ { 1 } , \cdots , C^* _ { k } \right)$ ( the nature of the optimal partition for the projected points )

    $\text{cost}\left( \tilde{x}_ { 1 } , \cdots , \tilde{x}_ { n } , C^* _ { 1 } , \cdots , C^* _ { k } \right) \leq (1+\epsilon)\text{cost}\left( x_ { 1 } , \cdots , x_ { n } , C^* _ { 1 } , \cdots , C^* _ { k } \right)$ w.h.p.

  • Consider the inequality above :

    $\text {cost} \left( x_ { 1 } , \cdots , x_ { n } , \tilde{C} _ { 1 } , \cdots , \tilde{C} _ { k } \right) \leq \frac{1+\epsilon}{1-\epsilon} \cdot \gamma \cdot \text{cost}\left( x_ { 1 } , \cdots , x_ { n } , C^* _ { 1 } , \cdots , C^* _ { k } \right)$ w.h.p.

  • In the condition $\epsilon \leq 1/2$, $\frac{1+\epsilon}{1-\epsilon} \leq 1 + 4\epsilon$.

  • By concluding above, we can get the conclusion :

    $\text {cost} \left( x_ { 1 } , \cdots , x_ { n } , \tilde{C} _ { 1 } , \cdots , \tilde{C} _ { k } \right) \leq (1 + O(\epsilon)) \cdot \gamma \cdot \text{cost}\left( x_ { 1 } , \cdots , x_ { n } , C^* _ { 1 } , \cdots , C^* _ { k } \right)$ w.h.p