Back to lab

To P or (N)ot to P?

Even if you have never heard of Computational Complexity theory, you might be familiar with the phrase: “P vs. NP”, but what does that even mean? Who are P and NP? Why are they fighting, and why should I care?

As to the latter question: the P vs. NP problem is one of the Clay Mathematics Institute’s 77 millenium problems (66 of which, including P vs. NP, are unsolved), for which a solution yields the solver a prize of $11 million in U.S currency, immense academic prestige, massive breakthroughs in several fields of science, and never losing two truths and a lie ever again.

Loosely speaking the question pertains to whether a certain type of problems is fundamentally harder (for a “computer”) to solve than others. P represents the class of problems solvable in polynomial time (colloquially thought of as efficiently/quickly), while NP represents the class of problems where, if a solution is found, it can be verified to be correct in polynomial time.

To elucidate further consider the following two graph problems.

Let GG be some graph, meaning a set of vertices/points connected by edges/lines.

A Eulerian cycle visits every edge exactly once. This problem is efficiently solvable: essentially, one checks that the graph is connected and that every vertex is connected to an even number of other vertices.

A Hamiltonian cycle visits every vertex exactly once. If someone gives us such a cycle, we can check it quickly. But finding one is NP-complete.

So two problems that sound almost identical behave very differently:

visit every edgevs.visit every vertex. \text{visit every edge} \quad \text{vs.} \quad \text{visit every vertex}.

If PNP\text{P} \neq \text{NP}, then determining whether a graph has a hamiltonian cycle is fundamentally harder (for a “computer”) to determine. If P=NP\text{P} = \text{NP}, then every problem as hard as Hamiltonian cycle could be solved “quickly” and this would undoubtedly have incredible consequences including, but not limited to:

So the latter outcome would have immense implications on the real world, while the former outcome is perhaps the more philosophically interesting.

Sadly, or perhaps thankfully depending on which consequences you look at, most Computational Complexity theorists do not believe that P=NP\text{P} = \text{NP}, as proving that P=NP\text{P} = \text{NP} would be exceedingly simple to prove compared to the alternative. In fact, preprint servers such as arXiv or (the more dubious) viXra, are littered with false proofs that P=NP\text{P} = \text{NP} in the form of supposed polynomial time algorithms for NP-complete problems such as 33-SAT. That is where deductive sciences such as Math, Theoretical Computer Science, Logic, etc. differ from empirical ones. Over the 50+50+ years Computational Complexity theory has existed, no one has legitimately found such an algorithm. Thus, one might think we could decide that PNP\text{P} \neq \text{NP} on emprical evidence. However, this is not enough. We need actual mathematical proof that this is the case. What we’re left with is the brutally simple observation that

PNP, \text{P} \subseteq \text{NP},

and some approaches towards closing the gap in the form of modren research programs.

Now that you’ve gotten some, possibly confusing, insight into what the P vs. NP problem is let us embark on a slightly more rigorous exposition of what Computational Complexity theory is.

The Turing Machine

You may have noticed that above I write “computer” in quotation marks. That is because theoretical computer scientists, despite the name, do not usually work directly with computers in the everyday sense. They work with models of computation: mathematically precise idealizations of what it means to compute.

This is not because laptops are beneath us, although some laptops do try very hard to deserve contempt. It is because actual computers are complicated physical objects. They have processors, memory hierarchies, operating systems, compilers, caches, cosmic-ray-induced bit flips if the universe is feeling dramatic, and fans that begin screaming the moment one opens Chrome. That is far too messy if the goal is to prove theorems.

Instead, we use simpler mathematical machines which capture the essential idea of algorithmic computation. The most famous, and still the central one for complexity theory, is the wonderful Turing machine.

Alan Turing introduced this model in his 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem. The historical context is important: Turing was not trying to design a laptop. He was trying to formalize what it means for something to be computable by a finite mechanical procedure. In doing so, he helped create the mathematical foundation of computer science.

Turing was also one of the central figures in the Allied codebreaking effort at Bletchley Park during the Second World War, where his ideas contributed to the development of the Bombe machines used against Enigma-encrypted messages. But his deepest contribution to theoretical computer science is arguably not the wartime story, famous as it is. It is the fact that he gave us a precise mathematical language for talking about computation itself.

As a brief cultural aside: the movie The Imitation Game is a perfectly watchable film, but it spends much of its time on a dramatized and historically dubious version of Turing’s life, while somehow making the invention of modern computability theory look like an asside. I think most people would agree that a movie about his developments in mathematics would be way more interesting.

So what is a Turing machine?

Intuitively, a Turing machine consists of:

At each step, the machine looks at its current state and the symbol currently under the head. Based on this information, it writes a symbol, moves the head left or right, and changes state.

Formally, a deterministic Turing machine is a tuple

M=(Q,Σ,Γ,δ,q0,qacc,qrej),M = (Q,\Sigma,\Gamma,\delta,q_0,q_{\mathrm{acc}},q_{\mathrm{rej}}),

where:

The transition function has the form

δ:(Q{qacc,qrej})×ΓQ×Γ×{L,R}.\delta : (Q \setminus \{q_{\mathrm{acc}},q_{\mathrm{rej}}\}) \times \Gamma \to Q \times \Gamma \times \{L,R\}.

This means that if the machine is in state qq and reads symbol aa, then δ(q,a)\delta(q,a) tells it three things:

new state,symbol to write,direction to move.\text{new state}, \qquad \text{symbol to write}, \qquad \text{direction to move}.

An input string xΣx \in \Sigma^\ast is written on the tape, the machine starts in state q0q_0, and then it follows its transition function step by step. If the machine eventually enters qaccq_{\mathrm{acc}}, it accepts xx. If it eventually enters qrejq_{\mathrm{rej}}, it rejects xx.

For decision problems, we usually think of a Turing machine as deciding a language

LΣ.L \subseteq \Sigma^\ast.

The language LL is the set of yes-instances of the problem. We say that MM decides LL if, for every input xx,

xLM accepts x,x \in L \quad\Longleftrightarrow\quad M \text{ accepts } x,

and

xLM rejects x.x \notin L \quad\Longleftrightarrow\quad M \text{ rejects } x.

In other words, a Turing machine decides a problem if it always eventually gives the correct yes/no answer.

This is the first important point: complexity theory is usually about decision problems. We ask questions like:

Does this graph have a Hamiltonian cycle?\text{Does this graph have a Hamiltonian cycle?}

rather than directly asking:

Find a Hamiltonian cycle.\text{Find a Hamiltonian cycle.}

This may seem restrictive, but it is a very useful convention. Search problems can often be converted into related decision problems, and decision problems give us a clean yes/no framework for reductions, hardness, and completeness.

The second important point is that Turing machines are intentionally simple. They are not meant to be realistic computers. They are meant to be mathematically clean computers. The remarkable fact is that many reasonable models of computation can simulate each other with at most polynomial overhead. So, for the purposes of classes like P\mathrm P, NP\mathrm{NP}, and PSPACE\mathrm{PSPACE}, the exact choice of machine model usually does not matter.

As an asside: philosophical implications like this are the things which excite me the most. Computation is seemingly such a strong and universal property that most reasonable models of it are equivalent. Things philosophical aspect of the field extends to my favourite observations such as:

This is one reason polynomial time is so central. A laptop, a Turing machine, and many other reasonable models may differ in low-level details, but they largely agree on what it means for a problem to be solvable in polynomial time. The machine model changes the implementation details, not the basic complexity class.

So when we say that a problem is solvable by a “computer”, what we usually mean is something closer to:

there exists a Turing machine which decides the corresponding language within some specified resource bound.

For example, when we define P\mathrm P, we do not say:

the problem can be solved quickly on my laptop, assuming it has enough RAM and no Windows update begins halfway through.

We say:

P={L:L is decidable by a deterministic Turing machine in polynomial time}.\mathrm P = \{L : L \text{ is decidable by a deterministic Turing machine in polynomial time}\}.

That is, LPL \in \mathrm P if there exists a deterministic Turing machine MM and a polynomial pp such that, for every input xx, the machine decides whether xLx \in L in at most

p(x)p(|x|)

steps.

Here x|x| denotes the length of the input. The input size matters because complexity theory studies how the amount of required computation grows as the input grows.

For example, an algorithm running in time n2n^2 is polynomial time. An algorithm running in time 2n2^n is exponential time. For small inputs, this difference may not matter much. For large inputs, it becomes the difference between “done before lunch” and “the Sun has become a red giant.” This may seem silly since 2n2^n is substantially better than n10000000n^{10000000\ldots} in real-world applications, but in theory the latter is better since eventually 2n2^n outgrows the latter runtime for large enough nn. See my paper on Edge Geography for an algorithm which is theoretically better than previous techniques, but completely useless in practice!

The point of introducing Turing machines is therefore not that we secretly want to program with infinite tapes. The point is that they let us make statements like the following precise:

this problem can be solved efficiently,\text{this problem can be solved efficiently},this problem can be verified efficiently,\text{this problem can be verified efficiently},this problem can be solved using polynomial space,\text{this problem can be solved using polynomial space},

and

this problem is at least as hard as every problem in a class.\text{this problem is at least as hard as every problem in a class}.

Once we have a mathematical model of computation, we can start measuring resources. Once we can measure resources, we can define complexity classes. And once we have complexity classes, the question

P=?NP\mathrm P \stackrel{?}{=} \mathrm{NP}

becomes a precise mathematical problem rather than a philosophical argument.

Time, input size, and polynomial time

Now that we have a model of computation, we can start asking complexity-theoretic questions. The most basic one is:

how many steps does the machine need before it gives an answer?

This is where input size enters the picture. If the input is a string xx, we write x|x| for its length. Complexity theory is not usually interested in how long an algorithm takes on one fixed input. It is interested in how the required number of steps grows as the input gets larger.

For example, suppose an algorithm takes at most n2n^2 steps on inputs of length nn. Then doubling the input size makes the bound roughly four times larger. This may be annoying, but it is still a very controlled kind of growth.

By contrast, an algorithm taking 2n2^n steps behaves very differently. Increasing nn by 11 doubles the running time.

So we distinguish between different kinds of running time. The most important class for us is polynomial time.

An algorithm runs in polynomial time if there is some constant kk such that, on inputs of length nn, it halts after at most

O(nk)O(n^k)

steps.

Equivalently, there is a polynomial pp such that the algorithm uses at most p(n)p(n) steps on every input of length nn.

This gives us the class P\mathrm P.

A decision problem is in P\mathrm P if there is a deterministic Turing machine which decides it in polynomial time. In other words,

LPL \in \mathrm P

if there exists a deterministic Turing machine MM and a polynomial pp such that, for every input xx, the machine decides whether xLx \in L in at most

p(x)p(|x|)

steps.

The class P\mathrm P is often informally described as the class of efficiently solvable decision problems. This is not a perfect description. An algorithm running in time n1000n^{1000} is polynomial time, but no one would call it practical with a straight face. Still, polynomial time is mathematically robust. It gives a stable notion of efficient computation that does not depend too much on the specific machine model.

This robustness is one reason P\mathrm P is so central. If a problem is in P\mathrm P, then it is not merely solvable by some clever trick on one specific computer. It is solvable efficiently in a broad, machine-independent mathematical sense.

Many familiar computational problems are in P\mathrm P. For example:

These problems can look nontrivial, but they admit algorithms whose running times are bounded by polynomials in the input size.

This is the first major theme of complexity theory:

some problems have structure that algorithms can exploit efficiently.\text{some problems have structure that algorithms can exploit efficiently.}

The next class, NP\mathrm{NP}, captures a different phenomenon. Some problems may not obviously be easy to solve, but once someone hands us a proposed solution, that solution is easy to check.

Certificates and the class NP\mathrm{NP}

The class NP\mathrm{NP} is often introduced by saying that it contains the problems whose solutions can be checked efficiently.

For example, consider Hamiltonian cycle. Given a graph GG, finding a Hamiltonian cycle may be hard. But if someone hands us a proposed Hamiltonian cycle, checking it is easy: we verify that it visits every vertex exactly once and that every consecutive pair of vertices is connected by an edge.

The proposed cycle is called a certificate or witness.

This gives the central idea behind NP\mathrm{NP}:

a yes-answer should have a short proof that can be checked quickly.\text{a yes-answer should have a short proof that can be checked quickly.}

Formally, a language LL is in NP\mathrm{NP} if there is a polynomial-time verifier VV and a polynomial pp such that, for every input xx,

xLthere exists a string y with yp(x) such that V(x,y)=1.x \in L \quad\Longleftrightarrow\quad \text{there exists a string } y \text{ with } |y| \le p(|x|) \text{ such that } V(x,y) = 1.

Here yy is the certificate. The verifier VV is not required to find yy. It only has to check whether a proposed yy is valid.

This distinction is the heart of the P vs. NP problem.

For Hamiltonian cycle:

For satisfiability:

For Sudoku:

In all these cases, the certificate may be hard to find. But once it is found, checking it is straightforward.

This is why one often hears the slogan:

P=easy to solve,NP=easy to check.\mathrm P = \text{easy to solve}, \qquad \mathrm{NP} = \text{easy to check}.

The slogan is useful, but slightly imprecise. More carefully:

Every problem in P\mathrm P is also in NP\mathrm{NP}. If we can decide whether xLx \in L in polynomial time, then we do not need a meaningful certificate at all. The verifier can simply ignore the certificate and run the polynomial-time decision algorithm.

Thus,

PNP.\mathrm P \subseteq \mathrm{NP}.

The P vs. NP problem asks whether this containment is strict:

P=?NP.\mathrm P \stackrel{?}{=} \mathrm{NP}.

Equivalently:

If a yes-answer can always be verified efficiently, can it also always be found efficiently?

Most complexity theorists believe the answer is no. But no one knows how to prove it.