What is P vs. NP? · Thobias Høivik

To P or (N)ot to P?

Even if you have never heard of Computational Complexity theory, you might be familiar with the phrase: “P vs. NP”, but what does that even mean? Who are P and NP? Why are they fighting, and why should I care?

As to the latter question: the P vs. NP problem is one of the Clay Mathematics Institute’s $7$ millenium problems ( $6$ of which, including P vs. NP, are unsolved), for which a solution yields the solver a prize of $ $1$ million in U.S currency, immense academic prestige, massive breakthroughs in several fields of science, and never losing two truths and a lie ever again.

Loosely speaking the question pertains to whether a certain type of problems is fundamentally harder (for a “computer”) to solve than others. P represents the class of problems solvable in polynomial time (colloquially thought of as efficiently/quickly), while NP represents the class of problems where, if a solution is found, it can be verified to be correct in polynomial time.

To elucidate further consider the following two graph problems.

Let $G$ be some graph, meaning a set of vertices/points connected by edges/lines.

A Eulerian cycle visits every edge exactly once. This problem is efficiently solvable: essentially, one checks that the graph is connected and that every vertex is connected to an even number of other vertices.

A Hamiltonian cycle visits every vertex exactly once. If someone gives us such a cycle, we can check it quickly. But finding one is NP-complete.

So two problems that sound almost identical behave very differently:

\text{visit every edge} \quad \text{vs.} \quad \text{visit every vertex}.

If $\text{P} \neq \text{NP}$ , then determining whether a graph has a hamiltonian cycle is fundamentally harder (for a “computer”) to determine. If $\text{P} = \text{NP}$ , then every problem as hard as Hamiltonian cycle could be solved “quickly” and this would undoubtedly have incredible consequences including, but not limited to:

you waking up with an empty bank account, as many cryptographic systems rely on computational problems being infeasible to solve in practice,
potentially huge advances in drug discovery, enzyme design, vaccines, etc. as searching amino-acid sequences for desired structure is NP-hard,
cheaper transport and better emergency service response since finding optimal routes is NP-hard,
millions of computer scientists scratching their heads at this (in their opinion) marvelously unlikely outcome.

So the latter outcome would have immense implications on the real world, while the former outcome is perhaps the more philosophically interesting.

Sadly, or perhaps thankfully depending on which consequences you look at, most Computational Complexity theorists do not believe that $\text{P} = \text{NP}$ , as proving that $\text{P} = \text{NP}$ would be exceedingly simple to prove compared to the alternative. In fact, preprint servers such as arXiv or (the more dubious) viXra, are littered with false proofs that $\text{P} = \text{NP}$ in the form of supposed polynomial time algorithms for NP-complete problems such as $3$ -SAT. That is where deductive sciences such as Math, Theoretical Computer Science, Logic, etc. differ from empirical ones. Over the $50+$ years Computational Complexity theory has existed, no one has legitimately found such an algorithm. Thus, one might think we could decide that $\text{P} \neq \text{NP}$ on emprical evidence. However, this is not enough. We need actual mathematical proof that this is the case. What we’re left with is the brutally simple observation that

\text{P} \subseteq \text{NP},

and some approaches towards closing the gap in the form of modren research programs.

Now that you’ve gotten some, possibly confusing, insight into what the P vs. NP problem is let us embark on a slightly more rigorous exposition of what Computational Complexity theory is.

The Turing Machine

You may have noticed that above I write “computer” in quotation marks. That is because theoretical computer scientists, despite the name, do not usually work directly with computers in the everyday sense. They work with models of computation: mathematically precise idealizations of what it means to compute.

This is not because laptops are beneath us, although some laptops do try very hard to deserve contempt. It is because actual computers are complicated physical objects. They have processors, memory hierarchies, operating systems, compilers, caches, cosmic-ray-induced bit flips if the universe is feeling dramatic, and fans that begin screaming the moment one opens Chrome. That is far too messy if the goal is to prove theorems.

Instead, we use simpler mathematical machines which capture the essential idea of algorithmic computation. The most famous, and still the central one for complexity theory, is the wonderful Turing machine.

Alan Turing introduced this model in his 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem. The historical context is important: Turing was not trying to design a laptop. He was trying to formalize what it means for something to be computable by a finite mechanical procedure. In doing so, he helped create the mathematical foundation of computer science.

Turing was also one of the central figures in the Allied codebreaking effort at Bletchley Park during the Second World War, where his ideas contributed to the development of the Bombe machines used against Enigma-encrypted messages. But his deepest contribution to theoretical computer science is arguably not the wartime story, famous as it is. It is the fact that he gave us a precise mathematical language for talking about computation itself.

As a brief cultural aside: the movie The Imitation Game is a perfectly watchable film, but it spends much of its time on a dramatized and historically dubious version of Turing’s life, while somehow making the invention of modern computability theory look like an asside. I think most people would agree that a movie about his developments in mathematics would be way more interesting.

So what is a Turing machine?

Intuitively, a Turing machine consists of:

a tape, divided into cells, which can store symbols,
a head, which reads and writes one tape cell at a time,
a finite list of internal states,
and a transition rule telling the machine what to do next.

At each step, the machine looks at its current state and the symbol currently under the head. Based on this information, it writes a symbol, moves the head left or right, and changes state.

Formally, a deterministic Turing machine is a tuple

M = (Q,\Sigma,\Gamma,\delta,q_0,q_{\mathrm{acc}},q_{\mathrm{rej}}),

where:

$Q$ is a finite set of states,
$\Sigma$ is the input alphabet,
$\Gamma$ is the tape alphabet, with $\Sigma \subseteq \Gamma$ ,
$\delta$ is the transition function,
$q_0 \in Q$ is the start state,
$q_{\mathrm{acc}} \in Q$ is the accepting state,
$q_{\mathrm{rej}} \in Q$ is the rejecting state.

The transition function has the form

\delta : (Q \setminus \{q_{\mathrm{acc}},q_{\mathrm{rej}}\}) \times \Gamma \to Q \times \Gamma \times \{L,R\}.

This means that if the machine is in state $q$ and reads symbol $a$ , then $\delta(q,a)$ tells it three things:

\text{new state}, \qquad \text{symbol to write}, \qquad \text{direction to move}.

An input string $x \in \Sigma^\ast$ is written on the tape, the machine starts in state $q_0$ , and then it follows its transition function step by step. If the machine eventually enters $q_{\mathrm{acc}}$ , it accepts $x$ . If it eventually enters $q_{\mathrm{rej}}$ , it rejects $x$ .

For decision problems, we usually think of a Turing machine as deciding a language

L \subseteq \Sigma^\ast.

The language $L$ is the set of yes-instances of the problem. We say that $M$ decides $L$ if, for every input $x$ ,

x \in L \quad\Longleftrightarrow\quad M \text{ accepts } x,

and

x \notin L \quad\Longleftrightarrow\quad M \text{ rejects } x.

In other words, a Turing machine decides a problem if it always eventually gives the correct yes/no answer.

This is the first important point: complexity theory is usually about decision problems. We ask questions like:

\text{Does this graph have a Hamiltonian cycle?}

rather than directly asking:

\text{Find a Hamiltonian cycle.}

This may seem restrictive, but it is a very useful convention. Search problems can often be converted into related decision problems, and decision problems give us a clean yes/no framework for reductions, hardness, and completeness.

The second important point is that Turing machines are intentionally simple. They are not meant to be realistic computers. They are meant to be mathematically clean computers. The remarkable fact is that many reasonable models of computation can simulate each other with at most polynomial overhead. So, for the purposes of classes like $\mathrm P$ , $\mathrm{NP}$ , and $\mathrm{PSPACE}$ , the exact choice of machine model usually does not matter.

As an asside: philosophical implications like this are the things which excite me the most. Computation is seemingly such a strong and universal property that most reasonable models of it are equivalent. Things philosophical aspect of the field extends to my favourite observations such as:

a problem is hard if and only if it is a “good computer”,
determining whether ceirtain problems (NP) are harder than other (P) is a fundamentally hard problem (see Meta-Complexity theory for more on that!),
and more.

This is one reason polynomial time is so central. A laptop, a Turing machine, and many other reasonable models may differ in low-level details, but they largely agree on what it means for a problem to be solvable in polynomial time. The machine model changes the implementation details, not the basic complexity class.

So when we say that a problem is solvable by a “computer”, what we usually mean is something closer to:

there exists a Turing machine which decides the corresponding language within some specified resource bound.

For example, when we define $\mathrm P$ , we do not say:

the problem can be solved quickly on my laptop, assuming it has enough RAM and no Windows update begins halfway through.

We say:

\mathrm P = \{L : L \text{ is decidable by a deterministic Turing machine in polynomial time}\}.

That is, $L \in \mathrm P$ if there exists a deterministic Turing machine $M$ and a polynomial $p$ such that, for every input $x$ , the machine decides whether $x \in L$ in at most

p(|x|)

steps.

Here $|x|$ denotes the length of the input. The input size matters because complexity theory studies how the amount of required computation grows as the input grows.

For example, an algorithm running in time $n^2$ is polynomial time. An algorithm running in time $2^n$ is exponential time. For small inputs, this difference may not matter much. For large inputs, it becomes the difference between “done before lunch” and “the Sun has become a red giant.” This may seem silly since $2^n$ is substantially better than $n^{10000000\ldots}$ in real-world applications, but in theory the latter is better since eventually $2^n$ outgrows the latter runtime for large enough $n$ . See my paper on Edge Geography for an algorithm which is theoretically better than previous techniques, but completely useless in practice!

The point of introducing Turing machines is therefore not that we secretly want to program with infinite tapes. The point is that they let us make statements like the following precise:

\text{this problem can be solved efficiently},

\text{this problem can be verified efficiently},

\text{this problem can be solved using polynomial space},

and

\text{this problem is at least as hard as every problem in a class}.

Once we have a mathematical model of computation, we can start measuring resources. Once we can measure resources, we can define complexity classes. And once we have complexity classes, the question

\mathrm P \stackrel{?}{=} \mathrm{NP}

becomes a precise mathematical problem rather than a philosophical argument.

Time, input size, and polynomial time

Now that we have a model of computation, we can start asking complexity-theoretic questions. The most basic one is:

how many steps does the machine need before it gives an answer?

This is where input size enters the picture. If the input is a string $x$ , we write $|x|$ for its length. Complexity theory is not usually interested in how long an algorithm takes on one fixed input. It is interested in how the required number of steps grows as the input gets larger.

For example, suppose an algorithm takes at most $n^2$ steps on inputs of length $n$ . Then doubling the input size makes the bound roughly four times larger. This may be annoying, but it is still a very controlled kind of growth.

By contrast, an algorithm taking $2^n$ steps behaves very differently. Increasing $n$ by $1$ doubles the running time.

So we distinguish between different kinds of running time. The most important class for us is polynomial time.

An algorithm runs in polynomial time if there is some constant $k$ such that, on inputs of length $n$ , it halts after at most

O(n^k)

steps.

Equivalently, there is a polynomial $p$ such that the algorithm uses at most $p(n)$ steps on every input of length $n$ .

This gives us the class $\mathrm P$ .

A decision problem is in $\mathrm P$ if there is a deterministic Turing machine which decides it in polynomial time. In other words,

L \in \mathrm P

if there exists a deterministic Turing machine $M$ and a polynomial $p$ such that, for every input $x$ , the machine decides whether $x \in L$ in at most

p(|x|)

steps.

The class $\mathrm P$ is often informally described as the class of efficiently solvable decision problems. This is not a perfect description. An algorithm running in time $n^{1000}$ is polynomial time, but no one would call it practical with a straight face. Still, polynomial time is mathematically robust. It gives a stable notion of efficient computation that does not depend too much on the specific machine model.

This robustness is one reason $\mathrm P$ is so central. If a problem is in $\mathrm P$ , then it is not merely solvable by some clever trick on one specific computer. It is solvable efficiently in a broad, machine-independent mathematical sense.

Many familiar computational problems are in $\mathrm P$ . For example:

deciding whether a graph is connected,
finding a shortest path in a graph with nonnegative edge weights,
deciding whether a graph has an Eulerian cycle,
sorting a list,
solving a system of linear equations over a field.

These problems can look nontrivial, but they admit algorithms whose running times are bounded by polynomials in the input size.

This is the first major theme of complexity theory:

\text{some problems have structure that algorithms can exploit efficiently.}

The next class, $\mathrm{NP}$ , captures a different phenomenon. Some problems may not obviously be easy to solve, but once someone hands us a proposed solution, that solution is easy to check.

Certificates and the class $\mathrm{NP}$

The class $\mathrm{NP}$ is often introduced by saying that it contains the problems whose solutions can be checked efficiently.

For example, consider Hamiltonian cycle. Given a graph $G$ , finding a Hamiltonian cycle may be hard. But if someone hands us a proposed Hamiltonian cycle, checking it is easy: we verify that it visits every vertex exactly once and that every consecutive pair of vertices is connected by an edge.

The proposed cycle is called a certificate or witness.

This gives the central idea behind $\mathrm{NP}$ :

\text{a yes-answer should have a short proof that can be checked quickly.}

Formally, a language $L$ is in $\mathrm{NP}$ if there is a polynomial-time verifier $V$ and a polynomial $p$ such that, for every input $x$ ,

x \in L \quad\Longleftrightarrow\quad \text{there exists a string } y \text{ with } |y| \le p(|x|) \text{ such that } V(x,y) = 1.

Here $y$ is the certificate. The verifier $V$ is not required to find $y$ . It only has to check whether a proposed $y$ is valid.

This distinction is the heart of the P vs. NP problem.

For Hamiltonian cycle:

the input $x$ is the graph,
the certificate $y$ is the proposed cycle,
the verifier checks that the proposed cycle is valid.

For satisfiability:

the input $x$ is a Boolean formula,
the certificate $y$ is a truth assignment,
the verifier checks that the assignment satisfies the formula.

For Sudoku:

the input $x$ is a partially filled board,
the certificate $y$ is a completed board,
the verifier checks that every row, column, and box is valid.

In all these cases, the certificate may be hard to find. But once it is found, checking it is straightforward.

This is why one often hears the slogan:

\mathrm P = \text{easy to solve}, \qquad \mathrm{NP} = \text{easy to check}.

The slogan is useful, but slightly imprecise. More carefully:

$\mathrm P$ consists of decision problems whose yes/no answer can be found in polynomial time.
$\mathrm{NP}$ consists of decision problems whose yes-instances have polynomial-size certificates checkable in polynomial time.

Every problem in $\mathrm P$ is also in $\mathrm{NP}$ . If we can decide whether $x \in L$ in polynomial time, then we do not need a meaningful certificate at all. The verifier can simply ignore the certificate and run the polynomial-time decision algorithm.

Thus,

\mathrm P \subseteq \mathrm{NP}.

The P vs. NP problem asks whether this containment is strict:

\mathrm P \stackrel{?}{=} \mathrm{NP}.

Equivalently:

If a yes-answer can always be verified efficiently, can it also always be found efficiently?

Most complexity theorists believe the answer is no. But no one knows how to prove it.

To P or (N)ot to P?

The Turing Machine

Time, input size, and polynomial time

Certificates and the class NP\mathrm{NP}NP

Certificates and the class $\mathrm{NP}$