Dataflow analysis in the compiler to avoid runtime NULL checks

LOAD_FAST accounts for 14.6% of all bytecodes executed. Counting the superinstructions that contain a LOAD_FAST brings this up to 14.6 + 4.7 + 4.6 + 2.4 + 0.9 ≈ 27%.

TARGET(LOAD_FAST) {
    PyObject *value = GETLOCAL(oparg);
    if (value == NULL) {
        goto unbound_local_error;
    }
    Py_INCREF(value);
    PUSH(value);
    DISPATCH();
}

In many cases we can turn this NULL check into an assertion: whenever the compiler can determine that the local variable is already initialized at that point. Preliminary tests show that almost all LOAD_FAST instructions can be statically shown to load already-initialized variables.
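To make the idea concrete, here is a minimal sketch of such an analysis over a toy control-flow graph. It is not the actual compiler code: the `instr` and `block` types, the `OP_*` names, and `mark_unsafe_loads` are all hypothetical, and it assumes at most 64 locals, at most two successors per block, and no exception-handler edges. It tracks a bitmask of locals that may be unbound, merges masks at join points with a bitwise OR, and keeps the NULL check only on loads whose bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy IR, for illustration only. */
enum { MAX_INSTRS = 8, NO_SUCC = -1 };

typedef enum {
    OP_OTHER, OP_LOAD_FAST, OP_LOAD_FAST_CHECK, OP_STORE_FAST, OP_DELETE_FAST
} opcode;

typedef struct { opcode op; int arg; } instr;

typedef struct {
    instr code[MAX_INSTRS];
    int ninstrs;
    int succ[2];          /* successor block indices, or NO_SUCC */
    uint64_t unbound_in;  /* locals that may be unbound on entry */
} block;

/* Walk one block, updating the "may be unbound" mask. When `rewrite`
   is true, loads that are not provably safe become LOAD_FAST_CHECK. */
static uint64_t
scan_block(block *b, uint64_t unbound, bool rewrite)
{
    for (int i = 0; i < b->ninstrs; i++) {
        instr *in = &b->code[i];
        if (in->op == OP_OTHER) {
            continue;
        }
        assert(in->arg >= 0 && in->arg < 64);
        uint64_t bit = (uint64_t)1 << in->arg;
        switch (in->op) {
        case OP_STORE_FAST:
            unbound &= ~bit;            /* definitely bound from here on */
            break;
        case OP_DELETE_FAST:
            unbound |= bit;             /* definitely unbound from here on */
            break;
        case OP_LOAD_FAST:
            if (rewrite && (unbound & bit)) {
                in->op = OP_LOAD_FAST_CHECK;
            }
            /* fall through */
        case OP_LOAD_FAST_CHECK:
            unbound &= ~bit;            /* if the load did not raise, it is bound */
            break;
        default:
            break;
        }
    }
    return unbound;
}

/* Block 0 is the entry block. `unbound_at_entry` has a bit set for every
   local that is not a parameter. Unreachable blocks keep an empty mask. */
void
mark_unsafe_loads(block *blocks, int nblocks, uint64_t unbound_at_entry)
{
    assert(nblocks > 0);
    for (int i = 0; i < nblocks; i++) {
        blocks[i].unbound_in = 0;
    }
    blocks[0].unbound_in = unbound_at_entry;

    /* Iterate to a fixed point; masks only grow, so this terminates. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nblocks; i++) {
            uint64_t out = scan_block(&blocks[i], blocks[i].unbound_in, false);
            for (int s = 0; s < 2; s++) {
                int t = blocks[i].succ[s];
                if (t == NO_SUCC) {
                    continue;
                }
                uint64_t merged = blocks[t].unbound_in | out;
                if (merged != blocks[t].unbound_in) {
                    blocks[t].unbound_in = merged;
                    changed = true;
                }
            }
        }
    }

    /* Final pass: rewrite the loads using the converged masks. */
    for (int i = 0; i < nblocks; i++) {
        scan_block(&blocks[i], blocks[i].unbound_in, true);
    }
}
```

For a function shaped like `if cond: x = 1` followed by two reads of `x`, this sketch keeps the check on the first read only: once that read has succeeded, the second is known to be safe.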

The one twist is handling del frame.f_locals["x"] or frame.f_lineno = 17, either of which can make a previously safe load unsafe. For now, when that happens we can simply replace every LOAD_FAST (the variant with no NULL check) with LOAD_FAST_CHECK in that particular code object.
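The fallback could look roughly like the sketch below. The `code_unit` layout, the `TOY_*` opcode numbers, and `restore_null_checks` are hypothetical names chosen for this example, and it ignores superinstructions, which would need the same treatment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode numbers, chosen only for this sketch. */
enum { TOY_LOAD_FAST = 1, TOY_LOAD_FAST_CHECK = 2, TOY_STORE_FAST = 3 };

/* One code unit: an opcode byte followed by an oparg byte. */
typedef struct { uint8_t opcode; uint8_t oparg; } code_unit;

/* Put the NULL check back on every unchecked load in one code object.
   Returns the number of instructions that were rewritten. */
size_t
restore_null_checks(code_unit *code, size_t n)
{
    size_t rewritten = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].opcode == TOY_LOAD_FAST) {
            code[i].opcode = TOY_LOAD_FAST_CHECK;
            rewritten++;
        }
    }
    return rewritten;
}
```

Because both variants take the same oparg and occupy the same space in this model, the rewrite is done in place and no jump targets need adjusting.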

See also faster-cpython/ideas#306