I guess the whole argument is that inline has turned into a "ricing option" that programmers throw about for tons of bogus reasons, not understanding gcc, not understanding other architectures, etc. Hence the patches to remove them all and just let the compiler do it, because it can't get any worse.
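For anyone who hasn't looked at what those patches actually change, here's a minimal sketch. The function names are made up for illustration, and it assumes GCC (or Clang, which accepts the same attribute). In C, `inline` is only a hint. The patches just drop the keyword and leave the inlining decision to the compiler's own cost model. When you really do know better than the compiler, GCC's `noinline` attribute states that explicitly instead of relying on a hint.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: the kind of annotation the patches strip out. "inline" is only a
   hint; if the compiler honours it at many call sites, each caller gets its
   own copy of the loop and the code grows. */
static inline uint32_t sum_bytes_hinted(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* After: no hint at all. GCC decides per call site whether inlining is worth
   the extra code, and -Os makes that decision favour smaller code. */
static uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Explicit override (GCC/Clang attribute): keep one out-of-line copy no
   matter what the heuristics say, e.g. for a cold path. */
static __attribute__((noinline)) uint32_t sum_bytes_outlined(const uint8_t *buf,
                                                              size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

All three return the same result. The only difference is who decides how much code ends up in `.text`. To see the effect on a real file, build it with `-O2` and then with `-Os` and compare the output of `size` on the two object files.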
I liked this post from Ingo:
furthermore, there's also a new CPU-architecture argument: the cost of icache misses has gone up disproportionally over the past couple of years, because on the first hand lots of instruction-scheduling 'metadata' got embedded into the L1 cache (like what used to be the BTB cache), and secondly because the (physical) latency gap between L1 cache and L2 cache has increased. Thirdly, CPUs are much better at untangling data dependencies, hence more compact but also more complex code can still perform well. So the L1 icache is more important than it used to be, and small code size is more important than raw cycle count - _and_ small code has less of a speed hit than it used to have. x86 CPUs have become simple JIT compilers, and code size reductions tend to become the best way to inform the CPU of what operations we want to compute.