As when I tried the depth bounds extension back then, this trick had almost no impact, and I presume the extra cost was coming from the depth buffer copy.
So this makes me think that performance improvements will only come from a smarter geometry setup.
I think I need to look into ways to subdivide the geometry in a less taxing way during the compute pass.