Most of the tests I’ve heard people pitch for DC don’t seem very valuable to me, and I want to at least gesture at why.
Other folks seem to be thinking of Double Crux as a complete method, to be directly compared with other methods: “which one works better”. I think of Double Crux as one (very important) pattern in an ensemble for the overall goal of bridging disagreements. “Testing Double Crux”, as I often hear people talk about it, sounds to me a little like “testing bank shots” in basketball: it is clearly useful sometimes, it isn’t always the right thing to go for, and it depends heavily on personal skill.
I think that example overstates it somewhat: Double Crux is more of a broad framework for disagreement bridging than bank shots are for basketball. And that’s not to say that you can’t test bank shots: it’s plausible that there are superstitions about them, and that they aren’t as effective as many practitioners believe. But the value of information seems lower to me (at least at this stage, where approximately no one has put in more than 20 hours explicitly training disagreement bridging, compared to basketball, which has hundreds of highly skilled experts).
I would be more excited about organizing a “disagreement resolution tournament”, where experts who have developed their art and trained to excellence compete, rather than (for instance) a setup where we give 20 undergrads a 30-minute double crux lecture with 30 minutes of practice and compare them to a control group.
(That second thing isn’t useless, but I care a lot less about developing shallow tools that are helpful out of the box for ~0-skilled folks than I do about deep experts who increase the range of problems (in this case, disagreements) that humanity / the x-risk ecosystem can solve at all.)
The logistics of such a tournament seem hard to make work, because there’s no obvious way to standardize the disagreements to be resolved, and in practice there are very few highly skilled experts of differing schools. So the value of information in 2019 still seems low. But it seems more promising than most of the tests I hear proposed.