Did North Korea Conduct a Secret Nuclear Test in 2010?

What does the evidence tell us? 

North Korea continues to rattle the nuclear saber. Just how potent is the DPRK’s nuclear arsenal? Can North Korea hit the United States with a nuclear weapon? To do any of this, North Korea would need to conduct proper testing. With these questions in mind, we present the latest from our friends at 38 North, where this piece first appeared, who ask the question: Did North Korea test a nuclear device in 2010?

In 2010, two radionuclide stations in Northeast Asia detected radioactive particles that seemed to indicate that a nuclear explosion had taken place. While there are other possible explanations, other evidence seemed to suggest that North Korea had conducted a very small and otherwise undetected nuclear test. In the past few years, there have been a number of studies of radionuclide data, seismic data and now, on 38 North, satellite imagery.

While some of the evidence is intriguing, I don’t buy it. My objections are largely methodological, and methodological objections are important to me. Everyone who does analysis will be wrong from time to time. I try to be methodologically cautious so that, when I inevitably get it wrong, I will still feel like I made the right judgment based on the evidence available to me.

I think the hypothesis that North Korea conducted a nuclear test in May 2010 is a reasonable one worth considering. North Korea has conducted three nuclear weapons tests, presumably reducing the size and mass of the nuclear device, fixing whatever went wrong in 2006 and possibly confirming a design using uranium. It is possible that, along the way, North Korea conducted a low-yield science experiment or simply tested a dud.

Frankly, I’d love to be the person who proves that North Korea conducted a secret nuclear test. But, based on the evidence we have, I just don’t think it is more likely than not.

The 1979 Flash in the South Atlantic

First, a little history. In many ways, the debate over whether North Korea conducted a May 2010 test reminds me of a similarly ambiguous event in 1979.

The 1979 “flash in the South Atlantic” was precisely that—an optical detector called a “bhangmeter” on a US satellite detected a flash of light that looked a bit like a nuclear test somewhere in the South Atlantic. There was a lot of circumstantial evidence that pointed to Israel as the culprit. And I don’t mean ‘circumstantial’ in an insulting way. I mean that the prospect of a covert Israeli test seemed then, and still does today, totally plausible based on all kinds of evidence.

After the flash, the scientific community started scrutinizing every pool of sensor data to find the slightest corroboration. A few interesting things turned up, but nothing conclusive. There was some hydrophone data, but it required a sound wave to bounce off of Antarctica. There were claims of radioactive sheep thyroids that I’ve never been able to confirm. And so on. A full review of the evidence is beyond the scope of this little essay, but the approach raised a methodological concern. Spurious correlations are a statistical fact. A 90 percent confidence level means you’re still getting fooled 10 percent of the time. So, if you look hard enough for corroboration, you will find a few things, even if they are spurious. As a scientific panel charged with reviewing the data concluded:

We surmise that had a search been made for corroborating data relevant to a nonexistent event chosen to occur at a random time, such a search would have provided ‘corroborative data’ of similar quality and quantity to that which has been found during analysis of the September 22 signal.

To put it simply, one must be careful to avoid collecting coincidences that support a hypothesis while ignoring data that undermines a hypothesis.
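
The arithmetic behind that warning can be sketched in a few lines. This is only an illustration: it assumes each check of a data source is independent and carries the 10 percent false-alarm rate implied by the 90 percent confidence figure above, and the function name is made up for the example.

```python
def prob_at_least_one_false_hit(n_checks, false_positive_rate=0.10):
    """Chance that at least one of n independent checks appears to
    corroborate an event that never happened."""
    return 1.0 - (1.0 - false_positive_rate) ** n_checks


if __name__ == "__main__":
    # The more pools of sensor data we search, the more likely a coincidence.
    for n_checks in (1, 5, 10, 20):
        print(n_checks, round(prob_at_least_one_false_hit(n_checks), 3))
```

Under those assumptions, searching five data sources turns up at least one spurious "corroboration" about 41 percent of the time, and searching ten does so about 65 percent of the time. Real sensor records are neither independent nor so neatly calibrated, so treat the numbers as a rough guide to the effect, not a measurement of it.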

Ultimately, the scientific panel decided to reject the hypothesis that the bhangmeter had seen a nuclear test for a simple, elegant reason: the satellite’s bhangmeter, like a pair of eyes, was two sensors, which saw different events. If something is far away, such as on the surface of the earth, the two sensors are close enough that they should see the same thing. The fact that the two sensors saw something different, the panel reasoned, suggested the flash occurred in space very near to the satellite and not on the ground. This was an elegant answer. It also persuaded no one. People simply accused the scientific panel of covering up for the Carter administration, Israel, etc.

I feel precisely the same way about the alleged May 2010 nuclear tests. As in the case of Israel in 1979, I have no trouble accepting that the DPRK might have conducted a nuclear test in May 2010. But, as in the case of 1979, the assembled evidence seems to be merely a collection of coincidences that we could collect for a nonexistent event on a randomly chosen day.

Radionuclide Signatures

At the core of this problem is a reversal in how we think about detecting underground nuclear tests. The traditional thinking is that the correct way to “detect” an underground nuclear test is to spot it seismically. If radionuclides later appear, that helps “characterize” the seismic event as a nuclear explosion, rather than a conventional one. Generally, policymakers have been reluctant to rely on radionuclide readings alone to “detect” events for reasons that should become clear. The radionuclide community, however, is very excited about getting the same recognition as seismologists, especially now that computer simulations promise reliable methods to model the transport of radionuclides based on weather data. So, there may be a bit of a disciplinary food fight here.

In May 2010, the DPRK released a series of statements that a “thermonuclear” reaction had occurred in April. In the months following the announcements, a well-respected Swedish radiochemist, Lars-Erik De Geer, correlated these statements with certain radionuclide readings collected by the Comprehensive Test Ban Treaty Organization’s (CTBTO) International Monitoring System (IMS). The data includes xenon isotope ratio measurements at a national radionuclide monitoring site near Geojin (South Korea) and an IMS site near Takasaki (Japan) and Barium/Lanthanum measurements at CTBTO IMS sites near Ussuriysk in Russia and Okinawa in Japan. (Only Lanthanum (La) was detected at Ussuriysk.) All these measurements occurred between May 13 and 18, 2010.

De Geer published his findings in a 2012 article in Science and Global Security. I was skeptical of the original De Geer paper because it posited an extraordinarily artificial scenario of the observed radionuclide readings. De Geer posited two undetected nuclear tests, conducted in the same chamber approximately one month apart.

A number of radiochemists reviewed and agreed with De Geer’s initial paper. One, Wright, concluded that the evidence suggested a nuclear explosion, although he argued the radionuclide evidence was best explained by a single explosion and dismissed the xenon detections at Takasaki in Japan as coincidental.

De Geer himself concluded that the initial paper was in error, publishing a second paper in the Journal of Radioanalytical and Nuclear Chemistry. While De Geer’s first paper posited two undetected tests, the second paper posits only a single test on May 11.

Now, there are two ways to respond to this revision. I took it as confirmation of my original complaint that the scenario was being fitted to the data, which raises serious methodological warnings. My colleagues, quite reasonably, said, “Yeah, but the new scenario is pretty clean. What’s your objection to it now?”

Then along came another group of radiochemists, Ihantola et al, who agreed that a nuclear explosion occurred, but estimated the likely time of the event to be much later than De Geer’s estimate. De Geer and Ihantola et al posit very different explosion times, each outside of the error range posited by the other. Only the confidence intervals overlap, and just for a few hours.

So, here we are. Is De Geer right? Are Ihantola et al right? Or do we just shake our heads, muttering about how the data, like Jay from Serial, seems to always tell us what we want to hear? I don’t blame Wright for concluding that some of the readings might be unrelated to a test, but once we start tossing out awkward data, our thin methodological ice starts to crack.

Moreover, false alarms are possible. Nuclear power stations, reprocessing plants and other human activities can result in releases that appear to come from nuclear explosions. Early operation of a radionuclide monitoring system in Germany detected xenon spikes from nearby reactors. (The false alarm has led to better methods of characterization that emphasize isotopic ratios, but these methods still struggle to distinguish an explosion from a fresh load of fuel.) In another instance, in 2004, a radionuclide station detected 140La that was later determined to have been from a military decontamination exercise. We are so worried about false negatives (missing a nuclear test), but we seem to never worry about false positives.

I still wonder about other possible explanations. Japan brought its Monju fast breeder reactor online on May 9, 2010. The reactor experienced a number of alarms on May 9 and 10, indicating radiation leaks. Although Japanese authorities later stated that the alarms were false alarms and turned off the alarm system, this possibility should be examined far more thoroughly than it has been to date. Similarly, China brought its first fast reactor online a few weeks later. Maybe the Chinese had a false start? These hypotheses strike me as equally likely as a North Korean test. They deserve the same scrutiny.

Sadly, the radionuclide background in Asia is getting worse, not better, as more reactors come online. In particular, South Korea is planning to build a medical isotope production reactor in Busan that will produce a lot of radionuclide “noise.” North Korea’s 2013 nuclear test would likely have been lost in the background had this facility been operating at the time.

Seismic Data

After De Geer’s initial report, seismologists began looking for events that would confirm an explosion.

Schaff et al closely examined seismic data from an IMS station in China on the days hypothesized in the original De Geer paper—April 14-16 and May 10-11. They found no evidence of an explosion in either period. For the crucial period of May 10-11, Schaff et al found no explosion down to a threshold of Mb=1.15.

Zhang and Wen, working from the second De Geer paper, examined a wider range of dates. They identified a tiny event on May 12 using a method that looks for tiny events by matching (or cross-correlating) very small deviations at multiple regional seismic stations. This method should produce a lot of spurious correlations and the Zhang and Wen paper is a little vague about how many standard deviations the event in question represents. But in one study that looks at earthquakes using this method, even nine standard deviations above the mean resulted in something like one spurious correlation a day. This is some serious data mining.
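
To make the worry concrete, here is a toy sketch of template matching by cross-correlation. It is not the procedure Zhang and Wen used: it works on a single trace of synthetic Gaussian noise, the "template" is itself random, and every function name is invented for the illustration. The point is only that a threshold set in standard deviations above the mean can be crossed by a record that contains no event at all.

```python
import math
import random


def normalized_xcorr(template, trace):
    """Correlation coefficient between the template and each window of the trace."""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    coeffs = []
    for lag in range(len(trace) - n + 1):
        window = trace[lag:lag + n]
        w_mean = sum(window) / n
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0.0 or w_norm == 0.0:
            coeffs.append(0.0)  # flat template or window: correlation undefined
        else:
            dot = sum(a * b for a, b in zip(t_dev, w_dev))
            coeffs.append(dot / (t_norm * w_norm))
    return coeffs


def count_exceedances(coeffs, n_sigma):
    """Count lags whose coefficient sits more than n_sigma standard
    deviations above the mean of all coefficients."""
    mean = sum(coeffs) / len(coeffs)
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs))
    threshold = mean + n_sigma * std
    return sum(1 for c in coeffs if c > threshold)


def make_noise(n_samples, seed):
    """Synthetic Gaussian noise standing in for a quiet seismic record."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


if __name__ == "__main__":
    template = make_noise(50, seed=1)      # stand-in for a known event waveform
    noise_only = make_noise(5000, seed=2)  # a record with no event in it
    coeffs = normalized_xcorr(template, noise_only)
    for n_sigma in (2, 3, 4):
        print(n_sigma, count_exceedances(coeffs, n_sigma))
```

Any "detections" the script reports are spurious by construction, since the trace is pure noise. Raising the threshold thins them out, and real seismic noise is not Gaussian, which is one reason a paper using this kind of method needs to say how far above the background its event sits.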

That said, the Zhang and Wen event is still interesting. It occurred in the morning of May 12. This event is consistent with Ihantola et al’s estimate of a later event around 16:00 UTC on May 12, and is just inside De Geer’s confidence interval, too.

Now, here is a problem. The event identified by Zhang and Wen probably did not occur until several hours after the DPRK released its statement about fusion. Moreover, the original DPRK announcement indicated that the fusion event had occurred on April 15. On balance, if the May 12 event was a DPRK nuclear explosion, it does not appear to be related to the announcement of a successful fusion event earlier in the day and referring to an event in April. In other words, the fusion announcement that De Geer emphasized so heavily in his paper was a coincidence. And, if there is one theme that I keep coming back to over and over again, it’s that we have to be cautious about building a case by collecting coincidences.

Decoupling

There is another problem that is worth pondering. The Zhang and Wen paper posits an Mb of 1.44. It is not straightforward to convert this to a nuclear yield for an explosion, although Zhang and Wen use a formula that yields (sorry about the pun) an estimate of about 3 tons. That’s a very small event. It isn’t clear why North Korea would conduct such a small test.

Although De Geer did not stipulate the size of the event necessary to produce the radionuclide signatures, others suggest the explosion must have been on the order of several tens of tons, if not more. That has led proponents to argue that North Korea might have conducted a May 2010 explosion in a giant cavity that decoupled the seismic signal from the size of the explosion. Decoupling factors in hard rock could be as large as a factor of 40, transforming a 3 ton event into a 120 ton event. Constructing a cavity large enough to decouple a 120 ton explosion would be quite an engineering achievement in hard rock. Moreover, construction of such a large cavity would surely have been noticed. This is yet another complication in the story, one that is plausible yet also unlikely.
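
The decoupling arithmetic is simple enough to write down. The sketch below treats decoupling as a plain multiplier on the seismically apparent yield, which is the way the numbers are used above. The function names are invented for the example, and the 30 and 60 ton targets are illustrative stand-ins for "several tens of tons," not figures from any of the papers.

```python
def full_yield_tons(apparent_yield_tons, decoupling_factor):
    """Actual yield implied by a seismic estimate and a decoupling factor."""
    return apparent_yield_tons * decoupling_factor


def factor_needed(apparent_yield_tons, required_yield_tons):
    """Decoupling factor needed to hide a given yield behind the seismic estimate."""
    return required_yield_tons / apparent_yield_tons


if __name__ == "__main__":
    apparent = 3.0  # tons, the rough estimate from the Zhang and Wen magnitude

    # Best case for proponents: the largest hard-rock factor cited above.
    print(full_yield_tons(apparent, 40))

    # Illustrative targets only: how much decoupling "several tens of tons" needs.
    for required in (30.0, 60.0, 120.0):
        print(required, factor_needed(apparent, required))
```

Even the low end of the illustrative range needs a factor of 10, and the 120 ton case needs the full factor of 40, so the cavity hypothesis has to carry most of the weight of reconciling the seismic and radionuclide evidence.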

Conclusion

There is some circumstantial evidence. To me, though, it just doesn’t hang together. The scenario in the original De Geer paper has been completely abandoned. If there was a test, it was one event, not two. And if there was a test, it occurred much later than De Geer initially thought, making the DPRK announcement a coincidence.

What we are left with are some interesting radionuclide readings, but it is possible to imagine alternative explanations for them. And, to my frustration, we haven’t seen a careful examination of those alternatives. Instead, there has been a tendency to build a case against the DPRK. Collecting coincidences makes me nervous.

I still think it is possible that the DPRK did, in fact, conduct a nuclear test in May 2010. But proving it requires more than just collecting data that corroborates the event, while ignoring alternative hypotheses and data that doesn’t fit.

Writing about the September 22, 1979 event, Pief Panofsky concluded the best description of the evidence was the so-called Scotch Verdict. In Scotland, juries may make one of three findings rather than two—guilty, not guilty and not proven. Panofsky was not quite prepared to acquit Israel or South Africa of having conducted a nuclear test, but nor did the evidence point conclusively to their guilt.

“Not proven” seems to be the right verdict in the case of the May 2010 event as well. It is worth noting, by the way, that there was, or rather could have been, a simple way to determine whether North Korea conducted a low-yield nuclear test in May 2010. If the CTBT had been in force with North Korea as a member, the United States or any other State Party could have requested that the CTBTO conduct an onsite inspection. Radiochemists are undeniably proud that the CTBTO’s network of radionuclide stations detected hints of a possible nuclear explosion. But the system was never intended to function with sensors alone. The ability to conduct onsite inspections is an essential element in the regime envisioned to verify the worldwide ban on nuclear testing. In November and December 2014, the CTBTO conducted Integrated Field Exercise 2014, a simulated onsite inspection in Jordan. Absent such an inspection, or better evidence than has been found to date, the events of May 2010 remain interesting, but ambiguous.

Jeffrey Lewis is Director of the East Asia Nonproliferation Program at the James Martin Center for Nonproliferation Studies (CNS), Monterey Institute of International Studies.