Testing the PRISM program for the NSA

QualiTest can neither confirm nor deny any potential involvement we may have had with any level of testing for the PRISM program in use by the NSA.

PRISM, if you aren’t familiar with it, keeps a record of the online activities of Internet users through companies partnered with the NSA. Those companies include Facebook, Google, AOL, and pretty much every other large company you’re likely to use for anything on the Internet these days.

As said, we can’t tell you whether or not we had any part of testing PRISM. However, if we did, this is how we would have gone about it:

First of all, there’s one major misconception about the PRISM program, and that is the idea that it’s based around “data mining.” Everyone says the program was designed to mine their data; what they don’t understand is that it’s not about mining for information about your cat. The NSA already has that. If we were involved with the program (which we aren’t saying we are, but we aren’t saying we’re not), what we would actually be doing is taking the two thousand pictures of your cat and transforming them into information about whether or not your cat is a threat to US security. So “data mining” is a less accurate description of the work than “data transformation.”

For testing PRISM, one of our first concerns would be to run ETL (extract, transform, load) testing: first checking that all the data ends up in a common format, and second testing the veracity of the results themselves. Then come load testing (seeing how well the program handles crazy amounts of traffic) and performance testing (making sure it handles a realistic amount of traffic well).
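To make the two ETL checks concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the field names, the `transform` function standing in for the system under test, and the choice of ISO 8601 timestamps as the "common format" are assumptions made for illustration, not anything known about the real program.

```python
from datetime import datetime

# Hypothetical target schema: every transformed record must carry these fields.
REQUIRED_FIELDS = {"user_id", "source", "timestamp"}


def is_common_format(record: dict) -> bool:
    """Format check: all required fields present and an ISO 8601 timestamp."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    try:
        datetime.fromisoformat(record["timestamp"])
    except (TypeError, ValueError):
        return False
    return True


def transform(raw: dict) -> dict:
    """Toy transformation standing in for the system under test."""
    return {
        "user_id": str(raw["id"]).strip(),
        "source": raw["provider"].lower(),
        "timestamp": raw["seen_at"],
    }


def check_etl(raw_records: list, expected_records: list) -> list:
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    if len(raw_records) != len(expected_records):
        failures.append("row count mismatch")
    for raw, expected in zip(raw_records, expected_records):
        actual = transform(raw)
        if not is_common_format(actual):
            failures.append(f"{raw['id']}: bad format")
        elif actual != expected:
            # Veracity check: compare against hand-verified expected output.
            failures.append(f"{raw['id']}: wrong values")
    return failures
```

The expected records are written by hand (or produced by an independent reference), so the veracity check is not simply the transformation agreeing with itself.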

After testing the data transformation system itself, we would have moved on to security testing. The two main tenets here are testing the external security (encryption to secure the program against outside threats) and internal security (ensuring that those within the company can’t export the data or, in some cases, even see it). With PRISM, we would also ensure that the data is encrypted for various degrees of access: what should analysts be able to see? For example, PRISM blocks all data belonging to American citizens (unless there is a warrant for it). However, accidental exposure is still possible through, for example, the logging mechanism in use; if the program keeps a log saying that it scanned the AOL account of John Smith from Hoboken, New Jersey, and this log contains all of his emails and personal information, it would be easy for an analyst to accidentally stumble upon this information and compromise Mr. Smith’s privacy. Because of how sensitive this information is, it would also be important for us to rigorously test such security standbys as logging the analyst out after X minutes of inactivity.
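The logging concern above lends itself to an automated check. The sketch below scans log lines for email addresses and for a list of protected values; the log format, the function name `find_log_leaks`, and the deliberately simple email pattern are all assumptions for illustration.

```python
import re

# Deliberately simple pattern; a real test suite would use broader detectors.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_log_leaks(log_lines, protected_values):
    """Return (line_number, reason) for each log line that exposes personal data."""
    leaks = []
    for number, line in enumerate(log_lines, start=1):
        if EMAIL_PATTERN.search(line):
            leaks.append((number, "email address"))
        for value in protected_values:
            # Case-insensitive match against known protected strings.
            if value.lower() in line.lower():
                leaks.append((number, f"protected value: {value}"))
    return leaks
```

A log that records only an opaque account identifier passes; one that names John Smith or includes his address is reported with the offending line number.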

Next, to keep false positives in check and identify unexpected patterns, we would need to apply the algorithms provided for us by the developers and engage in functional testing. For example, it wouldn’t be enough to just assume that everyone who says they are from the U.S. is actually an American citizen, so an algorithm determining “American-ness” would be necessary. This would scan a person’s profiles, contacts, and correspondence automatically, examining such variables as use of foreign language versus English vernacular, use of flagged terms, their IP address as well as those of their contacts, their search history, even the things they complain about, and judge the likelihood of non-American-ness from these results. The only time an actual person would need to see any American’s data is when someone has to go into the database manually to flag as American a person whom the algorithm mistakenly labeled as foreign.

To test this, testers would feed the algorithm carefully engineered profiles to verify that the algorithm flags the people it should and doesn’t flag the people it shouldn’t. The process begins with system architects creating a profile that perfectly matches the variables for an American citizen. Then, one by one, they start adding in variables that would lead the algorithm to doubt the profile’s American-ness: the American starts sharing news stories about foreign events, like the Tour de France. Then, French lingo starts popping up in his status updates. A bunch of French profiles are added to his contacts lists. With every new change, we test and see whether the algorithm flags the profile as American or foreign, and keep track of how many changes it takes for the profile to be flagged.

Another aspect of functional testing would examine the way analysts would use the program itself: does the documentation the developers provided for them clearly outline everything they must use the software for? It would also look at the reports the analysts would be expected to make and test that software as well, if it was not part of the original program. We wouldn’t be interested in how the reports are generated, exactly; just in ensuring that whatever process the developers choose works well.
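The incremental profile test described above can be sketched as a small harness. The classifier here is a toy stand-in with an invented threshold, and the profile fields are hypothetical; the only part meant to carry over is the harness, which applies one change at a time and reports how many changes it took before the profile was flagged.

```python
FOREIGN_THRESHOLD = 2  # hypothetical: number of foreign signals needed to flag


def toy_classifier(profile: dict) -> str:
    """Toy stand-in for the algorithm under test; counts foreign signals."""
    signals = sum([
        profile["shares_foreign_news"],
        profile["uses_foreign_language"],
        profile["foreign_contacts"] >= 5,
    ])
    return "foreign" if signals >= FOREIGN_THRESHOLD else "american"


def changes_until_flagged(baseline: dict, mutations: list, classify):
    """Apply mutations one at a time; return how many it took to be flagged.

    Returns None if the profile is never flagged as foreign.
    """
    profile = dict(baseline)  # copy so the baseline stays untouched
    if classify(profile) != "american":
        raise ValueError("baseline profile must classify as american")
    for count, mutation in enumerate(mutations, start=1):
        profile.update(mutation)
        if classify(profile) == "foreign":
            return count
    return None
```

Recording the count for each ordering of the mutations gives testers a way to spot variables that tip the result too early or too late.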

In the same vein as functional testing is usability testing. While a large part of this tends to be testing the user interface (UI) and ensuring that everything works the way it’s supposed to, there are other facets which are necessary to examine, especially in the case of PRISM. The “flow” or process of actually using the program is a big part of usability testing which is not technically included in the UI but is still incredibly important. For example, if the program’s flow is not checked for redundancy, an analyst could have to manually select each profile they would like to check, as well as each variable in the American-ness algorithm for each profile, and they would have to do this for every one of the 1.11 billion users of Facebook.

Finally, we would have to enact thorough disaster recovery testing to make sure that some common hardware failures can be identified. We would have to make sure that the database does not get corrupted in the event of a power outage, and that a test is not registered as completed if a LAN cable splits partway through a long run. It’s important to make sure that it’s possible to restore corrupted data should this happen. Imagine if there were no hardware testing for PRISM: the power goes out during a test, and everyone in the world is labeled a Luxembourgian terrorist. If there is nothing in place to stop it, the program may read incomplete data as insufficient proof of an individual’s innocence and mislabel them in a way which could have disastrous consequences. Worse, if there were no disaster planning, response, and recovery in place, this same power outage could result in a cascading failure from which the system never comes back online.
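One way to test the "interrupted run must not leave partial results" requirement is to simulate the failure inside a transaction and confirm that nothing was written. The sketch below uses Python's built-in `sqlite3` module purely as an illustration; the table layout, the `SimulatedOutage` exception, and the `fail_after` parameter are hypothetical, and a raised exception is only a rough stand-in for a real power loss.

```python
import sqlite3


class SimulatedOutage(Exception):
    """Stands in for a power cut or a severed cable mid-run."""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical labels table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE labels (user_id TEXT PRIMARY KEY, label TEXT)")
    return conn


def label_batch(conn, labels, fail_after=None):
    """Write all labels in one transaction, or none if the batch is interrupted."""
    with conn:  # commits on success, rolls back if an exception escapes
        for index, (user_id, label) in enumerate(labels):
            if fail_after is not None and index == fail_after:
                raise SimulatedOutage("power lost mid-batch")
            conn.execute(
                "INSERT INTO labels (user_id, label) VALUES (?, ?)",
                (user_id, label),
            )


def count_labels(conn) -> int:
    """Number of labels currently persisted."""
    return conn.execute("SELECT COUNT(*) FROM labels").fetchone()[0]
```

A disaster recovery test would interrupt the batch partway, assert that the label count is still zero, then rerun the same batch and assert that every label is present.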

In conclusion, the bugs which could have been identified by performing testing on software like PRISM aren’t any we’d want to deal with. That said, it’s certainly fun to try to extrapolate what they might have been, and what we could have done to find them if we had been on the team.