A few months ago, the concept of fake news made its way into my daily life. The term has been around for some time, but has only recently become seemingly inescapable online. Just before the Christmas break in 2016, I decided to think about what could be done to help solve this issue.
On the surface, the holy grail of fake news solvers appears to be a system that can say with absolute certainty whether a claim someone is making is demonstrably true or false. In my opinion, there are two major problems with this approach. First, I remain unconvinced that being able to cite facts, automated or not, will solve the fundamental issue we face today: there is a lot of contention about the validity of the various sources of our facts. Second, building such a solution is unlikely to be possible until we can create a system that approaches “human-level artificial intelligence”, as described in the FAQ section of the Fake News Challenge (FNC), under the heading “Do you think that automated fact checking is really possible? If yes, in the next 10 years?”
Of course, this doesn’t mean progress can’t be made in the meantime. The folks at the FNC are focusing on a stance detection approach for now, which is likely to yield great results. In the short to medium term, though, it is in the weedy grey area where practical solutions are likely to be found, and I believe this is where my solution fits.
In fact, I refuse to even label it a fake news detector. I’ve gone with the more diplomatic Article Credibility Profiler. Instead of trying to prove whether the claims in a given piece of text are demonstrably true or false, I have aimed to build a tool that helps us make more robust decisions about the credibility of an article while browsing the web. There are several signals one might use to quantify the credibility of an article, and it turns out that the stories most likely to trick us share a lot of common patterns. In taking this approach, I have tried to avoid blaming one group or another for making up lies, or for refusing to believe truths. We can instead hold all credible sources to a single, unbiased threshold.
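To illustrate what such signals might look like, here is a minimal sketch in Python. The specific signals below (an all-caps headline, stacked exclamation marks, a very thin article body) are my own hypothetical examples of common patterns, not the profiler's actual feature set:

```python
def extract_signals(headline: str, body: str) -> dict:
    """Compute a few simple, illustrative credibility signals.

    These particular signals and thresholds are examples only,
    not the ones used by the Article Credibility Profiler.
    """
    return {
        # Headlines written entirely in capitals are a common
        # pattern in low-credibility stories.
        "all_caps_headline": headline.isupper(),
        # Repeated exclamation marks suggest sensationalism.
        "excess_exclamations": headline.count("!") >= 2,
        # Very short bodies often indicate thin, recycled content.
        "thin_body": len(body.split()) < 100,
    }

signals = extract_signals("YOU WON'T BELIEVE THIS!!", "Short text.")
```

Each signal is deliberately boolean here; the point is that no single signal proves anything, but several firing at once is a useful prompt for closer reading.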
I’ve used a list that BuzzFeed compiled late last year as a baseline test. Only 45 of the original 51 stories still seem to be accessible, but this is a good number for a test. I’ve uploaded the results of running my automated solution against the URLs provided in BuzzFeed’s list here.
Here is a screenshot of the results:
I haven’t attempted to weight the signals beyond a simple flag system. Red flags are almost always indicators that an article requires further investigation, orange flags are frequently good indicators, and yellow flags are usually just hints that an article is a bit sloppy in its delivery.
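To make the flag system concrete, here is a minimal sketch of how unweighted flags might be aggregated. The names, ordering, and summary strings are my own illustration of the red/orange/yellow idea, not the profiler's actual implementation:

```python
from enum import Enum

class Flag(Enum):
    RED = "red"        # almost always warrants further investigation
    ORANGE = "orange"  # frequently a good indicator
    YELLOW = "yellow"  # usually just a hint of sloppy delivery

def summarise_flags(flags: list) -> str:
    """Summarise raised flags without assigning numeric weights.

    The most severe flag present decides the summary; counts are
    not combined into a score. This mirrors the idea of a flag
    system rather than a weighted model (illustrative only).
    """
    if Flag.RED in flags:
        return "investigate further"
    if Flag.ORANGE in flags:
        return "treat with caution"
    if Flag.YELLOW in flags:
        return "sloppy delivery"
    return "no flags raised"

print(summarise_flags([Flag.YELLOW, Flag.RED]))
```

Keeping the flags unweighted avoids baking an arbitrary scoring model into the tool; the reader still makes the final judgement.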
If anyone is trying to solve this or a similar problem, or would like to get in touch about the solution, please send me an e-mail.