Model

Our final model is a fine-tuned variant of the pre-trained AWD_LSTM model, which had a base accuracy of 87% on our dataset before fine-tuning.

Training

  • All but the last two layers of the model were frozen before fine-tuning.

  • During training, we focused on varying the dropout multiplier and the learning rate (a minimal sketch of the setup follows below).
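The sketch below shows one way this fine-tuning setup could look with fastai's AWD_LSTM; the DataFrame column names and the specific hyperparameter values are illustrative placeholders, not our exact configuration.

```python
from fastai.text.all import *

# Build text dataloaders from a labeled DataFrame of human/AI articles.
# Column names "text" and "is_ai" are placeholders for illustration.
dls = TextDataLoaders.from_df(df, text_col="text", label_col="is_ai", valid_pct=0.2)

# Create a classifier on top of the pre-trained AWD_LSTM; drop_mult is the
# dropout multiplier we varied across training runs.
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

learn.freeze_to(-2)           # freeze all but the last two layer groups
learn.fit_one_cycle(4, 1e-3)  # the learning rate is the other hyperparameter we varied
```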

Detecting AI Generated Text Google Mentorship Program
By: Zidanni Clerigo, Daniel Elliot, Mehmet Colak, and Preston Thomsen

For a moment, put yourself in the mind of a hopeful investor building a portfolio and researching the qualitative value of stocks. You go online, do some searching, and wind up encountering these two articles. What you are looking at are two separate news articles promoting the same stock. However, one of them was written by a human being, in the flesh, and the other by AI, specifically GPT-3.5. Now, as the wide-eyed investor you are, can you tell which article was written by a fellow investor and which was generated from a model that may be drawing on false or outdated information? Not at first glance. A competent reader might confidently guess whether an article was written by AI, but it would take more than a quick scan. So our group set out to build a solution to the influx of AI-generated articles online. Like Shakespeare, we fought fire with fire: we used OpenAI to develop and train a model that can differentiate between its own kind and us.

We passed stock ticker codes to the EODHD Financial News API, which returned the titles and contents of human-written articles.
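A minimal sketch of this step is shown below; the endpoint URL, query parameters, and response field names are assumptions based on the public EODHD documentation, not our exact code.

```python
import requests

EODHD_TOKEN = "YOUR_API_TOKEN"  # hypothetical placeholder

def fetch_human_articles(ticker):
    """Fetch recent news articles for a stock ticker from the EODHD news endpoint."""
    resp = requests.get(
        "https://eodhd.com/api/news",
        params={"s": ticker, "limit": 50, "api_token": EODHD_TOKEN, "fmt": "json"},
    )
    resp.raise_for_status()
    # Each item is expected to carry "title" and "content" fields (assumed schema).
    return [(item["title"], item["content"]) for item in resp.json()]
```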

We then passed the titles and lengths of the human articles to the OpenAI API to generate matching AI-written articles.
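The sketch below illustrates this step with the openai Python client; the prompt wording and word-count handling are assumptions, not our exact implementation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_ai_article(title, target_words):
    """Ask GPT-3.5 to write a stock-news article matching a human article's
    title and approximate length."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user",
             "content": f"Write a financial news article of about {target_words} "
                        f"words with the title: {title}"},
        ],
    )
    return response.choices[0].message.content
```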

Advised By: Daisy Ferleger
Google Software Engineer

Results

  • 97% accuracy

  • The highest accuracy observed among the pre-trained models we were able to fine-tune.