In part one, I briefly mentioned that the most advanced model right now is GPT-3, which is an order of magnitude larger still.
So to recap, GPT-3 is not practical for most business users. The largest GPT-3 model, with 175 billion parameters, took 355 GPU-years to train, at a cost of $4.6m even with the lowest-priced GPU cloud on the market.
Decoding and producing text with a pre-trained GPT-3 model requires a cluster, since the model is too large to hold in memory on a single machine. I found through testing that GPT-2 was sufficient to achieve my goals and manageable on a 32-core Xeon server (CPU only).
It’s still worth keeping this step change from GPU-years to GPU-centuries in mind, for three reasons:
1. Price performance for compute and storage continues to follow a double exponential rate of improvement, as per Kurzweil’s ‘Law of Accelerating Returns’.
2. The AI economy is essentially owned and operated by a Big Tech oligarchy.
3. In practical terms, a modest-sized GPT-2 implementation works well: I was able to retune the medium-sized (355 million parameter) model on a 32-core server in around a week, as sketched below.
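To give a sense of what that retuning job involves, here is a minimal sketch using the open-source gpt_2_simple library, one common way to fine-tune the 355M checkpoint (not necessarily the exact toolchain behind the numbers above); the dataset filename, run name and step count are purely illustrative.

— code start —

# Minimal GPT-2 fine-tuning sketch using gpt_2_simple (pip install gpt-2-simple).
# The dataset path, run name and step count are illustrative assumptions.
import gpt_2_simple as gpt2

MODEL_NAME = "355M"              # the medium-sized model discussed above
DATASET = "casino_reviews.txt"   # plain-text training corpus

# Download the pre-trained checkpoint the first time this runs (roughly 1.4GB).
gpt2.download_gpt2(model_name=MODEL_NAME)

sess = gpt2.start_tf_sess()
gpt2.finetune(
    sess,
    dataset=DATASET,
    model_name=MODEL_NAME,
    steps=20000,        # on a CPU-only server this is a multi-day job
    sample_every=500,   # print a generated sample periodically to watch it adapt
    save_every=1000,    # checkpoint regularly so a long run can be resumed
    run_name="casino-355M",
)

— code end —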
I hope that’s helped reframe some of the ethical questions around this paradigm. For the remainder of this part of the article, I’ll return to the practical training process.
TRAINING CONSIDERATIONS
Even after only 24 hours of tuning on casino reviews, the sample output from each training batch could be seen to adapt quickly to the casino domain, but it would run into problems with games and software, referencing Xbox and PlayStation along with their respective titles rather than game developers and slot titles.
After 72 hours of training, most of these issues had been ironed out. However, when it came to outputting text, it would still run into the occasional loop or repetition, as illustrated in this example:

— sample start —

Restricted Countries and Territories

Players from the United States, United Kingdom, France, Denmark, Spain, Spain, Italy, Belgium, Netherlands, Hungary, Turkey, Hungary, Austria, Turkey, Hungary, Austria, Turkey, Hungary, Austria and Ukraine are not permitted to play at the Casino.

— sample end —

While occurrences of this problem largely disappeared after seven days of training, another obstacle refused to budge. Even though a fine-tuned 355 million parameter GPT-2 model can produce very coherent text, there remains a big problem: facts tend to be wrong.
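As an aside before tackling that, here is a minimal sketch of how samples like the ‘Restricted Countries’ one above are drawn from a fine-tuned checkpoint, again using gpt_2_simple with illustrative parameter values. Raising the sampling temperature and limiting each step to the top-k most likely tokens are common ways to discourage this kind of degenerate repetition, alongside simply training for longer.

— code start —

# Sampling sketch with gpt_2_simple; run name, prefix and parameters are illustrative.
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="casino-355M")  # load the fine-tuned checkpoint

text = gpt2.generate(
    sess,
    run_name="casino-355M",
    prefix="Restricted Countries and Territories",
    length=200,
    temperature=0.8,   # a little randomness helps avoid word-for-word loops
    top_k=40,          # sample only from the 40 most likely tokens at each step
    return_as_list=True,
)[0]
print(text)

— code end —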
Below is an example of a paragraph produced after 96 hours of training:

— sample start —

Live Casino Games

Players can play at (NAME REDACTED) which is an online casino where players can play slots, table games and live casino games. The live casino features live games hosted by friendly dealers which include Live Roulette, Live Blackjack and Live Baccarat.

— sample end —

The problem here, for those who are still paying attention, is that (NAME REDACTED) is a games developer rather than a casino brand. Furthermore, GPT-2 has assumed it to be a live casino brand offering live roulette, live blackjack and live baccarat.
This problem required me to rethink the training process. By training on pre-processed paragraphs with the brand names and games replaced by placeholder tags (the brand name with a <brand> tag; the live games with <game-1>, <game-2> and <game-3> tags), we get the benefit of coherent natural language, with the flexibility of reinserting the correct facts back into the text as a post-processing step. In this way the pre-processed training data appears as follows:

— sample start —

Live Casino Games

Players can play at <brand> Casino which is an online casino where players can play slots, table games and live casino games. The live casino features live games hosted by friendly dealers which include <game-1>, <game-2> and <game-3>.

— sample end —

Once tuned on training data following this pattern, we’re able to replace the tags as a post-processing step, inserting the correct brand name along with three of the live games from a database populated with accurate casino data comprising brand name, website address, customer support contact details, software providers, withdrawal and deposit methods, etc.
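Below is a minimal sketch of that post-processing step. The tag names and database fields are illustrative assumptions; the point is simply that each placeholder is swapped for a verified value from the casino database before the text is published.

— code start —

# Post-processing sketch: swap placeholder tags for verified facts.
# The tag names (<brand>, <game-1>, ...) and record fields are illustrative assumptions.
import random

casino_record = {
    "brand": "Examplewin",
    "live_games": ["Live Roulette", "Live Blackjack", "Live Baccarat", "Live Casino Hold'em"],
}

def insert_facts(generated_text: str, record: dict) -> str:
    """Replace placeholder tags in GPT-2 output with facts from the database."""
    text = generated_text.replace("<brand>", record["brand"])
    games = random.sample(record["live_games"], 3)  # pick three live games for this paragraph
    for i, game in enumerate(games, start=1):
        text = text.replace(f"<game-{i}>", game)
    return text

sample = ("Players can play at <brand> Casino which is an online casino where players can "
          "play slots, table games and live casino games. The live casino features live games "
          "hosted by friendly dealers which include <game-1>, <game-2> and <game-3>.")

print(insert_facts(sample, casino_record))

— code end —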
I’ll cover these steps in more detail in the final part, but I wanted to illustrate the flexibility of GPT-2 and some simple workarounds.
In the next issue of iGB Affiliate I’ll present the final part of this three-part series, looking at pre-processing, post-processing and some excellent learning resources I’ve found useful along the way.