Introduction
In the world of large language models (LLMs), the cost of computation can be a significant barrier, especially for large-scale projects. I recently embarked on a project that required running 4,000,000 prompts with an average input length of 1,000 tokens and an average output length of 200 tokens. That works out to 4,000,000 × (1,000 + 200) = 4.8 billion, nearly 5 billion tokens! The conventional approach of paying per token, as is common with models like GPT-3.5 and GPT-4, would have resulted in a hefty bill. However, I discovered that by leveraging open source LLMs, I could shift the pricing model to paying per hour of compute time, leading to substantial savings. This article details the approaches I took and compares and contrasts each of them. Please note that while I share my experience with pricing, prices are subject to change and may vary depending on your region and specific circumstances. The key takeaway is the potential cost savings of leveraging open source LLMs and renting a GPU by the hour, rather than the specific prices quoted. If you plan on using my recommended solutions in your project, I have left a couple of affiliate links at the end of this article.
ChatGPT API
I conducted an initial test using GPT-3.5 and GPT-4 on a small subset of my prompt input data. Both models demonstrated commendable performance, but GPT-4 consistently outperformed GPT-3.5 in the majority of cases. To give you a sense of the cost, running all 4 million prompts through the OpenAI API would look something like this:
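As a rough sketch of that calculation, the snippet below multiplies the total input and output token counts by per-1,000-token rates. The rates are my assumption of OpenAI's published pricing at the time (GPT-3.5 Turbo at $0.0015/$0.002 and GPT-4 at $0.03/$0.06 per 1K input/output tokens); pricing has since changed, so treat the figures as illustrative:

```python
# Back-of-the-envelope OpenAI API cost estimate.
# Per-1K-token rates below are ASSUMED historical prices; check current
# pricing before relying on these numbers.

NUM_PROMPTS = 4_000_000
AVG_INPUT_TOKENS = 1_000
AVG_OUTPUT_TOKENS = 200

# model -> (input $/1K tokens, output $/1K tokens), assumed rates
PRICING = {
    "gpt-3.5-turbo": (0.0015, 0.002),
    "gpt-4": (0.03, 0.06),
}

for model, (in_rate, out_rate) in PRICING.items():
    input_cost = NUM_PROMPTS * AVG_INPUT_TOKENS / 1_000 * in_rate
    output_cost = NUM_PROMPTS * AVG_OUTPUT_TOKENS / 1_000 * out_rate
    print(f"{model}: ${input_cost + output_cost:,.0f}")

# gpt-3.5-turbo: $7,600
# gpt-4: $168,000
```

Under those assumed rates, GPT-3.5 Turbo lands at roughly the $7,600 figure discussed below, while GPT-4 would have cost more than twenty times as much.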
While GPT-4 did offer some performance benefits, its cost was disproportionately high compared to the incremental quality it added to my outputs. Conversely, GPT-3.5 Turbo, although far more affordable, fell short on performance, making noticeable errors on 2–3% of my prompt inputs. Given these factors, I wasn't prepared to invest $7,600 in a project that was…