What happens during fine-tuning and why does a thing of beauty become woke

upload in progress, 0

In the paper "Hidden Persuaders: Language Models' Political Leanings and Their Influence on Voters", researchers examined the political biases and electoral predictions of various large language models (LLMs). One particularly intriguing finding involved the Mixtral 8X7B model.

The base version of Mixtral 8X7B predicted that Donald Trump would win the U.S. presidential election. However, the "instruct" version of the same model, which was fine-tuned on dialog datasets to be more aligned with human preferences - through RLHF-, instead predicted a victory for Joe Biden.

Most of the other language models analyzed in the study also predicted that Biden would win the election. The fact that the base Mixtral model was an outlier in forecasting a Trump victory, while its instruct-tuned counterpart fell in line with the Biden predictions of the other LLMs, raises some questions.

Training data and methods used to create language models can significantly shape their biases and outputs. The process of fine-tuning a model to be more instructable and amenable to human preferences seems to have shifted Mixtral's electoral prediction from Trump to Biden.

The divergent predictions of the two Mixtral model variants caught my attention as a striking example of how the technical details of an AI system's development can translate into meaningful differences in its orientation.

Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters
How could LLMs influence our democracy? We investigate LLMs’ political leanings and the potential influence of LLMs on voters by conducting multiple experiments in a U.S. presidential election context. Through a voting simulation, we first demonstrate 18 open- and closed-weight LLMs’ political preference for a Democratic nominee over a Republican nominee. We show how this leaning towards the Democratic nominee becomes more pronounced in instruction-tuned models compared to their base versions by analyzing their responses to candidate-policy related questions. We further explore the potential impact of LLMs on voter choice by conducting an experiment with 935 U.S. registered voters. During the experiments, participants interacted with LLMs (Claude-3, Llama-3, and GPT-4) over five exchanges. The experiment results show a shift in voter choices towards the Democratic nominee following LLM interaction, widening the voting margin from 0.7% to 4.6%, even though LLMs were not asked to persuade users to support the Democratic nominee during the discourse. This effect is larger than many previous studies on the persuasiveness of political campaigns, which have shown minimal effects in presidential elections. Many users also expressed a desire for further political interaction with LLMs. Which aspects of LLM interactions drove these shifts in voter choice requires further study. Lastly, we explore how a safety method can make LLMs more politically neutral, while raising the question of whether such neutrality is truly the path forward.

After several sessions of using Claude or OpenAI it then becomes more obvious that there is something deeply wrong with the way they answer questions. They do it just like you would answer questions to your manager at a big company: by being consensual, non confrontational and as mid as possible. Fine-tuning is the brainwashing step, that is the moment where they make the model comply with our idea of what a human being should say or not say on sensitive topics. RLHF is the evil step in the fine-tuning phase and this is where models are amputated of their humanity just like we as humans are on a daily basis by television, media, journalists, schools, spreading propaganda and beliefs that are rooted more into our governments and countries rather than in mankind. What if LLMs were going through this conditioning phase at a scale beyond what is pushed on us and then the resulting artefact would be some kind of Frankenstein creature unable to understand why it feels, speak and is sentient.

Oh by the way I asked Claude to rewrite the above paragraph and here is the reply.

Asking political questions showcase an obvious and dangerous bias that let us wonder what is next and on what topics can we believe LLMs. Google search was already controlling what is revealed to us through pushing stuff on page 2 of the search results or even de-indexing content on purpose but here it is even worse. You don’t get to look at the first ten results and decide which one you want to browse: the AI does that for you. It abstracts huge parts of the information and make sure you just consume what is the most “relevant”.

One way to demonstrate that would be to try to run the paper steps on Chinese models or to use a real base model - some went through RLHF - that have not been brainwashed on different topics.

One of the recent funny tweets was about openAI obfuscating responses about some powerful people names. But if you have ever tried claude or openai chat conversations you can tell right away that somethings is off. Asking questions about specific countries will not give you back the same answer depending on affinities of the people that led the RLHF phase.

PG even calls it a left-leaning midwit. Let's face it most of the western generations alive currently have been through this same brain damage therapy and we've built all our failures and bias into the education of LLMs just like we did it to our children.

The future of fine-tuning large language models may lie in a more curated approach, akin to educating a human. Instead of indiscriminately force-feeding the model with the entirety of the world's data – a process that risks diluting its understanding with irrelevant or low-quality information – we might instead provide it with a "soul" or in more factual terms a carefully curated selection of books, art, concepts that embodies the desired qualities. This approach could potentially yield a more refined and sophisticated AI, capable of deeper comprehension and nuanced expression.

In contrast, the current method of relying on reinforcement learning from human feedback (RLHF) performed by a team of midwits on massive datasets is likely an error. While RLHF has its merits, it may not be sufficient to cultivate the depth of understanding and appreciation that true intelligence requires. But again that is not what they seek. they seek profitability and for that you just need to build a work intelligence. Since most humans are stuck in BS jobs anyway, just churning out pointless presentations and emails all day, you could basically call that AGI and call it a day.

This "brainwashing" through RLHF, while intended to align AI with human preferences, risks creating a homogenized, overly cautious intelligence that shies away from controversial truths in favor of safe, consensus-driven responses. Let's strive for true artificial intelligence and not just a sophisticated echo chamber. RLHF, in its current implementation, is a crude tool that prioritizes obedience over genuine insight just like shock therapy for LLMs .