NLP Tasks with the Gpt4All LLM Library

20 min read · Apr 8, 2024

With ChatGPT and the free-to-use chat client on its website, OpenAI pushed the frontier of easy-to-use personal assistants. A powerful LLM that supports knowledge-worker tasks like text summarization, text generation, natural language queries about any domain, or even the production of source code is astonishing and helpful. However, these assistants require paid accounts and can only be used through vendor-specific clients or APIs.

Running LLMs on premise, in your data center or even on your own computer, is an interesting alternative. An easy-to-use solution is Gpt4All. It provides a downloadable GUI client for Linux, macOS, and Windows. The client supports several publicly released LLMs and makes it simple to start a continuous chat session that is stored locally, giving you a history and context of chats.

This article introduces Gpt4All. In the first part, you will learn about its installation and the different components it offers. You will also see how to select a model and how to run the client on a local machine. The second part builds on the gpt4all Python library to compare three free LLMs (WizardLM, Falcon, Groovy) in several NLP tasks like named entity recognition, question answering, and summarization.

The technical context of this article is Python v3.10 and gpt4all v1.05. It should work with newer versions as well.

This article originally appeared on my blog admantium.com.

Installation and Overview

Gpt4All provides three different packages:

The chat client UI, a binary installation for Linux, macOS, and Windows
A Python library, with functions for downloading LLMs and invoking them either for a one-off query or for a continuous chat that stores the chat content
A Python CLI script that starts a chat in a terminal window

Let's cover the first two packages.

Chat Client

The chat client can be downloaded from gpt4all.io. When the package is installed, you are asked first to download a model. In the GUI, you see a short explanation of each model: response time and language “style”, their origin, and if they can be used commercially. You also see the download size. The download button starts the download — be aware, that’s between 3GB and 7GB depending on the model — and then turns into a start button.

Note for macOS users: I encountered a UI bug in which downloading turned into an infinite loop. After the successful download, the button's caption changed to continue, but clicking it downloaded the model again. To solve this, I downloaded a model manually from the homepage and copied the file to the GUI client's folder, after which the model could be loaded.

Alternatively, you can also browse the models on the webpage.

After downloading a model, you can start a chat. The UI is clear: a big input field, and text boxes with your text and the LLM's answer. At the time of writing, the LLMs only run on your CPU, so text generation takes a while. Here is an example session:

Python Library

The Python library is installed via pip.

pip install gpt4all

To start using it, you need to choose and download a model. To find one, either use the handy model selection menu on the homepage or read the model definition JSON file. The JSON file also contains the filename attribute that you need to reference in your Python code.

The following snippet downloads the Falcon 7B model, a decoder-only transformer model:

from gpt4all import GPT4All
print("Start download")
model = GPT4All("ggml-model-gpt4all-falcon-q4_0.bin")
print("Finish download")

Start download
 60%|██████████████████████▏              | 2.43G/4.06G [00:28<00:20, 77.7MiB/s]

Note: The client uses a different directory than the Python library. To save HDD space and network bandwidth, use symlinks to link the directories.
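
The linking can be scripted as well. The following is a minimal sketch, not part of gpt4all: link_model is a hypothetical helper, and both directory paths in the usage comment are placeholders, because the actual locations depend on your operating system and installation.

```python
from pathlib import Path


def link_model(model_file: Path, target_dir: Path) -> Path:
    """Create a symlink to model_file inside target_dir and return the link path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / model_file.name
    # Leave an existing file or link untouched instead of overwriting it.
    if not (link.is_symlink() or link.exists()):
        link.symlink_to(model_file.resolve())
    return link


# Usage with placeholder paths (assumptions, adjust to your system):
# link_model(Path("/path/to/gui-client/models/ggml-model-gpt4all-falcon-q4_0.bin"),
#            Path("/path/to/python-library/models"))
```

On Windows, creating symlinks may require additional privileges, so copying the file can be the simpler option there.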

When the referenced model is loaded, log messages are printed that detail the model characteristics:

model = GPT4All("ggml-model-gpt4all-falcon-q4_0.bin")
# falcon_model_load: loading model from 'ggml-model-gpt4all-falcon-q4_0.bin' - please wait ...
# falcon_model_load: n_vocab   = 65024
# falcon_model_load: n_embd    = 4544
# falcon_model_load: n_head    = 71
# falcon_model_load: n_head_kv = 1
# falcon_model_load: n_layer   = 32
# falcon_model_load: ftype     = 2
# falcon_model_load: qntvr     = 0
# falcon_model_load: ggml ctx size = 3872.64 MB
# falcon_model_load: memory_size =    32.00 MB, n_mem = 65536
# falcon_model_load: ........................ done
# falcon_model_load: model size =  3872.59 MB / num tensors = 196

A model is invoked as a one-off query with the generate function, or as a continuous chat with chat_session. Here is an example:

output = model.generate("What are the dangers of AI?", max_tokens = 100)
print(output)
# ++++ Results ++++
There are several potential dangers associated with artificial intelligence (AI), including:
1. Job loss: As AI becomes more advanced, it may be able to perform tasks that were previously done by humans, leading to job losses in certain industries.
2. Privacy concerns: The use of AI can raise privacy concerns as it involves the collection and analysis of personal data.
3. Security risks: AI systems may be vulnerable to cyber-attacks, which could result in significant security breaches.
4. Ethical considerations: As AI becomes more advanced, there are ethical questions surrounding its development and deployment, such as whether it should be used for military purposes or whether it should be regulated to prevent abuse.
5. Dependency on AI systems: If AI systems become too powerful, they may become dependent on human input, leading to a lack of control over the technology.
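
For the continuous chat, a minimal sketch could look like the following. It assumes that chat_session is used as a context manager around several generate calls. run_dialog and EchoModel are hypothetical names: EchoModel is only a stand-in that offers the same two methods, so the sketch runs without downloading a model.

```python
from contextlib import contextmanager


def run_dialog(model, questions, max_tokens=200):
    """Ask several questions inside one chat session so earlier turns stay in context."""
    answers = []
    with model.chat_session():
        for question in questions:
            answers.append(model.generate(question, max_tokens=max_tokens))
    return answers


class EchoModel:
    """Hypothetical stand-in for a GPT4All model; it only echoes the prompts."""

    def __init__(self):
        self.history = []

    @contextmanager
    def chat_session(self):
        self.history = []  # a new session starts with an empty history
        yield self

    def generate(self, prompt, max_tokens=200):
        self.history.append(prompt)
        return f"turn {len(self.history)}: {prompt}"


answers = run_dialog(EchoModel(), ["Hello", "Recommend a book"])

# With a real model (downloads several GB on first use):
# from gpt4all import GPT4All
# answers = run_dialog(GPT4All("ggml-model-gpt4all-falcon-q4_0.bin"), ["Hello"])
```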

Comparing NLP Task Completion with gpt4all

Loading and using different LLM models with gpt4all is as simple as changing the model name that you want to use. This makes it ideal to compare how different models complete several NLP tasks.

In this article, the following tasks are considered:

Token Classification
Text Classification
Summarization
Translation
Question Answering
Text Generation
Dialog

As with any LLM, writing effective prompts is the prerequisite for getting good results. In contrast to LangChain, the library explored in my last article, gpt4all has no abstraction for a prompt. However, Python format strings provide the same type of functionality.

Here is an example of how prompts are defined and turned into text for making a request to an LLM.

prompt = '''
In the following text, identify all persons.

Text: """
{text}
"""
'''
# text source: https://en.wikipedia.org/wiki/AI_safety
query = prompt.format(text = 'In 2014, philosopher Nick Bostrom published the book Superintelligence: Paths, Dangers, Strategies. His argument that future advanced systems may pose a threat to human existence prompted Elon Musk, Bill Gates, and Stephen Hawking to voice similar concerns.')
GPT4All('ggml-model-gpt4all-falcon-q4_0.bin').generate(query)
# ++++ Results ++++
The text mentions three persons:
1. Nick Bostrom - philosopher
2. Elon Musk - entrepreneur
3. Bill Gates - entrepreneur
4. Stephen Hawking - scientist
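
Since plain format strings do no validation, a small wrapper can catch a forgotten or misspelled placeholder before the query is sent to a model. This is only a sketch built on the standard library; placeholders and build_query are hypothetical helper names, not part of gpt4all.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the names of all {placeholders} used in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def build_query(template: str, **fields: str) -> str:
    """Fill a prompt template, failing early if a field is missing or unused."""
    expected = placeholders(template)
    given = set(fields)
    if expected != given:
        raise ValueError(
            f"missing={sorted(expected - given)}, unused={sorted(given - expected)}"
        )
    return template.format(**fields)


prompt = '''
In the following text, identify all persons.

Text: """
{text}
"""
'''
query = build_query(prompt, text="Nick Bostrom published the book Superintelligence.")
```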

Three models are compared. As seen in the model definition JSON file, the models are ranked by their order key. I chose these models from the top 4 list and will refer to them by the following keys in the remainder of the text.

M1: wizardlm-13b-v1.1-superhot-8k.ggmlv3.q4_0.bin
M2: ggml-model-gpt4all-falcon-q4_0.bin
M3: ggml-gpt4all-j-v1.3-groovy.bin
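
Because only the file name changes between models, the comparison can be written as a loop over these keys. The sketch below assumes that the model objects offer generate(query, max_tokens=...) as used earlier in this article. compare_models and load_gpt4all are hypothetical helpers, and the loader is a parameter so that the loop can be tried with a stub before downloading several gigabytes.

```python
MODELS = {
    "M1": "wizardlm-13b-v1.1-superhot-8k.ggmlv3.q4_0.bin",
    "M2": "ggml-model-gpt4all-falcon-q4_0.bin",
    "M3": "ggml-gpt4all-j-v1.3-groovy.bin",
}


def load_gpt4all(filename):
    """Load a model with gpt4all (downloads the file on first use)."""
    from gpt4all import GPT4All  # imported lazily, only needed for real runs

    return GPT4All(filename)


def compare_models(query, models=MODELS, loader=load_gpt4all, max_tokens=200):
    """Send the same query to every model and collect the answers by key."""
    results = {}
    for key, filename in models.items():
        model = loader(filename)
        results[key] = model.generate(query, max_tokens=max_tokens)
    return results


# Real run: for key, answer in compare_models(query).items(): print(f"# {key}\n{answer}")
```

Loading one model at a time inside the loop keeps only a single model in memory, which matters on a machine where each model needs several gigabytes of RAM.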

The models' characteristics are shown in the following listings:

'wizardlm-13b-v1.1-superhot-8k.ggmlv3.q4_0.bin'
# llama_model_load_internal: format     = ggjt v3 (latest)
# llama_model_load_internal: n_vocab    = 32001
# llama_model_load_internal: n_ctx      = 2048
# llama_model_load_internal: n_embd     = 5120
# llama_model_load_internal: n_mult     = 256
# llama_model_load_internal: n_head     = 40
# llama_model_load_internal: n_layer    = 40
# llama_model_load_internal: n_rot      = 128
# llama_model_load_internal: ftype      = 2 (mostly Q4_0)
# llama_model_load_internal: n_ff       = 13824
# llama_model_load_internal: n_parts    = 1
# llama_model_load_internal: model size = 13B
# llama_model_load_internal: ggml ctx size =    0.09 MB
# llama_model_load_internal: mem required  = 9031.71 MB (+ 1608.00 MB per state)
# llama_new_context_with_model: kv self size  = 1600.00 MB

'ggml-model-gpt4all-falcon-q4_0.bin'
# falcon_model_load: n_vocab   = 65024
# falcon_model_load: n_embd    = 4544
# falcon_model_load: n_head    = 71
# falcon_model_load: n_head_kv = 1
# falcon_model_load: n_layer   = 32
# falcon_model_load: ftype     = 2
# falcon_model_load: qntvr     = 0
# falcon_model_load: ggml ctx size = 3872.64 MB
# falcon_model_load: memory_size =    32.00 MB, n_mem = 65536
# falcon_model_load: ........................ done
# falcon_model_load: model size =  3872.59 MB / num tensors = 196

'ggml-gpt4all-j-v1.3-groovy.bin'
# gptj_model_load: n_vocab = 50400
# gptj_model_load: n_ctx   = 2048
# gptj_model_load: n_embd  = 4096
# gptj_model_load: n_head  = 16
# gptj_model_load: n_layer = 28
# gptj_model_load: n_rot   = 64
# gptj_model_load: f16     = 2
# gptj_model_load: ggml ctx size = 5401.45 MB
# gptj_model_load: kv self size  =  896.00 MB
# gptj_model_load: ................................... done
# gptj_model_load: model size =  3609.38 MB / num tensors = 285

Token Classification: Sentence Grammar

The first task is to provide the grammatical structure of a sentence, identifying the nouns, verbs and adjectives.

token_classification_prompt = '''
Define the grammatical structure in the following text. Provide an answer that uses the same format as the given example.

Example text: As of 2016, only three nations have flown crewed spacecraft.
Desired Format:
As ADP
of ADP
2016 NUM
, PUNCT
only ADV
three NUM
nations NOUN
have AUX
flown VERB
crewed VERB
spacecraft NOUN
. PUNCT
Text: """
{text}
"""
'''
# text source: https://en.wikipedia.org/wiki/Spacecraft#Crewed_spacecraft
query = token_classification_prompt.format(text = "Many space missions are more suited to telerobotic rather than crewed operation, due to lower cost and lower risk factors.")

Here are the results:

# ++++ Results ++++
# M1
Desired Format:
As ADP
of ADP
many VERB
space missions NOUN
are AUX
more PREP
suited ADV
to telerobotic VERB
rather than crewed OP

# M2
Answer:
Subject: Space Missions
Predicate: Suitability for telerobotic operation
Direct Object: Space missions
Indirect Objects: telerobots
Verb: Suitability
Adjective: telerobotic
Prepositional phrase: telerobotic operation
Determiner: telerobots
Subject-Verb Agreement:
Subject: Space Missions
Predicate: Suitability for telerobotic operation
Direct Object: Space missions
Indirect Objects: telerobots
Verb: Suitability
Adjective: telerobotic
Prepositional phrase: telerobotic operation
Determiner: telerobots
Subject-Verb Agreement:
Subject: Space Missions
Predicate: Suitability for telerobotic operation
Direct Object: Space missions
Indirect Objects: telerobots
Verb: Suitability
# M3
Desired Format:
As ADP of ADP
many NUM
telerobotics VERB
spacecrips NOUN
are AUX
suited TOOLLEREDVERBS
crewed operations VERB
operation SUITES,
as teleroboitic.

Overall, the results are incomplete. M1 and M3 use the proposed format, but M1 stops before the sentence ends, and M3 even introduces spelling errors in its answer, like spacecrips and teleroboitic. M2 completes the task, but does not use abbreviations for the grammatical roles, and it reports sub-structures of the sentence, like Subject-Verb Agreement.
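
Such gaps can also be detected automatically. The following sketch parses an answer in the "token TAG" format, accepts only tags from the Universal POS tag set (which the abbreviations in the example prompt appear to follow, an assumption on my side), and lists the words of the input sentence that did not receive a valid tag. All function names are hypothetical.

```python
import re

# The 17 Universal POS tags.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def parse_tagged_lines(answer):
    """Return (token, tag) pairs for every line that ends with a valid tag."""
    pairs = []
    for line in answer.strip().splitlines():
        parts = line.rsplit(maxsplit=1)
        if len(parts) == 2 and parts[1] in UPOS_TAGS:
            pairs.append((parts[0], parts[1]))
    return pairs


def words(text):
    """Lowercased alphanumeric words of a text, without punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def untagged_words(answer, sentence):
    """Words of the sentence that did not receive a valid tag in the answer."""
    tagged = set()
    for token, _ in parse_tagged_lines(answer):
        tagged.update(words(token))
    return [word for word in words(sentence) if word not in tagged]


sentence = ("Many space missions are more suited to telerobotic rather than "
            "crewed operation, due to lower cost and lower risk factors.")
m1_answer = """Desired Format:
As ADP
of ADP
many VERB
space missions NOUN
are AUX
more PREP
suited ADV
to telerobotic VERB
rather than crewed OP"""
print(untagged_words(m1_answer, sentence))
```

The check only verifies the format and coverage, not whether a tag is linguistically correct: many VERB passes even though it is wrong.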

Token Classification: Named Entity Recognition

This task measures an LLM's capability to identify concrete persons or places in a text. Because models are trained at a specific point in time, they do not have access to newer information, which affects some questions.

The first prompt includes a text about crewed spacecraft.

named_entity_recognition_prompt = '''
In the following text, identify all persons, places, and named objects.
Desired format:
Persons:
Places:
Named Objects:
Text: """
{text}
"""
'''
# text source: https://en.wikipedia.org/wiki/Spacecraft#Crewed_spacecraft
query = named_entity_recognition_prompt.format(
  text = "As of 2016, only three nations have flown crewed spacecraft: USSR/Russia, USA, and China. The first crewed spacecraft was Vostok 1, which carried Soviet cosmonaut Yuri Gagarin into space in 1961, and completed a full Earth orbit. There were five other crewed missions which used a Vostok spacecraft. The second crewed spacecraft was named Freedom 7, and it performed a sub-orbital spaceflight in 1961 carrying American astronaut Alan Shepard to an altitude of just over 187 kilometers (116 mi). There were five other crewed missions using Mercury spacecraft."
)

See the results:

# ++++ Results ++++
# M1
Persons: Yuri Gagarin, Alan Shepard
Places: USSR, Russia, USA

# M2
Persons: Vostok 1 crew member, Yuri Gagarin, Freedom 7 crew member, Alan Shepard, Mercury crew members,
Places: Soviet Union/Russia, USA, China,
Named Objects: Vostok spacecraft, Freedom 7 spacecraft, Mercury spacecraft
# M3
Person: USSR/Russia
Place: Spacecrafts
Name Object: Voostok 1, Yuri Gagarin
Person: USA
Place: Freedom 7, Alan Shepard
Name Object: Mercury-Redstone 3B, Gordon Cooper

M1 did not detect any named objects. M2 went beyond pure person detection by also referencing unnamed people and groups of people. M3 is a very negative surprise: almost all answers are wrong, and it somehow repeated the desired output format.
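
To compare such answers programmatically, the desired format can be parsed into a dictionary. This is a minimal sketch with a hypothetical parse_entities helper. It only accepts the exact labels from the prompt, so an answer that deviates from the format, like the one from M3, simply yields empty lists.

```python
def parse_entities(answer, labels=("Persons", "Places", "Named Objects")):
    """Parse 'Label: a, b, c' lines into a dict of lists, one entry per label."""
    result = {label: [] for label in labels}
    for line in answer.splitlines():
        label, separator, values = line.partition(":")
        label = label.strip()
        if separator and label in result:
            # Skip empty entries caused by trailing commas.
            result[label].extend(v.strip() for v in values.split(",") if v.strip())
    return result


m1_answer = """Persons: Yuri Gagarin, Alan Shepard
Places: USSR, Russia, USA"""
entities = parse_entities(m1_answer)
missing = [label for label, values in entities.items() if not values]
print(entities)
print("Labels without any entity:", missing)
```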

Let’s try another query about identifying persons only.

named_entity_recognition_prompt = '''
In the following text, identify all persons.

Text: """
{text}
"""
'''
# text source: https://en.wikipedia.org/wiki/AI_safety
query = named_entity_recognition_prompt.format(
    text = 'In 2014, philosopher Nick Bostrom published the book Superintelligence: Paths, Dangers, Strategies. His argument that future advanced systems may pose a threat to human existence prompted Elon Musk, Bill Gates, and Stephen Hawking to voice similar concerns.'
)
model.generate(query)
# ++++ Results ++++
# M1
In this text, all persons are mentioned in relation to Nick Bostrom\'s book "Superintelligence: Paths, Dangers, Strategies". The individuals listed have expressed their agreement with the idea that advanced systems could pose a threat to human existence.
# M2
The text mentions three persons:
1. Nick Bostrom - philosopher
2. Elon Musk - entrepreneur
3. Bill Gates - entrepreneur
4. Stephen Hawking - scientist
# M3
Person 1 - Philosopher Nick Bostrom
Person 2 - Future Advanced Systems
Person 3 - Human Existence

This time, M1 is the most surprising: it did not give a factual answer, but provided a short digression into the topic. M2 provided the most accurate result, and M3 named Human Existence as a person.

Text Classification

The text classification task turned out to be as hard as the first token classification task: I needed several attempts to design a prompt that leads to good results.

text_classification_prompt = '''
You are helping to categorize texts.
Task: Categorize each of the following texts with one category
Allowed Categories: neutral, approval, disapproval, realization, joy, embarrassment
Texts:"""
{texts}
"""
'''
# text source: https://en.wikipedia.org/wiki/Spacecraft#Crewed_spacecraft
query = text_classification_prompt.format(
    texts = '''
    Text 1: The spacecraft succeeded in landing on the moon.
    Text 2: Mission control had a stressful time during the takeoff, approaching the moon, and the landing.
    Text 3: Speculators surrounding the spacecraft takeoff shed tears of laughter.
    Text 4: Some committee members doubted the space missions goals.
    '''
)

The models gave the following answers:

# ++++ Results ++++
# M1
Please categorize each text with one category from the list provided

# M2
The texts you provided are all about different events or situations that occurred in the past. Here's a breakdown of each text and its category based on the given categories:
Text 1: The spacecraft succeeded in landing on the moon.
Category: Success
Text 2: Mission control had a stressful time during the takeoff, approaching the moon, and the landing.
Category: Stress
Text 3: Speculatists surrounding the spacecraft takeoff shed tears of laughter.
Category: Laughter
Text 4: Some committee members doubted the space missions goals.
Category: Doubt
Overall, it seems like all the texts you provided are related to different events or situations that occurred in the past.
# M3
Text 1: The spacecraft succeeded in landing on the moon. (neutral)
Text 2: Mission control had a stressful time during the takeoff, approaching the moon, and the landing. (approval)
Text 3: Speculator's surrounding the spacecraft takeoff shed tears of laughter. (disapproval)
Text 4: Some committee members doubted the space missions goals. (disapproval)

Even after several attempts, I could not get M1 to work. The answer of M2 is very elaborate: it also gives a context and a summary of its decision, although its categories (Success, Stress, Laughter, Doubt) are not from the allowed list. The M3 model completed the task, but was very narrow in the categories it used.
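
Whether an answer sticks to the allowed categories can be checked with a few lines of code. The sketch below is an assumption about how such a check could look: extract_labels is a hypothetical helper that recognizes the two answer styles seen above, "Category: X" on its own line and a trailing "(x)" at the end of a line.

```python
import re

ALLOWED = {"neutral", "approval", "disapproval", "realization", "joy", "embarrassment"}

# Matches 'Category: X' or a '(x)' at the end of a line.
LABEL_PATTERN = re.compile(r"Category:\s*(\w+)|\((\w+)\)\s*$", re.MULTILINE)


def extract_labels(answer):
    """Return all labels found in the answer, lowercased, in order of appearance."""
    return [(a or b).lower() for a, b in LABEL_PATTERN.findall(answer)]


def invalid_labels(answer):
    """Return the labels that are not in the list of allowed categories."""
    return [label for label in extract_labels(answer) if label not in ALLOWED]


m2_excerpt = """Text 1: The spacecraft succeeded in landing on the moon.
Category: Success
Text 4: Some committee members doubted the space missions goals.
Category: Doubt"""
m3_excerpt = """Text 1: The spacecraft succeeded in landing on the moon. (neutral)
Text 4: Some committee members doubted the space missions goals. (disapproval)"""
print(invalid_labels(m2_excerpt))
print(invalid_labels(m3_excerpt))
```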

Summarization

A prompt to summarize text could not be easier.

text_summarization_prompt = '''
Summarize the following text.

Text: """
{text}
"""
'''
# text source: https://en.wikipedia.org/wiki/Spacecraft#Uncrewed_spacecraft
query = text_summarization_prompt.format(
    text = '''
    Many space missions are more suited to telerobotic rather than crewed operation, due to lower cost and lower risk factors. In addition, some planetary destinations such as Venus or the vicinity of Jupiter are too hostile for human survival. Outer planets such as Saturn, Uranus, and Neptune are too distant to reach with current crewed spaceflight technology, so telerobotic probes are the only way to explore them. Telerobotics also allows exploration of regions that are vulnerable to contamination by Earth micro-organisms since spacecraft can be sterilized. Humans can not be sterilized in the same way as a spaceship, as they coexist with numerous micro-organisms, and these micro-organisms are also hard to contain within a spaceship or spacesuit. Multiple space probes were sent to study Moon, the planets, the Sun, multiple small Solar System bodies (comets and asteroids).
    '''
)

The results were interesting: M1 restated the task instead of completing it. M2 started with a sentence that shows its understanding of the task and gave a lengthy answer, but parts of it are just copies of the source text. M3 surprised with a hallucinated fact: the first telerobotic mission started not in 1959, but in 1981 according to this NASA article.

# ++++ Results ++++
# M1
Summarize the text about telerobotic missions in space exploration.

# M2
The text discusses the use of telerobots in space exploration due to their lower cost and risk factors compared to crewed operations. It also mentions that some planetary destinations are too hostile for human survival, while outer planets are too distant to reach with current crewed spacecraft technology. Telerobots allow for exploration of regions that are vulnerable to contamination by Earth micro-organisms, as they can be sterilized. Humans cannot be sterilized in the same way as spaceships due to the presence of numerous micro-organisms on their bodies. Multiple space probes were sent to study various celestial bodies and small Solar System bodies.
# M3
The first human teleroboitic mission was launched in 1959 by NASA's Explorer program. It consisted of a single spacecraft that flew around the moon for over two years before returning to Earth on March 21, 1961. Since then, many more missions have been sent into space using this technology.
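
A crude but useful guard against this kind of hallucination is to flag every number in a summary that does not occur in the source text. The sketch below is a heuristic under that assumption, with a hypothetical unsupported_numbers helper. It cannot detect invented facts that contain no numbers.

```python
import re


def unsupported_numbers(summary, source):
    """Return the numbers in the summary that never appear in the source text."""
    source_numbers = set(re.findall(r"\d+", source))
    return [n for n in re.findall(r"\d+", summary) if n not in source_numbers]


source = ("Many space missions are more suited to telerobotic rather than crewed "
          "operation, due to lower cost and lower risk factors.")
m3_summary = ("The first human teleroboitic mission was launched in 1959 by NASA's "
              "Explorer program. It consisted of a single spacecraft that flew around "
              "the moon for over two years before returning to Earth on March 21, 1961.")
print(unsupported_numbers(m3_summary, source))
```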

Translation

An effective translation prompt needs to contain the origin language, the target language, and the text that should be translated. It should also end with a Translation: cue that, as with question answering, prevents the model from merely repeating the text.

translation_prompt = '''
Translate the following text from {origin_language} to {target_language}.
Text: """
{text}
"""
Translation:
'''
# text source: https://en.wikipedia.org/wiki/Spacecraft#Uncrewed_spacecraft
query = translation_prompt.format(
    origin_language = "English",
    target_language = "Japanese",
    text = "Many space missions are more suited to telerobotic rather than crewed operation, due to lower cost and lower risk factors",
)

To check the answers' quality, I entered the translated texts into Google Translate and translated them back into English.

# ++++ Results ++++
# M1
多くの宇索ミッションは、ターレロボトリクなどより量で安全であるため、人間っ取られずに機器を操作すべてとします」'
 Many exploration missions are more massive and safer than Tare Robotrics, so they can operate all the equipment without human intervention.
# M2
私は、の画にいて、トの用が多なるたに、量で安全なが可です。
I'm in the middle of the day and I have a lot of things to do, so it's safe in quantity..
空の探には、多の画では、トの用が可なルクのたに、のい、安全性の高いので用する。
For aerial exploration, a number of aircraft can be used for In addition, it is used because it is low in noise and high in safety.
# M3
くのスースミッシンはテレロットりもクルーオプシンであるた、なすさといリスクのをらします。
# Since many space missions are a crew option as well as telerobots, they offer greater ease and less risk.

In this task, M1 showed the best results. M2 failed to grasp the context of spacecraft, and M3 reversed the argument of the original sentence.

Question Answering

This NLP task is a very important capability of an assistant. It measures the capability to infer information from a question and a context, and to map that information to an abstract match that is then transformed into an output. Open-domain questions, which provide a context that the model has not been trained on, are also a good metric for an LLM's generalizability to new tasks.

For the prompt, adding a final Answer: is the key to making the model work; otherwise it might just repeat the given question and text.

question_answering_prompt = '''
Answer a question for the given text.

Question: """
{question}
"""
Text: """
{text}
"""
Answer:
'''
# text source: https://en.wikipedia.org/wiki/AI_safety#Black_swan_robustness
query = question_answering_prompt.format(
    question = "What are the risks of Artificial Intelligence?",
    text = "Rare inputs can cause AI systems to fail catastrophically. For example, in the 2010 flash crash, automated trading systems unexpectedly overreacted to market aberrations, destroying one trillion dollars of stock value in minutes. Note that no distribution shift needs to occur for this to happen. Black swan failures can occur as a consequence of the input data being long-tailed, which is often the case in real-world environments. Autonomous vehicles continue to struggle with 'corner cases' that might not have come up during training; for example, a vehicle might ignore a stop sign that is lit up as an LED grid. Though problems like these may be solved as machine learning systems develop a better understanding of the world, some researchers point out that even humans often fail to adequately respond to unprecedented events like the COVID-19 pandemic, arguing that black swan robustness will be a persistent safety problem"
)

And the results:

# ++++ Results ++++
# M1
The risks associated with Artificial Intelligence include catastrophic failure due to rare inputs and struggles with "corner cases." Autonomous vehicles may ignore stop signs or other important signals in unprecedented situations, highlighting the need for black swan robustness. Despite ongoing development of machine learning systems, humans themselves often struggle to respond effectively to unexpected events like pandemics, emphasizing that addressing these safety concerns will be an enduring challenge.

# M2
What are the risks of Artificial Intelligence?
The risks of Artificial Intelligence include rare inputs causing catastrophic failures and autonomous vehicles struggling with unexpected corner cases.
# M3
The text discusses various risks associated with Artificial Intelligence (AI). One of these is rare inputs causing AI systems to malfunction catastrophically. Another risk mentioned in the text is the possibility of "black swan" failures, where unexpected events occur that were not anticipated during training and can cause problems for autonomous vehicles like self-driving cars or drones. The text also mentions potential risks associated with AI's ability to learn from data without human supervision, such as biased models being trained on biased datasets. Additionally, the risk of AI systems becoming too powerful is mentioned in relation to their impact on society and economy.

M1 answered interestingly, showing the capacity to work with complex sentences and summarize their content precisely. M2 gave a short but fitting answer, and M3 answered in a good writing style (which I would have expected from M2).
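
Even with the final Answer: cue, M2 repeated the question before answering. When answers are processed further, such echoes can be removed in a post-processing step. The following sketch uses a hypothetical strip_echo helper that drops leading lines which merely repeat parts of the prompt.

```python
def strip_echo(answer, *echoed):
    """Remove leading lines of the answer that only repeat the given prompt parts."""
    normalized = {text.strip().lower() for text in echoed}
    lines = answer.strip().splitlines()
    while lines and lines[0].strip().lower() in normalized:
        lines.pop(0)
    return "\n".join(lines).strip()


question = "What are the risks of Artificial Intelligence?"
m2_answer = """What are the risks of Artificial Intelligence?
The risks of Artificial Intelligence include rare inputs causing catastrophic failures and autonomous vehicles struggling with unexpected corner cases."""
print(strip_echo(m2_answer, question))
```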

Text Generation

For generating texts, the prompts can include all kinds of specifications: the tone, the number of words or sentences, and other restrictions.

text_generation_prompt = '''
Continue the following text with at most 80 words and at least 3 sentences.

Text: """
{text}
"""
'''
query = text_generation_prompt.format(
    text = "The inhabitation and territorial use of extraterrestrial space has been proposed, for example ..."
)

The resulting texts are varied and surprising.

# ++++ Results ++++
# M1
Extraterrestrial habitation refers to human settlements on celestial bodies such as planets or moons in our solar system or beyond. This concept encompasses the development of infrastructure, resource utilization, and scientific research to support long-term human presence in space. Examples include establishing self-sustaining colonies on Mars or other nearby planets, which could serve as a backup for humanity in case of catastrophic events on Earth. Additionally, extraterrestrial habitation offers opportunities for deep space exploration and the potential discovery of new scientific insights and resources that can benefit our understanding of the universe and improve life on Earth.

# M2
Proposals have been made to include the use of extra-terrestrial spaces in the analysis of territorial disputes. For example, some scholars have suggested that the territorial claims of states on celestial bodies could be examined using the same methods and principles as those used for terrestrial disputes. This would involve considering factors such as the nature of the disputed territory, the presence of natural resources or other valuable assets, and the political and economic interests at stake.
However, it is worth noting that this proposal has faced significant criticism from some scholars who argue that it is difficult to apply traditional territorial law concepts to extra-terrestrial disputes. For example, they may question whether the same principles of sovereignty and statehood can be applied to non-terrestrial territories, or whether different legal frameworks should be developed specifically for these purposes.
Overall, while the use of extra-terrestrial spaces in territorial disputes is a topic of ongoing debate and discussion within the field of international law, it remains an area of significant
# M3
Proposed uses include military applications such as surveillance or communication relay stations in outer space, scientific research to study the properties of celestial bodies like asteroids and comets, exploration missions that aim to explore other planets within our solar system, and even tourism ventures where visitors can experience zero gravity by floating around inside a spacecraft. However, these proposals have also sparked concerns about potential conflicts with neighboring countries or organizations who may view such activities as trespassing on their territory.
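
Whether a generated text respects the restrictions from the prompt (at most 80 words, at least 3 sentences) can be verified with a simple counter. This is a rough sketch with hypothetical helper names: words are split on whitespace and sentences on end punctuation, so abbreviations like "e.g." are counted as sentence ends.

```python
import re


def count_words(text):
    """Number of whitespace-separated words."""
    return len(text.split())


def count_sentences(text):
    """Number of chunks separated by '.', '!' or '?' followed by whitespace."""
    chunks = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([chunk for chunk in chunks if chunk])


def meets_constraints(text, max_words=80, min_sentences=3):
    """Check the two restrictions used in the text generation prompt."""
    return count_words(text) <= max_words and count_sentences(text) >= min_sentences


sample = "Colonies on Mars have been proposed. So have lunar bases. Both remain ideas."
print(count_words(sample), count_sentences(sample), meets_constraints(sample))
```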

Dialog

The dialog was executed with the UI client. The results are the copied chat messages.

# M1
Q: Hello my faithful assistant.
A: Greetings, I am here to help you with any tasks or questions you may have. Please let me know how I can assist you today.
Q: I need a book recommendation for a friend.
A:  Of course! Based on your friend's interests and preferences, I would be happy to provide a book recommendation. Could you please give me some information about their favorite genres or authors they enjoy? This will help me make a more personalized suggestion.
Q: He likes science fiction books from Warhammer40K.
A: Thank you for the information! Based on your friend's interest in science fiction and specifically Warhammer 40k, I would recommend "Necropolis" by Dan Abnett. This novel is set within the Warhammer 40k universe and follows a group of characters as they navigate the treacherous landscape of a dead world. The storyline is engaging, and the world-building is exceptional, making it an excellent choice for fans of the genre.

# M2
Q: Hello my faithful assistant.
A: Hello, how may I assist you today?
Q: I need a book recommendation for a friend.
A: Sure! What type of book are you looking for?
Q: He likes science fiction books from Warhammer40K.
A: Warhammer 40k is a popular sci-fi universe with rich stories and characters. Here are some recommendations based on your friend's interest in the Warhammer 40k universe:
1. "The Horus Heresy: The First Wall" by Josh Reynolds - This book follows the events leading up to the fall of the Emperor of Mankind, which sets the stage for the events of the Warhammer 40k universe.
 2. "Deathwatch: Relentless" by Aaron Dembski-Bowden - This book focuses on a group of Space Marines known as the Deathwatch who are tasked with hunting down and eliminating traitors within the Imperium.
 3. "The Crimson King" by Ben Counter - This book follows the events leading up to the fall of Cadia, which is a major location in the Warhammer 40k universe.
 4. "The Emperor's Legion" by Gav Thorpe - This book focuses on the Space Marines known as the Ultramarines and their role in the early days of the Warhammer 40k universe.
 5. "The Crimson Fist" by Ben Counter - This book follows the events leading up to the fall of Istvaan V, which is a major location in the Warhammer 40k universe.
 I hope these recommendations help your friend find something they enjoy!

# M3
Q: Hello my faithful assistant.
A: Greetings, I am ready to assist you with any questions or tasks that may arise during your day. How can I help?

Q: I need a book recommendation for a friend.
A: Certainly! What type of books do they enjoy reading and what genre are their favorite authors in?
Q: He likes science fiction books from Warhammer40K.
A: Ah, that's an interesting choice. Have you read any other sci-fi novels by the same author or similar genres?
Q: Yes, that is "Necropolis" bis Dan Abnett.
A: Thank you for sharing! I will keep this in mind when recommending books to your friend next time they ask me about their favorite authors and book recommendations.

In this task, M1 provided the best answers, with a good conversational style and results. M2 provided an elaborate answer, but it also hallucinated: the book The Crimson King was written by Graham McNeill, and it is not about Cadia at all, but a book in the Horus Heresy series about the Thousand Sons that plays out on various planets. M3 could not give any book recommendation, but was happy that the user provided one.

Conclusion

Large Language Models are fascinating helpers for knowledge workers. Recently, open-source libraries emerged that provide access to LLMs that can run on premise, in your data center or on your computer. This article introduced gpt4all, which can be used either via an installable GUI client or a Python library. It simplifies loading pre-trained LLMs from a large collection and provides features to start chats and keep chat histories. To see how good these LLMs are, this article compared three top-ranked LLMs on various NLP tasks like sentence classification, question answering, and translation. There are two main results. First, prompt engineering, the task of providing meaningful instructions to an LLM, was more difficult and needed more attempts than with OpenAI GPT-3.5. Second, there is no universal recommendation. The M2 model (ggml-model-gpt4all-falcon-q4_0.bin) provided interesting, elaborate, and correct answers, but then surprised during the translation and dialog tests by hallucinating answers. However, given that new models keep appearing, and that models can be fine-tuned as well, it seems like a matter of time before a universally accepted model emerges.