Revolutionizing Language Model Integration - LangChain #1

In today's digital age, language models have become an integral part of various applications, enhancing user experience and streamlining processes. One such groundbreaking framework that stands out is LangChain.

LangChain, at its core, is designed to empower applications with the prowess of language models. It offers a consistent way to integrate existing Large Language Models (LLMs) into your own code. This article explores LangChain's main features and the advantages it brings to the table.

LangChain: A Brief Overview

LangChain is not just another framework; it's a revolution in the realm of language models. It facilitates the creation of applications that are context-aware, connecting a language model to various sources of context, such as prompt instructions, few-shot examples, and content to ground its response.

Furthermore, it equips applications with the capability to reason, leveraging the language model to deduce answers rooted in the given context and chart out subsequent courses of action.

A standout aspect of this library is its adaptability to external ecosystems. Be it databases, LLMs, data extraction tools, or memory management, LangChain boasts integrated modules that can be operationalized with minimal effort.

To provide a clearer picture, here are some of the most pivotal modules LangChain offers:

  • LLMs: to invoke whatever LLM you are working with, such as Llama or OpenAI's models
  • Chat models: to build conversations with an existing model like ChatGPT
  • Document loaders: to extract data from files like Excel or PDF
  • Vector stores: to store and search over unstructured data
  • Agents: to let chains choose which tools to use given high-level directives
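To make the idea behind these modules concrete, here is a minimal plain-Python sketch of what a "chain" boils down to: a prompt template filled with context, passed to a model, whose output can feed the next step. The FakeModel class is a hypothetical stand-in for any LLM, not part of LangChain's API.

```python
# Minimal illustration of the "chain" idea: template -> model -> next step.
# FakeModel is a hypothetical stand-in for any LLM; it is NOT a LangChain class.

class FakeModel:
    def invoke(self, prompt: str) -> str:
        # A real model would return generated text; we echo for illustration.
        return f"[model answer to: {prompt}]"

def build_prompt(template: str, **context) -> str:
    # Fill the template's placeholders with the supplied context.
    return template.format(**context)

model = FakeModel()
prompt = build_prompt("Summarize this CV: {cv}", cv="10 years of C#/.NET")
answer = model.invoke(prompt)
print(answer)
# -> [model answer to: Summarize this CV: 10 years of C#/.NET]
```

LangChain's real modules follow this same composition pattern, with ready-made implementations for each step.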

Use case: summarize candidates' CVs

To truly appreciate the simplicity LangChain introduces, let's delve into a tangible use case: writing a script that summarizes candidates' CVs.

Initiating the process requires extracting candidate information from the file. LangChain, through its integration with the pypdf library, streamlines this with the PyPDFLoader class.

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("cv.pdf")
pages = loader.load_and_split()
cv_content = " ".join([page.page_content for page in pages])

The load_and_split method extracts the text and splits it into an array of page elements. We then concatenate them to obtain the entire content of the CV in a single string. In exactly three instructions, the content of the document is extracted.
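Under the hood, load_and_split returns one document object per page, each exposing a page_content attribute; the join above simply stitches those strings together. Here is a plain-Python sketch of that last step, using a stand-in Page class rather than LangChain's real Document type:

```python
from dataclasses import dataclass

# Stand-in for the objects returned by load_and_split;
# only the page_content attribute matters for this step.
@dataclass
class Page:
    page_content: str

pages = [
    Page("John Doe - Software Developer"),
    Page("Experience: 8 years C#/.NET"),
]

# Same concatenation as in the article: one string for the whole CV.
cv_content = " ".join(page.page_content for page in pages)
print(cv_content)
# -> John Doe - Software Developer Experience: 8 years C#/.NET
```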

Once the data is extracted, the content is relayed to OpenAI's chat model. All it takes is the API key associated with the user's account and the name of the desired model, in this instance GPT-3.5.

from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(temperature=0, 
                 openai_api_key='OPENAI-API-KEY', 
                 model_name="gpt-3.5-turbo")

One might object that the code written here is tightly coupled to OpenAI, and therefore that LangChain adds little value. If a library merely adds an overlay without simplifying use, it just adds a layer of complexity. That's not the case here.

The beauty of LangChain lies in its universality. While the model declaration is intrinsically linked to ChatGPT, the subsequent steps are universally applicable across LangChain models, categorized into SystemMessage, HumanMessage, and AIMessage.

Whether you're using a chat model, an LLM, a text-embedding model or even an agent, communication prompts are divided into these three message types. Nice, isn't it?
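As a rough illustration of that three-role structure, a conversation is just an ordered list of typed messages. The classes below are simplified plain-Python stand-ins mirroring LangChain's SystemMessage, HumanMessage and AIMessage, not the library's actual implementations:

```python
from dataclasses import dataclass

# Simplified stand-ins for LangChain's three message roles.
@dataclass
class SystemMessage:
    content: str  # instructions that frame the model's behavior

@dataclass
class HumanMessage:
    content: str  # what the user asks

@dataclass
class AIMessage:
    content: str  # what the model answered previously

conversation = [
    SystemMessage(content="You are a recruiting assistant."),
    HumanMessage(content="Summarize this CV for me."),
    AIMessage(content="The candidate has 8 years of C#/.NET experience."),
]

for message in conversation:
    print(type(message).__name__, "->", message.content)
```

Every LangChain model consumes this same kind of typed message list, which is what makes the prompting code portable across backends.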

Concluding the use case, a prompt is crafted to solicit the desired response from the model, and LangChain's template classes ensure the prompt aligns with the model's expectations.

from langchain.schema import SystemMessage

system_message = SystemMessage(content="""
    You're recruiterGPT and your role is to summarize a candidate's CV.
    Do not forget any technologies or tools mentioned. The CV is for a software developer.
    You'll focus on keeping all experiences, with their skills and companies, to link later with the job offer.
    """)

The final step is to fill in the prompt containing the request to be made. LangChain provides template classes to automatically build the message structure expected by the chat.

In this example, the request is to produce a simple summary of the extracted document.

from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate

human_template = "Please summarize the key details of the following CV: {cv_content}"
human_message = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
messages = chat_prompt.format_prompt(cv_content=cv_content).to_messages()
cv_summary = chat(messages)

And here's the result:

print(cv_summary.content)

# Result here:
# The candidate has 8 years of experience in software development, specializing in building robust and scalable software using C#/.NET. They have worked on creating predictive AI models and have experience with technologies such as RabbitMq, MassTransit, Redis, Akka.NET, xUnit, GitHub, GitLab, and Azure DevOps.
# [...]
# Overall, the candidate has a solid background in software development with expertise in C#/.NET and a range of other technologies. They have experience working on complex projects and have a strong track record of delivering high-quality software solutions.

And that's it. Quick, isn't it?

💡
The point here is not to judge the quality of the result, which depends mainly on the model used and the summary prompt, but to show how easy the whole pipeline is to set up.

Summary

LangChain stands as a testament to the evolution of AI model utilization, allowing users to focus on the crux: data manipulation. The library eradicates the need for intricate setups, with its modules offering swift and efficient connections to external platforms.

There's no need to waste time wiring up external storage and analysis elements, as the integrated modules enable you to connect to external environments in just a few lines.

In fewer than twenty lines, we were able to create a PDF resume summary script, and it's impressively efficient. Hats off to LangChain!

To go further:

Chatbots | 🦜️🔗 Langchain

Have a goat day 🐐