Summary
Semantic search seeks to improve search accuracy by understanding the content of the search query. In contrast to traditional search engines, which only find documents based on lexical matches, semantic search can also match synonyms and related concepts. Semantic search with OpenAI relies on embeddings: numerical representations of text that capture its semantic meaning, enabling a more nuanced understanding of language. When a user submits a search query, the model converts it into an embedding, which is then compared to the embeddings of the documents in a search index. The results are ranked by similarity score, providing more accurate and context-aware retrieval than traditional keyword-based search. This approach leverages advanced language models to focus on the underlying meaning of the text. In this notebook, we explore the process of querying a question against a book and identifying the most relevant answers to that question.
The Python functions and data files needed to run this notebook are available in my GitHub repository.
import warnings
warnings.filterwarnings('ignore')
import openai
from openai.embeddings_utils import get_embedding
from urllib.request import urlopen
import numpy as np
from sentence_transformers import util
from transformers import pipeline
import pandas as pd
import scipy.stats as ss
import matplotlib.pyplot as plt
from IPython.display import Image
from IPython.core.display import HTML
from sklearn.metrics.pairwise import cosine_similarity
import requests
from bs4 import BeautifulSoup
Introduction¶
The choice of the text embedder is critical as it determines the quality of the vector representation of the text. We have many options for vectorizing with LLMs, both open and closed source. To get off the ground quicker, we are going to use OpenAI’s closed-source “Embeddings” product, which means we have limited control over its implementation and potential biases. It’s important to keep in mind that with closed-source products we may not have access to the underlying algorithms, which can make it difficult to troubleshoot any issues that arise.
Once we convert our text into vectors, we need a mathematical way of deciding whether pieces of text are “similar” or not. Cosine similarity is a way to measure how similar two things are. It looks at the angle between two vectors and gives a score based on how close they are in direction. If the vectors point in exactly the same direction, the cosine similarity is 1. If they’re perpendicular (90 degrees apart), it’s 0. And if they point in opposite directions, it’s -1. The size of the vectors doesn’t matter, only their orientation does.
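To make this concrete, below is a minimal sketch of cosine similarity computed directly with numpy; the vectors are made-up toy examples, not real embeddings:

# Cosine similarity: dot(a, b) / (|a| * |b|); it depends only on direction, not magnitude
def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
print(cos_sim(a, 2 * a))                                    # 1.0: same direction, magnitude ignored
print(cos_sim(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0: perpendicular
print(cos_sim(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))) # -1.0: opposite direction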
GPT can generate text embeddings to perform the tasks below:
- Semantic Search: a type of search that understands the meaning of the query and the context of the content, rather than just matching keywords. It aims to deliver more accurate and relevant results by considering the intent behind the search terms and the relationships between different pieces of information.
- Clustering: grouping together portions of text that share similar themes, topics, or characteristics.
- Recommendations
- Anomaly detection
- Diversity measurement
- Classification
The focus of this notebook is semantic search. The figure below shows a flowchart of applying semantic search; most of its components can be replaced with OpenAI embeddings.
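Before diving in, here is a rough outline of that pipeline in code. It is a minimal sketch, not the exact flowchart: embed stands for whichever embedder we choose (Sentence Transformers or OpenAI), and semantic_search_outline is a hypothetical helper name.

from sklearn.metrics.pairwise import cosine_similarity

def semantic_search_outline(query, documents, embed, top_k=3):
    doc_vectors = [embed(d) for d in documents]   # 1. embed every chunk of the corpus (done once, offline)
    query_vector = embed(query)                   # 2. embed the incoming query at search time
    scores = [cosine_similarity([query_vector], [v])[0][0] for v in doc_vectors]  # 3. score each chunk
    ranked = sorted(zip(scores, documents), key=lambda p: p[0], reverse=True)     # 4. rank by similarity
    return ranked[:top_k]                         # 5. return the most similar chunks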
PERSON = 'Mehdi Rezvandehy'
# Google my name (scraping Google search results like this is not the most robust approach)
google_html = BeautifulSoup(requests.get(f'https://www.google.com/search?q={PERSON}').text, 'html.parser').get_text()[:1024]
nlp = pipeline('question-answering',
               model='deepset/roberta-base-squad2',      # RoBERTa model fine-tuned on SQuAD2 for extractive question answering
               tokenizer='deepset/roberta-base-squad2',
               max_length=15)
nlp(f'Who is {PERSON}?', google_html)
{'score': 0.041184596717357635, 'start': 858, 'end': 868, 'answer': 'Researcher'}
Embedding with Sentence Transformer¶
The easiest approach to implementing sentence similarity is through the sentence-transformers library, which wraps most of this process into a few lines of code. First, we install it using pip install sentence-transformers. This library uses HuggingFace’s transformers behind the scenes, so we can actually find sentence-transformers models on the HuggingFace model hub. We use the bert-base-nli-mean-tokens model. Let’s create some sentences, initialize our model, and encode the sentences:
from sentence_transformers import SentenceTransformer
model_bert = SentenceTransformer('bert-base-nli-mean-tokens')
- Example
corpus = ["Apple and orange are completely different from each other",
"Ocean temperature is rising rapidly",
"AI has taken the world by storm",
"Global warming is happening",
"The weather is not good to play golf today",
"Never compare an apple to an orange",
"People say I am a bookworm, in fact, I do not want to waste my time on TV",
"AI has transformed the way the world works",
"It is rainy today so we should postpone our golf game",
"I love reading books than watching TV"]
sentence_embeddings = model_bert.encode(corpus)
sentence_embeddings.shape
(10, 768)
# Pairwise cosine similarity between every pair of sentence embeddings
df = pd.DataFrame()
for i1 in range(len(corpus)):
    sim_all = []
    for i2 in range(len(corpus)):
        tmp = cosine_similarity([sentence_embeddings[i1]], [sentence_embeddings[i2]])
        sim_all.append(tmp[0][0])
    df[corpus[i1]] = sim_all
df.index = corpus
df
Apple and orange are completely different from each other | Ocean temperature is rising rapidly | AI has taken the world by storm | Global warming is happening | The weather is not good to play golf today | Never compare an apple to an orange | People say I am a bookworm, in fact, I do not want to waste my time on TV | AI has transformed the way the world works | It is rainy today so we should postpone our golf game | I love reading books than watching TV | |
---|---|---|---|---|---|---|---|---|---|---|
Apple and orange are completely different from each other | 1.000000 | 0.137347 | 0.453787 | 0.135274 | 0.537905 | 0.760247 | 0.274083 | 0.244242 | 0.443579 | 0.183743 |
Ocean temperature is rising rapidly | 0.137347 | 1.000000 | 0.487562 | 0.667538 | 0.265201 | 0.003454 | 0.091370 | 0.476548 | 0.095638 | 0.117352 |
AI has taken the world by storm | 0.453787 | 0.487562 | 1.000000 | 0.562536 | 0.475757 | 0.256307 | 0.209972 | 0.631133 | 0.432809 | 0.076726 |
Global warming is happening | 0.135274 | 0.667538 | 0.562536 | 1.000000 | 0.284419 | 0.049815 | 0.180935 | 0.652134 | 0.220875 | 0.111236 |
The weather is not good to play golf today | 0.537905 | 0.265201 | 0.475757 | 0.284419 | 1.000000 | 0.396641 | 0.367050 | 0.270868 | 0.729455 | 0.237244 |
Never compare an apple to an orange | 0.760247 | 0.003454 | 0.256307 | 0.049815 | 0.396641 | 1.000000 | 0.284058 | 0.110317 | 0.404155 | 0.145969 |
People say I am a bookworm, in fact, I do not want to waste my time on TV | 0.274083 | 0.091370 | 0.209972 | 0.180935 | 0.367050 | 0.284058 | 1.000000 | 0.272889 | 0.509787 | 0.729154 |
AI has transformed the way the world works | 0.244242 | 0.476548 | 0.631133 | 0.652134 | 0.270868 | 0.110317 | 0.272889 | 1.000000 | 0.185599 | 0.203464 |
It is rainy today so we should postpone our golf game | 0.443579 | 0.095638 | 0.432809 | 0.220875 | 0.729455 | 0.404155 | 0.509787 | 0.185599 | 1.000000 | 0.279056 |
I love reading books than watching TV | 0.183743 | 0.117352 | 0.076726 | 0.111236 | 0.237244 | 0.145969 | 0.729154 | 0.203464 | 0.279056 | 1.000000 |
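As an aside, the nested loop above can be replaced by a single vectorized call; this minimal sketch should reproduce the same matrix (df_vec is a new name, not from the original code):

# Vectorized alternative: cosine_similarity accepts the full (10, 768) embedding matrix at once
sim_matrix = cosine_similarity(sentence_embeddings, sentence_embeddings)
df_vec = pd.DataFrame(sim_matrix, index=corpus, columns=corpus)  # same values as df above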
def matrix_occure_prob(df, title, fontsize=11, vmin=-0.1, vmax=0.8, lable1='Sentence 1', pad=55,
                       lable2='Sentence 2', label='Cosine Similarity', rotation_x=90, axt=None,
                       num_ind=False, txtfont=6, lbl_font=9, shrink=0.8, cbar_per=False,
                       xline=False):
    """Plot a correlation (similarity) matrix as a heatmap."""
    ax = axt or plt.axes()
    colmn1 = list(df.columns)
    colmn2 = list(df.index)
    corr = np.zeros((len(colmn2), len(colmn1)))
    for l in range(len(colmn1)):
        for l1 in range(len(colmn2)):
            cc = df[colmn1[l]][df.index == colmn2[l1]].values[0]
            try:
                if len(cc) > 1:
                    corr[l1, l] = cc[0]
            except TypeError:  # cc is a scalar
                corr[l1, l] = cc
            if num_ind:
                ax.text(l, l1, str(round(cc, 2)), va='center', ha='center', fontsize=txtfont)
    im = ax.matshow(corr, cmap='jet', interpolation='nearest', vmin=vmin, vmax=vmax)
    cbar = plt.colorbar(im, shrink=shrink, label=label)
    if cbar_per:
        cbar.ax.set_yticklabels(['{:.0f}%'.format(x) for x in np.arange(0, 110, 10)])
    ax.set_xticks(np.arange(len(colmn1)))
    ax.set_xticklabels(colmn1, fontsize=lbl_font)
    ax.set_yticks(np.arange(len(colmn2)))
    ax.set_yticklabels(colmn2, fontsize=lbl_font)
    # Put x ticks at the bottom of the axes
    ax.tick_params(axis="x", bottom=True, top=False, labelbottom=True, labeltop=False)
    # Rotate and align bottom tick labels
    plt.setp([tick.label1 for tick in ax.xaxis.get_major_ticks()], rotation=rotation_x,
             ha="right", va="center", rotation_mode="anchor")
    # Rotate and align left tick labels
    plt.setp([tick.label1 for tick in ax.yaxis.get_major_ticks()], rotation=rotation_x,
             ha="right", va="center", rotation_mode="anchor")
    if xline:
        x_labels = list(ax.get_xticklabels())
        x_label_dict = dict([(x.get_text(), x.get_position()[0]) for x in x_labels])
        for ix in xline:
            plt.axvline(x=x_label_dict[ix] - 0.5, linewidth=1.2, color='k', linestyle='--')
            plt.axhline(y=x_label_dict[ix] - 0.5, linewidth=1.2, color='k', linestyle='--')
    plt.xlabel(lable1)
    plt.ylabel(lable2)
    ax.grid(color='k', linestyle='-', linewidth=0.05)
    plt.title(f'{title}', fontsize=fontsize, pad=pad)
    plt.show()
- Similarity Matrix
font = {'size' : 16}
plt.rc('font', **font)
fig, ax=plt.subplots(figsize=(15, 10), dpi= 110, facecolor='w', edgecolor='k')
matrix_occure_prob(df,title='Cosine Similarity Matrix by BERT Pre-trained Model ',lable1='',vmin=0, pad=10,axt=ax,
vmax=0.8,cbar_per=False,lable2='',num_ind=True,txtfont=15,xline=False,fontsize=22,
lbl_font=14,label='BERT Similarity',rotation_x=30)
Embedding with OpenAI Closed Source¶
OPENAI_API_KEY¶
# "OPENAI_API_KEY": notice, this is not a open soourse model like downloading from Huggingface
# we should have API_KEY that OpenAI send yo us that is persona API key.
openai.api_key = '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'
## Here are list of engines that OpenAI have for us
#openai.Engine.list().data
Find OpenAI Engines with 'embed' or 'search' Tasks¶
# Look at the models that have either 'embed' or 'search' in their id
[e for e in openai.Engine.list().data if 'embed' in e.id or 'search' in e.id][:5]
[<Engine engine id=text-search-babbage-doc-001 at 0x19e3afa0f90> JSON: { "object": "engine", "id": "text-search-babbage-doc-001", "ready": true, "owner": "openai-dev", "permissions": null, "created": null },
 <Engine engine id=curie-search-query at 0x19e3afaf900> JSON: { "object": "engine", "id": "curie-search-query", "ready": true, "owner": "openai-dev", "permissions": null, "created": null },
 <Engine engine id=text-search-babbage-query-001 at 0x19e3afafdb0> JSON: { "object": "engine", "id": "text-search-babbage-query-001", "ready": true, "owner": "openai-dev", "permissions": null, "created": null },
 <Engine engine id=babbage-search-query at 0x19e3afafef0> JSON: { "object": "engine", "id": "babbage-search-query", "ready": true, "owner": "openai-dev", "permissions": null, "created": null },
 <Engine engine id=babbage-search-document at 0x19e3afba450> JSON: { "object": "engine", "id": "babbage-search-document", "ready": true, "owner": "openai-dev", "permissions": null, "created": null }]
# We use the embedding model from OpenAI
ENGINE = 'text-embedding-ada-002'
To get an OpenAI embedding, instead of downloading a model and running it ourselves, we only need a single line of code per call, as below. This can take time if we have hundreds of documents.
# Character-level embedding: corpus[0] is a string, so this loop embeds each character separately
doc_embeddings = [get_embedding(doc_, engine=ENGINE) for doc_ in corpus[0]]
np.array(doc_embeddings).shape
(57, 1536)
# Sentence level embedding
que_embedding = np.array(get_embedding(corpus[0], engine=ENGINE))
que_embedding.shape
(1536,)
que_embedding
array([ 0.03064561, -0.00129054, 0.01672037, ..., -0.00225058, -0.01147007, 0.00403215])
# Embed each sentence once; re-embedding inside the loop would repeat the same API calls
corpus_embeddings = [np.array(get_embedding(sentence, engine=ENGINE)) for sentence in corpus]
df = pd.DataFrame()
for i1 in range(len(corpus)):
    sim_all = []
    for i2 in range(len(corpus)):
        tmp = cosine_similarity([corpus_embeddings[i1]], [corpus_embeddings[i2]])
        sim_all.append(tmp[0][0])
    df[corpus[i1]] = sim_all
df.index = corpus
df
Apple and orange are completely different from each other | Ocean temperature is rising rapidly | AI has taken the world by storm | Global warming is happening | The weather is not good to play golf today | Never compare an apple to an orange | People say I am a bookworm, in fact, I do not want to waste my time on TV | AI has transformed the way the world works | It is rainy today so we should postpone our golf game | I love reading books than watching TV | |
---|---|---|---|---|---|---|---|---|---|---|
Apple and orange are completely different from each other | 1.000000 | 0.721925 | 0.757615 | 0.722427 | 0.729652 | 0.889243 | 0.726636 | 0.762895 | 0.714181 | 0.731058 |
Ocean temperature is rising rapidly | 0.721925 | 1.000000 | 0.784082 | 0.892217 | 0.777767 | 0.722099 | 0.712233 | 0.767791 | 0.758493 | 0.733083 |
AI has taken the world by storm | 0.757615 | 0.784082 | 1.000000 | 0.786044 | 0.733483 | 0.758534 | 0.741317 | 0.956868 | 0.738864 | 0.743097 |
Global warming is happening | 0.722427 | 0.892217 | 0.786044 | 1.000000 | 0.800868 | 0.733273 | 0.731057 | 0.786299 | 0.787260 | 0.740770 |
The weather is not good to play golf today | 0.729652 | 0.777767 | 0.733483 | 0.800868 | 1.000000 | 0.744926 | 0.740638 | 0.722606 | 0.911872 | 0.722797 |
Never compare an apple to an orange | 0.889243 | 0.722099 | 0.758534 | 0.733273 | 0.744926 | 1.000000 | 0.753426 | 0.764837 | 0.736881 | 0.756621 |
People say I am a bookworm, in fact, I do not want to waste my time on TV | 0.726636 | 0.712233 | 0.741317 | 0.731057 | 0.740638 | 0.753426 | 1.000000 | 0.742136 | 0.752316 | 0.900446 |
AI has transformed the way the world works | 0.762895 | 0.767791 | 0.956868 | 0.786299 | 0.722606 | 0.764837 | 0.742136 | 1.000000 | 0.734247 | 0.744783 |
It is rainy today so we should postpone our golf game | 0.714181 | 0.758493 | 0.738864 | 0.787260 | 0.911872 | 0.736881 | 0.752316 | 0.734247 | 1.000000 | 0.739054 |
I love reading books than watching TV | 0.731157 | 0.733083 | 0.743097 | 0.740770 | 0.722797 | 0.756621 | 0.900446 | 0.744783 | 0.739054 | 1.000000 |
- Similarity Matrix
font = {'size' : 16}
plt.rc('font', **font)
fig, ax=plt.subplots(figsize=(15, 10), dpi= 110, facecolor='w', edgecolor='k')
matrix_occure_prob(df,title=f'Cosine Similarity Matrix with OpenAI Closed Source \n "{ENGINE}"',lable1='',vmin=0.5, pad=10,axt=ax,
                   vmax=1,cbar_per=False,lable2='',num_ind=True,txtfont=15,xline=False,fontsize=22,
                   lbl_font=14,label='OpenAI Similarity',rotation_x=30)
Load a Text Book¶
# A book about animals, Title: Wild Animals I Have Known
text = urlopen("""https://www.gutenberg.org/cache/epub/3031/pg3031.txt""").read().decode() # open URL of the document and read
# split the book into paragraphs on blank lines ('\r\n\r\n') and keep only paragraphs of at least 90 characters
doc = list(filter(lambda x: len(x) > 90, text.split('\r\n\r\n')))
doc = np.array(doc)
print(f'There are {len(doc)} paragraphs')
There are 645 paragraphs
doc[0]
'\ufeffThe Project Gutenberg eBook of Wild Animals I Have Known\r\n \r\nThis ebook is for the use of anyone anywhere in the United States and\r\nmost other parts of the world at no cost and with almost no restrictions\r\nwhatsoever. You may copy it, give it away or re-use it under the terms\r\nof the Project Gutenberg License included with this ebook or online\r\nat www.gutenberg.org. If you are not located in the United States,\r\nyou will have to check the laws of the country where you are located\r\nbefore using this eBook.'
Sentence Transformer¶
model_bert = SentenceTransformer('bert-base-nli-mean-tokens')
sentence_embeddings = model_bert.encode(doc)
sentence_embeddings.shape
(645, 768)
QUERY = """What is snare?""" # a natural language query
sentence_embeddings_QUERY = model_bert.encode(QUERY)
sim_all = []
for i1 in range(len(doc)):
    tmp = cosine_similarity([sentence_embeddings_QUERY], [sentence_embeddings[i1]])
    sim_all.append(tmp[0][0])
# Rank documents by descending similarity (rank 1 = most similar)
sim_all_rank = len(sim_all) - ss.rankdata(sim_all) + 1
rank_1 = np.where(sim_all_rank == 1)[0][0]
rank_2 = np.where(sim_all_rank == 2)[0][0]
rank_3 = np.where(sim_all_rank == 3)[0][0]
# Encode the QUERY using the bi-encoder and convert to a tensor and find relevant documents
que_embedding = model_bert.encode(QUERY, convert_to_tensor=True)
# Give number of documents to retrieve with the bi-encoder
# top_k is top documents similar to query by descending Cosine similarity
cos_sim_score = util.semantic_search(que_embedding, sentence_embeddings, top_k=3)[0]
cos_sim_score
[{'corpus_id': 194, 'score': 0.502781331539154}, {'corpus_id': 195, 'score': 0.4717433750629425}, {'corpus_id': 459, 'score': 0.4647967219352722}]
print(f'QUERY: {QUERY}\n')
for no, ir in enumerate(cos_sim_score):
print(f'Document {no + 1}: Cosine Similarity (sentence embedding) is {ir["score"]:.3f}:\n\n{doc[ir["corpus_id"]]}')
print('\n')
QUERY: What is snare?

Document 1: Cosine Similarity (sentence embedding) is 0.503:

"Ye Franckelyn's dogge leaped over a style, And yey yclept him lyttel Bingo, B-I-N-G-O,

Document 2: Cosine Similarity (sentence embedding) is 0.472:

And he yclept ytte rare goode Stingo. Now ys not this a prettye rhyme, I thynke ytte ys bye Jingo, J-I-N-G-O,

Document 3: Cosine Similarity (sentence embedding) is 0.465:

A snort of terror and a bound in the air gave Tom the chance to add the double hitch. The loop flashed up the line, and snake-like bound those mighty hoofs.
As can be seen, the bert-base-nli-mean-tokens embedding is not effective at finding a proper answer to the asked question. Hence, OpenAI embedding is applied (see next section).
OpenAI for Embedding¶
# We use the embedding model from OpenAI
ENGINE = 'text-embedding-ada-002'
QUERY = """What is snare?""" # a natural language query
The query (question) is encoded using OpenAI.
que_embedding = np.array(get_embedding(QUERY, engine=ENGINE))
The chunked document (the book's paragraphs) is encoded using OpenAI.
# This could take time if we have hundreds or thousands of documents
doc_embeddings = [get_embedding(document, engine=ENGINE) for document in doc]
# Transform list of lists to numpy
doc_embeddings = np.array(doc_embeddings)
doc_embeddings.shape
(645, 1536)
We can now use Sentence Transformers' semantic_search utility:
cos_sim_score = util.semantic_search(que_embedding, doc_embeddings, top_k=3)[0]
cos_sim_score
[{'corpus_id': 144, 'score': 0.8674971581471483}, {'corpus_id': 310, 'score': 0.7881266888837749}, {'corpus_id': 264, 'score': 0.7866484835003427}]
que_embedding.shape
(1536,)
doc_embeddings.shape
(645, 1536)
print(f'QUERY: {QUERY}\n')
for no, ir in enumerate(cos_sim_score):
print(f'Document {no + 1}: Cosine Similarity (sentence embedding) is {ir["score"]:.3f}:\n\n{doc[ir["corpus_id"]]}')
print('\n')
QUERY: What is snare?

Document 1: Cosine Similarity (sentence embedding) is 0.867:

"A snare is something that looks like a creeper, but it doesn't grow and it's worse than all the hawks in the world," said Molly, glancing at the now far-away red-tail, "for there it hides night and day in the runway till the chance to catch you comes."

Document 2: Cosine Similarity (sentence embedding) is 0.788:

And the trick is to locate the mouse and seize him first and see him afterward. Vix soon made a spring, and in the middle of the bunch of dead grass that she grabbed was a field-mouse squeaking his last squeak.

Document 3: Cosine Similarity (sentence embedding) is 0.787:

Oh, unlucky thought! Oh, mad heedlessness born of long immunity! That fine sand was on the next wolftrap and in an instant I was a prisoner. Although not wounded, for the traps have no teeth, and my thick trapping gloves deadened the snap, I was firmly caught across the hand above the knuckles. Not greatly alarmed at this, I tried to reach the trap-wrench with my right foot. Stretching out at full length, face downward, I worked myself toward it, making my imprisoned arm as long and straight as possible. I could not see and reach at the same time, but counted on my toe telling me when I touched the little iron key to my fetters. My first effort was a failure; strain as I might at the chain my toe struck no metal. I swung slowly around my anchor, but still failed. Then a painfully taken observation showed I was much too far to the west. I set about working around, tapping blindly with my toe to discover the key. Thus wildly groping with my right foot I forgot about the other till there was a sharp 'clank' and the iron jaws of trap No. 5 closed tight on my left foot.
From the retrieved documents, we can try to extract a short answer with the extractive QA pipeline; as the output below shows, the extracted span is not useful here:
nlp(QUERY, str(doc[cos_sim_score[1]['corpus_id']]))
{'score': 0.002211708575487137, 'start': 138, 'end': 139, 'answer': '\r'}
OpenAI's models are not open source, so we cannot fine-tune them directly, although OpenAI does offer a fine-tuning platform. We have to go through their API and learn how to use it.
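For reference, a fine-tuning job through the legacy openai Python client (the 0.x API used throughout this notebook) looks roughly like the sketch below; the training file name and base model are placeholders, and the exact parameters should be checked against OpenAI's fine-tuning documentation:

# Minimal sketch of fine-tuning via the legacy OpenAI API (file name and model are placeholders)
# 'train.jsonl' would hold {"prompt": ..., "completion": ...} records
training_file = openai.File.create(file=open('train.jsonl', 'rb'), purpose='fine-tune')
job = openai.FineTune.create(training_file=training_file['id'], model='davinci')
print(openai.FineTune.retrieve(job['id'])['status'])  # poll until the job reports 'succeeded'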
Other Applications of OpenAI¶
OpenAI has developed a variety of models; three types stand out:
- Text Completion Models:
Models like GPT-3 fall into this category. These models are trained to generate human-like text based on a given prompt or context. They can be used for various natural language processing tasks, such as language translation, question answering, text summarization, and more.
- Chat Completion Models:
These align closely with text completion models but place a specific emphasis on conversational interactions. Models like ChatGPT are designed to generate responses in a chat-like format. They can be used for creating chatbots, virtual assistants, and other conversational applications.
- Image Models:
OpenAI has also explored models for working with images. One notable example is DALL-E, which is capable of generating images from textual descriptions. Another example is CLIP, which can understand and generate textual descriptions for images. These models demonstrate the capability of AI to work with visual data.
OpenAI Completion¶
We can engineer a prompt to answer a question based on a given context:
context = doc[cos_sim_score[0]['corpus_id']]
PROMPT = f"Answer the question by having the context below: \n\nContext: {context}\nQuery: {QUERY}\nAnswer:"
print(PROMPT)
Answer the question by having the context below: 

Context: "A snare is something that looks like a creeper, but it doesn't grow and it's worse than all the hawks in the world," said Molly, glancing at the now far-away red-tail, "for there it hides night and day in the runway till the chance to catch you comes."
Query: What is snare?
Answer:
Call the OpenAI API to extract the answer from the engineered prompt.
ENGINE_ = 'gpt-3.5-turbo-instruct'
answer = openai.Completion.create(
model=ENGINE_,
prompt=PROMPT,
temperature=1,
max_tokens=50,
top_p=0,
frequency_penalty=0,
presence_penalty=0
)
answer
<OpenAIObject text_completion id=cmpl-8ZmjUzAWkmqR1HjIRvYMi90pkIxxu at 0x19e3c00c680> JSON: {
  "id": "cmpl-8ZmjUzAWkmqR1HjIRvYMi90pkIxxu",
  "object": "text_completion",
  "created": 1703539500,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": " A snare is a type of plant that resembles a creeper, but does not actually grow. It is considered to be more dangerous than all the hawks in the world because it hides in the runway and waits for the opportunity to catch its prey",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 89,
    "completion_tokens": 50,
    "total_tokens": 139
  }
}
# Get the completion
ans_comp = answer['choices'][0]["text"]
ans_comp
' A snare is a type of plant that resembles a creeper, but does not actually grow. It is considered to be more dangerous than all the hawks in the world because it hides in the runway and waits for the opportunity to catch its prey'
Now we engineer a prompt that answers the question in a fun way for a first grader:
PROMPT_FUN = f"Answer the question in a fun way for a first grader given the context belwo.\n\nContext: {context}\nQuery: {QUERY}\nAnswer:"
print(PROMPT_FUN)
Answer the question in a fun way for a first grader given the context below.

Context: "A snare is something that looks like a creeper, but it doesn't grow and it's worse than all the hawks in the world," said Molly, glancing at the now far-away red-tail, "for there it hides night and day in the runway till the chance to catch you comes."
Query: What is snare?
Answer:
ENGINE_ = 'gpt-3.5-turbo-instruct'
answer = openai.Completion.create(
model=ENGINE_,
prompt=PROMPT_FUN,
temperature=1,
max_tokens=50,
top_p=0,
frequency_penalty=0,
presence_penalty=0
)
# Get the completion
ans_comp_fun = answer['choices'][0]["text"]
ans_comp_fun
" A snare is like a sneaky plant that doesn't grow, but it's even scarier than all the hawks in the world! It hides in the ground and waits for you to walk by so it can catch you!"
OpenAI ChatCompletion¶
Call the OpenAI API to extract the answer from the engineered prompt.
ENGINE_ = 'gpt-3.5-turbo'
answer = openai.ChatCompletion.create(
model=ENGINE_,
messages=[
{"role": "user", "content": PROMPT}
],
temperature=1,
max_tokens=50,
top_p=0,
frequency_penalty=0,
presence_penalty=0
)
# Get the completion
ans_chat = answer['choices'][0]["message"]["content"]
ans_chat
'A snare is something that resembles a creeper but does not grow. It is described as being worse than all the hawks in the world. Molly mentions that it hides in the runway day and night, waiting for an opportunity to catch someone.'
Now we engineer a prompt that answers the question in a fun way for a first grader:
PROMPT_FUN = f"Answer the question in a fun way for a first grader given the context belwo.\n\nContext: {context}\nQuery: {QUERY}\nAnswer:"
print(PROMPT_FUN)
Answer the question in a fun way for a first grader given the context below.

Context: "A snare is something that looks like a creeper, but it doesn't grow and it's worse than all the hawks in the world," said Molly, glancing at the now far-away red-tail, "for there it hides night and day in the runway till the chance to catch you comes."
Query: What is snare?
Answer:
ENGINE_ = 'gpt-3.5-turbo'
answer = openai.ChatCompletion.create(
model=ENGINE_,
messages=[
{"role": "user", "content": PROMPT_FUN}
],
temperature=1,
max_tokens=50,
top_p=0,
frequency_penalty=0,
presence_penalty=0
)
# Get the completion
answer['choices'][0]["message"]["content"]
"A snare is like a sneaky creeper that doesn't grow, but it's even scarier than all the hawks in the world! It hides in the runway and waits for the perfect moment to catch you. So be careful and watch"
OpenAI Image¶
req_body = {"prompt":f"make an image of 'Snare' based on discription below:\n{ans_comp}",
"n":1,
"size":"512x512",
"top_p":1}
print ('Prompt is:\n')
print (req_body["prompt"])
Prompt is:

make an image of 'Snare' based on the description below:
A snare is a type of plant that resembles a creeper, but does not actually grow. It is considered to be more dangerous than all the hawks in the world because it hides in the runway and waits for the opportunity to catch its prey
# call OpenAI API
response = openai.Image.create(  # create an image from the text prompt
    prompt=req_body["prompt"],
    n=req_body["n"],
    size=req_body["size"]
)
# the image response is JSON; take the url of the first generated image
output_img = response["data"][0]["url"]
Image(url= output_img)