Playing with the new OpenAI Assistant and function calling in Python

Nov 17, 2023

Note: since I wrote this article I have created an easy-to-deploy FastAPI version with full source code, ready to deploy on AWS Elastic Beanstalk, with several demos that include uploading files, function calling and calling another Assistant inside a function call. Here’s the article (with GitHub link).

I just updated our VicBot AI Assistant to support the new Assistant Structure that OpenAI released in Beta last week. I love it!

The rework was pretty serious, but now that it is done I’m amazed at how easy it is to deploy new assistants tailored to the specific data tasks we have here.

In the ‘old’ version I was already using functions to get data from Pitchbook and Salesforce to enhance the bot’s reasoning and access to data. But the more functions I added, the fewer tokens I had left for doing useful stuff. Now, functions don’t count against that limit anymore; they live together with the assistant instructions in their own space, with its own limits. This alone is a _huge_ improvement.

Assistants are now persistent inside OpenAI: you create them (either programmatically or in the OpenAI backend) and then use them in your application, combined with a thread and a run. Each Assistant contains an instruction that can be 32k (!) characters long, plus a list of functions if you enable tools.

Our assistants need to look up company website pages (a simple webscrape) and interact with Salesforce and Pitchbook. For each of those I have created fairly simple lookup functions. The functions, their parameters and the definition template have not changed. Here’s an example of the webscrape function. The template has only one parameter, the URL to scrape.

WEBSCRAPE.PY

import html2text
import requests

def webscrape(info=None):
    # Called without parameters it returns the required OpenAI template.
    if info is None:
        return {
            "name": "webscrape",
            "description": "Get the text content of a webpage",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "The URL of the website to scrape"}
                },
                # "required" is part of the JSON Schema, so it lives inside "parameters"
                "required": ["url"]
            }
        }

    text = html2text.HTML2Text()
    text.ignore_links = True
    text.bypass_tables = False
    url = info["url"]
    header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'}
    if not url.startswith('http'):
        url = 'http://' + url
    try:
        h = requests.get(url, headers=header, allow_redirects=True, timeout=5)
    except requests.RequestException:
        # network error or timeout: nothing to return
        return None
    # convert the HTML into plain text for the assistant
    return text.handle(h.text)

Notice that the function name in the template ("name": "webscrape") is the same as the Python function name (def webscrape). That is on purpose, and it is the assumption the callTools() function works with!
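A minimal sketch of that convention, using a hypothetical addNumbers tool that is not part of VicBot: called without arguments it returns its own template, and a small check confirms that the template name and the Python function name agree.

```python
def addNumbers(info=None):
    # Hypothetical tool, for illustration only.
    # Called without arguments it returns its OpenAI function template.
    if info is None:
        return {
            "name": "addNumbers",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "First number"},
                    "b": {"type": "number", "description": "Second number"},
                },
                "required": ["a", "b"],
            },
        }
    # Called with arguments it does the actual work.
    return {"sum": info["a"] + info["b"]}


def templateMatchesFunction(func):
    # True when the template's "name" equals the Python function name,
    # which is what a name-based dispatcher relies on.
    return func()["name"] == func.__name__


print(templateMatchesFunction(addNumbers))
print(addNumbers({"a": 2, "b": 3}))
```

A check like templateMatchesFunction() could run at startup for every tool, so a renamed function is caught before the assistant tries to call it.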

And here is an example of a function that gets a Contact from Salesforce. Again, the function name in the template, ‘getContact’, is the same as the function getContact().

SALESFORCE.PY

import os
from simple_salesforce import Salesforce

# Authenticated Salesforce client used by getContact().
# Assumption: credentials come from environment variables; the names are placeholders.
sf = Salesforce(
    username=os.environ["SF_USERNAME"],
    password=os.environ["SF_PASSWORD"],
    security_token=os.environ["SF_SECURITY_TOKEN"],
)

getContactTemplate = {
    "name": "getContact",
    "description": "A contact in Salesforce",
    "parameters": {
        "type": "object",
        "properties": {
            "FirstName": {"type": "string", "description": ""},
            "LastName": {"type": "string", "description": ""},
            "FullName": {"type": "string", "description": "can be used to search by name"},
            "Title": {"type": "string", "description": "Title"},
            "LinkedIn": {"type": "string", "description": ""},
            "Email": {"type": "string", "description": ""},
            "Company": {"type": "string", "description": ""},
            "SalesforceID": {"type": "string", "description": ""},
            "Salesforce URL": {"type": "string", "description": ""},
        },
        # "required" is part of the JSON Schema, so it lives inside "parameters"
        "required": ["Email"]
    }
}

def _esc(value):
    # escape backslashes and single quotes so a value cannot break out of the SOQL string
    return str(value).replace("\\", "\\\\").replace("'", "\\'")

def getContact(info=None):
    # Facilitates searching by full name, email, first/last name or company name.
    # Returns one or more contacts in the form defined above.
    if info is None:
        return getContactTemplate  # return the OpenAI template
    # find a contact or contacts in Salesforce using the contact template provided
    ret = {"ContactID": "Not found in Salesforce"}
    q = "SELECT Id,FirstName,LastName,Email,AccountId,Account.Name,Title FROM Contact WHERE "
    # keep only the non-empty search values, escaped for use in SOQL
    v = {k: _esc(val) for k, val in info.items() if val not in (None, "")}
    where = None
    if "FullName" in v and "Email" not in v:
        where = "Name = '" + v["FullName"] + "' LIMIT 1"
    elif "Email" in v:
        where = "Email = '" + v["Email"] + "' LIMIT 1"
    elif "pitchbookID" in v:
        where = "pbk__pbId__c = '" + v["pitchbookID"] + "' LIMIT 1"
    elif "FirstName" in v and "Company" in v:
        where = "FirstName Like '" + v["FirstName"] + "' AND Account.Name Like '" + v["Company"] + "' LIMIT 1"
    elif "Company" in v:
        where = "Account.Name Like '" + v["Company"] + "'"
    elif "AccountID" in v:
        where = "Account.Id = '" + v["AccountID"] + "'"
    elif "FirstName" in v and "LastName" in v:
        where = "FirstName Like '" + v["FirstName"] + "' AND LastName Like '" + v["LastName"] + "' LIMIT 1"
    if where is None:
        return ret
    try:
        res = sf.query_all(q + where)
    except Exception as err:
        res = {"records": []}
        print(err)
    if len(res["records"]) > 0:
        ret = []
        for rec in res["records"]:
            account = rec.get("Account") or {}  # a contact may have no account
            ret.append({
                "SalesforceID": rec["Id"],
                "FirstName": rec["FirstName"],
                "LastName": rec["LastName"],
                "FullName": ((rec["FirstName"] or "") + " " + (rec["LastName"] or "")).strip(),
                "Email": rec["Email"],
                "Title": rec["Title"],
                "Company": account.get("Name")
            })
    else:
        print("SF contact search returned 0 records")
    return ret

We now have two Python files with two different functions that we want to use with one or more Assistants. In some cases we will want to use both functions with an Assistant, in other cases only one. In general you’ll want to create the assistants programmatically because (go try!) writing a function definition in the OpenAI backend is no fun. So we need to automate the creation of a new Assistant with tools (functions) attached, and at the same time automate the handling of those function calls when the assistant asks for them.

I tried to make this process seamless by creating a tools.py in my project. There I collect the different tools (functions) I have, simply by including them with an import statement, and there I handle the OpenAI function execution requests. It is surprisingly clean:

TOOLS.PY

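from .salesforce import getContact  # each tool function is
from .webscrape import webscrape    # imported here

import inspect
import json

_HELPERS = ("getTools", "callTools")  # functions in this module that are not tools

def _available():
    # every public function imported into this module counts as a tool
    return {name: f for name, f in globals().items()
            if inspect.isfunction(f) and not name.startswith("_") and name not in _HELPERS}

def getTools(array=None):
    # Call this function to get the template JSON to attach to a new Assistant.
    # The array contains the function NAMES only; without it ALL tools are returned.
    # tools = getTools(["getContact", "webscrape"])
    available = _available()
    names = array if array is not None else list(available)
    tools = []
    for name in names:
        f = available.get(name)  # unknown names are skipped
        if f is not None:
            tools.append({"type": "function", "function": f()})
    return tools

def callTools(tool_calls):
    # The parameter comes straight from the OpenAI run.
    available = _available()
    tool_outputs = []
    for t in tool_calls:
        functionName = t.function.name
        try:
            attributes = json.loads(t.function.arguments)
            functionResponse = available[functionName](attributes)
        except Exception:
            # we just tell OpenAI we couldn't :)
            functionResponse = {"status": "Error in function call " + functionName + "(" + t.function.arguments + ")"}
        tool_outputs.append({"tool_call_id": t.id, "output": json.dumps(functionResponse, default=str)})
    return tool_outputs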

getTools() returns the template array that OpenAI needs for the functions part of the Assistant. It collects the templates from the functions you name. getTools() without a parameter will return an array with ALL functions. getTools(["webscrape", "getContact"]) returns the template array for both functions described before. Remember, this works because I set up each function to return its template when called with NO parameter.

callTools() will be called from within the OpenAI run loop to handle function calls for any function that you have imported, automatically. So as you create more external tool functions, you simply add them to the import statements in tools.py and they are ready to use.
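To see both halves work without calling the API, here is a self-contained sketch. It assumes a tool call only needs the three attributes that callTools() reads (id, function.name and function.arguments) and fakes them with SimpleNamespace. The echo tool, the explicit TOOLS dictionary (used here instead of globals() to keep the example standalone) and the helper names are all hypothetical.

```python
import json
from types import SimpleNamespace


def echo(info=None):
    # Hypothetical tool: returns its template when called without arguments.
    if info is None:
        return {
            "name": "echo",
            "description": "Return the text that was passed in",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string", "description": "Text to echo"}},
                "required": ["text"],
            },
        }
    return {"echo": info["text"]}


# Explicit registry standing in for the imported tool functions.
TOOLS = {"echo": echo}


def buildTools(names=None):
    # Like getTools(): one {"type": "function", ...} entry per known name,
    # or one entry for every registered tool when no names are given.
    names = list(TOOLS) if names is None else names
    return [{"type": "function", "function": TOOLS[n]()} for n in names if n in TOOLS]


def dispatch(tool_calls):
    # Like callTools(): look the function up by name, call it with the decoded
    # arguments, and collect one output per tool call.
    outputs = []
    for t in tool_calls:
        try:
            result = TOOLS[t.function.name](json.loads(t.function.arguments))
        except Exception:
            result = {"status": "Error in function call " + t.function.name}
        outputs.append({"tool_call_id": t.id, "output": json.dumps(result)})
    return outputs


def fakeToolCall(call_id, name, arguments):
    # Mimics only the attributes the dispatcher reads from a real tool call.
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(arguments)),
    )


calls = [fakeToolCall("call_1", "echo", {"text": "hi"}), fakeToolCall("call_2", "missing", {})]
print(buildTools(["echo"]))
for out in dispatch(calls):
    print(out)
```

The second fake call names a function that does not exist, which shows the error path: the run still receives an output for that tool_call_id, just with a status message instead of data.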

Now we need to get the basics for creating or retrieving an Assistant.

OPENAICALLS.PY

import time

import markdown
from openai import OpenAI

from .tools import callTools

# `settings` is the project's own configuration object holding OPENAI_API_KEY
# (for example Django's django.conf.settings; which one is an assumption).

def getOpenaiClient():
    return OpenAI(api_key=settings.OPENAI_API_KEY)

def getAssistant(name, tools=None, instructions=None):
    # Retrieve an OpenAI assistant by name, or create one if not found.
    # When it has to be created, pass your getTools() result so that the
    # assistant is created with the right tools attached to it.
    client = getOpenaiClient()
    # only the first 100 assistants are checked
    for a in client.beta.assistants.list(limit=100).data:
        if a.name == name:
            return a
    # not found: create a new Assistant
    return client.beta.assistants.create(
        name=name,
        instructions=instructions,
        tools=tools or [],
        model="gpt-3.5-turbo-16k")

def applyMarkdown(text):
    # this makes the OpenAI answer nice for HTML - optional
    extension_configs = {
        'markdown_link_attr_modifier': {
            'new_tab': 'on',
            'no_referrer': 'external_only',
            'auto_title': 'on',
        }
    }
    return markdown.markdown(text, extensions=['tables', 'markdown_link_attr_modifier'], extension_configs=extension_configs)

def runOpenai(thread_id, assistant_id, instructions=None):
    client = getOpenaiClient()
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        instructions=instructions
    )
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
        if run.status in ('completed', 'failed', 'cancelled', 'expired'):
            # terminal states: stop polling, the caller checks run.status
            break
        elif run.status == 'requires_action':
            print('requires action')
            tool_calls = run.required_action.submit_tool_outputs.tool_calls
            tool_outputs = callTools(tool_calls)
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=tool_outputs
            )
        time.sleep(.1)
    return run
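runOpenai() keeps polling until the run reaches a final state. A sketch of the same polling idea with a timeout added, so a stuck run cannot block the caller forever. fetchStatus is a hypothetical callable standing in for the runs.retrieve() call, so the example runs without the API, and the list of terminal statuses is an assumption based on the status values the beta reported.

```python
import time

# Statuses after which a run will not change any more (assumed list).
TERMINAL = {"completed", "failed", "cancelled", "expired"}


def waitForRun(fetchStatus, onAction=None, timeout=60.0, interval=0.5):
    # Poll fetchStatus() until a terminal status or until `timeout` seconds pass.
    # onAction is called whenever the status is 'requires_action'.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetchStatus()
        if status in TERMINAL:
            return status
        if status == "requires_action" and onAction is not None:
            onAction()
        time.sleep(interval)
    return "timed_out"


# Usage with a scripted sequence of statuses instead of a real run.
script = iter(["queued", "in_progress", "requires_action", "in_progress", "completed"])
actions = []
result = waitForRun(
    lambda: next(script),
    onAction=lambda: actions.append("tools called"),
    interval=0,
)
print(result, actions)
```

In a real loop, onAction would be the place where callTools() runs and the outputs are submitted back to the run.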

So here’s an Assistant that can scan and summarize webpages. This one uses only one function, ‘webscrape’.

from .openaicalls import runOpenai, applyMarkdown, getOpenaiClient, getAssistant
from .tools import getTools

def scraperAssistant():
    client = getOpenaiClient()
    # This will RETRIEVE the assistant, or create it if it does not exist yet.
    # Once the Assistant exists you don't need the extra parameters and
    # could simply call getAssistant("Webscraper").
    assistant = getAssistant("Webscraper", tools=getTools(["webscrape"]),
                instructions="Your job is to retrieve a website's text data and create a summary")

    thread = client.beta.threads.create(messages=[])
    client.beta.threads.messages.create(
        thread_id=thread.id,
        content="Websites to work on today: cnn.com and usatoday.com",
        role="user"
    )
    run = runOpenai(thread.id, assistant.id)
    if run.status != 'completed':
        return None
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    # messages are listed newest first, so data[0] is OpenAI's response
    response = messages.data[0].content[0].text.value.strip()
    return response  # optionally pass it through applyMarkdown() for HTML

If you want to continue the conversation, you add a message to the thread and call runOpenai() again. In my server-side app I run functions like scraperAssistant() in the background using Redis. Each run is an assistant performing a specific task.

You can tweak the instructions in OpenAI’s backend at https://platform.openai.com/assistants

There you will also see why you should not call client.beta.assistants.create() every time: it really does create a new Assistant on each call.

We are now using assistants to screen incoming Startup Runway applications, and our incoming Seed Round pitches as well, with an extensive prompt that covers all the criteria we already applied as humans. Over the next few weeks I hope to add automated e-mail sending (asking for missing information or more details, specific to the information already provided).

Give these Assistants a try — it’s pretty amazing to see them do their work. I would love to hear about your experience!

In the meantime I will be working on making the tool calling asynchronous. Think about it: now that threads and runs are persistent by default, assistants are basically ‘on their own’, especially if we handle the tool calling asynchronously. We start an Assistant with a task, it figures out it needs to use some tools (functions), which could be new assistants (!), and as the results come in we collect them and ultimately pass them back to the run. It makes a big part of LangChain obsolete if you ask me.
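A rough sketch of what asynchronous tool calling could look like, assuming the tool functions stay ordinary blocking functions: asyncio.to_thread() (Python 3.9+) runs each one in a worker thread, and asyncio.gather() collects the results in the order of the tool calls. The slowTool function, the TOOLS dictionary and the fake tool calls are hypothetical; this is one possible shape, not a finished design.

```python
import asyncio
import json
import time
from types import SimpleNamespace


def slowTool(info):
    # Hypothetical blocking tool, e.g. a web request or a CRM lookup.
    time.sleep(0.1)
    return {"echo": info["text"]}


TOOLS = {"slowTool": slowTool}


async def callOne(t):
    # Run one blocking tool function in a worker thread.
    try:
        args = json.loads(t.function.arguments)
        result = await asyncio.to_thread(TOOLS[t.function.name], args)
    except Exception:
        result = {"status": "Error in function call " + t.function.name}
    return {"tool_call_id": t.id, "output": json.dumps(result)}


async def callToolsAsync(tool_calls):
    # gather() runs the calls concurrently and keeps the input order.
    return await asyncio.gather(*(callOne(t) for t in tool_calls))


def fakeToolCall(call_id, text):
    # Mimics only the attributes read from a real tool call.
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name="slowTool", arguments=json.dumps({"text": text})),
    )


calls = [fakeToolCall("call_%d" % i, "page %d" % i) for i in range(3)]
outputs = asyncio.run(callToolsAsync(calls))
print(outputs)
```

Because the three calls overlap, the total wait is close to the slowest single call rather than the sum of all of them, which matters once a tool is itself another assistant run.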


Written by Jean-Luc Vanhulst
