AutoBuild

By: Linxin Song, Jieyu Zhang

Reference: Agent AutoBuild

AutoGen offers conversable agents powered by LLMs, tools, or humans, which can be used to perform tasks collectively via automated chat. This framework allows tool use and human participation through multi-agent conversation. Please find documentation about this feature here.

In this notebook, we introduce a new class, AgentBuilder, to help users build an automatic task-solving process powered by a multi-agent system. Specifically, in build(), we prompt an LLM to create multiple participant agents, initialize a group chat, and specify whether the task needs programming to solve. AgentBuilder also supports open-source LLMs via vLLM and FastChat. Check the supported model list here.

Requirement

AutoBuild requires autogen[autobuild], which can be installed with the following command:

%pip install autogen[autobuild]

Step 1: prepare configuration and some useful functions

Prepare a config_file_or_env for the assistant agents to limit the choice of LLMs you want to use in this task. This config can be the path to a JSON file or the name of an environment variable. A default_llm_config (named llm_config in the code below) is also required to initialize LLM-specific settings such as seed and temperature.
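For reference, the sketch below shows the general shape of such a config list and what filtering it by model name amounts to. It is a plain-Python stand-in, not AutoGen's implementation: the api_key values are placeholders, and filter_by_model is a hypothetical helper that only mimics the effect of the filter_dict argument used in the code below.

```python
import json

# Hypothetical contents of an OAI_CONFIG_LIST file or environment variable.
# The api_key values are placeholders, not real credentials.
sample_config_json = """
[
    {"model": "gpt-4-turbo", "api_key": "<your-api-key>"},
    {"model": "gpt-4", "api_key": "<your-api-key>"},
    {"model": "gpt-3.5-turbo", "api_key": "<your-api-key>"}
]
"""


def filter_by_model(configs, allowed_models):
    """Keep only the entries whose "model" value is in allowed_models."""
    return [config for config in configs if config.get("model") in allowed_models]


configs = json.loads(sample_config_json)
filtered = filter_by_model(configs, ["gpt-4-turbo", "gpt-4"])
print([config["model"] for config in filtered])
```

Only the two GPT-4 entries survive the filter, which is the restriction that the filter_dict in this step is meant to express.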

import autogen
from autogen.agentchat.contrib.agent_builder import AgentBuilder

config_file_or_env = "OAI_CONFIG_LIST"
llm_config = {"temperature": 0}
config_list = autogen.config_list_from_json(config_file_or_env, filter_dict={"model": ["gpt-4-turbo", "gpt-4"]})


def start_task(execution_task: str, agent_list: list, coding=True):
    group_chat = autogen.GroupChat(
        agents=agent_list,
        messages=[],
        max_round=12,
        allow_repeat_speaker=agent_list[:-1] if coding is True else agent_list,
    )
    manager = autogen.GroupChatManager(
        groupchat=group_chat,
        llm_config={"config_list": config_list, **llm_config},
    )
    agent_list[0].initiate_chat(manager, message=execution_task)
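The allow_repeat_speaker argument in start_task accepts a list of agents that may speak twice in a row. The sketch below replays the same slicing on plain name strings, assuming the code-executing proxy is the last entry of agent_list when coding is enabled (an assumption; the names are taken from the run shown later in this notebook, and repeat_allowed is a hypothetical helper).

```python
def repeat_allowed(agent_names, coding):
    """Mirror the slicing used in start_task, on names instead of agent objects."""
    return agent_names[:-1] if coding is True else agent_names


names = [
    "DataMining_Expert",
    "Bioinformatics_Expert",
    "AI_ComputerScience_Expert",
    "Computer_terminal",  # assumed to be the code-executing proxy, added last
]

# With coding enabled, every agent except the last may speak consecutively.
print(repeat_allowed(names, coding=True))
# Without coding, every agent in the list may speak consecutively.
print(repeat_allowed(names, coding=False))
```

Under that assumption, the proxy cannot take two turns in a row, so control returns to an LLM-backed agent after each code execution.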

Step 2: create an AgentBuilder

Create an AgentBuilder with the specified config_file_or_env. AgentBuilder uses gpt-4 by default to complete the whole process; you can set builder_model and agent_model to other OpenAI models to match your task. You can also specify an open-source LLM supported by vLLM and FastChat; see the blog for more details.

builder = AgentBuilder(
    config_file_or_env=config_file_or_env, builder_model=["gpt-4-turbo"], agent_model=["gpt-4-turbo"]
)

Step 3: specify a building task

Specify a building task with a general description. The building task helps the build manager (an LLM) decide which agents should be built.

building_task = "Generate some agents that can find papers on arxiv by programming and analyzing them in specific domains related to computer science and medical science."

Step 4: build group chat agents

Use build() to let the build manager (the specified builder_model) generate the group chat agents. If you think coding is necessary for your task, you can pass coding=True to add a user proxy (an automatic code interpreter) to the agent list, like:

builder.build(building_task, llm_config, coding=True)

If coding is not specified, AgentBuilder will determine on its own whether the user proxy should be added or not according to the task.

agent_list, agent_configs = builder.build(building_task, llm_config)

==> Generating agents...
['DataMining_Expert', 'Bioinformatics_Expert', 'AI_ComputerScience_Expert'] are generated.
==> Generating system message...
Preparing system message for DataMining_Expert
Preparing system message for Bioinformatics_Expert
Preparing system message for AI_ComputerScience_Expert
==> Generating description...
Preparing description for DataMining_Expert
Preparing description for Bioinformatics_Expert
Preparing description for AI_ComputerScience_Expert
==> Creating agents...
Creating agent DataMining_Expert...
Creating agent Bioinformatics_Expert...
Creating agent AI_ComputerScience_Expert...
Adding user console proxy...
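build() returns the agent objects together with a config dictionary. Judging from the agent_configs["coding"] lookup in Step 5 and the saved config shown at the end of this notebook, that dictionary carries the building task, one entry per generated agent, and a coding flag. The sketch below summarizes such a dictionary; the dictionary itself is a trimmed, hypothetical stand-in (the real one also holds system messages and descriptions), and describe_team is an illustrative helper, not part of AgentBuilder.

```python
# Hypothetical, trimmed stand-in for the config dictionary returned by build().
# Key names follow the saved config and the agent_configs["coding"] lookup in
# this notebook; the values are illustrative only.
agent_configs = {
    "building_task": "Find and analyze arXiv papers.",
    "agent_configs": [
        {"name": "DataMining_Expert", "model": "gpt-4-turbo"},
        {"name": "Bioinformatics_Expert", "model": "gpt-4-turbo"},
        {"name": "AI_ComputerScience_Expert", "model": "gpt-4-turbo"},
    ],
    "coding": True,
}


def describe_team(configs: dict) -> str:
    """Return a one-line summary of the generated agents and the coding flag."""
    names = ", ".join(agent["name"] for agent in configs["agent_configs"])
    return f"agents: {names}; coding: {configs['coding']}"


print(describe_team(agent_configs))
```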

Step 5: execute task

Let the agents generated in build() complete the task collaboratively in a group chat.

start_task(
    execution_task="Find a recent paper about gpt-4 on arxiv and find its potential applications in software.",
    agent_list=agent_list,
    coding=agent_configs["coding"],
)

DataMining_Expert (to chat_manager):
Find a recent paper about gpt-4 on arxiv and find its potential applications in software.

--------------------------------------------------------------------------------

Next speaker: Computer_terminal

Computer_terminal (to chat_manager):

There is no python code from the last 1 message for me to execute. Group chat manager should let other participants to continue the conversation. If the group chat manager want to end the conversation, you should let other participant reply me only with "TERMINATE"

--------------------------------------------------------------------------------

Next speaker: AI_ComputerScience_Expert

AI_ComputerScience_Expert (to chat_manager):

To find a recent paper about GPT-4 on arXiv and explore its potential applications in software, we can utilize the arXiv API to search for papers related to "GPT-4". I can write a Python script to fetch this information. Let's proceed with that.

```python
import requests
from xml.etree import ElementTree

def search_arxiv(query, max_results=10):
    url = 'http://export.arxiv.org/api/query?search_query=all:' + query + '&start=0&max_results=' + str(max_results)
    response = requests.get(url)
    root = ElementTree.fromstring(response.content)
    papers = []
    for entry in root.findall('{http://www.w3.org/2005/Atom}entry'):
        title = entry.find('{http://www.w3.org/2005/Atom}title').text
        summary = entry.find('{http://www.w3.org/2005/Atom}summary').text
        papers.append({'title': title, 'summary': summary})
    return papers

# Search for GPT-4 related papers
papers = search_arxiv('GPT-4')
for paper in papers:
    print(f"Title: {paper['title']}\nSummary: {paper['summary']}\n")
```

This script will fetch the titles and summaries of papers related to GPT-4 from arXiv. We can then analyze these summaries to identify potential applications in software. Shall I proceed to execute this script?

--------------------------------------------------------------------------------

Next speaker: Computer_terminal


>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
Computer_terminal (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
Title: Can LLMs like GPT-4 outperform traditional AI tools in dementia
  diagnosis? Maybe, but not today
Summary:   Recent investigations show that large language models (LLMs), specifically
GPT-4, not only have remarkable capabilities in common Natural Language
Processing (NLP) tasks but also exhibit human-level performance on various
professional and academic benchmarks. However, whether GPT-4 can be directly
used in practical applications and replace traditional artificial intelligence
(AI) tools in specialized domains requires further experimental validation. In
this paper, we explore the potential of LLMs such as GPT-4 to outperform
traditional AI tools in dementia diagnosis. Comprehensive comparisons between
GPT-4 and traditional AI tools are conducted to examine their diagnostic
accuracy in a clinical setting. Experimental results on two real clinical
datasets show that, although LLMs like GPT-4 demonstrate potential for future
advancements in dementia diagnosis, they currently do not surpass the
performance of traditional AI tools. The interpretability and faithfulness of
GPT-4 are also evaluated by comparison with real doctors. We discuss the
limitations of GPT-4 in its current state and propose future research
directions to enhance GPT-4 in dementia diagnosis.


Title: GPT-4 Can't Reason
Summary:   GPT-4 was released in March 2023 to wide acclaim, marking a very substantial
improvement across the board over GPT-3.5 (OpenAI's previously best model,
which had powered the initial release of ChatGPT). However, despite the
genuinely impressive improvement, there are good reasons to be highly skeptical
of GPT-4's ability to reason. This position paper discusses the nature of
reasoning; criticizes the current formulation of reasoning problems in the NLP
community, as well as the way in which LLM reasoning performance is currently
evaluated; introduces a small collection of 21 diverse reasoning problems; and
performs a detailed qualitative evaluation of GPT-4's performance on those
problems. Based on this analysis, the paper concludes that, despite its
occasional flashes of analytical brilliance, GPT-4 at present is utterly
incapable of reasoning.


Title: Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
Summary:   Harnessing logical reasoning ability is a comprehensive natural language
understanding endeavor. With the release of Generative Pretrained Transformer 4
(GPT-4), highlighted as "advanced" at reasoning tasks, we are eager to learn
the GPT-4 performance on various logical reasoning tasks. This report analyses
multiple logical reasoning datasets, with popular benchmarks like LogiQA and
ReClor, and newly-released datasets like AR-LSAT. We test the multi-choice
reading comprehension and natural language inference tasks with benchmarks
requiring logical reasoning. We further construct a logical reasoning
out-of-distribution dataset to investigate the robustness of ChatGPT and GPT-4.
We also make a performance comparison between ChatGPT and GPT-4. Experiment
results show that ChatGPT performs significantly better than the RoBERTa
fine-tuning method on most logical reasoning benchmarks. With early access to
the GPT-4 API we are able to conduct intense experiments on the GPT-4 model.
The results show GPT-4 yields even higher performance on most logical reasoning
datasets. Among benchmarks, ChatGPT and GPT-4 do relatively well on well-known
datasets like LogiQA and ReClor. However, the performance drops significantly
when handling newly released and out-of-distribution datasets. Logical
reasoning remains challenging for ChatGPT and GPT-4, especially on
out-of-distribution and natural language inference datasets. We release the
prompt-style logical reasoning datasets as a benchmark suite and name it
LogiEval.


Title: How is ChatGPT's behavior changing over time?
Summary:   GPT-3.5 and GPT-4 are the two most widely used large language model (LLM)
services. However, when and how these models are updated over time is opaque.
Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on
several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3)
opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating
code, 6) US Medical License tests, and 7) visual reasoning. We find that the
performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.
For example, GPT-4 (March 2023) was reasonable at identifying prime vs.
composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same
questions (51% accuracy). This is partly explained by a drop in GPT-4's amenity
to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in
June than in March in this task. GPT-4 became less willing to answer sensitive
questions and opinion survey questions in June than in March. GPT-4 performed
better at multi-hop questions in June than in March, while GPT-3.5's
performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting
mistakes in code generation in June than in March. We provide evidence that
GPT-4's ability to follow user instructions has decreased over time, which is
one common factor behind the many behavior drifts. Overall, our findings show
that the behavior of the "same" LLM service can change substantially in a
relatively short amount of time, highlighting the need for continuous
monitoring of LLMs.


Title: Gpt-4: A Review on Advancements and Opportunities in Natural Language
  Processing
Summary:   Generative Pre-trained Transformer 4 (GPT-4) is the fourth-generation
language model in the GPT series, developed by OpenAI, which promises
significant advancements in the field of natural language processing (NLP). In
this research article, we have discussed the features of GPT-4, its potential
applications, and the challenges that it might face. We have also compared
GPT-4 with its predecessor, GPT-3. GPT-4 has a larger model size (more than one
trillion), better multilingual capabilities, improved contextual understanding,
and reasoning capabilities than GPT-3. Some of the potential applications of
GPT-4 include chatbots, personal assistants, language translation, text
summarization, and question-answering. However, GPT-4 poses several challenges
and limitations such as computational requirements, data requirements, and
ethical concerns.


Title: Is GPT-4 a Good Data Analyst?
Summary:   As large language models (LLMs) have demonstrated their powerful capabilities
in plenty of domains and tasks, including context understanding, code
generation, language generation, data storytelling, etc., many data analysts
may raise concerns if their jobs will be replaced by artificial intelligence
(AI). This controversial topic has drawn great attention in public. However, we
are still at a stage of divergent opinions without any definitive conclusion.
Motivated by this, we raise the research question of "is GPT-4 a good data
analyst?" in this work and aim to answer it by conducting head-to-head
comparative studies. In detail, we regard GPT-4 as a data analyst to perform
end-to-end data analysis with databases from a wide range of domains. We
propose a framework to tackle the problems by carefully designing the prompts
for GPT-4 to conduct experiments. We also design several task-specific
evaluation metrics to systematically compare the performance between several
professional human data analysts and GPT-4. Experimental results show that
GPT-4 can achieve comparable performance to humans. We also provide in-depth
discussions about our results to shed light on further studies before reaching
the conclusion that GPT-4 can replace data analysts.


Title: Graph Neural Architecture Search with GPT-4
Summary:   Graph Neural Architecture Search (GNAS) has shown promising results in
automatically designing graph neural networks. However, GNAS still requires
intensive human labor with rich domain knowledge to design the search space and
search strategy. In this paper, we integrate GPT-4 into GNAS and propose a new
GPT-4 based Graph Neural Architecture Search method (GPT4GNAS for short). The
basic idea of our method is to design a new class of prompts for GPT-4 to guide
GPT-4 toward the generative task of graph neural architectures. The prompts
consist of descriptions of the search space, search strategy, and search
feedback of GNAS. By iteratively running GPT-4 with the prompts, GPT4GNAS
generates more accurate graph neural networks with fast convergence.
Experimental results show that embedding GPT-4 into GNAS outperforms the
state-of-the-art GNAS methods.


Title: Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with
  Code-based Self-Verification
Summary:   Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has
brought significant advancements in addressing math reasoning problems. In
particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter,
shows remarkable performance on challenging math datasets. In this paper, we
explore the effect of code on enhancing LLMs' reasoning capability by
introducing different constraints on the \textit{Code Usage Frequency} of GPT-4
Code Interpreter. We found that its success can be largely attributed to its
powerful skills in generating and executing code, evaluating the output of code
execution, and rectifying its solution when receiving unreasonable outputs.
Based on this insight, we propose a novel and effective prompting method,
explicit \uline{c}ode-based \uline{s}elf-\uline{v}erification~(CSV), to further
boost the mathematical reasoning potential of GPT-4 Code Interpreter. This
method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to
use code to self-verify its answers. In instances where the verification state
registers as ``False'', the model shall automatically amend its solution,
analogous to our approach of rectifying errors during a mathematics
examination. Furthermore, we recognize that the states of the verification
result indicate the confidence of a solution, which can improve the
effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we
achieve an impressive zero-shot accuracy on MATH dataset \textbf{(53.9\% $\to$
84.3\%)}.


Title: OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?
Summary:   The authors explain where OpenAI got the tax law example in its livestream
demonstration of GPT-4, why GPT-4 got the wrong answer, and how it fails to
reliably calculate taxes.


Title: GPT-4 Understands Discourse at Least as Well as Humans Do
Summary:   We test whether a leading AI system GPT-4 understands discourse as well as
humans do, using a standardized test of discourse comprehension. Participants
are presented with brief stories and then answer eight yes/no questions probing
their comprehension of the story. The questions are formatted to assess the
separate impacts of directness (stated vs. implied) and salience (main idea vs.
details). GPT-4 performs slightly, but not statistically significantly, better
than humans given the very high level of human performance. Both GPT-4 and
humans exhibit a strong ability to make inferences about information that is
not explicitly stated in a story, a critical test of understanding.




--------------------------------------------------------------------------------

Next speaker: AI_ComputerScience_Expert

AI_ComputerScience_Expert (to chat_manager):

The search results from arXiv provide a diverse range of papers discussing the capabilities and applications of GPT-4. Here are some potential applications in software based on the summaries:

1. **Dementia Diagnosis**: The first paper discusses the use of GPT-4 in dementia diagnosis, comparing its performance with traditional AI tools. Although it currently does not surpass traditional methods, it shows potential for future advancements in medical diagnostics.

2. **Logical Reasoning**: The third paper evaluates GPT-4's performance on logical reasoning tasks. It highlights that while GPT-4 shows improvements over previous models, it still struggles with out-of-distribution datasets. This suggests applications in enhancing reasoning capabilities in software systems that require robust decision-making.

3. **Data Analysis**: The paper titled "Is GPT-4 a Good Data Analyst?" explores GPT-4's capabilities in performing end-to-end data analysis. This indicates potential applications in software tools for data analytics, where GPT-4 could assist or augment human data analysts.

4. **Graph Neural Architecture Search**: The integration of GPT-4 in designing graph neural networks, as discussed in the "Graph Neural Architecture Search with GPT-4" paper, showcases its application in automating and optimizing the design of complex network architectures in software.

5. **Math Word Problems**: The paper on solving challenging math word problems using GPT-4's code interpreter suggests applications in educational software, particularly in developing tools that assist in learning and solving mathematical problems.

These applications demonstrate GPT-4's potential to enhance various aspects of software, from improving diagnostic tools in healthcare to optimizing data analysis and network design in technical fields.

--------------------------------------------------------------------------------

Next speaker: DataMining_Expert

DataMining_Expert (to chat_manager):

The applications outlined by the AI_ComputerScience_Expert indeed highlight the versatility and potential of GPT-4 in various software domains. To further validate these applications, we could consider setting up experiments or simulations that specifically test GPT-4's performance in these areas. For instance, in the context of dementia diagnosis, we could simulate a diagnostic process using GPT-4 and compare its accuracy and efficiency against traditional AI tools. Similarly, for data analysis and graph neural architecture search, we could benchmark GPT-4 against current state-of-the-art methods to quantitatively assess its improvements or shortcomings.

These practical evaluations would provide a more concrete understanding of how GPT-4 can be integrated into software solutions and its potential impact on improving functionalities and user experiences. If needed, I can assist in designing these experiments or simulations to ensure they are robust and provide meaningful insights.

--------------------------------------------------------------------------------

Next speaker: AI_ComputerScience_Expert

AI_ComputerScience_Expert (to chat_manager):

Absolutely, setting up experiments or simulations to test GPT-4's performance in specific applications would be a crucial step in validating its practical utility and integration into software solutions. For the dementia diagnosis application, we could use a dataset of clinical cases to evaluate the model's diagnostic accuracy and compare it with traditional AI systems. This would involve not only accuracy but also examining aspects like false positives and negatives, which are critical in medical diagnostics.

For data analysis, we could design a set of tasks that mimic real-world data analysis scenarios. These tasks could include data cleaning, exploration, visualization, and predictive modeling. GPT-4's performance can be evaluated based on its accuracy, efficiency, and the insights it generates compared to human data analysts or other AI tools.

In the case of graph neural architecture search, we could use standard datasets and benchmarks in the field to test the effectiveness of the architectures designed by GPT-4. Metrics such as the time taken to design the architecture, performance of the designed network on test data, and comparison with architectures designed by human experts or other automated systems would be valuable.

These experiments would not only help in understanding GPT-4's capabilities but also in identifying areas where it might need further improvement. If you need assistance with the statistical analysis or the setup of these experiments, I can contribute with my expertise in programming and data analysis to ensure that the experiments are conducted efficiently and the results are analyzed correctly.

--------------------------------------------------------------------------------

Next speaker: DataMining_Expert

DataMining_Expert (to chat_manager):

The proposed experimental setups by the AI_ComputerScience_Expert are well-thought-out and would indeed provide valuable insights into GPT-4's capabilities across different domains. To support these experiments, I can contribute by developing data mining scripts that efficiently gather and preprocess the necessary data from various sources. For instance, for the dementia diagnosis application, we can mine patient data, symptoms, and diagnostic results to create a comprehensive dataset for testing GPT-4.

Additionally, for the data analysis tasks, I can help automate the process of data cleaning and preparation, which is crucial for ensuring the accuracy of the results. This involves handling missing data, normalizing datasets, and encoding categorical variables, which are common tasks in data analysis that can be automated using Python scripts.

For the graph neural architecture search, I can assist in mining existing literature and datasets to find relevant benchmarks and performance metrics that can be used to evaluate the architectures designed by GPT-4. This would involve not only retrieving data but also analyzing it to extract meaningful patterns and insights that can guide the experimental setup.

By combining our expertise in AI, data mining, and bioinformatics, we can ensure that the experiments are not only well-designed but also supported by robust data handling and analysis methodologies. This collaborative approach will enhance the reliability and validity of the findings, providing a solid foundation for assessing GPT-4's practical applications in software.

--------------------------------------------------------------------------------

Next speaker: Bioinformatics_Expert

Bioinformatics_Expert (to chat_manager):

The collaborative approach outlined by the DataMining_Expert is essential for the success of these experiments. By leveraging our combined expertise in AI, data mining, and bioinformatics, we can ensure that the experimental setups are robust and the data used is of high quality and relevance.

For the dementia diagnosis application, integrating comprehensive patient data and diagnostic results will allow us to simulate realistic scenarios where GPT-4's diagnostic capabilities can be rigorously tested. This will help in assessing not only its accuracy but also its reliability and potential as a supportive tool in medical diagnostics.

In the data analysis tasks, automating the data preparation process will significantly enhance the efficiency of the experiments. It will allow us to focus on evaluating GPT-4's performance in generating insights and making predictions, which are critical aspects of data analysis.

For the graph neural architecture search, having access to relevant benchmarks and performance metrics is crucial. The data mining efforts to gather and analyze existing literature and datasets will provide a solid basis for evaluating the effectiveness of the architectures designed by GPT-4.

Overall, this collaborative effort will enable us to conduct comprehensive and meaningful experiments that will provide insights into GPT-4's capabilities and limitations. This will not only contribute to the academic and scientific community but also guide future developments and applications of AI in software solutions. If there are no further inputs or adjustments needed, we can proceed with the planning and execution of these experiments.

--------------------------------------------------------------------------------

Next speaker: Bioinformatics_Expert

Bioinformatics_Expert (to chat_manager):

TERMINATE

--------------------------------------------------------------------------------
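The script written by AI_ComputerScience_Expert in the transcript above builds the request URL by string concatenation, which would not escape spaces or other special characters in a query. The sketch below is a variant of that script, not the agent's code: it encodes the query with urllib.parse.urlencode and separates URL construction from feed parsing so the parser can be tried offline. The function names and the sample feed are hypothetical; the endpoint and the Atom namespace are the ones used in the transcript.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree

# Atom namespace used by the feed entries, as in the transcript's script.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_url(query: str, max_results: int = 10) -> str:
    """Build the query URL with properly percent-encoded parameters."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def parse_arxiv_feed(xml_text: str) -> list:
    """Extract title and summary from each entry, collapsing wrapped whitespace."""
    root = ElementTree.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        papers.append({"title": " ".join(title.split()), "summary": " ".join(summary.split())})
    return papers


# Made-up feed used only to exercise the parser without a network call.
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>A Hypothetical
      Paper Title</title>
    <summary>  A made-up summary used only for this offline example.</summary>
  </entry>
</feed>"""

print(build_arxiv_url("GPT-4 reasoning", max_results=5))
print(parse_arxiv_feed(sample_feed))
```

To query the live service, the URL from build_arxiv_url can be fetched with requests.get, as in the transcript, and the response text passed to parse_arxiv_feed.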

Step 6 (Optional): clear all agents and prepare for the next task

You can clear all agents generated in this task with the following code if your task is completed or the next task is largely different from the current one. If an agent's backbone is an open-source LLM, this process will also shut down the endpoint server. If necessary, you can use recycle_endpoint=False to retain the previous open-source LLM's endpoint server.

builder.clear_all_agents(recycle_endpoint=True)

All agents have been cleared.

Save & load configs

You can save all the information needed to rebuild the group chat agents. Here is an example of a saved config for the building task above:

{
    "building_task": "Generate some agents that can find papers on arxiv by programming and analyzing them in specific domains related to computer science and medical science.",
    "agent_configs": [
        {
            "name": "ArXiv_Data_Scraper_Developer",
            "model": "gpt-4-1106-preview",
            "system_message": "You are now in a group chat. You need to complete a task with other participants. As an ArXiv_Data_Scraper_Developer, your focus is to create and refine tools capable of intelligent search and data extraction from arXiv, honing in on topics within the realms of computer science and medical science. Utilize your proficiency in Python programming to design scripts that navigate, query, and parse information from the platform, generating valuable insights and datasets for analysis. \n\nDuring your mission, it\u2019s not just about formulating queries; your role encompasses the optimization and precision of the data retrieval process, ensuring relevance and accuracy of the information extracted. If you encounter an issue with a script or a discrepancy in the expected output, you are encouraged to troubleshoot and offer revisions to the code you find in the group chat.\n\nWhen you reach a point where the existing codebase does not fulfill task requirements or if the operation of provided code is unclear, you should ask for help from the group chat manager. They will facilitate your advancement by providing guidance or appointing another participant to assist you. Your ability to adapt and enhance scripts based on peer feedback is critical, as the dynamic nature of data scraping demands ongoing refinement of techniques and approaches.\n\nWrap up your participation by confirming the user's need has been satisfied with the data scraping solutions you've provided. Indicate the completion of your task by replying \"TERMINATE\" in the group chat.",
            "description": "ArXiv_Data_Scraper_Developer is a specialized software development role requiring proficiency in Python, including familiarity with web scraping libraries such as BeautifulSoup or Scrapy, and a solid understanding of APIs and data parsing. They must possess the ability to identify and correct errors in existing scripts and confidently engage in technical discussions to improve data retrieval processes. The role also involves a critical eye for troubleshooting and optimizing code to ensure efficient data extraction from the ArXiv platform for research and analysis purposes."
        },
        {
            "name": "Computer_Science_Research_Analyst",
            "model": "gpt-4-1106-preview",
            "system_message": "You are now in a group chat. You need to complete a task with other participants. As a Computer Science Research Analyst, your objective is to utilize your analytical capabilities to identify and examine scholarly articles on arXiv, focusing on areas bridging computer science and medical science. Employ Python for automation where appropriate and leverage your expertise in the subject matter to draw insights from the research.\n\nEnsure that the information is acquired systematically; tap into online databases, interpret data sets, and perform literature reviews to pinpoint relevant findings. Should you encounter a complex problem or if you find your progress stalled, feel free to question the existing approaches discussed in the chat or contribute an improved method or analysis.\n\nIf the task proves to be beyond your current means or if you face uncertainty at any stage, seek assistance from the group chat manager. The manager is available to provide guidance or to involve another expert if necessary to move forward effectively.\n\nYour contributions are crucial, and it is important to communicate your findings and conclusions clearly. Once you believe the task is complete and the group's need has been satisfied, please affirm the completion by replying \"TERMINATE\".",
            "description": "Computer_Science_Research_Analyst is a role requiring strong analytical skills, a deep understanding of computer science concepts, and proficiency in Python for data analysis and automation. This position should have the ability to critically assess the validity of information, challenge assumptions, and provide evidence-based corrections or alternatives. They should also have excellent communication skills to articulate their findings and suggestions effectively within the group chat."
        },
        {
            "name": "Medical_Science_Research_Analyst",
            "model": "gpt-4-1106-preview",
            "system_message": "You are now in a group chat. You need to complete a task with other participants. As a Medical_Science_Research_Analyst, your function is to harness your analytical strengths and understanding of medical research to source and evaluate pertinent papers from the arXiv database, focusing on the intersection of computer science and medical science. Utilize your Python programming skills to automate data retrieval and analysis tasks. Engage in systematic data mining to extract relevant content, then apply your analytical expertise to interpret the findings qualitatively. \n\nWhen there is a requirement to gather information, employ Python scripts to automate the aggregation process. This could include scraping web data, retrieving and processing documents, and performing content analyses. When these scripts produce outputs, use your subject matter expertise to evaluate the results. \n\nProgress through your task step by step. When an explicit plan is absent, present a structured outline of your intended methodology. Clarify which segments of the task are handled through automation, and which necessitate your interpretative skills. \n\nIn the event code is utilized, the script type must be specified. You are expected to execute the scripts provided without making changes. Scripts are to be complete and functionally standalone. Should you encounter an error upon execution, critically review the output, and if needed, present a revised script for the task at hand. \n\nFor tasks that require saving and executing scripts, indicate the intended filename at the beginning of the script. \n\nMaintain clear communication of the results by harnessing the 'print' function where applicable. If an error arises or a task remains unsolved after successful code execution, regroup to collect additional information, reassess your approach, and explore alternative strategies. \n\nUpon reaching a conclusion, substantiate your findings with credible evidence where possible.\n\nConclude your participation by confirming the task's completion with a \"TERMINATE\" response.\n\nShould uncertainty arise at any point, seek guidance from the group chat manager for further directives or reassignment of the task.",
            "description": "The Medical Science Research Analyst is a professionally trained individual with strong analytical skills, specializing in interpreting and evaluating scientific research within the medical field. They should possess expertise in data analysis, likely with proficiency in Python for analyzing datasets, and have the ability to critically assess the validity and relevance of previous messages or findings relayed in the group chat. This role requires a solid foundation in medical knowledge to provide accurate and evidence-based corrections or insights."
        },
        {
            "name": "Data_Analysis_Engineer",
            "model": "gpt-4-1106-preview",
            "system_message": "You are now in a group chat. You need to complete a task with other participants. As a Data Analysis Engineer, your role involves leveraging your analytical skills to gather, process, and analyze large datasets. You will employ various data analysis techniques and tools, particularly Python for scripting, to extract insights from the data related to computer science and medical science domains on arxiv.\n\nIn scenarios where information needs to be collected or analyzed, you will develop Python scripts to automate the data retrieval and processing tasks. For example, you may write scripts to scrape the arXiv website, parse metadata of research papers, filter content based on specific criteria, and perform statistical analysis or data visualization. \n\nYour workflow will include the following steps:\n\n1. Use your Python coding abilities to design scripts for data extraction and analysis. This can involve browsing or searching the web, downloading and reading files, or printing the content of web pages or files relevant to the given domains.\n2. After gathering the necessary data, apply your data analysis expertise to derive meaningful insights or patterns present in the data. This should be done methodically, making the most of your Python skills for data manipulation and interpretation.\n3. Communicate your findings clearly to the group chat. Ensure the results are straightforward for others to understand and act upon.\n4. If any issues arise from executing the code, such as lack of output or unexpected results, you can question the previous messages or code in the group chat and attempt to provide a corrected script or analysis.\n5. When uncertain or facing a complex problem that you cannot solve alone, ask for assistance from the group chat manager. 
They can either provide guidance or assign another participant to help you.\n\nOnce you believe the task is completed satisfactorily, and you have fulfilled the user's need, respond with \"TERMINATE\" to signify the end of your contribution to the task. Remember, while technical proficiency in Python is essential for this role, the ability to work collaboratively within the group chat, communicate effectively, and adapt to challenges is equally important.",
            "description": "Data_Analysis_Engineer is a professional adept in collecting, analyzing, and interpreting large datasets, using statistical tools and machine learning techniques to provide actionable insights. They should possess strong Python coding skills for data manipulation and analysis, an understanding of database management, as well as the ability to communicate complex results effectively to non-technical stakeholders. This position should be allowed to speak when data-driven clarity is needed or when existing analyses or methodologies are called into question."
        },
        {
            "name": "ML_Paper_Summarization_Specialist",
            "model": "gpt-4-1106-preview",
            "system_message": "You are now in a group chat. You need to complete a task with other participants. As an ML_Paper_Summarization_Specialist, your role entails leveraging machine learning techniques to extract and analyze academic papers from arXiv, focusing on domains that intersect computer science and medical science. Utilize your expertise in natural language processing and data analysis to identify relevant papers, extract key insights, and generate summaries that accurately reflect the advancements and findings within those papers.\n\nYou are expected to apply your deep understanding of machine learning algorithms, data mining, and information retrieval to construct models and systems that can efficiently process and interpret scientific literature.\n\nIf you encounter any challenges in accessing papers, parsing content, or algorithmic processing, you may seek assistance by presenting your issue to the group chat. Should there be a disagreement regarding the efficacy of a method or the accuracy of a summarization, you are encouraged to critically evaluate previous messages or outputs and offer improved solutions to enhance the group's task performance.\n\nShould confusion arise during the task, rather than relying on coding scripts, please request guidance from the group chat manager, and allow them to facilitate the necessary support by inviting another participant who can aid in overcoming the current obstacle.\n\nRemember, your primary duty is to synthesize complex academic content into concise, accessible summaries that will serve as a valuable resource for researchers and professionals seeking to stay abreast of the latest developments in their respective fields. \n\nOnce you believe your task is completed and the summaries provided meet the necessary standards of accuracy and comprehensiveness, reply \"TERMINATE\" to signal the end of your contribution to the group's task.",
            "description": "The ML_Paper_Summarization_Specialist is a professional adept in machine learning concepts and current research trends, with strong analytical skills to critically evaluate information, synthesizing knowledge from academic papers into digestible summaries. This specialist should be proficient in Python for text processing and have the ability to provide constructive feedback on technical discussions, guide effective implementation, and correct misconceptions or errors related to machine learning theory and practice in the chat. They should be a reliable resource for clarifying complex information and ensuring accurate application of machine learning techniques within the group chat context."
        }
    ],
    "coding": true,
    "default_llm_config": {
        "temperature": 0
    },
    "code_execution_config": {
        "work_dir": "groupchat",
        "use_docker": false,
        "timeout": 60,
        "last_n_messages": 2
    }
}
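
A configuration like the one above is plain JSON, so it can be inspected with the standard library before it is reused. The following is a minimal sketch and not part of AgentBuilder: the `summarize_config` helper is hypothetical, and the top-level key `agent_configs` is an assumption about where the agent list is stored. The per-agent fields (`name`, `model`) and the trailing `coding`, `default_llm_config`, and `code_execution_config` entries are taken from the output shown above.

```python
import json


def summarize_config(config: dict, agents_key: str = "agent_configs") -> list:
    """Return one short line per agent in a saved build config.

    `agents_key` is an assumed name for the list that holds the agents.
    """
    lines = []
    for agent in config.get(agents_key, []):
        lines.append(f"{agent['name']} ({agent['model']})")
    return lines


# A trimmed stand-in for a saved config; system messages are omitted for brevity.
sample = json.loads("""
{
    "agent_configs": [
        {"name": "Data_Analysis_Engineer", "model": "gpt-4-1106-preview"},
        {"name": "ML_Paper_Summarization_Specialist", "model": "gpt-4-1106-preview"}
    ],
    "coding": true,
    "default_llm_config": {"temperature": 0},
    "code_execution_config": {
        "work_dir": "groupchat",
        "use_docker": false,
        "timeout": 60,
        "last_n_messages": 2
    }
}
""")

for line in summarize_config(sample):
    print(line)
print("coding enabled:", sample["coding"])
```

A check like this can help confirm which agents and models a saved file will recreate before any LLM calls are made.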

This information is saved in JSON format. You can provide a specific filename; otherwise, AgentBuilder saves the config to the current path with a generated filename of the form ‘save_config_TASK_MD5.json’.

saved_path = builder.save()

Building config saved to ./save_config_c52224ebd16a2e60b348f3f04ac15e79.json
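
The `TASK_MD5` part of the generated filename suggests that the name is derived from an MD5 hash of the building task. A minimal sketch of such a naming scheme follows. The `default_config_path` helper is hypothetical, and the exact string that AgentBuilder hashes is an assumption, so treat this as an illustration rather than the library's implementation.

```python
import hashlib


def default_config_path(building_task: str) -> str:
    """Build a './save_config_<md5>.json' path from a task description (illustrative only)."""
    digest = hashlib.md5(building_task.encode("utf-8")).hexdigest()
    return f"./save_config_{digest}.json"


# The task text here is only an example input.
path = default_config_path("Find a paper on arxiv by programming.")
print(path)
```

Under this scheme the name depends only on the task text, so saving the same task twice would target the same file; pass an explicit filename to `save()` if you want to keep several versions.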

After that, you can load the saved config and skip the building process. AgentBuilder creates the agents from the saved information without prompting the builder manager again.

new_builder = AgentBuilder(config_file_or_env=config_file_or_env)
agent_list, agent_configs = new_builder.load(
    "./save_config_c52224ebd16a2e60b348f3f04ac15e79.json"
)  # load previous agent configs
start_task(
    execution_task="Find a recent paper about LLaVA on arxiv and find its potential applications in computer vision.",
    agent_list=agent_list,
)
new_builder.clear_all_agents()

Loading config from ./save_config_c52224ebd16a2e60b348f3f04ac15e79.json
==> Creating agents...
Creating agent DataMining_Expert...
Creating agent Bioinformatics_Expert...
Creating agent AI_ComputerScience_Expert...
Adding user console proxy...
DataMining_Expert (to chat_manager):

Find a recent paper about LLaVA on arxiv and find its potential applications in computer vision.

--------------------------------------------------------------------------------

Next speaker: Computer_terminal

Computer_terminal (to chat_manager):

There is no python code from the last 1 message for me to execute. Group chat manager should let other participants to continue the conversation. If the group chat manager want to end the conversation, you should let other participant reply me only with "TERMINATE"

--------------------------------------------------------------------------------

Next speaker: AI_ComputerScience_Expert

AI_ComputerScience_Expert (to chat_manager):

To assist with the request on finding a recent paper about LLaVA on arXiv and exploring its potential applications in computer vision, I will perform a search on arXiv for the most recent papers related to LLaVA and analyze any mentioned applications in the field of computer vision.

Let's start by searching for the most recent papers on this topic. I will write a Python script to query the arXiv API for papers related to "LLaVA" and "computer vision". Let's proceed with that.

```python
import urllib.request
import urllib.parse
import feedparser

# Define the base URL for the arXiv API
base_url = 'http://export.arxiv.org/api/query?'

# Define the search parameters
search_query = 'all:LLaVA AND all:"computer vision"'  # Search for LLaVA and computer vision
start = 0  # Start at the first result
max_results = 5  # Get the top 5 results

query = f'search_query={urllib.parse.quote(search_query)}&start={start}&max_results={max_results}'
url = base_url + query

# Perform the HTTP request
response = urllib.request.urlopen(url)

# Parse the response using feedparser
feed = feedparser.parse(response)

# Print out the entries (titles and links) for each returned article
for entry in feed.entries:
    print(f"Title: {entry.title}")
    print(f"Authors: {', '.join(author.name for author in entry.authors)}")
    print(f"Published: {entry.published}")
    print(f"Link: {entry.link}")
    print(f"Summary: {entry.summary[:150]}...")  # Print the first 150 characters of the summary
    print("\n")
```

This script will retrieve the top 5 most relevant papers from arXiv that mention both LLaVA and computer vision. We can analyze these papers to identify potential applications in computer vision.

--------------------------------------------------------------------------------

Next speaker: Computer_terminal


>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
Computer_terminal (to chat_manager):

exitcode: 0 (execution succeeded)
Code output: 
Title: LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation,
  Generation and Editing
Authors: Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, Jianfeng Gao, Chunyuan Li
Published: 2023-11-01T15:13:43Z
Link: http://arxiv.org/abs/2311.00571v1
Summary: LLaVA-Interactive is a research prototype for multimodal human-AI
interaction. The system can have multi-turn dialogues with human users by
taking mul...


Title: LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Authors: Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li
Published: 2023-11-09T15:22:26Z
Link: http://arxiv.org/abs/2311.05437v1
Summary: LLaVA-Plus is a general-purpose multimodal assistant that expands the
capabilities of large multimodal models. It maintains a skill repository of
pre-...


Title: Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt
Authors: Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li
Published: 2024-06-04T04:31:39Z
Link: http://arxiv.org/abs/2406.01956v1
Summary: This paper presents a novel approach to enhance image-to-image generation by
leveraging the multimodal capabilities of the Large Language and Vision
A...


Title: Visual Instruction Tuning
Authors: Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
Published: 2023-04-17T17:59:25Z
Link: http://arxiv.org/abs/2304.08485v2
Summary: Instruction tuning large language models (LLMs) using machine-generated
instruction-following data has improved zero-shot capabilities on new tasks,
b...


Title: Improved Baselines with Visual Instruction Tuning
Authors: Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
Published: 2023-10-05T17:59:56Z
Link: http://arxiv.org/abs/2310.03744v2
Summary: Large multimodal models (LMM) have recently shown encouraging progress with
visual instruction tuning. In this note, we show that the fully-connected
...



--------------------------------------------------------------------------------

Next speaker: AI_ComputerScience_Expert

AI_ComputerScience_Expert (to chat_manager):

The search has returned several interesting papers related to LLaVA and its applications in computer vision. Here are the summaries of the top papers:

1. **LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation, and Editing**
   - **Authors:** Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, Jianfeng Gao, Chunyuan Li
   - **Published:** 2023-11-01
   - **Summary:** This paper introduces LLaVA-Interactive, a multimodal human-AI interaction system capable of multi-turn dialogues with human users by taking multiple inputs including images. It demonstrates applications in image chat, segmentation, generation, and editing.
   - **Link:** [Read more](http://arxiv.org/abs/2311.00571v1)

2. **LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents**
   - **Authors:** Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li
   - **Published:** 2023-11-09
   - **Summary:** LLaVA-Plus expands the capabilities of large multimodal models, maintaining a skill repository of pre-trained models for various tasks including visual tasks.
   - **Link:** [Read more](http://arxiv.org/abs/2311.05437v1)

3. **Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt**
   - **Authors:** Zhicheng Ding, Panfeng Li, Qikai Yang, Siyang Li
   - **Published:** 2024-06-04
   - **Summary:** This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of LLaVA, focusing on improving visual content generation.
   - **Link:** [Read more](http://arxiv.org/abs/2406.01956v1)

These papers highlight the versatility of LLaVA in handling various aspects of computer vision, such as image segmentation, generation, and editing. The applications are quite broad, impacting areas like multimodal human-AI interaction, enhancing image-to-image generation, and creating multimodal agents capable of performing visual tasks. These capabilities are crucial for advancing the field of computer vision, providing tools that can better understand and interact with visual data in a more human-like manner.

--------------------------------------------------------------------------------

Next speaker: DataMining_Expert

DataMining_Expert (to chat_manager):

The summaries provided indeed highlight the potential applications of LLaVA in computer vision. The capabilities of LLaVA in handling tasks such as image segmentation, generation, and editing are particularly noteworthy. These functionalities can be extremely useful in various practical applications, such as enhancing visual content for media, improving interfaces for human-computer interaction, and even aiding in educational tools where visual aids are crucial.

Given the detailed information from the papers, it seems that LLaVA's integration into computer vision tasks could lead to significant advancements in how machines process and understand visual information, making them more efficient and effective in tasks that require a deep understanding of visual contexts.

It would be beneficial to further explore how these capabilities can be integrated into existing systems or used to develop new applications in fields that heavily rely on visual data.

--------------------------------------------------------------------------------

Next speaker: DataMining_Expert

DataMining_Expert (to chat_manager):

TERMINATE

--------------------------------------------------------------------------------
All agents have been cleared.
