# Prompt to generate questions
qa_generate_prompt_tmpl = """\
Context information is below.

---------------------
{context_str}
---------------------

Given the context information and not prior knowledge.
generate only questions based on the below query.

You are a Professor. Your task is to setup \
{num_questions_per_chunk} questions for an upcoming \
quiz/examination. The questions should be diverse in nature \
across the document. The questions should not contain options, not start with Q1/ Q2. \
Restrict the questions to the context information provided.\
"""

llm = Anthropic(api_key=anthropic_api_key)

# Pass the custom template explicitly; if it is omitted, the library default is used.
qa_dataset = generate_question_context_pairs(
    nodes,
    llm=llm,
    qa_generate_prompt_tmpl=qa_generate_prompt_tmpl,
    num_questions_per_chunk=2,
)
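To see what the LLM receives for each chunk, the two placeholders in the template can be filled by hand. The following is a minimal sketch using plain `str.format` on a shortened copy of the template. `QA_TMPL`, `render_prompt`, and the sample sentence are hypothetical names chosen for illustration; they are not part of LlamaIndex, whose internal formatting may differ.

```python
# Shortened stand-in for the template above; it keeps the same two placeholders.
QA_TMPL = """\
Context information is below.
---------------------
{context_str}
---------------------
You are a Professor. Your task is to setup \
{num_questions_per_chunk} questions for an upcoming \
quiz/examination.\
"""


def render_prompt(context_str: str, num_questions_per_chunk: int = 2) -> str:
    """Fill both placeholders (hypothetical helper, for illustration only)."""
    return QA_TMPL.format(
        context_str=context_str,
        num_questions_per_chunk=num_questions_per_chunk,
    )


prompt = render_prompt("Llamas are domesticated South American camelids.")
print(prompt)
```

The trailing backslashes inside the triple-quoted string are line continuations, so the instruction sentence reaches the model as a single line even though it is wrapped in the source.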
The generated queries can include LLM preambles such as "Here are 2 questions based on provided context". The function below filters them out.
# function to clean the dataset
def filter_qa_dataset(qa_dataset):
    """
    Filters out queries from the qa_dataset that contain certain phrases and the corresponding
    entries in the relevant_docs, and creates a new EmbeddingQAFinetuneDataset object with
    the filtered data.

    :param qa_dataset: An object that has 'queries', 'corpus', and 'relevant_docs' attributes.
    :return: An EmbeddingQAFinetuneDataset object with the filtered queries, corpus and relevant_docs.
    """

    # Extract keys from queries and relevant_docs that need to be removed
    queries_relevant_docs_keys_to_remove = {
        k for k, v in qa_dataset.queries.items()
        if 'Here are 2' in v or 'Here are two' in v
    }

    # Filter queries and relevant_docs using dictionary comprehensions
    filtered_queries = {
        k: v for k, v in qa_dataset.queries.items()
        if k not in queries_relevant_docs_keys_to_remove
    }
    filtered_relevant_docs = {
        k: v for k, v in qa_dataset.relevant_docs.items()
        if k not in queries_relevant_docs_keys_to_remove
    }

    # Create a new instance of EmbeddingQAFinetuneDataset with the filtered data
    return EmbeddingQAFinetuneDataset(
        queries=filtered_queries,
        corpus=qa_dataset.corpus,
        relevant_docs=filtered_relevant_docs
    )
# filter out pairs with phrases `Here are 2 questions based on provided context`
qa_dataset = filter_qa_dataset(qa_dataset)
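The phrases 'Here are 2' and 'Here are two' are hard-coded to match `num_questions_per_chunk=2`, so a different question count would slip past the filter. A more general check can be sketched with a regular expression, assuming preambles always begin with "Here are" followed by digits or a small number word. `QADataset` is a hypothetical stand-in with the same three attributes so the example runs without LlamaIndex; `PREAMBLE_RE` and `drop_preamble_queries` are likewise illustrative names, not library API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QADataset:
    """Hypothetical stand-in exposing queries, corpus and relevant_docs."""
    queries: Dict[str, str]
    corpus: Dict[str, str]
    relevant_docs: Dict[str, List[str]]


# Assumption: preambles look like "Here are 2 ..." or "Here are two ...".
PREAMBLE_RE = re.compile(
    r"\bhere are (\d+|one|two|three|four|five)\b", re.IGNORECASE
)


def drop_preamble_queries(dataset: QADataset) -> QADataset:
    """Return a new dataset without the queries that match PREAMBLE_RE."""
    keep = {k for k, v in dataset.queries.items() if not PREAMBLE_RE.search(v)}
    return QADataset(
        queries={k: v for k, v in dataset.queries.items() if k in keep},
        corpus=dataset.corpus,  # the corpus is left untouched
        relevant_docs={
            k: v for k, v in dataset.relevant_docs.items() if k in keep
        },
    )


toy = QADataset(
    queries={
        "q1": "Here are 2 questions based on provided context:",
        "q2": "What do llamas eat?",
        "q3": "Here are three questions based on the context:",
    },
    corpus={"n1": "Llamas are herbivores that graze on grass."},
    relevant_docs={"q1": ["n1"], "q2": ["n1"], "q3": ["n1"]},
)
cleaned = drop_preamble_queries(toy)
print(sorted(cleaned.queries))  # prints ['q2']
```

As in the original function, the input dataset is not modified; a new object is built from the surviving keys, which keeps `queries` and `relevant_docs` in sync.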