News
Anthropic has released its next generation of AI models, Claude Opus 4 and Claude Sonnet 4, and is introducing new safety measures designed to prevent their use in developing chemical, biological, ...
OpenAI has released a new benchmark for testing AI systems in healthcare. Called HealthBench, it's designed to evaluate how well language models handle realistic medical conversations. According to ...
Anthropic has introduced a new "AI for Science" initiative that offers up to $20,000 per month in API usage credits to selected researchers. According to the company, applicants are evaluated using ...
LMArena has become one of the most prominent public benchmarking platforms for large language models. The platform operates by presenting users with head-to-head comparisons of model responses, asking ...
Researchers at the University of Zurich conducted an unauthorized experiment on the popular Reddit community r/ChangeMyView (CMV), using AI-powered accounts to test the persuasive ability of large ...
OpenAI’s latest language models, o3 and o4-mini, incorporate advanced reasoning capabilities and extensive tool use, including image analysis, Python execution, and web browsing. According to OpenAI, ...
ByteDance introduces Seedream 3.0, a new text-to-image model. Benchmarks suggest improvements over GPT-4o and Midjourney in speed, accuracy, and visual quality. ByteDance has released Seedream 3.0, a ...
OpenAI has unveiled o3 and o4-mini, the newest additions to its o-series lineup that the company claims are its most intelligent models to date. According to OpenAI, a key advancement is the models ...
Anthropic has enhanced its AI assistant Claude with two new capabilities: an agent-based research function and Google Workspace integration, both designed to significantly expand the chatbot's ...
OpenAI adds three new GPT-4.1 models to its API. The models are designed to outperform GPT-4o in most areas, while lowering costs and improving speed. OpenAI has introduced a new family of language ...
Jean Rémi King: This is a highly debated question in the field, and I want to emphasize that what I’m expressing here is my personal opinion—it’s not a scientific consensus. It’s a long-standing ...
The model aims to deliver similar performance to closed systems like OpenAI's o3-mini, but with a smaller footprint. According to benchmark tests on LiveCodeBench, DeepCoder-14B performs at the same ...