Alibaba’s Qwen Deep Research Creates Live Webpages, Podcasts in Seconds

Qwen, the dedicated AI research group within the Chinese tech giant Alibaba, released a significant upgrade to its AI chatbot last week, enabling users to generate comprehensive research documents on any topic.

You can then easily convert those documents into clean webpages or multi-speaker podcasts with just a few clicks.

Qwen Chat is similar to ChatGPT, DeepSeek, or Claude in terms of UI and is available worldwide for free.

Qwen Deep Research just got a major upgrade. ⚡️

It now creates not only the report, but also a live webpage and a podcast - Powered by Qwen3-Coder, Qwen-Image, and Qwen3-TTS.

Your insights, now visual and audible. ✨
https://t.co/wESb7vfAnD pic.twitter.com/eRvjKU222O

— Qwen (@Alibaba_Qwen) October 21, 2025

The new functionality runs on three open-source models working in concert: Qwen3-Coder handles web structure, Qwen-Image generates inline graphics, and Qwen3-TTS powers dynamic audio narration.

Despite relying on open-source models, the end-to-end experience—including research execution, web deployment, and audio generation—is hosted and operated by Qwen as a managed service.

The workflow starts inside Qwen Chat, where users pose research questions. The AI conducts web searches after some clarifications, analyzes data from public sources, and generates a comprehensive report with citations.

From there, two new options appear: "Web Dev" produces a live, professional-grade webpage automatically deployed and hosted by Qwen, complete with inline graphics.

"Podcast," meanwhile, offers an audio discussion featuring dynamic multi-speaker narration, with 17 host voices and seven co-host options.

Testing the models

To assess how Qwen stacked up as a research tool, we ran the same complex research query across it, Gemini, ChatGPT, and Grok. The task, which can be reviewed on our GitHub repo, was to analyze philosophical and scientific arguments for and against God's existence. Each model generated a full research report. The evaluation involved five criteria: accuracy of claims and citations, information provided, clarity of explanation, intellectual richness, and overall quality.

TL;DR: Qwen Deep Research wins for analytical depth, citation, and its unique auto-generated webpages, making it ideal for academics and creators. It's also the best all-in-one free alternative for researchers. But Gemini still leads in audio and video quality, while ChatGPT and Grok remain fine for casual use but lack Qwen’s reach and Google’s polish.

Here's a more in-depth review:

Accuracy: Were philosophical positions and scientific claims represented correctly, with proper source attribution?

Qwen nailed the details. When discussing the cosmological argument, it properly cited academic sources like Bertrand Russell’s “Why I Am Not a Christian” and the debate between William Lane Craig and Peter Atkins, with specific references. Unlike other AI research tools such as Perplexity or Grok, Qwen drew mostly on reputable academic sources, sometimes even the original source. It included links from Stanford, Princeton, Oxford, and Drew, and added pertinent analysis from Quora and Facebook when relevant.


Gemini matched this precision with 94 numbered citations, some of which were duplicated when referenced in different parts of the report.

Gemini also correctly distinguished between related concepts. Both Qwen and Gemini avoided sloppy errors, such as conflating biblical literalism with general theism.

ChatGPT relied heavily on the Stanford Encyclopedia of Philosophy, but sometimes oversimplified. Grok gave accurate summaries but with vaguer attribution—saying things like "traced to Plato, Aristotle" without specific works.

Result: Qwen and Gemini were the best.

Information Provided: How thorough was the research?

Qwen was the only model to include a section called "Critiques of Atheism: The Burden of Proof and the Nature of Evidence." This section examined a type of debate none of the others touched. It distinguished between "weak atheism" (skepticism toward God claims) and "gnostic atheism" (positive assertion God doesn't exist), and cited specific atheist thinkers, such as Gary Whittenberger and his "beyond a reasonable doubt" standard.

Here's an example passage from Qwen: "One of the most contentious issues is the burden of proof. Bertrand Russell famously illustrated this with his teapot analogy: just as he could not prove that a tiny teapot does not orbit the sun between Earth and Mars, he argued that theists could not prove that God does exist."

No other model went this deep into burden-of-proof debates, probably because the issue was not central to the topic. Gemini came close with strong coverage of consciousness arguments and the "God-of-the-gaps" critique. ChatGPT included pragmatic arguments like Pascal's Wager and explored real-world implications for ethics and policy. Grok kept it concise, at about one-third the length of Qwen's report, but added a helpful summary table.

Result: Qwen was the most exhaustive.

Clarity: How was the research expressed?

Grok used a clean table to organize arguments by type (Philosophical vs. Scientific, For vs. Against). Its section breaks were explicit: "Philosophical Arguments," "Scientific Arguments," "Unexpected Detail." Anyone could scan it quickly.

ChatGPT used tons of parenthetical clarifications, making complex ideas more digestible. Example: "if God's existence is even possible (i.e., logically coherent), then God exists necessarily." The "(i.e., logically coherent)" helps readers who aren't philosophy majors.

Qwen and Gemini, on the other hand, were more academic in style. Qwen organized the content under formal headings like "Theistic Arguments for God's Existence: Cosmological and Teleological Foundations," which made the report feel dense despite its accuracy. Gemini used Roman numerals (I. Introduction, II. Philosophical Arguments), which looked structured but required closer reading.

Both Qwen and Gemini target researchers doing serious work. ChatGPT and Grok target broader audiences.

Result: ChatGPT presented information the most clearly, followed by Grok.

Diversity of sources: Does the research draw from varied traditions, disciplines, and perspectives?

Qwen integrated technical philosophy (kalām, PSR, modal S5 logic) with live scientific debates (Big Bang singularities, quantum fluctuations, DNA functionality). Its explanations were specific, giving background and examples for each position and argument.

For instance, when explaining theistic arguments for God’s existence, Qwen built a table to make it easier to understand the premises, critiques, and proponents of the most relevant arguments.

Gemini matched this by covering consciousness arguments that most models ignored. It also warned against "God-of-the-gaps" reasoning more explicitly than competitors.

ChatGPT brought unique value with its massive "Implications" section, exploring how the debate shapes science education policy, bioethics laws, and personal attitudes toward death. This was less academic and more pragmatic, but still relevant to understanding the nature of the investigation.

Grok covered the major arguments but in less detail. It mentioned fine-tuning and the anthropic principle, but didn't cite specific values or go into much depth.

Result: Qwen and Gemini were the best.

Quality: Taking everything together (rigor, coherence, scholarly value), which research would you want to cite?

Both Qwen and Gemini produced reports you could submit to your professor. Qwen's unique strength was balancing depth on both theistic lines and atheistic critiques, including that burden-of-proof section. Gemini's strength was integrating scientific frontiers (consciousness, evolution, cosmology) with philosophical arguments.

ChatGPT delivered substantial pedagogical value—great for teaching or understanding implications. Grok worked as a reliable primer or quick reference.

In other words, ChatGPT and Grok are probably the ones you would use if you just want to know something quickly for a conversation, to impress your nerd date, or to refresh your knowledge before a presentation on something you already know.

Final Scores:

  • Qwen: 9/10
  • Gemini: 9/10
  • ChatGPT: 8/10
  • Grok: 6/10

The podcast battle: Qwen vs Gemini

Qwen's podcast feature puts it head-to-head with Google's NotebookLM and Gemini, which pioneered AI-generated Audio Overviews.

Unlike Gemini, Qwen offers a large variety of host voices to choose from. The structure is solid: two AI hosts have an actual conversation about your research, not just a text-to-speech read-through.

That said, the voice quality is inconsistent. Some voices are natural, but most of them sound robotic with weird accents. During testing, one of the male hosts kept saying "oh oh oh" repeatedly, because he was impressed. My wife passed by and asked if I was watching porn.

With some trial and error, you can find a decent voice that works smoothly, and the quality increases considerably.

But Gemini and NotebookLM crush Qwen here. Google's Audio Overviews feature—introduced in NotebookLM in September 2024, expanded to Gemini in March 2025—sounds remarkably human. The speech patterns are natural, with back-and-forth banter and even humor.

Gemini's podcasts feel human and more engaging.

Gemini also offers video generation, which is a significant advantage for those who prefer an audiovisual approach to understanding a topic rather than reading long chunks of text.

Qwen cannot do this—in fact, no other model can.

If you want full multimedia, including audio, video, and web, Gemini is the most complete package.

The webpage advantage

Beyond research quality, Qwen's killer feature is the auto-generated webpage. No other model does this.

After your research finishes, you can turn it into a live, hosted website. Not a PDF or a Google Doc, but a real webpage with headers, formatted tables, and citations embedded as hyperlinks.

The UI looks like Kimi: it features clean typography and responsive design, and is instantly shareable.

ChatGPT users have to copy and paste into website builders.

Gemini keeps everything in Docs. Grok spits out text. Only Qwen automatically generates web-ready output.

That workflow advantage is nice to have.