When an app can send WeChat messages, order takeout, summarize work, and hail rides on a user's behalf, AI (Artificial Intelligence) agents may truly begin to affect the daily lives of ordinary people.
On October 28th, AI concept stocks surged. By the close of trading, shares of companies including Chuangye Black Horse, Capital Online, Dou Shen Education, Chuan Zhi Education, and Zheng He Ecology had hit the daily price limit. On the news front, Beijing Zhi Pu Hua Zhang Technology Co., Ltd. (referred to as Zhi Pu) recently launched an intelligent agent product called AutoGLM, which can simulate a human operating a mobile phone and perform various tasks.
Zhang Peng, CEO of Zhi Pu, said that AutoGLM is an exploration along the company's AGI (Artificial General Intelligence) upgrade roadmap and can be seen as Zhi Pu's attempt at tool use at the L3 level of AI capabilities, laying the foundation for building GLM-OS, a general computing system centered on large models.
Within the industry, Zhi Pu's Agent product is not an isolated case. Earlier, ByteDance's Dou Bao released AI earphones that, once connected to the Dou Bao large model, integrate deeply with the Dou Bao app and support information queries, travel arrangements, and English learning through voice interaction. Kingdee's AI management assistant, the Cang Qiong app, lets employees look up company policies and offers HR capabilities such as intelligent recruitment and intelligent allocation. Zhao Ming, CEO of Honor Terminal Co., Ltd., disclosed that Honor's AI agent and China Mobile's Ling Xi have achieved what he described as the industry's first interconnection between AI agents.
Overseas, Anthropic launched its computer use feature last week, which lets AI operate a user's computer. Google is developing a similar project called Project Jarvis, which can automate tasks on web pages in Chrome. Following Anthropic's Claude release, a Microsoft team introduced OmniParser, a screen parsing tool that converts screenshots into structured data to help AI accurately understand user intentions. OpenAI's yet-to-be-released AI agent prototype is said to be able to control a computer to place online orders and to look up and solve programming problems automatically.
Zhao Ming said that the agents currently on the market fall into two main categories. One is the on-device agent, which can call applications on other terminals and lets two agents collaborate with each other. The other is a more complex agent that works across applications and agents to simulate a human: it analyzes and learns from the screen and actively carries out the corresponding operations.
AI companies at home and abroad are converging on agents, essentially to make AI applications more efficient and bring them closer to real-world use. A researcher from Zhi Pu AI told a First Financial Daily reporter that, judging from the industry's dense run of Agent-like features and applications released in the fourth quarter, large-scale application is no longer a prospect for the future but is happening now.
AutoGLM is currently in internal testing. A First Financial Daily reporter who tried it found that the app currently works with apps including WeChat, Meituan, Taobao, Dianping, Xiaohongshu, Gaode, and Ctrip. After the user gives the Agent a command by voice, AutoGLM automatically opens the target app and, with the user's authorization, executes the command. However, its accuracy and task completion are still imperfect.
Regarding the selection of the first batch of supported apps, a researcher from Zhi Pu AI told the reporter that AutoGLM is a system-level function. In theory, it can handle anything a person needs to do on an electronic device and is not limited to simple task scenarios or API calls. AutoGLM is still being developed and adapted, with priority given to the scenarios users encounter most often. As for the product's imperfections, the researcher said that AutoGLM is still improving its model capabilities, content recognition, automatic error correction, and voice capabilities, and that existing problems will continue to be optimized and iterated on.
On the technical level, a researcher from Zhi Pu AI said that AutoGLM is based on Zhi Pu's self-developed "Basic Intelligent Agent Decoupling Intermediate Interface" and "Self-Evolving Online Curriculum Reinforcement Learning Framework". The core technology, WebRL, addresses the challenges that large model agents face in task planning and action execution, such as capability antagonism, scarce training tasks and data, sparse feedback signals, and policy distribution drift. With adaptive learning strategies, it can keep iterating.
Looking ahead, Zhi Pu believes that the tool capabilities of large models should resemble those of humans: perceiving the environment, planning tasks, executing actions, and ultimately completing specific tasks. The goal is to imitate the human Plan-Do-Check-Act cycle so that the agent forms its own feedback loop and improves itself.
Agents require user authorization before performing tasks, but does handing control to AI create potential network security risks? In response, researchers at Zhi Pu AI told reporters that AutoGLM itself does not actively obtain users' personal privacy information. For tasks outside the scope of authorization, it actively prompts users and obtains their consent. For important operations involving transactions and payments, it additionally asks whether the user wishes to proceed. Every time the application is closed and AutoGLM is relaunched in the background, it reapplies to the user for accessibility permissions. Users who wish to stop using the service can also disable it manually on the phone's settings page.
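The authorization behavior the researchers describe can be pictured as a simple gate in front of each action. The Python sketch below is only an illustration of that described policy, not Zhi Pu's implementation: the names Action, PermissionGate, SENSITIVE_KINDS, and ask_user are all hypothetical, and treating authorization as a per-app set is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Hypothetical categories of "important operations"; the article only
# mentions transactions and payments.
SENSITIVE_KINDS = {"payment", "transaction"}


@dataclass
class Action:
    app: str          # app the agent wants to operate
    kind: str         # e.g. "search", "message", "payment"
    description: str  # human-readable summary shown to the user


@dataclass
class PermissionGate:
    authorized_apps: Set[str]        # assumed scope: apps the user approved
    ask_user: Callable[[str], bool]  # hypothetical prompt; True means consent

    def allow(self, action: Action) -> bool:
        # Outside the authorized scope: prompt the user and require consent.
        if action.app not in self.authorized_apps:
            if not self.ask_user(f"Allow the agent to use {action.app}?"):
                return False
            self.authorized_apps.add(action.app)
        # Payments and transactions always get a second confirmation.
        if action.kind in SENSITIVE_KINDS:
            return self.ask_user(f"Confirm: {action.description}?")
        return True


if __name__ == "__main__":
    gate = PermissionGate({"Meituan"}, ask_user=lambda question: False)
    print(gate.allow(Action("Meituan", "search", "find a noodle shop")))
    print(gate.allow(Action("Meituan", "payment", "pay for the order")))
```

In this sketch, a routine action inside the authorized scope passes without a prompt, while a payment is blocked unless the user explicitly confirms it.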