In this final blog post, we will deploy several AI model endpoints (downloaded from Hugging Face), configure our private data source, which can be a shared location (Google Drive, Confluence, Microsoft SharePoint, or an S3-compatible endpoint) or local files, and then consume them with an AI Agent built using VMware Private AI Services (PAIS).
As mentioned in the very first blog post of this mini-series, my goal was to get hands-on experience with PAIS without needing an NVIDIA GPU capable of vGPU, which would also require an NVIDIA AI Enterprise (NVAIE) license.
Luckily, we can use an NVIDIA GPU via DirectPath I/O, thanks to the backend plumbing the PAIS Engineering team has built and shared with me 😊
For my proof of concept, I am using an ASUS NUC 14 Performance, which has an NVIDIA GeForce RTX 4070 mobile GPU (8GB VRAM). The ASUS NUC 14 runs alongside my Minisforum MS-A2 setup and is used only to deploy the completion model endpoint. I chose the ASUS NUC 14 purely for prototyping and experimentation, to demonstrate that anyone can play with PAIS in their lab environment. I plan to use a more powerful NVIDIA GPU setup, which I will share more details about at a later point for those interested.
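Once a completion model endpoint is deployed, it is typically consumed over an OpenAI-style chat completions API. The sketch below is an illustration only, not confirmed PAIS specifics: the endpoint URL, model name, and API path are placeholders you would replace with the values reported for your own deployment.

```python
import json
import urllib.request

# Hypothetical values -- substitute the URL and model name that PAIS
# reports for your deployed completion model endpoint.
ENDPOINT_URL = "https://pais.example.com/v1/chat/completions"
MODEL_NAME = "my-completion-model"


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query_endpoint(prompt: str, api_key: str) -> str:
    """POST the payload to the endpoint and return the first choice's text."""
    req = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With an 8GB VRAM GPU like the RTX 4070 mobile, keeping `max_tokens` modest is a sensible default for a small quantized model serving a lab environment.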
References:
- Running Completion or Embedding Models by Using Model Endpoints
- Adding Context to Model Responses by Using Knowledge Bases
- Deploy an Agent for a Generative AI Application
Requirements:


