Graph Analytics Without ETL: PuppyGraph + Amazon S3 Tables

Sa Wang | Software Engineer | March 13, 2025
“Amazon S3 has long been the foundation of modern data infrastructure, and the launch of S3 Tables marks a major milestone—bringing Apache Iceberg closer to becoming the universal standard for data and AI. This innovation allows organizations to leverage high-performance, open table formats on S3, enabling multi-engine analytics without data duplication. For PuppyGraph customers, it means they can now run real-time graph queries directly on their S3 data, maintaining fresh, scalable insights without the overhead of complex ETL. We’re excited to be part of this evolution, making graph analytics as seamless as the data itself.”
Weimo Liu
Cofounder & CEO, PuppyGraph

Amazon S3 has long been a cornerstone of scalable data storage, and with the introduction of Amazon S3 Tables, AWS is enhancing its role in modern analytics. Currently in preview, S3 Tables offer a new type of bucket—table buckets—optimized for tabular data like transaction logs or sensor streams. Built with Apache Iceberg integration, these tables enable efficient storage and querying with standard SQL through engines like Amazon Athena, Amazon Redshift, and Apache Spark. Unlike general-purpose S3 buckets, table buckets deliver higher transaction rates and automated optimizations, such as file compaction and snapshot management, to boost performance and reduce costs—all while maintaining S3’s durability and scalability.

For analytics professionals, this is a game-changer. S3 Tables streamline data lake operations by keeping tabular data query-ready without requiring extensive manual maintenance. But what happens when you need to go beyond rows and columns to explore relationships and networks within your data? That’s where PuppyGraph steps in.

PuppyGraph brings real-time graph analytics directly to your S3 Tables, letting you uncover connections and patterns without the overhead of traditional ETL processes or separate graph databases. Security teams, fraud analysts, and data engineers can now model and query complex relationships—like fraud networks or supply chain links—right where the data lives.

What Makes S3 Tables Stand Out for Analytics

Amazon S3 Tables bring a fresh approach to managing tabular data in the cloud. Unlike traditional S3 buckets, which excel at general-purpose storage, S3 Tables are purpose-built for analytics workloads. They introduce table buckets—a specialized bucket type designed to store tables as subresources in the Apache Iceberg format. This setup is ideal for datasets like daily purchase records, streaming sensor data, or ad impressions, where data naturally organizes into rows and columns.

So, what sets S3 Tables apart? First, they’re optimized for performance. Table buckets offer higher transactions per second and better query throughput than self-managed tables in standard S3 buckets, all while preserving S3’s renowned durability and scalability. The integration with Apache Iceberg adds powerful features like schema evolution, partition flexibility, and time-travel capabilities, letting your data adapt over time without rewriting queries or restructuring storage.

Beyond performance, S3 Tables simplify data lake management. AWS handles maintenance tasks—like compacting small files into larger ones, managing snapshots, and removing unused objects—automatically. This not only speeds up queries but also trims storage costs by keeping your tables lean. You can even customize these maintenance settings for each table or bucket, giving you control without the operational burden.

Security and access are also streamlined. With a dedicated s3tables namespace, you can craft precise IAM policies to govern table buckets and individual tables, separate from standard S3 resources. Plus, integration with AWS analytics tools—like Amazon Athena, AWS Glue, and Amazon Redshift—means your tables slot effortlessly into existing workflows, accessible via standard SQL.

In short, S3 Tables take the heavy lifting out of storing and querying tabular data at scale. They’re a natural fit for analytics teams looking to maximize efficiency in their data lakes. But their real potential shines when you pair them with analytic engines like PuppyGraph, which adds a whole new dimension of insights from the same copy of data.

Graph Analytics on S3 Tables: No ETL, No Vendor Lock-In

Data in S3 Tables is ready for SQL-based analytics, but some insights—like tracing influence networks or mapping dependencies—require seeing beyond rows and columns. Graph analytics excels at uncovering these relationships, and PuppyGraph delivers it directly on your S3 Tables, sidestepping the usual hurdles.

Traditionally, graph analytics demands a cumbersome ETL pipeline: pull data from S3, reshape it into nodes and edges, and load it into a dedicated graph database. That’s slow, costly, and often ties you to a vendor’s proprietary system. 

PuppyGraph, a distributed graph analytics engine, changes the equation. It connects to the Apache Iceberg tables in your S3 table buckets and queries them in place—no extraction, no transformation, no separate storage. Built for speed and simplicity, PuppyGraph lets you define a graph model—say, customers as nodes linked by purchase edges—over your existing data and run queries instantly using languages like Gremlin or openCypher.
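
To make the idea of a graph model over existing tables more concrete, here is a minimal Python sketch. It is not PuppyGraph's implementation or API: the rows, column names, and helper functions are all hypothetical. It only illustrates how rows from a customer table and a purchase table can be read as vertices and edges where they are, without copying them into a separate store.

```python
from collections import defaultdict

# Hypothetical rows, shaped like what a SQL engine would return from two tables.
customers = [
    {"customer_id": "c1", "name": "alice"},
    {"customer_id": "c2", "name": "bob"},
]
purchases = [
    {"customer_id": "c1", "product_id": "p1"},
    {"customer_id": "c2", "product_id": "p1"},
    {"customer_id": "c2", "product_id": "p2"},
]


def build_adjacency(edge_rows, from_col, to_col):
    """Read edge rows as a mapping: source vertex id -> list of target vertex ids."""
    adjacency = defaultdict(list)
    for row in edge_rows:
        adjacency[row[from_col]].append(row[to_col])
    return dict(adjacency)


def co_purchasers(customer_id, edge_rows):
    """Customers who bought at least one product that `customer_id` also bought."""
    bought = build_adjacency(edge_rows, "customer_id", "product_id")
    bought_by = build_adjacency(edge_rows, "product_id", "customer_id")
    found = set()
    for product in bought.get(customer_id, []):
        for other in bought_by.get(product, []):
            if other != customer_id:
                found.add(other)
    return sorted(found)


# Vertex rows supply the display names; edge rows supply the connections.
names = {row["customer_id"]: row["name"] for row in customers}
print([names[c] for c in co_purchasers("c1", purchases)])
```

The two-hop lookup (customer to product, then product back to customer) is the kind of question that takes a self-join in SQL but reads as a single traversal in a graph query language.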

Figure: How PuppyGraph works with Amazon S3 Tables

PuppyGraph’s design is a perfect match for S3 Tables. It leverages their high-performance storage and Iceberg’s structured format to handle massive datasets efficiently, scaling seamlessly with S3’s infrastructure. Unlike traditional graph databases that require data ingestion and maintenance, PuppyGraph operates as a lightweight, serverless layer—spinning up compute only when you query. This means you’re not duplicating data or managing another system; your S3 Tables remain the single source of truth, optimized by AWS’s automated compaction and snapshot features while PuppyGraph enables real-time relationship analysis.

The benefits are striking. Analysts can detect fraud rings, engineers can analyze telemetry signals, and security teams can map threat networks—all in real time, without ETL overhead. Because PuppyGraph works with Iceberg’s open format, you’re not boxed into a proprietary ecosystem; your data stays accessible to other tools like Athena or Spark. Add PuppyGraph’s support for complex traversals—like finding the shortest path between entities or clustering connected groups—and you’ve got a versatile powerhouse for relationship-driven insights.

With S3 Tables providing a robust, scalable foundation and PuppyGraph enabling sophisticated graph analytics, organizations can derive actionable insights without the complexity of traditional methods. To illustrate this powerful combination in practice, the following "Getting Started" section offers a detailed demonstration of how to configure and query S3 Tables with PuppyGraph, showcasing a zero-ETL workflow.

Getting Started

In this section, we’ll walk through a step-by-step tutorial on how to connect PuppyGraph to Amazon S3 Tables, set up a graph model, and start running real-time queries. We have prepared a demo on GitHub for you to follow along.

Prerequisites

- An AWS account
- AWS CLI version 2
- Docker

Before We Start

We strongly recommend completing the Amazon S3 Tables getting-started tutorial first to become familiar with S3 Tables and to resolve any IAM permission issues. This demo creates its tables in a similar way.

Data Preparation

Create Table Bucket

Follow the S3 Tables getting-started tutorial to create a table bucket.

Create Namespace

Use the AWS CLI to create a namespace. Replace the placeholders with your actual values.

aws s3tables create-namespace \
    --table-bucket-arn arn:aws:s3tables:<region>:<account-id>:bucket/<table-bucket-name> \
    --namespace puppygraph_test
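
Because the same table bucket ARN is needed again in the table definitions and in the graph schema, it can help to assemble it once and catch unfilled placeholders early. The following is a small sketch, not part of the demo repository; the function name and the example region, account ID, and bucket name are hypothetical.

```python
import re


def table_bucket_arn(region: str, account_id: str, bucket_name: str) -> str:
    """Assemble a table bucket ARN in the format used by the command above."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    for label, value in (("region", region), ("bucket_name", bucket_name)):
        # Reject empty values and leftover <placeholder> markers.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"{label} looks like an unfilled placeholder: {value!r}")
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}"


# Example values only; substitute your own.
print(table_bucket_arn("us-east-1", "111122223333", "my-table-bucket"))
```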

Create Tables

Use the AWS CLI to create the tables. Replace the placeholders in the corresponding definition files with your actual values.

aws s3tables create-table --cli-input-json file://table_definition/person_definition.json
aws s3tables create-table --cli-input-json file://table_definition/software_definition.json
aws s3tables create-table --cli-input-json file://table_definition/knows_definition.json
aws s3tables create-table --cli-input-json file://table_definition/created_definition.json
Figure: Tables in the table bucket.
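
The definition files themselves live in the demo repository and are not reproduced here. As a rough guide, the sketch below generates a definition in the shape that the AWS getting-started tutorial uses for `create-table --cli-input-json`. The column names and types for the person table are assumptions inferred from the sample rows inserted later in this tutorial, and the `table_definition` helper is hypothetical; the actual files in the repository may differ.

```python
import json


def table_definition(table_bucket_arn, namespace, name, columns):
    """Build a dict shaped like the input of `aws s3tables create-table --cli-input-json`.

    `columns` is a list of (column_name, iceberg_type, required) tuples.
    """
    return {
        "tableBucketARN": table_bucket_arn,
        "namespace": namespace,
        "name": name,
        "format": "ICEBERG",
        "metadata": {
            "iceberg": {
                "schema": {
                    "fields": [
                        {"name": col, "type": typ, "required": req}
                        for col, typ, req in columns
                    ]
                }
            }
        },
    }


# Assumed columns for the person table; the demo repository's file may differ.
person = table_definition(
    "arn:aws:s3tables:<region>:<account-id>:bucket/<table-bucket-name>",
    "puppygraph_test",
    "person",
    [("id", "string", True), ("name", "string", False), ("age", "int", False)],
)
print(json.dumps(person, indent=2))
```

Generating the four definitions from one helper keeps the ARN and namespace consistent across tables, which is where copy-and-paste mistakes tend to creep in.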

Grant Lake Formation Permissions On Your Table

Follow step 3 of the S3 Tables getting-started tutorial.

Insert Data

Use Amazon Athena to insert data into the tables.

INSERT INTO person VALUES
('v1', 'marko', 29),
('v2', 'vadas', 27),
('v4', 'josh', 32),
('v6', 'peter', 35);
INSERT INTO software VALUES
('v3', 'lop', 'java'),
('v5', 'ripple', 'java');
INSERT INTO created VALUES
('e9', 'v1', 'v3', 0.4),
('e10', 'v4', 'v5', 1.0),
('e11', 'v4', 'v3', 0.4),
('e12', 'v6', 'v3', 0.2);
INSERT INTO knows VALUES
('e7', 'v1', 'v2', 0.5),
('e8', 'v1', 'v4', 1.0);
Figure: Query in Amazon Athena.
Figure: The "modern" graph defined by Apache TinkerPop.
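
Before modeling the graph, it is worth confirming that every edge row points at a vertex that exists. The sketch below does this in plain Python over the same sample rows. It assumes the edge columns are ordered as (edge id, source vertex id, target vertex id, weight), which is inferred from the data rather than stated in the demo, and the `dangling_edges` helper is hypothetical.

```python
# Sample rows copied from the INSERT statements above.
person = [("v1", "marko", 29), ("v2", "vadas", 27), ("v4", "josh", 32), ("v6", "peter", 35)]
software = [("v3", "lop", "java"), ("v5", "ripple", "java")]
created = [
    ("e9", "v1", "v3", 0.4),
    ("e10", "v4", "v5", 1.0),
    ("e11", "v4", "v3", 0.4),
    ("e12", "v6", "v3", 0.2),
]
knows = [("e7", "v1", "v2", 0.5), ("e8", "v1", "v4", 1.0)]


def dangling_edges(edges, from_ids, to_ids):
    """Return the ids of edges whose source or target vertex id is missing."""
    return [e[0] for e in edges if e[1] not in from_ids or e[2] not in to_ids]


person_ids = {row[0] for row in person}
software_ids = {row[0] for row in software}

# "knows" links person -> person; "created" links person -> software.
print(dangling_edges(knows, person_ids, person_ids))
print(dangling_edges(created, person_ids, software_ids))
```

Both calls return an empty list for the sample data, meaning every edge has valid endpoints.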

Connecting to PuppyGraph

Start PuppyGraph

docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 -d --name puppy --rm \
    --pull=always puppygraph/puppygraph-dev:preview20250314

Log into the UI

Log into the PuppyGraph Web UI at http://localhost:8081 with the following credentials:

  • Username: puppygraph
  • Password: puppygraph123

Modeling the Graph

Replace the placeholders in schema.json with your actual values. (The table-bucket-arn looks like arn:aws:s3tables:<region>:<account-id>:bucket/<table-bucket-name>.)

In the Web UI, select the file `schema.json` under Upload Graph Schema JSON, then click Upload.

Figure: Schema page.

Optional: You can also create the schema with the schema builder by clicking Create graph schema, then adding vertices and edges step by step. See the demo video and the UI demo video.

Querying via PuppyGraph

Navigate to Query in the Web UI:

  • Use Graph Query for Gremlin/openCypher queries with visualization.
  • Use Graph Notebook for Gremlin/openCypher queries.

Example Queries

  • Retrieve a vertex named "marko":

Gremlin:

g.V().has("name", "marko").valueMap()

openCypher:

MATCH (v {name: 'marko'}) RETURN v

  • Retrieve the paths from "marko" to the software created by those whom "marko" knows:

Gremlin:

g.V().has("name", "marko")
.out("knows").out("created").path()

openCypher:

MATCH p=(v {name: 'marko'})-[:knows]->()-[:created]->()
RETURN p
Figure: Query visualization.
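
To check what the path query should return on the sample data, the traversal can be mirrored in plain Python. This is only an illustration of the query's logic, not how PuppyGraph executes it; the `out` and `paths_knows_created` helpers are hypothetical, and the edge list is transcribed from the rows inserted earlier.

```python
# Edges from the sample data, written as (label, from_id, to_id).
edges = [
    ("knows", "v1", "v2"),
    ("knows", "v1", "v4"),
    ("created", "v1", "v3"),
    ("created", "v4", "v5"),
    ("created", "v4", "v3"),
    ("created", "v6", "v3"),
]
names = {"v1": "marko", "v2": "vadas", "v3": "lop", "v4": "josh", "v5": "ripple", "v6": "peter"}


def out(vertex_id, label):
    """Ids reachable from `vertex_id` over outgoing edges with the given label."""
    return [dst for lbl, src, dst in edges if lbl == label and src == vertex_id]


def paths_knows_created(start_name):
    """Paths of the form start -> knows -> created, as lists of vertex names."""
    result = []
    for start in (vid for vid, name in names.items() if name == start_name):
        for friend in out(start, "knows"):
            for item in out(friend, "created"):
                result.append([names[start], names[friend], names[item]])
    return result


for path in paths_knows_created("marko"):
    print(" -> ".join(path))
```

On this data the sketch finds two paths, both through josh: one ending at ripple and one at lop. vadas contributes nothing because he created no software. A graph engine may return the same paths in a different order, and the openCypher result also includes the relationships along each path.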

Teardown

Stop the PuppyGraph container.

docker stop puppy

Conclusion

Amazon S3 Tables and PuppyGraph offer a transformative approach to analytics, blending optimized tabular storage with real-time graph querying to deliver insights without the burden of ETL processes. This powerful combination—rooted in S3 Tables’ scalable, Apache Iceberg-backed design and PuppyGraph’s lightweight, in-place graph analytics—enables teams to tackle fraud detection, telemetry analytics, cybersecurity, supply chain analysis, and more, all from a single, open data source. As demonstrated in the "Getting Started" section, this workflow is both practical and accessible, promising to reshape how organizations leverage data lakes for actionable intelligence.

If you're ready to explore how real-time graph analytics on S3 Tables can simplify your data pipeline and unlock new insights, download the forever free Developer edition or book a free demo today with our graph experts.

Sa Wang is a Software Engineer with exceptional math abilities and strong coding skills. He earned his Bachelor's degree in Computer Science from Fudan University and has been studying Mathematical Logic in the Philosophy Department at Fudan University, expecting to receive his Master's degree in Philosophy in June this year. He and his team won a gold medal in the Jilin regional competition of the China Collegiate Programming Contest and received a first-class award in the Shanghai regional competition of the National Student Math Competition.
