<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Chronicles of a Software Engineer]]></title><description><![CDATA[Chronicles of a Software Engineer]]></description><link>https://blog.emendez.dev</link><generator>RSS for Node</generator><lastBuildDate>Thu, 14 May 2026 20:09:30 GMT</lastBuildDate><atom:link href="https://blog.emendez.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Exploring JSON in DuckDB with the openFDA Animal & Veterinary Adverse Events API]]></title><description><![CDATA[Exploring JSON in DuckDB with the openFDA Animal & Veterinary Adverse Events API
A practical, end‑to‑end walkthrough using the DuckDB CLI with JSON
When working with Modern data, JSON is everywhere — APIs, logs, manifests, telemetry, and data feeds. ...]]></description><link>https://blog.emendez.dev/exploring-json-in-duckdb-with-the-openfda-animal-and-veterinary-adverse-events-api</link><guid isPermaLink="true">https://blog.emendez.dev/exploring-json-in-duckdb-with-the-openfda-animal-and-veterinary-adverse-events-api</guid><category><![CDATA[duckDB]]></category><category><![CDATA[json]]></category><category><![CDATA[cli]]></category><category><![CDATA[dataengineering]]></category><category><![CDATA[SQL]]></category><dc:creator><![CDATA[Edward Mendez]]></dc:creator><pubDate>Mon, 02 Feb 2026 22:20:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770070414646/e2a0005d-c192-4645-8658-5b4be65851ac.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h1 id="heading-exploring-json-in-duckdb-with-the-openfda-animal-amp-veterinary-adverse-events-api"><strong>Exploring JSON in DuckDB with the openFDA Animal &amp; Veterinary Adverse Events API</strong></h1>
<p><em>A practical, end‑to‑end walkthrough using the DuckDB CLI with JSON</em></p>
<p>When working with modern data, JSON is everywhere — APIs, logs, manifests, telemetry, and data feeds. But working with JSON at scale is often painful: nested structures, arrays inside arrays, inconsistent schemas, and compressed files delivered over HTTP.</p>
<p>DuckDB changes that dynamic completely.</p>
<p>In this article, we’ll walk through a real‑world example using the <strong>openFDA Animal &amp; Veterinary Adverse Events dataset</strong>, a public API that exposes:</p>
<ul>
<li><p>a self‑describing JSON manifest</p>
</li>
<li><p>downloadable ZIP‑wrapped JSON partitions</p>
</li>
<li><p>deeply nested event structures</p>
</li>
</ul>
<p>Using only the <strong>DuckDB CLI</strong>, we will:</p>
<ul>
<li><p>Read JSON directly from URLs</p>
</li>
<li><p>Parse nested JSON structures</p>
</li>
<li><p>Extract ZIP‑compressed JSON over HTTPS</p>
</li>
<li><p>Dynamically generate SQL to build staging tables</p>
</li>
<li><p>Flatten complex arrays with <code>UNNEST</code></p>
</li>
<li><p>Materialize clean relational tables:</p>
<ul>
<li><p><code>events</code></p>
</li>
<li><p><code>event_reactions</code></p>
</li>
<li><p><code>event_drugs</code></p>
</li>
</ul>
</li>
<li><p>Prepare the dataset for analytics or downstream modeling</p>
</li>
<li><p>Export to CSV and Parquet</p>
</li>
</ul>
<p>This is the kind of workflow that would normally require a full ELT pipeline or a custom process — but DuckDB handles it locally, with no servers and no friction.</p>
<p>As a disclaimer, the main objective of this article is to illustrate DuckDB’s capabilities.</p>
<hr />
<h1 id="heading-1-setup-environment"><strong>1. Setup Environment</strong></h1>
<p>When working on a DuckDB project, I usually create a folder for it and open it in VS Code, keeping a commands.txt file to store my SQL commands. In this walkthrough we will create SQL files and a DuckDB database.</p>
<pre><code class="lang-plaintext">mkdir duckdb/openfda_demo
cd duckdb/openfda_demo
code .
</code></pre>
<h1 id="heading-2-installing-duckdb"><strong>2. Installing DuckDB</strong></h1>
<p>DuckDB has documentation on how to install the DuckDB CLI <a target="_blank" href="https://duckdb.org/install/?platform=windows&amp;environment=cli">here</a>. I used winget to install it.</p>
<pre><code class="lang-plaintext">winget install DuckDB.cli
</code></pre>
<p>To execute DuckDB, I normally open the terminal in VS Code and type my commands there. We will create a DuckDB database:</p>
<pre><code class="lang-plaintext">duckdb -ui openfda_animalandveterinary.duckdb
</code></pre>
<p>The <code>-ui</code> flag will download the DuckDB UI extension, load it, and open a browser session with a beautiful Jupyter-notebook-styled IDE. The exercises will use the terminal CLI, but the browser session has nice features worth exploring.</p>
<p>If you don’t want to create a database file, you can just run this command:</p>
<pre><code class="lang-plaintext">duckdb -ui
</code></pre>
<p>That runs against an in-memory (<code>:memory:</code>) database, so any tables you create will be lost when you exit the session.</p>
<p>DuckDB’s extension system is one of its superpowers.<br />For this project, we will use the <code>httpfs</code>, <code>json</code>, and <code>zipfs</code> extensions. <code>httpfs</code> and <code>json</code> are core extensions included when DuckDB is installed; <code>zipfs</code> needs to be installed from the <code>community</code> repository. The <code>zipfs</code> extension allows DuckDB to read and decompress ZIP files, and <code>httpfs</code> allows DuckDB to read files directly from URLs.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSTALL</span> zipfs <span class="hljs-keyword">FROM</span> community;
<span class="hljs-keyword">LOAD</span> zipfs;

<span class="hljs-keyword">SET</span> memory_limit = <span class="hljs-string">'40GB'</span>;
</code></pre>
<p>The memory limit is optional but helpful when working with large FDA datasets.</p>
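<p>If you want to confirm a setting took effect, you can read it back with DuckDB’s <code>current_setting</code> function; a quick sketch:</p>
<pre><code class="lang-sql">-- Returns the currently configured memory limit
SELECT current_setting('memory_limit');
</code></pre>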
<hr />
<h1 id="heading-2-loading-the-fda-manifest"><strong>3. Loading the FDA Manifest</strong></h1>
<p>The openFDA API provides a single manifest file describing all datasets and their download locations. We will download the manifest into a table and access the specific nodes we are looking for.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">DROP</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">IF</span> <span class="hljs-keyword">EXISTS</span> manifest_json;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> manifest_json <span class="hljs-keyword">AS</span>
<span class="hljs-keyword">SELECT</span> *
<span class="hljs-keyword">FROM</span> read_json(<span class="hljs-string">'https://api.fda.gov/download.json'</span>, maximum_object_size = <span class="hljs-number">4294967295</span>);
</code></pre>
<p>DuckDB maps nested JSON objects to its STRUCT data type, which makes it easy to work with complex nested data. I highly recommend getting a copy of <code>DuckDB in Action</code>; it contains a treasure trove of information about DuckDB. <code>MotherDuck</code> has a promotion for the e-book if you <a target="_blank" href="https://motherduck.com/duckdb-book-brief/">subscribe</a> to their newsletter.</p>
<p>The <code>manifest_json</code> table contains two columns: <code>meta</code> and <code>results</code>.</p>
<p>A quick schema check:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">DESCRIBE</span> manifest_json;
</code></pre>
<p>Below are the struct definitions of the two columns. The <code>meta</code> column:</p>
<pre><code class="lang-plaintext">STRUCT(
  disclaimer    VARCHAR,
  terms         VARCHAR,
  license       VARCHAR,
  last_updated  DATE
)
</code></pre>
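<p>Since <code>meta</code> is a flat struct, dot notation reaches its fields directly; a minimal sketch:</p>
<pre><code class="lang-sql">-- STRUCT fields are addressed with dot notation
SELECT meta.license, meta.last_updated
FROM manifest_json;
</code></pre>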
<p>The <code>results</code> column:</p>
<pre><code class="lang-plaintext">STRUCT(
  food STRUCT(
    enforcement STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    "event" STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  ),

  animalandveterinary STRUCT(
    "event" STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  ),

  transparency STRUCT(
    crl STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  ),

  tobacco STRUCT(
    problem STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  ),

  other STRUCT(
    historicaldocument STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    unii STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    nsde STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    substance STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  ),

  device STRUCT(
    classification STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    "510k" STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    covid19serology STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    registrationlisting STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    enforcement STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    udi STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    "event" STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    recall STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    pma STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  ),

  cosmetic STRUCT(
    "event" STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  ),

  drug STRUCT(
    drugsfda STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    "label" STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    enforcement STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    "event" STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    shortages STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    ),
    ndc STRUCT(
      export_date    DATE,
      partitions     STRUCT(display_name VARCHAR, file VARCHAR, size_mb VARCHAR, records BIGINT)[],
      total_records  BIGINT
    )
  )
)
</code></pre>
<p>At first this can be overwhelming, but today we are only interested in <code>animalandveterinary.event</code> from the <code>results</code> column. Notice that each dataset in the manifest contains an <code>export_date</code>, <code>partitions</code>, and <code>total_records</code>.</p>
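<p>Before extracting the partitions, we can confirm the node with a quick dot-notation query (a small sketch; the column aliases are mine):</p>
<pre><code class="lang-sql">-- Peek at the animalandveterinary.event node
SELECT
    results.animalandveterinary.event.export_date     AS export_date,
    results.animalandveterinary.event.total_records   AS total_records,
    len(results.animalandveterinary.event.partitions) AS partition_count
FROM manifest_json;
</code></pre>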
<hr />
<h1 id="heading-3-extracting-dataset-partitions"><strong>4. Extracting Dataset Partitions</strong></h1>
<p>If you navigate to the manifest link <a target="_blank" href="https://api.fda.gov/download.json">https://api.fda.gov/download.json</a> and scroll down to the <code>animalandveterinary → event</code> node, you will see that this dataset has many partitions. Today we are interested in downloading data from the partition called <code>2025 Q3 (all)</code>.</p>
<p>The manifest contains a nested structure:</p>
<pre><code class="lang-plaintext">results → animalandveterinary → event → partitions[]
</code></pre>
<p>Each partition represents a downloadable ZIP file containing JSON.</p>
<p>We extract them like this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">DROP</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">exists</span> stg_partitions;

<span class="hljs-keyword">CREATE</span> temp <span class="hljs-keyword">TABLE</span> stg_partitions <span class="hljs-keyword">AS</span>
<span class="hljs-keyword">SELECT</span> 
results.animalandveterinary.event.export_date,
p.display_name,
p.file,
p.size_mb,
p.records
<span class="hljs-keyword">FROM</span> manifest_json
<span class="hljs-keyword">CROSS</span> <span class="hljs-keyword">JOIN</span> <span class="hljs-keyword">unnest</span> (results.animalandveterinary.event.partitions) <span class="hljs-keyword">as</span> p(p);
</code></pre>
<p>Now we can inspect the available partitions:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> stg_partitions;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770017829298/ae1f633d-2fba-43d6-a8f8-726237df1b77.png" alt /></p>
<hr />
<h1 id="heading-4-dynamically-generating-sql-to-create-the-staging-table"><strong>5. Dynamically Generating SQL to Create the Staging Table</strong></h1>
<p>DuckDB’s CLI has a powerful trick:<br />You can generate SQL dynamically using <code>.once</code>, <code>.read</code>, and <code>printf</code>. More information can be found <a target="_blank" href="https://duckdb.org/docs/stable/clients/cli/dot_commands">here</a> on the <code>Dot Commands</code>.</p>
<p>First, generate a script that creates an <strong>empty staging table</strong> with the correct schema:</p>
<pre><code class="lang-sql">.header off
.mode tabs
.once tmp_create_stg_json.sql

<span class="hljs-keyword">SELECT</span> printf(
    <span class="hljs-string">'DROP TABLE IF EXISTS stg_json;
     CREATE TABLE stg_json AS
     SELECT ''%s'' AS display_name, *
     FROM read_json(''zip://%s/*json'', maximum_object_size = 4294967295)
     WITH NO DATA;'</span>,
    display_name, <span class="hljs-keyword">file</span>
)
<span class="hljs-keyword">FROM</span> stg_partitions
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">1</span>;
</code></pre>
<p>The above command will create the file <code>tmp_create_stg_json.sql</code> with the following content.</p>
<pre><code class="lang-plaintext">drop table if exists stg_json;

CREATE TABLE stg_json AS SELECT '2025 Q3 (all)' display_name, * FROM read_json( 'zip://https://download.open.fda.gov/animalandveterinary/event/2025q3/animalandveterinary-event-0001-of-0001.json.zip/*json', maximum_object_size = 4294967295) WITH NO DATA
</code></pre>
<p>Notice the file name from the partition ends with <code>.json.zip</code>, and we prepend <code>zip://</code> and append <code>/*json</code>. This tells DuckDB to pass the file to the zipfs extension and read any files inside the archive that end with <code>json</code>, which is how DuckDB can infer the JSON structure to create the table. The <strong>WITH NO DATA</strong> clause tells DuckDB to create just the schema, with no data.</p>
<p>Then execute it:</p>
<pre><code class="lang-sql">.read tmp_create_stg_json.sql
</code></pre>
<p>This creates an empty table <code>stg_json</code> with the exact schema of that FDA JSON.</p>
<hr />
<h1 id="heading-5-populating-the-staging-table"><strong>6. Populating the Staging Table</strong></h1>
<p>Now generate a script to load the actual data:</p>
<pre><code class="lang-sql">.header off
.mode tabs
.once tmp_populate_stg_json.sql
<span class="hljs-keyword">Select</span> printf(<span class="hljs-string">'truncate table stg_json;'</span>)
<span class="hljs-keyword">union</span> <span class="hljs-keyword">all</span>
<span class="hljs-keyword">Select</span> printf(<span class="hljs-string">'INSERT INTO stg_json 
SELECT ''%s'' display_name, * FROM read_json( ''zip://%s/*json'', maximum_object_size = 4294967295);'</span>,display_name, <span class="hljs-keyword">file</span> ) <span class="hljs-keyword">as</span> create_statement
<span class="hljs-keyword">from</span> stg_partitions
<span class="hljs-keyword">where</span> display_name = <span class="hljs-string">'2025 Q3 (all)'</span>;
</code></pre>
<p>The above command will create the following SQL file, <code>tmp_populate_stg_json.sql</code>. Today we will pull the partition with <code>display_name</code> = '2025 Q3 (all)'.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">TRUNCATE</span> <span class="hljs-keyword">TABLE</span> stg_json;

<span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> stg_json

<span class="hljs-keyword">SELECT</span> <span class="hljs-string">'2025 Q3 (all)'</span> display_name, * <span class="hljs-keyword">FROM</span> read_json( <span class="hljs-string">'zip://https://download.open.fda.gov/animalandveterinary/event/2025q3/animalandveterinary-event-0001-of-0001.json.zip/*json'</span>, maximum_object_size = <span class="hljs-number">4294967295</span>);
</code></pre>
<p>Execute it:</p>
<pre><code class="lang-sql">.read tmp_populate_stg_json.sql
</code></pre>
<p>At this point <code>stg_json</code> will contain the following structure with one record. The column we are interested in is <code>results</code>, which contains the complete JSON file.</p>
<pre><code class="lang-plaintext">describe stg_json;
column_name = display_name
column_type = VARCHAR
       null = YES
        key = NULL
    default = NULL
      extra = NULL

column_name = meta
column_type = STRUCT(disclaimer VARCHAR, terms VARCHAR, license VARCHAR, last_updated DATE, results STRUCT("skip" BIGINT, "limit" BIGINT, total BIGINT))
       null = YES
        key = NULL
    default = NULL
      extra = NULL

column_name = results
column_type = STRUCT(reaction STRUCT(veddra_version VARCHAR, veddra_term_code VARCHAR, veddra_term_name VARCHAR)[], receiver STRUCT(organization VARCHAR, street_address VARCHAR, city VARCHAR, state VARCHAR, postal_code VARCHAR, country VARCHAR), unique_aer_id_number VARCHAR, original_receive_date VARCHAR, number_of_animals_affected VARCHAR, primary_reporter VARCHAR, number_of_animals_treated VARCHAR, drug STRUCT(route VARCHAR, brand_name VARCHAR, dosage_form VARCHAR, manufacturer STRUCT("name" VARCHAR, registration_number VARCHAR), atc_vet_code VARCHAR, active_ingredients STRUCT("name" VARCHAR, dose STRUCT(numerator VARCHAR, numerator_unit VARCHAR, denominator VARCHAR, denominator_unit VARCHAR))[], used_according_to_label VARCHAR, off_label_use VARCHAR, lot_number VARCHAR)[], health_assessment_prior_to_exposure STRUCT(assessed_by VARCHAR), onset_date VARCHAR, report_id VARCHAR, animal STRUCT(species VARCHAR, gender VARCHAR, female_animal_physiological_status VARCHAR, age STRUCT(min VARCHAR, unit VARCHAR, qualifier VARCHAR, max VARCHAR), weight STRUCT(qualifier VARCHAR, min VARCHAR, unit VARCHAR, max VARCHAR), breed STRUCT(is_crossbred VARCHAR, breed_component VARCHAR), reproductive_status VARCHAR), type_of_information VARCHAR, outcome STRUCT(medical_status VARCHAR, number_of_animals_affected VARCHAR)[])[]
       null = YES
        key = NULL
    default = NULL
      extra = NULL
</code></pre>
<hr />
<h1 id="heading-6-flattening-the-nested-json"><strong>7. Flattening the Nested JSON</strong></h1>
<p>The FDA event JSON is deeply nested:</p>
<ul>
<li><p><code>results[]</code></p>
</li>
<li><p><code>reaction[]</code></p>
</li>
<li><p><code>drug[]</code></p>
</li>
<li><p><code>drug.active_ingredients[]</code></p>
</li>
<li><p><code>outcome[]</code></p>
</li>
</ul>
<p>DuckDB’s <code>UNNEST</code> makes this manageable.</p>
<p>A key trick is aliasing the unnested column:</p>
<pre><code class="lang-sql">CROSS JOIN UNNEST(results) AS r(r)
</code></pre>
<p>This avoids the default <code>unnest</code> column name.</p>
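<p>To see what the aliasing does on a toy list (a self-contained sketch, unrelated to the FDA data):</p>
<pre><code class="lang-sql">-- AS t(x) names the derived table t and the produced column x;
-- without the alias, the column would be called "unnest"
SELECT x
FROM (SELECT [10, 20, 30] AS arr)
CROSS JOIN UNNEST(arr) AS t(x);
</code></pre>
<p>This returns one row per list element, which is exactly how we explode <code>results</code>.</p>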
<p>Here’s the flattening query:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SET</span> memory_limit = <span class="hljs-string">'40GB'</span>;

<span class="hljs-keyword">DROP</span> <span class="hljs-keyword">SEQUENCE</span> <span class="hljs-keyword">IF</span> <span class="hljs-keyword">EXISTS</span> seq_eventid;
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">SEQUENCE</span> seq_eventid <span class="hljs-keyword">START</span> <span class="hljs-number">1</span>;

<span class="hljs-keyword">DROP</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">IF</span> <span class="hljs-keyword">EXISTS</span> e;

<span class="hljs-keyword">CREATE</span> TEMP <span class="hljs-keyword">TABLE</span> e <span class="hljs-keyword">AS</span>
<span class="hljs-keyword">SELECT</span>
    <span class="hljs-keyword">nextval</span>(<span class="hljs-string">'seq_eventid'</span>) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">id</span>,
    r.unique_aer_id_number,
    r.original_receive_date,
    r.number_of_animals_affected,
    r.primary_reporter,
    r.number_of_animals_treated,
    r.onset_date,
    r.report_id,
    r.type_of_information,

    <span class="hljs-comment">-- receiver</span>
    r.receiver,
    r.receiver.organization <span class="hljs-keyword">AS</span> receiver_organization,
    r.receiver.street_address <span class="hljs-keyword">AS</span> receiver_street_address,
    r.receiver.city <span class="hljs-keyword">AS</span> receiver_city,
    r.receiver.state <span class="hljs-keyword">AS</span> receiver_state,
    r.receiver.postal_code <span class="hljs-keyword">AS</span> receiver_postal_code,
    r.receiver.country <span class="hljs-keyword">AS</span> receiver_country,

    <span class="hljs-comment">-- health assessment</span>
    r.health_assessment_prior_to_exposure.assessed_by,

    <span class="hljs-comment">-- animal</span>
    r.animal,
    r.animal.species <span class="hljs-keyword">AS</span> animal_species,
    r.animal.gender <span class="hljs-keyword">AS</span> animal_gender,
    r.animal.female_animal_physiological_status <span class="hljs-keyword">AS</span> animal_female_animal_physiological_status,
    r.animal.age.min <span class="hljs-keyword">AS</span> animal_age_min,
    r.animal.age.max <span class="hljs-keyword">AS</span> animal_age_max,
    r.animal.age.unit <span class="hljs-keyword">AS</span> animal_age_unit,
    r.animal.age.qualifier <span class="hljs-keyword">AS</span> animal_age_qualifier,
    r.animal.weight.min <span class="hljs-keyword">AS</span> animal_weight_min,
    r.animal.weight.max <span class="hljs-keyword">AS</span> animal_weight_max,
    r.animal.weight.unit <span class="hljs-keyword">AS</span> animal_weight_unit,
    r.animal.weight.qualifier <span class="hljs-keyword">AS</span> animal_weight_qualifier,
    r.animal.breed.is_crossbred <span class="hljs-keyword">AS</span> animal_breed_is_crossbred,
    r.animal.breed.breed_component <span class="hljs-keyword">AS</span> animal_breed_component,
    r.animal.reproductive_status <span class="hljs-keyword">AS</span> animal_reproductive_status,

    <span class="hljs-comment">-- nested arrays preserved for later normalization</span>
    r.reaction,
    r.drug,
    r.outcome

<span class="hljs-keyword">FROM</span> stg_json
<span class="hljs-keyword">CROSS</span> <span class="hljs-keyword">JOIN</span> <span class="hljs-keyword">UNNEST</span>(results) <span class="hljs-keyword">AS</span> r(r)
<span class="hljs-keyword">WHERE</span> stg_json.display_name = <span class="hljs-string">'2025 Q3 (all)'</span>;
</code></pre>
<p>There are a large number of columns, so I will use the following command to show the record in line mode. The temp table <code>e</code> contains all the values we will need to materialize the relational tables.</p>
<pre><code class="lang-plaintext">.mode line
select * from e limit 1;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770017937322/79a81b86-4f50-406b-a050-3e70d2108458.png" alt /></p>
<hr />
<h1 id="heading-7-materializing-the-events-reactions-and-drugs-tables"><strong>8. Materializing the Events, Reactions and Drugs Tables</strong></h1>
<p>Now that the event‑level table is flattened, we can normalize the nested arrays.</p>
<h2 id="heading-71-events"><strong>8.1 Events</strong></h2>
<pre><code class="lang-sql"><span class="hljs-keyword">DROP</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">exists</span> <span class="hljs-keyword">events</span>;

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">events</span> <span class="hljs-keyword">AS</span> 
<span class="hljs-keyword">SELECT</span> 
  <span class="hljs-keyword">id</span>
  ,unique_aer_id_number
  ,original_receive_date
  ,number_of_animals_affected
  ,primary_reporter
  ,number_of_animals_treated
  ,onset_date
  ,report_id
  ,type_of_information
  ,receiver_organization
  ,receiver_street_address
  ,receiver_city
  ,receiver_state
  ,receiver_postal_code
  ,receiver_country
  ,assessed_by
  ,animal_species
  ,animal_gender
  ,animal_female_animal_physiological_status
  ,animal_age_min
  ,animal_age_max
  ,animal_age_unit
  ,animal_age_qualifier
  ,animal_weight_min
  ,animal_weight_max
  ,animal_weight_unit
  ,animal_weight_qualifier
  ,animal_breed_is_crossbred
  ,animal_breed_component
  ,animal_reproductive_status
<span class="hljs-keyword">FROM</span> e;
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770017970666/a31f953b-a5f9-4ed4-99da-ee179cc6904a.png" alt /></p>
<h2 id="heading-72-event-reactions"><strong>8.2 Event Reactions</strong></h2>
<pre><code class="lang-sql"><span class="hljs-keyword">DROP</span> <span class="hljs-keyword">SEQUENCE</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">exists</span> seq_event_reaction_id;
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">SEQUENCE</span> seq_event_reaction_id <span class="hljs-keyword">START</span> <span class="hljs-number">1</span>;

<span class="hljs-keyword">DROP</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">exists</span> event_reactions;
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> event_reactions <span class="hljs-keyword">AS</span>
<span class="hljs-keyword">SELECT</span> 
<span class="hljs-keyword">nextval</span>(<span class="hljs-string">'seq_event_reaction_id'</span>) <span class="hljs-keyword">as</span> <span class="hljs-keyword">id</span>,
e.id <span class="hljs-keyword">as</span> event_id,
e.unique_aer_id_number,
r.veddra_version,
r.veddra_term_code,
r.veddra_term_name
<span class="hljs-keyword">FROM</span> e
<span class="hljs-keyword">CROSS</span> <span class="hljs-keyword">JOIN</span> <span class="hljs-keyword">unnest</span>(e.reaction) <span class="hljs-keyword">as</span> r(r);
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770017995671/14dc7da3-3789-4d2d-9ddb-f6bce9f5160b.png" alt /></p>
<hr />
<h2 id="heading-73-event-drug-details"><strong>8.3 Event Drug Details</strong></h2>
<pre><code class="lang-sql"><span class="hljs-keyword">DROP</span> <span class="hljs-keyword">SEQUENCE</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">exists</span> seq_event_drug_id;
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">SEQUENCE</span> seq_event_drug_id <span class="hljs-keyword">START</span> <span class="hljs-number">1</span>;

<span class="hljs-keyword">DROP</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">if</span> <span class="hljs-keyword">exists</span> event_drugs;
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> event_drugs <span class="hljs-keyword">as</span>
<span class="hljs-keyword">SELECT</span> 
<span class="hljs-keyword">nextval</span>(<span class="hljs-string">'seq_event_drug_id'</span>) <span class="hljs-keyword">as</span> <span class="hljs-keyword">id</span>,
e.id <span class="hljs-keyword">as</span> event_id,
e.unique_aer_id_number,
d.route,
d.brand_name,
d.dosage_form,
d.manufacturer.name <span class="hljs-keyword">as</span> manufacturer_name,
d.manufacturer.registration_number <span class="hljs-keyword">as</span> manufacturer_registration_number,
d.atc_vet_code,
d.used_according_to_label,
d.off_label_use,
d.lot_number
<span class="hljs-keyword">FROM</span> e
<span class="hljs-keyword">CROSS</span> <span class="hljs-keyword">JOIN</span> <span class="hljs-keyword">unnest</span>(e.drug) <span class="hljs-keyword">as</span> d(d);
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770018017274/23c0d8f5-36aa-4bac-b9c4-e2f50cd2916d.png" alt /></p>
<hr />
<h1 id="heading-8-export-our-relational-tables"><strong>9. Export our Relational Tables</strong></h1>
<p>DuckDB makes it easy to export data. More information can be found <a target="_blank" href="https://duckdb.org/docs/stable/sql/statements/copy#copy--to">here</a>.</p>
<pre><code class="lang-plaintext">COPY (SELECT * FROM events) TO 'events_exported.csv';
COPY (SELECT * FROM events) TO 'events_exported.parquet';
</code></pre>
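<p>The file extension tells DuckDB which writer to use. You can also spell out the format and options explicitly; a small sketch (the output file names are mine) that also exports the child tables:</p>
<pre><code class="lang-sql">-- Explicit formats; COMPRESSION controls the Parquet codec
COPY (SELECT * FROM events)          TO 'events_exported.parquet'          (FORMAT PARQUET, COMPRESSION ZSTD);
COPY (SELECT * FROM event_reactions) TO 'event_reactions_exported.parquet' (FORMAT PARQUET);
COPY (SELECT * FROM event_drugs)     TO 'event_drugs_exported.parquet'     (FORMAT PARQUET);
</code></pre>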
<hr />
<h1 id="heading-9-final-thoughts"><strong>10. Final Thoughts</strong></h1>
<p>This walkthrough demonstrates just how far DuckDB can go with complex JSON:</p>
<ul>
<li><p>Read JSON directly from URLs</p>
</li>
<li><p>Parse nested structures</p>
</li>
<li><p>Extract ZIP‑wrapped JSON over HTTPS</p>
</li>
<li><p>Flatten arrays with <code>UNNEST</code></p>
</li>
<li><p>Build staging and normalized tables</p>
</li>
<li><p>Automate SQL generation in the CLI</p>
</li>
</ul>
<p>All of this runs locally, with no servers, clusters, or cloud infrastructure.</p>
<p>DuckDB continues to redefine what’s possible in local analytics — and JSON processing is one of its superpowers.</p>
<hr />
]]></content:encoded></item></channel></rss>