This document explains how a .pptx file works internally so you can parse and edit PowerPoint presentations via XML: change text, swap images, tweak layouts, and safely add or remove slides.
A .pptx file is a ZIP archive following the Open Packaging Conventions (OPC) and the Office Open XML (OOXML) PresentationML standard.
Inside the ZIP you’ll find:
- XML “parts” (content)
- Binary assets (images, media, embedded files)
- Relationship files (*.rels)
- A content-type manifest ([Content_Types].xml)
You must:
- Treat the PPTX as a ZIP.
- Use the relationships to navigate, not just filenames.
- Preserve content types and relationship integrity when editing.
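As a starting point, the package can be opened with any ZIP library. The sketch below uses Python's standard zipfile module; the helper names list_parts and read_part are illustrative and not part of any library.

```python
import io
import zipfile


def list_parts(pptx_bytes: bytes) -> list[str]:
    """Return the names of all ZIP entries in the package, sorted."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        return sorted(package.namelist())


def read_part(pptx_bytes: bytes, part_name: str) -> bytes:
    """Read one part. ZIP entry names have no leading slash."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        return package.read(part_name.lstrip("/"))
```

For a file on disk, passing its path to zipfile.ZipFile directly works the same way.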
Typical PPTX structure (paths are inside the ZIP):
- /_rels/.rels – Root relationships (e.g., to ppt/presentation.xml).
- /[Content_Types].xml – Declares MIME types for each part type.
- /ppt/presentation.xml – Main presentation part (list of slide references, masters, etc.).
- /ppt/_rels/presentation.xml.rels – Relationships from the presentation to slide parts, slide masters, theme, etc.
- /ppt/slides/slide1.xml, slide2.xml, ... – Individual slide parts.
- /ppt/slides/_rels/slide1.xml.rels, ... – Relationships from slides to layouts, images, charts, etc.
- /ppt/slideMasters/slideMaster1.xml, ... – Slide master parts.
- /ppt/slideLayouts/slideLayout1.xml, ... – Slide layout parts.
- /ppt/notesSlides/notesSlide1.xml, ... – Notes for slides.
- /ppt/theme/theme1.xml, ... – Theme (colors, fonts).
- /ppt/media/image1.png, image2.jpeg, ... – Images and other media.
Other optional parts: charts, embedded objects, custom XML, etc.
A part is a file inside the package, such as:
- ppt/slides/slide3.xml (XML)
- ppt/media/image5.png (PNG)
Each part has:
- A path inside the ZIP
- A content type (from [Content_Types].xml)
- Zero or more relationships to other parts or external URIs
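Relationships are stored in .rels XML files in a _rels folder next to their “source” part (e.g., ppt/slides/_rels/slide1.xml.rels for ppt/slides/slide1.xml).
Each <Relationship> has:
- Id – local identifier like rId1
- Type – URI describing the relationship type
- Target – path to the target part, usually relative to the source part’s folder, or a URI when TargetMode is External
- TargetMode – Internal (default) or External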
Important:
- rId values are only unique within their source part.
- You must resolve targets via the appropriate .rels file, not by guessing filenames.
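Relationship resolution can be sketched with the Python standard library as follows. The helper names rels_path_for and parse_relationships are illustrative assumptions; the sketch treats a Target with a leading slash as relative to the package root and leaves external targets untouched.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def rels_path_for(part_name: str) -> str:
    """Return the .rels path that belongs to a source part."""
    folder, filename = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", filename + ".rels")


def parse_relationships(rels_xml: bytes, source_part: str) -> dict[str, dict]:
    """Map each relationship Id to its type, mode, and resolved target."""
    base_dir = posixpath.dirname(source_part)
    result = {}
    for rel in ET.fromstring(rels_xml).iter(f"{{{REL_NS}}}Relationship"):
        target = rel.get("Target")
        mode = rel.get("TargetMode", "Internal")
        if mode == "Internal":
            if target.startswith("/"):
                # Absolute targets are resolved from the package root.
                target = target.lstrip("/")
            else:
                # Relative targets are resolved from the source part's folder.
                target = posixpath.normpath(posixpath.join(base_dir, target))
        result[rel.get("Id")] = {"type": rel.get("Type"), "mode": mode, "target": target}
    return result
```

The resolved target is a ZIP entry name (no leading slash), so it can be passed straight to a ZIP reader.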
The entry point is ppt/presentation.xml, which the root relationships in /_rels/.rels point to.
Its root element is <p:presentation>.
Key elements:
- <p:sldIdLst> – ordered list of slide instances
  - Child <p:sldId> elements:
    - id: unique numeric ID within this list
    - r:id: references a relationship in presentation.xml.rels
- <p:sldMasterIdLst> – slide masters
- <p:notesMasterIdLst> – notes master
- <p:handoutMasterIdLst> – handout master
- <p:sldSz> – slide size
- <p:defaultTextStyle> – default text styles
Slide order is determined by the order of <p:sldId> elements in <p:sldIdLst>, not by the numeric suffix in slideN.xml.
Algorithm:
- Open ppt/presentation.xml.
- Read all <p:sldId> elements in <p:sldIdLst> in order.
- For each <p:sldId>, get its r:id.
- In ppt/_rels/presentation.xml.rels, find the <Relationship> with Id="<that r:id>".
- Its Target is a slide part path, e.g. slides/slide3.xml.
Use this relationship-based mapping instead of assuming slideN.xml is slide number N.
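The algorithm above might look like this in Python. It is a sketch under the assumption that the presentation part lives at ppt/presentation.xml; the function name slide_parts_in_order is illustrative.

```python
import posixpath
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def slide_parts_in_order(presentation_xml: bytes, presentation_rels_xml: bytes) -> list[str]:
    """Return slide part paths in presentation order."""
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(presentation_rels_xml).iter(f"{{{REL_NS}}}Relationship")
    }
    root = ET.fromstring(presentation_xml)
    ordered = []
    for sld_id in root.iterfind(f"{{{P_NS}}}sldIdLst/{{{P_NS}}}sldId"):
        target = targets[sld_id.get(f"{{{R_NS}}}id")]
        # Targets are relative to the folder that holds presentation.xml
        # (assumed here to be "ppt"); absolute targets start at the root.
        resolved = posixpath.normpath(posixpath.join("ppt", target))
        ordered.append(resolved.lstrip("/"))
    return ordered
```

Note that the returned order follows <p:sldIdLst>, so slide2.xml can legitimately come before slide1.xml.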
A slide part (e.g. ppt/slides/slide1.xml) typically looks like:
    <p:sld xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
           xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
           xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
      <p:cSld>
        <p:spTree>
          <!-- shapes, pictures, groups, etc. -->
        </p:spTree>
      </p:cSld>
      <p:clrMapOvr>...</p:clrMapOvr>
    </p:sld>
Inside <p:spTree> you’ll see:
- <p:sp> – shapes (rectangles, text boxes, titles, etc.)
- <p:pic> – images
- <p:grpSp> – group shapes
- <p:cxnSp> – connectors
- <p:graphicFrame> – charts, tables, SmartArt, etc.
A typical <p:sp> (shape) has:
- <p:nvSpPr> – non-visual properties (ID, name, placeholder info)
- <p:spPr> – shape properties (geometry, line, fill, transform)
- <p:txBody> – text content (if the shape holds text)
Text is stored using DrawingML (a: namespace) inside <p:txBody>.
Typical text-bearing shape:
    <p:sp>
      <p:nvSpPr>...</p:nvSpPr>
      <p:spPr>...</p:spPr>
      <p:txBody>
        <a:bodyPr/>                 <!-- text box properties -->
        <a:lstStyle/>               <!-- optional paragraph style list -->
        <a:p>                       <!-- paragraph -->
          <a:pPr>...</a:pPr>        <!-- paragraph properties -->
          <a:r>                     <!-- run -->
            <a:rPr .../>            <!-- run properties: font, size, color, etc. -->
            <a:t>Some text</a:t>    <!-- actual text -->
          </a:r>
          <a:br/>                   <!-- line break -->
          <a:r>
            <a:rPr .../>
            <a:t>More text</a:t>
          </a:r>
        </a:p>
        <a:p>...</a:p>              <!-- another paragraph -->
      </p:txBody>
    </p:sp>
Elements:
- <a:p> – paragraph
- <a:pPr> – paragraph properties (bullets, level, alignment)
- <a:r> – run (a sequence of uniformly formatted text)
- <a:rPr> – run properties (size, font, color, bold, etc.)
- <a:t> – literal text
- <a:br/> – manual line break
- <a:fld> – field (date, slide number, etc.), also containing <a:t>
- To collect text from a slide:
  - Traverse p:sld → p:cSld → p:spTree → p:sp with a p:txBody.
  - For each a:p:
    - Concatenate all a:t text nodes in order.
    - Insert line breaks when encountering a:br/.
- For search/replace:
  - Operate at the a:t level where possible.
  - Preserve the structure of a:r and a:rPr to keep formatting.
  - Handle text that may be split across multiple runs, even mid-word.
- Placeholders (title, body, footer, etc.) are identified by <p:ph> in the shape’s non-visual properties, not by the text itself.
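The text-collection traversal can be sketched with ElementTree as below. The function names paragraph_text and slide_text are illustrative; the sketch reads runs and fields in document order and maps a:br to a newline, and it also descends into group shapes.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
NS = {"a": A_NS, "p": P_NS}


def paragraph_text(paragraph: ET.Element) -> str:
    """Join a:t text in document order, turning a:br into a newline."""
    pieces = []
    for child in paragraph:
        if child.tag in (f"{{{A_NS}}}r", f"{{{A_NS}}}fld"):
            pieces.append(child.findtext("a:t", default="", namespaces=NS))
        elif child.tag == f"{{{A_NS}}}br":
            pieces.append("\n")
    return "".join(pieces)


def slide_text(slide_xml: bytes) -> list[list[str]]:
    """Return one list of paragraph strings per text-bearing p:sp shape."""
    root = ET.fromstring(slide_xml)
    shapes = []
    for shape in root.iterfind("p:cSld/p:spTree//p:sp", NS):
        body = shape.find("p:txBody", NS)
        if body is not None:
            shapes.append([paragraph_text(p) for p in body.findall("a:p", NS)])
    return shapes
```

Because a word can be split across runs, searching the joined paragraph string is more reliable than searching each a:t on its own.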
Hierarchy:
- Presentation → Slide Masters (/ppt/slideMasters/slideMasterN.xml)
- Each Slide Master → Slide Layouts (/ppt/slideLayouts/slideLayoutN.xml) via relationships and <p:sldLayoutIdLst>
- Each Slide → a Slide Layout via its slide .rels file
Slide master/layout parts look similar to slides:
- <p:sldMaster> / <p:sldLayout> root
- <p:cSld> / <p:spTree> – shapes and placeholders
- <p:txStyles> – default text styles (slide masters only)
- References to theme (/ppt/theme/themeN.xml)
Slides inherit positioning and styles from their layout and master.
Placeholders designate semantic roles:
    <p:sp>
      <p:nvSpPr>
        <p:cNvPr id="2" name="Title 1"/>
        <p:cNvSpPr/>
        <p:nvPr>
          <p:ph type="title" idx="0"/>
        </p:nvPr>
      </p:nvSpPr>
      ...
    </p:sp>
Common type values:
- title, ctrTitle – title, centered title
- subTitle – subtitle
- body – main content
- pic – picture placeholder
- dt – date
- sldNum – slide number
- ftr – footer
Slides override placeholder text by providing shapes with matching placeholder metadata (type, idx).
To change a slide’s layout, conceptually:
- In the slide’s .rels file, change the relationship of type slideLayout to target a different slideLayoutX.xml.
- Ensure the new layout’s placeholders are compatible (same logical roles), or be prepared to reposition or re-create shapes.
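Retargeting the layout relationship could be sketched as follows. The function name retarget_layout is an assumption for illustration. ElementTree rewrites namespace prefixes it has not been told about, so the sketch registers the relationships namespace as the default before serializing.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
LAYOUT_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout"


def retarget_layout(slide_rels_xml: bytes, new_target: str) -> bytes:
    """Point the slide's slideLayout relationship at a different layout part."""
    # Keep the relationships namespace as the default namespace on output.
    ET.register_namespace("", REL_NS)
    root = ET.fromstring(slide_rels_xml)
    layouts = [
        rel
        for rel in root.iter(f"{{{REL_NS}}}Relationship")
        if rel.get("Type") == LAYOUT_TYPE
    ]
    if len(layouts) != 1:
        raise ValueError(f"expected one slideLayout relationship, found {len(layouts)}")
    # Only Target changes; the Id stays the same so nothing else needs updating.
    layouts[0].set("Target", new_target)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

This only rewires the relationship; checking that the slide's placeholders still match the new layout remains a separate step.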
Typical picture shape:
    <p:pic>
      <p:nvPicPr>...</p:nvPicPr>
      <p:blipFill>
        <a:blip r:embed="rId5"/>
        ...
      </p:blipFill>
      <p:spPr>...</p:spPr>  <!-- transform, size, etc. -->
    </p:pic>
- r:embed="rId5" refers to a relationship in ppt/slides/_rels/slideX.xml.rels.
- That relationship’s Target is something like ../media/image3.png.
- The actual image is at ppt/media/image3.png.
To swap an image while preserving position and size:
- Find the <a:blip> with r:embed="rIdX".
- Resolve rIdX in the slide’s .rels file.
- Overwrite the binary file at the Target path with the new image bytes (preferably same format).
- Do not change XML unless pointing to a new media part.
Shapes use EMUs (English Metric Units):
- 1 inch = 914400 EMUs
- 1 cm = 360000 EMUs (exactly, since 914400 / 2.54 = 360000)
Positions and sizes are defined in <a:xfrm>:
    <a:xfrm>
      <a:off x="914400" y="914400"/>      <!-- position -->
      <a:ext cx="4572000" cy="3200400"/>  <!-- width/height -->
    </a:xfrm>
Adjust these attributes to move/resize shapes.
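A small sketch of the unit conversion and of updating an a:xfrm element. The helper names inches_to_emu, cm_to_emu, and set_frame are illustrative; results are rounded because EMU attributes are integers.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
EMU_PER_INCH = 914400
EMU_PER_CM = 360000


def inches_to_emu(inches: float) -> int:
    """Convert inches to whole EMUs."""
    return round(inches * EMU_PER_INCH)


def cm_to_emu(cm: float) -> int:
    """Convert centimetres to whole EMUs."""
    return round(cm * EMU_PER_CM)


def set_frame(xfrm: ET.Element, x: int, y: int, width: int, height: int) -> None:
    """Write position and size (all in EMUs) into an a:xfrm element."""
    off = xfrm.find(f"{{{A_NS}}}off")
    ext = xfrm.find(f"{{{A_NS}}}ext")
    off.set("x", str(x))
    off.set("y", str(y))
    ext.set("cx", str(width))
    ext.set("cy", str(height))
```

For example, set_frame(xfrm, inches_to_emu(1), inches_to_emu(1), inches_to_emu(5), inches_to_emu(3.5)) reproduces the values in the XML above. Shapes that inherit their position from the layout may have no a:xfrm on the slide itself, which this sketch does not handle.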
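Charts, tables, and SmartArt reside in <p:graphicFrame> with <a:graphic> inside. Charts often reference:
- /ppt/charts/chartN.xml
- Possibly embedded Excel workbooks (typically under /ppt/embeddings/; the /xl/ folder lives inside the embedded workbook package, not in the PPTX itself)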
For a generic editing agent:
- Avoid restructuring chart/table XML unless you fully know the schema.
- For simple text edits (titles, labels), locate <a:t> text nodes within chart/table parts using similar traversal.
Themes live under /ppt/theme/themeN.xml.
They define:
- Color schemes
- Font schemes
- Effects and other defaults
Slide masters reference themes and define the color mapping with <p:clrMap>; slides and layouts can override that mapping using <p:clrMapOvr>.
For text-only edits, you normally do not need to touch theme parts. Just keep them intact.
/[Content_Types].xml declares MIME types for parts.
Examples:
    <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
      <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
      <Default Extension="xml" ContentType="application/xml"/>
      <Override PartName="/ppt/presentation.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"/>
      <Override PartName="/ppt/slides/slide1.xml" ContentType="application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>
      <Default Extension="png" ContentType="image/png"/>
    </Types>
Rules:
- When adding a new slide slideN.xml, make sure an <Override> for /ppt/slides/slideN.xml with the slide content type exists. Overrides are matched per part name, so the entry for another slide does not cover the new one; if the entry is already present, you don’t need to change anything.
- When introducing a totally new extension (e.g., .foo), add a <Default> or <Override> entry.
- Do not remove or corrupt existing entries.
Breaking this file can make the PPTX unreadable.
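Adding an <Override> only when it is missing can be sketched like this. The function name ensure_override and the constant SLIDE_CT are illustrative; the comparison is a plain string match on PartName, which is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
SLIDE_CT = "application/vnd.openxmlformats-officedocument.presentationml.slide+xml"


def ensure_override(content_types_xml: bytes, part_name: str, content_type: str) -> bytes:
    """Add an Override for part_name unless one already exists."""
    # Keep the content-types namespace as the default namespace on output.
    ET.register_namespace("", CT_NS)
    root = ET.fromstring(content_types_xml)
    for override in root.iter(f"{{{CT_NS}}}Override"):
        if override.get("PartName") == part_name:
            # Already declared: return the input bytes untouched.
            return content_types_xml
    ET.SubElement(
        root,
        f"{{{CT_NS}}}Override",
        {"PartName": part_name, "ContentType": content_type},
    )
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

A call such as ensure_override(xml_bytes, "/ppt/slides/slide2.xml", SLIDE_CT) leaves existing entries in place and appends at most one new element.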
To extract all text from a presentation:
- From ppt/presentation.xml, list all <p:sldId> in order.
- For each r:id, resolve to slides/slideX.xml via presentation.xml.rels.
- For each slide:
  - Parse ppt/slides/slideX.xml.
  - Traverse p:cSld → p:spTree → p:sp.
  - For each p:sp that has p:txBody:
    - For each a:p:
      - Collect a:t values, respecting a:br/ as line breaks.
Optional: use <p:ph> info to categorize text (title, body, footer, etc.).
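The optional categorization step might be sketched as below. The names shape_text and text_by_role, and the labels "non-placeholder" and "untyped", are choices made for this sketch rather than anything defined by the format.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
NS = {"a": A_NS, "p": P_NS}


def shape_text(shape: ET.Element) -> str:
    """Join the shape's paragraphs with newlines; a:br also becomes a newline."""
    paragraphs = []
    for paragraph in shape.iterfind("p:txBody/a:p", NS):
        pieces = []
        for node in paragraph.iter():
            if node.tag == f"{{{A_NS}}}t":
                pieces.append(node.text or "")
            elif node.tag == f"{{{A_NS}}}br":
                pieces.append("\n")
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


def text_by_role(slide_xml: bytes) -> list[tuple[str, str]]:
    """Return (role, text) pairs for every p:sp that has a p:txBody."""
    root = ET.fromstring(slide_xml)
    result = []
    for shape in root.iterfind("p:cSld/p:spTree//p:sp", NS):
        if shape.find("p:txBody", NS) is None:
            continue
        ph = shape.find("p:nvSpPr/p:nvPr/p:ph", NS)
        if ph is None:
            role = "non-placeholder"
        else:
            # A p:ph without a type attribute is reported as "untyped" here.
            role = ph.get("type", "untyped")
        result.append((role, shape_text(shape)))
    return result
```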
Scenario: Replace entire text of a title or content placeholder.
Simplest robust approach:
- Identify the target shape by placeholder (p:ph type="title", body, etc.) or by existing text.
- Inside its p:txBody, remove existing a:p children.
- Insert new structure:
    <a:p>
      <a:r>
        <a:rPr/>  <!-- can clone from existing run or leave minimal -->
        <a:t>New text here</a:t>
      </a:r>
    </a:p>
For multiple paragraphs, add multiple a:p elements.
Formatting-preserving approach:
- Keep a:pPr and a:rPr nodes.
- Only modify the content of a:t, preserving run structure.
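The simplest approach could be sketched as follows, with one refinement: the first existing a:rPr (if any) is cloned into each new run so basic run formatting carries over. The names find_placeholder and set_shape_text are illustrative, and paragraph properties (a:pPr) are deliberately not preserved in this sketch.

```python
import copy
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
NS = {"a": A_NS, "p": P_NS}


def find_placeholder(slide_root: ET.Element, ph_type: str):
    """Return the first p:sp whose p:ph has the given type, or None."""
    for shape in slide_root.iterfind("p:cSld/p:spTree//p:sp", NS):
        ph = shape.find("p:nvSpPr/p:nvPr/p:ph", NS)
        if ph is not None and ph.get("type") == ph_type:
            return shape
    return None


def set_shape_text(shape: ET.Element, lines: list[str]) -> None:
    """Replace all paragraphs in the shape's txBody, one paragraph per line."""
    body = shape.find("p:txBody", NS)
    # Remember the first run's properties before the old paragraphs go away.
    template_rpr = body.find("a:p/a:r/a:rPr", NS)
    for old in body.findall("a:p", NS):
        body.remove(old)
    for line in lines:
        paragraph = ET.SubElement(body, f"{{{A_NS}}}p")
        run = ET.SubElement(paragraph, f"{{{A_NS}}}r")
        if template_rpr is not None:
            # a:rPr must come before a:t inside a run.
            run.append(copy.deepcopy(template_rpr))
        ET.SubElement(run, f"{{{A_NS}}}t").text = line
```

New paragraphs are appended at the end of p:txBody, which keeps them after a:bodyPr and a:lstStyle.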
To replace an image in place:
- Identify which p:pic to edit (by placeholder type pic, by name, or by its current image).
- Find <a:blip r:embed="rIdX"> inside its p:blipFill.
- In ppt/slides/_rels/slideX.xml.rels, find Relationship Id="rIdX".
- Resolve its Target, e.g., ../media/image5.png → ppt/media/image5.png.
- Overwrite that file with your new image bytes (matching format, e.g., PNG → PNG).
No changes needed in XML or [Content_Types].xml if you reuse the same part.
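Because ZIP entries cannot be overwritten in place with Python's zipfile module, a sketch of this workflow copies the package and substitutes the one media part. The function name replace_image is illustrative, and the caller is assumed to have already read the rId from the a:blip element.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def replace_image(pptx_bytes: bytes, slide_part: str, r_id: str, new_image: bytes) -> bytes:
    """Return a new package in which the image behind r_id is replaced."""
    folder, filename = posixpath.split(slide_part)
    rels_part = posixpath.join(folder, "_rels", filename + ".rels")
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as src:
        rels_root = ET.fromstring(src.read(rels_part))
        target = next(
            (
                rel.get("Target")
                for rel in rels_root.iter(f"{{{REL_NS}}}Relationship")
                if rel.get("Id") == r_id
            ),
            None,
        )
        if target is None:
            raise KeyError(f"{r_id} not found in {rels_part}")
        media_part = posixpath.normpath(posixpath.join(folder, target)).lstrip("/")
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            # Copy every entry in its original order, swapping only the image.
            for item in src.infolist():
                if item.filename == media_part:
                    data = new_image
                else:
                    data = src.read(item.filename)
                dst.writestr(item, data)
    return out.getvalue()
```

If several slides share the same media part, they will all show the new image, since the part itself is replaced.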
To add a new slide:
- Choose a layout:
  - From slide masters and their <p:sldLayoutIdLst>, or
  - From an existing slide’s slideLayout relationship.
- Create a new slide part:
  - Option 1: clone an existing slide and clear the content you want to reset.
  - Option 2: construct a minimal valid p:sld referencing the chosen layout.
- Save it as ppt/slides/slideN.xml (unique filename).
- In ppt/_rels/presentation.xml.rels, add a relationship:
    <Relationship Id="rIdNew" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slideN.xml"/>
- In ppt/presentation.xml, add a new <p:sldId> under <p:sldIdLst>:
    <p:sldId id="uniqueNumericId" r:id="rIdNew"/>
  - id must be a unique integer within <p:sldIdLst> (256 or greater).
- In ppt/slides/_rels/slideN.xml.rels, add a slideLayout relationship:
    <Relationship Id="rIdLayout" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayoutX.xml"/>
- Ensure [Content_Types].xml has an <Override> for /ppt/slides/slideN.xml with the slide content type; add one if it is missing.
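The two presentation-level registration steps (the new relationship and the new <p:sldId>) might be sketched as below. The function name register_slide is illustrative. The sketch picks the lowest unused rIdN and a slide id one above the current maximum, and it assumes presentation.xml already contains a <p:sldIdLst>. ElementTree only keeps the prefixes registered here, so other namespace declarations in a real presentation.xml may be rewritten.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
SLIDE_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide"


def register_slide(presentation_xml: bytes, presentation_rels_xml: bytes, target: str):
    """Return (new presentation.xml, new presentation.xml.rels) as bytes."""
    ET.register_namespace("p", P_NS)
    ET.register_namespace("r", R_NS)
    ET.register_namespace("", REL_NS)
    pres_root = ET.fromstring(presentation_xml)
    rels_root = ET.fromstring(presentation_rels_xml)

    # Pick a relationship Id that is unused within presentation.xml.rels.
    used = {rel.get("Id") for rel in rels_root.iter(f"{{{REL_NS}}}Relationship")}
    n = 1
    while f"rId{n}" in used:
        n += 1
    new_rid = f"rId{n}"
    ET.SubElement(
        rels_root,
        f"{{{REL_NS}}}Relationship",
        {"Id": new_rid, "Type": SLIDE_TYPE, "Target": target},
    )

    # Pick a slide id above every existing one; slide ids start at 256.
    sld_id_lst = pres_root.find(f"{{{P_NS}}}sldIdLst")
    if sld_id_lst is None:
        raise ValueError("presentation.xml has no p:sldIdLst")
    existing = [int(s.get("id")) for s in sld_id_lst.findall(f"{{{P_NS}}}sldId")]
    new_id = max(existing + [255]) + 1
    # Appending places the new slide last in presentation order.
    ET.SubElement(
        sld_id_lst,
        f"{{{P_NS}}}sldId",
        {"id": str(new_id), f"{{{R_NS}}}id": new_rid},
    )
    return (
        ET.tostring(pres_root, encoding="UTF-8", xml_declaration=True),
        ET.tostring(rels_root, encoding="UTF-8", xml_declaration=True),
    )
```

Writing the slide part itself, its own .rels file, and the content-type entry are separate steps not covered by this function.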
- In ppt/presentation.xml, locate <p:sldIdLst>.
- Reorder the <p:sldId> elements themselves.
Don’t change their id or r:id; slide order is determined solely by element order.
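Reordering can be sketched as moving one <p:sldId> element within its parent. The function name move_slide and its zero-based indices are choices made for this sketch.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def move_slide(presentation_xml: bytes, old_index: int, new_index: int) -> bytes:
    """Move the slide at old_index to new_index (both zero-based)."""
    ET.register_namespace("p", P_NS)
    ET.register_namespace("r", R_NS)
    root = ET.fromstring(presentation_xml)
    sld_id_lst = root.find(f"{{{P_NS}}}sldIdLst")
    entry = sld_id_lst[old_index]
    # The element keeps its id and r:id; only its position changes.
    sld_id_lst.remove(entry)
    sld_id_lst.insert(new_index, entry)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

No relationship or slide part is touched, which matches the rule that order comes solely from element order.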
To make safe, reliable changes:
- Always use relationships:
  - Never infer master/layout/media links purely from filenames or numeric IDs.
- Preserve unknown XML:
  - If you don’t need to modify it, copy elements and attributes through unchanged.
- Preserve namespaces:
  - Keep existing xmlns:p, xmlns:a, xmlns:r declarations and prefixes as-is.
- Make minimal edits:
  - Focus on local text/shape/media changes instead of global restructures.
- Keep [Content_Types].xml consistent:
  - Only add entries when introducing new parts (such as an added slide) or new extensions.
  - Avoid changing or removing existing entries.
- Maintain well-formed XML:
  - Ensure all tags are properly closed and nesting is valid.
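A post-edit sanity check along these lines can catch two of the problems listed above: parts that are no longer well-formed and internal relationships whose target is missing. The function name check_package is illustrative, and the check is a sketch rather than full schema validation.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def check_package(pptx_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        names = set(package.namelist())
        for name in sorted(names):
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                root = ET.fromstring(package.read(name))
            except ET.ParseError as exc:
                problems.append(f"{name}: not well-formed ({exc})")
                continue
            if not name.endswith(".rels"):
                continue
            # A .rels file at X/_rels/Y.rels describes the source part X/Y.
            source_dir = posixpath.dirname(posixpath.dirname(name))
            for rel in root.iter(f"{{{REL_NS}}}Relationship"):
                if rel.get("TargetMode", "Internal") != "Internal":
                    continue
                joined = posixpath.join(source_dir, rel.get("Target"))
                resolved = posixpath.normpath(joined).lstrip("/")
                if resolved not in names:
                    problems.append(f"{name}: {rel.get('Id')} -> missing part {resolved}")
    return problems
```

Running it before and after an edit and comparing the two lists is one way to confirm the edit introduced no new breakage.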
For most real-world editing tasks, the following subset is sufficient:
- Understand the ZIP and locate:
  - ppt/presentation.xml
  - ppt/_rels/presentation.xml.rels
  - ppt/slides/slide*.xml and their .rels
  - ppt/media/*
- Use:
  - <p:sldIdLst> + presentation.xml.rels to enumerate slides in order.
  - p:cSld → p:spTree → p:sp → p:txBody → a:p → a:r → a:t to read/write text.
  - p:pic + a:blip r:embed="..." + slide .rels to handle images.
  - p:ph metadata to recognize placeholders (title, body, footer, etc.).
  - a:xfrm (a:off, a:ext) for shape coordinates (optional for text-only edits).
Staying within this structure lets you:
- Extract and modify text (titles, bullet lists, body text).
- Replace images without disturbing layouts.
- Add, remove, and reorder slides.
- Respect existing themes, masters, and layouts without needing to deeply modify them.