ISSUE № 219 · JUN 5, 2026

Flux by Black Forest Labs · Builder's image pick

Black Forest Labs' open-weight image model family — the tool that dethroned Stable Diffusion on quality and now rivals Midjourney on photorealism.

v1.0 · verified 2026-06-02

The Black Forest Labs origin story

The Flux story starts in Munich in 2022, inside Stability AI's research team. Robin Rombach, Andreas Blattmann, Patrick Esser, and Dominik Lorenz were the core researchers behind Stable Diffusion — the model that put open-source image generation on the map. In early 2024, the team departed Stability AI and founded Black Forest Labs (BFL) with backing from Andreessen Horowitz. Their pitch was direct: take everything they'd learned building and shipping Stable Diffusion, fix what SD got wrong, and rebuild the model architecture from scratch.

The result, FLUX.1, landed in August 2024 in three simultaneous variants: [schnell] (fast, open weights, Apache 2.0), [dev] (higher quality, open weights, non-commercial), and [pro] (best quality, API-only, commercial). The release immediately attracted attention not just for its output quality but for the architecture underneath: a hybrid transformer combining multimodal and parallel diffusion transformer blocks, scaled to 12 billion parameters. That's more than three times the size of SDXL's roughly 3.5-billion-parameter base pipeline. The size paid off in ways you could see immediately: portraits that looked photographic, text in images that was legible, prompt adherence that actually tracked complex multi-object descriptions.

By October 2024, BFL released FLUX1.1 [pro] with a 2x speed increase and measurable quality jump, followed by Ultra and Raw modes — Ultra pushing output to 4 megapixels, Raw tuned for candid, documentary-style photorealism without the "AI polish" sheen. Then in November 2025, FLUX.2 [dev] arrived: a full architectural generation update at 32B parameters, with native image editing, multi-reference support, and NVIDIA-optimized FP8 inference that cut VRAM requirements by roughly 40%. BFL closed a $300M Series B in December 2025, valuing the company at over $6B, and by January 2026 had shipped FLUX.2 [klein] — a distilled, Apache 2.0-licensed variant built for real-time generation.

This is not a hobby project. This is the same team that built Stable Diffusion, with more resources, a clean slate, and a commercial mandate. That context matters when evaluating every output you get from a Flux model.

What Flux actually is

Flux is a text-to-image diffusion model family — you give it a text prompt, it gives back a photographic-quality image. What separates it from every predecessor is a combination of scale (12B–32B parameters), a novel hybrid-transformer architecture, and a training methodology that produces unusually tight correspondence between what you write and what you get.

At the technical level, Flux uses a rectified flow transformer architecture — a departure from the U-Net backbone that defined Stable Diffusion 1.x and SDXL. Rectified flow simplifies the diffusion path between noise and image, which translates to faster inference at equivalent quality and better gradient flow during training. The multimodal transformer blocks attend jointly over image patches and text tokens, meaning the model's "understanding" of your prompt and its internal image representation are tightly coupled throughout the generation process, not just at the conditioning step.
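To make the "simplified path" idea concrete, here is a minimal toy sketch of the rectified-flow formulation in plain Python. It is not Flux's implementation. The function names are illustrative, the "model" is an oracle that already knows the true velocity, and the convention that t=0 is data and t=1 is noise is one common choice assumed here.

```python
def interpolate(x0, x1, t):
    """Point at time t on the straight line from data x0 (t=0) to noise x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of that straight path; it is constant: d/dt interpolate = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x1, velocity_fn, steps):
    """Integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (data) in fixed Euler steps."""
    x = list(x1)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


# Toy "image" of three numbers and a fixed noise sample (both made up).
data = [0.5, -1.0, 2.0]
noise = [1.5, 0.25, -0.75]


def oracle(x, t):
    """Stand-in for the trained network: returns the true straight-line velocity."""
    return target_velocity(data, noise)


recovered = euler_sample(noise, oracle, steps=4)
```

Because the path is a straight line, the oracle velocity is constant and Euler integration lands back on the data point whether you take one step or fifty. A trained network only approximates that velocity field, which is why real samplers still take several steps, but the straighter the learned path, the fewer steps you need.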

The practical consequence of that architecture: if you write a 200-word prompt describing the exact pose, lighting, color palette, and background of a scene, Flux will honor it with a degree of fidelity that competitors struggle to match. You're not prompt-engineering around the model's limitations — you're describing what you want.

The model family in 2026

The FLUX lineup has two generations. Understanding the split is worth the two minutes it takes.

FLUX.1 — the original generation

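Three models, each with a different availability and use-case profile:

  • FLUX.1 [schnell]: fast, open weights, Apache 2.0
  • FLUX.1 [dev]: higher quality, open weights, non-commercial license
  • FLUX.1 [pro]: best quality, API-only, commercial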

FLUX1.1 [pro] — the first quality jump

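Released October 2024, FLUX1.1 [pro] added a 2x speed improvement and a quality uplift verified by third-party benchmarks. Two modes on top of the base model:

  • Ultra: pushes output to 4 megapixels
  • Raw: tuned for candid, documentary-style photorealism without the "AI polish" sheen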

FLUX.2 — the 2025–2026 generation

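The full-generation update, built on a 32B architecture with native editing capabilities:

  • FLUX.2 [dev]: 32B open weights (gated on Hugging Face), non-commercial license by default
  • FLUX.2 [klein]: distilled, Apache 2.0, built for real-time generation and lower-spec hardware
  • FLUX.2 [pro]: production API tier at $0.03/image, with multi-reference support
  • FLUX.2 [flex]: API tier with configurable steps, the pick for text in images
  • FLUX.2 [max]: top API tier for maximum-resolution output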

NOTE · which model to start with

For local experimentation: FLUX.2 [klein] (Apache 2.0, fast, free). For serious personal work: FLUX.1 [dev] or FLUX.2 [dev] via ComfyUI. For a production API: FLUX.2 [pro] at $0.03/image. For maximum resolution output: FLUX1.1 [pro] Ultra or FLUX.2 [max] at $0.06–0.07/image.

fig · A FLUX render · source: getimg.ai

First impressions using Flux

The fastest way to see what Flux can do without spending anything: fal.ai's playground, which lets you run FLUX.1 [schnell] for free on their infrastructure. You type a prompt, hit generate, and an image comes back in about 3–5 seconds. No account required for casual testing.

The first thing that stands out is how literally Flux interprets your prompt. Type "a barista with red hair and silver earrings pouring latte art in a sunlit cafe" and you get exactly that — red hair, earrings visible, latte art that actually looks like latte art, window light from the direction you'd expect. This isn't remarkable to describe, but it's remarkable in practice. Earlier models — including Stable Diffusion XL — would catch some elements and drop others. Flux catches everything in a 30-word prompt without you needing to reorder it, weight elements with brackets, or add negative prompts to suppress artifacts.

The second thing that hits you is resolution. Even the schnell model generates at 1024×1024 with a level of crispness and edge definition that older SD variants needed ControlNet extensions to approach. Hair strands are individually defined. Fabric textures are coherent. Hands — historically AI's most reliably broken output — are correct more often than not.

Move to FLUX1.1 [pro] Ultra and the gap to professional photography becomes difficult to articulate. The level of detail at 4MP is not "this looks like a photo"; it's "I would need metadata to confirm this isn't a photo."

Photorealism: where Flux dethroned Stable Diffusion

Stable Diffusion's photorealism ceiling was always a negotiation between the base model, checkpoint merges, LoRA stacking, and samplers. Power users built elaborate workflows to approach photographic quality. Beginners got muddy, over-saturated outputs that screamed "AI."

Flux changed this in two ways. First, the base model is dramatically better at physical coherence — lighting behaves like light, materials behave like materials, depth of field follows optics. You don't need a "photorealism" LoRA to get photographic output; you need a clear prompt. Second, the scale (12B–32B parameters) brings a richness of world knowledge that smaller models simply can't access. Flux has seen enough photographs during training to understand what skin looks like under different lighting, what a worn leather jacket looks like versus a new one, what the bokeh of a 50mm f/1.4 looks like versus a phone camera.

In third-party benchmarks run through early 2026, FLUX.2 [pro] and [max] rank first among accessible image models for photographic realism — ahead of Midjourney V8 on technical metrics (LPIPS perceptual similarity, FID for natural photography) even while Midjourney retains its lead on subjective "beautiful output" scores.

bench --tool=all --metric=photorealism,text,prompt-adherence · 2026 model comparison

photorealism · Flux 2 Max 9.2 · MJ V8 8.5 · SD 3.5 6.8 · DALL-E 3 7.2
text · Ideogram 9.5 · Flux 2 7.2 · DALL-E 3 7.0 · MJ V8 5.5
prompt-adherence · Flux 2 Pro 9.0 · DALL-E 3 8.2 · MJ V8 7.0 · SD 3.5 6.5

Text rendering: the feature that surprised everyone

Generating legible, correctly spelled text inside an image was the AI image generation community's longest-standing joke. Stable Diffusion would produce signs with letter soup. Midjourney would give you glyphs that looked vaguely like English from a distance and fell apart on inspection. Every model had this problem.

Flux addressed it through architectural changes and scale — the transformer's joint attention over text and image tokens makes character rendering a first-class part of the generation process rather than an afterthought. FLUX.2 [flex] and [pro] claim roughly 60% accuracy on first-attempt legible typography, which is dramatically better than predecessors even if it's not Ideogram-level reliability.

In practice, this means: short words on clean backgrounds (product labels, poster headlines, simple signs) work consistently. Longer sentences, stylized fonts at small sizes, or text on complex backgrounds still require multiple attempts. The gap to Ideogram — which was purpose-built for text-in-image work and remains the category leader at ~95% accuracy — is real. But Flux is now usable for most everyday text-in-image tasks, which is a category shift from "impossible" to "usually works."

TIP · best Flux setup for text in images

Use FLUX.2 [flex] via the BFL API or fal.ai. Keep your text short (under 8 words per element), use high contrast between text color and background in your prompt, and specify the font style explicitly ("bold sans-serif," "handwritten chalk"). Generate 3–4 seeds and pick the cleanest. For anything that needs pixel-perfect typography, use Ideogram and composite the result.
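The tip above can be turned into a small guard-rail helper. This is a sketch under assumptions: the function name, the prompt wording, and the job dictionaries with a `seed` key are all hypothetical, so map them onto whatever client or ComfyUI workflow you actually use.

```python
def text_in_image_jobs(scene, text, font_style, text_color, background,
                       seeds=(1, 2, 3, 4), max_words=8):
    """Build one prompt and pair it with several seeds, following the tip's rules."""
    n_words = len(text.split())
    # The tip asks for text elements *under* 8 words.
    if n_words >= max_words:
        raise ValueError(f"text has {n_words} words; keep it under {max_words}")
    prompt = (
        f'{scene}, with the text "{text}" in {font_style} lettering, '
        f"{text_color} text on a {background} background"
    )
    # Same prompt, different seeds: generate them all and pick the cleanest.
    return [{"prompt": prompt, "seed": seed} for seed in seeds]


jobs = text_in_image_jobs(
    scene="a coffee shop poster",
    text="Fresh Roast Daily",
    font_style="bold sans-serif",
    text_color="white",
    background="dark green",
)
```

Rejecting over-long text before spending on generations is the point of the check. The contrast and font hints go into the prompt itself because that is the only lever the tip describes.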

fig · The Flux playground · source: blogs.nvidia.com

Prompt adherence: the real competitive moat

Prompt adherence is the metric that matters most for professional use, and it's where Flux has the clearest technical lead over everything except DALL-E 3 in the conversational pipeline. The architecture's joint attention mechanism means the model processes your text description and the image it's building simultaneously, checking correspondence throughout generation rather than only at the conditioning step.

The practical test: describe a scene with four distinct visual elements — a specific person type, an action, a setting, and a lighting condition. Run the same prompt on Flux, Midjourney, and SD 3.5. Flux will hit all four. Midjourney will frequently reinterpret the aesthetic and drop the least "beautiful" element. SD 3.5 will sometimes confuse spatial relationships. This isn't cherry-picking — it's consistent across thousands of documented community comparisons.

For marketing teams, product photographers, and UI designers, this difference is the difference between "AI tool that saves me 30 minutes" and "AI tool that actually does the job." When your brief says "CEO in a navy suit standing at the window of a modern office, late afternoon backlight, shallow depth of field" — you need all of that in the output, not a reinterpretation of the vibes.

Three real workflows, end-to-end

case-study #01 · product photography at scale

Generate lifestyle product shots for an e-commerce catalog

tool: FLUX.2 [pro] via BFL API · volume: 200 images · cost: ~$6

The workflow: a small outdoor gear brand needs seasonal catalog imagery — 200 product lifestyle shots showing their backpacks in use across various environments. Traditional photography: $8,000–15,000 for a two-day shoot. Stock photography: close but never quite right — wrong bag, wrong environment, wrong demographic. AI: $6 in API costs plus two hours of prompting and curation.

The prompt template for each shot: [product name], [specific color], worn by [person description], [specific outdoor environment], [lighting], [camera simulation], photorealistic, 4K. Because Flux tracks every element of the prompt, the consistency across the 200 images is genuinely usable — the same bag appears recognizably as the same product across different scenes. Multi-reference support in FLUX.2 [pro] allowed feeding a reference image of the actual product, anchoring the bag's appearance across every generation.
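The bracketed template above maps naturally onto a small data structure. The following is an illustrative sketch, not the brand's actual tooling: the `Shot` class, the `build_prompt` function, and the sample product name are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One catalog shot, with a field per slot in the prompt template."""
    product: str
    color: str
    person: str
    environment: str
    lighting: str
    camera: str


def build_prompt(shot: Shot) -> str:
    """Fill the template in the order the case study lists its slots."""
    parts = [
        shot.product,
        shot.color,
        f"worn by {shot.person}",
        shot.environment,
        shot.lighting,
        shot.camera,
        "photorealistic",
        "4K",
    ]
    return ", ".join(parts)


example = Shot(
    product="Ridgeline 30L backpack",  # hypothetical product name
    color="forest green",
    person="a hiker in her 30s",
    environment="alpine trail at treeline",
    lighting="golden hour light",
    camera="shot on a 35mm lens",
)
prompt = build_prompt(example)
```

Keeping the slots as named fields makes it easy to hold the product and color fixed while sweeping environments and lighting across a 200-image batch.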

Rejection rate was roughly 15% — images where a strap was wrong, a shadow broke, or the face wasn't usable. Standard for AI-assisted photography work at this price point.
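A rejection rate changes the batch size you should budget for. Here is a rough back-of-the-envelope sketch using the figures quoted in this case study (200 keepers, about 15% rejected, $0.03 per image). It assumes rejections are spread evenly, and the function names are illustrative.

```python
import math


def generations_needed(accepted_target, rejection_rate):
    """Expected number of generations to end up with `accepted_target` usable images."""
    if not 0.0 <= rejection_rate < 1.0:
        raise ValueError("rejection_rate must be in [0, 1)")
    return math.ceil(accepted_target / (1.0 - rejection_rate))


def batch_cost(accepted_target, rejection_rate, price_per_image):
    """API cost of that batch at a flat per-image price."""
    return generations_needed(accepted_target, rejection_rate) * price_per_image


needed = generations_needed(200, 0.15)
cost = batch_cost(200, 0.15, 0.03)
```

Under these assumptions the batch grows from 200 to a few dozen more generations, which nudges the API bill slightly above the headline figure but leaves it in single-digit dollars.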

// cost: $6 API + 2hrs curation vs $8,000–15,000 traditional shoot

case-study #02 · brand asset generation via fine-tuned LoRA

Train a LoRA on brand characters and generate consistent assets

base: FLUX.1 [dev] · fine-tune: LoRA, ~20 reference images · license: commercial via BFL agreement

A direct-to-consumer food brand has a mascot character — an illustrated fox — that appears on all their packaging. They need 40 unique poses of the fox for a new product line and social campaign. The traditional route: brief the original illustrator, wait three weeks, pay $200–400 per pose.

The Flux route: train a LoRA on FLUX.1 [dev] using 20 existing poses of the fox with consistent lighting and background. The LoRA captures the character's style, color palette, and distinguishing features in roughly 2–3 hours of training on a rented A100 ($8–12 in compute). With the LoRA loaded in ComfyUI, each new pose is a prompt: "fox mascot in running pose, white background, flat color illustration style." Output is stylistically consistent with the source character within one or two seeds per pose.
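Part of why a LoRA trains in hours rather than days is how few weights it touches. In the generic LoRA technique, a frozen weight matrix gets a low-rank correction (the product of two thin matrices) instead of a full update. The sketch below only counts parameters. The layer width and rank are illustrative values, not Flux's actual dimensions.

```python
def full_update_params(d_out, d_in):
    """Trainable values if you fine-tuned the whole d_out x d_in weight matrix."""
    return d_out * d_in


def lora_update_params(d_out, d_in, rank):
    """Trainable values for a rank-r adapter: B is d_out x r and A is r x d_in."""
    return rank * (d_out + d_in)


# Hypothetical square projection layer and adapter rank, chosen only for illustration.
width, rank = 3072, 16
full = full_update_params(width, width)
lora = lora_update_params(width, width, rank)
reduction = full / lora
```

With these example numbers the adapter trains roughly a hundredth of the values a full fine-tune of that layer would, which is also why the resulting LoRA file is small enough to swap in and out of a ComfyUI workflow.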

The team generated the 40 assets in half a day. A small-business commercial license from BFL covered the commercial use of the [dev] weights. Total cost: under $50 in training compute plus the license fee.

// wall-clock: 1 day · traditional: 3 weeks + $8,000–16,000

case-study #03 · in-app image generation pipeline

Build a "generate header image" feature for a SaaS product

model: FLUX.2 [pro] via fal.ai · users: SaaS app, ~500 monthly active

A content marketing SaaS wanted to add a one-click "generate featured image" button. The user fills in the article title and topic; the app generates a header image and inserts it into the editor. Requirements: fast (under 6 seconds), cheap (under $0.05 per generation), consistent quality, no NSFW issues, commercially safe output.

FLUX.2 [pro] via fal.ai met all five requirements. The integration is a single POST request. Latency is 3–5 seconds for a 1024×768 image. At $0.03–0.04/image, with 500 MAU generating roughly 3 images per session (and assuming one session per user per month), the monthly API spend is about $45–60: a line item, not a budget discussion. The output quality means users keep the generated images more than 80% of the time without modification.
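The spend estimate is easy to sanity-check. This sketch plugs in the figures from the case study and adds one assumption the text leaves open: each active user runs one session per month. The function name is illustrative.

```python
def monthly_spend(active_users, images_per_session, sessions_per_user, price_per_image):
    """Monthly API bill at a flat per-image price."""
    return active_users * images_per_session * sessions_per_user * price_per_image


# sessions_per_user=1 is an assumption; raise it if users come back more often.
low = monthly_spend(500, 3, 1, 0.03)
high = monthly_spend(500, 3, 1, 0.04)
```

The result scales linearly with sessions per user, so a product where people return weekly should budget about four times as much.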

The critical choice was Flux over Midjourney: Midjourney has no API, making it unsuitable for programmatic pipelines. DALL-E 3 was API-available but deprecated from the official API in May 2026. Flux's API coverage across fal.ai, Replicate, and the BFL API itself means no single-vendor lock-in and competitive pricing across platforms.

// API cost: ~$50/mo · image acceptance rate: 80%+ without edit

Open weights vs API: the choice that matters

Flux is unusual in the AI image space because you have a genuine choice: run the model yourself, or pay per image through an API. Neither option is objectively correct — the right answer depends on your volume, technical appetite, and use case.

Running Flux locally

FLUX.1 [schnell], FLUX.1 [dev], FLUX.2 [dev], and FLUX.2 [klein] are all available on Hugging Face with open weights. Local inference works well on a GPU with 16–24GB VRAM for FLUX.1 models, and on high-VRAM consumer cards (24GB+) for FLUX.2 [dev] with FP8 optimization. ComfyUI is the dominant workflow tool — a node-based interface where you chain together sampling, LoRA loading, conditioning, and upscaling into a visual pipeline. The community has built thousands of public ComfyUI workflows for every conceivable Flux use case.

The licensing rules for local use: [schnell] and [klein] are Apache 2.0 — no restrictions, including commercial. [dev] models carry a non-commercial license by default; commercial use requires a separate agreement with BFL (tiered by volume). FLUX.2 [dev] weights on Hugging Face are gated — you agree to the license before downloading, and BFL can verify you have the right agreement in place before using commercially.
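If you ship more than one variant, it can help to encode these rules next to your model config. The sketch below only mirrors the licensing as described in this review. The labels are shorthand rather than legal terms, the function name is hypothetical, and the actual license files remain the source of truth.

```python
# License shorthand per open-weight variant, as summarized in this review.
OPEN_WEIGHT_LICENSES = {
    "FLUX.1 [schnell]": "apache-2.0",
    "FLUX.2 [klein]": "apache-2.0",
    "FLUX.1 [dev]": "non-commercial",
    "FLUX.2 [dev]": "non-commercial",
}


def can_ship_commercially(model, has_bfl_commercial_license=False):
    """True if self-hosted output from `model` may be used commercially."""
    license_id = OPEN_WEIGHT_LICENSES[model]  # KeyError flags an unknown variant
    if license_id == "apache-2.0":
        return True
    # [dev] weights: commercial use only with a separate BFL agreement.
    return has_bfl_commercial_license
```

Failing loudly on an unknown model name is deliberate: a typo in a config file should not silently default to "allowed".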

Using the API

For developers who don't want to manage GPU infrastructure, the API is straightforward: send a prompt (and optionally reference images), receive a URL to a generated image, pay per image. Three main platforms: the BFL API, fal.ai, and Replicate.
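The request-and-response shape described above looks roughly like this using only the Python standard library. Everything provider-specific here is a placeholder: the endpoint URL, the bearer-token header, the payload field names, and the `image_url` response key are assumptions for illustration, not the documented BFL, fal.ai, or Replicate API. Check your provider's reference before using any of them.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the real URL from your provider's docs.
EXAMPLE_ENDPOINT = "https://api.example.com/v1/generate"


def build_generation_request(prompt, api_key, width=1024, height=768,
                             endpoint=EXAMPLE_ENDPOINT):
    """Build (but do not send) a JSON POST request for one image."""
    payload = {"prompt": prompt, "width": width, "height": height}  # assumed field names
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def extract_image_url(response_body):
    """Read the image URL from a JSON response; the response shape is assumed."""
    return json.loads(response_body)["image_url"]


request = build_generation_request("a red bicycle leaning on a brick wall", "test-key")
```

Sending it would be a call to `urllib.request.urlopen(request, timeout=30)` followed by `extract_image_url` on the response body. Keeping the build step separate from the network call makes the integration easy to unit-test and easy to repoint at a different provider.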

WARNING · licensing gotcha

FLUX.1 [dev] is not free for commercial use. The non-commercial license on the open weights is commonly misread. Images generated through the BFL API or fal.ai on [pro] models are commercially licensed. Images generated locally on [dev] weights require a separate BFL commercial license. If you're building a product on Flux, clarify this before launch — retroactive licensing discussions are awkward.

Pricing and licensing in full

Flux has no subscription tier and no free managed service — it's either open weights (free to run, license restrictions apply) or pay-per-image API. Here's the full picture as of mid-2026:

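Open weights (self-hosted)

  • FLUX.1 [schnell] and FLUX.2 [klein]: Apache 2.0, free including commercial use
  • FLUX.1 [dev] and FLUX.2 [dev]: free for non-commercial use; commercial use requires a BFL license (tiered by volume)

BFL API (pay-as-you-go)

  • FLUX.2 [pro]: $0.03/image
  • FLUX1.1 [pro] Ultra and FLUX.2 [max]: $0.06–0.07/image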

For comparison: Midjourney Basic ($10/mo) gives approximately 200 images per month, roughly $0.05/image. At FLUX.2 [pro] pricing ($0.03), the same $10 buys about 330 images. Flux is cheaper per image at any volume, costs less in total than the Basic subscription below roughly 330 images a month, and is API-accessible and less restricted in output style.
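The comparison comes down to two divisions. This sketch uses the figures quoted above ($10/month for about 200 images, $0.03 per FLUX.2 [pro] image). The function names are illustrative.

```python
import math


def per_image_cost(monthly_fee, images_per_month):
    """Effective price per image on a flat subscription."""
    return monthly_fee / images_per_month


def images_for_budget(budget, price_per_image):
    """How many pay-per-image generations a fixed budget covers."""
    return math.floor(budget / price_per_image)


subscription_per_image = per_image_cost(10.0, 200)
flux_images_for_ten_dollars = images_for_budget(10.0, 0.03)
```

Read this way, the figure of about 330 images is what the subscription fee buys at API pricing, not a volume you have to exceed before per-image pricing starts to pay off.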

fig · Text rendering in FLUX · source: fal.ai

Flux vs Midjourney

a/flux b/midjourney

Midjourney is the dominant consumer-facing AI image tool — a closed Discord-and-web service known for its opinionated aesthetic, active community, and best-in-class "beautiful output" scores. Flux is the open-weight, API-first alternative. They're used by different people for different reasons, but the overlap is real and growing.

flux wins at

  • photographic technical accuracy (FID, LPIPS)
  • complex prompt adherence — tracks all elements
  • text rendering in images (~60% first-attempt)
  • API access for programmatic use
  • open weights for self-hosting and fine-tuning
  • per-image pricing — cheaper at high volume
  • LoRA ecosystem for style consistency

midjourney wins at

  • subjective aesthetic quality — "wow factor"
  • artistic and painterly styles
  • zero-setup — web and Discord, no technical config
  • community and inspiration browsing
  • consistent high output floor on simple prompts
  • image variation and explore workflows

Verdict: Choose Flux if you need programmatic access, photorealistic output, high-volume pipelines, or fine-tuning. Choose Midjourney if you want beautiful output from short prompts with no setup and care more about "stunning" than "accurate." Many professionals use both — Flux for production, Midjourney for inspiration and concepting.

Where Flux gets it wrong

No tool review is honest without this section. Flux has real limitations that matter.

Text rendering is better, not great

The 60% first-attempt accuracy on text sounds promising until you're on deadline and need a hero image with a specific tagline. On complex typography — multiple lines, stylized fonts, text on curved surfaces — you'll spend more time iterating than Ideogram would require. Flux is not the specialist here. Use Ideogram when text precision is the whole job.

No native interface — you bring your own

Midjourney gives you a beautiful web gallery, variation controls, and a community feed. Flux gives you an API or a model weight. Everything else — the interface, the gallery, the workflow — you build or borrow. fal.ai and BFL's playground fill some of this gap, but the experience is spartan versus Midjourney's polished product. This is a feature for developers and a real limitation for non-technical creatives.

Artistic styles require LoRAs — base model is photorealism-default

Ask Flux for an oil painting, a watercolor study, or a specific illustration aesthetic and the base model gives you something workable but generic. Midjourney's aesthetic engine produces gallery-quality stylized output from short prompts. With Flux, stylized work requires either a fine-tuned LoRA (takes setup and training budget) or a FLUX.2 [flex] workflow with careful prompting and steps configuration. The payoff is better control and consistency once you have the LoRA — but the entry cost is real.

VRAM requirements for local FLUX.2

FLUX.2 [dev] at 32B parameters is not a casual local inference model. You need 24GB+ VRAM with FP8 optimization, or you're running it very slowly or quantized. FLUX.1 models run acceptably on 16GB cards. FLUX.2 [klein] is the practical answer for budget hardware, but it's a meaningful capability trade-off from [dev]. The community is actively building quantization and optimization tooling — this constraint will loosen over the coming year.
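A weights-only estimate shows why precision matters so much here. The sketch below multiplies parameter count by bytes per parameter and nothing else. It ignores text encoders, the VAE, activations, and any offloading your tooling does, so treat it as a rough lower bound on the model's footprint rather than a VRAM requirement.

```python
# Bytes per parameter for common storage formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gib(params_billions, dtype):
    """Approximate size of the model weights alone, in GiB."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 2**30


flux2_dev = {dtype: weight_gib(32, dtype) for dtype in BYTES_PER_PARAM}
flux1_dev = {dtype: weight_gib(12, dtype) for dtype in BYTES_PER_PARAM}
```

By this estimate a 32B model's FP8 weights alone are larger than a 24GB card, which is consistent with the caveat that you end up offloading (slow) or quantizing further. The 12B FLUX.1 models have much more headroom at the same precision.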

Consistency across generations without LoRA

Generate the same person, character, or branded element twice with Flux and you'll get two different faces without a LoRA or reference image. FLUX.2 [pro]'s multi-reference support helps significantly — feeding a reference image constrains the output. But pure text-to-image consistency for characters is a known limitation across all current diffusion models, Flux included.

What's next for Flux

// roadmap · what BFL has signaled · 2026
  • Text-to-video expansion — BFL's $300M Series B specifically funded research beyond static images. The company has signaled multimodal models extending to video generation, potentially building on the same transformer architecture as FLUX.2.
  • Multimodal reasoning — Models that can reason about visual content, not just generate it. BFL described this as unifying "visual perception, generation, memory, and reasoning" in a single model.
  • FLUX.2 [klein] ecosystem — Community LoRA training and fine-tuning for the Apache 2.0 klein variant is accelerating. Expect a Civitai-scale ecosystem for klein models by late 2026, similar to what FLUX.1 [dev] built in 2024–2025.
  • Native editing tools — FLUX.2 already supports native image editing (outpainting, erasing, inpainting) via the BFL API. Deeper integration of editing into open-weight workflows via ComfyUI nodes is actively in development.
  • Kontext line — BFL's FLUX.1 Kontext [dev] (open weights for image-to-image editing) signals a pattern: open-weight versions of commercial editing tools released months after the API-only launch. The community builds on the open version while commercial users pay for the API.

fig · Running Flux via API · source: reddit.com

Who is Flux for

After spending time across the model family, the user profile becomes clear.

Flux is the right tool if you are a developer integrating image generation into a product, a marketing team generating product photography and campaign assets at scale, a designer who needs precise prompt control and consistent output for a brand, a researcher or artist who wants to fine-tune a model on their own visual style, or anyone who generates enough images per month that per-image pricing beats a subscription.

Flux is probably not your first stop if you are a creative who wants to explore AI art aesthetics with short prompts and zero setup, someone who wants to browse community inspiration and do image variations in a social feed, or a non-technical user who finds API keys and ComfyUI workflows intimidating. That person should start with Midjourney or DALL-E.

Alternatives at a glance

Tool · Best for · Flux's edge · Price
Midjourney · Stunning aesthetic output, short prompts, community · API access, photorealism metrics, open weights, cost at scale · $10–60/mo
Stable Diffusion · Maximum customization, existing SD ecosystem, older hardware · Base model quality, prompt adherence, text rendering · Free (open)
Ideogram · Typography-heavy images, logos, poster text, signage · Photorealism, prompt fidelity, API flexibility, model variety · Free / $8+

FAQ

Is Flux actually free to use?

Depends on the model and use case. FLUX.1 [schnell] and FLUX.2 [klein] are Apache 2.0 — fully free including commercial use. FLUX.1 [dev] and FLUX.2 [dev] are free for personal and non-commercial use; commercial use requires a license from BFL. The API models (pro, ultra, max) are pay-per-image with no free tier — but fal.ai offers free trial credits that cover dozens of images.

Can I use Flux-generated images commercially?

Images generated through the BFL API, fal.ai, or Replicate on [pro] models: yes, fully commercial. Images generated locally on [dev] weights: you need a commercial license from BFL. Images from [schnell] or [klein] (Apache 2.0): yes, commercial use is permitted. Check the model you're actually running, not just "Flux" generically — the license differs per variant.

How does Flux compare to Midjourney on quality?

Technically, FLUX.2 [max] leads on photographic realism metrics. Subjectively, Midjourney V8 still produces images that "wow" people on first look more consistently — its aesthetic engine is opinionated in a way users love. For product photography, marketing assets, and precision-prompt work, Flux wins. For "show me something beautiful," many people still reach for Midjourney.

What GPU do I need to run Flux locally?

For FLUX.1 [schnell] or [dev]: a 16GB VRAM GPU (RTX 4080, RTX 4070 Ti Super, etc.) with ComfyUI runs acceptably. For FLUX.2 [dev] at full quality: 24GB+ VRAM (RTX 4090, RTX 3090) with FP8 optimization. FLUX.2 [klein] was designed to run on lower-spec hardware and is the practical option for 8–12GB VRAM cards.

Does Flux have a Midjourney-style web app?

BFL has a playground at bfl.ai that lets you test models without technical setup. It's functional but sparse — no community feed, no variation browsing, no prompt inspiration. fal.ai's Flux playground is better for casual use. For the full aesthetic, community, and product experience, Midjourney has no close equivalent among Flux interfaces.

What's the relationship between Flux and Stable Diffusion?

Black Forest Labs was founded by the core team behind Stable Diffusion (Robin Rombach and colleagues, previously at Stability AI). Flux is not a successor to SD in a technical sense — it's a ground-up rebuild with a different architecture (rectified flow transformer vs. U-Net). But the expertise, training methodology, and open-weight philosophy are directly continuous. Flux is best understood as "what SD would have become if the team had stayed together and had more resources."

Can I fine-tune Flux with my own images?

Yes. LoRA fine-tuning on FLUX.1 [dev] is widely supported via ComfyUI, Kohya, and dedicated training platforms like Replicate and fal.ai's training API. 15–30 reference images is a typical starting point. FLUX.2 [dev] LoRA training is newer but tooling is maturing rapidly. For commercial use of fine-tuned [dev] models, the same licensing rules apply — get a BFL commercial license.

Is Flux good for generating text in images?

Better than any previous open-weight model, and approaching DALL-E 3 quality on simple text. FLUX.2 [flex] and [pro] hit roughly 60% first-attempt legibility on short, high-contrast text. Not Ideogram-level — Ideogram is the category leader at ~95% and worth using when typographic precision is the whole brief. Use Flux when text is one element of a larger scene; use Ideogram when text is the point.

The verdict

flux-review · v1.0 · latest Builder's Pick
8.6/10
+ photorealism + open weights + API-first + prompt-precise

The most capable open image model — and a credible Midjourney alternative for professional work.

Flux is the rare AI tool that succeeds with two completely different audiences: developers who need a reliable, cheap, API-accessible image generation backend, and creators who need photographic accuracy and precise prompt control. The open-weight FLUX.1 [dev] and FLUX.2 [dev] models gave the community a foundation to build on that Stable Diffusion's architecture could never quite sustain. The commercial API tiers ([pro], Ultra, [max]) give product teams pricing that undercuts Midjourney at volume while exceeding it on technical quality metrics.

The gaps are real: no polished native interface for casual users, stylized art requires LoRA investment, text rendering is improved but not specialist-grade, and FLUX.2's local inference demands serious hardware. None of these are dealbreakers for the audience Flux is built for. If you generate images programmatically, at volume, or with exacting prompt requirements, Flux is the model family to build on in 2026.

// last verified 2026-06-02 · model family: FLUX.1 + FLUX1.1 + FLUX.2 · tested via BFL API, fal.ai, ComfyUI (local)