/r/PostgreSQL

The home of the most advanced Open Source database server on the world's largest and most active Front Page of the Internet.

47,800 Subscribers

1

Increase CPU usage for Postgres

I have a VM with 4 CPUs, 16 GB of RAM, and 62 GB of disk space. I am running a purging process, and while it runs, "top" shows Postgres consuming 100% CPU. I tried changing a few parameters: max_connections=2000 in postgresql.conf, shm_size=1gb in the compose file, and CPUs (2/3/4) in the compose file. But it still consumes the same amount. As of now, it takes 15 seconds to delete 10 records, and I want to decrease that time. What other possibilities could help me achieve this?

Note: Postgres is running as a container
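
A hedged first diagnostic sketch (table and column names below are placeholders, not from the post): slow row-by-row deletes are very often a missing-index problem, either on the DELETE's own predicate or on a foreign key in a child table that references the purged table, because each deleted row then has to scan that child table.

    -- placeholder names; adjust to the real purge statement
    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS)
    DELETE FROM parent_table
    WHERE created_at < now() - interval '90 days';
    ROLLBACK;  -- EXPLAIN ANALYZE really executes the delete, so roll it back while testing

    -- if a child table references parent_table without an index on the FK column,
    -- every deleted parent row forces a sequential scan of that child table:
    CREATE INDEX CONCURRENTLY idx_child_parent_id ON child_table (parent_id);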

4 Comments
2024/11/08
15:22 UTC

19

Postgres Superpowers in Practice

2 Comments
2024/11/08
07:24 UTC

1

Using a quoted column (capitalized) inside a crosstab select statement

Hi all,

I have a challenge here. Say I have a crosstab query like:

select * from crosstab('select * from x."firstName" as first',...) as ...

How do I format the query to avoid the error "syntax error at or near "" as first", which is caused by quoting the column name because it is capitalized?

TIA.
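
A hedged sketch of one common workaround, with hypothetical table/column names: dollar-quote the inner query passed to crosstab() so the double-quoted identifier passes through without any interaction with the outer string literal.

    CREATE EXTENSION IF NOT EXISTS tablefunc;

    SELECT *
    FROM crosstab(
        $$ SELECT "firstName", category, amount FROM x ORDER BY 1, 2 $$
    ) AS ct ("firstName" text, category_a numeric, category_b numeric);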

6 Comments
2024/11/08
05:20 UTC

4

thoughts on using vertical or column-based partitioning

I’m fairly new to database design, so please bear with me if this is a well-trodden topic or if I’m heading in the wrong direction.

I’m working with a dataset that’s really wide -- sometimes it’s got hundreds or even thousands of columns and the number of records can be in the millions. This will mostly be a read-heavy system, where I need to pull up sets of records rather than doing complex aggregations.

Right now, I’m thinking about using vertical partitioning. The idea is to split the columns into multiple tables and join them on the primary key when I need the data. But these vertically partitioned tables would also probably have foreign key relationships to other similar tables in the system, so I’m wondering how that might impact performance and design.

For anyone who’s worked with something like this in PostgreSQL, does that approach sound reasonable? Or are there other ways I should be thinking about organizing this kind of data to get the best performance and scalability?

Appreciate any tips or advice!
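
For what it's worth, a minimal sketch of the idea being described (hypothetical table names): the wide record is split into a frequently-read table and a rarely-read table that share the primary key, and reads that need the extra columns join on that key.

    CREATE TABLE samples_hot (
        sample_id  bigint PRIMARY KEY,
        label      text NOT NULL,
        created_at timestamptz NOT NULL
    );

    CREATE TABLE samples_cold (
        sample_id  bigint PRIMARY KEY
                   REFERENCES samples_hot (sample_id) ON DELETE CASCADE,
        attr_001   numeric,
        attr_002   numeric
        -- ... hundreds more, or a single jsonb column instead
    );

    -- reads that need the wide columns join on the shared key:
    SELECT h.label, c.attr_001
    FROM samples_hot h
    JOIN samples_cold c USING (sample_id)
    WHERE h.created_at >= now() - interval '7 days';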

6 Comments
2024/11/08
00:36 UTC

0

I need help writing a query

So I need to write a query that counts the average reply time of managers in chat. There are a few conditions:

1) Tables: chat_messages, with columns entity_id (basically the chat id), type (incoming_chat_message or outcoming_chat_message), created_by (id of the manager, or 0 if it's a client), created_at (message date as a Unix timestamp); and managers, with columns mop_id (manager id) and name_mop (name of the manager).
2) If there are multiple messages from the client in a dialogue without a reply from the manager, and/or multiple messages from the manager without a following client message, just use the first date of each block of messages; if it's simply message-reply-message-reply, leave it as is.
3) Working hours are 09:30-00:00. We only need to count time inside those hours (e.g. the difference between 2024-11-07 23:59 and 2024-11-08 09:30 is only 1 minute under this logic).

I have this query, but it feels like it has a lot of gaps and doesn't calculate correctly:
with filt as (
    SELECT *,
           to_timestamp(created_at) at time zone 'Europe/Moscow' as "date_client",
           row_number() over (partition by cm.entity_id order by created_at asc) as "rn"
    from test.chat_messages cm
    left join test.managers m on m.mop_id = cm.created_by
),
lag_inc as (
    select *,
           lag(rn) over (partition by "entity_id" order by date_client) as "last_row"
    from filt
    where type = 'incoming_chat_message'
),
lag_inc_filt as (
    select * from lag_inc
    where rn - last_row != 1 or last_row is null
),
lag_outc as (
    select *,
           lag(rn) over (partition by "entity_id" order by date_client) as "last_row"
    from filt
    where type = 'outgoing_chat_message'
),
lag_outc_filt as (
    select * from lag_outc
    where rn - last_row != 1 or last_row is null
),
result as (
    select lif.entity_id,
           GREATEST(lif.date_client, lif.date_client::date + '09:30:00'::interval) as message_date,
           case when GREATEST(lif.date_client, lif.date_client::date + '09:30:00'::interval) >
                     LEAST(lof.date_client, lof.date_client::date + interval '1 day' + '00:00:00'::interval)
                then GREATEST(lof.date_client, lof.date_client::date + '09:30:00'::interval)
                else LEAST(lof.date_client, lof.date_client::date + interval '1 day' + '00:00:00'::interval)
           end as reply_date,
           lof.name_mop
    from lag_inc_filt lif
    left join lag_outc_filt lof
           on lof.entity_id = lif.entity_id
          and lof.rn - lif.rn = 1
),
wh as (
    select *,
           CASE WHEN message_date < reply_date
                THEN AGE(reply_date, message_date)
                ELSE INTERVAL '0 seconds'
           END AS work_hours
    from result
)
select name_mop, avg(work_hours)
from wh
where name_mop is not null
group by name_mop;

Could someone please untangle this mess of code and tell me whether it calculates anything correctly? If not, how do I calculate working hours between two dates to begin with?
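
On the last question specifically, here is a minimal, hedged sketch of one way to count only the 09:30-24:00 window between two timestamps (the function name is an assumption, and it doesn't address the message-block grouping logic): clamp each day of the range to the working window and sum the positive overlaps.

    CREATE OR REPLACE FUNCTION work_interval(ts_from timestamp, ts_to timestamp)
    RETURNS interval
    LANGUAGE sql
    AS $$
        SELECT COALESCE(SUM(
                   LEAST(d + interval '24 hours', ts_to)
                 - GREATEST(d + interval '9 hours 30 minutes', ts_from)
               ), interval '0')
        FROM generate_series(date_trunc('day', ts_from),
                             date_trunc('day', ts_to),
                             interval '1 day') AS d
        WHERE LEAST(d + interval '24 hours', ts_to)
            > GREATEST(d + interval '9 hours 30 minutes', ts_from);
    $$;

    -- e.g. work_interval('2024-11-07 23:59', '2024-11-08 09:30') returns 00:01:00,
    -- matching the 1-minute example in the conditions above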

3 Comments
2024/11/07
21:54 UTC

1

Brain-storming database architecture options between local development and ETL vs. cloud services

1 Comment
2024/11/07
20:42 UTC

14

Am I crazy to use logical replication to transfer databases?

I have a database of about 840 GB of time-series data that I need to transfer to another server. The source will be written to constantly during this time, and we have to switch individual clients over in a staggered fashion, so I don't want any data loss or mess during this period. Is it crazy to use logical replication to do the transfer? Will it take days to complete?

EDIT: changed database to server for clarity
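
Not an answer to the sizing question, but for reference, a minimal sketch of the built-in logical replication flow this would use (names and connection details are placeholders; the source needs wal_level = logical, the target must already have the schema, and sequences are not replicated):

    -- on the source server:
    CREATE PUBLICATION migrate_all FOR ALL TABLES;

    -- on the target server (schema created beforehand, e.g. via pg_dump --schema-only):
    CREATE SUBSCRIPTION migrate_all_sub
        CONNECTION 'host=old-server dbname=app user=replicator'
        PUBLICATION migrate_all;

    -- watch the initial table copy and ongoing apply lag:
    SELECT * FROM pg_stat_subscription;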

27 Comments
2024/11/07
19:03 UTC

4

PgManage 1.2 Released

  • Downloads
  • Github
  • New features:
    • implemented support for adding/changing table indexes in Schema Editor
    • implemented Postgres role editor
    • added SQL error annotations in query editor
    • significant code completion improvements: added context-aware schema, table, view, column and function completions
    • added support for Postgres byte array display in the query results data grid
  • Bugs fixed:
    • fixed a bug in connection manager where "Discard changes" confirmation was shown after clicking "Test Connection" button
    • fixed a bug when PgManage was trying to restore tabs for closed DB workspaces
    • fixed a bug when "Discard changes" confirmation appeared after running "Explain/Analyze" and then closing DB workspace
    • fall back to unencrypted ssh key when no password is provided (thanks u/El-Virus)
    • use user-provided database password instead of previously stored one when "Test connection" is clicked in connection manager
    • fixed a bug when backup/restore background job info was potentially accessible by other pgmanage user accounts
    • fixed a bug when redundant database back-end was instantiated when requesting database auto-completion metadata
    • fixed a rare race condition when opening new database workspace
    • rearranged parts of DROP INDEX query template to make it runnable without needing extra modifications by the user
    • fixed a bug in Monitoring Dashboard when "Refresh all widgets" button was doing nothing after deleting all and restoring some monitoring widgets
    • fixed a bug in connection manager where "Discard changes" confirmation was shown for connections with passwords auto-filled by the browser
    • fixed a bug in schema editor where "DEFAULT" part of column definition was rendered regardless of presence of column default value
  • UI/UX Improvements:
    • new application startup screen
    • improved naming for exported CSV/XLS files
  • Other changes
    • Django updated from 4.2.11 to 4.2.16
    • cryptography updated from 36.0.2 to 41.0.7
    • pymysql updated from 1.0.x to 1.1.1
    • psycopg2 updated from 2.9.5 to 2.9.9
    • oracledb updated from 1.3.1 to 2.2.1
    • other occurrences of the highlighted selection in the query editor are now matched case-insensitively
    • implemented custom SESSION_SERIALIZER for improved session handling security
    • eager-load QueryTab components when opening database workspace for improved app responsiveness
    • added uniqueness validation to connection group names
    • removed unnecessary files from windows build of PgManage
    • changed default value for CSV separator setting
    • improved database back-end cleanup when no keep-alive requests come from the front-end
    • don't show error toast when running Explain/Analyze if PEV2 can display these errors by itself

1 Comment
2024/11/07
17:28 UTC

8

Multi-tenant database design.

I am designing a DB solution with multiple tenants that need strict data isolation and permissions around their data. Currently, each tenant can deploy a schema (as a namespace) containing a single table whose read/write permissions they control. My other option is a single table with the namespace as an embedded field, indexed and partitioned on that namespace. I expect thousands of tenants with 10k-100k rows per tenant. Which offers better performance, manageability, and scalability?
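
For comparison, a hedged sketch of the single-table variant with row-level security doing the isolation (names are placeholders; note that RLS only applies to roles other than the table owner unless FORCE ROW LEVEL SECURITY is set):

    CREATE TABLE tenant_data (
        tenant_id  text   NOT NULL,
        id         bigint GENERATED ALWAYS AS IDENTITY,
        payload    jsonb,
        PRIMARY KEY (tenant_id, id)
    );

    ALTER TABLE tenant_data ENABLE ROW LEVEL SECURITY;

    -- missing_ok = true: if no tenant is set, the comparison is NULL and no rows are visible
    CREATE POLICY tenant_isolation ON tenant_data
        USING (tenant_id = current_setting('app.current_tenant', true));

    -- each application connection declares its tenant before querying:
    SET app.current_tenant = 'acme';
    SELECT count(*) FROM tenant_data;  -- sees only acme's rows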

10 Comments
2024/11/07
17:07 UTC

0

Optimizing a cumulative price calculation query

I have a "pricing engine" query that is getting the cost of a variant (per source country), and is calculating taxes & margins in a cumulative fashion (one on top of the other and so forth ex. (cost * tariff)*tariff etc...) according to the order in which they need to be applied, in order to calculate the final price per channel, wholesale, b2b etc.

The query does work using recursion, but it's rather heavy. I've tried to accomplish the same thing using window functions, but I just do not get the correct result at the end.

Any ideas/suggestions on how I can optimize and make it way more performant?

    WITH RECURSIVE __static AS (
        SELECT NOW() AS __t_now
    ),
    raw AS (
     SELECT
            pp.p_var_id,
    
            pp.r_ch_id,
            _ch.channel,
    
            pp.r_pl_c_id, -- source country
            _c.source_r_pl_c_id,
            _c.source_r_ccy_id,
    
            _c.splr_acct_id,
            _c.source_cost,
    
            _br.bkng_rt_id,
            _br.rate AS rate, -- default to 1 -- no rate defined
    
            _c.source_cost * COALESCE(_br.rate, 1) AS target_cost,
            _ch.r_ccy_id AS target_r_ccy_id,
            _pt.position,
    
            pp.p_pr_id,
    
            COALESCE(pp.p_pr_ty_id, _pc.p_pr_ty_id) AS p_pr_ty_id,
            COALESCE(pp.tariff, _pc.tariff, 0) AS tariff_normalized,
            COALESCE(pp.fixed, _pc.fixed, 0) AS fixed_normalized,
    
            COALESCE(pp.tariff, _pc.tariff) AS tariff,
            COALESCE(pp.fixed, _pc.fixed) AS fixed,
    
            ROW_NUMBER()
                OVER (
                    PARTITION BY
                        pp.p_var_id,
                        pp.r_pl_c_id,
                        _c.source_cost,
                        _c.source_r_pl_c_id,
                        _c.source_r_ccy_id,
                        _c.splr_acct_id,
                        pp.r_ch_id,
                        _br.bkng_rt_id,
                        _br.rate
                    ORDER BY _pt.position DESC
                ) AS row_number
    
    
        FROM prices pp
        CROSS JOIN __static
    
        LEFT JOIN price_components _pc on _pc.p_pr_cmp_id = pp.p_pr_cmp_id
        LEFT JOIN price_types _pt on _pt.p_pr_ty_id = COALESCE(pp.p_pr_ty_id, _pc.p_pr_ty_id)
        LEFT JOIN channels _ch ON pp.r_ch_id = _ch.r_ch_id AND _ch.active IS TRUE
    
        LEFT JOIN LATERAL (
            SELECT DISTINCT ON (c.p_var_id, c.splr_acct_id)
                c.p_var_id,
                c.splr_acct_id,
                c.cost AS source_cost,
                c.bkng_rt_src_id,
                c.r_ccy_id AS source_r_ccy_id,
                c.r_pl_c_id AS source_r_pl_c_id
            FROM costs c
            WHERE
                c.r_pl_c_id = pp.r_pl_c_id -- match cost source country to price source country (new)
                AND __static.__t_now BETWEEN c.t_from AND c.t_until
            ORDER BY c.p_var_id, c.splr_acct_id, c.t DESC
        ) _c ON pp.p_var_id = _c.p_var_id
    
        LEFT JOIN LATERAL (
            SELECT DISTINCT ON (br.bkng_rt_src_id, br.source_r_ccy_id, br.target_r_ccy_id)
                br.bkng_rt_id,
                br.bkng_rt_src_id,
                br.rate
            FROM rates br
            WHERE
                _c.source_r_ccy_id <> _ch.r_ccy_id  -- Only join if conversion is needed
                AND br.source_r_ccy_id = _c.source_r_ccy_id --cost source ccy
                AND br.target_r_ccy_id = _ch.r_ccy_id --channel target ccy
                AND br.bkng_rt_src_id = _c.bkng_rt_src_id
                AND __static.__t_now >= br.t_from
                AND br.deleted IS FALSE
    
            ORDER BY br.bkng_rt_src_id, br.source_r_ccy_id, br.target_r_ccy_id, br.t_from DESC
        ) _br ON _c.bkng_rt_src_id = _br.bkng_rt_src_id
    
        WHERE __static.__t_now BETWEEN pp.t_from AND pp.t_until
        GROUP BY
            __static.__t_now,
            _c.p_var_id, _c.source_cost,
            pp.r_pl_c_id, _c.source_r_pl_c_id,
            _c.source_r_ccy_id, _c.splr_acct_id, _ch.r_ccy_id,
            pp.p_var_id, pp.r_ch_id,
            _ch.r_ch_id, _ch.channel, _br.bkng_rt_id, _br.rate,
            _pt.position,
            pp.p_pr_ty_id, _pc.p_pr_ty_id,
            pp.p_pr_id,
            pp.tariff, _pc.tariff,
            pp.fixed, _pc.fixed
    ),
    calc AS (
        SELECT *,
    
            target_cost + (target_cost * tariff_normalized) + fixed_normalized AS cumulative, -- Apply first tariff
    
            jsonb_build_array(
                jsonb_build_object(
                    'p_pr_id', p_pr_id,
                    'p_pr_ty_id', p_pr_ty_id,
                    'tariff', trim_scale(tariff),
                    'fixed', trim_scale(fixed),
                    'subtotal', trim_scale((target_cost * tariff_normalized) + fixed_normalized)
                )
            ) AS components
    
        FROM raw
        WHERE row_number = 1  -- Start with the highest position tariff
    
        UNION ALL
    
        SELECT raw.*,
    
            cc.cumulative + (cc.cumulative * raw.tariff_normalized) + raw.fixed_normalized AS cumulative, -- Apply each subsequent tariff
    
            cc.components ||  jsonb_build_object(
                'p_pr_id', raw.p_pr_id,
                'p_pr_ty_id', raw.p_pr_ty_id,
                'tariff', trim_scale(raw.tariff),
                'fixed', trim_scale(raw.fixed),
                'subtotal', trim_scale((cc.cumulative * raw.tariff_normalized) + raw.fixed_normalized)
            ) AS components
    
        FROM calc cc
        JOIN raw ON
            cc.p_var_id = raw.p_var_id
            AND cc.r_pl_c_id = raw.r_pl_c_id
            AND cc.source_cost = raw.source_cost
            AND cc.source_r_pl_c_id = raw.source_r_pl_c_id
            AND cc.source_r_ccy_id = raw.source_r_ccy_id
            AND cc.splr_acct_id IS NOT DISTINCT FROM raw.splr_acct_id
            AND cc.r_ch_id = raw.r_ch_id
            AND cc.bkng_rt_id IS NOT DISTINCT FROM raw.bkng_rt_id
            AND cc.rate IS NOT DISTINCT FROM raw.rate
            AND cc.row_number + 1 = raw.row_number  -- Join on the next lower tariff
    )
    
    SELECT *
    FROM calc c
    WHERE row_number = (
        SELECT MAX(raw.row_number)
        FROM raw
        WHERE
            p_var_id = c.p_var_id
            AND r_pl_c_id = c.r_pl_c_id
            AND source_cost = c.source_cost
            AND source_r_pl_c_id = c.source_r_pl_c_id
            AND source_r_ccy_id = c.source_r_ccy_id
            AND splr_acct_id IS NOT DISTINCT FROM c.splr_acct_id
            AND r_ch_id = c.r_ch_id
            AND bkng_rt_id IS NOT DISTINCT FROM c.bkng_rt_id
            AND rate IS NOT DISTINCT FROM c.rate
        )
    ;

Please find a live version here: https://www.db-fiddle.com/f/vnM3o5RZnhvyNgSqr57w66/0

PS. This is meant to go into a materialized view (that's why it's calculating everything). But I would still like to optimize the heck out of it, because I will need to readapt it in order to get the price for a single product.

----

Correct result:

    +--------+-------+--------+---------+----------------+---------------+------------+-----------+----------+---------+-----------+---------------+--------+-------+----------+-----------------+----------------+------+-----+----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |p_var_id|r_ch_id|r_pl_c_id|source_r_pl_c_id|source_r_ccy_id|splr_acct_id|source_cost|bkng_rt_id|rate     |target_cost|target_r_ccy_id|position|p_pr_id|p_pr_ty_id|tariff_normalized|fixed_normalized|tariff|fixed|row_number|cumulative   |components                                                                                                                                                                                                                                                                                                                                                  |
    +--------+-------+--------+---------+----------------+---------------+------------+-----------+----------+---------+-----------+---------------+--------+-------+----------+-----------------+----------------+------+-----+----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |1       |4      |807      |807             |978            |1           |100.000000 |null      |null     |100        |978            |70      |33     |4         |0.35             |0               |0.3500|null |2         |152.55       |[{"fixed": null, "tariff": 0.13, "p_pr_id": 34, "subtotal": 13, "p_pr_ty_id": 2}, {"fixed": null, "tariff": 0.35, "p_pr_id": 33, "subtotal": 39.55, "p_pr_ty_id": 4}]                                                                                                                                                                                       |
    |1       |4      |807      |807             |807            |null        |2000.000000|6         |0.016129 |32.258     |978            |70      |33     |4         |0.35             |0               |0.3500|null |2         |49.209579    |[{"fixed": null, "tariff": 0.13, "p_pr_id": 34, "subtotal": 4.19354, "p_pr_ty_id": 2}, {"fixed": null, "tariff": 0.35, "p_pr_id": 33, "subtotal": 12.758039, "p_pr_ty_id": 4}]                                                                                                                                                                              |
    |1       |1      |807      |807             |978            |1           |100.000000 |1         |61.696400|6169.64    |807            |1       |19     |1         |0.18             |0               |0.1800|null |4         |11110.0372676|[{"fixed": null, "tariff": 0.13, "p_pr_id": 28, "subtotal": 802.0532, "p_pr_ty_id": 2}, {"fixed": null, "tariff": 0.35, "p_pr_id": 26, "subtotal": 2440.09262, "p_pr_ty_id": 4}, {"fixed": 3.5, "tariff": null, "p_pr_id": 27, "subtotal": 3.5, "p_pr_ty_id": 3}, {"fixed": null, "tariff": 0.18, "p_pr_id": 19, "subtotal": 1694.7514476, "p_pr_ty_id": 1}]|
    |1       |2      |807      |807             |978            |1           |100.000000 |1         |61.696400|6169.64    |807            |1       |31     |1         |0.18             |0               |0.1800|null |4         |11932.6970652|[{"fixed": null, "tariff": 0.13, "p_pr_id": 32, "subtotal": 802.0532, "p_pr_ty_id": 2}, {"fixed": null, "tariff": 0.45, "p_pr_id": 29, "subtotal": 3137.26194, "p_pr_ty_id": 4}, {"fixed": 3.5, "tariff": null, "p_pr_id": 30, "subtotal": 3.5, "p_pr_ty_id": 3}, {"fixed": null, "tariff": 0.18, "p_pr_id": 31, "subtotal": 1820.2419252, "p_pr_ty_id": 1}]|
    |1       |1      |807      |807             |807            |null        |2000.000000|null      |null     |2000       |807            |1       |19     |1         |0.18             |0               |0.1800|null |4         |3604.31      |[{"fixed": null, "tariff": 0.13, "p_pr_id": 28, "subtotal": 260, "p_pr_ty_id": 2}, {"fixed": null, "tariff": 0.35, "p_pr_id": 26, "subtotal": 791, "p_pr_ty_id": 4}, {"fixed": 3.5, "tariff": null, "p_pr_id": 27, "subtotal": 3.5, "p_pr_ty_id": 3}, {"fixed": null, "tariff": 0.18, "p_pr_id": 19, "subtotal": 549.81, "p_pr_ty_id": 1}]                  |
    |1       |2      |807      |807             |807            |null        |2000.000000|null      |null     |2000       |807            |1       |31     |1         |0.18             |0               |0.1800|null |4         |3870.99      |[{"fixed": null, "tariff": 0.13, "p_pr_id": 32, "subtotal": 260, "p_pr_ty_id": 2}, {"fixed": null, "tariff": 0.45, "p_pr_id": 29, "subtotal": 1017, "p_pr_ty_id": 4}, {"fixed": 3.5, "tariff": null, "p_pr_id": 30, "subtotal": 3.5, "p_pr_ty_id": 3}, {"fixed": null, "tariff": 0.18, "p_pr_id": 31, "subtotal": 590.49, "p_pr_ty_id": 1}]                 |
    +--------+-------+--------+---------+----------------+---------------+------------+-----------+----------+---------+-----------+---------------+--------+-------+----------+-----------------+----------------+------+-----+----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
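
One narrow, hedged observation (toy data below, not the schema above): when every component is a pure percentage, the cumulative "tariff on top of tariff" result is just a running product, which a window function can express as exp(sum(ln(1 + tariff))) without recursion; it's the fixed components that force the row-by-row recursive shape.

    WITH components(pos, tariff) AS (
        VALUES (1, 0.13), (2, 0.35), (3, 0.18)
    )
    SELECT pos, tariff,
           100 * exp(sum(ln(1 + tariff)) OVER (ORDER BY pos)) AS cumulative_price
    FROM components;
    -- for a starting cost of 100: 113, 152.55, 180.009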

3 Comments
2024/11/07
15:28 UTC

24

"SELECT COUNT(*) FROM products" took 17 seconds. Advice?

My products table has over 46 million rows, and now a simple COUNT takes 17 seconds. How can I identify the performance issue and then fix it?

The table schema:

-- Table Definition
CREATE TABLE "public"."products" (
    "id" uuid NOT NULL DEFAULT gen_random_uuid(),
    "store_id" uuid NOT NULL,
    "shopify_id" int8 NOT NULL,
    "title" varchar NOT NULL,
    "vendor" varchar NOT NULL,
    "product_type" varchar,
    "price" numeric(10,2) NOT NULL,
    "handle" varchar,
    "tags" _varchar DEFAULT '{}'::character varying[],
    "published_at" timestamp,
    "created_at" timestamp NOT NULL,
    "updated_at" timestamp NOT NULL,
    PRIMARY KEY ("id")
);


-- Indices
CREATE UNIQUE INDEX index_products_on_shopify_id ON public.products USING btree (shopify_id);
CREATE INDEX index_products_on_store_id ON public.products USING btree (store_id);
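
If an exact count isn't strictly required, a hedged sketch of the usual workaround is to read the planner's estimate instead of scanning all 46 million rows:

    -- approximate row count from statistics maintained by (auto)vacuum/ANALYZE
    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE oid = 'public.products'::regclass;

An exact COUNT(*) still has to visit every visible row; an index-only scan over the primary key only helps when the visibility map is mostly set, which is another reason to check autovacuum settings on a table this size.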

49 Comments
2024/11/07
12:27 UTC

0

Changing Computer Name (Windows) with PostgreSQL Installed

Hi fellow DBAs. Is it safe to rename a Windows computer that has PostgreSQL installed? I mean, do I not need to change any files or settings in PostgreSQL afterwards? I ask because, comparing it with MSSQL, there I also need to change the server name of the SQL Server instance by running the command below.

"EXEC sp_dropserver 'DC1';

GO

EXEC sp_addserver '<ENTER THE NEW NAME>', local;

GO"

4 Comments
2024/11/07
02:38 UTC

6

Way to view intermediate CTE results?

Does anyone know of a way to easily view the results of CTEs without needing to modify the query?

I'm using DBeaver, and in order to see the results of a CTE in the middle of a long query, it takes a little bit of editing/commenting out. It's definitely not the end of the world, but it can be a bit of a pain when I'm working with a lot of these longer queries. I was hoping there'd be an easier way, when I run the whole query, to see the results of the CTEs along the way without needing to tweak the SQL.

Just to illustrate, here's an example query:

WITH customer_orders AS (
    -- First CTE: Get customer order summary
    SELECT 
        customer_id,
        COUNT(*) as total_orders,
        SUM(order_total) as total_spent,
        MAX(order_date) as last_order_date
    FROM orders
    WHERE order_status = 'completed'
    GROUP BY customer_id
),

customer_categories AS (
    -- Second CTE: Categorize customers based on spending
    SELECT 
        customer_id,
        total_orders,
        total_spent,
        last_order_date,
        CASE 
            WHEN total_spent >= 1000 THEN 'VIP'
            WHEN total_spent >= 500 THEN 'Premium'
            ELSE 'Regular'
        END as customer_category,
        CASE 
            WHEN last_order_date >= CURRENT_DATE - INTERVAL '90 days' THEN 'Active'
            ELSE 'Inactive'
        END as activity_status
    FROM customer_orders
),

final_analysis AS (
    -- Third CTE: Join with customer details and calculate metrics
    SELECT 
        c.customer_name,
        cc.customer_category,
        cc.activity_status,
        cc.total_orders,
        cc.total_spent,
        cc.total_spent / NULLIF(cc.total_orders, 0) as avg_order_value,
        EXTRACT(days FROM CURRENT_DATE - cc.last_order_date) as days_since_last_order
    FROM customer_categories cc
    JOIN customers c ON cc.customer_id = c.customer_id
)

-- Main query using all CTEs
SELECT 
    customer_category,
    activity_status,
    COUNT(*) as customer_count,
    ROUND(AVG(total_spent), 2) as avg_customer_spent,
    ROUND(AVG(avg_order_value), 2) as avg_order_value
FROM final_analysis
GROUP BY customer_category, activity_status
ORDER BY customer_category, activity_status;

I'd like to be able to quickly see the result from the final_analysis CTE when I run the whole query.
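
Short of a dedicated tool, a minimal illustration of the usual manual workaround (toy CTEs, not the query above): keep the WITH chain intact and point the final SELECT at whichever CTE you want to inspect; unreferenced CTEs simply aren't executed.

    WITH a AS (SELECT generate_series(1, 3) AS n),
         b AS (SELECT n, n * n AS n_squared FROM a)
    SELECT * FROM a;  -- swap "a" for "b" (or the real final query) to peek at each step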

8 Comments
2024/11/06
21:00 UTC

21

pgvector 0.8.0 released + blog post

pgvector 0.8.0 was released with a cool list of improvements:

  • Added support for iterative index scans
  • Added casts for arrays to sparsevec
  • Improved cost estimation for better index selection when filtering
  • Improved performance of HNSW index scans
  • Improved performance of HNSW inserts and on-disk index builds
  • Dropped support for Postgres 12

I blogged about these improvements, with a few examples that show the impact of the new iterative index scans and the improved cost estimation:

https://www.thenile.dev/blog/pgvector-080
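
As a hedged illustration of where the iterative scans matter (table, column, and the GUC value here are assumptions based on the 0.8.0 release notes, so double-check against the pgvector docs): a filtered nearest-neighbour query that previously returned too few rows can now keep walking the HNSW index until the LIMIT is satisfied.

    CREATE EXTENSION IF NOT EXISTS vector;

    -- hypothetical table with an HNSW index
    CREATE TABLE items (id bigserial PRIMARY KEY, category text, embedding vector(3));
    CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

    -- new in 0.8.0 (GUC name per the release notes; treat as an assumption)
    SET hnsw.iterative_scan = relaxed_order;

    SELECT id
    FROM items
    WHERE category = 'books'            -- filter that discards most index candidates
    ORDER BY embedding <-> '[1,2,3]'    -- L2 distance
    LIMIT 10;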

2 Comments
2024/11/06
19:33 UTC

0

Sending Postgres notifications to a GCP pubsub

Is there some kind of FDW / extension to push NOTIFY messages to a GCP pubsub? Or isn't that a good idea?
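
I'm not aware of a ready-made FDW for this; the usual shape (hedged sketch, placeholder names) is a trigger that NOTIFYs plus a small external worker that LISTENs and forwards each payload to Pub/Sub, since NOTIFY payloads only reach currently connected listeners and are otherwise lost.

    CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('events', row_to_json(NEW)::text);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_notify
    AFTER INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION notify_change();

    -- the worker side is just: LISTEN events;  then forward each received payload to Pub/Sub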

2 Comments
2024/11/06
17:53 UTC

2

Master-Master Alternative to Bucardo

I’m currently maintaining a setup with five PostgreSQL servers: two running on Debian 11 with PostgreSQL 13.9 and three on Debian 12 with PostgreSQL 15.3.

These servers replicate a Radius database (FreeRadius) and all its tables using Bucardo. The replication works as expected; however, I’ve noticed that Bucardo hasn’t had an update in quite some time. We’re using version 5.6.0-4, released on February 28, 2020, which is the latest available version.

Given the lack of recent updates, I’m considering alternative solutions to Bucardo. What other tools could I explore for this purpose?

Additional Context: All databases are set up in a master-master configuration, meaning writes can happen on any server, and changes need to propagate across all of them. The servers are in different cities, so client requests may come from any location and need to be served by any of the databases.

Thank you in advance for any recommendations!

4 Comments
2024/11/06
13:35 UTC

3

Is there a tool that would translate MySQL code to PostgreSQL?

I have a big .sql file that was originally written for MySQL, and I need to change the code to be compatible with Postgres syntax. Is there any tool/script for this?

17 Comments
2024/11/06
00:09 UTC

18

DB Migration gone wrong - what to do?

Before the migration, we were using Google Cloud SQL as the database provider for my bootstrapped startup (no external funding). The costs had slowly crept up to the point where we were paying about $2000 for Google Cloud SQL (including data transfer fees, mostly related to self-managed backups), and it was simply ridiculous. So we decided to move it to our own managed instances (running Ubuntu and PostgreSQL 16).

For the migration, I created a dump, loaded it onto the new db, then set up pglogical replication to sync the data. Unfortunately for me, this was my first time using pglogical, but all the streams for all tables seemed up to date.

With streaming replication going on, I felt it was safe to point our web app from the old Google-managed db to our own. Everything seemed to go alright. For anyone who cares, we were also moving the web app frontend from one IP to another (for cost + performance), but it took a while for all our clients to resolve the new DNS. After about 6h, the old app (pointing to the old Google Cloud SQL instance) stopped receiving traffic and I mostly scaled it down.

Things ran alright until the next day, when there were sequence collision(?) issues. I ran a number of resets like:

SELECT setval('approvals_e_id_seq', (SELECT MAX(id) FROM approvals_e) + 1);

I also unsubscribed from pglogical and that seemed to stop any new errors.

However, it appears there are data gaps between when the dump happened and a few hours later. We noticed this when customers called about missing data, etc.

So far, I've been writing specific scripts to read from the old db (if the last update time is after DATE_OF_DUMP) and write the data if it's missing or wasn't updated later than what's in the old db.

Of course, I'm wondering what I could have done differently, and I regret not having planned this better by asking here or hiring a pro. Any help is highly appreciated.
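
For the sequence part specifically, a hedged sketch of a bulk version of those setval() resets (covers serial-style nextval() defaults and identity columns; test on a copy first):

    DO $$
    DECLARE
        r record;
    BEGIN
        FOR r IN
            SELECT c.table_schema, c.table_name, c.column_name,
                   pg_get_serial_sequence(format('%I.%I', c.table_schema, c.table_name),
                                          c.column_name) AS seq
            FROM information_schema.columns c
            WHERE c.column_default LIKE 'nextval(%' OR c.is_identity = 'YES'
        LOOP
            IF r.seq IS NOT NULL THEN
                EXECUTE format(
                    'SELECT setval(%L, COALESCE((SELECT MAX(%I) FROM %I.%I), 0) + 1, false)',
                    r.seq, r.column_name, r.table_schema, r.table_name);
            END IF;
        END LOOP;
    END $$;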

10 Comments
2024/11/05
21:49 UTC

1

Determining How Much of the Data in a Table is Accessed

Is there a way to determine how much of a table's data is actually accessed over a given time period? What I would like to be able to determine, in an automated way, is how much of the data in a given table is actively being used, for any given table/DB. This data could then be used to potentially move some portion of the data out, etc.
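
A hedged starting point (it tracks scan and row counters per table since the last stats reset, not which fraction of the table's pages are touched): snapshot pg_stat_user_tables at the start and end of the period and diff the counters.

    SELECT relname,
           seq_scan, seq_tup_read,
           idx_scan, idx_tup_fetch,
           n_tup_ins, n_tup_upd, n_tup_del
    FROM pg_stat_user_tables
    ORDER BY seq_tup_read + COALESCE(idx_tup_fetch, 0) DESC;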

8 Comments
2024/11/05
20:12 UTC

0

Postgres sending Webhooks when an action is recorded

Hi – new to this community and glad I can tap into the swarm knowledge here.

Context: I'm a no-coder working in GTM workflow automation. I just started my business in early 2024 and am now getting into territory where I need to do more and more system integration.

A client has a Postgres db behind a VPN, so I cannot easily read from that db with Clay.com.

I believe a possible solution is sending the data via HTTPS webhook.

Is that a viable solution for this?

I'm ok with technical concepts but have zero Postgres experience.

Thx in advance!

1 Comment
2024/11/05
18:08 UTC

13

PostgreSQL data warehouse solutions

Has anyone come across PostgreSQL data warehouse solutions?
I recently stumbled across this company - https://www.tacnode.io/

What are some other alternatives in this space?

13 Comments
2024/11/05
17:30 UTC

1

A silly question from someone who's new to cloud

https://preview.redd.it/hvpkv3ys04zd1.png?width=319&format=png&auto=webp&s=057b1051eb6932a6f1363ada53042b97491d015e

This is the pricing for Azure Managed PostgreSQL with Flexible Server. Does this pricing mean I will be charged $763 even if I have low data storage and low ingress and egress?

12 Comments
2024/11/05
16:30 UTC

6

What's the fastest way to insert into a table with a unique constraint?

I have been working for some time on an ETL that depends on backfilling and has a unique index. I can't use COPY because if a transaction fails, the entire batch fails. I am left using queued inserts via batches (using Go pgx), but it's very slow. Parallelizing batches is fast, but it's problematic due to non-ordered access and potential deadlocking. What is the 2024 solution to this use case?
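
A hedged sketch of one common pattern for this (table and column names are placeholders): COPY into an unindexed staging table, then move rows into the real table with ON CONFLICT DO NOTHING so duplicates don't abort the batch.

    CREATE UNLOGGED TABLE staging (LIKE target_table INCLUDING DEFAULTS);

    COPY staging FROM STDIN;          -- fast bulk load, no unique index to violate

    INSERT INTO target_table
    SELECT * FROM staging
    ON CONFLICT (unique_col) DO NOTHING;

    TRUNCATE staging;

On the Go side, pgx's CopyFrom can target the staging table, so the loader stays on the COPY protocol while the final merge handles conflicts.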

18 Comments
2024/11/05
01:50 UTC

10

How much can PostgreSQL handle (writes/reads/deletes) on a DigitalOcean droplet with 8 GB RAM and 2 vCPUs?

How much can PostgreSQL handle (writes/reads/deletes) on a DigitalOcean droplet with 8 GB RAM and 2 or 4 vCPUs, hosted alongside the rest of the stack (NestJS, React behind nginx)?

Problem: I have a client who was using Firebase, and it was very good for him, but the app needed to be rebuilt (NestJS). I was informed after a week that they need Firestore, so I want to know how PostgreSQL would handle, say, 100K reads daily (with joins, of course) and around 2,000 writes daily.

1. How will PostgreSQL perform on the droplet above?
2. How will PostgreSQL perform alongside the back end and front end?
3. How will the rest of the app (backend/frontend) perform?
4. What challenges will I encounter in terms of the database and data overall (backups, scaling, availability)?

23 Comments
2024/11/04
21:44 UTC

1

%search% on a column with single-word string codes

I have a huge database and a column which is a single-word string code, and I want to apply %foo% searching to it. I'm currently using LIKE, and it's now hitting the statement timeout. Any better alternative?
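
A hedged sketch of the usual fix (hypothetical table/column names): a pg_trgm GIN index lets leading-wildcard LIKE '%foo%' searches use an index instead of scanning the whole table.

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    CREATE INDEX idx_codes_code_trgm ON codes USING gin (code gin_trgm_ops);

    SELECT * FROM codes WHERE code LIKE '%foo%';  -- can now use the trigram index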

8 Comments
2024/11/04
05:35 UTC
