/r/SQL

Photograph via snooOG

The goal of /r/SQL is to provide a place for interesting and informative SQL content and discussions.

The goal of /r/SQL is to provide a place for interesting and informative SQL content and discussions.

Filter Posts

MySQL

Oracle

MS SQL

PostgreSQL

Posting

When requesting help or asking questions please prefix your title with the SQL variant/platform you are using within square brackets like so:

  • [MySQL]
  • [Oracle]
  • [MS SQL]
  • [PostgreSQL]
  • etc

While naturally we should endeavor to work as platform neutrally as possible many questions and answers require tailoring to the feature set of a specific platform.

Help posts

If you are a student or just looking for help on your code please do not just post your questions and expect the community to do all the work for you. We will gladly help where we can as long as you post the work you have already done or show that you have attempted to figure it out on your own.

Format Your Code

If you are including actual code in a post or comment, please attempt to format it in a way that is readable for other users. This will greatly increase your chances of receiving the help you desire. Something as simple as line breaks and using reddit's built in code formatting (4 spaces at the start of each line) can turn this:

SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field.3

Into this:

SELECT count(a.field1),
  a.field2,
  SUM(b.field4) 
FROM a INNER JOIN b 
  ON a.key1 = b.key1 
WHERE a.field8 = 'test' 
GROUP by a.field1, 
  a.field2 
HAVING SUM(b.field4) > 5 
ORDER by a.field3

For those with SQL questions we recommend using SQLFiddle to provide a useful development and testing environment for those who wish to fully understand your problem and help devise a solution.

Learning SQL

A common question is how to learn SQL. Please view the Wiki for online resources.

Note /r/SQL does not allow links to basic tutorials to be posted here. Please see this discussion. You should post these to /r/learnsql instead.

Related Reddit communities


Wiki

We have a Wiki


Acknowledgements

Thanks to ---sniff--- for the awesome header image!

/r/SQL

207,384 Subscribers

6

Division returns zero even after casting to numeric

  • Postgres 17
  • pgAdmin 4
  • All hosted locally

I've got the code below which is attempting to calculate the percentage between the previous value and current one.

I know dividing integers will return zero but regardless of how I change the values I don't ever get a figure back.

I've:

  • changed the values in the CTE before it gets to the main query
  • changed in the percentage calculation (shown below)
  • tried both CAST() and the ::numeric shorthand
  • tried using 100.00 * (ftp_delta / previous_ftp) to force it to numeric, however, the result is still 0.00

WITH delta AS (
SELECT 
    date
, ftp 
    ,  (ftp - LAG(ftp, 1,0) OVER (ORDER BY date)) AS ftpDelta
    ,  LAG(ftp, 1,0) OVER (ORDER BY date) AS previousFtp
FROM 
    bikes.t_measurements
)
SELECT 
    date
    , ftp
, previousFtp
    , ftpDelta
    , ((DIV((ftpDelta::numeric), previousFtp::numeric)) * 100.0) AS percentageDelta
FROM
    delta
WHERE
    delta.ftpDelta != 0
AND delta.previousFtp != 0
ORDER BY
    date ASC

I've included a screenshot of the results table in case that helps.

https://preview.redd.it/igrb207x8jyd1.png?width=537&format=png&auto=webp&s=e99275f3f7ae34d6391513787dd56169cf76bb92

2 Comments
2024/11/02
18:49 UTC

3

SQL Server editor freezes while typing. What could be causing this?

https://i.imgur.com/Pzkcguc.png

This started happening recently with my stored proc code. I've worked on it for ages without any issues. The code isn't even crazy long or anything, it's only 800 lines. It also executes fine, but it's while I'm typing or working on it that it constantly freezes and takes a relatively long time to unfreeze.

What could be causing it and how might I fix it?

The only change I made was I added a 50-line piece of code from my colleague, which outputs a table. It is relatively complex I suppose, but I just don't understand why a 50-line bit code being complex would mean that I literally cannot type anything anymore, yet can still execute the code just fine.

I'm on SQL Server Mgmt Studio 18, and I also have dbForge that my boss forced us to get. I don't even use it really, but maybe it's breaking things somehow?

The stored proc is more or less structured as

WITH

x as (

...

),

y as (

...

),

z as (

...

)

SELECT * FROM z WHERE ...

etc.

3 Comments
2024/11/02
17:39 UTC

4

Deleting Data from Tables in the DataBase - Red Alert

TL;DR;

The President of the company and the power user of the software walked into my office. They had been discussing a problem and had a suggestion command, delete all the data in the database before year 20XX. They told me not worry if it doesn't work and the test environment burns down. (I restore the test environment nightly through jobs)

Am I overthinking this, because I have the primary key and linkage that filters out everything prior to 20XX. I had embedded this in the software as it seems the proper way to filter that data.

They absolutely want the data deleted from some or all the tables. I can't tell which tables to attempt and which to ignore. I look at key setups and index. I know if I delete from a table and violate the referential integrity that nothing will be deleted.

TL;DR;

Having the key of interest, how do I delete its data in all tables containing its key, while honoring referential integrity? I can filter the data with SQL, but they don't want that solution and will slap me down if I try to mention it again.

16 Comments
2024/11/02
16:06 UTC

3

ERD Diagram check+need advices

Hello, I did a small(10 entities) ERD Diagram about Football Club Structure, but I'm not sure, that's correct 100%.

Write all mistakes, what I can do better etc.

I would be grateful! Maybe I would have some questions about ERD Diagrams later.

https://preview.redd.it/o9o7u8qxdiyd1.png?width=1486&format=png&auto=webp&s=f43cc07a9af68c32d505cbb31be0a7d2fc916949

6 Comments
2024/11/02
15:44 UTC

2

SQL books that are more about the underlying tech

Hoping what I’m looking for exists. I’m looking for a book more about the underlying design of sql(or other common databases) than writing queries or doing db maintenance. Think ‘nand to Tetris’ course but for databases(and book form).

7 Comments
2024/11/02
15:08 UTC

37

Explain indexes please

So I understand they speed up queries substantially and that it’s important to use them when joining but what are they actually and how do they work?

35 Comments
2024/11/02
05:38 UTC

0

MySQL keeps showing duplicated results

Hi all, I'm new to MySQL and while trying to run some code on it, it kept returning duplicated results. It was working fine earlier, but now whenever I use WHERE in my query it happens where I get 4x the actual result (shown below).

I have checked the original table without using WHERE many times and there are no duplicates so I'm confused as to why this is happening. I'm not sure if using WHERE even has anything to do with it, I think it might be a bug, but any help would be appreciated. Thank you!

https://preview.redd.it/37kysaekpkyd1.png?width=755&format=png&auto=webp&s=fdd7fa4f4b2d0552f416a47ffe62c7a142a2bcd5

Here's the second image showing it's just repeating itself or duplicating, so instead of just giving me 100ish rows of data it's giving me 460 rows.

https://preview.redd.it/4aeiitfrpkyd1.png?width=759&format=png&auto=webp&s=90dd2c7ac887aaefdf13b9e139191d96d5ef105d

Third image is just a clearer example where I used to ORDER BY to show how much it duplicated itself

https://preview.redd.it/1horqwsdxeyd1.png?width=528&format=png&auto=webp&s=2772ab626aae988379ed274ceb0aa64b8de60d1f

18 Comments
2024/11/02
04:08 UTC

6

Are Provisioned SQL Databases Considered Safe or Not?

If I provision an SQL database from services like DigitalOcean, Linode, Vultr, or AWS, and obtain the connection string, would that database be considered publicly unsafe, even though it requires a username and password for access? Additionally, if I use the connection string in my desktop app, is that okay or not? Do I need to secure it somehow from unauthorized access? What security measures should I take to ensure it's safe?

Thanks for your insights!

11 Comments
2024/11/01
22:41 UTC

13

Struggling reducing the execution time of a complex query

Hi all.

I have a query that is quite complex in PostgreSQL. By complex, I mean it has a lot of joins, counts, CTEs, etc. I also believe this is the best way to get my desired results with the current DDL I am working with. I won't paste the query itself, however I will show an anonymized part of it that seems to be where it is taking a long time. Just a little rundown:

1. The first CTE is a union between two tables, this part does not take a long time at all to execute. 
2. The query that I perform on this CTE above, its goal is to retrieve the records belonging to a group with the highest date column.

This is what this looks like:

-- foo CTE: Union between two tables, different logic for joins / where, etc.
WITH foo AS (
  SELECT 
    col1 AS col1,
    col2 AS col2, 
    col3 AS col3,
    col4 AS col4,
    col5 AS col5
  FROM tbl1 t1 
  JOIN j1 ... 
  JOIN j2 ...
  WHERE 
    ...
  UNION ALL 
  SELECT 
    col1 AS col1, 
    ccol2 AS col2, 
    col3 AS col3,
    col4 AS col4, 
    col5 AS col5 
  FROM tbl2 t2 
  JOIN j1 ... 
  JOIN j2 ...
  WHERE 
    ...
)
SELECT 
  DISTINCT ON (f.col1, f.col2, f.col3)
  f.col1, 
  f.col2, 
  f.col3, 
  f.col5
FROM foo f 
ORDER BY 
  f.col1, 
  f.col2, 
  f.col3,
  f.col4 DESC -- This is where the logic is to pull the record having the highest col4 (date column), from the distinct group of col1, col2, and col3.
;

Here is where my issue is. Because of the complexity of the source data, a CTE seems the only viable approach. This CTE will return a couple million records, growing as time goes on, and the logic to do the distinct on / order by takes forever. I cannot index a CTE.

I've tried making that CTE a temporary table instead, but just creating the temporary table itself takes too long. I need this logic to be returned at endpoint level speeds, and the logic above is just one part of many albeit the part that is taking by far the longest, how can I achieve this?

Thank you!

21 Comments
2024/11/01
20:52 UTC

4

Please critique my understanding of SQL & SQL Server

I'm a newbie when it comes to SQL and technology so please bare with me. Here's my understanding so far with the example of Microsoft's SQL Server (developer/local version) and please let me know if there are any corrections or gaps in my understanding:

I understand SQL is a programming language for the client to communicate and interact with the database server to produce some result with the data whether it be view only or manipulating data.

On the other hand, my understanding of a Server is a computer that runs a program that fetches info to the client upon request and it usually has a primary role, i.e: a database server is a computer that primarily contains a program that hosts a database.

SQL Server, a Microsoft product, is one of the many SQL programs out there (MySQL, Postgres, etc.) which I believe is called a relational database management system (RDBMS). Then it gets a bit tricky because there is a supplemental front-end program called SQL Server Management Studio (SSMS) where you actually have functionalities to type or click, i.e: entering queries.

Okay now wrapping this all together does mean the SQL Server bundle including both SQL Server application and SSMS application is considered a co-existing server and client application? Because 1) I can host my own database on my own computer which I believe lives in the SQL Server program and 2) I am entering queries to access my own database information which is through the SSMS application / client application?

6 Comments
2024/11/01
19:07 UTC

1

Index Use Question - please help

I am creating a database to track portable items, each of these items has a unique integer ID (QID). I will have multiple customers owning different amounts of these items, and I have a table in my database reserved for each customer (e.g. customer1: 1-27, customer2: 28-33 and so on).
when one of these items is identified, all that is taken from it is the integer ID, and I need to find which customer it belongs to, therefore I have created the table in image 1.

image 1

when I retrieve the ID of the item I dump the contents of the table from image 1 into a dictionary and perform a binary search on them manually after querying the database to find the range within which the retrieved ID is (image 2).

image 2

Given I am relatively beginner with SQL, I believe I may be missing a far more efficient way to do it than this but I was unsure whether or not SQL would be able to do a binary search on this data.
Any help would be very appreciated, just trying to work my head around trying to use indexing for this and other tables at the moment.

Thanks :)

4 Comments
2024/11/01
16:45 UTC

2

split column to new column with starts with

Hi There, I am very new to SQL. I have a table that I would like to transform a column's data to a new field. I am trying to write a query and I would like to take the first part of the data text. Example and NotTesting and add to a new column called ReasonType.

ColumnName ReasonDescription

Data:

Testing123
Testing123
NotTesting123
NotTesting123

Result:

Testing
Testing
NotTesting
NotTesting

15 Comments
2024/11/01
13:05 UTC

1

What is the best approach to parse data from an excel field into a another ordered field and have it set up as a automatic batch process for future entry.

Am working on an exercise where I have to manually review and update data from multiple sets of entries that is non standardized. I am exploring for a solution that will allow me to extract the existing data and insert it into a field that is actionable. This will result is a process that will automatically update future entries from submitted data entries into the correct field that we can use for analysis. The current solution that we are using is too slow and process heavy. What would be the best approach to create a solution that will give me the desired outcomes. Thanks.

2 Comments
2024/11/01
12:19 UTC

20

Friday Discourse: CURSOR vs WHILE LOOP

Little bored of seeing endless homework and interview questions, so I thought we could start a more fruitful conversation thread for those of us who aren’t passing exams or looking for a career change.

_Today I figured we could start with Cursors vs while loops.

Which do you prefer, and why? Which is more optimised, in your opinion. Or, which just looks nicer._

If this goes well I’d like to do more on other subjects such as:

  • dynamic SQL optimisation
  • linked servers and index interaction
  • TVF vs views

Does anyone else have other ideas?

26 Comments
2024/11/01
09:10 UTC

0

I HAVE QUESTION GUYS ? WHY "Y" COMMING FIRST HERE IM NOT ABLE TO UNDERSTAND ABOUT THIS ORDERING

leet code problem : problem link

note please dont tell me the answer how to solve this problem

i just want to know why ordering working different here ?

  1. when i use normal ordering it work fine like to A - Z
  2. but when i use in partitioning it working completly different by why?
  3. why "y" coming first?

please if anyone know about this please let me know

and i solved this problem using other methods but i want to know why it coming

https://preview.redd.it/9o9rjfxhz8yd1.png?width=1918&format=png&auto=webp&s=984195af651581121761d9eb8b507bae61dc4cbf

https://preview.redd.it/9jlpyp7109yd1.png?width=1918&format=png&auto=webp&s=f788733fa1f45553cfd8d93478a131b06e10fffb

21 Comments
2024/11/01
08:15 UTC

6

Understanding why SQL Joins are evaluated different to Power Query Joins

Hi everyone,

I'm relatively new to SQL but am somewhat familiar with data transformations with Power Query. I'm having some trouble rewriting some M code with SQL and getting the joins to return the correct rows

I have the below SQL Server query

SELECT
    LOWER(config.email) AS email,
    system_assignments.system_assignment_id,
FROM security.configuration config
LEFT OUTER JOIN security.systems systems
    ON config.permission_source = systems.system_name
    AND config.access_level = systems.access_level
    AND config.approved_data = systems.approved_data
LEFT OUTER JOIN security.system_assignments system_assignments
    ON config.system_assignment_value = system_assignments.system_assignment_value
    AND systems.system_id = system_assignments.system_id;

Which is based off of this Power Query code

let
    Source = #"security configuration",
    #"Merged Queries" = Table.NestedJoin(Source, {"permission_source", "approved_data", "access_level"}, systems, {"system_name", "approved_data", "access_level"}, "systems", JoinKind.LeftOuter),
    #"Expanded systems" = Table.ExpandTableColumn(Table.Buffer(#"Merged Queries"), "systems", {"system_id"}, {"systems.system_id"}),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded systems", {"systems.system_id", "system_assignment_value"}, system_assignments, {"system_id", "system_assignment_value"}, "system_assignments", JoinKind.LeftOuter),
    #"Expanded system_assignments" = Table.ExpandTableColumn(Table.Buffer(#"Merged Queries1"), "system_assignments", {"system_assignment_id"}, {"system_assignments.system_assignment_id"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Expanded system_assignments",{"email", "system_assignments.system_assignment_id"})
in
    #"Removed Other Columns"
    #"Expanded system_assignments"

The power query code gives me the exact result that I expect, while the SQL code gets a few rows wrong. The row counts at the end of both queries are the same.

I'm not interested in making the code more performant at the moment. I just need some assistance as to why the joins aren't performing the same way.

5 Comments
2024/10/31
23:59 UTC

1

Certification recommendations

Is there any entry level certifications you guys recommend? I've had some experience with sql through work but I'd like to get more exposure with the end goal of being able to put it on my resume. I just want to strengthen my resume and be able to say "hey not an expert but got the basics down,"

7 Comments
2024/10/31
20:55 UTC

1

Quick question on schema design

I have an event, a spectator, and a form a spectator can fill out about an event. So let's say the events are sports games of various types (baseball, football, water polo, etc). A spectator can go to many games and a game can have many spectators. A spectator needs to submit a form for every game. What's the right way to define these relationships for good efficiency in my queries? Spectators should have the ability to see/edit their forms for every game they attended/plan on attending. Admins of the app would want to be able to view forms for every future/past event.

(Basically, I'm trying to figure out the right way to deal with the form. Should I add a relationship between the form and the event in addition to the spectator? Is that better for query optimization?)

Also, where do I go to learn the right way to design this kind of schema?

10 Comments
2024/10/31
20:52 UTC

0

Desperately seeking help with relationships, joins, IDs

Hi! I am very very new to databases despite loving data and recently being hired to be in charge of a nonprofits data collection, reporting, etc.

I am currently working in Caspio to create a system for collecting, organizing, managing and reporting on data for a a group of specialists that provide trainings.

This is complicated and a lot of info but I am trying to explain the best way I can and ANY assistance (given to me like I'm five) would be very much appreciated. What I am struggling with is how to get all of the data to work together (ha!). Let me explain...

I have specialists - they create groups. Within each group is enrollment of a participant(s). The participant(s) attend classes (the classes are under the umbrella of a "group.") I am needing to track attendance of each participant that is enrolled in the group and that attends the classes.

I currently have these tables with primary IDs and relevant data:

Specialists -- Specialist .ID, Name, Email

Participant - Participant .ID, Name, Email, Language

A Specialist uses a form to create a Group, using the Group Table, and generates a new unique Group.ID. This table includes: Group .ID (PK), Specialist .ID (FK), Group Name, Language, Funder, etc.

Then, the specialist needs to enroll a participant to the group. And this is where it all gets tricky.

I have another table called Group_Enrollment to show groups and the groups' enrolled participants together (with Groups .ID as a foreign key and Participant .ID as a foreign key.) At this point I have several questions... if I were to use a trigger to have all of inserted data in Groups table be inserted into the Group_Enrollment table, would this automatically associate the Specialists table since the Groups table include the specialists .ID? Or would I need to do a join and include all of the fields for the specialist in order to include that information in the the Group_Enrollment table?

(Caspio does allow link relationships between tables using IDs and I do have those set up)

I have further tables... Specialists create "Classes" using the Classes table and creating a Class .ID. This table includes the Group .ID so that each class is associated with a parent group. But again, in order for the CLASSES table to have all of the info about a group that it needs, do I need to have a trigger that inserts all of the fields of the Group table or can I just input the Group .ID?

This is so long and just the tip of the assistance I need. But any help at this point is very appreciated.

2 Comments
2024/10/31
19:11 UTC

9

Need tips on what/where/how to practice SQL for future interviews and roles

Hi so I’m currently in the midst of an online data analyst course learning SQL, Python, and PowerBI. Currently finished the SQL portion of the course and know the basics…currently going through leetcode’s SQL50 as practice outside of my course however I’d like to practice stuff that I’d actually use in a data analyst role and need for interviews. Any recommendations on

  1. What I should practice?
  2. How I should practice?
  3. Where to practice?

I use MySQL Workbench and have a few months before I finish this course

Any tips and tricks are welcomed! :)

4 Comments
2024/10/31
17:02 UTC

11

How can I join these two columns? I just need a new column with employee_code + gender_code (examples: 1F, 3F, 1M, 6M...). I created these two columns to practice, they are not part of the original tables, and apparently I can't use concat without a table name to pull from.

15 Comments
2024/10/31
16:32 UTC

3

When a1=a2, b1=b2, but c1<>c2

Hi all! In short, I’m trying to find when a customer is being billed for one product at multiple prices.

Some background: I’m using a single table that holds this information. I’ve been trying to find a way to have a query pull if a single customer is being billed different prices on one SKU. In the system I work in, accounts should have only one price per SKU, so I’m trying to find any SKUs with multiple price points per account.

Every account is completely different with what and how many SKUs are being billed, and pricing for SKUs is individualized for each account. There are thousands of accounts.

Attempts: I tried putting the same information into two temp tables (##1 & ##2) to then try to pull when ##1.customer=##2.customer and ##1.SKU=##2.SKU but ##1.price<>##2.price, but my system said there was too much data. Outside of that, everything else I’ve tried just pulls distinct data of every customer with more than one price on their accounts. Since accounts have more than one SKU and each SKU would have different pricing, this doesn’t pull the data I’m looking for.

4 Comments
2024/10/31
15:36 UTC

15

WHERE clause that retrieves only columns that contain both words

Is it possible to retrieve only member id's that have both "xyz" and " abc" in the column rather one or the other? Issue is the set up has duplicate member id numbers in different rows. I don't need all of xyz or all of abc. I only want the member id that meets the condition of having both xyz and abc associated with it.

member idtype
00000000xyz
00000000abc
26 Comments
2024/10/31
14:40 UTC

1

Changing DB Collation

Our SQL Server default instance has it's collection set to the default - Latin1_General_CI_AS

One of our Live Databases has it's collation set to SQL_Latin1_General_CP1_CI_AS

I need them to match as when we do the upgrade of our ERP system to the latest version it will fail as the collations need to match between the Live DB and the Database that is auto-generated with our newer ERP version.

From everything I have read online, it states that it's best to change the SQL Server Instance collation than to change the Database collation.

I asked ChatGPT and it came back with this:

https://chatgpt.com/share/67237bc2-77cc-8005-a35b-cbf8760a1973

Unsure if I should do that or do a complete reinstall of SQL?

0 Comments
2024/10/31
12:46 UTC

2

Where do i start learning?

I found a job which requires knowing SQL. I want to learn it but i have no idea where to start. I am pretty skilled with computers but i never even heard of SQL before. Should i watch tutorials should i just download it and try it? And can anyone explain the basics of it?

10 Comments
2024/10/31
12:27 UTC

89

Are there any jobs out there that only require writing SQL queries

I've had a mostly non-tech job for the last few years although I do work with developers. In past positions I used to be pretty good at writing SQL for UIs and for ad hoc reporting mainly using Oracle DBs. Some of these queries were quite complex. I find myself missing it lately so I was wondering if companies hire/contract for just SQL support even if it pays less than "full stack" type jobs. I am not interested in learning Java, Python or anything non-SQL related.

Thanks for any advice.

Edit: Thanks for all the replies. This is one of the most helpful subreddits I have ever seen! Some other details - I have a couple decades of experience mainly with large health insurance companies and large banks. I should also have mentioned that I would need something that is 100% remote at this time. I know that may limit me even further, but that is the reality of my current situation.

65 Comments
2024/10/31
11:47 UTC

11

What is your average CTE used to solve a question/task in your actual work?

Recently I'm trying to solve questions that require using window functions as well as pivot , ntile, percentile and more and often i have to write at least 5 CTEs before reaching the final query. So I was just wondering what is the amount of CTE you guys actually have to write in your working life daily.

36 Comments
2024/10/31
10:33 UTC

3

Anyone know an SQL formatter that can add semicolons to scripts with multiple SQL statements?

In SQL Server, adding semicolons to the end of statements is optional, unfortunately.

Does anyone here have a good solution that can read an SQL script and magically place semicolons in the proper place? I don't need it to be 100% accurate. The scripts will have multiple SQL statements in them.

I have potentially thousands of scripts I need to perform this on and cannot use a LLM.

I've tried various formatters/liters, but I haven't had any luck. I hope the community here can help me.

,
I'm in the middle of a data migration and I need to search scripts for certain strings, and these strings can be in different rows. So I want to do a gaps and islands approach and search for these string occurrences between semicolons. For example, I need to search for "%INTO% and %Account% that exists in a single SQL statement within the script. Again, these scripts can have multiple SQL statements, so I need to be able to separate them out. I don't need this to be 100% accurate, but something is better than nothing.

I did write a Python script that adds semicolons based on certain rules, but there has to be something better than what I have.

38 Comments
2024/10/31
03:04 UTC

Back To Top