/r/SQL
The goal of /r/SQL is to provide a place for interesting and informative SQL content and discussions.
When requesting help or asking questions, please prefix your title with the SQL variant/platform you are using in square brackets, like so:
While we should naturally endeavor to work as platform-neutrally as possible, many questions and answers require tailoring to the feature set of a specific platform.
If you are a student or just looking for help on your code please do not just post your questions and expect the community to do all the work for you. We will gladly help where we can as long as you post the work you have already done or show that you have attempted to figure it out on your own.
If you are including actual code in a post or comment, please attempt to format it in a way that is readable for other users. This will greatly increase your chances of receiving the help you desire. Something as simple as line breaks and using reddit's built in code formatting (4 spaces at the start of each line) can turn this:
SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field3
Into this:
SELECT count(a.field1),
a.field2,
SUM(b.field4)
FROM a INNER JOIN b
ON a.key1 = b.key1
WHERE a.field8 = 'test'
GROUP by a.field1,
a.field2
HAVING SUM(b.field4) > 5
ORDER by a.field3
For those with SQL questions we recommend using SQLFiddle to provide a useful development and testing environment for those who wish to fully understand your problem and help devise a solution.
A common question is how to learn SQL. Please view the Wiki for online resources.
Note /r/SQL does not allow links to basic tutorials to be posted here. Please see this discussion. You should post these to /r/learnsql instead.
Thanks to ---sniff--- for the awesome header image!
I've got the code below which is attempting to calculate the percentage between the previous value and current one.
I know dividing integers will return zero but regardless of how I change the values I don't ever get a figure back.
I've tried the ::numeric cast shorthand, 100.00 * (ftp_delta / previous_ftp)::numeric, to force it to numeric; however, the result is still 0.00.
WITH delta AS (
SELECT
date
, ftp
, (ftp - LAG(ftp, 1,0) OVER (ORDER BY date)) AS ftpDelta
, LAG(ftp, 1,0) OVER (ORDER BY date) AS previousFtp
FROM
bikes.t_measurements
)
SELECT
date
, ftp
, previousFtp
, ftpDelta
, ((DIV((ftpDelta::numeric), previousFtp::numeric)) * 100.0) AS percentageDelta
FROM
delta
WHERE
delta.ftpDelta != 0
AND delta.previousFtp != 0
ORDER BY
date ASC
I've included a screenshot of the results table in case that helps.
https://i.imgur.com/Pzkcguc.png
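A hedged note on where the zero likely comes from: in PostgreSQL, div(y, x) returns the truncated integer quotient even when both arguments are numeric, so any ratio smaller than 1 collapses to 0 before the multiplication by 100. A minimal sketch of the final SELECT of the query above rewritten with plain division (column names follow that query; untested):

SELECT
date
, ftp
, previousFtp
, ftpDelta
-- plain division keeps the fraction; the ::numeric cast avoids integer division
, ROUND(100.0 * ftpDelta::numeric / previousFtp, 2) AS percentageDelta
FROM
delta
WHERE
delta.ftpDelta != 0
AND delta.previousFtp != 0
ORDER BY
date ASC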
This started happening recently with my stored proc code. I've worked on it for ages without any issues. The code isn't even crazy long or anything, it's only 800 lines. It also executes fine, but it's while I'm typing or working on it that it constantly freezes and takes a relatively long time to unfreeze.
What could be causing it and how might I fix it?
The only change I made was adding a 50-line piece of code from my colleague, which outputs a table. It is relatively complex, I suppose, but I just don't understand why a complex 50-line bit of code would mean that I literally cannot type anything anymore, yet can still execute the code just fine.
I'm on SQL Server Mgmt Studio 18, and I also have dbForge that my boss forced us to get. I don't even use it really, but maybe it's breaking things somehow?
The stored proc is more or less structured as
WITH
x as (
...
),
y as (
...
),
z as (
...
)
SELECT * FROM z WHERE ...
etc.
The President of the company and the power user of the software walked into my office. They had been discussing a problem and had a suggestion: delete all the data in the database from before year 20XX. They told me not to worry if it doesn't work and the test environment burns down. (I restore the test environment nightly through jobs.)
Am I overthinking this? I already have the primary key and linkage that filter out everything prior to 20XX, and I had embedded this in the software as it seems the proper way to filter that data.
They absolutely want the data deleted from some or all of the tables. I can't tell which tables to attempt and which to ignore. I've looked at the key setups and indexes. I know that if I delete from a table and violate referential integrity, nothing will be deleted.
TL;DR;
Having the key of interest, how do I delete its data in all tables containing its key, while honoring referential integrity? I can filter the data with SQL, but they don't want that solution and will slap me down if I try to mention it again.
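One possible direction, sketched under the assumption that this is SQL Server (the platform isn't stated in the post): use the catalog views to map out which tables reference which, then delete from child tables before their parents so referential integrity is honored. The table and column names in the commented example are invented for illustration.

-- List foreign keys so child tables can be identified and deleted from first.
SELECT
    fk.name                              AS foreign_key_name,
    OBJECT_NAME(fk.parent_object_id)     AS child_table,
    OBJECT_NAME(fk.referenced_object_id) AS parent_table
FROM sys.foreign_keys AS fk
ORDER BY parent_table, child_table;

-- Then, working child-first, delete the rows tied to the old years, e.g.:
-- DELETE c
-- FROM dbo.ChildTable AS c
-- JOIN dbo.ParentTable AS p ON p.KeyId = c.KeyId
-- WHERE p.EventDate < '20XX-01-01';   -- placeholder cutoff from the post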
Hello, I made a small (10 entities) ERD diagram of a football club structure, but I'm not sure it's 100% correct.
Please point out all my mistakes, what I could do better, etc.
I would be grateful! I may have some questions about ERD diagrams later.
Hoping what I'm looking for exists. I'm looking for a book more about the underlying design of SQL (or other common databases) than about writing queries or doing DB maintenance. Think of the 'Nand to Tetris' course, but for databases (and in book form).
So I understand that they speed up queries substantially and that it's important to use them when joining, but what are they actually, and how do they work?
Hi all, I'm new to MySQL, and while trying to run some code, it kept returning duplicated results. It was working fine earlier, but now whenever I use WHERE in my query, I get 4x the actual result (shown below).
I have checked the original table without using WHERE many times and there are no duplicates so I'm confused as to why this is happening. I'm not sure if using WHERE even has anything to do with it, I think it might be a bug, but any help would be appreciated. Thank you!
Here's the second image showing it's just repeating itself or duplicating, so instead of just giving me roughly 100 rows of data, it's giving me 460 rows.
The third image is just a clearer example where I used ORDER BY to show how much it duplicated itself.
If I provision an SQL database from services like DigitalOcean, Linode, Vultr, or AWS, and obtain the connection string, would that database be considered publicly unsafe, even though it requires a username and password for access? Additionally, if I use the connection string in my desktop app, is that okay or not? Do I need to secure it somehow from unauthorized access? What security measures should I take to ensure it's safe?
Thanks for your insights!
Hi all.
I have a query that is quite complex in PostgreSQL. By complex, I mean it has a lot of joins, counts, CTEs, etc. I also believe this is the best way to get my desired results with the current DDL I am working with. I won't paste the query itself; however, I will show an anonymized part of it that seems to be where it is taking a long time. Just a little rundown:
1. The first CTE is a union between two tables, this part does not take a long time at all to execute.
2. The goal of the query I run on this CTE is to retrieve, for each group, the record with the highest value in the date column.
This is what this looks like:
-- foo CTE: Union between two tables, different logic for joins / where, etc.
WITH foo AS (
SELECT
col1 AS col1,
col2 AS col2,
col3 AS col3,
col4 AS col4,
col5 AS col5
FROM tbl1 t1
JOIN j1 ...
JOIN j2 ...
WHERE
...
UNION ALL
SELECT
col1 AS col1,
col2 AS col2,
col3 AS col3,
col4 AS col4,
col5 AS col5
FROM tbl2 t2
JOIN j1 ...
JOIN j2 ...
WHERE
...
)
SELECT
DISTINCT ON (f.col1, f.col2, f.col3)
f.col1,
f.col2,
f.col3,
f.col5
FROM foo f
ORDER BY
f.col1,
f.col2,
f.col3,
f.col4 DESC -- This is where the logic is to pull the record having the highest col4 (date column), from the distinct group of col1, col2, and col3.
;
Here is where my issue is. Because of the complexity of the source data, a CTE seems the only viable approach. This CTE will return a couple million records, growing as time goes on, and the logic to do the distinct on / order by takes forever. I cannot index a CTE.
I've tried making that CTE a temporary table instead, but just creating the temporary table itself takes too long. I need this logic to return at endpoint-level speeds, and the logic above is just one part of many, albeit the part that is taking by far the longest. How can I achieve this?
Thank you!
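One direction that might fit the endpoint-speed requirement, sketched under assumptions (PostgreSQL, and that the union can be precomputed): materialize the union and index it so the latest-row-per-group lookup can read the index instead of sorting millions of rows per request. The two branches below just stand in for the real union, which is elided in the post.

-- Materialize the union once (refresh on whatever schedule the data allows).
CREATE MATERIALIZED VIEW foo_mv AS
SELECT col1, col2, col3, col4, col5 FROM tbl1   -- first branch of the foo CTE, joins/filters elided
UNION ALL
SELECT col1, col2, col3, col4, col5 FROM tbl2;  -- second branch, elided the same way

-- Index matching the DISTINCT ON / ORDER BY so the planner can walk it in order.
CREATE INDEX foo_mv_latest_idx ON foo_mv (col1, col2, col3, col4 DESC);

-- The endpoint query then becomes the same DISTINCT ON, but over the indexed view.
SELECT DISTINCT ON (col1, col2, col3)
    col1, col2, col3, col5
FROM foo_mv
ORDER BY col1, col2, col3, col4 DESC;

The caveat is that the view has to be refreshed (REFRESH MATERIALIZED VIEW foo_mv;), so this only works if slightly stale data is acceptable at the endpoint.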
I'm a newbie when it comes to SQL and technology, so please bear with me. Here's my understanding so far, using the example of Microsoft's SQL Server (developer/local version); please let me know if there are any corrections or gaps in my understanding:
I understand SQL is a programming language the client uses to communicate and interact with the database server to produce some result with the data, whether that is viewing it or manipulating it.
On the other hand, my understanding of a server is a computer that runs a program that serves information to clients upon request, and it usually has a primary role; i.e., a database server is a computer that primarily runs a program hosting a database.
SQL Server, a Microsoft product, is one of the many SQL database products out there (MySQL, Postgres, etc.), which I believe are called relational database management systems (RDBMS). Then it gets a bit tricky, because there is a supplemental front-end program called SQL Server Management Studio (SSMS) where you actually type and click, e.g. entering queries.
Okay, now wrapping this all together: does that mean the SQL Server bundle, including both the SQL Server application and the SSMS application, is considered a co-existing server and client application? Because 1) I can host my own database on my own computer, which I believe lives in the SQL Server program, and 2) I enter queries to access my own database information through the SSMS / client application.
I am creating a database to track portable items; each of these items has a unique integer ID (QID). I will have multiple customers owning different numbers of these items, and I have a table in my database reserved for each customer's range (e.g. customer1: 1-27, customer2: 28-33, and so on).
When one of these items is identified, all that is taken from it is the integer ID, and I need to find which customer it belongs to; therefore I have created the table in image 1.
When I retrieve the ID of the item, I dump the contents of the table from image 1 into a dictionary and perform a binary search on it manually after querying the database, to find the range within which the retrieved ID falls (image 2).
Given that I am a relative beginner with SQL, I believe I may be missing a far more efficient way to do this, but I was unsure whether SQL would be able to do a binary search on this data.
Any help would be very much appreciated; I'm just trying to get my head around using indexing for this and other tables at the moment.
Thanks :)
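For what it's worth, a hedged sketch of letting the database do the range lookup itself instead of binary-searching a dictionary in application code; the table and column names here are assumptions, not from the post:

-- One row per customer holding that customer's ID range.
CREATE TABLE customer_id_ranges (
    customer_id  INT NOT NULL PRIMARY KEY,
    range_start  INT NOT NULL,
    range_end    INT NOT NULL
);

-- An index on range_start lets the engine seek straight to the candidate row.
CREATE INDEX idx_ranges_start ON customer_id_ranges (range_start, range_end);

-- Find the owner of a scanned item; 29 stands in for the retrieved QID.
SELECT customer_id
FROM customer_id_ranges
WHERE 29 BETWEEN range_start AND range_end;

Since the ranges don't overlap, this returns at most one row, and the engine's own index search replaces the manual binary search.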
Hi there, I am very new to SQL. I have a table where I would like to transform a column's data into a new field. I am trying to write a query that takes the first part of the text (for example, Testing and NotTesting) and adds it to a new column called ReasonType.
ReasonDescription (data)    ReasonType (result)
Testing123                  Testing
Testing123                  Testing
NotTesting123               NotTesting
NotTesting123               NotTesting
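A hedged sketch of one way to do this, assuming SQL Server syntax (the platform isn't stated) and a made-up table name; it cuts the string at the first digit:

SELECT
    ReasonDescription,
    -- PATINDEX finds the first digit; appending '0' guarantees a match so LEFT never gets -1
    LEFT(ReasonDescription,
         PATINDEX('%[0-9]%', ReasonDescription + '0') - 1) AS ReasonType
FROM dbo.Reasons;   -- hypothetical table name

The same expression could be used in an UPDATE to populate a persisted ReasonType column if the table can be altered.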
I'm working on an exercise where I have to manually review and update data from multiple sets of non-standardized entries. I am exploring a solution that will allow me to extract the existing data and insert it into a field that is actionable. This should result in a process that automatically routes future submitted entries into the correct field, which we can then use for analysis. The current solution we are using is too slow and process-heavy. What would be the best approach to create a solution that gives me the desired outcome? Thanks.
Little bored of seeing endless homework and interview questions, so I thought we could start a more fruitful conversation thread for those of us who aren’t passing exams or looking for a career change.
Today I figured we could start with cursors vs. WHILE loops.
Which do you prefer, and why? Which is more optimised, in your opinion? Or which just looks nicer?
If this goes well I’d like to do more on other subjects such as:
Does anyone else have other ideas?
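For concreteness, a rough T-SQL sketch of the two patterns under discussion; dbo.Work and its id column are invented for illustration:

-- Cursor version: the engine tracks the position.
DECLARE @id int;
DECLARE c CURSOR LOCAL FAST_FORWARD FOR SELECT id FROM dbo.Work;
OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- ... per-row work with @id ...
    FETCH NEXT FROM c INTO @id;
END;
CLOSE c;
DEALLOCATE c;

-- WHILE-loop version: walk the key manually.
DECLARE @next int = (SELECT MIN(id) FROM dbo.Work);
WHILE @next IS NOT NULL
BEGIN
    -- ... per-row work with @next ...
    SET @next = (SELECT MIN(id) FROM dbo.Work WHERE id > @next);
END;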
LeetCode problem: problem link
Note: please don't tell me the answer or how to solve the problem.
I just want to know why the ordering works differently here.
If anyone knows anything about this, please let me know.
I did solve this problem using other methods, but I want to know why this result comes out.
Hi everyone,
I'm relatively new to SQL but am somewhat familiar with data transformations with Power Query. I'm having some trouble rewriting some M code with SQL and getting the joins to return the correct rows
I have the below SQL Server query
SELECT
LOWER(config.email) AS email,
system_assignments.system_assignment_id
FROM security.configuration config
LEFT OUTER JOIN security.systems systems
ON config.permission_source = systems.system_name
AND config.access_level = systems.access_level
AND config.approved_data = systems.approved_data
LEFT OUTER JOIN security.system_assignments system_assignments
ON config.system_assignment_value = system_assignments.system_assignment_value
AND systems.system_id = system_assignments.system_id;
Which is based off of this Power Query code
let
Source = #"security configuration",
#"Merged Queries" = Table.NestedJoin(Source, {"permission_source", "approved_data", "access_level"}, systems, {"system_name", "approved_data", "access_level"}, "systems", JoinKind.LeftOuter),
#"Expanded systems" = Table.ExpandTableColumn(Table.Buffer(#"Merged Queries"), "systems", {"system_id"}, {"systems.system_id"}),
#"Merged Queries1" = Table.NestedJoin(#"Expanded systems", {"systems.system_id", "system_assignment_value"}, system_assignments, {"system_id", "system_assignment_value"}, "system_assignments", JoinKind.LeftOuter),
#"Expanded system_assignments" = Table.ExpandTableColumn(Table.Buffer(#"Merged Queries1"), "system_assignments", {"system_assignment_id"}, {"system_assignments.system_assignment_id"}),
#"Removed Other Columns" = Table.SelectColumns(#"Expanded system_assignments",{"email", "system_assignments.system_assignment_id"})
in
#"Removed Other Columns"
The power query code gives me the exact result that I expect, while the SQL code gets a few rows wrong. The row counts at the end of both queries are the same.
I'm not interested in making the code more performant at the moment. I just need some assistance as to why the joins aren't performing the same way.
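One hedged guess worth checking: SQL's = never matches NULL to NULL, while Power Query merges treat nulls on both sides as equal, so rows whose join keys contain NULLs can land differently in the two tools. A quick check against the columns used above:

-- Count configuration rows whose join keys contain NULLs; these are the rows most
-- likely to behave differently between the M merge and the SQL LEFT JOINs.
SELECT COUNT(*) AS rows_with_null_join_keys
FROM security.configuration
WHERE permission_source IS NULL
   OR access_level IS NULL
   OR approved_data IS NULL
   OR system_assignment_value IS NULL;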
Are there any entry-level certifications you guys recommend? I've had some experience with SQL through work, but I'd like to get more exposure, with the end goal of being able to put it on my resume. I just want to strengthen my resume and be able to say, "Hey, not an expert, but I've got the basics down."
I have an event, a spectator, and a form a spectator can fill out about an event. So let's say the events are sports games of various types (baseball, football, water polo, etc). A spectator can go to many games and a game can have many spectators. A spectator needs to submit a form for every game. What's the right way to define these relationships for good efficiency in my queries? Spectators should have the ability to see/edit their forms for every game they attended/plan on attending. Admins of the app would want to be able to view forms for every future/past event.
(Basically, I'm trying to figure out the right way to deal with the form. Should I add a relationship between the form and the event in addition to the spectator? Is that better for query optimization?)
Also, where do I go to learn the right way to design this kind of schema?
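A hedged sketch of one common shape for this, with invented names: the form row references both the spectator and the event, which doubles as the many-to-many link and keeps "forms for this event" and "forms for this spectator" as simple indexed lookups.

CREATE TABLE spectators (
    spectator_id  INT PRIMARY KEY,
    name          VARCHAR(100) NOT NULL
);

CREATE TABLE events (
    event_id   INT PRIMARY KEY,
    sport      VARCHAR(50) NOT NULL,
    starts_at  TIMESTAMP   NOT NULL
);

CREATE TABLE forms (
    form_id       INT PRIMARY KEY,
    spectator_id  INT NOT NULL REFERENCES spectators (spectator_id),
    event_id      INT NOT NULL REFERENCES events (event_id),
    submitted_at  TIMESTAMP NULL,
    UNIQUE (spectator_id, event_id)   -- one form per spectator per game
);

-- Typical lookups stay cheap with indexes on the foreign keys:
-- "my forms":            SELECT ... FROM forms WHERE spectator_id = ?;
-- "forms for an event":  SELECT ... FROM forms WHERE event_id = ?;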
Hi! I am very, very new to databases despite loving data and recently being hired to be in charge of a nonprofit's data collection, reporting, etc.
I am currently working in Caspio to create a system for collecting, organizing, managing and reporting on data for a group of specialists who provide trainings.
This is complicated and a lot of info but I am trying to explain the best way I can and ANY assistance (given to me like I'm five) would be very much appreciated. What I am struggling with is how to get all of the data to work together (ha!). Let me explain...
I have specialists - they create groups. Within each group is enrollment of a participant(s). The participant(s) attend classes (the classes are under the umbrella of a "group.") I am needing to track attendance of each participant that is enrolled in the group and that attends the classes.
I currently have these tables with primary IDs and relevant data:
Specialists - Specialist.ID, Name, Email
Participant - Participant.ID, Name, Email, Language
A Specialist uses a form to create a Group, using the Groups table, and generates a new unique Group.ID. This table includes: Group.ID (PK), Specialist.ID (FK), Group Name, Language, Funder, etc.
Then, the specialist needs to enroll a participant to the group. And this is where it all gets tricky.
I have another table called Group_Enrollment to show groups and the groups' enrolled participants together (with Group.ID as a foreign key and Participant.ID as a foreign key). At this point I have several questions... if I were to use a trigger to have all of the inserted data in the Groups table also be inserted into the Group_Enrollment table, would this automatically associate the Specialists table, since the Groups table includes the Specialist.ID? Or would I need to do a join and include all of the fields for the specialist in order to include that information in the Group_Enrollment table?
(Caspio does allow link relationships between tables using IDs and I do have those set up)
I have further tables... Specialists create "Classes" using the Classes table, generating a Class.ID. This table includes the Group.ID so that each class is associated with a parent group. But again, in order for the Classes table to have all of the info about a group that it needs, do I need a trigger that inserts all of the fields of the Group table, or can I just store the Group.ID?
This is so long and just the tip of the assistance I need. But any help at this point is very appreciated.
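To the "do I need to copy all the fields?" questions above: in a normalized design you usually store only the IDs in the linking tables and join the rest back when you need it. A hedged, generic-SQL sketch using the table and field names from the post (Caspio's own report and view tools may express this differently):

-- Attendance-style report: each enrolled participant with their group and specialist,
-- pulled together through the IDs alone.
SELECT
    p.Name   AS Participant_Name,
    g.Group_Name,
    s.Name   AS Specialist_Name,
    s.Email  AS Specialist_Email
FROM Group_Enrollment AS ge
JOIN Participant  AS p ON p.Participant_ID = ge.Participant_ID
JOIN Groups       AS g ON g.Group_ID       = ge.Group_ID
JOIN Specialists  AS s ON s.Specialist_ID  = g.Specialist_ID;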
Hi, so I'm currently in the midst of an online data analyst course learning SQL, Python, and Power BI. I've finished the SQL portion of the course and know the basics. I'm currently going through LeetCode's SQL50 as practice outside of my course; however, I'd like to practice things that I'd actually use in a data analyst role and need for interviews. Any recommendations on
I use MySQL Workbench and have a few months before I finish this course
Any tips and tricks are welcomed! :)
Hi all! In short, I’m trying to find when a customer is being billed for one product at multiple prices.
Some background: I’m using a single table that holds this information. I’ve been trying to find a way to have a query pull if a single customer is being billed different prices on one SKU. In the system I work in, accounts should have only one price per SKU, so I’m trying to find any SKUs with multiple price points per account.
Every account is completely different with what and how many SKUs are being billed, and pricing for SKUs is individualized for each account. There are thousands of accounts.
Attempts: I tried putting the same information into two temp tables (##1 & ##2) to then pull rows where ##1.customer = ##2.customer and ##1.SKU = ##2.SKU but ##1.price <> ##2.price, but my system said there was too much data. Outside of that, everything else I've tried just pulls distinct data for every customer with more than one price on their account. Since accounts have more than one SKU and each SKU has different pricing, this doesn't pull the data I'm looking for.
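A hedged sketch of the usual shape for this kind of check, with invented table and column names since the real ones aren't given: group by account and SKU, and keep only the combinations that carry more than one distinct price.

SELECT
    customer_id,
    sku,
    COUNT(DISTINCT price) AS distinct_prices
FROM billing_detail          -- hypothetical name for the single source table
GROUP BY customer_id, sku
HAVING COUNT(DISTINCT price) > 1
ORDER BY customer_id, sku;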
Is it possible to retrieve only member ids that have both "xyz" and "abc" in the column, rather than one or the other? The issue is that the setup has duplicate member id numbers in different rows. I don't need all of xyz or all of abc; I only want the member ids that meet the condition of having both xyz and abc associated with them.
member id | type
---|---
00000000 | xyz
00000000 | abc
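A hedged sketch (table and column names are assumptions) of one way to phrase "has both": group by member id and require both distinct types to be present.

SELECT member_id
FROM member_types            -- hypothetical table holding the rows shown above
WHERE type IN ('xyz', 'abc')
GROUP BY member_id
HAVING COUNT(DISTINCT type) = 2;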
Our SQL Server default instance has its collation set to the default, Latin1_General_CI_AS.
One of our live databases has its collation set to SQL_Latin1_General_CP1_CI_AS.
I need them to match: when we upgrade our ERP system to the latest version, the upgrade will fail because the collations need to match between the live DB and the database that is auto-generated by the newer ERP version.
From everything I have read online, it's best to change the SQL Server instance collation rather than change the database collation.
I asked ChatGPT and it came back with this:
https://chatgpt.com/share/67237bc2-77cc-8005-a35b-cbf8760a1973
I'm unsure if I should do that or do a complete reinstall of SQL Server.
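Whatever route you take, it may help to confirm the current state first; a small, read-only SQL Server check:

-- Instance-level (server) collation
SELECT SERVERPROPERTY('Collation') AS instance_collation;

-- Per-database collations, to see which databases disagree with the instance
SELECT name, collation_name
FROM sys.databases
ORDER BY name;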
I found a job which requires knowing SQL. I want to learn it, but I have no idea where to start. I am pretty skilled with computers, but I have never even heard of SQL before. Should I watch tutorials, or should I just download it and try it? And can anyone explain the basics of it?
I've had a mostly non-tech job for the last few years although I do work with developers. In past positions I used to be pretty good at writing SQL for UIs and for ad hoc reporting mainly using Oracle DBs. Some of these queries were quite complex. I find myself missing it lately so I was wondering if companies hire/contract for just SQL support even if it pays less than "full stack" type jobs. I am not interested in learning Java, Python or anything non-SQL related.
Thanks for any advice.
Edit: Thanks for all the replies. This is one of the most helpful subreddits I have ever seen! Some other details - I have a couple decades of experience mainly with large health insurance companies and large banks. I should also have mentioned that I would need something that is 100% remote at this time. I know that may limit me even further, but that is the reality of my current situation.
Recently I've been trying to solve questions that require using window functions as well as PIVOT, NTILE, percentiles, and more, and I often have to write at least 5 CTEs before reaching the final query. So I was just wondering how many CTEs you guys actually have to write daily in your working life.
In SQL Server, adding semicolons to the end of statements is optional, unfortunately.
Does anyone here have a good solution that can read an SQL script and magically place semicolons in the proper place? I don't need it to be 100% accurate. The scripts will have multiple SQL statements in them.
I have potentially thousands of scripts I need to perform this on, and I cannot use an LLM.
I've tried various formatters/linters, but I haven't had any luck. I hope the community here can help me.
I'm in the middle of a data migration and I need to search scripts for certain strings, and these strings can be on different rows. So I want to do a gaps-and-islands approach and search for these string occurrences between semicolons. For example, I need to search for "%INTO%" and "%Account%" occurring within a single SQL statement in the script. Again, these scripts can have multiple SQL statements, so I need to be able to separate them out. I don't need this to be 100% accurate, but something is better than nothing.
I did write a Python script that adds semicolons based on certain rules, but there has to be something better than what I have.
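For the searching part, a hedged sketch assuming SQL Server and that the scripts have been loaded into a table (names invented): where semicolons already exist, the statements can be split apart in SQL and each fragment tested for both strings.

-- Split each script on its existing semicolons and keep fragments containing both strings.
SELECT
    s.script_id,
    stmt.value AS statement_text
FROM dbo.Scripts AS s                                   -- hypothetical staging table
CROSS APPLY STRING_SPLIT(s.script_text, ';') AS stmt    -- SQL Server 2016+ built-in
WHERE stmt.value LIKE '%INTO%'
  AND stmt.value LIKE '%Account%';

This obviously only helps where semicolons are already present, which is exactly the gap described above, so it complements rather than replaces the semicolon-insertion step.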