BYOND & MySQL Question

Aug 27 2009, 12:06 am
Nadrew	Because it uses a more strict method of calling MySQL stuff. You should always use them whether you need to or not, it's just good practice.

Aug 27 2009, 12:12 am In response to Nadrew
AJX	Nadrew wrote: Because it uses a more strict method of calling MySQL stuff. You should always use them whether you need to or not, it's just good practice. Didn't know that. Good to know. Thanks.

Aug 27 2009, 12:17 am (Edited on Aug 27 2009, 12:36 am)

AJX

This is actually just a MySQL question, but I'm hoping Nadrew will have an answer.

I want to make a query that checks multiple columns for duplicates across any of the columns. And not just duplicates in the same column. If a row has the same data in column A as another row has in column B, I want it to return that row in the query.

I can't think of how this would done, or how to google the question to get any relevant results.

Aug 27 2009, 12:49 am In response to AJX
Jemai1	SELECT * FROM `yourtable` WHERE `yourcolumn1` LIKE 'yourstring' AND `yourcolumn2` = yournumber Something like that.

Aug 27 2009, 12:53 am In response to Jemai1
AJX	Those will check for duplicates in ONE column. That isn't what I need. I'd like to check to see if there are duplicates ACROSS multiple columns. Example: Row#: C1 \| C2 \| C3 Row1: 1 \| 2 \| 3 Row2: 4 \| 5 \| 6 Row3: 2 \| 7 \| 8 Row4: 9 \| 3 \| 10 The query would return Row 1 and Row 3 paired, because C1 and C2 match, and row 4 and cow 1 paired, because C2 and C3 match.

Aug 27 2009, 12:55 am In response to AJX
Jemai1	I edited my post if you didn't notice.

Aug 27 2009, 12:59 am In response to Jemai1
AJX	Jemai1 wrote: I edited my post if you didn't notice. I didn't, but that still isn't what I'm looking for. Your code checks to see if Column A has one string, and column B has another. 1: I'm looking for duplicate entries, not specific information. 2: The information can be found in any one of 5 columns, and I want it to return any matches across any of the columns.

Aug 27 2009, 1:27 am In response to AJX
Jemai1	Are you using php?

Aug 27 2009, 1:28 am In response to Jemai1
AJX	Jemai1 wrote: Are you using php? BYOND

Aug 27 2009, 8:27 am In response to AJX
Stephen001	Refactor your table design so it makes sense. That is my recommendation.

Aug 27 2009, 7:31 pm In response to AJX
Ripiz	for(var/A in table1) for(var/B in table2) if(A!=B && A.value==B.value) world<<"found duplicate" Hope this explains HOW, but it's not working code

Aug 27 2009, 7:42 pm In response to Ripiz
AJX	Ripiz wrote: Hope this explains HOW, but it's not working code 1: MySQL, not BYOND. 2: I could easily write a code to grab all the information, put it into BYOND, then find dups, but that would be too much resource work.

Aug 27 2009, 7:45 pm In response to AJX
Ripiz	MySQL doesn't support that function, without making your own loops. This link might help too: How to find dublicates in MySQL?

Aug 27 2009, 8:01 pm

In response to Ripiz

AJX

Ripiz wrote:

MySQL doesn't support that function, without making your own loops.
This link might help too: How to find dublicates in MySQL?

I have already discovered that there is no direct way to do what I was discussing. It is possible, but requires a hack that isn't worth the resources.

I already know how to find duplicates in MySQL. That is very simple. That wasn't my question. I found out how to do that, along with everything else I've done in MySQL, through Google. I asked here because I couldn't find anything on google, that was the point.

Aug 27 2009, 8:31 pm In response to AJX
Ripiz	It won't work without quadruple loop then

Aug 27 2009, 8:46 pm In response to Ripiz
AJX	Ripiz wrote: It won't work without quadruple loop then 5 Columns. 5 Loops. Anyway~ I already know how to program it that way. That isn't plausible. Thanks though, again...

Aug 27 2009, 9:46 pm

In response to Ripiz

Stephen001

Just for posterity, I will post the summary of a discussion AJX and myself had on this topic (my intentionally vague sweeping statements are usually designed for just that).

The solution to a duplication of data problem is not "How do I keep my tables consistent?" but "How do I design my tables so that I don't have to care?". MySQL DB is by nature a relational database, so what it excels at is storage and retrieval of data that is linked by relationships. Designing a good relationl database structure is actually not that much different from good OO, one table (class) per concept and use references to rows in other table instead of duplicating data.

Core to this relational mapping notion is PRIMARY KEY and FOREIGN KEY. A primary key is just a unique identifier for a row in a table. For instance if you have a "Client" table storing various client data, a good primary key for that table might be the client's ckey. By being a primary key, it'll only feature once in that column, so you can use it elsewhere.

To use this primary key in other tables, there are two requirements. First is a one-time setup thing. In order to use these references properly, your tables (Client and any table that references Client.ckey) need to use the InnoDB table engine. You can usually pick this in something like phpmyadmin, or just stick TYPE=INNODB on the end of your CREATE TABLE command. This is just because the standard table engine MyISAM is designed for quick and dirty access, not serious data integrity. The other requirement is that columns in other tables that reference a PRIMARY KEY (like Client.key) must be a FOREIGN KEY.

So okay,let's set up an example set of tables for AJX's scenario:

CREATE TABLE clients (ckey VARCHAR(255), somedata INT, PRIMARY KEY(ckey)) ENGINE=INNODB;

And here's an example of it with data:

+------------------------------------------+
|Client                                    |
|InnoDB                                    |
+-----------------------------+------------+
|ckey VARCHAR(255) PRIMARY KEY|somedata INT|
+-----------------------------+------------+
|Stephen001                   |12          |
|AJX                          |2           |
+-----------------------------+------------+

Now our second table will handle one of AJX's problems, how to map IP addresses to keys. Any one key can be linked to a number of IP addresses, and similarly and one IP address can be linked to any number of keys. AJX happens to store 5 IPs per key because of bad table design, we can actually store as many as we like using a relational design. Here's the table, I'll explain everything after:

CREATE TABLE IPMapping (ckey VARCHAR(255), ip CHAR(16), PRIMARY KEY (ckey, ip), FOREIGN KEY ckey REFERENCES Client(ckey) ON UPDATE CASCADE, ON DELETE CASCADE) ENGINE=INNODB;

And the table with data might be:

+------------------------------------------+
|IPMapping                                 |
|InnoDB                                    |
|Compound PRIMARY KEY (ckey, ip)           |
|ckey REFERENCES Client.ckey               |
| >  CASCADE ON UPDATE                     |
| >  CASCADE ON DELETE                     |
+-----------------------------+------------+
|ckey VARCHAR(255)            |ip CHAR(16) |
+-----------------------------+------------+
|Stephen001                   |192.168.0.1 |
|AJX                          |192.168.0.2 |
|AJX                          |192.168.0.1 |
+-----------------------------+------------+

Now I've thrown a few new concepts in there, but it's pretty easy. PRIMARY KEY (ckey, ip) means it's a compound key. Each row in the table must have a unique pair of ckey AND ip. What this means is I can insert as many ckey = AJX rows as I like, but I cannot insert another row of ckey = AJX, ip = 192.168.0.1. This is basically just to stop duplication. An insert on that would fail, and you should take that into account in your error handling. The smart stuff however is with the FOREIGN KEY. We are making the IPMapping.ckey column refer to Client.ckey. The reason we do this is partly so the database can optimise operations on those columns, but mostly for this CASCADE stuff.

ON UPDATE CASCADE means if we change the value of ckey on a row in Client, the new ckey will be applied to IPMapping.ckey too. So if I change AJX in Client.ckey to Ripiz, the IPMapping table will have two rows for Ripiz now with the IPs 192.168.0.2 and 192.168.0.1, and none for AJX. A row in a relational database table is a lot like a DM object, and a FOREIGN KEY reference is a lot like storing an object in another one, like say mob.client. If you changed a variable in client, the change is reflected in mob.client.

ON DELETE CASCADE is another nice trick. If we delete the row in Client with the primary key, it deletes the rows in IPMapping with that FOREIGN KEY. So if we delete the Stephen001 row in Client, the Stephen001 row in IPMapping alos goes. Once again, this is a lot like the garbage collector in BYOND, if you have an object referenced in another one, and null the referencing object, both will be cleaned up and deleted by BYOND. This makes sure you don't leave dangling references, and you can delete Client entries without worrying about another X many queries to delete all their referred data.

Now here's the really neat bit for AJX. By changing the table design, we can now store as many IP address to Key (and vice versa) mappings as we like! Not only that, but the query to get them is easy and requires not further BYOND processing to make sure it's correct. To get all the IP address attached to a ckey, it's just:

SELECT ip FROM IPMapping WHERE ckey = 'Stephen001';

Which returns 192.168.0.1. In AJX's case, it returns 192.168.0.2, 192.168.0.1. And conversely:

SELECT ckey FROM IPMapping WHERE ip = '192.168.0.2';

Returns AJX. In 192.168.0.1's case, it returns Stephen001, AJX.

He can repeat this process for computer-id's, and store as many mapping as he likes. The general concept is it's a Many-to-Many mappings table. Neat, huh?

Aug 27 2009, 11:28 pm

In response to AJX

Keeth

5 columns. 5 loops.

SELECT [primary_key_goes_here]
FROM `abc`
WHERE (
    c1=[c1] OR c2=[c1] OR c3=[c1] OR
    c1=[c2] OR c2=[c2] OR c3=[c2] OR
    c1=[c3] OR c2=[c3] OR c3=[c3]
)

If you couldn't tell, this will go through every row in the table abc. If the supplied c1, c2, or c3 values match any of this row's c1, c2, or c3 values, it will be selected.

Let's assume this is our table structure.

C1  |   C2  |   C3
1    |   2    |   3
4    |   5    |   6
2    |   7    |   8
9    |   3    |   10

If I wanted to check for conflicts against a group of numbers I already possess, we'd only need one query; the one posted just above. However, if we wanted to go through and check for conflicts with existing entries, we'd need something like this:

for(row in rows)
    var/DBQuery/query = dbcon.NewQuery({"SELECT [primary_key]
FROM `abc` 
WHERE [primery_key] != [row.primary_key]
AND (
    c1=[row.c1] OR c2=[row.c1] OR c3=[row.c1] OR
    c1=[row.c2] OR c2=[row.c2] OR c3=[row.c2] OR
    c1=[row.c3] OR c2=[row.c3] OR c3=[row.c3]
)"})

    query.Execute()

    if (!query.RowCount()) world << "No conflicts."
    else // do whatever you need here

Note, this is mostly psuedo-code. But anyway... what's going on here is pretty much the same as above, but now we're looping through all of the rows in the table and checking for conflicts against them. To do this, we're also modifying the query to ignore the current row. By doing this, we add:

WHERE [primery_key] != [row.primary_key]
AND ([conditionals go here])

This tells it that we don't want to check against any rows with the same primary key (AKA the same row) as the current row.

If I've severely misunderstood something, I apologize. But this seems to be exactly what you're looking for...

Aug 27 2009, 11:53 pm In response to Keeth
AJX	Yea that'd be it. Thanks. (But I have already redone my table structure to something more intelligent at the suggestion and instruction of Stephen.)

ID:151680 Aug 27 2009, 12:05 am
AJX	Whenever I use MySQL it seems I need to enclose any column names in "`"s, where in PHP I only need to do this if the column has spaces. Any idea on why that is?