Unix Technical Forum

Count unique data in column

This is a discussion on Count unique data in column within the MySQL forums, part of the Database Server Software category; --> Hi, I have an optimization question. Is there a way to count instances of each unique entry in a ...


Go Back   Unix Technical Forum > Database Server Software > MySQL

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 08:44 AM
kev
 
Posts: n/a
Default Count unique data in column

Hi,

I have an optimization question. Is there a way to count instances of
each unique entry in a column (not unique entries)? Let me explain
using the column count.

count
3
3
4
5
2
3
2

Question:
Is there a way to return an associative array of total instances of
unique entries?
Desired Result: {2}->2, {3}->3, {4}->1, {5}->1

The current script requests all data (SELECT *) and sorts them in a
script. I want to offset the count to the SQL server (good idea?).

Kev

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 08:44 AM
Bill Karwin
 
Posts: n/a
Default Re: Count unique data in column

kev wrote:
> Is there a way to return an associative array of total instances of
> unique entries?
> Desired Result: {2}->2, {3}->3, {4}->1, {5}->1


SELECT `count`, COUNT(`count`) FROM tablename GROUP BY `count`

This is made a little bit confusing by the fact that your column is
named `count`.

Regards,
Bill K.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 08:44 AM
kev
 
Posts: n/a
Default Re: Count unique data in column

Bill Karwin wrote:
> kev wrote:
> > Is there a way to return an associative array of total instances of
> > unique entries?
> > Desired Result: {2}->2, {3}->3, {4}->1, {5}->1

>
> SELECT `count`, COUNT(`count`) FROM tablename GROUP BY `count`
>
> This is made a little bit confusing by the fact that your column is
> named `count`.
>
> Regards,
> Bill K.


Query Result:

2 - 2
3 - 3
4 - 1
5 - 1

Simple! Thanks Bill.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 08:44 AM
kev
 
Posts: n/a
Default Re: Count unique data in column

I have a concern about my request on top of that since we are on the
topic of optimization.

What is a good way to approach SQL queries?

Should I connect to the server on every query? (looks like a bad idea
because you will increase server bandwidth assuming not localhost)

Example:
Start session

On every query: Connect => Query => Disconnect

End session

Should I use pconnect (OR connect and save connection handler) and
disconnect after the session has ended? How does the server handle the
event when users are on a pconnect and the browser is closed? (would
pconnect disconnect once session is over?) I do not want to leave
multiple connections open to timeout.

Example:
Start session => Connect

On every query: Query

End session = Disconnect

Feedback would be great. Just want to establish a standard. I am
currently using the first method but I am wondering if it should worry
about scalability. Thanks in advance

Kev

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-28-2008, 08:44 AM
Bill Karwin
 
Posts: n/a
Default Re: Count unique data in column

kev wrote:
> Should I connect to the server on every query?


No, I would not recommend this. Connecting costs a large amount of
overhead. It has to find the host, establish a socket, authenticate
your username and password, and set up other resources. This is
actually pretty expensive to do, and you should avoid doing it for every
query.

> Should I use pconnect (OR connect and save connection handler) and
> disconnect after the session has ended?


I assume you mean mysql_pconnect(), part of the PHP mysql package.
http://www.php.net/function.mysql-pconnect

Yes, this is a good thing to do for greater scalability. This reduces
the number of times that PHP needs to create a new connection to MySQL
The web server (Apache) that runs PHP manages the persisent connections
and lets them be re-used by later PHP requests.

However, if you use PHP in CGI mode, starting a new process for each PHP
request, the persistent connections won't last anyway.

> How does the server handle the
> event when users are on a pconnect and the browser is closed? (would
> pconnect disconnect once session is over?)


According to the manual, mysql_close() does not close connections opened
by mysql_pconnect(). The persistent database connections will be used
by later requests for different web sessions.

Also note that subsequent PHP requests are _not_ likely to use the same
database connection. So if you create connection-based things like
temporary tables or user variables, use them only during the same PHP
request, and then you should drop that table or set that variable to
NULL, or else some other web session will get to see it.

> I do not want to leave multiple connections open to timeout.


You can force MySQL to time out long connections using the
"interactive-timeout" option in the my.cnf (my.ini) configuration file.
The default for this timeout is 28800 seconds. See
http://dev.mysql.com/doc/refman/5.0/...variables.html

Regards,
Bill K.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-28-2008, 08:44 AM
Jerry Stuckle
 
Posts: n/a
Default Re: Count unique data in column

kev wrote:
> I have a concern about my request on top of that since we are on the
> topic of optimization.
>
> What is a good way to approach SQL queries?
>
> Should I connect to the server on every query? (looks like a bad idea
> because you will increase server bandwidth assuming not localhost)
>
> Example:
> Start session
>
> On every query: Connect => Query => Disconnect
>
> End session
>
> Should I use pconnect (OR connect and save connection handler) and
> disconnect after the session has ended? How does the server handle the
> event when users are on a pconnect and the browser is closed? (would
> pconnect disconnect once session is over?) I do not want to leave
> multiple connections open to timeout.
>
> Example:
> Start session => Connect
>
> On every query: Query
>
> End session = Disconnect
>
> Feedback would be great. Just want to establish a standard. I am
> currently using the first method but I am wondering if it should worry
> about scalability. Thanks in advance
>
> Kev
>


Kev,

I don't generally use pconnect() - it keeps connections open even if no one is
using the website (or php code). It might be find if you're running a site with
100K hits/hr. But none of mine do that.

When I need MySQL, I just connect at the start of the page and close at the end
of the page. Nice, clean, no muss, no fuss.

Additionally, if you're using shared hosting, persistent connections may be
disabled.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-28-2008, 08:45 AM
Axel Schwenke
 
Posts: n/a
Default Re: Count unique data in column

Bill Karwin <bill@karwin.com> wrote:
> kev wrote:
>> Should I connect to the server on every query?

>
> No, I would not recommend this. Connecting costs a large amount of
> overhead. It has to find the host, establish a socket, authenticate
> your username and password, and set up other resources. This is
> actually pretty expensive to do, and you should avoid doing it for every
> query.


I agree that one should not connect for every single query.
OTOH MySQL connects are pretty cheap, probably cheaper than the
underlying TCP connection setup in the network stack.

>> Should I use pconnect (OR connect and save connection handler) and
>> disconnect after the session has ended?

>
> I assume you mean mysql_pconnect(), part of the PHP mysql package.
> http://www.php.net/function.mysql-pconnect
>
> Yes, this is a good thing to do for greater scalability.


No, it is not. Contrary, using mysql_pconnect() is considered harmful,
as it tends to hog the MySQL server with idle connections. A very
detailed (but German) discussion of the topic is here:

<http://kris.koehntopp.de/artikel/webtune/>

In a nutshell:

a) many connections:

This problem arises if you use different connections (that is <host,
user, password> tupels) to connect to the database. If your scripts
(*all* scripts running in one Apache instance) use N different
connections, PHP will use up to N * max_clients connections to the
database - independent from the real webserver load.

Contrary, if each of your scripts uses only M different connections
at a time (in most cases M=1) and refrains from using persistent
connections, you will have at most M * max_clients open connections
to the database. Plus, you will only reach that limit if your web-
server is fully loaded.

Idle connections will eat memory and file descriptors. Not good.


b) MySQL sessions have state

There are lots of objects in MySQL with session scope: locks, user
variables, temporary tables. Currently there is no way to reset a
session, so the best way to destroy those objects is to close the
connection. Additionally there is a hidden attribute - the selected
database (mysql_select_db(), USE <db>). If you re-use a MySQL
connection handle in PHP, you cannot know the selected database.
Therefore many database abstraction layers for PHP call
mysql_select_db() *mandatory* right before each mysql_query().

Forgotten locks may render your database useless. Forgotten temp.
tables will eat memory and/or disk space. User variables may spread
sensitive information between web applications. Very bad.


Conclusion: mysql_pconnect() solves a nonexistent problem. Also it
creates new problems. You should not use it.


XL
--
Axel Schwenke, Senior Software Developer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 02-28-2008, 08:45 AM
Bill Karwin
 
Posts: n/a
Default Re: Count unique data in column

Awesome! Thanks for the information, Axel.

Regards,
Bill K.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 02-28-2008, 08:46 AM
kev
 
Posts: n/a
Default Re: Count unique data in column


Axel Schwenke wrote:
> Bill Karwin <bill@karwin.com> wrote:
> > kev wrote:
> >> Should I connect to the server on every query?

> >
> > No, I would not recommend this. Connecting costs a large amount of
> > overhead. It has to find the host, establish a socket, authenticate
> > your username and password, and set up other resources. This is
> > actually pretty expensive to do, and you should avoid doing it for every
> > query.

>
> I agree that one should not connect for every single query.
> OTOH MySQL connects are pretty cheap, probably cheaper than the
> underlying TCP connection setup in the network stack.
>
> >> Should I use pconnect (OR connect and save connection handler) and
> >> disconnect after the session has ended?

> >
> > I assume you mean mysql_pconnect(), part of the PHP mysql package.
> > http://www.php.net/function.mysql-pconnect
> >
> > Yes, this is a good thing to do for greater scalability.

>
> No, it is not. Contrary, using mysql_pconnect() is considered harmful,
> as it tends to hog the MySQL server with idle connections. A very
> detailed (but German) discussion of the topic is here:
>
> <http://kris.koehntopp.de/artikel/webtune/>
>
> In a nutshell:
>
> a) many connections:
>
> This problem arises if you use different connections (that is <host,
> user, password> tupels) to connect to the database. If your scripts
> (*all* scripts running in one Apache instance) use N different
> connections, PHP will use up to N * max_clients connections to the
> database - independent from the real webserver load.
>
> Contrary, if each of your scripts uses only M different connections
> at a time (in most cases M=1) and refrains from using persistent
> connections, you will have at most M * max_clients open connections
> to the database. Plus, you will only reach that limit if your web-
> server is fully loaded.
>
> Idle connections will eat memory and file descriptors. Not good.
>
>
> b) MySQL sessions have state
>
> There are lots of objects in MySQL with session scope: locks, user
> variables, temporary tables. Currently there is no way to reset a
> session, so the best way to destroy those objects is to close the
> connection. Additionally there is a hidden attribute - the selected
> database (mysql_select_db(), USE <db>). If you re-use a MySQL
> connection handle in PHP, you cannot know the selected database.
> Therefore many database abstraction layers for PHP call
> mysql_select_db() *mandatory* right before each mysql_query().
>
> Forgotten locks may render your database useless. Forgotten temp.
> tables will eat memory and/or disk space. User variables may spread
> sensitive information between web applications. Very bad.
>
>
> Conclusion: mysql_pconnect() solves a nonexistent problem. Also it
> creates new problems. You should not use it.
>
>
> XL
> --
> Axel Schwenke, Senior Software Developer, MySQL AB
>
> Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
> MySQL User Forums: http://forums.mysql.com/


Good to know. Thanks for the info. Now I have to figure out how to pass
the SQL connection handle across pages/the session.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 02-28-2008, 08:46 AM
Jerry Stuckle
 
Posts: n/a
Default Re: Count unique data in column

kev wrote:
> Axel Schwenke wrote:
>
>>Bill Karwin <bill@karwin.com> wrote:
>>
>>>kev wrote:
>>>
>>>>Should I connect to the server on every query?
>>>
>>>No, I would not recommend this. Connecting costs a large amount of
>>>overhead. It has to find the host, establish a socket, authenticate
>>>your username and password, and set up other resources. This is
>>>actually pretty expensive to do, and you should avoid doing it for every
>>>query.

>>
>>I agree that one should not connect for every single query.
>>OTOH MySQL connects are pretty cheap, probably cheaper than the
>>underlying TCP connection setup in the network stack.
>>
>>
>>>>Should I use pconnect (OR connect and save connection handler) and
>>>>disconnect after the session has ended?
>>>
>>>I assume you mean mysql_pconnect(), part of the PHP mysql package.
>>>http://www.php.net/function.mysql-pconnect
>>>
>>>Yes, this is a good thing to do for greater scalability.

>>
>>No, it is not. Contrary, using mysql_pconnect() is considered harmful,
>>as it tends to hog the MySQL server with idle connections. A very
>>detailed (but German) discussion of the topic is here:
>>
>><http://kris.koehntopp.de/artikel/webtune/>
>>
>>In a nutshell:
>>
>>a) many connections:
>>
>>This problem arises if you use different connections (that is <host,
>>user, password> tupels) to connect to the database. If your scripts
>>(*all* scripts running in one Apache instance) use N different
>>connections, PHP will use up to N * max_clients connections to the
>>database - independent from the real webserver load.
>>
>>Contrary, if each of your scripts uses only M different connections
>>at a time (in most cases M=1) and refrains from using persistent
>>connections, you will have at most M * max_clients open connections
>>to the database. Plus, you will only reach that limit if your web-
>>server is fully loaded.
>>
>>Idle connections will eat memory and file descriptors. Not good.
>>
>>
>>b) MySQL sessions have state
>>
>>There are lots of objects in MySQL with session scope: locks, user
>>variables, temporary tables. Currently there is no way to reset a
>>session, so the best way to destroy those objects is to close the
>>connection. Additionally there is a hidden attribute - the selected
>>database (mysql_select_db(), USE <db>). If you re-use a MySQL
>>connection handle in PHP, you cannot know the selected database.
>>Therefore many database abstraction layers for PHP call
>>mysql_select_db() *mandatory* right before each mysql_query().
>>
>>Forgotten locks may render your database useless. Forgotten temp.
>>tables will eat memory and/or disk space. User variables may spread
>>sensitive information between web applications. Very bad.
>>
>>
>>Conclusion: mysql_pconnect() solves a nonexistent problem. Also it
>>creates new problems. You should not use it.
>>
>>
>>XL
>>--
>>Axel Schwenke, Senior Software Developer, MySQL AB
>>
>>Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
>>MySQL User Forums: http://forums.mysql.com/

>
>
> Good to know. Thanks for the info. Now I have to figure out how to pass
> the SQL connection handle across pages/the session.
>


Kev,

You can't pass connections across pages - the connection will automatically
close at the end of the PHP script if you don't close it yourself (it's a good
idea to close it, though).

But that's OK - you really don't want to keep the connection open while the user
goes to lunch. Just connect before your first MySQL and close the connection
after the last one (or at the end of the page).


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:40 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com